Group 6
Season-8
In this project, we build a multiple linear regression model that predicts the total
tackle points a defender can score, based on his previous performances and the factors
that affect his performance.
Problem Statement:
In the professional kabaddi league, defenders play a pivotal role in preventing the
opposing team's raiders from scoring points. Analysing and estimating a defender's ability
to accumulate tackle points can provide useful insights and support strategic
decision-making for team management. To address this problem, we develop a predictive
model that forecasts the tackle points a defender is likely to achieve in kabaddi
matches.
Objective:
The objective of this project is to predict the total tackle points scored by a defender
from parameters such as total tackles, height, weight, age, super tackles, High 5s,
matches played, auction price, position and average time on mat.
1. Matches:
Total number of matches played by a player in Pro Kabaddi League season 8, which
can reflect their experience.
2. Total Tackles:
3. Height:
Height of the player, which may influence their playing style and tackling ability.
4. Weight:
Weight of the player, which can affect their performance and physical strength.
5. Age:
6. Super Tackles:
A Super Tackle is awarded to the defending team in Pro Kabaddi League (PKL) when
they successfully tackle the raider while having three or fewer players on the mat.
On such an occasion, the defending team scores two points instead of the usual one.
7. High 5s:
High 5 is achieved when a defender scores five or more tackle points in a single
match.
8. Average time on mat:
This metric calculates the average amount of time that a defender is on the Kabaddi
mat (playing field) in matches he played. The time on mat includes moments when
the defender is actively involved in defensive actions, such as attempting tackles,
defending against raiders, and assisting teammates in defensive maneuvers.
9. Position:
Shows the position of the defender on the mat. It can be Right Cover, Right Corner,
Left Cover or Left Corner.
Figure: Defender Positions
10. Auction Price:
Indicates the price at which the player was bought or retained by his team, or the
price offered to him as a replacement for a player who left during the season.
• Dependent Variable: Total Tackle Points
Indicates the total number of points earned by the defender for successful tackles
in Pro Kabaddi League Season 8.
Methodology:
1. We gathered team-wise data on the defenders who played in Pro Kabaddi League
season 8
2. Data was collected on the following parameters
– Matches played
– Total Tackles
– Height
– Weight
– Age
– Position
– Super Tackles
– High 5s
– Auction Price
3. Data Processing:
• We replaced null or unavailable values with the mean of the corresponding variable
• We created dummy variables for the one categorical variable in the data set, the
position from which the defender defends
4. Data Analysis:
• We checked linearity between each independent variable and the dependent variable
in the data set
• For each model, we performed model evaluation to check whether the model is fit
for prediction
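The two processing steps can be sketched as follows. This is a minimal illustration on a made-up four-row data frame, not the project's actual preprocessing code: the column names mirror the report's data set, but the values are hypothetical.

```r
# Hypothetical toy data; column names mirror the report's data set
toy <- data.frame(
  Height.cm. = c(178, NA, 182, 175),
  Weight.kg. = c(80, 76, NA, 72),
  Position   = c("Left_Corner", "Right_Cover", "Left_Cover", "Right_Corner")
)

# Replace missing values in each numeric column with that column's mean
num_cols <- sapply(toy, is.numeric)
toy[num_cols] <- lapply(toy[num_cols], function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})

# Dummy (indicator) columns for Position; the first factor level
# (Left_Corner, alphabetically) becomes the baseline and gets no column
toy$Position <- factor(toy$Position)
dummies <- model.matrix(~ Position, data = toy)[, -1]

print(toy)
print(dummies)
```

When a factor is passed directly to lm(), R builds the same indicator columns internally, which is why the model summaries list PositionLeft_Cover, PositionRight_Corner and PositionRight_Cover but no Left_Corner term.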
Importing Libraries
The following libraries are imported and used in this project:
library(tidyverse)
library(ggplot2)
library(coefplot)
library(car)
library(corrplot)
Importing Data
Data was imported through a CSV file
data_q <- read.csv('kabaddi_Group_6.csv')
print(colnames(data_q))
head(data_q)
Linearity Check:
We plot each predictor against the outcome as a scatter plot, together with the fitted
simple regression line, to check whether the predictor variables have a straight-line
relationship with the outcome variable.
# Total tackle points vs. matches, with the simple regression line
var1 <- lm(Total_Tackle_Points ~ Matches, data = data_q)
plot(data_q$Matches, data_q$Total_Tackle_Points,
     main = "Total Tackle Points vs Matches",
     xlab = "Matches", ylab = "Total Tackle Points")
abline(var1, col = "red")
# Total tackle points vs. total tackles, with the simple regression line
var2 <- lm(Total_Tackle_Points ~ Total_Tackles, data = data_q)
plot(data_q$Total_Tackles, data_q$Total_Tackle_Points,
     main = "Total Tackle Points vs Total Tackles",
     xlab = "Total Tackles", ylab = "Total Tackle Points")
abline(var2, col = "red")
• P Value Model
Model 1 : Correlation Model
• We drop Matches, High 5s and Average time on mat from this model and keep Total
Tackles, because Total Tackles is highly correlated with the dependent variable
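A correlation matrix is one way to see which variables move together before choosing predictors. The sketch below uses simulated values (the real ones are in the project's CSV file), and the 0.8 cutoff is an assumed rule of thumb, not a threshold stated in the report.

```r
# Simulated stand-in for the real data; only the column names come from the report
set.seed(1)
n <- 60
matches <- sample(5:24, n, replace = TRUE)
total_tackles <- round(matches * runif(n, 2, 5))
toy <- data.frame(
  Matches             = matches,
  Total_Tackles       = total_tackles,
  High_5s             = rbinom(n, size = matches, prob = 0.15),
  Total_Tackle_Points = round(0.4 * total_tackles + rnorm(n, sd = 3))
)

# Pairwise Pearson correlations between all columns
cor_mat <- cor(toy)
print(round(cor_mat, 2))

# List variable pairs whose absolute correlation exceeds the assumed 0.8 cutoff
high <- which(abs(cor_mat) > 0.8 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  r    = cor_mat[high]
)
print(high_pairs)
```

With the corrplot package loaded, corrplot(cor_mat) would draw the same matrix as a heat map.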
Call:
lm(formula = Total_Tackle_Points ~ Total_Tackles + Height.cm. +
Weight.kg. + Super_Tackles + Age + Auction_Price.Lakhs. +
Position, data = data_q)
Residuals:
Min 1Q Median 3Q Max
-10.5436 -3.0054 -0.3907 2.6888 20.9114
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -63.12533 27.58561 -2.288 0.0261 *
Total_Tackles 0.43074 0.02037 21.144 < 2e-16 ***
Height.cm. 0.28609 0.15254 1.876 0.0661 .
Weight.kg. 0.17360 0.16250 1.068 0.2901
Super_Tackles 2.02372 0.37345 5.419 1.43e-06 ***
Age -0.11754 0.16179 -0.726 0.4707
Auction_Price.Lakhs. 0.02148 0.02984 0.720 0.4747
PositionLeft_Cover -4.80335 2.22309 -2.161 0.0352 *
PositionRight_Corner 2.04536 1.94121 1.054 0.2967
PositionRight_Cover 0.92843 1.89199 0.491 0.6256
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coefficient Plot:
Plots the estimated coefficients of the fitted model with their standard errors
coefplot(cor_model)
Residual vs Fitted Values Plot:
This plot shows how the residuals are distributed against the fitted values. If they are
scattered randomly around zero, the model fits well.
In the output graph, the data points are randomly distributed without following any
pattern, which suggests that the model is adequate for prediction.
ggplot(cor_model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE) +
  labs(x = "Fitted Values", y = "Residuals")
As we can observe, the residual density is a bell-shaped curve, indicating that the
residuals are approximately normally distributed
Evaluation of Model 1:
The model is evaluated to check whether it is suitable for prediction. The evaluation is
based on the following diagnostics.
1. Residuals vs Fitted Values:
This plot shows how the residuals are distributed against the fitted values. If they
are scattered randomly around zero, the model fits well
# Extract residuals
residuals <- residuals(cor_model)
# Plot residuals vs. fitted values
plot(cor_model$fitted.values, residuals,
     main = "Residuals vs. Fitted Values",
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red", lty = 4) # Add a horizontal line at y = 0
The residuals are randomly distributed without following a fixed pattern, which is an
indication that the model is adequate for prediction
2. Histogram of Residuals
# Plot a histogram of residuals
hist(residuals, main = "Histogram of Residuals", xlab = "Residuals")
The histogram shows a bell-shaped curve, which indicates that the residuals are
approximately normally distributed
3. Check for the Normality of Residuals Using Normal Q-Q Plot
In a Normal Q-Q plot, Q stands for quantile. The points on the Q-Q plot indicate whether
the residuals follow a univariate normal distribution. If they do, the points fall on
the 45-degree reference line; if not, they deviate from the reference line
# Plot a Q-Q plot to check normality of residuals
qqnorm(residuals, main = "Normal Q-Q Plot")
qqline(residuals, col = 4)
In the Normal Q-Q plot, the quantile values of the standard normal distribution are
plotted on the x-axis and the corresponding quantile values of the residuals are plotted
on the y-axis. Most of the points fall close to the 45-degree reference line
4. Check For Heteroskedasticity
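The report does not show which check was used at this step. As one common option, the sketch below runs a studentized Breusch-Pagan test by hand in base R: the squared residuals are regressed on the predictor, and n times the R-squared of that auxiliary regression is compared with a chi-squared distribution. The data are simulated and the variable names are hypothetical.

```r
# Simulated data with constant error variance (hypothetical values)
set.seed(42)
n <- 60
x <- runif(n, 10, 80)
y <- 0.4 * x + rnorm(n, sd = 4)
fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the predictor(s)
aux <- lm(resid(fit)^2 ~ x)

# LM statistic = n * R^2, chi-squared with (number of predictors) degrees of freedom
bp_stat <- n * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

cat("BP statistic:", round(bp_stat, 3), " p-value:", round(bp_p, 3), "\n")
```

A small p-value would point to non-constant residual variance. The car package, which is loaded in this project, also offers ncvTest() for a closely related score test.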
Model 2: Backward Elimination Model
Start:  AIC=213.03
Total_Tackle_Points ~ Matches + Total_Tackles + Height.cm. +
    Weight.kg. + Avg.time_on_mat... + Super_Tackles + Age + High_5s +
    Auction_Price.Lakhs. + Position
Step:  AIC=211.41
Total_Tackle_Points ~ Matches + Total_Tackles + Height.cm. +
    Weight.kg. + Avg.time_on_mat... + Super_Tackles + Age + High_5s +
    Position
Step:  AIC=209.26
Total_Tackle_Points ~ Matches + Total_Tackles + Height.cm. +
    Weight.kg. + Super_Tackles + High_5s + Position
Three independent variables are dropped. Backward elimination removes a variable when
doing so lowers the AIC, not because its p-value is greater than 0.05. The dropped
variables are:
1. Auction Price
2. Age
3. Average time on mat
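Output in the "Start: AIC" / "Step: AIC" format is what the base R function step() prints. A minimal sketch of backward elimination with step() follows, on simulated data: the column names echo the report, while the values and the object names full_model and back_toy are hypothetical.

```r
# Simulated data: only Total_Tackles and Super_Tackles truly drive the response
set.seed(7)
n <- 60
toy <- data.frame(
  Total_Tackles = round(runif(n, 10, 90)),
  Super_Tackles = rpois(n, 2),
  Age           = round(runif(n, 20, 35)),
  Auction_Price = round(runif(n, 10, 150))
)
toy$Total_Tackle_Points <- round(
  0.4 * toy$Total_Tackles + 2 * toy$Super_Tackles + rnorm(n, sd = 3)
)

# Start from the full model and let step() drop terms while the AIC decreases
full_model <- lm(Total_Tackle_Points ~ ., data = toy)
back_toy   <- step(full_model, direction = "backward", trace = 0)

print(formula(back_toy))

# step() uses extractAIC(); the selected model never has a higher value
print(c(full = extractAIC(full_model)[2], selected = extractAIC(back_toy)[2]))
```

Setting trace = 1 (the default) prints each elimination step in the same format as the output shown in this section.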
Call:
lm(formula = Total_Tackle_Points ~ Matches + Total_Tackles +
Height.cm. + Weight.kg. + Super_Tackles + High_5s + Position,
data = data_q)
Residuals:
Min 1Q Median 3Q Max
-11.8446 -2.3352 -0.9767 2.1291 16.0434
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -65.20823 24.49777 -2.662 0.01022 *
Matches 0.24771 0.15095 1.641 0.10662
Total_Tackles 0.33539 0.03573 9.388 6.09e-13 ***
Height.cm. 0.25126 0.13827 1.817 0.07474 .
Weight.kg. 0.23570 0.12855 1.834 0.07224 .
Super_Tackles 1.73615 0.35047 4.954 7.55e-06 ***
High_5s 2.72547 0.79268 3.438 0.00113 **
PositionLeft_Cover -2.83246 2.05119 -1.381 0.17300
PositionRight_Corner 2.41419 1.68470 1.433 0.15762
PositionRight_Cover 1.83309 1.72166 1.065 0.29174
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Coefficient Plot
Plots the estimated coefficients of the fitted model with their standard errors
coefplot(back_model)
Residual vs Fitted Values Plot:
This plot shows how the residuals are distributed against the fitted values. If they are
scattered randomly around zero, the model fits well
ggplot(back_model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE) +
  labs(x = "Fitted Values", y = "Residuals")
As we can observe, the residual density is a bell-shaped curve, indicating that the
residuals are approximately normally distributed.
Evaluation of Model 2:
The model is evaluated to check whether it is suitable for prediction. The evaluation is
based on the following diagnostics.
1. Residuals vs Fitted Values:
This plot shows how the residuals are distributed against the fitted values. If they
are scattered randomly around zero, the model fits well
# Extract residuals
residuals <- residuals(back_model)
# Plot residuals vs. fitted values
plot(back_model$fitted.values, residuals,
     main = "Residuals vs. Fitted Values",
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red", lty = 4) # Add a horizontal line at y = 0
The residuals are randomly distributed without following a fixed pattern, which is an
indication that the model is adequate for prediction
2. Histogram of Residuals
# Plot a histogram of residuals
hist(residuals, main = "Histogram of Residuals", xlab = "Residuals")
The histogram shows a bell-shaped curve, which indicates that the residuals are
approximately normally distributed
3. Check for the Normality of Residuals Using Normal Q-Q Plot
In a Normal Q-Q plot, Q stands for quantile. The points on the Q-Q plot indicate whether
the residuals follow a univariate normal distribution. If they do, the points fall on
the 45-degree reference line; if not, they deviate from the reference line
In the Normal Q-Q plot, the quantile values of the standard normal distribution are
plotted on the x-axis and the corresponding quantile values of the residuals are plotted
on the y-axis. Most of the points fall close to the 45-degree reference line
4. Check For Heteroskedasticity
Residual vs leverage:
A residuals vs. leverage plot is a type of diagnostic plot that allows us to identify influential
observations in a regression model.
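The quantities behind such a plot can be inspected directly with base R. The sketch below is an illustration on simulated data, with one point deliberately placed far from the others; the 2p/n leverage cutoff is a common rule of thumb and an assumption here, not something taken from the report.

```r
# Simulated data; the last observation has an unusually large x value
set.seed(3)
n <- 40
x <- c(runif(n - 1, 10, 60), 150)
y <- 0.4 * x + rnorm(n, sd = 3)
fit <- lm(y ~ x)

lev  <- hatvalues(fit)        # leverage of each observation
cook <- cooks.distance(fit)   # influence of each observation on the fit
p    <- length(coef(fit))     # number of estimated coefficients

# Flag observations whose leverage exceeds the assumed 2p/n rule of thumb
flagged <- which(lev > 2 * p / n)
print(flagged)
print(round(cook[flagged], 3))

# plot(fit, which = 5) draws R's built-in residuals vs. leverage plot
```

Points that combine high leverage with a large residual have a large Cook's distance and are the ones most worth checking against the raw data.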
Model 3: P Value Model
• This model retains only the predictors whose p-values are significant at the 0.05 level
# ---- Model 3: keep only the predictors with significant p-values ----
pval_model <- lm(Total_Tackle_Points ~ Total_Tackles + Super_Tackles + High_5s,
                 data = data_q)
summary(pval_model)
Call:
lm(formula = Total_Tackle_Points ~ Total_Tackles + Super_Tackles +
High_5s, data = data_q)
Residuals:
Min 1Q Median 3Q Max
-11.2347 -2.2619 -0.3919 2.2107 22.5962
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.91587 1.31554 -0.696 0.488996
Total_Tackles 0.35613 0.02659 13.395 < 2e-16 ***
Super_Tackles 1.77184 0.34667 5.111 3.52e-06 ***
High_5s 2.82733 0.75436 3.748 0.000403 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The summary above contains only the three predictors that are significant at the 0.05
level. The seven independent variables with p-values greater than 0.05 were dropped:
• Average time on mat
• Position
• Height
• Weight
• Age
• Auction Price
• Matches
Added-Variable Plots:
These plots show the partial relationship between each independent variable and the
dependent variable in multiple linear regression, after adjusting for the other predictors
avPlots(pval_model)
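What an added-variable plot shows can be reproduced by hand: both the response and the predictor of interest are regressed on the remaining predictors, and the two sets of residuals are plotted against each other. The slope of that plot equals the predictor's coefficient in the full model. The sketch below illustrates this on simulated data with hypothetical values.

```r
# Simulated data with two predictors (hypothetical values)
set.seed(11)
n <- 60
toy <- data.frame(
  Total_Tackles = runif(n, 10, 90),
  Super_Tackles = rpois(n, 2)
)
toy$Total_Tackle_Points <-
  0.4 * toy$Total_Tackles + 2 * toy$Super_Tackles + rnorm(n, sd = 3)

full <- lm(Total_Tackle_Points ~ Total_Tackles + Super_Tackles, data = toy)

# Remove the effect of the other predictor from both the response and Super_Tackles
res_y <- resid(lm(Total_Tackle_Points ~ Total_Tackles, data = toy))
res_x <- resid(lm(Super_Tackles ~ Total_Tackles, data = toy))
av_fit <- lm(res_y ~ res_x)

# Draw the added-variable plot only in an interactive session
if (interactive()) {
  plot(res_x, res_y, xlab = "Super_Tackles | others",
       ylab = "Total_Tackle_Points | others")
  abline(av_fit, col = "red")
}

# The two numbers agree: slope of the AV plot vs. coefficient in the full model
print(c(av_slope  = unname(coef(av_fit)["res_x"]),
        full_coef = unname(coef(full)["Super_Tackles"])))
```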
Coefficient Plot
Plots the estimated coefficients of the fitted model with their standard errors
coefplot(pval_model)
Residual vs Fitted Values Plot:
This plot shows how the residuals are distributed against the fitted values. If they are
scattered randomly around zero, the model fits well
ggplot(pval_model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE) +
  labs(x = "Fitted Values", y = "Residuals")
In our graph, the data points are randomly distributed without following any pattern,
which suggests that the model is adequate for prediction
Standardized Residual Plot:
We plot the density of the standardized residuals to check visually whether the residuals
are normally distributed. If the plot is roughly bell-shaped, the residuals likely follow
a normal distribution. In the output, the density plot roughly follows a bell shape,
which suggests that the residuals are approximately normally distributed.
# Standardized residuals of the fitted model
resid_std <- rstandard(pval_model)
# Plot the density of the standardized residuals
plot(density(resid_std), main = "Standardized Residuals Plot",
     xlab = "Standardized Residuals", ylab = "Density")
abline(v = 0)                      # Centre reference line
abline(v = c(-2, 2), col = "red")  # Lines at -2 and +2
abline(h = 0, col = "blue")        # Baseline
As we can observe, the plot is a bell-shaped curve, indicating that the residuals are
approximately normally distributed
Evaluation of Model 3:
The model is evaluated to check whether it is suitable for prediction. The evaluation is
based on the following diagnostics.
1. Residuals vs Fitted Values:
This plot shows how the residuals are distributed against the fitted values. If they
are scattered randomly around zero, the model fits well
# Extract residuals
residuals <- residuals(pval_model)
# Plot residuals vs. fitted values
plot(pval_model$fitted.values, residuals,
     main = "Residuals vs. Fitted Values",
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red", lty = 4) # Add a horizontal line at y = 0
The residuals are randomly distributed without following a fixed pattern, which is an
indication that the model is adequate for prediction
2. Histogram of Residuals
# Plot a histogram of residuals
hist(residuals, main = "Histogram of Residuals", xlab = "Residuals")
The histogram shows a bell-shaped curve, which indicates that the residuals are
approximately normally distributed
3. Check for the Normality of Residuals Using Normal Q-Q Plot
In a Normal Q-Q plot, Q stands for quantile. The points on the Q-Q plot indicate whether
the residuals follow a univariate normal distribution. If they do, the points fall on
the 45-degree reference line; if not, they deviate from the reference line
# Plot a Q-Q plot to check normality of residuals
qqnorm(residuals, main = "Normal Q-Q Plot")
qqline(residuals, col = 4)
In the Normal Q-Q plot, the quantile values of the standard normal distribution are
plotted on the x-axis and the corresponding quantile values of the residuals are plotted
on the y-axis. Most of the points fall close to the 45-degree reference line
4. Check For Heteroskedasticity
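Model Comparison Using AIC:
The Akaike information criterion (AIC) is calculated from:
• the number of parameters estimated in the model.
• the maximum likelihood estimate of the model (how well the model reproduces the
data).
• The best-fit model according to AIC is the one that explains the greatest amount of
variation using the fewest possible independent variables. A lower AIC is better.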
AIC(cor_model,back_model,pval_model)
df AIC
cor_model 11 404.9449
back_model 11 392.8809
pval_model 5 400.2233
We conclude that the Backward Elimination model is the best of the three models as per
AIC, because it has the lowest value (392.88)
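For reference, AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximised likelihood. The sketch below recomputes this by hand on simulated data (hypothetical values and object names) and compares it with R's AIC(). It also shows why the values printed during backward elimination are on a different scale: step() reports extractAIC(), which for linear models omits an additive constant.

```r
# Simulated data: x2 is pure noise, so the smaller model should usually win
set.seed(5)
n  <- 60
x1 <- runif(n, 10, 90)
x2 <- rnorm(n)
y  <- 0.4 * x1 + rnorm(n, sd = 3)
small <- lm(y ~ x1)
large <- lm(y ~ x1 + x2)

# AIC = 2k - 2 * log-likelihood, with k taken from the logLik object
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  k  <- attr(ll, "df")   # regression coefficients plus the error variance
  2 * k - 2 * as.numeric(ll)
}

print(c(by_hand = aic_by_hand(small), builtin = AIC(small)))
print(AIC(small, large))

# extractAIC() (used by step) differs from AIC() by n * (1 + log(2 * pi)) + 2
n_obs <- nobs(small)
print(c(difference = AIC(small) - extractAIC(small)[2],
        constant   = n_obs * (1 + log(2 * pi)) + 2))
```

The df column printed by AIC() is k in the formula above, so a linear model with ten coefficients is listed with df 11.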
Tech Resources:
• R language and RStudio
References:
1. https://www.prokabaddi.com/
2. https://www.kabaddiadda.com/
3. https://www.sportzcraazy.com/
4. https://kabaddian.com/
5. https://www.wikiwiki.in/
6. https://www.sportskeeda.com/
7. https://wikisportsbio.com/kabaddi/
8. https://www.news18.com/
9. https://prokabaddiarena.com/
10. https://www.kabaddiadda.com/tournament/87-pro-kabaddi-league-season-8/auction-summary