Dr. S. Vairachilai Department of CSE CVR College of Engineering Mangalpalli Telangana
Machine Learning
Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn and improve over time from the data they are given, without being explicitly programmed. Machine learning algorithms are able to detect and learn from patterns in data and make predictions on new data.
16-04-2022 S.Vairachilai 2
Machine Learning
Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Machine Learning Algorithm
Machine learning algorithms fall into two broad groups:
• Regression
• Classification

Regression Analysis
Regression analysis models the relationship between one or more independent (predictor/explanatory/input/cause) variables and a dependent (response/outcome/output/effect) variable.
Simple Linear Regression (SLR)

ŷ = β₀ + (β₁ * x)

slope (β₁) = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

Worked example:

Number of TV Ads (x)   Number of Cars Sold (y)
1                      14
3                      24
2                      18
1                      17
3                      27

x̄ = 2, ȳ = 20

Σ(x − x̄)(y − ȳ) = (1 − 2)(14 − 20) + (3 − 2)(24 − 20) + (2 − 2)(18 − 20) + (1 − 2)(17 − 20) + (3 − 2)(27 − 20)
                 = 6 + 4 + 0 + 3 + 7 = 20

Σ(x − x̄)² = (1 − 2)² + (3 − 2)² + (2 − 2)² + (1 − 2)² + (3 − 2)² = 1 + 1 + 0 + 1 + 1 = 4

slope (β₁) = 20 / 4 = 5
intercept (β₀) = ȳ − (β₁ * x̄) = 20 − 5 * 2 = 10

Estimated Regression Equation: ŷ = 10 + (5 * x)
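The slope and intercept computation above can be sketched in plain Python (an illustrative sketch, not part of the original slides):

```python
# Simple linear regression coefficients for the TV-ads example,
# using the deviation formulas from the slide.
x = [1, 3, 2, 1, 3]        # number of TV ads
y = [14, 24, 18, 17, 27]   # number of cars sold

x_bar = sum(x) / len(x)    # 2.0
y_bar = sum(y) / len(y)    # 20.0

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 20.0
sxx = sum((xi - x_bar) ** 2 for xi in x)                        # 4.0

beta1 = sxy / sxx              # slope = 5.0
beta0 = y_bar - beta1 * x_bar  # intercept = 10.0
```

This reproduces the estimated regression equation ŷ = 10 + 5x.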
• We have a large variety of vehicles tailored to suit all budgets and needs, priding ourselves on our knowledge and innovation within the motor trade industry.

Number of TV Ads (x)   Number of Cars Sold (y)
1                      14
3                      24
2                      18
1                      17
3                      27
Multiple Linear Regression (MLR) Analysis

Recap of Simple Linear Regression (SLR):

ŷ = β₀ + (β₁ * x)

slope (β₁) = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
Intercept (β₀) = ȳ − (slope * x̄)
Row Vector: A matrix having only one row is called a row vector
Column Vector: A matrix having only one column is called a column vector
Eg: the 2 × 1 matrix A = [5; 4] (the entries 5 and 4 stacked in a single column) is a column vector because it has only one column.
Scalars: A matrix having only one row and one column is called a scalar.
Eg: 1 * 1 matrix 𝑨 = [ 𝟑] is a scalar. In other words, a scalar is a single number.
Transpose Matrix: The transpose of a matrix A, written Aᵀ, is obtained by interchanging its rows and columns; a row vector transposes to a column vector, and vice versa.
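These vector and transpose notions can be illustrated with numpy (numpy is used here only for illustration; it is not part of the original slides):

```python
import numpy as np

row = np.array([[5, 4]])      # 1 x 2 matrix: a row vector
col = np.array([[5], [4]])    # 2 x 1 matrix: a column vector
scalar = np.array([[3]])      # 1 x 1 matrix: a scalar

# Transposing swaps rows and columns: a row vector becomes a column vector.
assert row.T.shape == (2, 1)
assert np.array_equal(row.T, col)
```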
Linear Regression Analysis in Matrix Form

ŷ = β₀ + (β₁ * x)

ŷ   Dependent Variable
x   Independent Variable
β₀  Intercept Parameter
β₁  Slope Parameter

Writing the model for each of the n observations:

ŷ₁ = β₀ + (β₁ * x₁)
ŷ₂ = β₀ + (β₁ * x₂)
ŷ₃ = β₀ + (β₁ * x₃)
...
ŷₙ = β₀ + (β₁ * xₙ)

In matrix form, Y = Xβ:

| ŷ₁ |     | 1  x₁ |
| ŷ₂ |     | 1  x₂ |   | β₀ |
| ŷ₃ |  =  | 1  x₃ | * | β₁ |
| ⋮  |     | ⋮   ⋮ |
| ŷₙ |     | 1  xₙ |

ŷ   n × 1 column vector
X   n × 2 matrix
β   2 × 1 column vector
SLR Model in Matrix Form

ŷ = β₀ + (β₁ * x)

The normal equations Xᵀ * Y = Xᵀ * X * β give:

| Σy  |   | n   Σx  |   | β₀ |
| Σxy | = | Σx  Σx² | * | β₁ |

For the TV-ads data (n = 5, Σx = 10, Σy = 100, Σx² = 24, Σxy = 220):

Number of TV Ads (x)   Number of Cars Sold (y)
1                      14
3                      24
2                      18
1                      17
3                      27

| 100 |   | 5   10 |   | β₀ |
| 220 | = | 10  24 | * | β₁ |

5β₀ + 10β₁ = 100     Equation (1)
10β₀ + 24β₁ = 220    Equation (2)

Solving the linear equations (1) & (2):

Intercept (β₀) = 10,  Slope (β₁) = 5

ŷ = 10 + 5 * x
Estimated Regression Equation
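The matrix-form solution can be checked with numpy (an illustrative sketch; numpy is an assumption, not something the slides use):

```python
import numpy as np

x = np.array([1, 3, 2, 1, 3])
y = np.array([14, 24, 18, 17, 27])

# Design matrix with a column of ones for the intercept (n x 2).
X = np.column_stack([np.ones(len(x)), x])

# Normal equations: (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
```

`beta` recovers the intercept 10 and slope 5 found by hand above.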
MLR Model in Matrix Form

ŷ = β₀ + (β₁ * x₁) + (β₂ * x₂) + (β₃ * x₃)

ŷ   n × 1 column vector
X   n × (k+1) matrix
β   (k+1) × 1 column vector

Y = X * β, and the normal equations are Xᵀ * Y = Xᵀ * X * β.

For two independent variables, ŷ = β₀ + (β₁ * x₁) + (β₂ * x₂):

| Σy   |   | n    Σx₁    Σx₂   |   | β₀ |
| Σx₁y | = | Σx₁  Σx₁²   Σx₁x₂ | * | β₁ |
| Σx₂y |   | Σx₂  Σx₂x₁  Σx₂²  |   | β₂ |

For three independent variables:

| Σy   |   | n    Σx₁    Σx₂    Σx₃   |   | β₀ |
| Σx₁y | = | Σx₁  Σx₁²   Σx₁x₂  Σx₁x₃ | * | β₁ |
| Σx₂y |   | Σx₂  Σx₂x₁  Σx₂²   Σx₂x₃ |   | β₂ |
| Σx₃y |   | Σx₃  Σx₃x₁  Σx₃x₂  Σx₃²  |   | β₃ |
Example of Multiple Linear Regression

Three performance measures for 5 students:

IQ (x₁)   Study Hours (x₂)   Test Score (y)
110       40                 100
120       30                 90
100       20                 80
90        0                  70
80        10                 60

Independent Variables: IQ, Study Hours
Dependent Variable: Test Score
Multiple Linear Regression

ŷ = β₀ + (β₁ * x₁) + (β₂ * x₂)

Y = X * β; the normal equations Xᵀ * Y = Xᵀ * X * β expand to:

| Σy   |   | n    Σx₁    Σx₂   |   | β₀ |
| Σx₁y | = | Σx₁  Σx₁²   Σx₁x₂ | * | β₁ |
| Σx₂y |   | Σx₂  Σx₂x₁  Σx₂²  |   | β₂ |
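The MLR normal equations can be solved for the student data with numpy (an illustrative sketch, assuming numpy; the slides work the numbers by hand):

```python
import numpy as np

iq    = np.array([110, 120, 100, 90, 80])   # x1
hours = np.array([40, 30, 20, 0, 10])       # x2
score = np.array([100, 90, 80, 70, 60])     # y

# n x (k+1) design matrix: intercept column, then x1 and x2.
X = np.column_stack([np.ones(5), iq, hours])

# Solve X^T y = X^T X beta for beta = [b0, b1, b2].
beta = np.linalg.solve(X.T @ X, X.T @ score)
```

For this data the least-squares solution works out to β₀ = 20, β₁ = 0.5, β₂ = 0.5, i.e. ŷ = 20 + 0.5·IQ + 0.5·Hours.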
Root Mean Square Error (RMSE)

RMSE is the square root of the mean squared difference between observed and predicted values: RMSE = √(Σ(yᵢ − ŷᵢ)² / n). A lower RMSE indicates a better fit.
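For the SLR example (ŷ = 10 + 5x), RMSE can be computed in a few lines of Python (an illustrative sketch, not from the slides):

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_hat = [10 + 5 * xi for xi in x]   # fitted values from y_hat = 10 + 5x

# Sum of squared errors, then root of the mean squared error.
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
rmse = math.sqrt(sse / len(y))
```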
Adjusted R Squared
• Indicates how important a particular feature is to the model.
• Adjusted R² imposes a penalty for adding an additional independent variable to a model.
• Adjusted R² increases only if the new independent variable improves the model.
• It allows for the number of regressors, and may either increase or decrease.
Correlation Coefficient R
• Correlation is a statistical measure that indicates the extent to which two or more variables move together
• Redundancies can be detected by correlation analysis. Values range from −1 (perfect negative correlation) to +1 (perfect positive correlation).
R = Cov(x, y) / √(var(x) * var(y))

Correlation coefficient for a population:  ρ_xy = Cov(x, y) / (σₓ σ_y)
Correlation coefficient for a sample:      r_xy = Cov(x, y) / (sₓ s_y)

• Cohen (1992) proposed these guidelines for the interpretation of a correlation coefficient: |r| ≈ 0.10 small, ≈ 0.30 medium, ≈ 0.50 large.
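As a sketch, the sample correlation for the TV-ads data can be computed directly from the covariance and variance sums (the shared 1/n or 1/(n−1) factors cancel):

```python
import math

x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Deviation sums; the normalising constants cancel in the ratio.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 20
var_x  = sum((xi - x_bar) ** 2 for xi in x)                        # 4
var_y  = sum((yi - y_bar) ** 2 for yi in y)                        # 114

r = cov_xy / math.sqrt(var_x * var_y)   # ≈ 0.937: a strong positive correlation
```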
Correlation Coefficient R
Positive Correlation: A positive correlation indicates that the variables increase or decrease together (both variables move in the same direction).
Negative Correlation: A negative correlation indicates that if one variable increases, the other decreases, and vice versa (the variables move in opposite directions).
R² = 1 − SSE/SST,  where SST = SSR + SSE

SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σ(actual − predicted)²
SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²  = Σ(actual − mean)²

R = Cov(x, y) / √(var(x) * var(y))
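For the TV-ads fit ŷ = 10 + 5x, SSE, SST, and R² can be computed as follows (an illustrative sketch, not from the slides):

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
y_hat = [10 + 5 * xi for xi in x]
y_bar = sum(y) / len(y)

sse = sum((a - p) ** 2 for a, p in zip(y, y_hat))   # sum of squared errors
sst = sum((a - y_bar) ** 2 for a in y)              # total sum of squares
ssr = sst - sse                                     # regression sum of squares
r2  = 1 - sse / sst                                 # coefficient of determination
```

Here SSE = 14, SST = 114, so R² = 100/114 ≈ 0.877, the square of the correlation r ≈ 0.937 computed earlier.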
ANOVA (Analysis of Variance)
• The F-test is used to test the overall significance of a regression model.
• Total variation: SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)², with DFT = n − 1 degrees of freedom.
• If the p-value is higher than the significance level, the results are not statistically significant.
• If the p-value is lower than the significance level, the results are statistically significant: the independent variables have explanatory power.
Adjusted R²

Indicates how important a particular feature is to the model.

Adjusted R² = 1 − ((1 − R²) * (n − 1)) / (n − p − 1)

• n is the number of data samples.
• p is the number of independent variables.

Equivalently, using the ANOVA mean squares:

Adjusted R² = 1 − MSE / MST

SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²   DFR = p           MSR = SSR / DFR
SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²  DFE = n − p − 1   MSE = SSE / DFE
SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²   DFT = n − 1       MST = SST / DFT
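Applying the adjusted R² formula to the SLR example (n = 5 observations, p = 1 predictor, R² = 100/114):

```python
n, p = 5, 1          # 5 observations, 1 predictor (number of TV ads)
r2 = 1 - 14 / 114    # R^2 from the TV-ads fit (SSE = 14, SST = 114)

# Penalise R^2 for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

The adjustment shrinks R² from about 0.877 to about 0.836, reflecting the small sample size.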
Standard Error of Estimate (SEE)
• A smaller SSE value indicates that the observations are closer to the fitted line.
• A high SSE value indicates that the observations are far away from the fitted line.

SEE = √MSE = √(SSE / DFE), with DFE = n − p − 1 (DFR = p, DFT = n − 1).
Regression Model Assumptions and Diagnostics
• Linear Relationship
• Heteroscedasticity — Breusch-Pagan Test
• Autocorrelation
• Multicollinearity
• Scatter plots are used to detect non-linearity, unequal error variances, and outliers.
• Predicted (fitted) values are plotted on the x-axis, and residuals on the y-axis.
• A good residuals-vs-fitted plot should be relatively shapeless: no clear patterns in the data, no obvious outliers, and a generally symmetric distribution around the 0 line.

If the residuals are equally spread around a horizontal line without distinct patterns, the model has no problem; if the residual plot shows underlying patterns, the model has a problem.
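The residuals behind such a plot are simple to compute; for the TV-ads fit (an illustrative sketch, not from the slides):

```python
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]
fitted = [10 + 5 * xi for xi in x]

# Residual = observed minus fitted; these are the y-axis values of the plot.
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# For a healthy fit the residuals average to zero and show no trend in x.
mean_resid = sum(residuals) / len(residuals)
```

Here the residuals are [-1, -1, -2, 2, 2], scattered on both sides of the 0 line with mean 0.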
Correlation Coefficient R /Multiple R
• Values range from −1 (perfect negative correlation) to +1 (perfect positive correlation).

R = Cov(x, y) / √(var(x) * var(y))

• Cohen (1992) proposed these guidelines for the interpretation of a correlation coefficient: |r| ≈ 0.10 small, ≈ 0.30 medium, ≈ 0.50 large.