VRN 3 5 16
V.Ravindranath
Outline
1 Regression
Introduction
2 Introduction to R-Software and R-Studio
Windows in R Studio
Commands in R
Plotting using R
House Prices Problem without Explanatory Variables
3 Simple Linear Regression
Least Squares Method
Correlation
Testing of the Correlation Coefficient
Linear Regression and ANOVA
Applications of Regression
Life Sciences
1 Predicting carbon emissions in the environment.
2 Pesticide concentration in the soil.
3 Body temperature vs. heart rate.
4 Blood glucose levels with the drug dosage level.
Introduction to R-Studio
Download: https://cran.r-project.org/, https://www.rstudio.com
[Screenshots of the R-Studio layout: Source Window, Console Window, History Window, Plots Window]
Procedure
1 Assume Y = β0 + β1 x
2 Compute the Normal Equations
Σ Y = β0 N + β1 Σ x
Σ x Y = β0 Σ x + β1 Σ x²
3 Solve the equations to get β̂0 and β̂1
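A minimal R sketch of this procedure, using hypothetical x and y values: the two normal equations are written as a 2×2 linear system and solved for β̂0 and β̂1.

```r
# Sketch of the normal-equations procedure (hypothetical data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 4.3, 6.2, 8.1, 10.4)
N <- length(x)
# Coefficient matrix and right-hand side of the two normal equations
A <- matrix(c(N, sum(x), sum(x), sum(x^2)), nrow = 2)
b <- c(sum(y), sum(x * y))
beta <- solve(A, b)   # beta[1] = beta0_hat, beta[2] = beta1_hat
beta
```

The same estimates come directly from `coef(lm(y ~ x))`, which is the usual route in practice.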
Model Fitness
1 SST → Total SS in observed Y = Σ (y − ȳ)² = 16382.18
2 SSR → Regression SS = Σ (ŷ − ȳ)² = 14829.30
3 SSE → Residual SS = Σ (y − ŷ)² = 1552.88
4 SST = SSR + SSE
5 1 = SSR/SST + SSE/SST = 0.9052 + 0.0948
6 Multiple R² = 0.9052, or 90.5%
Interpretation of R²
Adding one explanatory variable reduced the unexplained variation from 100% to 9.5% of the total. Adding good variables improves model efficiency.
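The decomposition above can be checked numerically in R; the data here are hypothetical stand-ins, not the house-price data from the slides.

```r
# Sketch: SST = SSR + SSE and R^2 = SSR/SST (hypothetical data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 4.3, 6.2, 8.1, 10.4)
fit   <- lm(y ~ x)
y_hat <- fitted(fit)
SST <- sum((y - mean(y))^2)   # total variation in y
SSE <- sum((y - y_hat)^2)     # residual (unexplained) variation
SSR <- SST - SSE              # variation explained by the regression
R2  <- SSR / SST
R2                            # agrees with summary(fit)$r.squared
```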
VRNath, JNTU KAKINADA RM-STATS-2016
Correlation
Introduction
Examples
(Height, Weight), (Man Hours, Production), (Training Hours,
Defects)
Types of Correlation
Examples
(training hours, no. of errors), (weight of vehicle, speed)
Types of Correlation
[Scatter diagrams illustrating each type of correlation]
Correlation Coefficient
Properties
Bounded
It is symmetric and bounded by -1 and +1.
1 Near +1 means strong positive correlation
2 Near -1 means strong negative correlation
3 Near +0 means weak positive correlation
4 Near -0 means weak negative correlation
Invariance
If U = aX + b and V = cY + d, then
Corr(U, V) = (a·c)/(|a|·|c|) · Corr(X, Y)
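A quick R check of the invariance property, with hypothetical constants a = −2, b = 1, c = 3, d = 4; since the product a·c is negative, the sign of the correlation flips while its magnitude is unchanged.

```r
# Sketch: Corr(U, V) = (a*c)/(|a||c|) * Corr(X, Y)
x <- c(1, 3, 5, 7)
y <- c(2, 3, 10, 11)
u <- -2 * x + 1   # a = -2, b = 1
v <-  3 * y + 4   # c = 3, d = 4
cor(u, v)         # equals -cor(x, y) because a*c < 0
```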
Pearson Coefficient
Doing with R
[R session computing the Pearson coefficient for the height-weight data]
Conclusion: Strong Positive Correlation
Rank Correlation
Spearman Rank Correlation
The Method
1 Assign ranks from smallest to largest (or otherwise)
2 If there are ties, repeat the rank
3 Compute
ρ = [n Σ Rx Ry − Σ Rx Σ Ry] / √{[n Σ Rx² − (Σ Rx)²] [n Σ Ry² − (Σ Ry)²]}
Remark
Approximately the same as the Pearson coefficient. It is simple and robust.
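In R the Spearman coefficient is available directly; the scores below are hypothetical. Because y increases whenever x does, the two rankings agree exactly and the coefficient is 1.

```r
# Sketch: Spearman rank correlation via cor() (hypothetical scores)
x <- c(10, 20, 30, 40, 50)
y <- c(12, 25, 31, 44, 48)
cor(x, y, method = "spearman")   # 1: the rankings agree perfectly
```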
t-statistic
1 H0 : ρ = 0, H1 : ρ > 0
2 Compute r and
tc = |r| √[(n − 2)/(1 − r²)]
3 Reject H0 if tc > tα
Testing Correlation
Doing with R
> height
[1] 150 155 158 160 165 168 170
> n=length(height); weight
[1] 48 58 57 62 67 70 72
> r=cov(height,weight)/sqrt(var(height)*var(weight)); r
[1] 0.9833692
> tc=r*sqrt((n-2)/(1-r^2)); tc
[1] 12.10720
> qt(0.995,n-2)
[1] 4.032143
> if(tc>qt(0.995,n-2)) print("Significant Correlation") else print("no correlation")
[1] "Significant Correlation"
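The same test is available through R's built-in cor.test(), which reproduces the hand-computed r and t statistic for the height-weight data above.

```r
# Sketch: testing H0: rho = 0 vs H1: rho > 0 with cor.test()
height <- c(150, 155, 158, 160, 165, 168, 170)
weight <- c(48, 58, 57, 62, 67, 70, 72)
ct <- cor.test(height, weight, alternative = "greater")
ct$estimate    # r = 0.9833692
ct$statistic   # t = 12.1072, as computed by hand above
```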
Doing with R
> mat<-c(39,43,21,64,57,47,28,75,34,52)
> cal<-c(65,78,52,82,92,89,73,98,56,75)
> sxy=cov(mat,cal)
> b=sxy/var(mat);b
[1] 0.7655618
> a=mean(cal)-b*mean(mat);a
[1] 40.78416
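The slope and intercept computed above from cov() and var() agree with what lm() reports; this check uses the same mat/cal data.

```r
# Sketch: verifying the hand computation with lm()
mat <- c(39, 43, 21, 64, 57, 47, 28, 75, 34, 52)
cal <- c(65, 78, 52, 82, 92, 89, 73, 98, 56, 75)
b <- cov(mat, cal) / var(mat)     # slope, 0.7655618
a <- mean(cal) - b * mean(mat)    # intercept, 40.78416
fit <- lm(cal ~ mat)
coef(fit)   # same intercept and slope
```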
Analysis of Variance
One way classification
Salient Points
1 Used to compare the means of several groups
2 Simpler than performing all kC2 separate two-mean comparisons
3 Null hypothesis: all means are the same
H0 : µ1 = µ2 = · · · = µk
4 Alternative hypothesis: at least two means are different
H1 : µi ≠ µj for some i ≠ j
5 Groups are formed with factors and levels
Analysis of Variance
One Way
The Method
1 T = Σᵢ Σⱼ Xij (grand total over the k groups, with ni observations in group i)
2 Correction Factor (CF) = T²/n
3 SSR = (Σⱼ X1j)²/n1 + · · · + (Σⱼ Xkj)²/nk − CF
4 TSS = Σᵢ Σⱼ Xij² − CF
5 SSE = TSS − SSR
6 DOF for SSR = k − 1
7 DOF for TSS = n − 1
8 MS = SS/DOF
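The one-way method can be run with R's aov(); the three groups below are hypothetical, chosen so the sums are easy to verify by hand (SSR = 54 on 2 DOF, SSE = 6 on 6 DOF, so MSR = 27, MSE = 1, F = 27).

```r
# Sketch: one-way ANOVA with aov() (hypothetical groups)
values <- c(5, 6, 7,  8, 9, 10,  11, 12, 13)
group  <- factor(rep(c("A", "B", "C"), each = 3))
fit <- aov(values ~ group)
summary(fit)   # Df: 2 and 6; F = MSR/MSE = 27
```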
Analysis of Variance
One Way Table

Source   SS    DOF   MS          F
Rows     SSR   r-1   SSR/(r-1)   MSR/MSE
Error    SSE   n-r   SSE/(n-r)
Total    SST   n-1