REGRESSION AND CORRELATION Assignment Recovered


CORRELATION AND REGRESSION: WORKSHOP

1. The table below gives the price per share (X) and the dividend (Y) for a sample of 30 companies.

Company Price/share (X) Dividend (Y)
1 20.00 3.14
2 22.01 3.36
3 31.39 0.46
4 33.57 7.99
5 35.86 0.77
6 36.12 8.46
7 36.16 7.62
8 37.99 8.03
9 38.85 6.33
10 39.65 7.96
11 43.44 8.95
12 49.08 9.61
13 53.73 11.11
14 54.41 13.28
15 55.10 10.22
16 57.06 9.53
17 57.40 12.60
18 58.30 10.43
19 59.51 7.97
20 60.60 9.19
21 64.01 16.50
22 64.66 16.10
23 64.74 13.76
24 64.95 10.54
25 66.43 21.15
26 68.18 14.30
27 69.56 24.42
28 74.90 11.54
29 77.91 17.65
30 80.00 17.36

1. What can you say about the correlation coefficient r?

Solution:
Company X (Price/share) Y (Dividend) XY X² Y²
1 20.00 3.14 62.8 400 9.8596
2 22.01 3.36 73.9536 484.4401 11.2896
3 31.39 0.46 14.4394 985.3321 0.2116
4 33.57 7.99 268.2243 1126.9449 63.8401
5 35.86 0.77 27.6122 1285.9396 0.5929
6 36.12 8.46 305.5752 1304.6544 71.5716
7 36.16 7.62 275.5392 1307.5456 58.0644
8 37.99 8.03 305.0597 1443.2401 64.4809
9 38.85 6.33 245.9205 1509.3225 40.0689
10 39.65 7.96 315.614 1572.1225 63.3616
11 43.44 8.95 388.788 1887.0336 80.1025
12 49.08 9.61 471.6588 2408.8464 92.3521
13 53.73 11.11 596.9403 2886.9129 123.4321
14 54.41 13.28 722.5648 2960.4481 176.3584
15 55.10 10.22 563.122 3036.01 104.4484
16 57.06 9.53 543.7818 3255.8436 90.8209
17 57.40 12.60 723.24 3294.76 158.76
18 58.30 10.43 608.069 3398.89 108.7849
19 59.51 7.97 474.2947 3541.4401 63.5209
20 60.60 9.19 556.914 3672.36 84.4561
21 64.01 16.50 1056.165 4097.2801 272.25
22 64.66 16.10 1041.026 4180.9156 259.21
23 64.74 13.76 890.8224 4191.2676 189.3376
24 64.95 10.54 684.573 4218.5025 111.0916
25 66.43 21.15 1404.9945 4412.9449 447.3225
26 68.18 14.30 974.974 4648.5124 204.49
27 69.56 24.42 1698.6552 4838.5936 596.3364
28 74.90 11.54 864.346 5610.01 133.1716
29 77.91 17.65 1375.1115 6069.9681 311.5225
30 80.00 17.36 1388.8 6400 301.3696
TOTAL 1575.57 320.33 18923.5791 90430.0813 4292.4793

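$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$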

$$r = \frac{30(18923.5791) - (1575.57)(320.33)}{\sqrt{\left[30(90430.0813) - (1575.57)^2\right]\left[30(4292.4793) - (320.33)^2\right]}}$$
$$r = 0.81135844468$$
This suggests a strong positive correlation between the two variables.
$$r^2 = (0.81135844468)^2 = 0.65830252575$$
This means about 65.83% of the variation in the values of Y (Dividend) is accounted for by a linear relationship with X (Price per Share).
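As a cross-check on the hand computation, the minimal Python sketch below re-enters the 30 (X, Y) pairs from the table and applies the same summation formula for r. The helper name `pearson_r` is illustrative, not part of the worksheet; any difference in the last digits would only reflect floating-point rounding.

```python
import math

# Price per share (X) and dividend (Y) for the 30 companies, in table order
x = [20.00, 22.01, 31.39, 33.57, 35.86, 36.12, 36.16, 37.99, 38.85, 39.65,
     43.44, 49.08, 53.73, 54.41, 55.10, 57.06, 57.40, 58.30, 59.51, 60.60,
     64.01, 64.66, 64.74, 64.95, 66.43, 68.18, 69.56, 74.90, 77.91, 80.00]
y = [3.14, 3.36, 0.46, 7.99, 0.77, 8.46, 7.62, 8.03, 6.33, 7.96,
     8.95, 9.61, 11.11, 13.28, 10.22, 9.53, 12.60, 10.43, 7.97, 9.19,
     16.50, 16.10, 13.76, 10.54, 21.15, 14.30, 24.42, 11.54, 17.65, 17.36]


def pearson_r(x, y):
    """Pearson r using the computational (summation) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(f"r = {r:.6f}, r^2 = {r * r:.6f}")
```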

2. What are the slope and the y-intercept, i.e., the values of b and a?

Solution:
Solving for the slope:

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$b = \frac{30(18923.5791) - (1575.57)(320.33)}{30(90430.0813) - (1575.57)^2}$$
$$b = 0.27336252024$$
Solving for the intercept:
$$a = \bar{y} - b\bar{x}$$

$$\bar{y} = \frac{\sum y_i}{n} = \frac{320.33}{30} = 10.67766666667$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{1575.57}{30} = 52.519$$
$$a = 10.67766666667 - (0.27336252024)(52.519)$$
$$a = -3.67905953381$$
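The slope and intercept formulas can be evaluated directly from the column totals. This Python sketch uses only the totals from the table above (no raw data), so it mirrors the hand computation step by step:

```python
# Column totals from the table in item 1 (n = 30 companies)
n = 30
sum_x, sum_y = 1575.57, 320.33
sum_xy, sum_x2 = 18923.5791, 90430.0813

# Least-squares slope: b = [n*Sxy - Sx*Sy] / [n*Sx2 - Sx^2]
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept from the means: a = y_bar - b * x_bar
x_bar, y_bar = sum_x / n, sum_y / n
a = y_bar - b * x_bar
print(f"b = {b:.8f}, a = {a:.8f}")
```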
3. What is the standard error of the estimate?

$$s_e^2 = \frac{SSE}{n-2}$$

$$SSE = \sum_{i=1}^{n} (y_i - a - bx_i)^2$$

Company X (Price/share) Y (Dividend) (y_i − a − bx_i)²
1 20.00 3.14 1.827387935
2 22.01 3.36 1.045200481
3 31.39 0.46 19.72949813
4 33.57 7.99 6.21145829
5 35.86 0.77 28.66232248
6 36.12 8.46 5.131155104
7 36.16 7.62 2.000161926
8 37.99 8.03 1.753022073
9 38.85 6.33 0.373411883
10 39.65 7.96 0.640377041
11 43.44 8.95 0.568805068
12 49.08 9.61 0.016274857
13 53.73 11.11 0.010259934
14 54.41 13.28 4.348913267
15 55.10 10.22 1.353069876
16 57.06 9.53 5.707348986
17 57.40 12.60 0.345803845
18 58.30 10.43 3.341493997
19 59.51 7.97 21.33279643
20 60.60 9.19 13.66565875
21 64.01 16.50 7.188429275
22 64.66 16.10 4.424455588
23 64.74 13.76 0.06678607
24 64.95 10.54 12.50213721
25 66.43 21.15 44.48339516
26 68.18 14.30 0.434013592
27 69.56 24.42 82.5183773
28 74.90 11.54 27.62336231
29 77.91 17.65 0.000985056
30 80.00 17.36 0.688803833
TOTAL 1575.57 320.33 297.9951657

$$s_e^2 = \frac{297.9951657}{30 - 2} = 10.64268449$$
$$s_e = \sqrt{s_e^2} = \sqrt{10.64268449} = 3.262312752$$
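The squared-residual column above can be regenerated from the raw data and the fitted coefficients. In this Python sketch, a and b are the rounded values from item 2, so the last few digits of SSE may differ very slightly from the table:

```python
import math

# Data from item 1 (price per share X, dividend Y) and the coefficients from item 2
x = [20.00, 22.01, 31.39, 33.57, 35.86, 36.12, 36.16, 37.99, 38.85, 39.65,
     43.44, 49.08, 53.73, 54.41, 55.10, 57.06, 57.40, 58.30, 59.51, 60.60,
     64.01, 64.66, 64.74, 64.95, 66.43, 68.18, 69.56, 74.90, 77.91, 80.00]
y = [3.14, 3.36, 0.46, 7.99, 0.77, 8.46, 7.62, 8.03, 6.33, 7.96,
     8.95, 9.61, 11.11, 13.28, 10.22, 9.53, 12.60, 10.43, 7.97, 9.19,
     16.50, 16.10, 13.76, 10.54, 21.15, 14.30, 24.42, 11.54, 17.65, 17.36]
a, b = -3.67905953381, 0.27336252024

# SSE: sum of squared residuals y_i - (a + b*x_i)
sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
se_squared = sse / (len(x) - 2)   # divide by n - 2 degrees of freedom
se = math.sqrt(se_squared)
print(f"SSE = {sse:.4f}, s_e^2 = {se_squared:.4f}, s_e = {se:.4f}")
```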

4. Write the equation of the regression line.

$$\hat{y} = a + bx$$
$$\hat{y} = -3.67905953381 + 0.27336252024x$$
5. What point does the line pass through?

To find where the line crosses the x-axis (Price per Share), set the predicted value ŷ to zero and solve for x:

$$\hat{y} = -3.67905953381 + 0.27336252024x$$
$$0 = -3.67905953381 + 0.27336252024x$$
$$x = \frac{3.67905953381}{0.27336252024} = 13.45853678324$$
x ≈ 13.46 dollars
This means a share priced at about 13.46 dollars is predicted to pay no dividend. Because this price is below the lowest price in the sample (20.00), the prediction is an extrapolation and should be interpreted with caution.
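The x-intercept is simply the root of ŷ = a + bx. Another point the line always passes through is the point of means (x̄, ȳ) = (52.519, 10.6777), a general property of least-squares lines. This Python sketch checks both, using the coefficients from item 2:

```python
a, b = -3.67905953381, 0.27336252024   # intercept and slope from item 2
x_bar, y_bar = 1575.57 / 30, 320.33 / 30

# x-intercept: solve 0 = a + b*x for x
x_intercept = -a / b
print(f"x-intercept = {x_intercept:.2f}")

# Every least-squares line also passes through the point of means (x_bar, y_bar)
print(f"y-hat at x_bar = {a + b * x_bar:.6f}, y_bar = {y_bar:.6f}")
```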

6. Interpret the result.

Based on the result in item 1, there is a strong positive correlation between the two variables, and the slope of the regression line is positive. As the price per share increases, the dividend is also expected to increase, by about 0.273 for each one-unit increase in price per share.

Another solution using Microsoft Excel Data Analysis

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.811358445
R Square 0.658302526
Adjusted R Square 0.646099045
Standard Error 3.262312752
Observations 30

ANOVA
df SS MS F Significance F
Regression 1 574.1071709 574.1071709 53.94383076 5.35824E-08
Residual 28 297.9951657 10.64268449
Total 29 872.1023367

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -3.679059534 2.043449507 -1.800416169 0.082577716 -7.864876099 0.506757031 -7.864876099 0.506757031
X 0.27336252 0.037219289 7.344646401 5.35824E-08 0.197122262 0.349602779 0.197122262 0.349602779
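The Standard Error and t Stat entries in the coefficient table can be reproduced from the hand-computed quantities, assuming the usual simple-regression formulas SE(b) = s_e/√Sxx and SE(a) = s_e·√(1/n + x̄²/Sxx), with Sxx = Σx² − (Σx)²/n. In simple regression the ANOVA F statistic also equals the square of the slope's t statistic. A minimal Python sketch:

```python
import math

n = 30
sum_x, sum_x2 = 1575.57, 90430.0813     # column totals from item 1
b, a = 0.27336252024, -3.67905953381    # coefficients from item 2
se = 3.262312752                        # standard error of estimate from item 3

sxx = sum_x2 - sum_x ** 2 / n           # corrected sum of squares of x
x_bar = sum_x / n

se_b = se / math.sqrt(sxx)                       # standard error of the slope
se_a = se * math.sqrt(1 / n + x_bar ** 2 / sxx)  # standard error of the intercept
t_b, t_a = b / se_b, a / se_a                    # t statistics for H0: coefficient = 0
print(f"SE(b) = {se_b:.6f}, t(b) = {t_b:.4f}")
print(f"SE(a) = {se_a:.4f}, t(a) = {t_a:.4f}, F = t(b)^2 = {t_b ** 2:.2f}")
```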
2. Acid rain affects the environment by increasing the acidity of lakes and streams to dangerous levels,
damaging trees and soil, accelerating the decay of building materials and paints, and destroying national
monuments. The goal of the Environmental Protection Agency’s (EPA) Acid Rain Program is to achieve
environmental health benefits by reducing the emissions of the primary causes of acid rain: sulfur
dioxide and nitrogen oxides. You work for the EPA and you want to determine whether there is a significant
correlation between the average concentrations of sulfur dioxide and nitrogen dioxide.

x (average sulfur dioxide concentration, ppb) y (average nitrogen dioxide concentration, ppb)
4.6 15.6
4.4 15.4
4.0 14.9
4.0 14.5
3.8 13.5
3.9 13.4
3.4 12.7
3.3 12.3
2.9 11.4
2.4 10.5
2.2 10.2

Analyzing data.

A. The data in the table show the average concentrations of sulfur dioxide (in parts per billion) and
nitrogen dioxide (in parts per billion) for 11 years. Construct a scatter plot of the data and make a
conclusion about the type of correlation between the average concentrations of sulfur dioxide and
nitrogen dioxide.
Based on the scatter plot, there is a strong positive correlation between the average sulfur dioxide and nitrogen dioxide concentrations: as x increases, y increases along a nearly straight line.

B. Calculate the correlation coefficient r and verify your conclusion in part (A).

x (average SO₂, ppb) y (average NO₂, ppb) xy x² y²
4.6 15.6 71.76 21.16 243.36
4.4 15.4 67.76 19.36 237.16
4.0 14.9 59.6 16 222.01
4.0 14.5 58 16 210.25
3.8 13.5 51.3 14.44 182.25
3.9 13.4 52.26 15.21 179.56
3.4 12.7 43.18 11.56 161.29
3.3 12.3 40.59 10.89 151.29
2.9 11.4 33.06 8.41 129.96
2.4 10.5 25.2 5.76 110.25
2.2 10.2 22.44 4.84 104.04
TOTAL 38.9 144.4 525.15 143.63 1931.42

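$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$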
$$r = \frac{11(525.15) - (38.9)(144.4)}{\sqrt{\left[11(143.63) - (38.9)^2\right]\left[11(1931.42) - (144.4)^2\right]}}$$
$$r = 0.98336349121$$
With r ≈ 0.98, very close to 1, this supports our conclusion that there is a strong positive correlation between sulfur dioxide and nitrogen dioxide concentrations.
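The same summation formula can be applied to the 11 yearly observations. This Python sketch also recomputes the column totals, so it doubles as a check on the table above:

```python
import math

# Average SO2 (x) and NO2 (y) concentrations in parts per billion, 11 years
x = [4.6, 4.4, 4.0, 4.0, 3.8, 3.9, 3.4, 3.3, 2.9, 2.4, 2.2]
y = [15.6, 15.4, 14.9, 14.5, 13.5, 13.4, 12.7, 12.3, 11.4, 10.5, 10.2]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Computational formula for the Pearson correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(f"totals: {sum_xy:.2f} {sum_x2:.2f} {sum_y2:.2f}; r = {r:.6f}")
```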

C. Test the significance of the correlation coefficient found in part (B). Use α = 0.05.

Testing ρ = 0
Solution:
1. H₀: ρ = 0

2. H₁: ρ ≠ 0

3. α = 0.05

4. Degrees of freedom: v = n − 2 = 9

5. Critical region: t < −2.262 or t > 2.262

6. Computation:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
$$t = \frac{0.98336349121\sqrt{11-2}}{\sqrt{1-(0.98336349121)^2}}$$
$$t = 16.240637804577$$
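7. Decision:

Since t = 16.24 > 2.262, the test statistic falls in the critical region.

Reject H₀, the hypothesis of no linear relationship: the correlation between the average sulfur dioxide and nitrogen dioxide concentrations is significant at α = 0.05.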
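The test statistic and decision can be scripted as below. The critical value 2.262 is hard-coded from a t table (two-tailed, α = 0.05, 9 degrees of freedom), as in step 5; a statistics library could compute it instead, but this Python sketch avoids any dependency:

```python
import math

r, n = 0.98336349121, 11   # sample correlation from part (B) and number of years
t_critical = 2.262         # t table value: two-tailed, alpha = 0.05, df = n - 2 = 9

# Test statistic for H0: rho = 0
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = abs(t) > t_critical
print(f"t = {t:.4f}, reject H0: {reject_h0}")
```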

D. Find the equation of the regression line for the average concentrations of sulfur dioxide and nitrogen
dioxide. Add the graph of the regression line to your scatter plot in part (A). Does the regression line
appear to be a good fit?

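$$\hat{y} = a + bx$$
Solving for the slope:

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$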

$$b = \frac{11(525.15) - (38.9)(144.4)}{11(143.63) - (38.9)^2} = \frac{159.49}{66.72}$$
$$b = 2.39043764988$$
Solving for the intercept:
$$a = \bar{y} - b\bar{x}$$

$$\bar{y} = \frac{\sum y_i}{n} = \frac{144.4}{11} = 13.12727272727$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{38.9}{11} = 3.53636363636$$
$$a = 13.12727272727 - (2.39043764988)(3.53636363636)$$
$$a = 4.67381594724$$
Regression Line
$$\hat{y} = a + bx$$
$$\hat{y} = 4.67381594724 + 2.39043764988x$$
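A short Python sketch that computes b and a from the column totals in part (B), following the same two formulas:

```python
# Column totals from part (B), n = 11 years
n = 11
sum_x, sum_y = 38.9, 144.4
sum_xy, sum_x2 = 525.15, 143.63

# Least-squares slope and intercept
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n          # a = y_bar - b * x_bar
print(f"y-hat = {a:.6f} + {b:.6f}x")
```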
E. Can you use the equation of the regression line to predict the average concentration of nitrogen dioxide given the average concentration of sulfur dioxide? Why or why not?

F. Find the coefficient of determination r² and the standard error of estimate s_e. Interpret your result.

Solution:
x (average SO₂, ppb) y (average NO₂, ppb) (y_i − a − bx_i)²
4.6 15.6 0.004876108
4.4 15.4 0.043371558
4.0 14.9 0.441471813
4.0 14.5 0.069925051
3.8 13.5 0.066295444
3.9 13.4 0.355839429
3.4 12.7 0.010262492
3.3 12.3 0.068780408
2.9 11.4 0.042471082
2.4 10.5 0.007944815
2.2 10.2 0.071407182
TOTAL 38.9 144.4 1.182645384

$$r = 0.98336349121$$

$$r^2 = (0.98336349121)^2 = 0.967003756$$
This means about 96.70% of the variation in the values of y (nitrogen dioxide) is accounted for by a linear relationship with x (sulfur dioxide).

Standard Error of Estimate

$$s_e^2 = \frac{SSE}{n-2}, \qquad SSE = \sum_{i=1}^{n} (y_i - a - bx_i)^2$$

$$s_e^2 = \frac{1.182645384}{11 - 2} = 0.131405043$$

$$s_e = \sqrt{0.131405043} = 0.362498335$$

This means the typical difference between the observed and predicted values is very small (about 0.36 ppb): the points cluster closely around the regression line, which is also supported by the strong correlation coefficient.
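The residual column, s_e, and r² can all be regenerated from the raw data. This Python sketch refits the line with deviation sums (algebraically equivalent to the summation formulas above) and computes r² as 1 − SSE/SST, which agrees with squaring r in simple linear regression:

```python
import math

# Average SO2 (x) and NO2 (y) concentrations in parts per billion
x = [4.6, 4.4, 4.0, 4.0, 3.8, 3.9, 3.4, 3.3, 2.9, 2.4, 2.2]
y = [15.6, 15.4, 14.9, 14.5, 13.5, 13.4, 12.7, 12.3, 11.4, 10.5, 10.2]
n = len(x)

# Fit the least-squares line using deviations from the means
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)          # total sum of squares (SST)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual sum of squares, standard error of estimate, coefficient of determination
sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
se = math.sqrt(sse / (n - 2))
r_squared = 1 - sse / syy
print(f"SSE = {sse:.6f}, s_e = {se:.6f}, r^2 = {r_squared:.6f}")
```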

Another solution using MS Excel Data Analysis

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.983363491
R Square 0.967003756
Adjusted R Square 0.963337506
Standard Error 0.362498335
Observations 11

ANOVA
df SS MS F Significance F
Regression 1 34.6591728 34.6591728 263.7583163 0.0000000564698959
Residual 9 1.182645384 0.131405043
Total 10 35.84181818

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 4.673815947 0.531863978 8.78761515 0.000010375720 3.470656041 5.876975854 3.470656041 5.876975854
x 2.39043765 0.147188656 16.2406378 0.000000056470 2.057473778 2.723401522 2.057473778 2.723401522
WORKSHOP
CORRELATION AND REGRESSION: APPLICATION

LEARNERS' NAMES ECONOMIC STATUS AGE GENERAL AVERAGE

MALE
ALMENDRAS FRANCIS THEO SABADO 4 13 85

AMODIA JOSE CARLO B 2 15 82

ANCERO CLARK JUSTINE CAPON 1 14 82

ARRADAZA JERRED COMPOSO 3 14 87

BALANZA CHRIST ALLEN CASAS 3 13 82

BANSEL IAN ANGEL SUAYBAGUIO 2 15 79

CAPUNO RAYMOND O 2 16 82
COLOTARIO BRAIN CASPER RETONA 1 14 87

CORALES STEVE JOHN ARCENAL 2 14 84

DAMILES CARLOS MIGUEL COSIDO 3 13 90

DUMDUM JACOB GABRIEL ABARACA 3 14 85

GORGONIO EPIFANIO JR RUIDAS 2 15 83

GUERRERO JOHN KIRBY JUMAMOY 3 14 84

HAMILI ROMER NAYLON 2 14 81

LOBO PRINCE DEOU OMAY 4 15 84

MABANSAG JEHOZAPHAT EMMANUEL 4 13 88


MACALISANG CHANRIS MICHAEL ARIPAL 3 14 85
MALACASTE ACE CHRISTIAN HURTADO 4 14 92
MARCOJOS ANTHONY JOHN MIGSANING 2 13 82
MICABALO CHRISTIAN LOYD BEJERANO 4 14 84
MINTA CHARLES VINCENT BAJENTING 2 14 86

OREDA DANNIEL JULES CASERO 3 13 83

PEREZ REYMON PAREJA 2 15 84

PLIÑOS JERALD JAMES NADA 3 14 88

SUMAMPAN ORGIE SENANGOTE 2 14 80


TOLEDO SOLOMON GABRIEL ORCULLO 3 14 88
TUMANGDAY JULIUS CESAR BOTERO 2 14 84
FEMALE
AKOL LIAN GRACE ELIZAR 2 14 87

ALADJA KAYE LORAINE PABUAYA 3 13 92

ANOBA JOVY JUATON 3 14 90

AQUINO BLESSHELLE TORREGOSA 3 13 90

BARBOSA ROSE MARY BITOON 2 13 91

CAPILLANES JESYL JOY BARCEBAL 3 13 86

CASTEÑEDA JULY 2 13 90

CERVANTES KETH RUSSLE ISIDTO 3 14 94

DUNGCO PRINCESS SUNDAE PAICAN 3 14 90

ENRIQUEZ LIAH KATE YGOÑA 2 14 85

FERRER HEZEKIAH 3 13 88

FLORES MARIA THERESA ARANCANA 2 15 90


LUGA KYLA MARIELLE PUSTA 3 15 84

MENDEZ SHEENA SALI 3 13 90

PIGUERRA RHEA ANGELA DIZON 3 13 86

PIÑONES GWEN CHIE CUÑADO 2 14 91

POROL INGRID NICOLE SILAGAN 3 14 87

RAQUIL KRISTELLE JUMAMIL 2 14 92

RAYOS HEAVEN FAIT PAMPLONA 3 13 87


SANTILLAN MARY ANTONETTE CASTRO 4 15 86
SON CRISTEL DIANNE BOISER 3 14 90

TIOZON MARIZ HEART OLAÑA 3 13 90

VILLAREAL SHAIRRA CORBITA 2 14 86


VILLARUBIA SAMANTHA FAITH TALARION 3 14 87
ZANORIA KRISTINA PAULA LOPEZ 6 13 92

ECONOMIC STATUS CLASSES MONTHLY INCOME (FAMILY OF 5)
1 POOR LESS THAN 10,481.00
2 LOW INCOME BUT NOT POOR BETWEEN 10,481.00 AND 20,962.00
3 LOWER MIDDLE INCOME BETWEEN 20,962.00 AND 41,924.00
4 MIDDLE MIDDLE INCOME BETWEEN 41,924.00 AND 73,367.00
5 UPPER MIDDLE INCOME BETWEEN 73,367.00 AND 125,772.00
6 UPPER INCOME BETWEEN 125,772.00 AND 209,620.00
7 RICH MORE THAN 209,620.00
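The income table can be read as a lookup from monthly family income to the 1 to 7 status code used in the learners' table. The function name `economic_status` is hypothetical, and the table does not say which class a boundary value belongs to, so this Python sketch assumes each lower bound is inclusive:

```python
from bisect import bisect_right

# Lower bounds of classes 2..7 (monthly income, family of 5) from the table above.
# ASSUMPTION: a boundary value belongs to the higher class (lower bound inclusive).
LOWER_BOUNDS = [10_481.00, 20_962.00, 41_924.00, 73_367.00, 125_772.00, 209_620.00]
CLASS_NAMES = ["Poor", "Low income but not poor", "Lower middle income",
               "Middle middle income", "Upper middle income", "Upper income", "Rich"]


def economic_status(monthly_income):
    """Hypothetical helper: map a monthly family income to the 1-7 status code."""
    # bisect_right counts how many lower bounds are <= the income
    return bisect_right(LOWER_BOUNDS, monthly_income) + 1


code = economic_status(30_000)
print(code, CLASS_NAMES[code - 1])
```

For example, a monthly income of 30,000.00 falls in class 3 (lower middle income).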
A. STUDENT'S AGE AND ACADEMIC PERFORMANCE

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.370903543
R Square 0.137569438
Adjusted R Square 0.120320827
Standard Error 3.339841944
Observations 52

ANOVA
df SS MS F Significance F
Regression 1 88.96509727 88.96509727 7.975681983 0.006791433
Residual 50 557.7272104 11.15454421
Total 51 646.6923077

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 111.266621 8.754683685 12.7093822 2.79465E-17 93.68232128 128.8509207 93.68232128 128.8509207
X -1.780671693 0.630521559 -2.824124994 0.006791433 -3.047111515 -0.514231871 -3.047111515 -0.514231871
We can conclude the following based on the statistical data we have:

a. Since |r| = 0.370903543 (Excel reports Multiple R as the absolute value; r itself is −0.3709 because the slope is negative), there is a weak correlation between students' age and academic performance.

b. With a negative b coefficient (−1.78), the general average is predicted to decline by about 1.78 points for each additional year of age. Again, the linear association between the variables is weak (r² ≈ 0.14, so age accounts for only about 13.76% of the variation in general average); therefore we conclude that age has little practical relationship with academic performance, even though the slope is statistically significant at α = 0.05 (p = 0.0068).
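The Excel output for part A can be reproduced from the learners' table. This Python sketch re-enters ages and general averages in table order (27 male, then 25 female learners); `simple_regression` is a hypothetical helper, and it returns the signed r, which Excel reports only as the absolute value Multiple R:

```python
import math

# General averages of the 52 learners, in table order (27 male, then 25 female)
averages = [85, 82, 82, 87, 82, 79, 82, 87, 84, 90, 85, 83, 84, 81, 84, 88, 85, 92,
            82, 84, 86, 83, 84, 88, 80, 88, 84,
            87, 92, 90, 90, 91, 86, 90, 94, 90, 85, 88, 90, 84, 90, 86, 91, 87, 92,
            87, 86, 90, 90, 86, 87, 92]
# Ages of the same learners, in the same order
ages = [13, 15, 14, 14, 13, 15, 16, 14, 14, 13, 14, 15, 14, 14, 15, 13, 14, 14,
        13, 14, 14, 13, 15, 14, 14, 14, 14,
        14, 13, 14, 13, 13, 13, 13, 14, 14, 14, 13, 15, 15, 13, 13, 14, 14, 14,
        13, 15, 14, 13, 14, 14, 13]


def simple_regression(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)  # signed r; Excel's Multiple R is |r|
    return a, b, r, r * r


a, b, r, r_squared = simple_regression(ages, averages)
print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.4f}, r^2 = {r_squared:.4f}")
```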

B. STUDENT'S ECONOMIC STATUS AND ACADEMIC PERFORMANCE

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.318480078
R Square 0.10142956
Adjusted R Square 0.083458151
Standard Error 3.409101323
Observations 52

ANOVA
df SS MS F Significance F
Regression 1 65.59371614 65.59371614 5.643940383 0.02138943
Residual 50 581.0985915 11.62197183
Total 51 646.6923077

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 83 1.578107264 52.59465051 1.93507E-45 79.83027828 86.16972172 79.83027828 86.16972172
X 1.309859155 0.551357633 2.375697873 0.02138943 0.202424758 2.417293552 0.202424758 2.417293552
We can conclude the following based on the statistical data we have:

a. Since r = 0.318480078, there is a weak correlation between students' economic status and academic performance.

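b. With a positive b coefficient (1.31), students with a higher economic status tend to have a higher general average, by about 1.31 points per step up the status scale. Again, the linear association between the variables is weak (r² ≈ 0.10, so economic status accounts for only about 10.14% of the variation in general average); therefore we conclude that economic status has little practical relationship with academic performance, even though the slope is statistically significant at α = 0.05 (p = 0.0214).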
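The same calculation applies to part B. This self-contained Python sketch repeats the hypothetical `simple_regression` helper and the general averages, and uses the economic status codes (in table order) as x:

```python
import math

# General averages of the 52 learners, in table order (27 male, then 25 female)
averages = [85, 82, 82, 87, 82, 79, 82, 87, 84, 90, 85, 83, 84, 81, 84, 88, 85, 92,
            82, 84, 86, 83, 84, 88, 80, 88, 84,
            87, 92, 90, 90, 91, 86, 90, 94, 90, 85, 88, 90, 84, 90, 86, 91, 87, 92,
            87, 86, 90, 90, 86, 87, 92]
# Economic status codes (1-7) of the same learners, in the same order
statuses = [4, 2, 1, 3, 3, 2, 2, 1, 2, 3, 3, 2, 3, 2, 4, 4, 3, 4,
            2, 4, 2, 3, 2, 3, 2, 3, 2,
            2, 3, 3, 3, 2, 3, 2, 3, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2,
            3, 4, 3, 3, 2, 3, 6]


def simple_regression(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, r, r_squared)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)  # signed r; Excel's Multiple R is |r|
    return a, b, r, r * r


a, b, r, r_squared = simple_regression(statuses, averages)
print(f"a = {a:.4f}, b = {b:.6f}, r = {r:.6f}, r^2 = {r_squared:.6f}")
```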
