
Part I:

1. A researcher used data from a sample of 32 companies to investigate the relationship
between annual spending (Yi), measured in millions of dollars per year, and annual firm
profits (Xi), also measured in millions of dollars per year.
X 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Y 2 2 2 1 3 5 6 6 10 10 10 12 15 10 11

The preliminary analysis of the sample data produces the following sample information:

N = 32, ΣYi = 4,917.8, ΣXi = 11,856.1, ΣYi² = 4,022,814.0, ΣXi² = 25,796,522.5,
ΣXiYi = 9,785,312.0, Σxiyi = 7,963,252.0, Σyi² = 3,267,040.6, Σxi² = 21,403,801.0,
Σûi² = Σe² = 304,324.7,

where xi = Xi − X̄ and yi = Yi − Ȳ for i = 1, ..., N (all sums run over i = 1 to N).
Use the above sample information to answer all the following questions:
(a) Fit the regression equation (5 pts).

Answer for the above question

The model is Yi = β0 + β1Xi + Ui.

The slope estimate is

β̂1 = Σxiyi / Σxi² = 7,963,252.0 / 21,403,801.0 = 0.372

The sample means are

Ȳ = ΣYi / N = 4,917.8 / 32 = 153.681 and X̄ = ΣXi / N = 11,856.1 / 32 = 370.503

so the intercept estimate is

β̂0 = Ȳ − β̂1X̄ = 153.681 − 0.372(370.503) = 15.854

Thus the fitted regression equation is: Ŷi = 15.854 + 0.372Xi
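The same estimates can be recomputed directly from the summary statistics given in the question; a minimal sketch (variable names are my own):

```python
# OLS slope and intercept from the summary statistics in the question.
n = 32
sum_Y = 4_917.8        # ΣYi
sum_X = 11_856.1       # ΣXi
S_xy  = 7_963_252.0    # Σxiyi (products of deviations from the means)
S_xx  = 21_403_801.0   # Σxi²

b1 = S_xy / S_xx                  # slope: β̂1 = Σxiyi / Σxi²
Y_bar, X_bar = sum_Y / n, sum_X / n
b0 = Y_bar - b1 * X_bar           # intercept: β̂0 = Ȳ − β̂1·X̄

print(f"fitted equation: Y^ = {b0:.3f} + {b1:.3f}·X")
```

Carrying full precision in β̂1 gives an intercept of about 15.84; the figure 15.854 comes from rounding β̂1 to 0.372 before substituting, which is the usual hand-computation convention.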


(b) Compute R2 and adjusted R2 (2 pts).

Answer for the above question

R² = 1 − Σe² / Σyi² = 1 − 304,324.7 / 3,267,040.6 = 1 − 0.09315 = 0.90685 ≈ 0.91

Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k), where k is the number of estimated
parameters (including the intercept) and n is the number of sample observations.
With n = 32 and k = 2 (intercept plus slope):

Adjusted R² = 1 − (1 − 0.90685)(31/30)

= 1 − (0.09315)(1.0333)

= 1 − 0.09626

= 0.904
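The arithmetic above can be checked in a few lines, using k = 2 estimated parameters (intercept plus slope); names are my own:

```python
# R² and adjusted R² from the residual and total sums of squares.
n   = 32
k   = 2               # estimated parameters: intercept + slope
sse = 304_324.7       # Σe²  (residual sum of squares)
sst = 3_267_040.6     # Σyi² (total sum of squares, in deviation form)

r2     = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)

print(f"R² = {r2:.3f}, adjusted R² = {adj_r2:.3f}")
```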

(c) Report and interpret your results (8 pts).

Answer for the above question


c) Interpretation: the intercept term, 15.854, is the predicted value of the dependent
variable Y when the explanatory variable X is zero. The slope coefficient, 0.372, measures
the marginal change in the dependent variable Y when the explanatory variable increases by
one unit: in this model, Y increases on average by 0.372 units when X increases by one.

R² = 0.91 shows that 91% of the variation in annual spending of the companies is
explained by the variation in annual firm profits; the remaining 9% is left
unexplained by firm profits. In other words, there may be other important explanatory
variables, omitted from the model, that contribute to the variation in annual spending of
the companies under consideration.

Adjusted R² = 0.904 corrects R² for the number of explanatory variables in the model. It
is most informative in multiple regression; in a simple regression model such as this one,
it differs only slightly from R².

In general, in simple linear regression, a higher R² means the model is better determined
by the explanatory variable. In multiple linear regression, however, R² increases every
time we insert an additional explanatory variable into the model, irrespective of any real
improvement in the goodness of fit. That means a high R² by itself may not imply that the
model is good, which is why the adjusted R², which penalizes additional regressors, is
reported alongside it.
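This can be demonstrated with the X–Y data from the table in Part I, question 1; the extra regressor x2 below is an arbitrary, irrelevant series I made up for illustration:

```python
import numpy as np

# X and Y from the data table in Part I, question 1.
x  = np.arange(1.0, 16.0)
y  = np.array([2, 2, 2, 1, 3, 5, 6, 6, 10, 10, 10, 12, 15, 10, 11], dtype=float)
x2 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9], dtype=float)  # irrelevant

def r_squared(y, *cols):
    """R² of an OLS regression of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_simple = r_squared(y, x)
r2_extra  = r_squared(y, x, x2)   # R² cannot decrease when a regressor is added
print(r2_simple, r2_extra)
```

Adjusted R², by contrast, applies the (n − 1)/(n − k) penalty, so an irrelevant regressor can lower it even as raw R² ticks up.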

Part II:

1. Suppose we want to estimate the equation:

yi = β0 + β1x1i + β2x2i + β3x3i + ui,

where yi (api00) = academic performance of the school,
x1 (acs_k3) = average class size in kindergarten through third grade,
x2 (meals) = the percentage of students receiving free meals,
x3 (full) = the percentage of teachers who have full teaching credentials.

a. Interpret the following results (5 pts)

Source    |     SS          df       MS            Number of obs =    313
----------+------------------------------          F(3, 309)     = 213.41
Model     | 2634884.26      3   878294.754         Prob > F      = 0.0000
Residual  | 1271713.21    309   4115.57673         R-squared     = 0.6745
----------+------------------------------          Adj R-squared = 0.6713
Total     | 3906597.47    312   12521.1457         Root MSE      = 64.153

api00     |    Coef.    Std. Err.      t     P>|t|    [95% Conf. Interval]
----------+---------------------------------------------------------------
acs_k3    | -2.681508   1.393991    -1.92   0.055    -5.424424    .0614074
meals     | -3.702419   .1540256   -24.04   0.000    -4.005491   -3.399348
full      |  .1086104   .090719      1.20   0.232    -.0698947    .2871154
_cons     |  906.7392   28.26505    32.08   0.000     851.1228    962.3555
Answer for the above question

 The average class size (acs_k3, b1 = -2.68, p = 0.055) is not statistically significant
at the 0.05 level, but only just so. The coefficient is negative, which would indicate
that larger class size is related to lower academic performance, which is what we would
expect.
 The effect of meals (b2 = -3.70, p = 0.000) is significant and its coefficient is
negative, indicating that the greater the percentage of students receiving free meals,
the lower the academic performance.
 The percentage of teachers with full credentials (full, b3 = 0.11, p = 0.232) is not
statistically significant at the 0.05 level. This would seem to indicate that the
percentage of teachers with full credentials is not an important factor in predicting
academic performance; this result was somewhat unexpected.
 Finally, the constant is 906.7392, the predicted value of api00 when all the
independent variables equal zero. In most cases, the constant is not very interesting.
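A quick sanity check on output like this: each t statistic is simply the coefficient divided by its standard error. Recomputing them from the Coef. and Std. Err. columns of the table above:

```python
# Reproduce the t statistics in the regression output from Coef. and Std. Err.
coefs = {            # (coefficient, standard error) pairs from the table
    "acs_k3": (-2.681508, 1.393991),
    "meals":  (-3.702419, 0.1540256),
    "full":   (0.1086104, 0.090719),
}
t_stats = {name: b / se for name, (b, se) in coefs.items()}
print(t_stats)  # ≈ -1.92, -24.04, 1.20, matching the t column above
```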

b. Use the following test results and check whether specification error is evident or not
(4 pts).

Ramsey RESET test using powers of the fitted values of api00
Ho: model has no omitted variables
        F(3, 306) = 5.21
        Prob > F  = 0.0016

Source    |     SS          df       MS            Number of obs =    313
----------+------------------------------          F(2, 310)     = 337.91
Model     | 2678122.88      2   1339061.44         Prob > F      = 0.0000
Residual  | 1228474.59    310   3962.82127         R-squared     = 0.6855
----------+------------------------------          Adj R-squared = 0.6835
Total     | 3906597.47    312   12521.1457         Root MSE      = 62.951

api00     |    Coef.    Std. Err.      t     P>|t|    [95% Conf. Interval]
----------+---------------------------------------------------------------
_hat      |  2.696019   .5149109     5.24   0.000     1.682857    3.709181
_hatsq    | -.0013447   .0004071    -3.30   0.001    -.0021457   -.0005437
_cons     | -521.8956   159.721     -3.27   0.001    -836.1699   -207.6213

Answer for the above question

Since the Prob > F value (0.0016) is less than 0.05, the null hypothesis that the model
has no omitted variables is rejected in favor of the alternative that there are omitted
variables. Therefore, specification error is evident in this model.
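The mechanics behind Ramsey's RESET can be sketched without the original school data: re-fit the model with powers of the fitted values added, then F-test whether those powers matter. Below I fabricate a deliberately misspecified example (a linear fit to curved data) so the test statistic comes out large; the data and function names are mine, not the api00 dataset.

```python
import numpy as np

def reset_F(y, X_restricted, n_powers=2):
    """RESET F statistic: add powers 2..(1 + n_powers) of the fitted values."""
    def ssr(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r
    ssr_r = ssr(X_restricted)
    fitted = X_restricted @ np.linalg.lstsq(X_restricted, y, rcond=None)[0]
    powers = np.column_stack([fitted ** p for p in range(2, 2 + n_powers)])
    X_u = np.column_stack([X_restricted, powers])
    ssr_u = ssr(X_u)
    q, df_u = n_powers, len(y) - X_u.shape[1]
    return ((ssr_r - ssr_u) / q) / (ssr_u / df_u)

x = np.arange(1.0, 21.0)
y = x**2 + np.sin(x)                       # true relation is curved
X = np.column_stack([np.ones_like(x), x])  # restricted model: linear only
F = reset_F(y, X)
print(F)  # large F → functional-form misspecification, as in the test above
```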

c. Use the following result and check multicollinearity (2 pts)


Variable |    VIF      1/VIF
---------+---------------------
full     |   1.11    0.903713
meals    |   1.07    0.933517
acs_k3   |   1.04    0.964781
---------+---------------------
Mean VIF |   1.07

Answer for the above question


The general rule of thumb is that VIFs exceeding 10 are signs of serious multicollinearity
requiring correction. Here every VIF is close to 1, far below 10, so we conclude that
multicollinearity is not present in the data.
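The VIF itself is just 1/(1 − R²j), where R²j comes from regressing regressor j on the other regressors. A minimal sketch of that computation, using made-up data with one deliberately collinear column:

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress X[:, j] on the remaining columns plus a constant."""
    y = X[:, j]
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, y, rcond=None)
    resid = y - others @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

t = np.arange(10.0)
# Column 1 is nearly 2 × column 0, so its VIF should blow up past 10.
X = np.column_stack([t, 2 * t + np.sin(t), np.cos(3 * t)])
print([round(vif(X, j), 1) for j in range(3)])
```

This is the computation behind the VIF column reported above.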

d. Based on the following result, test heteroscedasticity (4 pts)


i.
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of api00

chi2(1) = 0.51
Prob > chi2 = 0.4745

ii.
Cameron & Trivedi's decomposition of IM-test

Source chi2 df p

Heteroskedasticity 7.03 9 0.6335


Skewness 2.58 3 0.4603
Kurtosis 1.13 1 0.2867

Total 10.75 13 0.6314

Answer for the above question

Ho: This is the null hypothesis of the test, which states that there is constant variance
among the residuals.

chi2(1): This is the Chi-Square test statistic of the test. In this case, it is 0.51.
Prob > chi2: This is the p-value that corresponds to the Chi-Square test statistic. In
this case, it is 0.4745. Since this value is greater than 0.05, we fail to reject the null
hypothesis and conclude that heteroscedasticity is not present in the data. The Cameron &
Trivedi decomposition agrees: its heteroskedasticity component has p = 0.6335, also well
above 0.05.
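For reference, the Breusch–Pagan statistic can be computed by hand as n·R² from an auxiliary regression of the squared OLS residuals on the fitted values, compared against a χ²(1) critical value (3.84 at the 5% level). A sketch on made-up data with constant-magnitude errors, so the test should not reject:

```python
import numpy as np

def breusch_pagan_lm(y, X):
    """LM = n·R² from regressing squared OLS residuals on the fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    e2 = (y - fitted) ** 2
    Z = np.column_stack([np.ones_like(fitted), fitted])
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    r = e2 - Z @ g
    r2 = 1 - (r @ r) / ((e2 - e2.mean()) @ (e2 - e2.mean()))
    return len(y) * r2

x = np.arange(20.0)
e = np.tile([1.0, -1.0], 10)           # homoskedastic: errors of constant magnitude
y = 2 + 3 * x + e
X = np.column_stack([np.ones_like(x), x])
lm = breusch_pagan_lm(y, X)
print(lm)  # well below the 3.84 critical value → no heteroskedasticity
```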
