Econtrix


Econometrics project

Mansi singh 22/1754


Muskan 22/1115
Shrinkhala 22/1614
Sushmita 22/1550
Questions
Q1. Run a simple linear regression by choosing appropriate dependent and the
independent variable. Present the population regression first and then the
estimated model. Interpret the coefficients, overall explanatory power of the
model and test the individual coefficients to see if they are significant or not.
Clearly mention all the assumptions and your choice of variables.

Descriptive statistics
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.108936843
R Square 0.011867236
Adjusted R Square 0.011154297
Standard Error 18.63447978
Observations 1388

ANOVA
df SS MS F Significance F
Regression 1 5780.055968 5780.055968 16.64552501 4.76244E-05
Residual 1386 481279.9577 347.2438367
Total 1387 487060.0137

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 17.12169764 2.960516817 5.783347537 9.03744E-09 11.31411974 22.92927554 11.31411974 22.92927554
bwght 0.100294884 0.024582726 4.079892769 4.76244E-05 0.052071516 0.148518253 0.052071516 0.148518253

Assumptions:

1. Linearity: The relationship between the independent variable (bwght) and the dependent variable (family income) is assumed to be linear.

2. Independence of Errors: The errors (residuals) are assumed to be independent of each other.

3. Homoscedasticity: The variance of the errors is constant across all values of the independent variable.

4. Normality of Errors: The errors are assumed to be normally distributed.

Population Regression:
The population regression equation is: $y = \beta_0 + \beta_1 x + \varepsilon$

Where:

• $y$ is the dependent variable (family income)
• $x$ is the independent variable (bwght)
• $\beta_0$ is the intercept
• $\beta_1$ is the coefficient of bwght
• $\varepsilon$ is the error term

Estimated Model:
$\hat{y} = 17.1217 + 0.1003\,\text{bwght}$
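
As a rough illustration of where an intercept and slope like these come from, the sketch below applies the closed-form least-squares formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function name `fit_simple_ols` and the five-point sample are hypothetical and are not the project data.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line through the points (x, y)."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy sample (NOT the 1388 observations used in the project)
bwght = [100.0, 110.0, 120.0, 130.0, 140.0]
faminc = [22.0, 30.0, 27.0, 35.0, 31.0]

b0, b1 = fit_simple_ols(bwght, faminc)
print(f"faminc_hat = {b0:.4f} + {b1:.4f} * bwght")
```

Running the same formulas on the full data set should reproduce the intercept and slope reported in the Excel output.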

INTERPRETATION OF COEFFICIENTS

• Intercept (17.1217): when birth weight ($x$) is zero, $\hat{y} = 17.1217$.
• Coefficient of bwght (0.1003): for each one-unit increase in birth weight, $\hat{y}$ (predicted family income) increases by 0.1003 units.
OVERALL EXPLANATORY POWER:

• $R^2$ (0.0119): only about 1.19% of the variability in the dependent variable is explained by the independent variable (bwght), indicating weak explanatory power.
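
The reported R Square and Adjusted R Square can be checked directly from the ANOVA table. The sketch below uses the sums of squares and observation count copied from the output above; the variable names are illustrative only.

```python
# Values copied from the ANOVA table of the simple regression
ss_regression = 5780.055968
ss_total = 487060.0137
n_obs = 1388
n_regressors = 1  # only bwght

# R^2 is the share of total variation explained by the regression
r_squared = ss_regression / ss_total

# Adjusted R^2 penalises for the number of regressors
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
```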

Test of Individual coefficients:

• P-value: both coefficients have p-values below 0.05 (9.04E-09 for the intercept and 4.76E-05 for bwght), so they are statistically significant at the 5% level. This suggests that both the intercept and the coefficient of bwght are significantly different from zero.

Significance test
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
t statistic for bwght = 4.0799
p-value for bwght = 4.76E-05 < 0.05
Since the p-value is less than the 5% significance level, we reject the null hypothesis. Therefore, we conclude that there is a statistically significant effect of birth weight on the dependent variable (family income).
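
The test above can be reproduced from the coefficient and its standard error. The sketch below is an approximation: it uses the standard normal distribution in place of the t distribution with 1386 degrees of freedom (the two are very close at this sample size), so the p-value and interval differ slightly from the Excel output.

```python
import math

# bwght row of the coefficient table (simple regression)
coef, std_err = 0.100294884, 0.024582726

# t statistic for H0: beta_1 = 0
t_stat = coef / std_err

# Two-sided p-value under the normal approximation: 2 * (1 - Phi(|t|))
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

# Approximate 95% confidence interval with the normal critical value 1.96
ci_low, ci_high = coef - 1.96 * std_err, coef + 1.96 * std_err

print(f"t = {t_stat:.4f}, approximate p = {p_approx:.2e}")
print(f"reject H0 at the 5% level: {p_approx < 0.05}")
print(f"approximate 95% CI: ({ci_low:.4f}, {ci_high:.4f})")
```

Because the interval excludes zero, it leads to the same conclusion as the p-value.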
Ques.2 Now estimate a multiple linear regression model. Interpret the
coefficients, overall explanatory power of the model and test for the individual
coefficients if they are significant or not, along with overall significance of the
model. Also fit another suitable functional form. Compare both the estimated
regression results and comment which model better fits the data. Clearly
mention all the assumptions.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.485562925
R Square 0.235771355
Adjusted R Square 0.233561004
Standard Error 16.40558814
Observations 1388

ANOVA
df SS MS F Significance F
Regression 4 114834.7992 28708.6998 106.667 2.90977E-79
Residual 1383 372225.2145 269.143322
Total 1387 487060.0137

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -84.76101145 10.15567652 -8.3461709 1.69E-16 -104.683207 -64.83882 -104.6832068 -64.83881613
bwght 0.069830208 0.021718927 3.21517762 0.001334 0.027224607 0.1124358 0.027224607 0.112435809
cigtax -0.685336383 0.117129042 -5.8511226 6.09E-09 -0.91510617 -0.455567 -0.915106172 -0.455566594
cigprice 0.566101377 0.089182721 6.34765758 2.96E-10 0.391153347 0.7410494 0.391153347 0.741049406
motheduc 3.480410448 0.184352152 18.8791419 6.5E-71 3.118770376 3.8420505 3.118770376 3.842050519
Assumptions:

1. Linearity: The relationship between the independent variables (bwght, cigtax, cigprice, motheduc) and the dependent variable (family income) is assumed to be linear.

2. Independence of Errors: The errors (residuals) are assumed to be independent of each other.

3. Homoscedasticity: The variance of the errors is constant across all values of the independent variables.

4. Normality of Errors: The errors are assumed to be normally distributed.

5. No perfect multicollinearity: The independent variables are not perfectly correlated with each other.

Estimated Model:

$\hat{y} = -84.761 + 0.070\,\text{bwght} - 0.685\,\text{cigtax} + 0.566\,\text{cigprice} + 3.480\,\text{motheduc}$

Where:

• $\hat{y}$ is the predicted value of the dependent variable (family income).
• bwght, cigtax, cigprice and motheduc are the independent variables (birth weight, cigarette tax, cigarette price, and mother's education, respectively).
Interpretation of coefficients
• Intercept (-84.761): when all independent variables are zero, $\hat{y}$ is -84.761 (this may not have a practical interpretation, depending on the context).
• For each unit increase in birth weight, $\hat{y}$ increases by 0.070, holding all other variables constant.
• For each unit increase in cigarette tax, $\hat{y}$ decreases by 0.685, holding all other variables constant.
• For each unit increase in cigarette price, $\hat{y}$ increases by 0.566, holding all other variables constant.
• For each unit increase in mother's education, $\hat{y}$ increases by 3.480, holding all other variables constant.

Overall Explanatory power:

• $R^2$ (0.2358): about 23.58% of the variability in the dependent variable is explained by the independent variables, indicating moderate explanatory power.

Test of Individual Coefficients:

• P-value: All coefficients have p-values less than 0.05, indicating that they are statistically significant at the 5% level.
Overall significance of the model:

• ANOVA: the p-value for the overall model is very low (p < 0.05), indicating that the model as a whole is statistically significant.
• $H_0$: the coefficients of all independent variables in the model are equal to zero.
$H_1$: at least one of the coefficients of the independent variables in the model is not equal to zero.
Significance F: $2.90977 \times 10^{-79}$
We reject the null hypothesis, indicating that the model is statistically significant overall.
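
The overall F statistic can be recomputed from the ANOVA table in two equivalent ways: from the mean squares, or from $R^2$. The sketch below uses the sums of squares copied from the multiple regression output; the variable names are illustrative.

```python
# Values copied from the ANOVA table of the multiple regression
ss_regression, ss_residual = 114834.7992, 372225.2145
n_obs, n_regressors = 1388, 4  # bwght, cigtax, cigprice, motheduc

df_resid = n_obs - n_regressors - 1  # residual degrees of freedom

# F as the ratio of mean squares: MS(regression) / MS(residual)
f_from_ss = (ss_regression / n_regressors) / (ss_residual / df_resid)

# Same statistic written in terms of R^2
r2 = ss_regression / (ss_regression + ss_residual)
f_from_r2 = (r2 / n_regressors) / ((1 - r2) / df_resid)

print(f"F (mean squares) = {f_from_ss:.3f}")
print(f"F (from R^2)     = {f_from_r2:.3f}")
```

Both expressions should agree with the F value of about 106.67 in the output, which is far above any conventional critical value for 4 and 1383 degrees of freedom.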

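Another suitable functional form:

A log-linear (log-level) form is fitted, with the natural log of family income as the dependent variable:

$\ln(y) = \beta_0 + \beta_1\,\text{bwght} + \beta_2\,\text{cigtax} + \beta_3\,\text{cigprice} + \beta_4\,\text{motheduc} + \varepsilon$

Using the coefficients from the output below, the estimated model is:

$\widehat{\ln(\text{faminc})} = -2.228 + 0.0034\,\text{bwght} - 0.0317\,\text{cigtax} + 0.0269\,\text{cigprice} + 0.1545\,\text{motheduc}$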
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.447764442
R Square 0.200492996
Adjusted R Square 0.198180611
Standard Error 0.81758194
Observations 1388

ANOVA
df SS MS F Significance F
Regression 4 231.8257595 57.95643988 86.70399749 8.8433E-66
Residual 1383 924.4528358 0.668440228
Total 1387 1156.278595

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -2.227797769 0.506113992 -4.401770753 1.15625E-05 -3.220631854 -1.234963683 -3.220631854 -1.234963683
bwght 0.003448639 0.001082375 3.186177353 0.001473874 0.001325365 0.005571914 0.001325365 0.005571914
cigtax -0.031732609 0.005837193 -5.436278562 6.42351E-08 -0.043183319 -0.020281899 -0.043183319 -0.020281899
cigprice 0.026928762 0.004444472 6.05893349 1.76358E-09 0.018210126 0.035647398 0.018210126 0.035647398
motheduc 0.154450937 0.009187296 16.81136004 7.47428E-58 0.136428395 0.172473478 0.136428395 0.172473478
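
In a log-level model such as the one in the output above, a slope coefficient b is usually read as an approximate 100·b percent change in the dependent variable per one-unit change in the regressor, with 100·(exp(b) − 1) as the exact figure. The sketch below applies both formulas to the reported coefficients; the helper name `pct_effects` is hypothetical.

```python
import math

# Slope coefficients copied from the log(faminc) regression output
log_model_coefs = {
    "bwght": 0.003448639,
    "cigtax": -0.031732609,
    "cigprice": 0.026928762,
    "motheduc": 0.154450937,
}

def pct_effects(coefs):
    """Map each regressor to (approximate %, exact %) change in y per unit increase."""
    return {name: (100 * b, 100 * (math.exp(b) - 1)) for name, b in coefs.items()}

effects = pct_effects(log_model_coefs)
for name, (approx, exact) in effects.items():
    print(f"{name:9s} approx {approx:7.3f}%   exact {exact:7.3f}%")
```

The two figures are close for small coefficients and diverge as the coefficient grows, which is why the gap is most visible for motheduc.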

To compare the two estimated regression models, let's analyze their key statistics:
1. $R^2$: for the first model = 0.2358; for the second model = 0.2005
2. Adjusted $R^2$: for the first model = 0.2336; for the second model = 0.1982
3. F-statistic and Significance F: for the first model = 106.67 with a Significance F of 2.90977E-79; for the second model = 86.70 with a Significance F of 8.8433E-66
In conclusion, based on the $R^2$, adjusted $R^2$, F-statistics and Significance F values, the first (linear) model appears to provide a better fit for the data than the second (log-linear) model. Because the dependent variables differ (faminc versus ln(faminc)), the two $R^2$ values are not strictly comparable, so this comparison is indicative only.
Ques3. Add dummy variable(s) of your choice. Explain why you have chosen
those categorical variables. Interpret the coefficients, overall explanatory
power of the model and test for the individual coefficients if they are
significant or not, along with overall significance of the model.

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.488799374
R Square 0.238924828
Adjusted R Square 0.2361713
Standard Error 16.37762768
Observations 1388

ANOVA
df SS MS F Significance F
Regression 5 116370.7301 23274.15 86.77043341 1.87752E-79
Residual 1382 370689.2836 268.2267
Total 1387 487060.0137

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -84.25461542 10.14057628 -8.30866 2.28412E-16 -104.1472015 -64.36202933 -104.1472015 -64.36202933
bwght 0.073490803 0.021735808 3.381094 0.000742129 0.030852059 0.116129546 0.030852059 0.116129546
cigtax -0.681122617 0.116942674 -5.82441 7.11895E-09 -0.910526957 -0.451718276 -0.910526957 -0.451718276
cigprice 0.566777198 0.089031173 6.366053 2.63313E-10 0.392126347 0.741428048 0.392126347 0.741428048
motheduc 3.479555489 0.184038303 18.90669 4.34663E-71 3.118530861 3.840580118 3.118530861 3.840580118
male -2.112680877 0.882874669 -2.39296 0.016846022 -3.844600235 -0.380761519 -3.844600235 -0.380761519

Estimated Model:

The estimated regression equation based on the provided coefficients is:

$\hat{y} = -84.255 + 0.073\,\text{bwght} - 0.681\,\text{cigtax} + 0.567\,\text{cigprice} + 3.480\,\text{motheduc} - 2.113\,\text{male}$
Where:
• $\hat{y}$ is the predicted value of the dependent variable (family income).
• bwght, cigtax, cigprice and motheduc are the independent variables (birth weight, cigarette tax, cigarette price, and mother's education, respectively).
• male is a dummy variable indicating gender (1 = male, 0 = female).

Interpretation of coefficients:
• Intercept (-84.255): when all independent variables are zero, $\hat{y}$ is -84.255.
• Coefficients of the independent variables:
1. For each unit increase in birth weight, $\hat{y}$ increases by 0.073, holding all other variables constant.
2. For each unit increase in cigarette tax, $\hat{y}$ decreases by 0.681, holding all other variables constant.
3. For each unit increase in cigarette price, $\hat{y}$ increases by 0.567, holding all other variables constant.
4. For each unit increase in mother's education, $\hat{y}$ increases by 3.480, holding all other variables constant.
5. Being male decreases $\hat{y}$ by 2.113 units compared to being female, holding all other variables constant.
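
The dummy-variable interpretation in the list above can be made concrete by predicting for two cases that differ only in the value of male. In the sketch below the coefficients are copied from the regression output, while the covariate values in `profile` and the function name `predict_faminc` are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the regression output that includes the male dummy
COEFS = {
    "intercept": -84.25461542,
    "bwght": 0.073490803,
    "cigtax": -0.681122617,
    "cigprice": 0.566777198,
    "motheduc": 3.479555489,
    "male": -2.112680877,
}

def predict_faminc(bwght, cigtax, cigprice, motheduc, male):
    """Predicted value of the dependent variable from the estimated equation."""
    values = {"bwght": bwght, "cigtax": cigtax, "cigprice": cigprice,
              "motheduc": motheduc, "male": male}
    return COEFS["intercept"] + sum(COEFS[k] * v for k, v in values.items())

# Hypothetical covariate profile (illustrative values, not taken from the data)
profile = {"bwght": 120, "cigtax": 20, "cigprice": 130, "motheduc": 12}

# Same profile, switching only the dummy from 0 to 1
gap = predict_faminc(**profile, male=1) - predict_faminc(**profile, male=0)
print(f"prediction for male minus prediction for female: {gap:.3f}")
```

Whatever profile is chosen, the gap equals the coefficient on male, because the dummy shifts the intercept without changing any slope.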

Overall Explanatory Power:

• $R^2$ (0.2389): about 23.89% of the variability in the dependent variable is explained by the independent variables in the model, indicating moderate explanatory power.

Test of individual coefficients:

• P-value: All coefficients have p-values less than 0.05, indicating that they are statistically significant at the 5% level.
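
The same conclusion can be checked for every row of the coefficient table at once by comparing each |t| with a critical value. The sketch below uses 1.96, the large-sample two-sided 5% critical value, as an approximation to the t critical value with 1382 degrees of freedom; coefficients and standard errors are copied from the output.

```python
# (coefficient, standard error) pairs copied from the output with the male dummy
rows = {
    "Intercept": (-84.25461542, 10.14057628),
    "bwght": (0.073490803, 0.021735808),
    "cigtax": (-0.681122617, 0.116942674),
    "cigprice": (0.566777198, 0.089031173),
    "motheduc": (3.479555489, 0.184038303),
    "male": (-2.112680877, 0.882874669),
}

CRITICAL = 1.96  # assumed large-sample approximation at the 5% level

# t statistic for H0: coefficient = 0 is coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in rows.items()}

for name, t in t_stats.items():
    verdict = "significant" if abs(t) > CRITICAL else "not significant"
    print(f"{name:10s} t = {t:8.3f}  {verdict} at 5%")
```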

Overall Significance of the model

• ANOVA: the p-value for the overall model is very low (Significance F = 1.87752E-79, p < 0.05), indicating that the model as a whole is statistically significant.

Choice of Dummy variable:

The addition of the 'male' dummy variable allows us to explore the effect of gender on the dependent variable. In many cases, gender can play a significant role in various outcomes, and including it as a predictor variable allows us to account for this factor in the model.
