Regression Analysis of Student Performance in High School Examination (Evidence From USA)

Student: Shahzad Munir ID: 27720170155975

Home Work (2)
Micro-Econometrics and Application
The purpose of this study is to analyze the marks secured by high school students in the United States
using linear regression. The data were obtained from Kaggle
(https://www.kaggle.com/spscientist/students-performance-in-exams). The data contain eight variables:
Gender, Race/Ethnicity, Parental Level of Education, Lunch, Test Preparation Course, Math Score,
Reading Score and Writing Score. Gender, Race/Ethnicity, Parental Level of Education, Lunch and
Test Preparation Course are categorical variables and are treated as independent variables. Math Score,
Reading Score and Writing Score are treated as dependent variables. The analysis was carried out in R,
and the code is given in the Appendix.
1 Regression Analysis
1.1 Effect of Gender on Scores
In this section, I analyze the effect of gender on the scores obtained by the students. For this purpose, I
regress the marks obtained in mathematics, reading and writing on the gender variable (male or female). The
Ordinary Least Squares (OLS) results are given in Table 1.
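The regression described above can be sketched on simulated data. This is a minimal illustration, not the Kaggle sample: the names score and gender and all simulated numbers are assumptions made for the example. It shows that, with a single binary regressor, the OLS slope equals the difference between the two group means and the intercept equals the mean of the baseline group.

```r
# Simulated (hypothetical) data: a score and a two-level gender factor
set.seed(1)
n <- 200
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
score <- 65 + 5 * (gender == "male") + rnorm(n, sd = 15)

# OLS of the score on the gender dummy; "female" is the baseline level
fit <- lm(score ~ gender)

# Group means computed directly from the data
group_means <- tapply(score, gender, mean)

# The slope on "gendermale" is the male mean minus the female mean,
# and the intercept is the female mean
slope <- unname(coef(fit)["gendermale"])
mean_gap <- unname(group_means["male"] - group_means["female"])
print(c(slope = slope, mean_gap = mean_gap))
```

The same reading applies to the Gendermale rows in the tables of this section: each coefficient is an average difference in points relative to female students.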
Table 1:
                                  Dependent variable:
                 ‘Math Score‘    ‘Reading Score‘    ‘Writing Score‘
                     (1)              (2)                (3)
Gendermale         5.095∗∗∗        −7.135∗∗∗          −9.156∗∗∗
                  (0.946)          (0.896)            (0.917)
Constant          63.633∗∗∗        72.608∗∗∗          72.467∗∗∗
                  (0.657)          (0.622)            (0.637)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
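As a reading aid for Table 1, the fitted group averages can be recovered from the constant and the Gendermale coefficient. The symbol Male (equal to 1 for male students and 0 for female students) is notation introduced here for illustration only.

```latex
\begin{align*}
\widehat{\text{Math}}    &= 63.633 + 5.095\,\text{Male}
  &&\Rightarrow\ \text{female: } 63.633,\quad \text{male: } 63.633 + 5.095 = 68.728 \\
\widehat{\text{Reading}} &= 72.608 - 7.135\,\text{Male}
  &&\Rightarrow\ \text{female: } 72.608,\quad \text{male: } 72.608 - 7.135 = 65.473 \\
\widehat{\text{Writing}} &= 72.467 - 9.156\,\text{Male}
  &&\Rightarrow\ \text{female: } 72.467,\quad \text{male: } 72.467 - 9.156 = 63.311
\end{align*}
```

The coefficients are therefore differences in score points between the two groups, not ratios.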
The results in Table 1 show that male students scored 5.095 points more in mathematics than female
students on average, but 7.135 and 9.156 points less than female students in reading and writing,
respectively. The gender effect on scores is statistically significant for all subjects. Table 2 reports the
gender effect on marks after including parental level of education as a control variable.
Table 2:
                                                        Dependent variable:
                                         ‘Math Score‘    ‘Reading Score‘    ‘Writing Score‘
                                             (1)              (2)                (3)
Gendermale                                 5.366∗∗∗        −6.836∗∗∗          −8.778∗∗∗
                                          (0.933)          (0.881)            (0.890)
‘Parental Education‘bachelor’s degree      1.568            1.994              3.385∗∗
                                          (1.677)          (1.583)            (1.600)
‘Parental Education‘high school           −5.975∗∗∗        −5.930∗∗∗          −7.071∗∗∗
                                          (1.444)          (1.363)            (1.377)
‘Parental Education‘master’s degree        2.333            3.846∗             5.012∗∗
                                          (2.158)          (2.037)            (2.059)
‘Parental Education‘some college          −0.757           −1.465             −1.052
                                          (1.391)          (1.313)            (1.327)
‘Parental Education‘some high school      −4.462∗∗∗        −3.893∗∗∗          −4.884∗∗∗
                                          (1.479)          (1.396)            (1.411)
Constant                                  65.321∗∗∗        74.192∗∗∗          74.088∗∗∗
                                          (1.084)          (1.023)            (1.034)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
The gender effect on marks remains statistically significant after including parental level of education.
On average, male students scored higher in mathematics and lower in reading and writing than female
students. The omitted category for parental education is an associate's degree, so each education
coefficient is a difference relative to that group. Students whose parents hold a master's or bachelor's
degree scored higher than those whose parents hold an associate's degree, although these differences are
statistically significant mainly for writing. Students whose parents have a high school or some high school
education scored significantly lower than the associate's degree group, while the difference for some
college is small and not statistically significant.
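The comparison group in the paragraph above follows from how R codes a factor: by default the first level in alphabetical order becomes the omitted baseline. The sketch below is an illustration only. The level strings are assumed to match the education labels used in the tables, and edu and edu_hs are hypothetical names.

```r
# Hypothetical factor with the six parental education labels
edu <- factor(c("associate's degree", "high school", "master's degree",
                "some college", "bachelor's degree", "some high school"))

# The first level is the baseline that lm() omits from the coefficient table
baseline_default <- levels(edu)[1]
print(baseline_default)

# relevel() changes the baseline, e.g. to compare every group with "high school"
edu_hs <- relevel(edu, ref = "high school")
baseline_new <- levels(edu_hs)[1]
print(baseline_new)
```

Changing the baseline changes every reported coefficient and the constant, but not the fitted values of the model.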
1.2 Effect of Race/Ethnicity on Scores
This section discusses the effect of Race/Ethnicity on students' marks. The students are divided into five
groups (A, B, C, D and E), with Group A as the reference category. I regress the math, reading and writing
scores on Race/Ethnicity. The results are given in Table 3.
Table 3:
                                            Dependent variable:
                             ‘Math Score‘    ‘Reading Score‘    ‘Writing Score‘
                                 (1)              (2)                (3)
‘Race/Ethnicity‘group B        1.823            2.678              2.926
                              (1.897)          (1.858)            (1.928)
‘Race/Ethnicity‘group C        2.835            4.429∗∗            5.153∗∗∗
                              (1.770)          (1.734)            (1.800)
‘Race/Ethnicity‘group D        5.733∗∗∗         5.356∗∗∗           7.471∗∗∗
                              (1.812)          (1.775)            (1.842)
‘Race/Ethnicity‘group E       12.192∗∗∗         8.354∗∗∗           8.733∗∗∗
                              (2.002)          (1.961)            (2.035)
Constant                      61.629∗∗∗        64.674∗∗∗          62.674∗∗∗
                              (1.565)          (1.533)            (1.591)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Table 3 shows that students from Groups B, C and D performed better on average than students from
Group A. Students from Group E secured better marks than all other groups (A, B, C and D). The
coefficients for Groups D and E are statistically significant in all three subjects. The coefficient for
Group C is significant for reading and writing but not for math, and the coefficients for Group B are not
statistically significant.
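The significance statements for Table 3 can be checked by dividing each coefficient by its standard error. The sketch below does this for the Math Score column, using the values printed in Table 3. The 1.96 cutoff is the large-sample two-sided 5% critical value, which is an assumption about the test rather than something stated in the table.

```r
# Math Score coefficients and standard errors, copied from Table 3
coef_math <- c(B = 1.823, C = 2.835, D = 5.733, E = 12.192)
se_math   <- c(B = 1.897, C = 1.770, D = 1.812, E = 2.002)

# t-ratio = coefficient / standard error
t_ratio <- coef_math / se_math
print(round(t_ratio, 2))

# Flag the groups whose difference from Group A is significant at the 5% level
significant_5pct <- abs(t_ratio) > qnorm(0.975)
print(significant_5pct)
```

Groups D and E clear the cutoff for math, while Groups B and C do not, which matches the stars in column (1).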
1.3 Effect of Test Preparation Course on Scores
The objective of this section is to analyze the effect of the Test Preparation Course on the scores obtained
by students. Students are divided into two groups: those who completed the preparation course and those
who did not. The OLS results are given below.
Table 4:
                                                  Dependent variable:
                                   ‘Math Score‘    ‘Reading Score‘    ‘Writing Score‘
                                       (1)              (2)                (3)
‘Test Preparation Course‘none        −5.618∗∗∗        −7.360∗∗∗          −9.914∗∗∗
                                    (0.985)          (0.935)            (0.952)
Constant                            69.696∗∗∗        73.894∗∗∗          74.419∗∗∗
                                    (0.789)          (0.749)            (0.763)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Table 4 shows that students who did not complete the test preparation course scored lower than those who
completed it. On average, they scored 5.618, 7.360 and 9.914 points less in Math, Reading and Writing,
respectively. The coefficients of Test Preparation Course (none) are statistically significant for all
subjects. In Table 5, I include Lunch as a control variable and re-examine the effect of the test
preparation course on students' scores. Lunch is a categorical variable with two groups: students who
received free/reduced lunch and students who received lunch at the standard price. The OLS results are
given in Table 5.
Table 5:
                                                  Dependent variable:
                                   ‘Math Score‘    ‘Reading Score‘    ‘Writing Score‘
                                       (1)              (2)                (3)
‘Test Preparation Course‘none        −5.808∗∗∗        −7.481∗∗∗         −10.050∗∗∗
                                    (0.919)          (0.908)            (0.919)
lunchstandard                       11.212∗∗∗         7.128∗∗∗           7.972∗∗∗
                                    (0.921)          (0.910)            (0.921)
Constant                            62.586∗∗∗        69.374∗∗∗          69.364∗∗∗
                                    (0.940)          (0.928)            (0.940)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
The coefficients for Test Preparation Course (none) in Table 5 are close to those in Table 4. Table 5 also
shows that students who received lunch at the standard price performed better than those who received
free/reduced lunch.
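One way to read the stability of the Test Preparation Course coefficient is the standard OLS identity that links a regression without a control to the regression with it. The symbols below are notation introduced here for illustration: b_short is the coefficient without Lunch (Table 4), b_long is the coefficient with Lunch (Table 5), g is the Lunch coefficient, and d is the slope from regressing the standard-lunch dummy on the no-course dummy. The implied value of d is approximate because the table entries are rounded.

```latex
\begin{align*}
b_{\text{short}} &= b_{\text{long}} + g \cdot d \\
-5.618 &= -5.808 + 11.212 \cdot d \\
d &= \frac{-5.618 + 5.808}{11.212} = \frac{0.190}{11.212} \approx 0.017
\end{align*}
```

Under this reading, the share of students on standard lunch differs by only about 1.7 percentage points between the two course groups for Math Score, so controlling for Lunch moves the course coefficient very little even though Lunch itself has a large effect.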
1.4 Combined Effect
In this section, I include all the independent variables (Gender, Parental Level of Education,
Race/Ethnicity, Lunch and Test Preparation Course) in a single regression for each dependent variable
(Math Score, Reading Score and Writing Score). The OLS results are given in Table 6.
Table 6:
                                                        Dependent variable:
                                         ‘Math Score‘    ‘Reading Score‘    ‘Writing Score‘
                                             (1)              (2)                (3)
Gendermale                                 4.995∗∗∗        −7.071∗∗∗          −9.096∗∗∗
                                          (0.839)          (0.823)            (0.795)
‘Race/Ethnicity‘group B                    2.041            1.326              1.220
                                          (1.700)          (1.667)            (1.610)
‘Race/Ethnicity‘group C                    2.470            2.274              2.413
                                          (1.592)          (1.561)            (1.508)
‘Race/Ethnicity‘group D                    5.341∗∗∗         4.106∗∗            5.931∗∗∗
                                          (1.624)          (1.592)            (1.539)
‘Race/Ethnicity‘group E                   10.135∗∗∗         5.514∗∗∗           5.137∗∗∗
                                          (1.802)          (1.766)            (1.707)
‘Parental Education‘bachelor’s degree      1.966            2.156              3.485∗∗
                                          (1.502)          (1.473)            (1.423)
‘Parental Education‘high school           −4.803∗∗∗        −4.900∗∗∗          −5.814∗∗∗
                                          (1.297)          (1.272)            (1.229)
‘Parental Education‘master’s degree        2.888            4.205∗∗            5.183∗∗∗
                                          (1.938)          (1.900)            (1.836)
‘Parental Education‘some college          −0.583           −1.280             −0.920
                                          (1.247)          (1.223)            (1.181)
‘Parental Education‘some high school      −4.249∗∗∗        −4.049∗∗∗          −5.322∗∗∗
                                          (1.333)          (1.307)            (1.263)
lunchstandard                             10.877∗∗∗         7.246∗∗∗           8.203∗∗∗
                                          (0.873)          (0.856)            (0.827)
‘Test Preparation Course‘none             −5.49            −7.36             −10.06
                                          (0.876)          (0.859)            (0.830)
Constant                                  57.631∗∗∗        71.278∗∗∗          71.914∗∗∗
                                          (1.872)          (1.836)            (1.774)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
The Test Preparation Course estimates are the two-decimal values from row 13 of Tables 7 to 9.
In Table 6, Race/Ethnicity, Parental Level of Education, Lunch and Test Preparation Course are treated as
control variables in order to analyze the effect of Gender on exam scores. Male students performed better
than female students in Mathematics, scoring 4.995 points more on average. On the other hand, female
students did better in Reading and Writing: male students scored 7.071 and 9.096 points less in Reading
and Writing, respectively. The gender coefficients are statistically significant in all three subjects. The
results also show that students who completed the Test Preparation Course achieved better scores than
those who did not complete or did not take it. Students who received lunch at the standard price performed
better than those who received free/reduced lunch. Students whose parents' education level is high school
or some high school scored significantly lower than students whose parents hold an associate's degree.
The findings of Table 6 support all of the findings above.
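To illustrate how the Table 6 estimates combine, the fitted Math Score for one student profile can be computed by adding the relevant coefficients to the constant. The profile is chosen here purely as an example: Group A, parents with an associate's degree, standard lunch and a completed test preparation course, so that only the Gendermale and lunchstandard terms can be non-zero.

```latex
\begin{align*}
\widehat{\text{Math}}_{\text{male}}   &= 57.631 + 4.995 + 10.877 = 73.503 \\
\widehat{\text{Math}}_{\text{female}} &= 57.631 + 10.877 = 68.508 \\
\widehat{\text{Math}}_{\text{male}} - \widehat{\text{Math}}_{\text{female}} &= 4.995
\end{align*}
```

The gap between the two fitted values is the Gendermale coefficient, which is what holding the other variables fixed means in this model.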
2 Bootstrap
In this section, I re-estimate the regression coefficients of Table 6 using the bootstrap method. The R code
for the bootstrap is given in the Appendix. The bootstrap estimates are given in Table 7, Table 8 and
Table 9 for Math Score, Reading Score and Writing Score, respectively. The data are resampled 1000 times.
In each table, row 1 is the constant and rows 2 to 13 follow the coefficient order of Table 6. The bootstrap
regression coefficients are very close to the original regression coefficients given in Table 6.
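The resampling idea can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the Appendix code: the data frame dat, its columns y and x, and all simulated numbers are hypothetical, and a plain replicate() loop stands in for the boot package. The column names of the result mirror those of the bootstrap tables.

```r
# Simulated (hypothetical) data with one binary regressor
set.seed(42)
n <- 300
x <- rbinom(n, 1, 0.5)
y <- 60 + 5 * x + rnorm(n, sd = 15)
dat <- data.frame(y = y, x = x)

# Coefficients estimated on the original sample
original <- coef(lm(y ~ x, data = dat))

# Resample rows with replacement R times and refit the model each time
R <- 1000
boot_coefs <- t(replicate(R, {
  idx <- sample.int(n, n, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))
}))

# Bias = mean of bootstrap estimates minus the original estimate;
# SE = standard deviation of the bootstrap estimates
boot_bias <- colMeans(boot_coefs) - original
boot_se   <- apply(boot_coefs, 2, sd)
boot_med  <- apply(boot_coefs, 2, median)
print(round(cbind(original, bootBias = boot_bias,
                  bootSE = boot_se, bootMed = boot_med), 2))
```

A bias that is small relative to the bootstrap standard error is what "very close to the original coefficients" means in practice.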
Table 7: Bootstrap Statistics for Math Score
R original bootBias bootSE bootMed
1 1000.00 57.63 -0.06 1.85 57.57
2 1000.00 5.00 0.04 0.83 5.04
3 1000.00 2.04 -0.04 1.66 2.01
4 1000.00 2.47 0.05 1.50 2.57
5 1000.00 5.34 -0.01 1.51 5.37
6 1000.00 10.13 0.09 1.78 10.25
7 1000.00 1.97 0.06 1.50 2.04
8 1000.00 -4.80 0.00 1.32 -4.81
9 1000.00 2.89 0.00 1.81 2.84
10 1000.00 -0.58 -0.00 1.27 -0.58
11 1000.00 -4.25 0.05 1.42 -4.09
12 1000.00 10.88 0.01 0.90 10.91
13 1000.00 -5.49 -0.01 0.84 -5.52
Table 8: Bootstrap Statistics for Reading Score
R original bootBias bootSE bootMed
1 1000.00 71.28 0.02 1.89 71.28
2 1000.00 -7.07 -0.03 0.84 -7.08
3 1000.00 1.33 -0.01 1.74 1.30
4 1000.00 2.27 0.02 1.61 2.38
5 1000.00 4.11 0.03 1.65 4.16
6 1000.00 5.51 0.01 1.89 5.50
7 1000.00 2.16 -0.07 1.54 2.13
8 1000.00 -4.90 -0.01 1.22 -4.92
9 1000.00 4.21 0.01 1.86 4.23
10 1000.00 -1.28 -0.07 1.19 -1.31
11 1000.00 -4.05 0.05 1.41 -3.99
12 1000.00 7.25 -0.02 0.85 7.20
13 1000.00 -7.36 0.00 0.87 -7.36
Table 9: Bootstrap Statistics for Writing Score

R original bootBias bootSE bootMed
1 1000.00 71.91 -0.05 1.81 71.82
2 1000.00 -9.10 0.03 0.78 -9.04
3 1000.00 1.22 0.08 1.66 1.27
4 1000.00 2.41 0.06 1.56 2.51
5 1000.00 5.93 0.13 1.55 5.98
6 1000.00 5.14 0.02 1.78 5.13
7 1000.00 3.48 -0.04 1.37 3.47
8 1000.00 -5.81 0.02 1.23 -5.81
9 1000.00 5.18 -0.11 1.70 5.04
10 1000.00 -0.92 0.00 1.15 -0.90
11 1000.00 -5.32 -0.04 1.29 -5.39
12 1000.00 8.20 -0.00 0.85 8.20
13 1000.00 -10.06 -0.02 0.79 -10.10
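The bootstrap standard errors can also be turned into approximate confidence intervals. The calculation below uses rows 2 and 13 of Table 7 (Gendermale and Test Preparation Course, none, for Math Score) and assumes a normal approximation with the 1.96 critical value. That assumption is made here for illustration and is not part of the reported output.

```latex
\begin{align*}
\text{Gendermale:}\quad & 5.00 \pm 1.96 \times 0.83 = 5.00 \pm 1.63
  \;\Rightarrow\; [3.37,\ 6.63] \\
\text{Test Preparation Course (none):}\quad & -5.49 \pm 1.96 \times 0.84 = -5.49 \pm 1.65
  \;\Rightarrow\; [-7.14,\ -3.84]
\end{align*}
```

Neither interval contains zero, which agrees with the significance of these effects in the OLS results.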
References
1. https://www.kaggle.com/spscientist/students-performance-in-exams.
2. https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf.
3. https://github.com/kjhealy/latex-custom-kjh/blob/master/needs-listings/example.tex.
4. https://www.statmethods.net/advstats/bootstrapping.html.
5. https://www.datacamp.com/community/tutorials/linear-regression-R.
6. https://tex.stackexchange.com/questions/2832/how-can-i-have-two-tables-side-by-side.
7. https://tex.stackexchange.com/questions/297564/why-is-my-table-before-the-section-title.
Appendix
rm(list = ls())
library(AER)        # also loads car; its summary() method for boot objects is assumed below
library(boot)
library(sandwich)
library(readxl)
library(stargazer)
library(xtable)

# Load the data and attach it so that columns can be referenced by name
StudentsPerformance <- read_excel("F:/XIAMEN UNIVERSITY/COURSE WORK/4/Micro-Econometrics/Data/StudentsPerformance.xlsx")
attach(StudentsPerformance)
head(StudentsPerformance)

# Table 1: scores on gender
model1 <- lm(`Math Score` ~ Gender)
model2 <- lm(`Reading Score` ~ Gender)
model3 <- lm(`Writing Score` ~ Gender)
stargazer(model1, model2, model3, table.placement = "htbp!")
# The remaining tables can be produced the same way from models 4 to 18

# Table 2: gender with parental education as a control
model4 <- lm(`Math Score` ~ Gender + `Parental Education`)
model5 <- lm(`Reading Score` ~ Gender + `Parental Education`)
model6 <- lm(`Writing Score` ~ Gender + `Parental Education`)

# Table 3: scores on race/ethnicity
model7 <- lm(`Math Score` ~ `Race/Ethnicity`)
model8 <- lm(`Reading Score` ~ `Race/Ethnicity`)
model9 <- lm(`Writing Score` ~ `Race/Ethnicity`)

# Table 4: scores on test preparation course
model10 <- lm(`Math Score` ~ `Test Preparation Course`)
model11 <- lm(`Reading Score` ~ `Test Preparation Course`)
model12 <- lm(`Writing Score` ~ `Test Preparation Course`)

# Table 5: test preparation course with lunch as a control
model13 <- lm(`Math Score` ~ `Test Preparation Course` + lunch)
model14 <- lm(`Reading Score` ~ `Test Preparation Course` + lunch)
model15 <- lm(`Writing Score` ~ `Test Preparation Course` + lunch)

# Table 6: all regressors together
model16 <- lm(`Math Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch +
                `Test Preparation Course`)
model17 <- lm(`Reading Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch +
                `Test Preparation Course`)
model18 <- lm(`Writing Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch +
                `Test Preparation Course`)

# Bootstrap statistics: refit the Table 6 model on the resampled rows
beta1 <- function(data, index) {
  coef(lm(`Math Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch +
            `Test Preparation Course`, data = data, subset = index))
}
beta2 <- function(data, index) {
  coef(lm(`Reading Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch +
            `Test Preparation Course`, data = data, subset = index))
}
beta3 <- function(data, index) {
  coef(lm(`Writing Score` ~ Gender + `Race/Ethnicity` + `Parental Education` + lunch +
            `Test Preparation Course`, data = data, subset = index))
}

# Run each bootstrap with 1000 replications; results are stored under new names
# so that the statistic functions are not overwritten
boot1 <- boot(StudentsPerformance, beta1, R = 1000)
boot2 <- boot(StudentsPerformance, beta2, R = 1000)
boot3 <- boot(StudentsPerformance, beta3, R = 1000)

# Tables 7 to 9: summary() returns R, original, bootBias, bootSE and bootMed
xtable(summary(boot1))
xtable(summary(boot2))
xtable(summary(boot3))
