Regression Analysis of Student Performance in High School Examination (Evidence From USA)
The purpose of this study is to analyze the marks secured by high school students from
the United States using simple linear regression. The data were acquired from Kaggle https://
as Gender, Race/Ethnicity, Parental Level of Education, Lunch, Test Preparation Course, Math Score,
Reading Score and Writing Score. Here, Gender, Race/Ethnicity, Parental Level of Education, Lunch and
Test Preparation Course are categorical variables and are treated as independent variables. Math Score,
Reading Score and Writing Score are treated as dependent variables. For data analysis I have used
1 Regression Analysis
In this section, I analyze the effect of gender on the scores obtained by the students. For this purpose I have
regressed the marks obtained in mathematics, reading and writing on the gender variable (male or female). The Ordinary
Table 1:
Dependent variable:
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
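To make the reading of a gender coefficient concrete, here is a minimal R sketch on simulated data (the numbers are made up and are not the Kaggle data; the names `gender`, `math` and `fit` are illustrative only). It shows that, with a single binary regressor, the OLS slope is simply the difference between the two group means, measured in score points.

```r
# Simulated stand-in for the real data; every value here is an assumption.
set.seed(1)
n <- 200
gender <- factor(sample(c("female", "male"), n, replace = TRUE))
math <- 64 + 5 * (gender == "male") + rnorm(n, sd = 15)

# "female" is the reference level (alphabetical order), so the
# coefficient is reported as "gendermale".
fit <- lm(math ~ gender)

# Slope = mean(male) - mean(female); intercept = mean(female).
slope <- as.numeric(coef(fit)["gendermale"])
mean_gap <- mean(math[gender == "male"]) - mean(math[gender == "female"])
print(c(slope = slope, mean_gap = mean_gap))
```

The two printed values agree, which is why such a coefficient is read as a gap in points between the groups rather than as a ratio.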
The results in Table 1 show that male students scored 5.095 points more in mathematics
than female students on average, but 7.135 and 9.156 points less than female students in
reading and writing respectively. The gender effect on scores is
statistically significant for all subjects. Table 2 contains the results of the gender effect on marks after
Table 2:
Dependent variable:
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
The impact of gender on marks remains statistically significant after including
parental level of education. Male students scored higher in mathematics on average than female students,
and lower in reading and writing. It is also observed
that students whose parents have master's and bachelor's degrees scored higher than those whose parents
have associate's degrees. The results show that students whose parents have college and high school education
This section discusses the impact of Race/Ethnicity on students' marks. The students
are divided into five groups: A, B, C, D and E. For this purpose I have regressed math score, reading score and
Table 3:
Dependent variable:
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
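The comparison against Group A follows from how R codes a categorical regressor. The sketch below uses simulated data (made-up values; the level labels `"group A"` to `"group E"` and the variable names are assumptions for illustration) to show that a five-level factor enters the regression as four dummy variables, each measuring the gap in points relative to the reference group.

```r
# Simulated stand-in for the real data; every value here is an assumption.
set.seed(2)
groups <- c("group A", "group B", "group C", "group D", "group E")
race <- factor(sample(groups, 300, replace = TRUE), levels = groups)
score <- 60 + 2 * (as.integer(race) - 1) + rnorm(300, sd = 10)

# The first factor level ("group A") is the omitted reference category.
fit <- lm(score ~ race)
print(names(coef(fit)))

# Each dummy coefficient equals that group's mean minus the Group A mean.
gaps <- sapply(groups[-1], function(g) {
  mean(score[race == g]) - mean(score[race == "group A"])
})
print(round(cbind(coefficient = as.numeric(coef(fit)[-1]),
                  mean_gap = as.numeric(gaps)), 3))
```

A different baseline can be chosen with `relevel(race, ref = "group E")`, which changes what each coefficient is compared against but not the fitted values.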
Table 3 shows that students from Group B, Group C and Group D
performed better than students from Group A. Students from Group E secured better marks
than all other groups (Group A, Group B, Group C and Group D). The results for Group C, Group D and
The objective of this section is to analyze the impact of the Test Preparation Course on the scores obtained by
students. Here, students are divided into two groups: those who completed the preparation course
and those who did not. The OLS results are given below.
Table 4:
Dependent variable:
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
Table 4 shows that students who did not complete the preparation course scored
lower than students who completed it. On average they scored 5.618, 7.630 and 9.914
points less in Math, Reading and Writing respectively than those who completed the
test preparation course. The coefficients of Test Preparation Course (None) are statistically significant for all
subjects. In Table 5 I have included Lunch as a control variable and checked the impact of the test preparation course
on students' scores. Here Lunch is a categorical variable with two groups: one for
those students who received free/reduced Lunch and a second for those students who received Lunch
Table 5:
Dependent variable:
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
The results for Test Preparation Course (None) in Table 5 are the same as those observed in Table 4.
Table 5 also reveals that students who got Lunch at the standard price performed better than those
In this section I have included all the independent variables (Gender, Parental Level of Education, Race/Ethnicity,
Lunch and Test Preparation Course) in a single regression for each dependent variable (Math Score,
Reading Score and Writing Score). The OLS results are given in Table 6.
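For reference, the full specification can be written out as below. This is a sketch rather than the author's own notation: the symbols are introduced here for illustration, and the count of five parental-education dummies assumes the six education levels found in the Kaggle file. Under that assumption the model has 13 coefficients, which matches the 13 rows reported in the bootstrap tables.

```latex
\[
\text{Score}_i = \beta_0
  + \beta_1\,\text{Male}_i
  + \sum_{g \in \{B,\,C,\,D,\,E\}} \gamma_g\,\text{Race}_{g,i}
  + \sum_{k=1}^{5} \delta_k\,\text{ParentEdu}_{k,i}
  + \theta\,\text{LunchStandard}_i
  + \lambda\,\text{TestPrepNone}_i
  + \varepsilon_i
\]
```

Each right-hand-side variable is a 0/1 dummy, so every coefficient is the expected difference in score points relative to the omitted category of that variable, holding the other variables fixed.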
Table 6:
                       Dependent variable:
              Math Score    Reading Score    Writing Score
                 (1)             (2)              (3)
Gendermale     4.995∗∗∗       −7.071∗∗∗        −9.096∗∗∗
               (0.839)         (0.823)          (0.795)
Note: ∗p<0.1; ∗∗p<0.05; ∗∗∗p<0.01
In Table 6, Race/Ethnicity, Parental Level of Education, Lunch and Test Preparation Course are treated
as control variables, and the effect of Gender on exam scores is analyzed. Male
students performed better than female students in Mathematics, scoring 4.995 points higher
on average. On the other hand, female students scored better in Reading and Writing:
male students scored 7.071 and 9.096 points lower in Reading and Writing respectively. The
gender coefficients are statistically significant at the 1% level. The results show that students who finished their
Test Preparation Course achieved better scores than those who did not finish or did not take the
course. Students who received the standard lunch also performed better than those who
did not. Students whose parents' education level is high school attained significantly
lower scores than students whose parents' education level is above high school. Findings of Table 6
2 Bootstrap
In this section, I estimate the regression coefficients given in Table 6 using the bootstrap method. The R code
for the bootstrap is given in the appendix. The bootstrap estimates are given in Table 7, Table 8 and Table 9 for
Math Score, Reading Score and Writing Score respectively. The data are resampled 1000 times. The bootstrap
regression coefficients are very close to the original regression coefficients given in Table 6.
Table 7: Bootstrap Statistics for Math Score
          R  original  bootBias  bootSE  bootMed
 1  1000.00     57.63     -0.06    1.85    57.57
 2  1000.00      5.00      0.04    0.83     5.04
 3  1000.00      2.04     -0.04    1.66     2.01
 4  1000.00      2.47      0.05    1.50     2.57
 5  1000.00      5.34     -0.01    1.51     5.37
 6  1000.00     10.13      0.09    1.78    10.25
 7  1000.00      1.97      0.06    1.50     2.04
 8  1000.00     -4.80      0.00    1.32    -4.81
 9  1000.00      2.89      0.00    1.81     2.84
10  1000.00     -0.58     -0.00    1.27    -0.58
11  1000.00     -4.25      0.05    1.42    -4.09
12  1000.00     10.88      0.01    0.90    10.91
13  1000.00     -5.49     -0.01    0.84    -5.52

Table 8: Bootstrap Statistics for Reading Score
          R  original  bootBias  bootSE  bootMed
 1  1000.00     71.28      0.02    1.89    71.28
 2  1000.00     -7.07     -0.03    0.84    -7.08
 3  1000.00      1.33     -0.01    1.74     1.30
 4  1000.00      2.27      0.02    1.61     2.38
 5  1000.00      4.11      0.03    1.65     4.16
 6  1000.00      5.51      0.01    1.89     5.50
 7  1000.00      2.16     -0.07    1.54     2.13
 8  1000.00     -4.90     -0.01    1.22    -4.92
 9  1000.00      4.21      0.01    1.86     4.23
10  1000.00     -1.28     -0.07    1.19    -1.31
11  1000.00     -4.05      0.05    1.41    -3.99
12  1000.00      7.25     -0.02    0.85     7.20
13  1000.00     -7.36      0.00    0.87    -7.36
References
1. https://www.kaggle.com/spscientist/students-performance-in-exams.
2. https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf.
3. https://github.com/kjhealy/latex-custom-kjh/blob/master/needs-listings/example.tex.
4. https://www.statmethods.net/advstats/bootstrapping.html.
5. https://www.datacamp.com/community/tutorials/linear-regression-R.
6. https://tex.stackexchange.com/questions/2832/how-can-i-have-two-tables-side-by-side.
7. https://tex.stackexchange.com/questions/297564/why-is-my-table-before-the-section-title.
Appendix
rm(list = ls())
library(AER)
library(boot)
library(sandwich)
library(readxl)
library(stargazer)
library(xtable)

# Load the data and make its columns available by name.
StudentsPerformance <- read_excel("F:/XIAMEN UNIVERSITY/COURSE WORK/4/Micro-Econometrics/Data/StudentsPerformance.xlsx")
attach(StudentsPerformance)
head(StudentsPerformance)

# Gender only
model1 = lm(`Math Score` ~ Gender)
model2 = lm(`Reading Score` ~ Gender)
model3 = lm(`Writing Score` ~ Gender)
stargazer(model1, model2, model3, table.placement = "htbp!")

# Gender with parental education as a control
model4 = lm(`Math Score` ~ Gender + `Parental Education`)
model5 = lm(`Reading Score` ~ Gender + `Parental Education`)
model6 = lm(`Writing Score` ~ Gender + `Parental Education`)
stargazer(model4, model5, model6, table.placement = "htbp!")

# Race/Ethnicity only
model7 = lm(`Math Score` ~ `Race/Ethnicity`)
model8 = lm(`Reading Score` ~ `Race/Ethnicity`)
model9 = lm(`Writing Score` ~ `Race/Ethnicity`)
stargazer(model7, model8, model9, table.placement = "htbp!")

# Test preparation course only
model10 = lm(`Math Score` ~ `Test Preparation Course`)
model11 = lm(`Reading Score` ~ `Test Preparation Course`)
model12 = lm(`Writing Score` ~ `Test Preparation Course`)
stargazer(model10, model11, model12, table.placement = "htbp!")

# Test preparation course with lunch as a control
model13 = lm(`Math Score` ~ `Test Preparation Course` + lunch)
model14 = lm(`Reading Score` ~ `Test Preparation Course` + lunch)
model15 = lm(`Writing Score` ~ `Test Preparation Course` + lunch)
stargazer(model13, model14, model15, table.placement = "htbp!")

# All independent variables together
model16 = lm(`Math Score` ~ Gender + `Race/Ethnicity` + `Parental Education` +
               lunch + `Test Preparation Course`)
model17 = lm(`Reading Score` ~ Gender + `Race/Ethnicity` + `Parental Education` +
               lunch + `Test Preparation Course`)
model18 = lm(`Writing Score` ~ Gender + `Race/Ethnicity` + `Parental Education` +
               lunch + `Test Preparation Course`)
stargazer(model16, model17, model18, table.placement = "htbp!")

# Bootstrap: refit the full model on each resample (rows selected by index)
# and return the coefficients; 1000 replicates per dependent variable.
beta1 <- function(data, index) {
  coef(lm(`Math Score` ~ Gender + `Race/Ethnicity` + `Parental Education` +
            lunch + `Test Preparation Course`,
          data = StudentsPerformance, subset = index))
}
beta1 = boot(StudentsPerformance, beta1, R = 1000)

beta2 <- function(data, index) {
  coef(lm(`Reading Score` ~ Gender + `Race/Ethnicity` + `Parental Education` +
            lunch + `Test Preparation Course`,
          data = StudentsPerformance, subset = index))
}
beta2 = boot(StudentsPerformance, beta2, R = 1000)

beta3 <- function(data, index) {
  coef(lm(`Writing Score` ~ Gender + `Race/Ethnicity` + `Parental Education` +
            lunch + `Test Preparation Course`,
          data = StudentsPerformance, subset = index))
}
beta3 = boot(StudentsPerformance, beta3, R = 1000)

# Print the bootstrap results and export them as LaTeX tables.
beta1 = print.AsIs(beta1)
beta2 = print.AsIs(beta2)
beta3 = print.AsIs(beta3)
xtable(beta1)
xtable(beta2)
xtable(beta3)