MAS202 Final Project
Analyzing the academic performance between two genders and the impact of the exams
on course grade.
Contents
Part I: Introduction and Methodology
Part II: Descriptive Statistics
  Central Tendency Measurements
  Box and whisker plot
  Measurement of variation
Part III: Inferential Statistics
  Problem 1: Construct Confidence Interval and Hypothesis Testing on the difference in means
  Problem 2: Construct Confidence Interval and Hypothesis Testing on the difference in proportions between two populations
  Problem 3: Regression Analysis and ANOVA analysis
Part IV: Conclusion
sex: Sex of the student as recorded on the university registration system: Man or
Woman.
exam1: Exam 1 grade.
exam2: Exam 2 grade.
exam3: Exam 3 grade.
course_grade: Overall course grade.
First five rows of the dataset:
semester  sex  exam1  exam2  exam3  course_grade
2000-1    Man  84.5   69.5   86.5   76.2564
2000-1    Man  80     74     67     75.3882
2000-1    Man  56     70     71.5   67.0564
2000-1    Man  64     61     67.5   63.4538
2000-1    Man  90.5   72.5   75     72.3949

        exam1  exam2  exam3  course_grade
Median  82     74     78     72.5267
Figure 1: Measurement of Central Tendency of the scores of exam1, exam2, exam3 and
course_grade
At first glance, comparing the average scores, students score highest on exam1 and lowest on
the overall course grade.
Figure 4: Box-and-whisker plot of Exam3 variable
As the preceding box-and-whisker plots show, the distributions of these variables are fairly
symmetric.
Figure 6: Box-and-whisker plot of 4 variables according to male and female students
As Figure 6 shows, the range of exam scores for male students is narrower than for female
students, which indicates that female students' exam scores vary more than male students'.
Measurement of variation:
                          exam1        exam2        exam3        course_grade
Standard Deviation        12.2460602   13.77746829  14.7067911   9.807053053
Sample Variance           149.9659904  189.8186325  216.2897045  96.17828958
Range                     99.3         61.5         70.8889      54.2934
IQR                       82           74           78           72.5267
Coefficient of Variation  0.152277155  0.189757707  0.19484461   0.135758745
The variation measurements describe the spread and consistency of the data. exam3 has the
largest standard deviation (14.71) and exam1 the widest range (99.3), while course_grade has
the smallest standard deviation (9.81) and the narrowest range (54.29).
The coefficient of variation, which expresses the standard deviation relative to the mean,
ranks the variables the same way as the standard deviation: course_grade is the most
consistent (0.136) and exam3 the most variable (0.195).
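
The measures in the table can be reproduced outside Excel. The following is a minimal Python sketch, not the procedure used in this report; the helper name variation_summary is hypothetical, and the input is only the five exam1 values listed in the first rows of the dataset, so its output will not match the full-sample figures above. Python's default quartile method can also differ slightly from Excel's.

```python
import statistics

def variation_summary(scores):
    """Return the spread measures from the variation table for one list of scores."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)                   # sample standard deviation (n - 1)
    q1, _, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
    return {
        "std_dev": sd,
        "variance": statistics.variance(scores),    # sample variance (n - 1)
        "range": max(scores) - min(scores),
        "iqr": q3 - q1,
        "cv": sd / mean,                            # coefficient of variation
    }

# exam1 values from the first five rows of the dataset (illustration only)
exam1_sample = [84.5, 80, 56, 64, 90.5]
summary = variation_summary(exam1_sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```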
To answer this question, we perform a t-test for independent samples using the Data Analysis
tool in Excel.
State the null and alternative hypothesis:
Null Hypothesis (H0): There is no significant difference in course grades between male and
female students.
Alternative Hypothesis (H1): There is a significant difference in course grades between male and
female students.
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
The p-value (0.1384) is greater than the significance level (0.05). Since the p-value is greater
than α, we fail to reject the null hypothesis. Therefore, there is not enough evidence to
conclude that there is a significant difference in course grades between male and female
students.
Furthermore, the t-statistic (-1.48) lies between the two-tailed critical values (-1.97, 1.97),
so by the critical-value approach we also fail to reject the null hypothesis.
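
The same test can be sketched in Python. This is an illustration under stated assumptions, not the Excel procedure itself: the report does not say whether the equal-variance or unequal-variance t-test was chosen, so the sketch assumes the pooled (equal-variance) form. The function names are hypothetical, the men's grades are rounded from the first five rows of the dataset, and the women's grades are made up for the example.

```python
import math
import statistics

def pooled_t_statistic(sample1, sample2):
    """Two-sample t statistic assuming equal population variances (assumption)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, n1 + n2 - 2

def two_tailed_decision(t_stat, t_crit):
    """Reject H0 only when the statistic falls outside (-t_crit, t_crit)."""
    return "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"

# Illustration data only: men's grades rounded from the first five rows,
# women's grades hypothetical
men = [76.3, 75.4, 67.1, 63.5, 72.4]
women = [78.0, 70.5, 81.2, 69.9, 74.6]
t_stat, df = pooled_t_statistic(men, women)
print(f"t = {t_stat:.3f}, df = {df}")

# Decision rule applied to the values reported in this section
print(two_tailed_decision(-1.48, 1.97))
```

The last line applies the decision rule to the reported t-statistic and critical value, which gives the same conclusion as the p-value approach.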
Construct Confidence Interval for the difference in means between the final course grades of men
and women:
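So, at the 5% significance level (95% confidence), the confidence interval for the difference
between the mean final course grades of men and women is (-5.568, 0.7406). Because this interval
contains 0, it agrees with the decision not to reject the null hypothesis.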
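
For reference, an interval of this kind has the general form below. The symbols are standard notation rather than taken from the Excel output, and the pooled standard error shown is an assumption, since the report does not state which variance option was used.

```latex
(\bar{x}_1 - \bar{x}_2) \;\pm\; t_{\alpha/2,\,df}\,
\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad df = n_1 + n_2 - 2
```

Since the interval is symmetric about the point estimate, the midpoint of (-5.568, 0.7406), about -2.41, is the observed difference in sample means.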
$H_1: p_1 - p_2 \neq 0$
We compare the calculated test statistic Z0 (0.77) with the two-tailed critical values at the 5%
significance level (-1.96 and 1.96). The z-statistic lies between the lower and upper critical
values, so we fail to reject the null hypothesis. At the 5% significance level, there is not
enough evidence to conclude that the proportions of male and female students with course
grades greater than 80 differ.
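
The z-test for two proportions can be sketched as follows. This is a hedged illustration, not the report's worksheet: the function name two_proportion_z is hypothetical, and the counts are made up because the numbers of students with grades above 80 are not given in this section. NormalDist().inv_cdf(1 - alpha / 2) returns the two-tailed critical value for a chosen significance level.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value

# Hypothetical counts: 30 of 100 men and 25 of 100 women above 80
z0 = two_proportion_z(30, 100, 25, 100)
decision = "reject H0" if abs(z0) > z_crit else "fail to reject H0"
print(f"z0 = {z0:.3f}, critical value = {z_crit:.3f}, {decision}")
```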
Construct Confidence Interval for the difference between the proportion of male students with an A
score and the proportion of female students with an A score:
So, the 95% confidence interval for the difference in proportions of male and female students
with course grades greater than 80 is approximately (-0.285, -0.032). With 95% confidence, the
true difference in proportions falls within this interval, which suggests that the proportion of
male students with course grades greater than 80 is lower than that of female students.
Because this interval excludes 0, it does not agree with the hypothesis test above, which failed
to reject the null hypothesis (Z0 = 0.77); one of the two calculations should be rechecked.
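
For reference, the usual large-sample interval for a difference in proportions is shown below. The notation is standard and assumed here, not copied from the report's worksheet; unlike the test statistic, the interval uses the unpooled standard error.

```latex
(\hat{p}_1 - \hat{p}_2) \;\pm\; z_{\alpha/2}
\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}},
\qquad z_{0.025} \approx 1.96
```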
Regression Equation:
$\hat{y} = 0.24331\,x_{exam1} + 0.35064\,x_{exam2} + 0.26609\,x_{exam3} + 7.14429$, where $\hat{y}$ is the predicted final course grade.
With this regression equation, we can predict the final course grade of a student by knowing
his/her grades of exam1, exam2 and exam3.
For example, given a student whose grades on exam1, exam2 and exam3 are 70, 85 and 75
respectively, what is the predicted final course grade?
Answer: The predicted final course grade is:
$\hat{y} = 0.24331(70) + 0.35064(85) + 0.26609(75) + 7.14429 \approx 73.94$
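
The prediction can also be computed programmatically. The sketch below hard-codes the coefficients from the regression equation above; the function name predict_course_grade is hypothetical.

```python
# Coefficients and intercept taken from the regression equation in this report
COEFFICIENTS = {"exam1": 0.24331, "exam2": 0.35064, "exam3": 0.26609}
INTERCEPT = 7.14429

def predict_course_grade(exam1, exam2, exam3):
    """Predicted final course grade from the three exam scores."""
    scores = {"exam1": exam1, "exam2": exam2, "exam3": exam3}
    return INTERCEPT + sum(COEFFICIENTS[name] * scores[name] for name in COEFFICIENTS)

predicted = predict_course_grade(70, 85, 75)
print(round(predicted, 2))  # 73.94
```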
Therefore, with these figures, we can see that the 'exam2' variable, which has the largest
coefficient (0.35064), has the greatest influence on changes in the overall course grade.
The coefficient of determination:
With the output table, we can see that R-squared is 0.8084, which indicates that approximately
80.84% of the variability in the course grade can be explained by the independent variables
(exam1, exam2, and exam3). This is a relatively high R-squared value, indicating that the model
does a good job of explaining the variance in course grades.
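
To make the definition concrete, R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is illustrative only: the function name r_squared is hypothetical, and the actual and predicted grades are made-up values, not figures from the dataset.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical actual and model-predicted course grades
actual_grades = [76.0, 75.0, 67.0, 63.0, 72.0]
predicted_grades = [77.5, 73.0, 66.0, 65.0, 71.5]
print(f"R-squared = {r_squared(actual_grades, predicted_grades):.4f}")
```

A value of 1 means the predictions match the observations exactly, while a value of 0 means the model does no better than always predicting the mean.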
Hypothesis Testing for Slopes and Intercept:
Determine whether the slopes (coefficients for exam1, exam2, and exam3) and the intercept are
statistically significant.
Stating the null and alternative hypotheses for each coefficient $\beta_j$:
$H_0: \beta_j = 0$ (the variable has no linear relationship with the final course grade)
$H_1: \beta_j \neq 0$ (the variable has a linear relationship with the final course grade)
The p-values associated with each coefficient are very small (close to zero), which indicates that
all three coefficients are statistically significant.
The intercept also has a statistically significant p-value, which means it is significantly different
from zero. The intercept represents the expected course grade when all three exam scores are
zero, a case that may not be meaningful in this context, so it should be interpreted with
caution.
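
For reference, the p-value for each coefficient comes from a t statistic of the standard form below. The notation is assumed rather than taken from the Excel output: $b_j$ is the estimated coefficient, $SE(b_j)$ its standard error, $n$ the number of students, and $k = 3$ the number of predictors.

```latex
t_j = \frac{b_j - 0}{SE(b_j)}, \qquad df = n - k - 1
```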
Our analysis revealed several important findings: