MAS202 Final Project

Analyzing the academic performance between two genders and the impact of the exams
on course grade.
Contents
Part I: Introduction and Methodology
Part II: Descriptive Statistics
    Central Tendency Measurements
    Box and whisker plot
    Measurement of variation
Part III: Inferential Statistics
    Problem 1: Construct Confidence Interval and Hypothesis Testing on the difference in means
    Problem 2: Construct Confidence Interval and Hypothesis Testing on the difference in proportions between two populations
    Problem 3: Regression Analysis and ANOVA analysis
Part IV: Conclusion
Part I: Introduction and Methodology:

Main topic of group project: Analyzing the academic performance between two genders and
the impact of the exams on course grade.
In the realm of education, understanding the factors that influence academic performance is a
subject of perennial interest. One critical facet of this inquiry is the examination of gender-
based disparities in academic achievement and the extent to which various assessments
contribute to a student's overall course grade. This study delves into the dynamic interplay
between gender and academic success, seeking to shed light on the underlying determinants of
performance in an educational setting.
Dataset:
Dataset Description: Grades on three exams and overall course grade for 233 students during
several years for a statistics course at a university.
Dataset Link: https://www.openintro.org/data/index.php?data=exam_grades&fbclid=IwAR3uMdku-RVqMOwWa1W7BbjYJBZzVMKXQ-9pUphVkZSb_tvgMrQ0MmgbJ_w
Format of the dataset:
A data frame with 233 observations, each representing a student.
• semester: Semester when grades were recorded.
• sex: Sex of the student as recorded on the university registration system: Man or Woman.
• exam1: Exam 1 grade.
• exam2: Exam 2 grade.
• exam3: Exam 3 grade.
• course_grade: Overall course grade.
First five rows of the dataset:
semester  sex  exam1  exam2  exam3  course_grade
2000-1    Man  84.5   69.5   86.5   76.2564
2000-1    Man  80     74     67     75.3882
2000-1    Man  56     70     71.5   67.0564
2000-1    Man  64     61     67.5   63.4538
2000-1    Man  90.5   72.5   75     72.3949
Main issues to address:

The main issues addressed by this dataset are assessing the academic performance of students,
analyzing the impact of exams on course grades, and potentially exploring gender-related
differences in academic performance.
Questions raised:
Which exam (exam1, exam2 or exam3) has the most influence on the ‘course_grade’?
Is there any difference in the academic performance between men and women?
Identify continuous variables (independent and dependent)
In this dataset, the continuous variables are 'exam1,' 'exam2,' and 'exam3' as independent
variables, and 'course_grade' as the dependent variable.
Part II: Descriptive Statistics:

Central Tendency Measurements:
         exam1        exam2       exam3        course_grade
Mean     80.41954936  72.6055794  75.47958927  72.2388309
Median   82           74          78           72.5267
Mode     80           83.5        76           #N/A
Figure 1: Measurement of Central Tendency of the scores of exam1, exam2, exam3 and
course_grade
Comparing the means, students scored highest on exam1 (80.42). Of the three exams, exam2 has the lowest mean (72.61), and the overall course grade has the lowest mean of the four variables (72.24).
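As a minimal sketch of how these measures are computed, the snippet below applies Python's standard `statistics` module to the exam1 scores of the first five rows shown in Part I. It uses only that five-row sample, so its results are not the full-dataset figures in the table above; the variable names are illustrative.

```python
import statistics

# exam1 scores from the first five rows of the dataset (a small sample,
# not the full 233 observations summarised in the table)
exam1_sample = [84.5, 80, 56, 64, 90.5]

sample_mean = statistics.mean(exam1_sample)
sample_median = statistics.median(exam1_sample)

# multimode returns every most-frequent value; when no value repeats,
# it returns all of them, which is the situation Excel reports as #N/A
sample_modes = statistics.multimode(exam1_sample)

print(sample_mean, sample_median, sample_modes)
```

When every value is distinct, as in this sample, there is no single mode. This is the same situation as the #N/A reported for course_grade.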
Box and whisker plot:
Figure 2: Box-and-whisker plot of Exam1 variable
As the box-and-whisker plots of the individual variables show, the distributions of these variables are fairly balanced.
Figure 6: Box-and-whisker plot of 4 variables according to male and female students
As Figure 6 shows, the range of exam scores is narrower for men than for women, indicating that women's exam scores vary more than men's.
Measurement of variation:
                          exam1        exam2        exam3        course_grade
Standard Deviation        12.2460602   13.77746829  14.7067911   9.807053053
Sample Variance           149.9659904  189.8186325  216.2897045  96.17828958
Range                     99.3         61.5         70.8889      54.2934
IQR                       82           74           78           72.5267
Coefficient of Variation  0.152277155  0.189757707  0.19484461   0.135758745
The variation measurements describe the spread and consistency of the data. The variables exam3 and exam2 have the highest standard deviations (14.71 and 13.78), indicating greater variability, while exam1 has the widest range (99.3) despite a smaller standard deviation (12.25). The course_grade variable has the lowest standard deviation (9.81) and the narrowest range (54.29).
The coefficient of variation, the standard deviation divided by the mean, measures the relative variability within each variable. By this measure course_grade is the most consistent (0.136) and exam3 the least (0.195).
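The coefficient of variation can be reproduced from the means and standard deviations reported in the two tables. The sketch below does this with plain Python; the dictionary names are illustrative, and the inputs are the reported summary values rather than the raw data.

```python
# Means and standard deviations as reported in the descriptive tables
means = {
    "exam1": 80.41954936,
    "exam2": 72.6055794,
    "exam3": 75.47958927,
    "course_grade": 72.2388309,
}
sds = {
    "exam1": 12.2460602,
    "exam2": 13.77746829,
    "exam3": 14.7067911,
    "course_grade": 9.807053053,
}

# Coefficient of variation = standard deviation / mean
cv = {name: sds[name] / means[name] for name in means}

# The variable with the smallest CV is the most consistent one
most_consistent = min(cv, key=cv.get)

for name, value in cv.items():
    print(f"{name}: {value:.4f}")
print("most consistent:", most_consistent)
```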
Part III: Inferential Statistics:

In this part, our group addresses the questions raised in the introduction.
Problem 1: Construct Confidence Interval and Hypothesis Testing on the difference in means.
Question: At the 5% significance level, is there a significant difference in course grades between male and female students?
To answer this question, we will perform a t-test for independent samples using the Data Analysis tool in Excel.
State the null and alternative hypothesis:
Null Hypothesis (H0): There is no significant difference in course grades between male and
female students.
Alternative Hypothesis (H1): There is a significant difference in course grades between male and
female students.
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
Note: index 1 is for men and index 2 is for women.
The p-value (0.1384) is greater than the significance level α = 0.05, so we fail to reject the null hypothesis. There is not enough evidence to conclude that course grades differ significantly between male and female students.
Equivalently, the t-statistic (-1.48) lies between the two-tailed critical values (-1.97, 1.97), so we again fail to reject the null hypothesis.
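The two decision rules used here, the p-value approach and the critical-value approach, can be written as a short sketch. The function names are hypothetical helpers introduced for illustration; the inputs are the values reported in the Excel output.

```python
def p_value_decision(p_value, alpha=0.05):
    """Reject H0 only when the p-value is smaller than alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


def critical_value_decision(t_stat, t_crit):
    """Two-tailed test: reject H0 only when |t| exceeds the critical value."""
    return "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"


# Values reported for the test on course grades (men vs. women)
by_p_value = p_value_decision(0.1384, alpha=0.05)
by_critical = critical_value_decision(-1.48, 1.97)

print(by_p_value)
print(by_critical)
```

Both rules describe the same test, so they should always give the same decision.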
Construct Confidence Interval for the difference in means between final course grades of men and women:
So, at the 95% confidence level (5% significance level), the Confidence Interval for the difference between the final course grades of men and women is (-5.568, 0.7406). The interval contains 0, which is consistent with failing to reject the null hypothesis.
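As a small check on the reported interval, the sketch below recovers the point estimate (the midpoint) and the margin of error (the half-width) from the two endpoints, and tests whether 0 lies inside. The midpoint and margin are derived here from the reported endpoints and are not taken from the Excel output.

```python
# Reported 95% confidence interval for mu1 - mu2 (men minus women)
lower, upper = -5.568, 0.7406

# A symmetric interval is "point estimate +/- margin of error"
point_estimate = (lower + upper) / 2
margin_of_error = (upper - lower) / 2

# If 0 is inside the interval, a two-tailed test at the matching
# significance level fails to reject H0: mu1 - mu2 = 0
contains_zero = lower <= 0 <= upper

print(round(point_estimate, 4), round(margin_of_error, 4), contains_zero)
```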
Problem 2: Construct Confidence Interval and Hypothesis Testing on the difference in proportions between two populations.
Question: Assume that a student with a final course grade above 80 gets an “A score”. At the 5% significance level, is there a difference between the proportion of male students with an A score and the proportion of female students with an A score?
State the null and alternative hypothesis:
Null Hypothesis (H0): There is no significant difference between the proportion of male students having an A score and that of female students having an A score.
Alternative Hypothesis (H1): There is a significant difference between the proportion of male students having an A score and that of female students having an A score.
$H_0: p_1 - p_2 = 0$
$H_1: p_1 - p_2 \neq 0$
Note: index 1 is for men and index 2 is for women.
We compare the calculated test statistic Z0 (0.77) with the two-tailed critical values at the 5% significance level (-1.96 and 1.96). The z-statistic lies between the critical values, so we fail to reject the null hypothesis. So, at a 5% significance level, there is not enough evidence to suggest that there is a significant difference in the proportion of male and female students with course grades greater than 80.
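For reference, a two-proportion z-test with a pooled standard error can be sketched as follows. The counts in the usage lines are hypothetical placeholders, not the counts from this dataset (which are not listed in the report), and the function name is illustrative. The comparison uses 1.96, the two-tailed critical value at the 5% level.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


# Hypothetical counts for illustration only:
# 20 of 100 in group 1 and 30 of 100 in group 2 exceed the threshold
z = two_proportion_z(20, 100, 30, 100)
reject = abs(z) > 1.96  # two-tailed test at the 5% significance level

print(round(z, 4), reject)
```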
Construct Confidence Interval for the difference between the proportion of male students having an A score and that of female students having an A score:
So, the 95% confidence interval for the difference in proportions of male and female students with course grades greater than 80 is approximately (-0.285, -0.032). Taken on its own, this interval lies entirely below zero, which would suggest that the proportion of male students with course grades greater than 80 is lower than that of female students. This does not agree with the hypothesis test above, which found no significant difference, so the two results should be reconciled before drawing a conclusion.
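A confidence interval for a difference in proportions is usually built from the unpooled standard error. The sketch below shows that calculation; the counts are hypothetical placeholders rather than this dataset's counts, and the function name is illustrative.

```python
import math


def diff_proportion_ci(x1, n1, x2, n2, z_crit=1.96):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical counts for illustration only (z_crit=1.96 gives a 95% interval)
ci_low, ci_high = diff_proportion_ci(20, 100, 30, 100)
contains_zero = ci_low <= 0 <= ci_high

print(round(ci_low, 4), round(ci_high, 4), contains_zero)
```

An interval that contains 0 corresponds to failing to reject the hypothesis of equal proportions in a two-tailed test at the matching level, so the interval and the test are expected to point the same way in most cases.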
Problem 3: Regression Analysis and ANOVA analysis.

Here, we examine the relationships between the independent variables (exam1, exam2, exam3) and the dependent variable (course_grade) to see which exam most influences the final course grade of a student in the statistics course.
In this part, our group used the Regression tool in Excel's Data Analysis tools to analyze these relationships.
Setting Variables:
X1: ‘exam1’ variable
X2: ‘exam2’ variable
X3: ‘exam3’ variable
Y: ‘course_grade’ variable
Output of ANOVA:
Regression Equation:
$\hat{y} = 0.24331\,x_{exam1} + 0.35064\,x_{exam2} + 0.26609\,x_{exam3} + 7.14429$, where $\hat{y}$ is the predicted final course grade.
With this regression equation, we can predict the final course grade of a student from his/her grades on exam1, exam2 and exam3.
For example, given a student whose grades on exam1, exam2 and exam3 are 70, 80 and 75 respectively, what is the predicted final course grade?
Answer: The predicted final course grade is:
$\hat{y} = 0.24331 \cdot 70 + 0.35064 \cdot 80 + 0.26609 \cdot 75 + 7.14429 \approx 72.18$
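The prediction can be checked with a short sketch that evaluates the fitted equation. The function name is a hypothetical helper; the coefficients are the rounded values from the regression equation. With exam scores of 70, 80 and 75, it gives a prediction of about 72.18, matching the reported figure up to rounding of the coefficients.

```python
# Rounded coefficients from the regression equation
INTERCEPT = 7.14429
COEFFICIENTS = {"exam1": 0.24331, "exam2": 0.35064, "exam3": 0.26609}


def predict_course_grade(exam1, exam2, exam3):
    """Evaluate the fitted linear equation for one student."""
    return (
        INTERCEPT
        + COEFFICIENTS["exam1"] * exam1
        + COEFFICIENTS["exam2"] * exam2
        + COEFFICIENTS["exam3"] * exam3
    )


predicted = predict_course_grade(70, 80, 75)

# The exam with the largest coefficient moves the prediction most per point
largest = max(COEFFICIENTS, key=COEFFICIENTS.get)

print(round(predicted, 4), largest)
```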
Regression coefficients of the significant independent variables:
The regression coefficients for these significant independent variables are as follows:
• For exam1: 0.2433
• For exam2: 0.3506
• For exam3: 0.2661
These coefficients indicate how much the course grade is expected to change for a one-point increase in the respective exam score, holding the other two exam scores fixed. For example, for every one-point increase in exam1, the course grade is expected to increase by approximately 0.2433 points, and so on.
Comparing these coefficients, the ‘exam2’ variable has the largest one, so it has the most influence on changes in the overall course grade.
The coefficient of determination:
With the output table, we can see that R-squared is 0.8084, which indicates that approximately
80.84% of the variability in the course grade can be explained by the independent variables
(exam1, exam2, and exam3). This is a relatively high R-squared value, indicating that the model
does a good job of explaining the variance in course grades.
Hypothesis Testing for Slopes and Intercept:
Determine whether the slopes (coefficients for exam1, exam2, and exam3) and the intercept are statistically significant.
Stating the null and alternative hypothesis for each coefficient $\beta_j$:
$H_0: \beta_j = 0$ (the variable has no relationship with the final course grade)
$H_1: \beta_j \neq 0$ (the variable has a relationship with the final course grade)
       alpha  P-value      comparison  decision
exam1  5%     1.19442E-19  < α         reject H0
exam2  5%     3.21592E-36  < α         reject H0
exam3  5%     6.3588E-27   < α         reject H0
The p-values associated with each coefficient are very small (close to zero), which indicates that
all three coefficients are statistically significant.
The intercept also has a statistically significant p-value, which means it is significantly different from zero. The intercept represents the expected course grade when all three exam scores are zero, a case that may not be meaningful in this context, so it should be interpreted with caution.
Part IV: Conclusion

In this project, we conducted a comprehensive analysis of academic performance and the
influence of exams on course grades, focusing on the gender-based differences within a student
population. The study aimed to explore whether gender plays a significant role in academic
outcomes and to understand the relationship between individual exam scores and the final
course grade.
Our analysis revealed several important findings:
• Gender-Based Academic Performance: We found that, on average, female students tended to perform slightly better than male students in terms of both individual exam scores and final course grades. However, it's essential to note that the difference in course grades was not statistically significant at the 5% level (p-value 0.1384).
• Impact of Individual Exams: We observed that all three exam scores had a significant positive relationship with the final course grade, and that exam 2 had the largest regression coefficient (0.3506). Higher scores on exam 2 were associated with higher overall course grades. This suggests that exam 2 may have a more substantial influence on the final outcome than the other exams.
• Variability in Academic Performance: Our analysis also highlighted the wide variability in both exam scores and course grades. This suggests that factors beyond gender and exam performance may be contributing to students' overall academic success.