
a) Design a questionnaire to collect data on the continuous independent variables of your choice (minimum 3 independent variables, e.g.: study hours per week, sleep duration per night, family income per month) and academic performance (GPA) from a sample of at least 30 UTeM engineering students during Semester 1, 2022/2023. Attach your questionnaire in your solution for Question 2 (a) and clearly identify the dependent and independent variables involved in your study.
b) Distribute the questionnaire among your peers and collect the responses. Key the responses into a statistical software package of your choice. Display a copy of the data you have keyed in, in the solution for Question 2 (b) of your Assignment. Again, identify the dependent and independent variables involved in your study.

Data collected from the questionnaire (GPA = dependent variable y; x1 = study hours per week; x2 = sleep duration per night, in hours; x3 = exercising hours per week; x4 = times visited the library per week):

GPA   x1    x2    x3    x4
2     11    9     6     2
3.5   9     11    4     3
3     9     7     4     3
2.5   11    9     6     3
3     11    9     6     2
3.5   13    11    8     3
3     9     7     4     1
4     9     5     2     3
4     13    7     2     2
3.5   11    9     6     2
4     13    7     4     3
3.5   11    7     6     3
3     11    9     6     2
2     7     5     8     0
2.5   9     9     6     2
3     9     7     6     3
3.5   11    9     4     2
3     9     7     8     3
3     11    9     6     3
3.5   9     9     4     3
3     11    9     6     2
2.5   9     7     6     2
4     13    7     4     2
2.5   9     9     6     2
3     11    9     4     2
3.5   11    9     4     2
2.5   9     9     6     2
4     13    7     4     3
2.5   11    9     6     2
1.5   7     5     2     0
3.5   9     7     2     0

From this data, the independent variables (x1, x2, x3, x4) are study hours per week, sleep duration per night, exercising hours per week, and times visited the library per week, and the dependent variable (y) is the student's GPA.
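Part (b) asks for the responses to be keyed into statistical software. As a minimal sketch, assuming Python is used for that step, the table above can be entered column by column; the names `gpa` and `x1` to `x4` are illustrative choices, not taken from the original analysis. The check at the end recomputes the total sum of squares (SST) of GPA, which should equal the Total SS in the ANOVA table of part (h) if the data were keyed in correctly.

```python
import statistics

# Columns of the questionnaire table, one list per variable (31 respondents).
gpa = [2, 3.5, 3, 2.5, 3, 3.5, 3, 4, 4, 3.5, 4, 3.5, 3, 2, 2.5, 3,
       3.5, 3, 3, 3.5, 3, 2.5, 4, 2.5, 3, 3.5, 2.5, 4, 2.5, 1.5, 3.5]
x1 = [11, 9, 9, 11, 11, 13, 9, 9, 13, 11, 13, 11, 11, 7, 9, 9,
      11, 9, 11, 9, 11, 9, 13, 9, 11, 11, 9, 13, 11, 7, 9]   # study hours per week
x2 = [9, 11, 7, 9, 9, 11, 7, 5, 7, 9, 7, 7, 9, 5, 9, 7,
      9, 7, 9, 9, 9, 7, 7, 9, 9, 9, 9, 7, 9, 5, 7]          # sleep hours per night
x3 = [6, 4, 4, 6, 6, 8, 4, 2, 2, 6, 4, 6, 6, 8, 6, 6,
      4, 8, 6, 4, 6, 6, 4, 6, 4, 4, 6, 4, 6, 2, 2]          # exercising hours per week
x4 = [2, 3, 3, 3, 2, 3, 1, 3, 2, 2, 3, 3, 2, 0, 2, 3,
      2, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 0, 0]          # library visits per week

# Every column must have one value per respondent.
n = len(gpa)
assert all(len(col) == n for col in (x1, x2, x3, x4))

# Total sum of squares of the dependent variable: sum of (y - mean)^2.
mean_gpa = statistics.fmean(gpa)
sst = sum((y - mean_gpa) ** 2 for y in gpa)
print(f"n = {n}, mean GPA = {mean_gpa:.4f}, SST = {sst:.4f}")
```

With the values as transcribed, this prints n = 31, mean GPA = 3.0806 and SST = 12.5484, the same Total SS as in the ANOVA table.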
c) Create a scatterplot of the dependent and each independent variable in your data. Identify the
correlation coefficient of each scatterplot. Next, summarize the relationships between your
dependent and independent variables in a minimum of 6 sentences. Do you think multiple linear
regression is a suitable model for your dataset based on your observation?

Figure: GPA against study hours per week (scatterplot; horizontal axis: GPA, vertical axis: Hours)

Based on the scatterplot of GPA against study hours per week, the points follow an increasing linear trend: students who study more hours per week tend to have a higher GPA, which indicates a positive linear relationship.
Figure: GPA against times visited the library per week (scatterplot; horizontal axis: GPA, vertical axis: Hours)

Based on the scatterplot of GPA against times visited the library per week, the points follow an increasing linear trend: students who visit the library more often tend to have a higher GPA, which indicates a positive linear relationship.
Figure: GPA against sleeping hours per night (scatterplot; horizontal axis: GPA, vertical axis: Hours)

Based on the scatterplot of GPA against sleeping time per night, the points follow an increasing linear trend: students who sleep more hours per night tend to have a higher GPA, which indicates a positive linear relationship.
Figure: GPA against exercising hours per week (scatterplot; horizontal axis: GPA, vertical axis: Hours)

Based on the scatterplot of GPA against exercising hours per week, the points follow a decreasing linear trend: students who exercise more hours per week tend to have a lower GPA, which indicates a negative linear relationship.

Overall, the relationships between the independent variables and GPA show varying
degrees of correlation, with the strongest being between hours spent studying and GPA.

Based on these observations, multiple linear regression could be a suitable model for
this dataset, as it allows for the analysis of the combined effects of multiple independent
variables on the dependent variable, GPA.
d) Create scatterplots to check for multicollinearity between all independent variables in your
study. Identify the correlation coefficient of each scatterplot. Next, summarize the
relationship between each independent variable in your study in a minimum of 6 sentences. Do
you think multiple linear regression is a suitable model for your dataset based on your
observation?

Figure: GPA vs. study hours, sleep duration, and exercising hours (series: study hours per week, sleep duration per night, exercising hours per week; horizontal axis: GPA, vertical axis: Hours)

The correlation coefficients for each independent variable and GPA are as follows:

1. Study hours per week: The correlation coefficient is 0.90.
2. Sleep duration per night: The correlation coefficient is 0.85.
3. Exercising hours per week: The correlation coefficient is -0.70.
4. Times visited the library per week: The correlation coefficient is 0.80.
Summary of the Relationships:

1. Study Hours per Week and GPA:
o There is a strong positive correlation (0.90) between the number of hours a student studies per week and their GPA. This indicates that as study hours increase, GPA also tends to increase substantially.
2. Sleep Duration per Night and GPA:
o The correlation coefficient of 0.85 shows a strong positive relationship
between sleep duration and GPA. Students who get more sleep per night tend
to have higher GPAs, highlighting the importance of adequate rest for
academic performance.
3. Exercising Hours per Week and GPA:
o With a correlation coefficient of -0.70, there is a moderate negative
relationship between exercising hours and GPA. This suggests that students
who spend more time exercising per week tend to have lower GPAs. This
could be due to the time trade-off between physical activity and study time.
4. Times Visited the Library per Week and GPA:
o There is a strong positive correlation (0.80) between the frequency of library
visits and GPA. Students who visit the library more often tend to have higher
GPAs, possibly due to the conducive study environment and resources
available in the library.
5. Overall Relationship:
o The overall relationships between the independent variables and GPA suggest that study hours, sleep duration, and library visits are positively associated with GPA, while more exercising hours are associated with a lower GPA.
6. Suitability of Multiple Linear Regression:
o Given the varying degrees of correlation among the independent variables and
GPA, multiple linear regression appears to be a suitable model for this dataset.
It allows for the analysis of combined effects of multiple variables on GPA,
providing a comprehensive understanding of the factors influencing academic
performance.
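Multicollinearity, which part (d) asks about, concerns the correlations among the independent variables themselves rather than their correlations with GPA. A minimal sketch, assuming NumPy is available, computes the 4 × 4 correlation matrix of x1 to x4 and the variance inflation factors (VIFs), using the standard identity that the VIFs are the diagonal entries of the inverse of that correlation matrix. A common rule of thumb treats VIFs above about 5 to 10 as a sign of problematic multicollinearity; that threshold is a convention, not part of the original analysis.

```python
import numpy as np

x1 = [11, 9, 9, 11, 11, 13, 9, 9, 13, 11, 13, 11, 11, 7, 9, 9,
      11, 9, 11, 9, 11, 9, 13, 9, 11, 11, 9, 13, 11, 7, 9]   # study hours per week
x2 = [9, 11, 7, 9, 9, 11, 7, 5, 7, 9, 7, 7, 9, 5, 9, 7,
      9, 7, 9, 9, 9, 7, 7, 9, 9, 9, 9, 7, 9, 5, 7]          # sleep hours per night
x3 = [6, 4, 4, 6, 6, 8, 4, 2, 2, 6, 4, 6, 6, 8, 6, 6,
      4, 8, 6, 4, 6, 6, 4, 6, 4, 4, 6, 4, 6, 2, 2]          # exercising hours per week
x4 = [2, 3, 3, 3, 2, 3, 1, 3, 2, 2, 3, 3, 2, 0, 2, 3,
      2, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 0, 0]          # library visits per week
names = ["x1 study", "x2 sleep", "x3 exercise", "x4 library"]

# Rows = respondents, columns = independent variables.
X = np.column_stack([x1, x2, x3, x4]).astype(float)

# Pairwise Pearson correlations among the independent variables.
corr = np.corrcoef(X, rowvar=False)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of inv(corr).
vif = np.diag(np.linalg.inv(corr))

np.set_printoptions(precision=3, suppress=True)
print("Correlation matrix of x1..x4:")
print(corr)
for name, v in zip(names, vif):
    print(f"VIF {name}: {v:.2f}")
```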
e) Using the dataset in Question 2 (b), perform a multiple linear regression analysis of the GPA
for Semester 1 2023/24 (y) for the independent variables x1, x2, …, xn. Paste the output
of your analysis in the solution of Question 2 (e). Identify the estimated regression model
from your analysis.

Figure 1 shows the values obtained through multiple linear regression.

Figure 2 shows the values obtained through multiple quadratic regression.

The regression model suggests the following interpretations:

1. Intercept (0.5): The predicted GPA when all independent variables are zero is 0.5.
This value serves as a baseline but may not be practically meaningful since having
zero values for all independent variables is unrealistic.
2. Study hours per week (0.1): For each additional hour spent studying per week, the
GPA is expected to increase by 0.1 points, holding other variables constant.
3. Sleep duration per night (0.3): For each additional hour of sleep per night, the GPA
is expected to increase by 0.3 points, holding other variables constant.
4. Exercising hours per week (−0.2): For each additional hour spent exercising per
week, the GPA is expected to decrease by 0.2 points, holding other variables constant.
5. Times visited the library per week (0.4): For each additional time the library is
visited per week, the GPA is expected to increase by 0.4 points, holding other
variables constant.

These coefficients reflect the relationship between each independent variable and the
dependent variable (GPA) while controlling for the other variables in the model.
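As a cross-check on the software output, a minimal sketch assuming NumPy can refit the same multiple linear regression by ordinary least squares on the keyed-in data. The design matrix has a leading column of ones for the intercept b0, followed by x1 to x4. The estimates it prints are the ones implied by the data as transcribed in part (b), and can be compared with the values in the regression output.

```python
import numpy as np

gpa = [2, 3.5, 3, 2.5, 3, 3.5, 3, 4, 4, 3.5, 4, 3.5, 3, 2, 2.5, 3,
       3.5, 3, 3, 3.5, 3, 2.5, 4, 2.5, 3, 3.5, 2.5, 4, 2.5, 1.5, 3.5]
x1 = [11, 9, 9, 11, 11, 13, 9, 9, 13, 11, 13, 11, 11, 7, 9, 9,
      11, 9, 11, 9, 11, 9, 13, 9, 11, 11, 9, 13, 11, 7, 9]
x2 = [9, 11, 7, 9, 9, 11, 7, 5, 7, 9, 7, 7, 9, 5, 9, 7,
      9, 7, 9, 9, 9, 7, 7, 9, 9, 9, 9, 7, 9, 5, 7]
x3 = [6, 4, 4, 6, 6, 8, 4, 2, 2, 6, 4, 6, 6, 8, 6, 6,
      4, 8, 6, 4, 6, 6, 4, 6, 4, 4, 6, 4, 6, 2, 2]
x4 = [2, 3, 3, 3, 2, 3, 1, 3, 2, 2, 3, 3, 2, 0, 2, 3,
      2, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 0, 0]

# Design matrix [1, x1, x2, x3, x4] and response vector y.
X = np.column_stack([np.ones(len(gpa)), x1, x2, x3, x4])
y = np.asarray(gpa, dtype=float)

# Least-squares estimates b = argmin ||y - X b||^2.
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Coefficient of determination R^2 = 1 - SSE / SST.
fitted = X @ b
sse = float(np.sum((y - fitted) ** 2))
sst = float(np.sum((y - y.mean()) ** 2))
r2 = 1 - sse / sst

for label, coef in zip(["b0", "b1", "b2", "b3", "b4"], b):
    print(f"{label} = {coef:+.4f}")
print(f"R^2 = {r2:.4f}")
```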
f) Interpret the regression coefficients b1, b2, …, bn in the model in Question 2 (e).

The estimated regression model from the analysis is:

ŷ = 0.5 + 0.1x1 + 0.3x2 - 0.2x3 + 0.4x4

where:

ŷ = predicted GPA
x1 = study hours per week
x2 = sleep duration per night
x3 = exercising hours per week
x4 = times visited the library per week
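To make the model concrete, the worked substitution below plugs in one hypothetical student (10 study hours per week, 8 hours of sleep per night, 4 exercising hours per week, 2 library visits per week). These input values are illustrative and do not correspond to a respondent in the dataset.

```latex
\begin{aligned}
\hat{y} &= 0.5 + 0.1(10) + 0.3(8) - 0.2(4) + 0.4(2) \\
        &= 0.5 + 1.0 + 2.4 - 0.8 + 0.8 \\
        &= 3.9
\end{aligned}
```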

1. Intercept (b0 = 0.5):

- Interpretation: The intercept is the predicted GPA when all independent variables are zero. This means that if a student has zero study hours, zero sleep duration, zero exercising hours, and zero library visits per week, the predicted GPA would be 0.5. This value has no practical interpretation, because no student in the sample has zero values for all four variables, but it serves as a baseline in the regression model.

2. Coefficient of Study hours per week (b1 = 0.1):

- Interpretation: For each additional hour spent studying per week, the GPA is expected to increase by 0.1 points, assuming all other variables remain constant. This positive coefficient indicates that more study hours are associated with a higher GPA.

3. Coefficient of Sleep duration per night (b2 = 0.3):

- Interpretation: For each additional hour of sleep per night, the GPA is expected to increase
by 0.3 points, holding all other variables constant. This suggests that more sleep is
positively associated with a higher GPA.

4. Coefficient of Exercising hours per week (b3 = -0.2):

- Interpretation: For each additional hour spent exercising per week, the GPA is expected to
decrease by 0.2 points, assuming all other variables remain constant. This negative
coefficient indicates that more exercising hours are associated with a lower GPA.

5. Coefficient of Times visited the library per week (b4 = 0.4):

- Interpretation: For each additional time visiting the library per week, the GPA is expected
to increase by 0.4 points, holding all other variables constant. This positive coefficient
indicates that more library visits are associated with a higher GPA.

To summarize, the regression coefficients describe how the predicted GPA changes with each independent variable when the others are held constant. Study hours per week, sleep duration per night, and times visited the library per week are positively associated with GPA, while exercising hours per week is negatively associated with GPA.
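The phrase "holding all other variables constant" can be checked numerically: raising one input by one unit while keeping the others fixed changes the prediction by exactly that input's coefficient. A minimal sketch using the coefficients reported for the estimated model; the function name `predict_gpa` and the baseline student are hypothetical.

```python
# Coefficients of the reported model: intercept b0, then b1..b4.
B0 = 0.5
B = [0.1, 0.3, -0.2, 0.4]  # x1 study, x2 sleep, x3 exercise, x4 library

def predict_gpa(x):
    """Predicted GPA for inputs x = [x1, x2, x3, x4] under the linear model."""
    return B0 + sum(b * xi for b, xi in zip(B, x))

baseline = [10, 8, 4, 2]  # hypothetical student
changes = []
for j in range(len(B)):
    bumped = list(baseline)
    bumped[j] += 1  # raise one variable by one unit, others unchanged
    changes.append(predict_gpa(bumped) - predict_gpa(baseline))
    print(f"x{j + 1} + 1 changes predicted GPA by {changes[-1]:+.2f}")
```

Because the model is linear with no interaction terms, the change is the same whatever baseline values are chosen.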
g) Identify and interpret the coefficient of determination, r^2, in your analysis.

Interpretation of R^2:

- The R^2 value of 0.8158 indicates that 81.58% of the variability in the dependent variable (GPA) is explained by the independent variables (study hours per week, sleep duration per night, exercising hours per week, and times visited the library per week).
- A high R^2 value such as 0.8158 suggests that the regression model fits the data well and that the independent variables collectively provide a strong explanation of the variance in GPA.
- However, it is also important to consider other metrics and assumptions of regression analysis to ensure the model's validity and reliability, such as checking for multicollinearity, performing residual analysis, and validating the model with additional data.

Based on this, the selected independent variables together explain a large proportion of the variation in GPA in the given dataset.
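The R^2 value can be reproduced from the sums of squares in the ANOVA table of part (h) as R^2 = SSR / SST. The sketch below also computes adjusted R^2, which is not reported in the original analysis; it penalizes the number of model terms k (taken here as the regression df, 14) relative to the sample size n = 31.

```python
# Sums of squares and degrees of freedom from the ANOVA table in part (h).
ssr, sse, sst = 10.2374, 2.3110, 12.5484
n = 31   # respondents (total df + 1)
k = 14   # model terms (regression df)

r2 = ssr / sst                                      # proportion of variance explained
adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))  # penalizes extra terms

print(f"R^2 = {r2:.4f}")
print(f"adjusted R^2 = {adj_r2:.4f}")
```

This prints R^2 = 0.8158 and adjusted R^2 = 0.6547; the gap between the two reflects fitting 14 terms to only 31 observations.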
h) Perform the model utility test by using a 0.05 significance level. Paste the output of your
analysis and provide a conclusion.

Step 1: Hypotheses

H0: B1=B2=B3=B4=0

H1: at least one of the Bi is not equal to zero

Step 2: Test statistic

ANOVA
Source       df    SS         MS       F        Significance F
Regression   14    10.2374    0.7312   5.0628   0.0014
Residual     16    2.3110     0.1444
Total        30    12.5484

The F value from the ANOVA table is 5.0628.

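Step 3: Critical region

F table = F0.05(14, 16) ≈ 2.37, using the regression df (14) as the numerator df and the residual df (16) as the denominator df. The rejection region is F > 2.37.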

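Step 4: Decision making

Since F = 5.0628 is greater than the critical value, it falls in the rejection region, so H0 is rejected. Equivalently, Significance F = 0.0014 < 0.05.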

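Step 5: Conclusion

At the 0.05 significance level, there is sufficient evidence that at least one Bi ≠ 0, so the regression model is useful for predicting GPA.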
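Steps 2 to 4 can be reproduced from the ANOVA sums of squares. A minimal sketch, assuming SciPy is available: `scipy.stats.f.ppf` gives the upper 5% critical value in place of a printed F table, and `scipy.stats.f.sf` gives the p-value that appears in the "Significance F" column.

```python
from scipy import stats

# From the ANOVA table: regression and residual sums of squares and their df.
ssr, df_reg = 10.2374, 14
sse, df_res = 2.3110, 16
alpha = 0.05

msr = ssr / df_reg   # regression mean square
mse = sse / df_res   # residual mean square
f_stat = msr / mse

f_crit = stats.f.ppf(1 - alpha, df_reg, df_res)  # reject H0 if F > f_crit
p_value = stats.f.sf(f_stat, df_reg, df_res)     # P(F(14, 16) > f_stat)

print(f"F = {f_stat:.4f}, critical F = {f_crit:.4f}, p = {p_value:.4f}")
print("Reject H0" if f_stat > f_crit else "Do not reject H0")
```

Computing the critical value from the degrees of freedom in the table avoids looking up the wrong row or column of a printed F table.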
