Case Study Instructions Feb 2023
INSTRUCTIONS
• This is a group assignment for a maximum of FOUR (4) students (same program and tutorial lecturer).
• Remain in the same group as previously registered in UniKL VLE, and select a leader.
• It is up to the members of the group to make sure everyone contributes equally. Plan your schedules so
that you will have time to work together on the project.
• This case study should be a well-organized typed report that includes all statistical output and is produced
collaboratively by all team members. SUBMIT it to your lecturer's “SUBMISSION FOLDER” in VLE in PDF format
(1 file only, submitted by the team leader).
• Use ONLY the assignment cover that has been uploaded in your VLE.
• Attach the Case Study question.
• Attach a clear individual photo of each member and one group photo while doing this project.
Simple linear regression is a statistical method that allows us to summarize and study the relationship between
two continuous (quantitative) variables: an independent (predictor) variable X and a dependent (response) variable Y.
Simple linear regression gets its adjective "simple" because it concerns the study of only one predictor
variable. In contrast, multiple linear regression gets its adjective "multiple" because it concerns the study of
two or more predictor variables.
Scientists and engineers often collect data in order to determine the nature of a relationship between two
quantities. For example, a chemical engineer may run a chemical process several times in order to study the
relationship between the concentration of a certain catalyst and the yield of the process. Each time the process
is run, the concentration X and the yield Y are recorded. The experiment thus generates bivariate data; a
collection of ordered pairs (x1, y1), …, (xn, yn). In many cases, ordered pairs generated in a scientific experiment
will fall approximately along a straight line when plotted. In these situations, the data can be used to compute
an equation for the line.
This equation can be used for many purposes; for example, in the catalyst versus yield experiment just
described, it could be used to predict the yield Y that will be obtained the next time the process is run with a
specific catalyst concentration X. This is generally referred to as predictive modeling.
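As a rough illustration of how such a line is computed and then used for prediction, the following is a minimal least-squares sketch in Python. The function names `fit_line` and `predict` and the five data points are made up for this example; they are not taken from any real experiment, and MINITAB remains the required tool for the case study.

```python
# Least-squares fit of y = b0 + b1 * x using only the standard library.
# The data below are made-up illustrative values, not real measurements.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted y for a given x on the fitted line."""
    return intercept + slope * x


# Hypothetical catalyst concentrations (X) and process yields (Y).
concentration = [1.0, 2.0, 3.0, 4.0, 5.0]
yield_pct = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_line(concentration, yield_pct)
print(f"yield = {b0:.3f} + {b1:.3f} * concentration")
print(f"predicted yield at concentration 6.0: {predict(b0, b1, 6.0):.3f}")
```

Note that the last line predicts at a concentration outside the observed range, which is extrapolation and should be treated with caution.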
When you choose to analyze your data using linear regression, part of the process involves checking to make
sure that the data you want to analyze can actually be analyzed using linear regression. You need to do this
because it is only appropriate to use linear regression if your data "passes" the assumptions that are
required for linear regression to give you a valid result.
Do not be surprised if, when analyzing your own data, one or more of these assumptions is violated (i.e., not met).
This is common when working with real-world data rather than textbook examples, which often only show you
how to carry out linear regression when everything goes well.
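The specific assumptions are not listed here, but several common checks (linearity, outliers, constant spread) start from the residuals of the fitted line. The sketch below is one possible way to flag unusually large residuals in Python. The helper names, the made-up data, and the cutoff of 2 are assumptions for illustration only; the cutoff is a rule of thumb, not a requirement of this case study.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def scaled_residuals(xs, ys):
    """Residuals divided by the residual standard deviation sqrt(SSE / (n - 2))."""
    b0, b1 = fit_line(xs, ys)
    res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e ** 2 for e in res) / (len(xs) - 2))
    if s == 0:
        # Perfect fit: every residual is zero, so nothing can be flagged.
        return [0.0 for _ in res]
    return [e / s for e in res]


# Made-up data: y is roughly 2 * x, except for one unusual point at x = 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [2.0, 4.0, 6.0, 8.0, 20.0, 12.0, 14.0, 16.0, 18.0]

scaled = scaled_residuals(xs, ys)
# Rule-of-thumb cutoff (an assumption): flag points more than 2 residual
# standard deviations away from the fitted line.
flagged = [i for i, e in enumerate(scaled) if abs(e) > 2.0]
print("indices of possible outliers:", flagged)
```

A flagged point is only a candidate for closer inspection; whether it is a genuine outlier still has to be judged from the data source and from the residual plots produced by the statistical software.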
Sources:
1) https://online.stat.psu.edu/stat500/lesson/9/9.2/9.2.3
2) https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R5_Correlation-Regression/R5_Correlation-Regression4.html
3) https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
In this statistical experiment of simple linear regression analysis, you are going to develop a
mathematical model of the relationship between one dependent variable (Y) and one independent variable (X). In
this project, you are going to obtain your own data set. A good-size data set should contain somewhere between
50 and 100 observations. Make sure you choose an example for which it is feasible to gather the data in a relatively
short period of time. The data must be related to your major of study. You are free to choose your own example,
but it has to be authentic (not plagiarized from another group).
The selection of the topic has to be done through Madam Zatul Iradah or Madam Siti Esah (office M008) before
9 June 2023 (Friday). Every group has to register its topic, and only the appointed leader is allowed to do so.
The source of your data must be recorded in your case study.
3. Perform data checking (six assumptions) to make sure that the data you want to analyze can actually
be analyzed using linear regression.
5. Using the statistical software MINITAB, fit a regression line to the scatterplot.
(a) Report the parameter estimates (the estimates of the intercept and slope).
(b) What is the regression equation for this data set?
(c) Does the slope match what you expected in Question 4(b)? Is the independent variable (X)
useful in predicting the dependent variable (Y)? Why?
(d) Construct a 95% confidence interval for the slope. Interpret the result.
(e) Test at the 5% significance level whether the slope differs from zero. Based on the result, decide if there
is a useful linear relationship between the independent variable (X) and the dependent variable (Y).
6. What is the correlation coefficient of the linear regression equation? What does this tell you? Explain.
7. What is the coefficient of determination? What percentage of the variation in the dependent variable
is explained by the independent variable? Is that high or low? What does this tell you about the
accuracy of your prediction? Explain.
8. Use your model to predict two dependent values when any two independent values are chosen. Explain
your finding.
10. List all the references used in the report using a proper reference format.
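The quantities asked for in Questions 5(d), 5(e), 6 and 7 can be cross-checked by hand against the MINITAB output. The following Python sketch shows one way to do this using standard textbook formulas. The five data points are made up for illustration, and the critical value `T_CRIT` is read from a t table for n - 2 = 3 degrees of freedom; both are assumptions of this example and must be replaced for your own data set.

```python
import math

# Made-up illustrative data (n = 5); replace with your own observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

# Two-sided 5% critical value of Student's t with n - 2 = 3 degrees of freedom,
# taken from a t table. It must be looked up again for a different sample size.
T_CRIT = 3.182

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Parameter estimates (Question 5(a)).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual standard deviation and standard error of the slope.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(sxx)

# 95% confidence interval for the slope (Question 5(d)).
ci_low = b1 - T_CRIT * se_b1
ci_high = b1 + T_CRIT * se_b1

# t statistic for H0: slope = 0 against H1: slope != 0 (Question 5(e)).
t_stat = b1 / se_b1
reject_h0 = abs(t_stat) > T_CRIT

# Correlation coefficient (Question 6) and coefficient of determination (Question 7).
r = sxy / math.sqrt(sxx * syy)
r_squared = 1 - sse / syy

print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"95% CI for slope: ({ci_low:.4f}, {ci_high:.4f})")
print(f"t = {t_stat:.2f}, reject H0 at 5%: {reject_h0}")
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

In simple linear regression the coefficient of determination equals the square of the correlation coefficient, so the two printed values should agree, and a confidence interval for the slope that excludes zero leads to the same conclusion as rejecting the hypothesis of a zero slope at the same level.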