
IDEA ENGINEERING

EXPERIMENT OF STATISTICS APPLICATION

“ANALYSIS OF MULTIPLE LINEAR REGRESSION USING R-STUDIO”

Submitted to fulfill the assignment for the course taught by Sudianto Manullang, S.Si., M.Sc.

ARRANGED BY :

Group 8 – MESP 2019

1. FEBRIYANTI SAYEKTI (4191111016)

2. SYAFIRA FATIHAH RIZQI (4192111002)

BILINGUAL MATHEMATICS EDUCATION STUDY PROGRAM

FACULTY OF MATHEMATICS AND NATURAL SCIENCES

STATE UNIVERSITY OF MEDAN

2021
PREFACE
Praise be to Allah SWT, Almighty God, for the completion of this Critical Journal
Report for the Experiment of Statistics Application course, which is one of the compulsory
reports to fulfill college assignments.

We would like to express our deep gratitude to Mr. Sudianto Manullang, S.Si., M.Sc., the
lecturer of this course, who gave us the confidence to compile and complete this Critical
Journal Report as well as we could.

As imperfect humans, we realize that this Critical Journal Report still contains many
deficiencies and writing errors. We hope these mistakes become lessons for better writing
in the future. We have tried our best to complete this report.

Finally, we hope that this Critical Journal Report will be of benefit to all readers. Thank
you.

Medan, November 2021

Group 8

CONTENTS

PREFACE
CONTENTS
CHAPTER I INTRODUCTION
   1.1 Background of The Problem
   1.2 Formulation of The Problem
   1.3 Benefit
CHAPTER II THEORETICAL REVIEW
   2.1 Definition of Linear Regression
   2.2 Multiple Linear Regression Analysis
   2.3 R-Studio
CHAPTER III SIMPLE CASE AND PROCEDURE
   3.1 Simple Case
   3.2 Procedure
CHAPTER IV MANUAL CALCULATION AND INTERPRETATION OF OUTPUT
   4.1 Manual Calculation
   4.2 Interpretation of Output
CHAPTER V CLOSING
   5.1 Conclusion
   5.2 Suggestion
REFERENCES

CHAPTER I
INTRODUCTION
1.1 Background of The Problem
Regression analysis is basically the study of the dependence of a dependent (bound)
variable on one or more independent (explanatory) variables, with the aim of estimating
and/or predicting the population mean, or mean value, of the dependent variable based on
the known values of the independent variables. The results of a regression analysis take the
form of a coefficient for each independent variable. These coefficients are obtained by
predicting the value of the dependent variable with an equation.
One objective in calculating the regression coefficients is to minimize the deviation
between the actual and the estimated values of the dependent variable based on the existing
data. In addition to measuring the strength of the relationship between two or more
variables, regression analysis also shows the direction of the relationship between the
dependent variable and the independent variables. The dependent variable is assumed to be
random (stochastic), which means it has a probability distribution, while the independent
variables are assumed to have fixed values (in repeated sampling).
Therefore, we wrote this paper with the title "Analysis of Multiple Linear Regression
using R-Studio".
1.2 Formulation of The Problem
The formulation of the problem from this Idea Engineering paper is as follows:
1. What is Multiple Linear Regression Analysis?
2. What is the purpose of Multiple Linear Regression Analysis?
3. How is Multiple Linear Regression Analysis performed using R-Studio?
4. What are the results and interpretation of the case analyzed with Multiple Linear
Regression using R-Studio?
1.3 Benefit
The objectives of this Idea Engineering paper are:
1. To know the meaning of Multiple Linear Regression Analysis
2. To know the purpose of Multiple Linear Regression Analysis
3. To find out the procedure for doing Multiple Linear Regression Analysis using R-Studio
4. To find out the results and interpretation of the case analyzed with Multiple Linear
Regression using R-Studio
CHAPTER II
THEORETICAL REVIEW

2.1 Definition of Linear Regression


In statistics, linear regression is an approach to modeling the relationship between a
dependent variable and one independent variable (simple linear regression) or more than one
independent variable (multiple linear regression). One application of linear regression is
making predictions based on previously collected data. If the relationship between the
variables can be approximated by a straight-line equation, the model that approximates that
relationship in the data is called a linear regression model.
Linear regression is a simple analytical model for interval or ratio data. Through this
analysis, researchers can make predictions based on the data obtained. In general, linear
regression is used to determine whether the independent variables studied have a significant
relationship with the dependent variable. In addition, this analysis can be used to determine
which variables have a significant effect on the dependent variable.

2.2 Multiple Linear Regression Analysis


Multiple linear regression analysis is a form of linear regression analysis with more than
one independent variable. Regression analysis can be used to measure the effect of
independent variables on the dependent variable. It is one of the most widely used methods
of analysis because it is simple and has sufficient power to explain the effect of the
independent variables on the dependent variable. Many conditions can be tested with linear
regression analysis.
A researcher who wants to use this analysis must meet several classical assumptions,
including:
a) The data are on an interval or ratio scale
b) The relationship is linear
c) The residuals are normally distributed
d) There is no heteroscedasticity
e) There is no multicollinearity
When all assumptions have been met, the regression analysis can be carried out using the
equation:
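
In the usual textbook notation, which is assumed here (the symbols β, ε, k, and n are standard conventions, not taken from the source), the multiple linear regression model with k independent variables and n observations can be written as:

```latex
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i,
\qquad i = 1, 2, \dots, n
```

Here β0 is the intercept, β1, ..., βk are the regression coefficients, and ε_i is the random error of observation i.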

Based on the equation above, the matrix form of the multiple linear regression model
can be written as follows:
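
A sketch of the matrix form in standard notation (assumed here): stacking the n observations of a model with k independent variables gives a response vector Y, a design matrix X whose first column is all ones, a coefficient vector β, and an error vector ε.

```latex
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
\qquad
\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}
=
\begin{bmatrix}
1 & X_{11} & X_{12} & \cdots & X_{1k} \\
1 & X_{21} & X_{22} & \cdots & X_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{n1} & X_{n2} & \cdots & X_{nk}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
```

Under this form, the least squares estimate of the coefficients is b = (X'X)^{-1} X'Y, provided X'X is invertible.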

2.3 R-Studio
R Program
R is a program for statistical computing and graphics (R Core Team 2021). R is currently
widely recognized as one of the most powerful software tools for data analysis and data
science. Apart from R, other software such as Python is also often used for data analysis.
R was originally created for statistical computing and graphics and was at first used mainly
by scientists and academics in their research. With the development of technology, however,
the scope of R as a programming language has become much wider. You can create and
routinely update reports using R Markdown, and you can create interactive web applications
or dashboards with the shiny package. Since R was designed for data analysis and its
capabilities cover almost every area of it, it is no wonder that many data analysts and data
scientists today use R to solve their problems.
R-Studio
RStudio is a widely used Integrated Development Environment (IDE) for R. Almost all R
users who are familiar with RStudio prefer to use R through RStudio rather than through the
R GUI. Download the desktop version of RStudio that suits your needs; it is highly
recommended to use the latest versions of both RStudio and R. R and RStudio are two
different programs. You do not need RStudio to use R (via the R GUI), but you must install
R before installing and using RStudio, because RStudio requires R to be already installed on
your PC or server. Figure 1.6 illustrates this with an analogy: R is like the frame and engine
of a car, while RStudio is like its exterior and interior. You cannot use the car if you have
only the exterior and dashboard (RStudio).

CHAPTER III
SIMPLE CASE AND PROCEDURE

3.1 Simple Case


It is suspected that a student's exam score depends on the student's intelligence test
score and frequency of truancy. To investigate this, 12 students were observed and their
frequency of truancy, intelligence test score, and exam score were recorded. The data are as
follows:

No   Intelligence test score (X1)   Frequency of truancy (X2)   Exam score (Y)
1    75                             4                           85
2    60                             7                           75
3    65                             6                           75
4    75                             2                           90
5    65                             2                           85
6    80                             3                           87
7    75                             2                           95
8    80                             3                           95
9    65                             4                           80
10   80                             3                           90
11   60                             5                           75
12   65                             5                           75

3.2 Procedure
1. Open the RStudio application on your PC.
2. Import the data to be analyzed by clicking the Environment tab and then selecting
Import Dataset. Select the format in which you previously saved the data, SPSS or
Excel (in this case Ms. Excel is used, so select From Excel).
3. A dialog box appears. Click Browse, select the data file you want to analyze, and
click Open. The selected data then appear in the Data Preview; if they are correct,
click Import.
4. The imported data appear in the Source/Editor window. To do the multiple linear
regression analysis, type the following script in the script window:
a. model <- lm(Y ~ X1 + X2, data = Data), then click Run
b. summary(model)
5. Click Run again and press Enter to display the results of the multiple linear
regression test. The results then appear in the Console window.
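
The steps above can be sketched as a single script. This is a minimal sketch, not the authors' exact session: to keep it self-contained, the twelve observations from Section 3.1 are typed into a data frame instead of being imported. The commented-out lines show one way to import from Excel with the readxl package, where the file name `exam_data.xlsx` is a hypothetical placeholder. The object names `Data` and `model` follow the script in step 4.

```r
# Option A (import from Excel): the file name is a hypothetical placeholder.
# library(readxl)
# Data <- read_excel("exam_data.xlsx")

# Option B (self-contained): enter the data from Section 3.1 by hand.
Data <- data.frame(
  X1 = c(75, 60, 65, 75, 65, 80, 75, 80, 65, 80, 60, 65),  # intelligence test score
  X2 = c(4, 7, 6, 2, 2, 3, 2, 3, 4, 3, 5, 5),              # frequency of truancy
  Y  = c(85, 75, 75, 90, 85, 87, 95, 95, 80, 90, 75, 75)   # exam score
)

# Regress the exam score on both predictors and print the regression output.
model <- lm(Y ~ X1 + X2, data = Data)
print(summary(model))
```

Typing the data in directly makes the script reproducible without the Excel file; with real data, Option A would normally replace Option B.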

CHAPTER IV
MANUAL CALCULATION AND INTERPRETATION OF OUTPUT

4.1 Manual Calculation


1. Hypothesis
H0 : Intelligence test score and frequency of truancy have no significant influence on
exam scores
Ha : Intelligence test score and frequency of truancy have a significant influence on
exam scores
2. Table of Data
No    X1   X2     Y     X1Y    X2Y   X1X2    X1²   X2²     Y²
1     75    4    85    6375    340    300   5625    16   7225
2     60    7    75    4500    525    420   3600    49   5625
3     65    6    75    4875    450    390   4225    36   5625
4     75    2    90    6750    180    150   5625     4   8100
5     65    2    85    5525    170    130   4225     4   7225
6     80    3    87    6960    261    240   6400     9   7569
7     75    2    95    7125    190    150   5625     4   9025
8     80    3    95    7600    285    240   6400     9   9025
9     65    4    80    5200    320    260   4225    16   6400
10    80    3    90    7200    270    240   6400     9   8100
11    60    5    75    4500    375    300   3600    25   5625
12    65    5    75    4875    375    325   4225    25   5625
∑    845   46  1007   71485   3741   3145  60175   206  85169

Where X1 = intelligence test score, X2 = frequency of truancy, and Y = exam scores
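
The column totals of the table can be cross-checked with a few lines of R. This is a small sketch added for verification only; the column names `X1Y`, `X2Y`, `X1X2`, `X1sq`, `X2sq`, and `Ysq` are labels chosen here for illustration.

```r
# Raw data from the case in Section 3.1
X1 <- c(75, 60, 65, 75, 65, 80, 75, 80, 65, 80, 60, 65)
X2 <- c(4, 7, 6, 2, 2, 3, 2, 3, 4, 3, 5, 5)
Y  <- c(85, 75, 75, 90, 85, 87, 95, 95, 80, 90, 75, 75)

# Build the same product and square columns as the manual table
tab <- data.frame(
  X1, X2, Y,
  X1Y = X1 * Y, X2Y = X2 * Y, X1X2 = X1 * X2,
  X1sq = X1^2, X2sq = X2^2, Ysq = Y^2
)

# Column totals, corresponding to the last row of the table
totals <- colSums(tab)
print(totals)
```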
3. Calculate b value
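
One standard way to carry out this step is sketched below, assuming the deviation-score (mean-corrected sums) formulas for two predictors; the layout may differ from the authors' own working. Lowercase letters denote deviations from the mean, n = 12, and all totals come from the table in step 2.

```latex
\begin{aligned}
\sum x_1^2 &= \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 60175 - \frac{845^2}{12} \approx 672.92 \\
\sum x_2^2 &= \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 206 - \frac{46^2}{12} \approx 29.67 \\
\sum x_1 x_2 &= \sum X_1 X_2 - \frac{\sum X_1 \sum X_2}{n} = 3145 - \frac{(845)(46)}{12} \approx -94.17 \\
\sum x_1 y &= \sum X_1 Y - \frac{\sum X_1 \sum Y}{n} = 71485 - \frac{(845)(1007)}{12} \approx 575.42 \\
\sum x_2 y &= \sum X_2 Y - \frac{\sum X_2 \sum Y}{n} = 3741 - \frac{(46)(1007)}{12} \approx -119.17 \\[6pt]
b_1 &= \frac{\sum x_2^2 \sum x_1 y - \sum x_1 x_2 \sum x_2 y}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2}
     = \frac{(29.67)(575.42) - (-94.17)(-119.17)}{(672.92)(29.67) - (-94.17)^2} \approx 0.527 \\
b_2 &= \frac{\sum x_1^2 \sum x_2 y - \sum x_1 x_2 \sum x_1 y}{\sum x_1^2 \sum x_2^2 - (\sum x_1 x_2)^2}
     = \frac{(672.92)(-119.17) - (-94.17)(575.42)}{(672.92)(29.67) - (-94.17)^2} \approx -2.34 \\
b_0 &= \bar{Y} - b_1 \bar{X}_1 - b_2 \bar{X}_2
     = \frac{1007}{12} - (0.527)\frac{845}{12} - (-2.34)\frac{46}{12} \approx 55.78
\end{aligned}
```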

4. Determine the regression equation
Ŷ = b0 + b1 X1 + b2 X2
Ŷ = 55.78 + 0.527 X1 − 2.34 X2
5. Test the regression equation by calculating the value of R :
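
A sketch of this calculation, assuming the usual coefficient of determination for two predictors. Lowercase sums are the mean-corrected totals from the table in step 2 (for example, Σx1y = ΣX1Y − ΣX1ΣY/n ≈ 575.42 and Σx2y ≈ −119.17), and the coefficients are taken to four decimals (0.5271 and −2.3436).

```latex
\begin{aligned}
\sum y^2 &= \sum Y^2 - \frac{(\sum Y)^2}{n} = 85169 - \frac{1007^2}{12} \approx 664.92 \\
R^2 &= \frac{b_1 \sum x_1 y + b_2 \sum x_2 y}{\sum y^2}
     = \frac{(0.5271)(575.42) + (-2.3436)(-119.17)}{664.92} \approx 0.876 \\
R &= \sqrt{R^2} \approx 0.936
\end{aligned}
```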

6. Determine F value
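
A sketch of this step, assuming the standard overall F statistic computed from R², with m = 2 independent variables, n = 12 observations, and R² = 0.8762 (the multiple R-squared reported in Section 4.2):

```latex
F_{calc} = \frac{R^2 / m}{(1 - R^2)/(n - m - 1)}
         = \frac{0.8762 / 2}{(1 - 0.8762)/(12 - 2 - 1)}
         = \frac{0.4381}{0.01376} \approx 31.8
```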

7. F table
α = 0.05
N1 = m = 2 (m is the number of independent variables)
N2 = n − m − 1 = 12 − 2 − 1 = 9
F table = F(α; N1, N2) = F(0.05; 2, 9) = 4.26
8. Decision
Fcalc < Ftable ➔ H0 is accepted
Fcalc > Ftable ➔ H0 is rejected
9. Conclusion
We can see that Fcalc > Ftable: 31.8 > 4.26. That means H0 is rejected and Ha is
accepted. The conclusion is that intelligence test score and frequency of truancy
simultaneously have a significant influence on exam scores.
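
The table lookup and the decision rule can also be reproduced in R. In this minimal sketch the degrees of freedom are read from the fitted model rather than typed in, and the critical value comes from `qf()`; the variable names `f_calc`, `f_table`, and `reject_h0` are chosen here for illustration.

```r
# Data from Section 3.1 and the fitted model
Data <- data.frame(
  X1 = c(75, 60, 65, 75, 65, 80, 75, 80, 65, 80, 60, 65),
  X2 = c(4, 7, 6, 2, 2, 3, 2, 3, 4, 3, 5, 5),
  Y  = c(85, 75, 75, 90, 85, 87, 95, 95, 80, 90, 75, 75)
)
model <- lm(Y ~ X1 + X2, data = Data)

# fstatistic is a named vector: value, numdf (numerator df), dendf (denominator df)
fstat <- summary(model)$fstatistic
alpha <- 0.05

f_calc  <- fstat[["value"]]
f_table <- qf(1 - alpha, df1 = fstat[["numdf"]], df2 = fstat[["dendf"]])

# Decision rule: reject H0 when the calculated F exceeds the critical value
reject_h0 <- f_calc > f_table
cat("F calc:", round(f_calc, 2),
    "| F table:", round(f_table, 2),
    "| reject H0:", reject_h0, "\n")
```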

4.2 Interpretation of Output

Two things can be observed from the R output: the F statistical test and the t statistical
test. The F test is used to see whether the independent variables simultaneously influence
the dependent variable, while the t test is used to see the influence of each independent
variable individually in explaining the variation of the dependent variable. From the output,
the F value is 31.85 with p-value < 0.05, so we can conclude that the variables X1 and X2
simultaneously influence the variable Y. The output also shows that the multiple R-squared
(R²) is 0.8762. This means that X1 and X2 explain 87.62% of the variation in Y, while the
remaining 12.38% is explained by other variables not included in the model. From the t-test,
the t-value of X1 is 3.371 with a regression coefficient of 0.5271, while the t-value of X2
is −3.147 with a regression coefficient of −2.3436. The individual significance tests show
that X1 and X2 each influence Y significantly (p < 0.05). From the coefficients part of the
output, the regression equation is: Ŷ = 55.7803 + 0.5271 X1 − 2.3436 X2
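
The quantities discussed in this section can be pulled out of the fitted model programmatically instead of being read off the printed output. The sketch below uses only base R; the variable names `coef_table`, `r_squared`, and `p_overall` are chosen here for illustration.

```r
# Data from Section 3.1 and the fitted model
Data <- data.frame(
  X1 = c(75, 60, 65, 75, 65, 80, 75, 80, 65, 80, 60, 65),
  X2 = c(4, 7, 6, 2, 2, 3, 2, 3, 4, 3, 5, 5),
  Y  = c(85, 75, 75, 90, 85, 87, 95, 95, 80, 90, 75, 75)
)
model <- lm(Y ~ X1 + X2, data = Data)
s <- summary(model)

# Coefficient table with columns: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
print(round(coef_table, 4))

# Multiple R-squared: share of the variation in Y explained by X1 and X2
r_squared <- s$r.squared

# p-value of the overall F test (upper tail of the F distribution)
f <- s$fstatistic
p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

cat("R-squared:", round(r_squared, 4), "\n")
cat("Overall F-test p-value below 0.05:", p_overall < 0.05, "\n")
```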

CHAPTER V
CLOSING

5.1 Conclusion
1. Multiple linear regression analysis is a form of linear regression analysis with more
than one independent variable. Regression analysis can be used to measure the effect
of independent variables on the dependent variable.
2. In this case, the purpose of the multiple linear regression analysis is to find out how
much a student's intelligence test score and frequency of truancy influence the
student's exam score.
3. The procedure for doing multiple linear regression analysis using R-Studio is:
a) Open the RStudio application on your PC.
b) Import the data to be analyzed by clicking the Environment tab and then
selecting Import Dataset. Select the format in which you previously saved the
data, SPSS or Excel (in this case Ms. Excel is used, so select From Excel).
c) A dialog box appears. Click Browse, select the data file you want to analyze,
and click Open. The selected data then appear in the Data Preview; if they are
correct, click Import.
d) The imported data appear in the Source/Editor window. To do the multiple
linear regression analysis, type the following script in the script window:
a. model <- lm(Y ~ X1 + X2, data = Data), then click Run
b. summary(model)
e) Click Run again and press Enter to display the results of the multiple linear
regression test. The results then appear in the Console window.
4. The result of the Multiple Linear Regression case that we analyzed using R-Studio is
the equation Ŷ = 55.78 + 0.527 X1 − 2.34 X2. That is, the predicted exam score
increases by about 0.527 points for each additional point of intelligence test score and
decreases by about 2.34 points for each additional truancy, holding the other variable
constant.

5.2 Suggestion
Our advice to beginners is to look at and become familiar with the menus in R before
using it, so as not to be overwhelmed when analyzing statistical data. We also hope that this
paper is useful for readers in analyzing multiple regression data using R-Studio.

REFERENCES

Amrin. 2016. Data Mining dengan Regresi Linier Berganda untuk Peramalan Tingkat Inflasi.
Jurnal Techno Nusa Mandiri, 18(1).

Janie, D. N. 2012. Statistik Deskriptif dan Regresi Linier Berganda dengan SPSS. Semarang:
Semarang University Press.

Ningsih, S. 2019. Penerapan Metode Suksesif Interval pada Analisis Regresi Linier Berganda.
Jambura Journal of Mathematics, 1(1), 43-53.

Spiegel, M. R. 2004. Statistika. Jakarta: Erlangga.
