Multiple Regression


Multiple Regression and Correlation Matrix

2016

Course Title: Business Statistics II
Course Code: STS 201

Submitted To:
Dewan Muktadir-Al-Mukit
Assistant Professor
Faculty of Business Administration
Eastern University

Submitted By:
Tanjina Alam Jhumur
ID: 142 200 104
Syed Sujibur Rahman
ID: 153 200 087
Sharmin Aktar Juli
ID: 133 200 012
Arifa Ahmed
ID: 143 200 056
Section: 4
Date of Submission: 18th April, 2016

LETTER OF TRANSMITTAL

18th April, 2016

To
Dewan Muktadir-Al-Mukit
Assistant Professor
Faculty of Business Administration
Eastern University
Subject: Request for acceptance of the report.
Dear Sir,
We would like to draw your kind attention to the fact that we are submitting our
report on Multiple Regression and Correlation Matrix. We have tried our best to
prepare this report so that it fulfills your requirements. We believe that the
ideas in this report will help us in our future practical life.
We would be highly grateful if you would kindly accept our report and oblige us
thereby.
Thanking you,
On behalf of the entire group,

________________________
Tanjina Alam Jhumur

ACKNOWLEDGEMENT

Acknowledging the debt is not easy for us as we are indebted to so many people. We take
this opportunity to express the fact that this project report is the result of an
incredible amount of encouragement, co-operation, and moral support that we have
received from others.

At first we would like to express our deepest gratitude to Allah for giving us the
strength and the composure to finish the task within the scheduled time.

We would also like to express our gratitude from the core of our heart to our mentor
Dewan Muktadir-Al-Mukit, Assistant Professor of the Faculty of Business Administration
of Eastern University, who helped us in coordinating our entire report. His consistent
support and cooperation showed the way towards the successful completion of the report.

And finally, we would like to say that we have tried heart and soul to prepare this
report accurately. However, there might be some errors and silly mistakes due to our
limited aptitude and time constraints. In this regard, we seek your kind consideration,
as we are in the process of learning.


EXECUTIVE SUMMARY

The report analyzes multiple regression and the correlation matrix. We collected our data
from primary sources, namely a survey.
The introduction part of the report introduces multiple regression and the correlation
matrix. It also provides the objectives of the report, the methods of data collection and
the limitations of the study.
The findings & analysis part of the report presents the data that we collected from the
survey. The analysis part covers the following:

Find the equation of regression and interpret it.

Estimate the relationship among the variables in relative terms

Assess the explanatory power of the independent variables

Assess the significance of the results

Ascertain whether there is a problem of Multicollinearity.

Assess Mean, Median, Mode

Assess variance, geometric mean & standard deviation


Table of Contents
LETTER OF TRANSMITTAL
ACKNOWLEDGEMENT
EXECUTIVE SUMMARY
Chapter One: Introduction
    1.1 Introduction
    1.2 Origin of the report
    1.3 Objective of the report
    1.4 Methodology of data collection
    1.5 Limitations of the study
Chapter Two: Overview of Multiple Regression & Correlation Coefficient
    Multiple Regression
    Multiple Regression Correlation Coefficient
    Assumptions
Chapter Three: Findings & Analysis
    Findings:
        3.1 DATA File (House Rent Survey)
    Analysis:
    Part One: Multiple Regression & Correlation Matrix
        3.2 Summary Output
        3.3 Interpretation
            3.3.1 Developing the hypothesis
            3.3.2 Multiple Regression Equation
            3.3.3 Assessing the relationship in relative terms
            3.3.4 Assessing the Explanatory Power of Independent Variables
            3.3.5 Assessing the significance of the result
            3.3.6 Assessing the problem of Multicollinearity
    Part Two: Mean, Median & Mode
        3.4 Frequency, Relative Frequency, Cumulative frequency
        3.5 Mean
        3.6 Arranging data from smallest to largest
        3.7 Median
        3.8 Mode
        3.9 Mean Deviation
        3.10 Population Variance & Standard Deviation
        3.11 Geometric Mean
Chapter Four: Conclusion
    4.1 Conclusion
APPENDIX:
    Survey Data

Chapter One: Introduction

1.1 Introduction

Multiple regression is a flexible method of data analysis that may be appropriate whenever a
quantitative variable (the dependent or criterion variable) is to be examined in relationship to
any other factors (expressed as independent or predictor variables). Relationships may be
nonlinear, independent variables may be quantitative or qualitative, and one can examine the
effects of a single variable or multiple variables with or without the effects of other variables
taken into account.
Correlation and regression analysis are related in the sense that both deal with relationships
among variables. The correlation coefficient is a measure of linear association between two
variables. Values of the correlation coefficient are always between -1 and +1. A correlation
coefficient of +1 indicates that two variables are perfectly related in a positive linear sense; a
correlation coefficient of -1 indicates that two variables are perfectly related in a negative
linear sense, and a correlation coefficient of 0 indicates that there is no linear relationship
between the two variables. For simple linear regression, the sample correlation coefficient is
the square root of the coefficient of determination, with the sign of the correlation coefficient
being the same as the sign of b1, the coefficient of x1 in the estimated regression equation.
Neither regression nor correlation analyses can be interpreted as establishing cause-and-effect
relationships. They can indicate only how or to what extent variables are associated with each
other. The correlation coefficient measures only the degree of linear association between two
variables. Any conclusions about a cause-and-effect relationship must be based on the
judgment of the analyst.

1.2 Origin of the report

This report has been prepared as a study of Multiple Regression & Correlation
Matrix, as a part of the Business Statistics II course required for the BBA program of the
Faculty of Business Administration of Eastern University.

The report was prepared under the supervision of Dewan Muktadir-Al-Mukit, Assistant
Professor of the Faculty of Business Administration of Eastern University. We are very
thankful to him for assigning us this type of report work.

1.3 Objective of the report

Every piece of work has objectives to fulfill, and this report is no exception. The
following are the straightforward objectives which we have tried to fulfill in the report:

Develop the hypothesis

Find the equation of regression and interpret it.

Estimate the relationship among the variables in relative terms

Assess the explanatory power of the independent variables

Assess the significance of the results

Ascertain whether there is a problem of Multicollinearity.

Assess Mean, Median, Mode

Assess variance, geometric mean etc

1.4 Methodology of data collection

For smooth and accurate data collection, we had to follow some rules and
regulations. The report inputs were collected from primary sources, namely a survey.

1.5 Limitations of the study

2. Due to time constraints, it was not possible to study the topic in depth.
3. Relevant data were not available from secondary sources.
4. Since financial matters are sensitive in nature, such data could not be acquired easily.

Chapter Two: Overview of Multiple Regression & Correlation Coefficient

2.1 Multiple Regression
Multiple regression is a statistical technique to understand the relationship between one
dependent variable and several independent variables.
The purpose of multiple regression is to find a linear equation that best determines the
value of the dependent variable Y for different values of the independent variables in X.
The basic equation of Multiple Regression is
Y = a + b1X1 + b2X2 + b3X3 + ... + bNXN
The value of b1 is the slope of Y against X1 when the other independent variables are held
constant. The same is the case with b2, b3 and so on. These values are chosen to minimize
the sum of squared differences between the actual and expected values of Y (the method of
least squares). How closely the expected values match the actual ones is summarized by
another measure called the Coefficient of Multiple Determination (R2), whose value can
range from 0 (no linear relationship between the Xi and Y) to 1 (perfect linear
relationship between the Xi and Y).
As a rule of thumb, independent variables that add little to R2 are candidates to be left
out of the equation of multiple regression.
There are mainly two uses of the Multiple Regression Equation.
1. For prediction: Here the regression equation is used to predict the value of the
dependent variable Y for different values of the independent variables in X.
2. For identification of causes: Here the regression equation is used to find the nature of
the relationship between the dependent variable and the independent variables, that is,
how the dependent variable changes according to changes in the independent variables.
As noted in Section 1.1, such a relationship shows association and does not by itself
establish cause and effect.
Suppose we want to determine the various factors affecting the short-listing criteria for
the interview of a renowned organization. Suppose the cumulative score (Y) is determined by
the graduation percentage (X1), participation in extra-curricular activities (X2), number
of state/national level competitions won (X3), positions of responsibility held (X4) etc.
Then the multiple regression function might be given by
Y = 1 + 0.5X1 + 0.2X2 + 0.15X3 + 0.15X4
where the coefficients 0.5, 0.2, 0.15, 0.15 are estimated jointly by least squares from
data on Y and all of the Xi. They generally differ from the slopes of separate simple
linear regressions of Y on each Xi.
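
To make the example equation above concrete, the short Python sketch below evaluates it for one candidate. The candidate's values (80 percent at graduation, 3 extra-curricular activities, 2 competitions won, 1 position of responsibility) are hypothetical and are not taken from this report; the function name is also introduced here only for illustration.

```python
# Coefficients of the illustrative equation
# Y = 1 + 0.5*X1 + 0.2*X2 + 0.15*X3 + 0.15*X4
INTERCEPT = 1.0
COEFFS = [0.5, 0.2, 0.15, 0.15]


def predicted_score(x):
    """Return the cumulative score Y for one candidate's predictor values."""
    return INTERCEPT + sum(b * xi for b, xi in zip(COEFFS, x))


# Hypothetical candidate (assumed values, for illustration only):
# X1 = 80, X2 = 3, X3 = 2, X4 = 1
candidate = [80, 3, 2, 1]
score = predicted_score(candidate)
print(round(score, 2))
```

Each coefficient is the change in the score for a one-unit change in its own variable while the others stay fixed, so raising X1 from 80 to 81 in this sketch would raise the score by 0.5.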

2.2 Multiple Regression Correlation Coefficient

R2, or the coefficient of determination, as it is also called, is a goodness-of-fit measure
for simple and multiple regression models. In multiple regression models, R2 represents how
much of the variation in the dependent variable the independent variables can explain.
Thus, R2 represents the explanatory power of a regression model.
In simple regression with one independent variable, R2 is simply the square of the
correlation coefficient.
In multiple linear regression with more than one explanatory variable, with the intercept
included, R2 is the square of the multiple correlation coefficient R.
In all the standard statistical solvers including MS Excel, R2 is provided in the output.
For example, an R2 of 74% from the solver output implies the regression model can explain
74% of the variation in the dependent variable. This indicates there may be room to refine
the regression model a bit.
However, a problem with R2 is that it never decreases as one simply plugs more independent
variables into the regression model, even if they do not increase the explanatory power of
the model. To address this problem, adjusted R2 (Ra2) is used (also provided by all
statistical solvers), defined below, which penalizes the number of independent variables.

Ra2 = 1 - (1 - R2)(n - 1) / (n - k - 1)

Where, n = no. of observations, k = no. of independent variables.
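
As a quick check of the adjusted R2 definition, the following Python sketch applies the standard formula Ra2 = 1 - (1 - R2)(n - 1)/(n - k - 1) to the R Square and number of observations reported in the Summary Output of Section 3.2. The function name is an assumption made for this illustration.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R Square = 0.650553802, Observations = 30, two independent variables
# (values as reported in the Summary Output, Section 3.2).
value = adjusted_r2(0.650553802, n=30, k=2)
print(round(value, 4))
```

The result agrees with the Adjusted R Square of about 0.6247 shown in the Summary Output.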

2.3 Assumptions
The multiple regression technique does not test whether the data are linear. On the
contrary, it proceeds by assuming that the relationship between Y and each of the Xi is
linear. Hence, as a rule, it is prudent to always look at the scatter plots of (Y, Xi),
i = 1, 2, ..., k. If any plot suggests non-linearity, one may use a suitable transformation
to attain linearity.
Another important assumption is the absence of multicollinearity: the independent
variables are not strongly related among themselves. At a very basic level, this can be
checked by computing the correlation coefficient between each pair of independent variables.
Other assumptions include homoscedasticity and normality of the residuals.
Multiple regression analysis is used when one is interested in predicting a continuous
dependent variable from a number of independent variables. If the dependent variable is
dichotomous, then logistic regression should be used.

Chapter Three: Findings & Analysis

Findings:

3.1 DATA File (House Rent Survey)

Serial No.   House Rent (Taka) (Y)   No. of Rooms (X1)   Area in Square Feet (X2)
1.           11000                   3                   800
2.           15000                   4                   1050
3.           10500                   3                   750
4.           12000                   3                   900
5.           15000                   4                   1200
6.           16500                   4                   1250
7.           8500                    3                   950
8.           14000                   4                   1100
9.           30000                   3                   2500
10.          25000                   3                   1150
11.          18000                   3                   1200
12.          11500                   2                   650
13.          18000                   3                   1000
14.          13000                   3                   1066
15.          22000                   3                   1163
16.          40000                   4                   2400
17.          27000                   3                   1650
18.          45000                   3                   1750
19.          60000                   3                   1600
20.          55000                   3                   1560
21.          60000                   3                   2300
22.          40000                   3                   1600
23.          70000                   3                   1800
24.          65000                   3                   2130
25.          60000                   3                   2200
26.          40000                   3                   1350
27.          38000                   3                   1300
28.          80000                   4                   2300
29.          35000                   3                   1400
30.          70000                   3                   2000

Analysis:
Part One: Multiple Regression & Correlation Matrix
3.2 Summary Output

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.806569155
R Square              0.650553802
Adjusted R Square     0.624668899
Standard Error        13352.80393
Observations          30

ANOVA
              df    SS             MS            F              Significance F
Regression    2     8962137604     4481068802    25.13255655    0.00000068
Residual      27    4814029062     178297373
Total         29    13776166667

Coefficient table
                Coefficients    Standard Error   t Stat        P-value        Lower 95%     Upper 95%
Intercept       8222.577181     17614.1569       0.46681639    0.644376317    -27918.687    44363.84141
X Variable 1    -7421.32255     5470.176413      -1.35669      0.186116258    -18645.197    3802.552217
X Variable 2    33.65967748     4.747771671      7.08957376    1.26982E-07    23.918055     43.40130017

Here X Variable 1 is No. of Rooms (X1) and X Variable 2 is Area in Square Feet (X2). Each
t Stat equals the coefficient divided by its standard error.
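
As a cross-check on the Summary Output, the following is a minimal Python sketch that refits the two-variable regression from the 30 survey observations by solving the least-squares normal equations directly. It assumes that the data listed in Section 3.1 are exactly the values that were used to produce the output; the helper names (mean, cross_dev) are introduced here for illustration and are not part of the report, which used a spreadsheet solver.

```python
# House rent survey data from Section 3.1 (30 observations).
rent = [11000, 15000, 10500, 12000, 15000, 16500, 8500, 14000, 30000, 25000,
        18000, 11500, 18000, 13000, 22000, 40000, 27000, 45000, 60000, 55000,
        60000, 40000, 70000, 65000, 60000, 40000, 38000, 80000, 35000, 70000]
rooms = [3, 4, 3, 3, 4, 4, 3, 4, 3, 3, 3, 2, 3, 3, 3,
         4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3]
area = [800, 1050, 750, 900, 1200, 1250, 950, 1100, 2500, 1150,
        1200, 650, 1000, 1066, 1163, 2400, 1650, 1750, 1600, 1560,
        2300, 1600, 1800, 2130, 2200, 1350, 1300, 2300, 1400, 2000]


def mean(values):
    return sum(values) / len(values)


def cross_dev(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v)) over all observations."""
    mu, mv = mean(u), mean(v)
    return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))


# Centered sums of squares and cross-products.
s11, s22, s12 = cross_dev(rooms, rooms), cross_dev(area, area), cross_dev(rooms, area)
s1y, s2y, syy = cross_dev(rooms, rent), cross_dev(area, rent), cross_dev(rent, rent)

# Solve the two normal equations for the slopes, then back out the intercept.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
a = mean(rent) - b1 * mean(rooms) - b2 * mean(area)

# Explained variation and goodness of fit.
ss_regression = b1 * s1y + b2 * s2y
r_square = ss_regression / syy
n, k = len(rent), 2
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.4f}")
print(f"R Square = {r_square:.4f}, Adjusted R Square = {adj_r_square:.4f}")
```

Solving the normal equations by hand is practical only because there are two independent variables; with more variables a matrix routine or a statistical package would be the usual choice.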

3.3 Interpretation

3.3.1 Developing the hypothesis:

Ho1 = There is no association between House Rent & No. of Rooms
Ho2 = There is no association between House Rent & Area in Square feet
H1 = There is an association of House Rent with No. of Rooms & Area in Square feet
3.3.2 Multiple Regression Equation:

The required equation is, y` = a + b1X1 + b2X2
Where, Y = Dependent Variable (House Rent)
X1 = Independent Variable-1 (No. of Rooms)
X2 = Independent Variable-2 (Area in Square feet)
From the coefficient table we find the values of a, b1 and b2. The desired equation can be
written as:

y` = 8222.577 - 7421.323X1 + 33.66X2

b1 = -7421.323 indicates that when the number of rooms increases by 1 room, the mean
house rent decreases by Taka 7421.323 while the other variable is held constant.
b2 = 33.66 indicates that when the area increases by 1 square foot, the mean house rent
increases by Taka 33.66 while the other variable is held constant.
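
The interpretation of the slopes can be illustrated with a small Python sketch that plugs values into the fitted equation, using the coefficients from the coefficient table in Section 3.2. The flat sizes used (3 or 4 rooms, 1500 or 1501 square feet) are hypothetical inputs chosen for illustration, and the function name is an assumption.

```python
# Coefficients from the coefficient table in Section 3.2.
A = 8222.577181     # intercept
B1 = -7421.32255    # slope for number of rooms (X1)
B2 = 33.65967748    # slope for area in square feet (X2)


def predict_rent(rooms, area_sqft):
    """Predicted monthly house rent in Taka from the fitted equation."""
    return A + B1 * rooms + B2 * area_sqft


# Hypothetical flats (assumed inputs, not survey records).
rent_3_rooms = predict_rent(3, 1500)
rent_4_rooms = predict_rent(4, 1500)
rent_extra_sqft = predict_rent(3, 1501)

print(round(rent_3_rooms, 2))
print(round(rent_4_rooms - rent_3_rooms, 2))     # change for one more room
print(round(rent_extra_sqft - rent_3_rooms, 2))  # change for one more square foot
```

The two differences printed at the end reproduce b1 and b2, which is what "while the other variable is held constant" means in practice.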
3.3.3 Assessing the relationship in relative terms:

The relationship among the variables in relative terms can be estimated with the help of
the coefficient of multiple correlation (R).
R = 0.81 indicates that there is a strong linear relationship between House Rent and the
two independent variables (Number of Rooms & Area in Square Feet) taken together.
3.3.4 Assessing the Explanatory Power of Independent Variables:

The explanatory power of the independent variables can be assessed with the help of the
coefficient of multiple determination (R2) [Adjusted].
Adjusted R2 = 0.62 indicates that around 62 percent of the variation in the dependent
variable (House Rent) can be explained by the variation in the independent variables
(No. of Rooms & Area in Square feet).

3.3.5 Assessing the significance of the result:

If the p-value of a slope coefficient is equal to or less than 5% (0.05), then the
relationship between the dependent variable and that independent variable is statistically
significant.
If the significance of F (p-value of F) is equal to or less than 5% (0.05), then the overall
model is statistically significant. [ANOVA Table]
From the coefficient table, we can say that the slope coefficient of no. of rooms is not
statistically significant at the 5% level [0.186 > 0.05].
So, we cannot reject our null hypothesis Ho1. That means there is no statistically
significant association between house rent & number of rooms.
From the coefficient table, we can say that the slope coefficient of area in square feet is
statistically significant at the 5% level [0.00000013 < 0.05].
So, we can reject our null hypothesis Ho2. That means there is a positive and significant
relationship between Area in Square Feet & House Rent.
From the ANOVA table, the F statistic implies that the overall regression model is
statistically significant at the 5% level (0.00000068 < 0.05). So, taken as a whole, the
regression equation fits the data significantly better than a model with no independent
variables.
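
The decision rule above can be sketched in a few lines of Python. The sketch recomputes the mean squares, the F statistic and the standard error from the sums of squares in the ANOVA table, then compares the reported p-values with the 5% level. The p-values themselves are copied from the Summary Output rather than recomputed, and the variable names are assumptions made for this illustration.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table (Section 3.2).
ss_regression, df_regression = 8962137604, 2
ss_residual, df_residual = 4814029062, 27

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual   # F = MS Regression / MS Residual
std_error = math.sqrt(ms_residual)     # "Standard Error" of the regression

# Reported p-values, compared with the 5% significance level.
ALPHA = 0.05
p_values = {
    "No. of Rooms (X1)": 0.186116258,
    "Area in Square Feet (X2)": 1.26982e-07,
    "Overall model (Significance F)": 0.00000068,
}
significant = {name: p <= ALPHA for name, p in p_values.items()}

print(round(f_stat, 4), round(std_error, 2))
for name, flag in significant.items():
    print(name, "significant" if flag else "not significant")
```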
3.3.6 Assessing the problem of Multicollinearity:

i) Multicollinearity is suspected when the correlation coefficient between the independent
variables is higher than the coefficient between the dependent variable and an
independent variable.
r (X1,X2) = 0.184 > r (Y,X1) = -0.00629        Multicollinearity
r (X1,X2) = 0.184 < r (Y,X2) = 0.791664        No Multicollinearity

ii) Multicollinearity is not a concern when the correlation coefficient between the
independent variables is less than 0.80.
r (X1,X2) = 0.184 < 0.80        No Multicollinearity

Overall there is no multicollinearity problem

Correlation matrix:

                             House Rent (Y)   No. of Rooms (X1)   Area in Square Feet (X2)
House Rent (Y)               1
No. of Rooms (X1)            -0.00629         1
Area in Square Feet (X2)     0.791664         0.183698            1
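
The entries of the correlation matrix can be reproduced with a short Python sketch that computes the Pearson correlation coefficient for each pair of variables and then applies the 0.80 rule from point ii). It assumes the Section 3.1 data are the values behind the matrix; the function name pearson_r is introduced here for illustration.

```python
import math

# House rent survey data from Section 3.1 (30 observations).
rent = [11000, 15000, 10500, 12000, 15000, 16500, 8500, 14000, 30000, 25000,
        18000, 11500, 18000, 13000, 22000, 40000, 27000, 45000, 60000, 55000,
        60000, 40000, 70000, 65000, 60000, 40000, 38000, 80000, 35000, 70000]
rooms = [3, 4, 3, 3, 4, 4, 3, 4, 3, 3, 3, 2, 3, 3, 3,
         4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3]
area = [800, 1050, 750, 900, 1200, 1250, 950, 1100, 2500, 1150,
        1200, 650, 1000, 1066, 1163, 2400, 1650, 1750, 1600, 1560,
        2300, 1600, 1800, 2130, 2200, 1350, 1300, 2300, 1400, 2000]


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)


r_y_x1 = pearson_r(rent, rooms)
r_y_x2 = pearson_r(rent, area)
r_x1_x2 = pearson_r(rooms, area)

# Rule ii): flag multicollinearity when |r(X1, X2)| reaches 0.80.
THRESHOLD = 0.80
multicollinearity = abs(r_x1_x2) >= THRESHOLD

print(round(r_y_x1, 5), round(r_y_x2, 6), round(r_x1_x2, 6))
print("Multicollinearity" if multicollinearity else "No Multicollinearity")
```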


Part Two: Mean, Median & Mode

3.4 Frequency, Relative Frequency, Cumulative frequency

2^K > n
2^5 (32) > 28, so number of classes, K = 5

Class interval, I >= (H - L) / K
or, I >= (9 - 4) / 5
or, I >= 1

So take interval = 1
Exclusive Method:

Class     Tally Bar    Frequency   Relative Frequency   Cumulative frequency
4-5       IIIIIIIII    9           0.32                 9
5-6       IIIIIIII     8           0.29                 17 (9+8)
6-7       IIIII        5           0.18                 22
7-8       IIIII        5           0.18                 27
8-9                    0           0                    27
9-10      I            1           0.036                28
Total:                 28
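
The frequency table can be rebuilt from the raw values with the Python sketch below. It uses the 28 observations of No. of Family Members listed in Section 3.9 and the exclusive method with a class interval of 1, so a value x is counted in the class x to x + 1. The variable names are assumptions made for this illustration.

```python
from collections import Counter

# No. of family members for the 28 surveyed households (Section 3.9).
members = [5, 4, 7, 4, 7, 5, 6, 5, 5, 4, 7, 5, 4, 7,
           5, 4, 6, 7, 4, 6, 6, 4, 4, 5, 6, 4, 5, 9]
n = len(members)
counts = Counter(members)

# Exclusive classes of width 1: the lower limit is included, the upper is not.
rows = []
cumulative = 0
for lower in range(min(members), max(members) + 1):
    freq = counts.get(lower, 0)
    cumulative += freq
    rows.append((f"{lower}-{lower + 1}", freq, round(freq / n, 3), cumulative))

print("Class\tFrequency\tRelative\tCumulative")
for row in rows:
    print(*row, sep="\t")
```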

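3.5 Mean

Mean = (Sum of all values) / n = 150 / 28 = 5.36 (approximately)
The deviation tables in Sections 3.9 and 3.10 use the mean rounded to 5.

Data Range = H - L = 9 - 4 = 5

3.6 Arranging data from smallest to largest:

4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 9

3.7 Median:

With n = 28 values, the median is the average of the 14th and 15th values in the ordered
data: (5 + 5) / 2 = 5.

3.8 Mode:

Mode is 4 (highest occurrence, 9 times)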
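
The three measures of central tendency can be checked with Python's standard statistics module, as in the minimal sketch below. The data are the 28 values of No. of Family Members from Section 3.9; the variable names are assumptions.

```python
import statistics

# No. of family members for the 28 surveyed households (Section 3.9).
members = [5, 4, 7, 4, 7, 5, 6, 5, 5, 4, 7, 5, 4, 7,
           5, 4, 6, 7, 4, 6, 6, 4, 4, 5, 6, 4, 5, 9]

mean_value = statistics.mean(members)      # sum of values divided by n
median_value = statistics.median(members)  # average of the two middle values
mode_value = statistics.mode(members)      # most frequent value
data_range = max(members) - min(members)   # H - L

print(round(mean_value, 2), median_value, mode_value, data_range)
```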

3.9 Mean Deviation

No. of Family Members (X)    X - Mean     |X - Mean|
5                            0 (5-5)      0
4                            -1 (4-5)     1
7                            2            2
4                            -1           1
7                            2            2
5                            0            0
6                            1            1
5                            0            0
5                            0            0
4                            -1           1
7                            2            2
5                            0            0
4                            -1           1
7                            2            2
5                            0            0
4                            -1           1
6                            1            1
7                            2            2
4                            -1           1
6                            1            1
6                            1            1
4                            -1           1
4                            -1           1
5                            0            0
6                            1            1
4                            -1           1
5                            0            0
9                            4            4
Total=                                    28

Mean Deviation = Total / n = 28 / 28 = 1

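3.10 Population Variance & Standard Deviation

No. of Family Members (X)    (X - Mean)^2
5                            0
4                            1
7                            4
4                            1
7                            4
5                            0
6                            1
5                            0
5                            0
4                            1
7                            4
5                            0
4                            1
7                            4
5                            0
4                            1
6                            1
7                            4
4                            1
6                            1
6                            1
4                            1
4                            1
5                            0
6                            1
4                            1
5                            0
9                            16
Total=                       50

Population Variance = Total / N = 50 / 28 = 1.79 (approximately)
Population Standard Deviation = square root of 1.79 = 1.34 (approximately)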
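
The dispersion measures can be verified with the Python sketch below. It follows the report's tables by taking deviations from the mean rounded to 5, and, as a comparison that is not part of the original report, also computes the population variance about the unrounded mean of 150/28. The variable names are assumptions.

```python
import math

# No. of family members for the 28 surveyed households (Section 3.9).
members = [5, 4, 7, 4, 7, 5, 6, 5, 5, 4, 7, 5, 4, 7,
           5, 4, 6, 7, 4, 6, 6, 4, 4, 5, 6, 4, 5, 9]
n = len(members)

# The report's deviation tables use the mean rounded to 5.
ROUNDED_MEAN = 5
mean_deviation = sum(abs(x - ROUNDED_MEAN) for x in members) / n
variance_rounded = sum((x - ROUNDED_MEAN) ** 2 for x in members) / n
sd_rounded = math.sqrt(variance_rounded)

# For comparison only: population variance about the exact mean (150 / 28).
exact_mean = sum(members) / n
variance_exact = sum((x - exact_mean) ** 2 for x in members) / n

print(mean_deviation, round(variance_rounded, 2), round(sd_rounded, 2))
print(round(variance_exact, 2))
```

Because the sum of squared deviations is smallest about the exact mean, the second variance is slightly lower than the one obtained with the rounded mean.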

3.11 Geometric Mean

Geometric Mean = n-th root of the product of all n values (n = 28)
= 5.2
Hence, the geometric mean of the no. of family members will be approximately 5.
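
The geometric mean can be checked with the short Python sketch below. Instead of multiplying all 28 values and taking the 28th root, it averages the natural logarithms and exponentiates, which is the same quantity computed in a numerically safer way. The variable names are assumptions.

```python
import math

# No. of family members for the 28 surveyed households (Section 3.9).
members = [5, 4, 7, 4, 7, 5, 6, 5, 5, 4, 7, 5, 4, 7,
           5, 4, 6, 7, 4, 6, 6, 4, 4, 5, 6, 4, 5, 9]

# Geometric mean = exp(mean of the logs) = n-th root of the product.
log_mean = math.fsum(math.log(x) for x in members) / len(members)
geometric_mean = math.exp(log_mean)

print(round(geometric_mean, 1))
```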


Chapter Four: Conclusion

4.1 Conclusion

Multiple regression analysis is a powerful technique used for predicting the unknown value
of a variable from the known values of two or more variables, also called the predictors.
Multiple regression is a statistical tool used to derive the value of a criterion from
several other independent, or predictor, variables. It is the simultaneous combination of
multiple factors to assess how and to what extent they affect a certain outcome.
This technique breaks down when the factors themselves are un-measurable or of a
pure-chance nature.
In regression analysis, the problem of interest is the nature of the relationship itself
between the dependent variable (response) and the (explanatory) independent variables.
The analysis consists of choosing and fitting an appropriate model, done by the method of
least squares, with a view to exploiting the relationship between the variables to help
estimate the expected response for given values of the independent variables. For example,
if we are interested in the effect of age on height, then by fitting a regression line, we
can predict the height for a given age.
The observations are assumed to be independent. For correlation, both variables should be
random variables, but for regression only the dependent variable Y must be random. In
carrying out hypothesis tests, the response variable should follow a normal distribution
and the variability of Y should be the same for each value of the predictor variable. A
scatter diagram of the data provides an initial check of the assumptions for regression.


APPENDIX:
Survey Data

Serial No: 1
Area: Hazaribag
Rent: 11000 Taka per month
Size: 800 square feet
Rooms: 3 rooms

Serial No: 2
Area: Hazaribag
Rent: 15000 Taka per month
Size: 1050 square feet
Rooms: 4 rooms

Serial No: 3
Area: Hazaribag
Rent: 10500 Taka per month
Size: 750 square feet
Rooms: 3 rooms

Serial No: 4
Area: Dhanmondi
Rent: 12000 Taka per month
Size: 900 square feet
Rooms: 3 rooms

Serial No: 5
Area: Hazaribag
Rent: 15000 Taka per month
Size: 1200 square feet
Rooms: 4 rooms

Serial No: 6
Area: Hazaribag
Rent: 16500 Taka per month
Size: 1250 square feet
Rooms: 4 rooms

Serial No: 7
Area: Hazaribag
Rent: 8500 Taka per month
Size: 950 square feet
Rooms: 3 rooms

Serial No: 8
Area: Hazaribag
Rent: 14000 Taka per month
Size: 1100 square feet
Rooms: 4 rooms

Serial No: 9
Area: Dhanmondi
Rent: 30000 Taka per month
Size: 2500 square feet
Rooms: 3 rooms

Serial No: 10
Area: Dhanmondi
Rent: 25000 Taka per month
Size: 1150 square feet
Rooms: 3 rooms

Serial No: 11
Area: Hazaribag
Rent: 18000 Taka per month
Size: 1200 square feet
Rooms: 3 rooms

Serial No: 12
Area: Dhanmondi
Rent: 11500 Taka per month
Size: 650 square feet
Rooms: 2 rooms

Serial No: 13
Area: Dhanmondi
Rent: 18000 Taka per month
Size: 1000 square feet
Rooms: 3 rooms

Serial No: 14
Area: Hazaribag
Rent: 13000 Taka per month
Size: 1066 square feet
Rooms: 3 rooms

Serial No: 15
Area: Dhanmondi
Rent: 22000 Taka per month
Size: 1163 square feet
Rooms: 3 rooms

Serial No: 16
Area: 114 b gikatola 3rd floor monswor road bhaka dhanmondi area
Rent: 40,000 Taka per month
Size: 2,400 square feet
Rooms: 4 rooms

Serial No: 17
Area: 109/3/a home. 9/a road. 4/a flat. Dhanmondi dreams. west Dhanmondi (behind of
Dhanmondi party centre)
Rent: 27,000 Taka per month
Size: 1650 square feet (scft)
Rooms: 3 bed room, 1 dining room, 1 sitting room, 3 toilet + 1 servants toilet, 1 kitchen,
2 balcony, 1 garage

Serial No: 18
Area: Banani
Rent: 45,000 Taka per month
Size: 1750 Square Feet
Rooms: 3 rooms

Serial No: 19
Area: Banani
Rent: 60,000 Taka per month
Size: 1600 Square Feet
Rooms: 3 rooms

Serial No: 20
Area: Banani
Rent: 55,000 Taka per month
Size: 1560 Square Feet
Rooms: 3 rooms

Serial No: 21
Area: Banani (BAN-113)
Rent: 60,000 Taka per month
Size: 2300 Square Feet
Rooms: 3 rooms

Serial No: 22
Area: Banani (BAN1572)
Rent: 40,000 Taka per month
Size: 1600 Square Feet
Rooms: 3 rooms

Serial No: 23
Area: Banani (BAN1394)
Rent: 70,000 Taka per month
Size: 1800 Square Feet
Rooms: 3 rooms

Serial No: 24
Area: Banani BAN1136
Rent: 65,000 Taka per month
Size: 2130 Square Feet
Rooms: 3 rooms

Serial No: 25
Area: Banani
Rent: 60,000 Taka per month
Size: 2200 Square Feet
Rooms: 3 rooms

Serial No: 26
Area: Banani
Rent: 40,000 Taka per month
Size: 1350 Square Feet
Rooms: 3 rooms

Serial No: 27
Area: Banani
Rent: 38,000 Taka per month
Size: 1300 Square Feet
Rooms: 3 rooms

Serial No: 28
Area: Banani (BAN-9)
Rent: 80,000 Taka per month
Size: 2300 Square Feet
Rooms: 4 rooms

Serial No: 29
Area: Banani (BAN-59)
Rent: 35,000 Taka per month
Size: 1400 Sq Feet
Rooms: 3 rooms

Serial No: 30
Area: Banani (BAN-6)
Rent: 70,000 Taka per month
Size: 2000 Square Feet
Rooms: 3 rooms
