Real Estate Price Prediction Using Multiple Linear Regression

Jane Maria Jose

04/02/2022

INTRODUCTION:
Multiple linear regression (MLR), also known simply as multiple regression, is a statistical
technique that uses several explanatory variables to predict the outcome of a response
variable. It extends simple linear regression, which uses just one explanatory variable, and
its coefficients are usually estimated by ordinary least squares (OLS). MLR is used
extensively in econometrics and financial inference.
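
For reference, the general form of the model can be written as below. The notation (k regressors, n observations, design matrix X with a leading column of ones) is standard textbook notation and is assumed here rather than taken from the original report.

```latex
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i,
\qquad i = 1, \dots, n

\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```

The second line is the ordinary least squares estimator, which is what R's lm() computes for a model of this form.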

OBJECTIVE:
To predict real estate prices from three factors, namely house age, distance to the nearest
metro (MRT) station and number of convenience stores, using multiple linear regression. The
analysis has the following objectives:
1. To plot a scatter plot matrix of the variables of interest, compute the matrix of
correlation coefficients and interpret both.
2. To fit a multiple linear regression model and interpret the estimated coefficients.
3. To test the significance of regression parameters using the t-test and interpret it.

DATA DESCRIPTION:
The data consist of 4 variables: house age (X1), distance to the nearest MRT
station (X2), number of convenience stores (X3) and house price of unit area (Y).
library(readxl)
Real_estate_price <- read_excel("D:/JANE/CHRIST UNIVERSITY/TEXT AND NOTES/SEM 2/REGRESSION ANALYSIS/Real estate price.xlsx")
View(Real_estate_price)

head(Real_estate_price)

## # A tibble: 6 x 4
##   `house age` `distance to the neares~ `number of convenien~ `house price of un~
##         <dbl>                    <dbl>                 <dbl>               <dbl>
## 1        32                       84.9                    10                37.9
## 2        19.5                    307.                      9                42.2
## 3        13.3                    562.                      5                47.3
## 4        13.3                    562.                      5                54.8
## 5         5                      391.                      5                43.1
## 6         7.1                   2175.                      3                32.1

ANALYSIS:
#dependent and independent variable

Y=Real_estate_price$`house price of unit area`


X1=Real_estate_price$`house age`
X2=Real_estate_price$`distance to the nearest MRT station`
X3=Real_estate_price$`number of convenience stores`

#scatter plot matrix

pairs(Real_estate_price[1:4])

INTERPRETATION:
From the above matrix, there appears to be a roughly linear relationship between the
dependent variable and each of the independent variables.
#correlation matrix

round(cor(Real_estate_price),2)

##                                     house age
## house age                                1.00
## distance to the nearest MRT station      0.13
## number of convenience stores            -0.13
## house price of unit area                -0.38
##                                     distance to the nearest MRT station
## house age                                                          0.13
## distance to the nearest MRT station                                1.00
## number of convenience stores                                      -0.64
## house price of unit area                                          -0.71
##                                     number of convenience stores
## house age                                                  -0.13
## distance to the nearest MRT station                        -0.64
## number of convenience stores                                1.00
## house price of unit area                                    0.63
##                                     house price of unit area
## house age                                              -0.38
## distance to the nearest MRT station                    -0.71
## number of convenience stores                            0.63
## house price of unit area                                1.00

INTERPRETATION:
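From the correlations between the independent variables, house age is only weakly
correlated with the other two regressors (|r| = 0.13), while distance to the nearest MRT
station and number of convenience stores are moderately correlated (r = -0.64). This falls
short of the very high correlations (commonly |r| above about 0.8) that usually signal
severe multicollinearity, so the dataset is reasonably suitable for multiple linear
regression, although the overlap between X2 and X3 should be kept in mind when
interpreting their coefficients.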
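
As a rough check on the multicollinearity claim, variance inflation factors (VIFs) can be computed from the regressor correlations alone. The sketch below is not part of the original analysis: it copies the rounded correlations from the matrix above, and the short labels are chosen here for illustration.

```r
# Correlations among the three regressors, copied from the rounded matrix above.
regressors <- c("house age", "distance to MRT", "convenience stores")
R <- matrix(
  c( 1.00,  0.13, -0.13,
     0.13,  1.00, -0.64,
    -0.13, -0.64,  1.00),
  nrow = 3, byrow = TRUE,
  dimnames = list(regressors, regressors)
)

# The VIF of each regressor is the corresponding diagonal element
# of the inverse of the regressor correlation matrix.
vif <- diag(solve(R))
round(vif, 2)
```

With these inputs the VIFs come out at roughly 1.0 for house age and 1.7 for the other two regressors, well below the values of 5 or 10 that are often used as warning thresholds.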
#fitting the regression model

model=lm(Y~X1+X2+X3,data = Real_estate_price)
model

##
## Call:
## lm(formula = Y ~ X1 + X2 + X3, data = Real_estate_price)
##
## Coefficients:
## (Intercept)           X1           X2           X3
##   44.237200    -0.331483    -0.005072     1.358965

summary(model)
##
## Call:
## lm(formula = Y ~ X1 + X2 + X3, data = Real_estate_price)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -37.890  -5.311  -0.641   5.057  28.336
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.2372002  2.6798588  16.507  < 2e-16 ***
## X1          -0.3314827  0.0700275  -4.734 6.39e-06 ***
## X2          -0.0050723  0.0007666  -6.617 1.25e-09 ***
## X3           1.3589652  0.3707131   3.666 0.000376 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.625 on 114 degrees of freedom
## Multiple R-squared: 0.6274, Adjusted R-squared: 0.6176
## F-statistic: 63.98 on 3 and 114 DF, p-value: < 2.2e-16

INTERPRETATION:
The fitted regression model is: house price of unit area = 44.237 - 0.331 house age
- 0.005 distance to MRT station + 1.359 number of convenience stores. The intercept,
44.237, is the estimated average house price when all 3 independent variables are zero.
From the fitted model, holding the other variables constant, we can infer the following:
a 1 unit increase in house age decreases the house price by about 0.331 units, a 1 unit
increase in distance to the MRT station decreases the house price by about 0.005 units,
and a 1 unit increase in the number of convenience stores increases the house price by
about 1.359 units. The coefficient of determination (multiple R-squared) is 0.6274, with
an adjusted value of 0.6176, which implies that about 63% of the total variability in the
response variable is explained by the regressors.
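
To make the interpretation concrete, the sketch below plugs the reported coefficients into the fitted equation for one illustrative house and recomputes the adjusted R-squared from the reported R-squared. The helper predict_price and the example values are hypothetical and not part of the original analysis. The sample size n = 118 is inferred from the 114 residual degrees of freedom plus 4 estimated coefficients.

```r
# Coefficients copied from the summary(model) output.
b <- c(intercept = 44.2372002, age = -0.3314827,
       distance = -0.0050723, stores = 1.3589652)

# Hypothetical helper: predicted house price of unit area
# for given values of the three regressors.
predict_price <- function(age, distance, stores) {
  unname(b["intercept"] + b["age"] * age +
         b["distance"] * distance + b["stores"] * stores)
}

# Illustrative (made-up) house: age 10, distance 500 to the MRT station,
# 5 convenience stores, in the same units as the dataset.
example_price <- predict_price(age = 10, distance = 500, stores = 5)
example_price

# Adjusted R-squared recomputed from the reported multiple R-squared,
# with n observations and p regressors.
r2 <- 0.6274
n  <- 118
p  <- 3
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

The recomputed adjusted value agrees with the 0.6176 shown in the summary output, which confirms that 0.6274 is the unadjusted coefficient of determination.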
#testing the significance of regression parameters

#H0: the regression coefficient is zero, tested separately for each regressor, i.e.
#beta_j = 0 (j = 1, 2, 3)
#H1: the regression coefficient is not zero, i.e.
#beta_j != 0
#(the joint hypothesis beta1=beta2=beta3=0 is tested by the F-statistic in the summary)

# ttab= t(114,0.025) = 1.98

#t0 = |beta1|/S.E(beta1) --> test statistic (absolute value, from rounded estimates)

t0 = 0.331 / 0.07
t0 # The exact signed value (-4.734) can be read directly from the summary table

## [1] 4.728571
#t1 = |beta2|/S.E(beta2) --> test statistic
t1 = 0.005 / 0.0007 # rounded inputs; the summary table gives -6.617
t1

## [1] 7.142857

#t2 = |beta3|/S.E(beta3) --> test statistic

t2= 1.3589 / 0.370 # rounded inputs; the summary table gives 3.666
t2

## [1] 3.672703

INTERPRETATION:
Here the calculated values t0, t1 and t2 are all greater than the tabulated value of t
(1.98) at 114 degrees of freedom for a two-sided test at the 5% significance level (0.025
in each tail). We therefore reject the null hypothesis for each coefficient and conclude
that each of the independent variables X1, X2 and X3 has a significant linear effect on
the dependent variable Y.
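
The same tests can be reproduced without rounding by working from the estimates and standard errors in the summary table. This is a minimal sketch under that assumption; the variable names are chosen here for illustration. It also evaluates the p-value of the overall F-statistic, which is the test of the joint hypothesis beta1 = beta2 = beta3 = 0.

```r
# Estimates and standard errors copied from the summary(model) output.
est <- c(X1 = -0.3314827, X2 = -0.0050723, X3 = 1.3589652)
se  <- c(X1 =  0.0700275, X2 =  0.0007666, X3 = 0.3707131)
df  <- 114  # residual degrees of freedom

t_stat  <- est / se                   # t statistic for H0: beta_j = 0
t_crit  <- qt(0.975, df)              # two-sided critical value at the 5% level
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-values

data.frame(t_stat    = round(t_stat, 3),
           reject_H0 = abs(t_stat) > t_crit,
           p_value   = signif(p_value, 3))

# Joint test of beta1 = beta2 = beta3 = 0, using the reported F statistic
# on 3 and 114 degrees of freedom.
f_p_value <- pf(63.98, df1 = 3, df2 = df, lower.tail = FALSE)
f_p_value
```

Comparing abs(t_stat) with t_crit is equivalent to checking whether each two-sided p-value is below 0.05, so both routes lead to the same decision.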

CONCLUSION:
A multiple linear regression was performed to assess how house age, distance to MRT
station and number of convenience stores affect the house price of unit area. The
following conclusions are drawn:
1. From the scatter plot matrix, we can see a roughly linear relation between the
dependent variable and each of the independent variables. The correlation matrix shows at
most a moderate correlation between regressors (r = -0.64 between distance to MRT station
and number of convenience stores), which does not indicate severe multicollinearity.
2. The fitted regression model is: house price of unit area = 44.237 - 0.331 house
age - 0.005 distance to MRT station + 1.359 number of convenience stores.
3. The t-tests show that each regression coefficient is significantly different from zero,
i.e. beta1, beta2 and beta3 are each not equal to zero.
