House Pricing Prediction System

Shrey Singh (1703210146)
Satyam Chauhan (1703210133)
Shardool Veer Vikram (1703210134)
Shivam Patel (1703210140)
Under the guidance of Prof. Santosh Kumar
Content
• Project Introduction
• Motivation
• Project Objective
• Scope of the Project
• Literature Survey
• Requirements
• Proposed Method
• System Design
• Implementation
• Conclusion
• Deliverables
• Stakeholders
• Gantt Chart
• References
Project Introduction
Problems faced when buying a house:
1) Buying a house is stressful.
2) Buyers are generally unaware of the factors that influence house prices.
3) Buyers face many problems during the purchase.
4) Real estate agents are therefore trusted to handle communication between buyers and sellers and to draw up the legal contract for the transfer. This adds a middleman and increases the cost of the house.
Motivation
• A house price prediction model can be an important tool for both sellers and buyers, as it helps them make well-informed decisions.
• For sellers, it may help determine the average price at which to list their house; for buyers, it may help find the right average price at which to purchase one.
Project Objective
• To predict house prices according to the area.
• To calculate house prices based on the surrounding environment, such as railway stations, hospitals, ATMs, colleges and banks, so that customers can purchase a flat or house with full facilities.
• To suggest predicted prices to builders for new constructions.
• To provide customers with a comparison of house prices.
Scope of the Project
• Traditional house price prediction is based on cost and sale price comparison, and lacks an accepted standard and a certification process.
• Therefore, the availability of a house price prediction model helps fill an important information gap and improves the efficiency of the real estate market.
Literature Survey
• In [1], the authors use the city of Oslo as a case study to find the difference in explanatory power of absolute versus relative location in a hedonic model.
• Their main finding is that the power of postcode dummies diminishes significantly when relative location variables, such as walking distance to places like parks, ATMs and the metro, are introduced.
• In particular, walking distance to key locations and more refined proximity measures for the types of amenities home buyers are likely to care about are considered, as these variables are found to have high explanatory power.
• This analysis shows that relative location is so important that a failure to incorporate it may severely affect the pursuit of providing high value neighbourhoods.
• In [2], an innovative software solution for analyzing and mapping real estate prices is presented. Data is collected systematically, and the software then analyzes it and assesses changes in the real estate market.
• It is concluded that the statistical comparisons the software produces enable professionals and researchers to gain insight into actual changes in real estate market prices in the Czech Republic.
• The output can be used by both individuals and companies to make appropriate investment decisions. The authors observed a long-term decrease in real estate prices since the second quarter of 2008.
• In [3], the predictive power of an artificial neural network model and a hedonic price model is compared.
• The results support previous findings that even if the R^2 of hedonic price models is high for in-sample forecasts, hedonic models still do not outperform neural network models.
• The hedonic models show poorer results on out-of-sample forecasts, especially when compared to neural network models.
• Hence, the empirical evidence presented in this paper supports the potential of neural networks for house price prediction.
• In [4], several methods, namely fuzzy logic, artificial neural networks and K-Nearest Neighbor, are compared to find the most appropriate one to use as a reference for determining land and house prices.
• To test the methods, the predictions are compared against real transaction prices using the mean absolute percentage error (MAPE).
• The results show that the fuzzy method is superior to neural networks and k-nearest neighbor for house price prediction when training data is limited.
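The MAPE comparison mentioned for [4] can be illustrated with a minimal, dependency-free sketch. The function name `mape` and the sample prices are hypothetical and are not taken from [4] or from this project's data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the true price.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical transaction prices and model predictions.
actual_prices = [200.0, 400.0, 500.0]
predicted_prices = [180.0, 440.0, 500.0]
print(round(mape(actual_prices, predicted_prices), 2))
```

A lower MAPE means the predictions are closer to the real transaction prices in relative terms. The metric is undefined when an actual price is zero.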
• In [5], a solution to the "House Prices: Advanced Regression Techniques" competition held on the Kaggle platform is described.
• The goal of the competition is to predict house prices from the factors given in the training data, such as lot area, lot size and house type. Many classic algorithms were used to find the solution.
• It is noted that the best way to improve results is to train more base models for the ensemble, which requires more computational resources.
• Another problem when ensembling various models is how to make the errors of these models uncorrelated; otherwise, no quality improvement is obtained.
• In [6], multiple algorithms, such as Simple Linear Regression (SLR), Multiple Linear Regression (MLR) and Neural Networks (NN), are compared, and the algorithm with the lowest Mean Square Error (MSE) is chosen for predicting the house price.
• From the results, it is concluded that the neural network has the better prediction ability.
• In [7], the application of predictive analytics in the field of real estate is surveyed. Many predictive techniques are used to forecast property prices.
• The authors conclude that the neural network works best, as it gives the highest accuracy and the lowest error of all the techniques, but working with neural networks is a trial-and-error process.
• The next best model is the decision tree. However, the decision tree gives only a binary output, which limits how precise and accurate its prediction results can be.
• In [8], prices are predicted based on the budgets and priorities of real estate customers.
• Future prices are predicted by analyzing previous market trends and price ranges, as well as new developments. Since the extracted data should be useful, the system makes optimal use of the linear regression algorithm.
• The system uses this large volume of data efficiently, and the linear regression algorithm helps satisfy customers by reducing the risk of investing in real estate and increasing the accuracy of their choice of property.
• In [9], a predictive model is constructed based on the factors that affect house prices.
• Many regression techniques are applied, such as multiple linear regression, Lasso and Ridge regression, support vector regression, and boosting algorithms such as Extreme Gradient Boost regression.
• The best performing model is picked by analyzing the prediction errors obtained across these models.
• When the evaluation metrics obtained for the advanced regression models are observed, it is concluded that the models behaved in the same manner.
Requirements
Software & Hardware Requirements
System Specifications:
o CPU - Intel Core i3/i5, 3.60 GHz
o RAM - 4/8 GB
o GPU - Intel / AMD / Nvidia
Implementation Details:
o Python 3 code
o PyCharm / Anaconda
o Machine learning algorithms
Proposed Method
In this project we follow the steps below:
Step 1:
First, we train our models after considering which models to use for prediction. In this project we consider the following models.
• Linear Regression: Linear regression is a supervised machine learning algorithm. It is used to predict values within a continuous range (e.g. sales, price) rather than to classify them into categories.
• Random Forest: Random forest is a flexible, easy-to-use machine learning algorithm that produces a good result most of the time, even without hyper-parameter tuning. It is also one of the most widely used algorithms because of its simplicity and diversity (it can be used for both classification and regression).
• Decision Tree: Decision tree learning is one of the predictive modelling approaches used in statistics, data mining and machine learning. It uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
• KNN: The k-nearest neighbors algorithm is a non-parametric machine learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in feature space.
• XGBoost Regressor: XGBoost is an algorithm that has recently been dominating applied machine learning and Kaggle competitions for structured or tabular data. XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
• Support Vector Machine: In machine learning, support-vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
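To make the KNN description above concrete, here is a minimal sketch of k-nearest-neighbor regression in plain Python. It is only an illustration under stated assumptions: the function `knn_predict`, the two features (area and bedrooms) and all values are hypothetical, and the project itself may rely on a library implementation instead.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a price as the mean target of the k nearest training rows."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k


# Hypothetical rows: (area in sq ft, bedrooms) mapped to a price.
train_X = [(1000, 2), (1100, 2), (1500, 3), (2000, 4)]
train_y = [50.0, 54.0, 75.0, 100.0]
print(knn_predict(train_X, train_y, (1050, 2), k=2))
```

Because the prediction depends on raw distances, features on very different scales (such as area versus bedroom count) are usually standardized first; the sketch skips that step for brevity.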
Step 2:
Next, we select the record to be run through the trained machine learning model, which then returns the outcome: the predicted sale price.
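The two steps can be sketched end to end with the simplest of the listed models. The example below fits a one-feature linear regression by ordinary least squares and then predicts the price of a selected record. The helper `fit_line`, the single feature (area) and all numbers are assumptions made for illustration, not the project's actual code or data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Step 1: train the model on hypothetical (area, price) pairs.
areas = [1000.0, 1500.0, 2000.0, 2500.0]
prices = [50.0, 70.0, 90.0, 110.0]
slope, intercept = fit_line(areas, prices)

# Step 2: pass a selected record through the trained model.
selected_area = 1800.0
predicted_price = slope * selected_area + intercept
print(predicted_price)
```

The same train-then-predict pattern applies to the other models; only the fitting step changes.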
System Design
[Figure: Process Flow Diagram]
Implementation
Steps involved:
● Importing the required packages into our Python environment.
● Importing the data set.
● Analysing and cleaning the data set.
● Building the data model.
● Evaluating the models using evaluation metrics (R-squared score, Mean Square Error, Mean Absolute Error).
● Results
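The three evaluation metrics named in the steps above can be computed as in the following sketch. It uses the standard definitions of R-squared, MSE and MAE; the function name `regression_metrics` and the sample values are hypothetical.

```python
def regression_metrics(actual, predicted):
    """Return R-squared, mean square error and mean absolute error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # R-squared compares the model's squared error with that of a
    # baseline that always predicts the mean of the actual values.
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant targets")
    ss_res = sum(e * e for e in errors)
    return {"r2": 1.0 - ss_res / ss_tot, "mse": mse, "mae": mae}


# Hypothetical actual and predicted prices.
metrics = regression_metrics(
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 190.0, 310.0, 390.0],
)
print(metrics)
```

A higher R-squared and lower MSE and MAE indicate a better fit. MAE is in the same unit as the price, while MSE is in squared units and penalizes large errors more heavily.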
Mean absolute error (MAE) by model and city
Model              Chennai   Delhi     Hyderabad  Kolkata   Mumbai    Bangalore
Linear Regression  4.67E+06  1.36E+07  2.21E+06   5.99E+06  1.01E+07  5.71E+06
Random Forest      3.87E+06  1.20E+07  1.65E+06   6.83E+06  9.18E+06  4.99E+06
SVM                4.70E+06  1.37E+07  4.71E+06   5.09E+06  9.48E+06  5.85E+06
XGBoost            3.80E+06  1.32E+07  1.51E+06   6.32E+06  8.76E+06  4.95E+06
Decision Tree      4.21E+06  1.37E+07  2.34E+06   5.66E+06  9.69E+06  5.32E+06
KNN                4.15E+06  1.28E+07  1.96E+06   6.20E+06  9.39E+06  5.40E+06
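The table can also be read programmatically. The sketch below transcribes the MAE values from the table and picks the model with the lowest MAE for each city; the variable names are illustrative only.

```python
from collections import Counter

cities = ["Chennai", "Delhi", "Hyderabad", "Kolkata", "Mumbai", "Bangalore"]

# MAE values transcribed from the table, in the same column order as `cities`.
mae = {
    "Linear Regression": [4.67e6, 1.36e7, 2.21e6, 5.99e6, 1.01e7, 5.71e6],
    "Random Forest": [3.87e6, 1.20e7, 1.65e6, 6.83e6, 9.18e6, 4.99e6],
    "SVM": [4.70e6, 1.37e7, 4.71e6, 5.09e6, 9.48e6, 5.85e6],
    "XGBoost": [3.80e6, 1.32e7, 1.51e6, 6.32e6, 8.76e6, 4.95e6],
    "Decision Tree": [4.21e6, 1.37e7, 2.34e6, 5.66e6, 9.69e6, 5.32e6],
    "KNN": [4.15e6, 1.28e7, 1.96e6, 6.20e6, 9.39e6, 5.40e6],
}

# For each city, select the model whose MAE is smallest.
best = {}
for i, city in enumerate(cities):
    best[city] = min(mae, key=lambda model: mae[model][i])

# Count how many cities each model wins.
wins = Counter(best.values())

for city, model in best.items():
    print(f"{city}: {model}")
print(wins.most_common())
```

By this measure XGBoost has the lowest MAE in four of the six cities (Chennai, Hyderabad, Mumbai and Bangalore), while Random Forest is lowest for Delhi and SVM for Kolkata.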

Conclusion
• After observing the resulting metrics for the various models, it can be concluded that XGBoost has the highest R^2 score. We can also check for outliers with the help of box plots, remove them if they exist, and then analyse the improvement in the performance of the model.
• We can also construct further models using advanced techniques such as particle swarm optimization or neural networks, which may improve the accuracy of predictions.
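The outlier check suggested above can be sketched with the rule that a standard box plot uses to draw its whiskers: points more than 1.5 times the interquartile range (IQR) beyond the quartiles are treated as outliers. The function name `remove_outliers_iqr` and the sample prices are hypothetical, and quartile conventions differ slightly between libraries.

```python
import statistics


def remove_outliers_iqr(values, factor=1.5):
    """Drop values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical prices with one obvious outlier.
prices = [48.0, 50.0, 52.0, 55.0, 57.0, 60.0, 400.0]
cleaned = remove_outliers_iqr(prices)
print(cleaned)
```

After filtering, the models would be retrained on the cleaned data and the metrics compared with the earlier results to see whether performance improves.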
Deliverables
The project will analyze the data, evaluate the R-squared score, mean square error (MSE) and mean absolute error (MAE), and determine the best house price for the customer.
Stakeholders
• Real Estate Market
• Companies such as Zillow and Trulia
• Banks
• Government Agencies
Gantt Chart
References
[1] A. Heyman and D. Sommervoll, "House prices and relative location", Cities, 2019.
[2] E. Hromada, "Mapping of real estate prices using data mining techniques", Czech Technical University, Czech Republic, 2015.
[3] V. Limsombunchai, C. Gan and M. Lee, "House price prediction: Hedonic price model vs. artificial neural network", American Journal of Applied Sciences, vol. 1, pp. 193-201, 2004.
[4] M. F. Mukhlishin, R. Saputra and A. Wibowo, "Predicting House Sale Price Using Fuzzy Logic, Artificial Neural Network and K-Nearest Neighbor", 1st International Conference on Informatics and Computational Sciences (ICICoS), vol. 1, pp. 171-176, 2017.
[5] P. V. Aleksandrovich, K. I. Leopoldovich and P. A. Viktorovich, "Predicting Sales Prices of the Houses Using Regression Methods of Machine Learning", 2018 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC), pp. 1-5, 2018.
[6] N. Vineeth, M. Ayyappa and B. Bharathi, "House Price Prediction Using Machine Learning Algorithms", in I. Zelinka, R. Senkerik, G. Panda and P. S. Lekshmi Kanthan (Eds.), Soft Computing Systems, Springer Singapore, pp. 425-433, 2018.
[7] N. Shinde and K. Gawande, "Survey on predicting property price", 2018 International Conference on Automation and Computational Engineering (ICACE), IEEE, pp. 1-7, October 2018.
[8] N. Bhagat, A. Mohorkar and S. Mane, "House Price Forecasting using Data Mining", International Journal of Computer Applications, 2016.
[9] J. Manasa, R. Gupta and N. S. Narahari, "Machine learning based predicting house prices using regression techniques", 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), pp. 624-630, 2020.
Thank You
