Utkarsh Gupta_House price prediction
Chandradeep Bhatt
Computer Science and Engineering
Graphic Era Hill University
Dehradun, Uttarakhand
bhattchandradeep@gmail.com
Abstract— Everyone dreams of buying and living in their own house, one that suits their lifestyle and personality. The main factors people consider when looking at a house are its square footage, the number of bedrooms and bathrooms, its location, and the year it was built. This research helps buyers get an accurate price for a house that fits their budget and will not strain them financially. It also helps a person buy a house that matches his or her requirements. There are many machine learning algorithms available, such as gradient boosting regression, linear regression, polynomial regression, and random forest regression. After evaluating each algorithm, the one with the best accuracy will be chosen to predict the house price.

Keywords: ML algorithm, house price prediction model, regression techniques, gradient boosting regression, linear regression, polynomial regression

I. INTRODUCTION

Buying one's own house is one of the most important decisions a person makes in life. Everyone dreams of living in a house within their budget. The price of a house depends on a variety of factors, such as the number of bedrooms and bathrooms, the location of the house, whether it is ready to move in, and the year it was built. Without data we cannot train our model, which is why data is called the heart of a machine learning model. So we give the model information such as the location of the house, the number of bedrooms and bathrooms, and other amenities, so that it can predict the house price accurately.

Machine learning predicts new values from the previous data given to it. Nowadays house prices are increasing day by day due to overpopulation, and people who are unaware of house prices may struggle to get the house they want. We divided the dataset used in our model into two parts: a training set and a testing set. Any percentage can be used for this split; for example, I used 70% of the data to train the model and the remaining 30% to test it. Many algorithms can be used to predict house prices, but here I have used the linear regression algorithm to perform this prediction task.

1. Collecting data: Gather a large dataset containing information about houses, such as the location of the house, the number of bedrooms and bathrooms, the year in which the house was built, etc.

2. Data cleaning: Clean the data by removing missing values, outliers and duplicates.

3. Splitting the data: Split the data into two sets: (i) a training set, which is used to train the model, and (ii) a testing set, which is used to test the model's performance.

4. Training the model: Train a regression model using the training set. From this, the model learns to predict the price of a house from the given data.
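Steps 1 to 4, together with the 70/30 split and the linear regression model mentioned above, can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the author's code: the data is synthetic (generated inside the script from a made-up price rule), the column names `sqft`, `bedrooms`, `bathrooms`, `year_built` and `location` are hypothetical, and mean imputation with `SimpleImputer` plus one-hot encoding with `OneHotEncoder` stand in for the cleaning and encoding the paper describes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Step 1 (assumed): synthetic stand-in for a collected dataset; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, n),
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "bathrooms": rng.integers(1, 4, n).astype(float),
    "year_built": rng.integers(1960, 2022, n).astype(float),
    "location": rng.choice(["north", "south", "east"], n),
})
# Made-up price rule plus noise, only so the example has a target to learn.
df["price"] = (100 * df["sqft"] + 5000 * df["bedrooms"] + 8000 * df["bathrooms"]
               + df["location"].map({"north": 30000, "south": 0, "east": 15000})
               + rng.normal(0, 10000, n))
# Blank out 5% of the square-footage values to simulate missing data.
df.loc[df.sample(frac=0.05, random_state=0).index, "sqft"] = np.nan

# Step 2: drop exact duplicate rows; missing numeric values are filled with the column mean below.
df = df.drop_duplicates()
X = df.drop(columns="price")
y = df["price"]

# Step 3: 70% training set, 30% testing set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Non-numerical columns are one-hot encoded; numeric gaps are imputed with the mean.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), ["sqft", "bedrooms", "bathrooms", "year_built"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["location"]),
])

# Step 4: fit preprocessing and linear regression on the training set only.
model = Pipeline([("prep", preprocess), ("reg", LinearRegression())])
model.fit(X_train, y_train)
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"R^2 on test set: {model.score(X_test, y_test):.3f}")
```

Putting the imputer and encoder inside the `Pipeline` means they learn only from the training rows, so nothing from the 30% test split leaks into training.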
5. Evaluating the model: Judge the model's performance using the testing set of the data.

6. Deploying the model: Once the model's performance is satisfactory, the model is ready to make predictions on new data.

7. Monitoring the model: Keep monitoring the model's performance and update it periodically to ensure that it stays accurate.

Fig.2 Proposed architecture of house price prediction model (blocks: Data Collection, Data Cleaning, Train-Test Splitting, Data Reduction, House Price Prediction)

… the house price. The random forest algorithm performs worse than the ensemble algorithm, and the ensemble gives better results than random forest.

Darshil Shah (2020) [2] showed in his paper that the models already available on the market rely on very old datasets. To solve this problem, he introduced an automated system that predicts house prices with the best accuracy. He used LightGBM, XGBoost and random forest techniques to train and test the model so that it predicts house prices more accurately.
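Steps 5 to 7 of the workflow (evaluate, deploy, monitor) can be sketched in the same style. Assumptions: the data is synthetic, R² and mean absolute error are example metrics (the paper does not name its metrics), and `needs_retraining` with its `tolerance` parameter is a hypothetical monitoring rule written for this sketch, not part of any library.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic features (square footage, bedrooms) and prices, for illustration only.
X = np.column_stack([rng.uniform(500, 3000, 300), rng.integers(1, 6, 300)])
y = 100 * X[:, 0] + 5000 * X[:, 1] + rng.normal(0, 10000, 300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Step 5: judge performance on the held-out testing set only.
pred = model.predict(X_test)
baseline_r2 = r2_score(y_test, pred)
print(f"R^2 = {baseline_r2:.3f}, MAE = {mean_absolute_error(y_test, pred):,.0f}")


def needs_retraining(model, X_new, y_new, baseline_r2, tolerance=0.05):
    """Hypothetical monitoring rule (step 7): flag the model when R^2 on a fresh
    batch of labelled sales falls more than `tolerance` below the baseline."""
    return r2_score(y_new, model.predict(X_new)) < baseline_r2 - tolerance


# Steps 6-7: after deployment, a new batch arrives in which prices have risen 20%.
X_new = np.column_stack([rng.uniform(500, 3000, 100), rng.integers(1, 6, 100)])
y_new = 1.2 * (100 * X_new[:, 0] + 5000 * X_new[:, 1]) + rng.normal(0, 10000, 100)
print("retraining needed:", needs_retraining(model, X_new, y_new, baseline_r2))
```

A drop in accuracy on recent sales is the signal that the model should be updated, which is the periodic check step 7 calls for.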
1. Data Collection- Collecting data is a mandatory step in enabling a machine to predict prices. Training a machine learning model requires a large dataset. A perfect dataset having

Fig.4 Preprocessing steps

3. Pre-processing of the dataset- Pre-processing includes breaking the dataset into two parts, a training set and a testing set. The dataset also contains some non-numerical features, such as the house environment, the location, and whether the house is ready to move in. I used the one-hot encoder and label encoder functions from the scikit-learn library to convert these non-numerical features into numerical ones. The dataset also contains some empty values; to fill them, I used the mean of each column via scikit-learn's simple imputer function.

Here, 'x' and 'y' are known as the model parameters. When A is 0, B equals 'x', so 'x' is the intercept, and 'y' is the slope, which shows how B changes with A. If 'y' is large, even a small change in A leads to a large change in B. The values of 'x' and 'y' are estimated by the least-squares method. The predicted values are not always exactly equal to the actual values, so there is sometimes a difference; to account for it, an error term ε is added to equation (1):

$$B = x + yA + \varepsilon \qquad (2)$$

The following assumptions are made in simple linear regression:

1. The number of parameters must be smaller than the number of observations.

2. The error term ε must have a mean of 0, and it is assumed to be normally distributed.

1. This regression technique gives a close approximation of the relationship between the dependent variable and the independent variables.

2. A polynomial of higher degree can fit the dataset more closely.

3. By changing the degree, polynomial regression can fit a wide range of curves.

Disadvantages of polynomial regression are:

1. It is very sensitive to outliers; outliers in the data increase the variance of the model.
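The least-squares estimates of 'x' (intercept) and 'y' (slope) in equation (2), and the way polynomial regression generalises that line, can be illustrated with NumPy. The five (A, B) pairs are made-up numbers, and `least_squares_line` is a hypothetical helper written for this sketch; `np.polyfit` is NumPy's own least-squares polynomial fit.

```python
import numpy as np


def least_squares_line(A, B):
    """Hypothetical helper: closed-form least-squares fit of B = x + yA.

    Returns (x, y), where x is the intercept and y is the slope,
    following the notation of equation (2)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    y = np.sum((A - A.mean()) * (B - B.mean())) / np.sum((A - A.mean()) ** 2)
    x = B.mean() - y * A.mean()
    return x, y


# Made-up example: A = house size (hundreds of sq ft), B = price (arbitrary units).
A = np.array([10, 12, 15, 18, 20], dtype=float)
B = np.array([30, 35, 44, 52, 59], dtype=float)

x, y = least_squares_line(A, B)
residuals = B - (x + y * A)  # the part of B the line does not explain (estimates of ε)
print(f"intercept x = {x:.4f}, slope y = {y:.4f}")
# With an intercept in the model, least-squares residuals sum to zero (up to rounding).
print("sum of residuals:", round(float(residuals.sum()), 10))

# np.polyfit returns coefficients from the highest power down,
# so a degree-1 fit gives (slope, intercept) and should match the helper.
slope, intercept = np.polyfit(A, B, deg=1)

# Raising the degree lets the fitted curve bend; compare squared errors on the same points.
quad = np.polyfit(A, B, deg=2)
sse_line = np.sum(residuals ** 2)
sse_quad = np.sum((B - np.polyval(quad, A)) ** 2)
print(f"SSE line = {sse_line:.4f}, SSE quadratic = {sse_quad:.4f}")
```

Because a degree-2 model contains the straight line as a special case, its squared error on the same points can never be larger. That extra flexibility is also why higher degrees react strongly to outliers.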
8. Data Analysis- Before proceeding further, we have to be sure that the data is accurate and ready to use in the model. To do this, I scanned the data based on some features. By analyzing the data, I have found.
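A scan of the kind described in step 8 can be sketched with pandas. The tiny table below is hypothetical sample data, and `scan_data` is an illustrative helper, not a function from the paper; it reports each column's type, missing-value count and number of distinct values, alongside a duplicate-row count and pandas' built-in `describe()` summary.

```python
import pandas as pd


def scan_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-column summary used to check whether the data
    is ready for modelling (dtype, missing values, distinct values)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # NaN/None is not counted as a distinct value
    })


# Tiny hypothetical sample; real column names depend on the dataset used.
houses = pd.DataFrame({
    "location": ["north", "north", None, "south"],
    "bedrooms": [2, 3, 3, 2],
    "price": [45.0, 60.0, 58.5, None],
})

report = scan_data(houses)
print(report)
print("duplicate rows:", houses.duplicated().sum())
print(houses.describe())  # count, mean, std, min, quartiles and max of numeric columns
```

Columns with missing values point to where imputation is needed, and a non-zero duplicate count signals rows to drop before training.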
V. CONCLUSION