
VEDIC VIDYASHRAM SENIOR SECONDARY SCHOOL

Madurai Road, Thachanallur, Tirunelveli - 627 358

INFORMATICS PRACTICES

PROJECT ON

HOUSE PRICE PREDICTION

Submitted in partial fulfillment of the requirements of the

Practicals of Senior Secondary (CBSE)

(2023 - 2024)

Submitted By : P.KAVIPRIYAN

Grade : XII C
VEDIC VIDYASHRAM SENIOR SECONDARY SCHOOL

Madurai Road, Thachanallur, Tirunelveli - 627 358

CERTIFICATE

This is to certify that the Project Work entitled “HOUSE PRICE
PREDICTION” is the bonafide record of work done by P.KAVIPRIYAN of
Grade XII, Exam No: , in partial fulfillment of the practical classes
of the 12th Standard during the Academic Year 2023-24.

He has taken proper care and shown utmost sincerity in completion of this
project as per the guidelines issued by CBSE.

DATE: INTERNAL EXAMINER

PRINCIPAL EXTERNAL EXAMINER


ACKNOWLEDGEMENT

● It is with a sense of gratitude that I acknowledge the efforts of the entire
host of well-wishers who have contributed in their own special ways to the
success and execution of this project.
● First of all, I express my heartfelt gratitude and indebtedness to my
school CORRESPONDENT, Mr. T. DURAISAMY, MCA, from the
bottom of my heart, for his unlimited support, motivation and
infrastructural aid rendered at all times.
● I would like to express my sincere thanks to my PRINCIPAL,
Mr. C P ENOSH, M.A., M.Phil., B.Ed., for all his substantial and valuable
guidance and moral support, which have helped me to complete this project
with undoubted success.
● I have been immeasurably enriched by working under the
expert supervision of my subject teacher, Mr. S.
SHUNMUGA SUNDARAM, M.E., A.M.I.E., who has the knack of
correcting and directing me in every situation. I convey my special
thanks to him.
● At last, I extend thanks with all my heart to the Teaching and Non-
Teaching staff who have assisted me constructively in my work.
DECLARATION

I hereby declare that the project work entitled HOUSE PRICE

PREDICTION, submitted to the DEPARTMENT OF INFORMATICS

PRACTICES, VEDIC VIDYASHRAM SENIOR SECONDARY

SCHOOL, is a result of my own work, and my indebtedness to other works,

publications and references, if any, has been duly acknowledged.

DATE: P.KAVIPRIYAN
CONTENTS

S.NO  TITLE

01.   ACKNOWLEDGEMENT
02.   DECLARATION
03.   PROBLEM DEFINITION
04.   PROJECT STAGES
05.   OBJECTIVES
06.   EXISTING AND PROPOSED SYSTEM
07.   HARDWARE AND SOFTWARE REQUIREMENT
08.   WORKING DESCRIPTION
09.   CODING
10.   OUTPUT SCREENS
11.   CONCLUSION
12.   REFERENCES
PROBLEM DEFINITION

• People looking to buy a new home tend to be conservative with their
budgets and market strategies. The existing system involves calculating
house prices without the necessary prediction of future market trends and
price increases. The goal of this project is to predict efficient house pricing
for real estate customers with respect to their budgets and priorities.

• By analyzing previous market trends and price ranges, and also upcoming
developments, future prices will be predicted. The functioning of this project
involves a website which accepts the customer's specifications and then
applies the multiple linear regression algorithm from data mining (a small
illustrative sketch is given below).

• This application will help customers invest in an estate without
approaching an agent. It also decreases the risk involved in the transaction.
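As a small illustrative sketch of such a model (the feature names and the tiny dataset below are hypothetical placeholders, not taken from this project):

# Hedged sketch of multiple linear regression on customer specifications.
# Feature names and the tiny dataset are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    "area_sqft":  [800, 1000, 1200, 1500, 1800],
    "bedrooms":   [2, 3, 3, 4, 5],
    "age_years":  [10, 8, 5, 3, 1],
    "price_lakh": [60, 78, 90, 115, 140],
})

X = data[["area_sqft", "bedrooms", "age_years"]]
y = data["price_lakh"]

model = LinearRegression().fit(X, y)

# Predict a price for a new customer specification.
new_house = pd.DataFrame([{"area_sqft": 1100, "bedrooms": 3, "age_years": 6}])
print("predicted price (lakh):", model.predict(new_house)[0])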

PROJECT STAGES

The project consists of the following stages:

1. IMPORTING LIBRARIES AND DATASET

2. EXPLORING AND PREPROCESSING THE DATASET

3. MODEL IMPLEMENTATION

4. MODEL TESTING

A minimal skeleton illustrating these stages is sketched below.
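As an illustrative outline under stated assumptions (the function names, file name, and the numeric-only/zero-fill preprocessing are placeholders, not the report's actual code), the four stages can be organized as a small script:

# Illustrative skeleton of the four project stages; function and file names
# are hypothetical placeholders, not taken from the original report.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def import_data(path="train.csv"):           # Stage 1: importing libraries and dataset
    return pd.read_csv(path)

def preprocess(df):                          # Stage 2: exploring and preprocessing
    df = df.select_dtypes("number").fillna(0)  # simple numeric-only, zero-fill example
    return df.drop(columns=["SalePrice"]), df["SalePrice"]

def implement_model(X, y):                   # Stage 3: model implementation
    model = LinearRegression()
    model.fit(X, y)
    return model

def test_model(model, X, y):                 # Stage 4: model testing
    return model.score(X, y)                 # R^2 on held-out data

if __name__ == "__main__":
    X, y = preprocess(import_data())
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    print("R^2:", test_model(implement_model(X_tr, y_tr), X_te, y_te))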

OBJECTIVES

• Create a machine learning model using linear regression and the Boston
housing dataset, while following the machine learning workflow.

High-Level Approach:

● Exploring and analyzing the data used for making predictions

● Creating a simple model using linear regression

● Using the model to carry out predictions and evaluating its efficiency
(a hedged end-to-end sketch follows this list)
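A minimal end-to-end sketch of this workflow is shown below. Note that load_boston was removed from recent scikit-learn releases, so the snippet substitutes the California housing dataset purely as a stand-in; the steps are the same for the Boston data.

# Minimal linear-regression workflow sketch. The California housing data is a
# stand-in: scikit-learn removed load_boston in version 1.2.
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)

# Explore: shapes and basic statistics of the data used for prediction.
print(X.shape, y.shape, y.mean(), y.std())

# Split, fit a simple linear model, and evaluate its efficiency.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)
pred = model.predict(X_te)
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
print("R^2 :", r2_score(y_te, pred))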

EXISTING AND PROPOSED SYSTEM

EXISTING SYSTEM:
• There are several approaches that can be used to determine the price of a
house; one of them is prediction analysis. The first approach is quantitative
prediction.

• A quantitative approach is an approach that utilizes time-series data [5]. The
time-series approach looks for the relationship between current prices and
previously prevailing prices. The second approach is to use linear regression based
on hedonic pricing. Previous research conducted by Gharehchopogh using a linear
regression approach obtained an error of 0.929 against the actual price. In linear
regression, the coefficients are generally determined using the least-squares
method, but it takes a long time to arrive at the best formula (a worked
least-squares example is sketched below).
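For concreteness, here is a small worked least-squares example via the normal equation; the data values are fabricated purely for illustration.

# Least-squares fit via the normal equation: beta = (X^T X)^(-1) X^T y.
# The tiny dataset below is fabricated purely for illustration.
import numpy as np

area  = np.array([800, 1000, 1200, 1500, 1800])   # sq. ft
rooms = np.array([2, 3, 3, 4, 5])
price = np.array([60, 78, 90, 115, 140])           # in lakhs, hypothetical

X = np.column_stack([np.ones_like(area), area, rooms])   # intercept + features
beta, *_ = np.linalg.lstsq(X, price, rcond=None)          # stable least squares
print("intercept, per-sqft, per-room:", beta)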

• Particle swarm optimization (PSO) has been proposed to find the coefficients,
aiming at an optimal result. Previous research, such as that of Marini and
Walczak, shows that PSO gets better results than other hybrid methods. PSO has
several advantages: in a small search space, it can perform a better solution
search. Although PSO's global search is less than optimal, for this optimization
problem the values of the variables in the regression equation can reach a good
solution using PSO (a toy PSO sketch is given below).

• The land prices are predicted with a new set of parameters using a different
technique. We also predicted the compensation for the settlement of the property.
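A toy PSO sketch for fitting regression coefficients is given below; the swarm size, inertia, and acceleration constants are illustrative choices, not values from the cited research.

# Toy particle swarm optimization (PSO) for linear-regression coefficients,
# minimizing the sum of squared errors. Data and PSO constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.uniform(500, 2000, 50)])  # intercept + area
true_beta = np.array([20.0, 0.06])
y = X @ true_beta + rng.normal(0, 2, 50)

def sse(beta):                        # objective: sum of squared errors
    r = y - X @ beta
    return r @ r

n, dim = 30, 2                        # 30 particles, 2 coefficients
pos = rng.uniform(-1, 1, (n, dim)) * np.array([50, 0.1])
vel = np.zeros((n, dim))
pbest = pos.copy()
pbest_val = np.array([sse(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

w, c1, c2 = 0.7, 1.5, 1.5             # inertia and acceleration constants
for _ in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([sse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("PSO estimate:", gbest, "vs least squares:",
      np.linalg.lstsq(X, y, rcond=None)[0])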

• Mathematical relationships help us to understand many aspects of everyday
life. When such relationships are expressed with exact numbers, we gain
additional clarity. Regression is concerned with specifying the relationship
between a single numeric dependent variable and one or more numeric
independent variables. House prices increase every year, so there is a need for a
system to predict house prices in the future. House price prediction can help the
developer determine the selling price of a house and can help the customer
arrange the right time to purchase a house.

PROPOSED SYSTEM:
• Nowadays, e-education and e-learning are highly influential, and everything is
shifting from manual to automated systems. The objective of this project is to
predict house prices so as to minimize the problems faced by the customer. The
present method is that the customer approaches a real estate agent to manage
his/her investments and suggest suitable estates. But this method is risky, as the
agent might suggest the wrong estates and thus lead to the loss of the customer's
investments.

• The manual method currently used in the market is outdated and carries high
risk. To overcome this fault, there is a need for an updated and automated
system. Data mining algorithms can be used to help investors invest in an
appropriate estate according to their stated requirements. The new system will
also be cost- and time-efficient, with simple operations. The proposed system
works on the linear regression algorithm.

HARDWARE AND SOFTWARE REQUIREMENT

LIBRARIES:

● NumPy
● Pandas
● scikit-learn (sklearn)
● Matplotlib (matplotlib.pyplot)
● Seaborn

SOFTWARE:

● PYTHON 3.7
● MYSQL 5.0

HARDWARE:

CPU: Intel Dual Core
RAM: 2 GB (minimum) - 4 GB (recommended)
Disk Storage: 500 GB
Operating System: Windows or Linux

WORKING DESCRIPTION

• The sequence diagram (not reproduced here) explains the working of the
system. The proposed system is supposed to be a website with three objects,
namely the Customer, the Web Interface, and the Database Server.

• The database server also includes the computational mechanism described in
the algorithm. When customers first enter the website, they are presented with a
GUI where they can enter inputs such as the type of house, the area in which it
is located, etc.

• A data index search then provides outputs consisting of matching properties.
Now, if the customer wants to check the house price in the future, they can enter
a future date. The system will identify the date and categorize it into quarters.
The algorithm will then compute the rate and return the results to the customer
(a small sketch of the date-to-quarter step follows).
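As an illustration of the date-to-quarter step (the function name and the quarterly growth factors are hypothetical placeholders, not from this report):

# Hypothetical sketch: map a future date to quarters and apply a projected
# growth rate per quarter. The rate table values are invented placeholders.
import pandas as pd

quarterly_growth = {1: 1.010, 2: 1.015, 3: 1.012, 4: 1.020}  # per-quarter factors

def predict_future_price(current_price, future_date):
    ts = pd.Timestamp(future_date)
    quarters_ahead = (ts.to_period("Q") - pd.Timestamp.now().to_period("Q")).n
    price = current_price
    for i in range(max(quarters_ahead, 0)):
        q = (pd.Timestamp.now().to_period("Q") + i + 1).quarter
        price *= quarterly_growth[q]   # compound the rate for each quarter
    return round(price, 2)

print(predict_future_price(50_00_000, "2025-11-15"))  # price in rupees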

CODING
# -*- coding: utf-8 -*-

"""house-price-prediction-top-14-xgboost.ipynb

Automatically generated by Colaboratory.

Original file is located at

https://colab.research.google.com/drive/16p1a388cb30t6r0sgf6w0tahiwqewtw-

My main objectives in this project are:

* Applying exploratory data analysis and trying to get some insights about our
dataset

* Getting the data into better shape by transforming and feature engineering, to
help us in building better models

* Building and tuning a couple of models to get some stable results on predicting
housing prices

"""

import os

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

"""# meeting the data

We’re going to start by loading the data and taking first look on it as usual. for
the column names we have great dictionary file in our dataset location so we can
get familiar with them in no time. I highly recommend looking at that before you
start working on the dataset.

pg. - 8 -
"""

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

from scipy import stats

df_train = pd.read_csv('../input/house-prices-advanced-regression-techniques/train.csv')

df_test = pd.read_csv('../input/house-prices-advanced-regression-techniques/test.csv')

df_train.head()

df_test.head()

df_train.shape

df_test.shape

"""As we can see that in train there are 1460 rows with 81 columns and in test
dataset 1459 rows with 80 columns. our dependent variable is **'saleprice'**"""

df_train.describe()

df_test.describe()

df_train.columns , df_test.columns

"""We have 1460 observations of 80 variables in the training dataframe. The


variables are described below:

saleprice - this is the target variable/dependent variable that you're trying to


predict.

* MSSubClass: the building class

* MSZoning: the general zoning classification

* LotFrontage: linear feet of street connected to property

* LotArea: lot size in square feet

* Street: type of road access

* Alley: type of alley access

* LotShape: general shape of property

* LandContour: flatness of the property

* Utilities: type of utilities available

* LotConfig: lot configuration

* LandSlope: slope of property

* Neighborhood: physical locations within Ames city limits

* Condition1: proximity to main road or railroad

* Condition2: proximity to main road or railroad (if a second is present)

* BldgType: type of dwelling

* HouseStyle: style of dwelling

* OverallQual: overall material and finish quality

* OverallCond: overall condition rating

* YearBuilt: original construction date

* YearRemodAdd: remodel date

* RoofStyle: type of roof

* RoofMatl: roof material

* Exterior1st: exterior covering on house

* Exterior2nd: exterior covering on house (if more than one material)

* MasVnrType: masonry veneer type

* MasVnrArea: masonry veneer area in square feet

* ExterQual: exterior material quality

* ExterCond: present condition of the material on the exterior

* Foundation: type of foundation

* BsmtQual: height of the basement

* BsmtCond: general condition of the basement

* BsmtExposure: walkout or garden level basement walls

* BsmtFinType1: quality of basement finished area

* BsmtFinSF1: type 1 finished square feet

* BsmtFinType2: quality of second finished area (if present)

* BsmtFinSF2: type 2 finished square feet

* BsmtUnfSF: unfinished square feet of basement area

* TotalBsmtSF: total square feet of basement area

* Heating: type of heating

* HeatingQC: heating quality and condition

* CentralAir: central air conditioning

* Electrical: electrical system

* 1stFlrSF: first floor square feet

* 2ndFlrSF: second floor square feet

* LowQualFinSF: low quality finished square feet (all floors)

* GrLivArea: above grade (ground) living area square feet

* BsmtFullBath: basement full bathrooms

* BsmtHalfBath: basement half bathrooms

* FullBath: full bathrooms above grade

* HalfBath: half baths above grade

* Bedroom: number of bedrooms above basement level

* Kitchen: number of kitchens

* KitchenQual: kitchen quality

* TotRmsAbvGrd: total rooms above grade (does not include bathrooms)

* Functional: home functionality rating

* Fireplaces: number of fireplaces

* FireplaceQu: fireplace quality

* GarageType: garage location

* GarageYrBlt: year garage was built

* GarageFinish: interior finish of the garage

* GarageCars: size of garage in car capacity

* GarageArea: size of garage in square feet

* GarageQual: garage quality

* GarageCond: garage condition

* PavedDrive: paved driveway

* WoodDeckSF: wood deck area in square feet

* OpenPorchSF: open porch area in square feet

* EnclosedPorch: enclosed porch area in square feet

* 3SsnPorch: three season porch area in square feet

* ScreenPorch: screen porch area in square feet

* PoolArea: pool area in square feet

* PoolQC: pool quality

* Fence: fence quality

* MiscFeature: miscellaneous feature not covered in other categories

* MiscVal: $ value of miscellaneous feature

* MoSold: month sold

* YrSold: year sold

* SaleType: type of sale

* SaleCondition: condition of sale"""

#correlation matrix

corrmat = df_train.corr(numeric_only=True)  # numeric_only is required on newer pandas

f, ax = plt.subplots(figsize=(15, 12))

sns.heatmap(corrmat, vmax=.8, square=True)

#SalePrice correlation matrix

k = 10  #number of variables for heatmap

cols = corrmat.nlargest(k, 'SalePrice')['SalePrice'].index

cm = np.corrcoef(df_train[cols].values.T)

sns.set(font_scale=1.25)

plt.figure(figsize=(10,10))

hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f',
                 annot_kws={'size': 10}, yticklabels=cols.values, xticklabels=cols.values)

plt.show()

"""**so what is log transformation:-log transformation is used to transform


skewed data to approximately conform to normality.**"""

'''#before log transformation

sns.distplot(df_train['SalePrice']);

fig_saleprice = plt.figure(figsize=(12,5))

result1 = stats.probplot(df_train['SalePrice'], plot=plt)'''

'''#applying log transformation

df_train['SalePrice'] = np.log(df_train['SalePrice'])'''

'''#after log transformation

sns.distplot(df_train['SalePrice']);

fig_saleprice2 = plt.figure(figsize=(12,5))

result3 = stats.probplot(df_train['SalePrice'], plot=plt)'''

"""below code is used to see top 10 highly correlated columns with saleprice in
which overallqual,grlivearea,garagecars,garagearea,totalbsmtsf and 1stflrsf are
highly correlated"""

#below code is used to see which column is more correlated to dependent


variable so first ten columns are more correlated compare to other columns

corr = df_train.corr()["saleprice"]

corr[np.argsort(corr, axis=0)[::-1]]

"""# **outliers**

We are going to plot first 10 highly correlated columns to see how many outliers
we have in our dataset

"""

#scatter plots of the most correlated columns against SalePrice
for col in ['GrLivArea', 'OverallQual', 'GarageCars', 'GarageArea', 'TotalBsmtSF',
            '1stFlrSF', 'FullBath', 'TotRmsAbvGrd', 'YearBuilt']:
    plt.subplots()
    plt.scatter(x=df_train[col], y=df_train['SalePrice'])
    plt.ylabel('SalePrice', fontsize=13)
    plt.xlabel(col, fontsize=13)
    plt.show()

'''#deleting outliers (the last condition presumably meant SalePrice, not 1stFlrSF)

df_train = df_train.drop(df_train[(df_train['GrLivArea']>4000) & (df_train['SalePrice']<300000)].index)

df_train = df_train.drop(df_train[(df_train['GarageArea']>1200) & (df_train['SalePrice']<500000)].index)

df_train = df_train.drop(df_train[(df_train['TotalBsmtSF']>3000) & (df_train['SalePrice']<700000)].index)

df_train = df_train.drop(df_train[(df_train['1stFlrSF']>2700) & (df_train['SalePrice']<700000)].index)'''

#scatterplot

sns.set()

columns = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', '1stFlrSF']

sns.pairplot(df_train[columns], height=3)  # 'size' was renamed to 'height' in newer seaborn

plt.show()

"""# some feature engineering

Here I have merged some columns to just reduce complexity I have tried with all
the columns but I didn't get this much accuracy which I am getting right now

"""

#feature engineering

df_train['totalsf'] = df_train['TotalBsmtSF'] + df_train['1stFlrSF'] + df_train['2ndFlrSF']

df_train = df_train.drop(columns=['1stFlrSF', '2ndFlrSF', 'TotalBsmtSF'])

df_train['wholeexterior'] = df_train['Exterior1st'] + df_train['Exterior2nd']

df_train = df_train.drop(columns=['Exterior1st', 'Exterior2nd'])

df_train['bsmt'] = df_train['BsmtFinSF1'] + df_train['BsmtFinSF2']

df_train = df_train.drop(columns=['BsmtFinSF1', 'BsmtFinSF2'])

df_train['totalbathroom'] = df_train['FullBath'] + df_train['HalfBath']

df_train = df_train.drop(columns=['FullBath', 'HalfBath'])

df_test['totalsf'] = df_test['TotalBsmtSF'] + df_test['1stFlrSF'] + df_test['2ndFlrSF']

df_test = df_test.drop(columns=['1stFlrSF', '2ndFlrSF', 'TotalBsmtSF'])

df_test['wholeexterior'] = df_test['Exterior1st'] + df_test['Exterior2nd']

df_test = df_test.drop(columns=['Exterior1st', 'Exterior2nd'])

df_test['bsmt'] = df_test['BsmtFinSF1'] + df_test['BsmtFinSF2']

df_test = df_test.drop(columns=['BsmtFinSF1', 'BsmtFinSF2'])

df_test['totalbathroom'] = df_test['FullBath'] + df_test['HalfBath']

df_test = df_test.drop(columns=['FullBath', 'HalfBath'])

"""**we're going to merge the datasets here before we start editing it so we don't
have to do these operations twice. Let’s call it features since it has features only.
so our data has 2919 observations and 79 features to begin with...**"""

frames = [df_train,df_test]

df = pd.concat(frames,keys=['train','test'])

"""there are 2919 observations with 76 columns. including the target variable
saleprice and id.the train set has 1460 observations while the test set has 1459
observations, the target variable saleprice is absent in test. the aim of this study is
to train a model on the train set and use it to predict the target saleprice of the test
set."""

df

df_missing = df.isnull().sum().sort_values(ascending=False)

df_missing

"""now we are separating categorical columns and numerical columns for filling
missing values"""

cat_col = df.select_dtypes(include=['object'])

cat_col.isnull().sum()

cat_col.columns

num_col = df.select_dtypes(include=['int64', 'float64'])

num_col.isnull().sum()

num_col.columns

"""In below cell you have your numerical columns so I just replace nan by 0. I
have also tried mode, median and mean but I got best result in 0.if you want to do
it then just fork my notebook and apply that functions. If you want that other
function's code then just comment below I will give you the code in comment
section.

# handling missing data

# Numerical columns

"""

# handling missing values of numerical columns

df['LotFrontage'] = df['LotFrontage'].fillna(value=0)

df['GarageYrBlt'] = df['GarageYrBlt'].fillna(value=0)

df['MasVnrArea'] = df['MasVnrArea'].fillna(value=0)

df['BsmtFullBath'] = df['BsmtFullBath'].fillna(value=0)

df['BsmtHalfBath'] = df['BsmtHalfBath'].fillna(value=0)

df['GarageArea'] = df['GarageArea'].fillna(value=0)

df['GarageCars'] = df['GarageCars'].fillna(value=0)

df['BsmtUnfSF'] = df['BsmtUnfSF'].fillna(value=0)

df['bsmt'] = df['bsmt'].fillna(value=0)        # engineered column, name kept lowercase

df['totalsf'] = df['totalsf'].fillna(value=0)  # engineered column, name kept lowercase

"""I have applied same technique as I applied in numerical columns where I put 0
and here i have replaced all the nan values with none. That means if the original
dataset have nan values, it means that the particular house is doesn't have that
thing. For example, if id no = 220 do not have garage then why we put values
that id no = 220 has a garage.

so i replaced them with none.

# Categorical columns

"""

# handling missing values of categorical columns

df['MSZoning'] = df['MSZoning'].fillna(value='none')

df['GarageQual'] = df['GarageQual'].fillna(value='none')

df['GarageCond'] = df['GarageCond'].fillna(value='none')

df['GarageFinish'] = df['GarageFinish'].fillna(value='none')

df['GarageType'] = df['GarageType'].fillna(value='none')

df['BsmtExposure'] = df['BsmtExposure'].fillna(value='none')

df['BsmtCond'] = df['BsmtCond'].fillna(value='none')

df['BsmtQual'] = df['BsmtQual'].fillna(value='none')

df['BsmtFinType2'] = df['BsmtFinType2'].fillna(value='none')

df['BsmtFinType1'] = df['BsmtFinType1'].fillna(value='none')

df['MasVnrType'] = df['MasVnrType'].fillna(value='none')

df['Utilities'] = df['Utilities'].fillna(value='none')

df['Functional'] = df['Functional'].fillna(value='none')

df['Electrical'] = df['Electrical'].fillna(value='none')

df['KitchenQual'] = df['KitchenQual'].fillna(value='none')

df['SaleType'] = df['SaleType'].fillna(value='none')

df['wholeexterior'] = df['wholeexterior'].fillna(value='none')  # engineered column

"""top 40 correlated columns after data preprocessing"""

#saleprice correlation matrix

k = 40 #number of variables for heatmap

cols = corrmat.nlargest(k, 'saleprice')['saleprice'].index

cm = np.corrcoef(df_main[cols].values.t)

sns.set(font_scale=1.25)

plt.figure(figsize=(10,10))

hm = sns.heatmap(cm, cbar=true, square=true, fmt='.2f', annot_kws={'size': 10},


yticklabels=cols.values, xticklabels=cols.values)

plt.show()

eid = df_main.loc['test']

df_test = df_main.loc['test']

df_train = df_main.loc['train']

eid = eid.Id

df_test = df_test.drop(['SalePrice', 'Id'], axis=1)

x_train = df_train.drop(['SalePrice', 'Id'], axis=1)

y_train = df_train['SalePrice']

from xgboost import XGBRegressor  # imported by name to avoid shadowing the module

xgb_model = XGBRegressor(learning_rate=0.05,
                         colsample_bytree=0.5,
                         subsample=0.8,
                         n_estimators=1000,
                         max_depth=5,
                         gamma=5)

xgb_model.fit(x_train, y_train)

y_pred = xgb_model.predict(df_test)

y_pred

#making the main csv file

main_submission = pd.DataFrame({'Id': eid, 'SalePrice': y_pred})

main_submission.to_csv("submission.csv", index=False)

main_submission.head()

OUTPUT SCREENS
***
#saleprice correlation matrix
k = 10 #number of variables for heatmap
cols = corrmat.nlargest(k, 'SalePrice')['SalePrice'].index
cm = np.corrcoef(df_train[cols].values.T)
sns.set(font_scale=1.25)
plt.figure(figsize=(10,10))
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f',
annot_kws={'size': 10}, yticklabels=cols.values, xticklabels=cols.values)
plt.show()

OUTPUT:-

[Annotated correlation heatmap of the 10 variables most correlated with SalePrice:
SalePrice, OverallQual, GrLivArea, GarageCars, GarageArea, TotalBsmtSF,
1stFlrSF, FullBath, TotRmsAbvGrd, YearBuilt]
***

#values of correlation
abs(df_train.corr()['SalePrice']).nlargest(10)
***

OUTPUT:-

SalePrice 1.000000
OverallQual 0.790982
GrLivArea 0.708624
GarageCars 0.640409
GarageArea 0.623431
TotalBsmtSF 0.613581
1stFlrSF 0.605852
FullBath 0.560664
TotRmsAbvGrd 0.533723
YearBuilt 0.522897
Name: SalePrice, dtype: float64
***

#sum of missing data

df.isnull().sum().sort_values(ascending=False)

***
OUTPUT:-

SalePrice: 1459
MSZoning: 4
LotFrontage: 486
Alley: 2721
Utilities: 2
Exterior1st: 1
Exterior2nd: 1
MasVnrType: 24
MasVnrArea: 23
BsmtQual: 81
BsmtCond: 82
BsmtExposure: 82
BsmtFinType1: 79
BsmtFinSF1: 79
BsmtFinType2: 80
BsmtFinSF2: 1
BsmtUnfSF: 1
TotalBsmtSF: 1
Electrical: 1
BsmtFullBath: 2
BsmtHalfBath: 2
KitchenQual: 1
Functional: 2
FireplaceQu: 1420
GarageType: 157
GarageYrBlt: 159
GarageFinish: 159
GarageCars: 1
GarageArea: 1
GarageQual: 159
GarageCond: 159
PoolQC: 2909
Fence: 2348
MiscFeature: 2814
SaleType: 1
Length: 36, dtype: int64
#encoded

df_main = pd.get_dummies(df)
df_main.shape
***

OUTPUT:-

(2919, 339)

***

#rmse
from math import sqrt
from sklearn.metrics import mean_squared_error

#y_pred1..y_pred6 come from additional models (xgb, gbr, rf, lightgbm, svr,
#stacked) whose training code is not reproduced in this report
y_test = y_train.drop([10], axis=0)
print('xgb rmse:', sqrt(mean_squared_error(y_test, y_pred1)))
print('gbr rmse:', sqrt(mean_squared_error(y_test, y_pred2)))
print('rf rmse:', sqrt(mean_squared_error(y_test, y_pred3)))
print('lightgbm rmse:', sqrt(mean_squared_error(y_test, y_pred4)))
print('svr rmse:', sqrt(mean_squared_error(y_test, y_pred5)))
print('stacked rmse:', sqrt(mean_squared_error(y_test, y_pred6)))

***
OUTPUT:-

xgb rmse: 0.1223501568206363
gbr rmse: 0.5585375883105338
rf rmse: 0.43600854434323927
lightgbm rmse: 0.5596622356678556
svr rmse: 0.5246953605047906
stacked rmse: 0.5026308085477498
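Since the training code for those six models is not reproduced in the report, here is a hedged sketch of how such validation predictions could be produced, continuing from the variables above; the model choices and hyperparameters are assumptions, not the report's exact setup.

# Hypothetical reconstruction: fit several regressors on a train/validation
# split and collect per-model predictions. Models and settings are
# illustrative assumptions, not the report's exact configuration.
from math import sqrt
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

x_tr, x_val, y_tr, y_val = train_test_split(x_train, y_train,
                                            test_size=0.2, random_state=0)

models = {
    'xgb': XGBRegressor(n_estimators=500, learning_rate=0.05),
    'gbr': GradientBoostingRegressor(n_estimators=500),
    'rf': RandomForestRegressor(n_estimators=300),
}
for name, model in models.items():
    model.fit(x_tr, y_tr)
    pred = model.predict(x_val)
    print(name, 'rmse:', sqrt(mean_squared_error(y_val, pred)))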

CONCLUSION

• In today's real estate world, it has become tough to store such huge data and
extract it for one's own requirements, and the extracted data should be useful.
The system makes optimal use of the linear regression algorithm and uses such
data in the most efficient way. The linear regression algorithm helps satisfy
customers by increasing the accuracy of estate choice and reducing the risk of
investing in an estate.

• A lot of features could be added to make the system more widely acceptable.
One of the major future scopes is adding the estate databases of more cities,
which will allow the user to explore more estates and reach an accurate decision.
More factors that affect house prices, such as recession, shall be added. In-depth
details of every property will be added to provide ample details of a desired
estate. This will help the system to run on a larger level.

REFERENCES

• Wikipedia

• https://www.crio.do/

• https://www.geeksforgeeks.org/

• https://www.kaggle.com/

• https://www.github.com/

*********

