SML - Assignment 2 (B) - Multiple Linear Regression.ipynb - Colaboratory


Name: Yoginii Waykole


PRN: 21070126114
AIML B2

Importing necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Reading dataset and understanding it

df = pd.read_csv('/content/Housing.csv')
df

price area bedrooms bathrooms stories mainroad guestroom basement h

0 13300000 7420 4 2 3 yes no no

1 12250000 8960 4 4 4 yes no no

2 12250000 9960 3 2 2 yes no yes

3 12215000 7500 4 2 2 yes no yes

4 11410000 7420 4 1 2 yes yes yes

... ... ... ... ... ... ... ... ...

540 1820000 3000 2 1 1 yes no yes

541 1767150 2400 3 1 1 no no no

542 1750000 3620 2 1 1 yes no no

543 1750000 2910 3 1 1 no no no

544 1750000 3850 3 1 2 yes no no

545 rows × 13 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 545 non-null int64
1 area 545 non-null int64
2 bedrooms 545 non-null int64
3 bathrooms 545 non-null int64
4 stories 545 non-null int64
5 mainroad 545 non-null object
6 guestroom 545 non-null object
7 basement 545 non-null object
8 hotwaterheating 545 non-null object
9 airconditioning 545 non-null object
10 parking 545 non-null int64
11 prefarea 545 non-null object
12 furnishingstatus 545 non-null object
dtypes: int64(6), object(7)
memory usage: 55.5+ KB
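Before encoding, a quick statistical summary of the numeric columns can help confirm value ranges; this is an optional sketch, not part of the original run.

# Optional: summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
df.describe()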

df.isnull().sum()

price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0

furnishingstatus 0
dtype: int64

Encoding

#Applying encoding
from sklearn.preprocessing import LabelEncoder
# label_encoder object knows how to understand word labels.
label_encoder = LabelEncoder()

# Encode labels in column 'mainroad'
df['mainroad'] = label_encoder.fit_transform(df['mainroad'])
df

price area bedrooms bathrooms stories mainroad guestroom basement h

0 13300000 7420 4 2 3 1 no no

1 12250000 8960 4 4 4 1 no no

2 12250000 9960 3 2 2 1 no yes

3 12215000 7500 4 2 2 1 no yes

4 11410000 7420 4 1 2 1 yes yes

... ... ... ... ... ... ... ... ...

540 1820000 3000 2 1 1 1 no yes

541 1767150 2400 3 1 1 0 no no

542 1750000 3620 2 1 1 1 no no

543 1750000 2910 3 1 1 0 no no

544 1750000 3850 3 1 2 1 no no

545 rows × 13 columns

# Encode labels in the remaining binary (yes/no) columns
df['guestroom'] = label_encoder.fit_transform(df['guestroom'])
df['basement'] = label_encoder.fit_transform(df['basement'])
df['hotwaterheating'] = label_encoder.fit_transform(df['hotwaterheating'])
df['airconditioning'] = label_encoder.fit_transform(df['airconditioning'])
df['prefarea'] = label_encoder.fit_transform(df['prefarea'])
df

price area bedrooms bathrooms stories mainroad guestroom basement h

0 13300000 7420 4 2 3 1 0 0

1 12250000 8960 4 4 4 1 0 0

2 12250000 9960 3 2 2 1 0 1

3 12215000 7500 4 2 2 1 0 1

4 11410000 7420 4 1 2 1 1 1

... ... ... ... ... ... ... ... ...

540 1820000 3000 2 1 1 1 0 1

541 1767150 2400 3 1 1 0 0 0

542 1750000 3620 2 1 1 1 0 0

543 1750000 2910 3 1 1 0 0 0

544 1750000 3850 3 1 2 1 0 0

545 rows × 13 columns

df['furnishingstatus'] = label_encoder.fit_transform(df['furnishingstatus'])
df

# 0 - furnished
# 1 - semi-furnished
# 2 - unfurnished
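The mapping noted above follows LabelEncoder's alphabetical ordering of the labels. As an optional check (not part of the original run), the fitted classes of the last-encoded column, furnishingstatus, can be inspected before looking at the encoded dataframe below:

# Optional check: classes_ lists the original labels in the order of their integer codes,
# which here should give {'furnished': 0, 'semi-furnished': 1, 'unfurnished': 2}
print(dict(zip(label_encoder.classes_, range(len(label_encoder.classes_)))))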


price area bedrooms bathrooms stories mainroad guestroom basement h

0 13300000 7420 4 2 3 1 0 0

1 12250000 8960 4 4 4 1 0 0

2 12250000 9960 3 2 2 1 0 1

3 12215000 7500 4 2 2 1 0 1

4 11410000 7420 4 1 2 1 1 1

... ... ... ... ... ... ... ... ...

540 1820000 3000 2 1 1 1 0 1

541 1767150 2400 3 1 1 0 0 0

542 1750000 3620 2 1 1 1 0 0

543 1750000 2910 3 1 1 0 0 0

544 1750000 3850 3 1 2 1 0 0


545 rows × 13 columns

EDA

fig, ax = plt.subplots(figsize=(14,14)) 
sns.heatmap(df.corr(),annot = True)

<matplotlib.axes._subplots.AxesSubplot at 0x7fcaf38dc7f0>
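To read the heatmap numerically, the correlation of each feature with price can also be listed directly; an optional sketch, not in the original notebook:

# Optional: correlations with price, sorted from strongest positive to weakest
print(df.corr()['price'].sort_values(ascending=False))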

sns.jointplot(data=df, x="area", y="price", hue="furnishingstatus",palette = 'dark:salmon_r')


<seaborn.axisgrid.JointGrid at 0x7fcae4790760>

sns.scatterplot(df["area"], df["price"], hue=df["parking"], style=df['guestroom'],palette = "rainbow")
plt.show()

/usr/local/lib/python3.8/dist-packages/seaborn/_decorators.py:36: FutureWarning: P
warnings.warn(
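The FutureWarning above is raised because x and y are passed positionally; newer seaborn versions expect them as keyword arguments. An equivalent call without the warning would be:

# Same scatter plot, passing all variables as keyword arguments to avoid the FutureWarning
sns.scatterplot(data=df, x="area", y="price", hue="parking", style="guestroom", palette="rainbow")
plt.show()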

sns.lineplot(data=df, x="furnishingstatus", y="area")

<matplotlib.axes._subplots.AxesSubplot at 0x7fcae3a3e340>

sns.lineplot(data=df, x="parking", y="area", hue="furnishingstatus")

<matplotlib.axes._subplots.AxesSubplot at 0x7fcae3777310>

sns.relplot(
    data=df, x="price", y="area",
    col="furnishingstatus", hue="guestroom",

    kind="line"
)
<seaborn.axisgrid.FacetGrid at 0x7fcae3deeca0>

Setting values for X and Y

# x: independent variables, y: dependent variable
x = df[['area','bedrooms','bathrooms','stories','mainroad','guestroom','basement','hotwaterheating','airconditioning','parking','prefarea','furnishingstatus']]
y = df['price']

#splitting the data in testing and training dataset
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    x,y, test_size = 0.2, random_state = 12
)
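With test_size = 0.2 on 545 rows, the split should leave 436 rows for training and 109 for testing (matching the 109-row prediction table later); a quick optional check:

# Sanity check on the split: expected shapes are (436, 12) / (109, 12) for x and (436,) / (109,) for y
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)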

Implementing Linear Model

#fitting the data into the model
from sklearn.linear_model import LinearRegression
mlr = LinearRegression()
mlr.fit(x_train, y_train)

LinearRegression()

# R^2 score (coefficient of determination) on the test set
mlr.score(x_test, y_test)

0.7246609398021734

#model equation
print("Intercept: ", mlr.intercept_)
print("Coefficients: ")
list(zip(x, mlr.coef_))

Intercept: 502258.74141358025
Coefficients:
[('area', 222.41271177955767),
('bedrooms', 113574.43546379649),
('bathrooms', 813331.5786976352),
('stories', 467348.5404731219),
('mainroad', 381404.2946666891),
('guestroom', 472352.84768597595),
('basement', 302431.6627455462),
('hotwaterheating', 1021601.4179976879),
('airconditioning', 842621.5174165794),
('parking', 228050.62874560902),
('prefarea', 528097.0767752184),
('furnishingstatus', -228726.74398788757)]
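The fitted model is linear in the encoded features, price ≈ intercept + Σ coef_i · x_i. As an optional illustration (not in the original notebook), one test row can be reconstructed by hand from the intercept and coefficients and compared with mlr.predict:

# Rebuild one prediction manually from the intercept and coefficients;
# it should match mlr.predict on the same row up to floating-point rounding.
row = x_test.iloc[0]
manual_pred = mlr.intercept_ + np.dot(mlr.coef_, row.values)
print(manual_pred, mlr.predict(x_test.iloc[[0]])[0])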

#Prediction of test set
y_pred_mlr = mlr.predict(x_test)
#predicted values
print("Prediction for test set:{}".format(y_pred_mlr))

Prediction for test set:[5044036.12146487 3371988.0496655 5819936.06548105 5178520.76992238


4894046.89553338 6980390.20618904 2670368.5666494 6930621.51088423

https://colab.research.google.com/drive/1v4TyK7poX4Ufl-y67dO4bnBIK4I7pqhm#scrollTo=x6-D8ad_bMu4&printMode=true 5/7
2/10/23, 2:45 PM SML_Assignment 2(b)_Multiple Linear Regressionipynb - Colaboratory
5296331.5817033 5842080.65607399 4473955.86429059 3739062.74251273
7002947.32102297 7572489.03595826 3061312.63778014 4444935.88613103
7843791.72983743 5039931.36831411 4441902.20458413 3880361.16777741
6521215.67067494 6668814.70586545 9153743.85068458 4117185.34141732
5662698.47710452 4926717.98702026 4555979.57362845 3646616.41890544
3829564.59321317 3634837.49184675 4245618.3836221 4375498.61617406
5229805.47200164 3912333.94482799 3029522.09879963 4359239.54304868
2641310.96166184 5501698.36908181 4640417.34369324 5924666.68483171
5029623.69055809 3145127.08365035 4041494.40525828 3522886.46376994
6374423.28090043 3222631.04877269 4580852.94634391 6440087.96187757
3260044.09860128 3061162.18005005 6460243.48461596 4917436.25126966
3915626.97653834 3223379.50719291 3584018.19937922 2739172.55484484
3360003.17150916 4976262.57677354 4440307.02646053 3019054.22254203
2770883.15825424 4061421.26909796 7239166.51406275 4691072.00409443
6546095.62127606 4529339.67348955 2823689.38532108 2426716.20082982
3163622.48519874 2779206.84296516 4961727.67492128 4107233.37967946
3597724.27107283 3765259.70931744 2844673.77939992 5208462.30895539
4066009.42882868 3526078.70199041 6553662.39734136 4048216.41188632
7700349.72782504 5423853.07781062 4178840.82474062 7924718.49779782
8070889.30758449 5819774.49285125 9980272.55581673 4800827.13431684
7971859.47329146 5802776.30646289 4215714.64562341 6102368.52120898
4358440.23456435 2881660.64283998 3477905.26466323 3060636.52253787
5223108.68174778 4903180.52462505 5099072.1057637 4372881.89467666
4480861.55322468 4055708.86093821 7711519.36174872 4005130.80916413
6726166.75738259 3931777.65442362 3371250.45449106 2893063.12416501
3045241.4463543 ]

mlr_diff = pd.DataFrame({'Actual value': y_test, 'Predicted value': y_pred_mlr})
mlr_diff

Actual value Predicted value

298 4200000 5.044036e+06

372 3640000 3.371988e+06

14 9240000 5.819936e+06

168 5250000 5.178521e+06

200 4900000 4.894047e+06

... ... ...

12 9310000 6.726167e+06

397 3500000 3.931778e+06

544 1750000 3.371250e+06

537 1890000 2.893063e+06

445 3150000 3.045241e+06

109 rows × 2 columns
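A scatter of predicted against actual prices gives a visual sense of these errors; an optional sketch using the same matplotlib setup as above:

# Optional: predicted vs actual prices; points close to the red diagonal are well-predicted
plt.figure(figsize=(6, 6))
plt.scatter(y_test, y_pred_mlr, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], color='red')
plt.xlabel("Actual price")
plt.ylabel("Predicted price")
plt.show()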

#model evaluation
from sklearn import metrics
meanAbErr = metrics.mean_absolute_error(y_test, y_pred_mlr)
meanSqErr = metrics.mean_squared_error(y_test, y_pred_mlr)
rootMeanSqErr = np.sqrt(metrics.mean_squared_error(y_test, y_pred_mlr))
print("R squared {:.2f}".format(mlr.score(x,y)*100))
print("Mean Absolute Error: ",meanAbErr)
print("Mean Square Error: ",meanSqErr)
print("Root Mean Square Error",rootMeanSqErr)

R squared 67.31
Mean Absolute Error: 828404.4775206818
Mean Square Error: 1460126418767.4592
Root Mean Square Error 1208356.9086852854
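Note that mlr.score(x, y) in the cell above is computed over the full dataset, which is why it (67.31) differs from the test-set score printed earlier (0.7246). The test-set R² can also be obtained directly from the predictions; an optional check:

# R squared on the held-out test set only; should agree with mlr.score(x_test, y_test) ≈ 0.72
print("Test R squared: {:.2f}".format(metrics.r2_score(y_test, y_pred_mlr) * 100))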

