Capstone Project Report

HOUSE PRICE PREDICTION
Structure:
1. Introduction
2. Data Loading
3. EDA - Univariate
4. EDA - Bivariate
5. Data Preprocessing
6. Model Building with Dataset-1
7. Hypertuning Dataset-1
8. Summary - Dataset-1
9. Model Building with Dataset-2
10. Hypertuning Dataset-2
11. Summary - Dataset -2
12. Conclusion
13. Pickle file creation
Note:
Dataset - 1 = 22 features
['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure', 'basement', 'yr_built',
'living_measure15', 'lot_measure15', 'furnished', 'total_area', 'month_year', 'City', 'has_basement', 'HouseLandRatio', 'has_renovated']
Dataset - 2 = 31 features (important features after imputing dummy and analyzing different models)
['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure', 'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat',
'long', 'living_measure15', 'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4', 'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'
Prerequisites for the running the file:

Below are 2 files needed to be added to you current working directory.
1. Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
2. Add the folder WA to your current working directory
3. Install below 2 libraries
conda install -c conda-forge/label/cf201901 geopandas
conda install -c conda-forge/label/cf201901 shapely

This Jupyter Notebook is done as part of PGPML Great Learning Programme for Capstone Project. Let's first, define the problem, objective of
this excercise.
We have the problem statment well defined in the given document which is as follows
INTRODUCTION
Problem Statement
As a house value is simply more than location and square footage. Like the features that make up a person, an educated party would want to
know all aspects that give a house its value. For example, if we want to sell a house and we don't know the price which we can take, as it can't
be too low or too high. To find house price we usually try to find similar properties in our neighbourhood and based on collected data we trying
to assess our house price.
Problem Definition
When any person/business wants to sell or buy a house, they always face this kind of issue as they don't know the price which they should
offer. Due to this they might be offering too low or high for the property. Therefore, we can analyze the available data of the properties in the
area and can predict the price. We need to find how these attributes influence the house prices Right pricing is very imporatnt aspect to sell
house. It is very important to understand what are the factors and how they influence the house price. Objective is to predict the right price of
the house based on the attributes
Objective
Build model which will predict the house price when required features passed to the model. So we will
Find out the significant features from the given features dataset which affects the house price the most.
Build best feasible model to predict the house price with 95% confidence level
Business Reason
As people don't know the features/aspects which commulate property price, we can provide them HouseBuyingSelling guiding services in the
area so they can buy or sell their property with most suitable price tag and they didn't lose their hard earned money by offering low price or
keep waiting for buyers by putting high prices.
DATA LOADING
First, we will load the data from the given csv(comma seperated values) file provided as part of the Capstone Project.
In [2]:
# loading the library required for data loading and processing

import pandas as pd
import numpy as np
#Supress warnings
import warnings
warnings.filterwarnings('ignore')
# read the data using pandas function from 'innercity.csv' file

house_df = pd.read_csv('innercity.csv')
In [3]:
# let's check whether data loaded successfully or not, by checking first few records
house_df.head()
Out[3]:
cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... basement yr_built yr_renovated zipcode
0 3034200666 20141107T000000 808100 4 3.25 3020 13457 1.0 0 0 ... 0 1956 0 98133
1 8731981640 20141204T000000 277500 4 2.50 2550 7500 1.0 0 0 ... 800 1976 0 98023
2 5104530220 20150420T000000 404000 3 2.50 2370 4324 2.0 0 0 ... 0 2006 0 98038
3 6145600285 20140529T000000 300000 2 1.00 820 3844 1.0 0 0 ... 0 1916 0 98133
4 8924100111 20150424T000000 699000 2 1.50 1400 4050 1.0 0 0 ... 0 1954 0 98115
5 rows × 23 columns
Data is loaded successfully as we can see first 5 records from the dataset.
Data Understanding
After loading data into our pandas library dataframe, we can now try to understand the kind of data we have with us.
In [4]:
# print the number of records and features/aspects we have in the provided file
house_df.shape
Out[4]:
(21613, 23)
We have more than 21k records having 23 features
In [5]:
# let's check out the columns/features we have in the dataset
house_df.columns
Out[5]:
Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',

'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
'total_area'],
dtype='object')
From the above we can see the different columns we have in dataset.
These columns provide below information
1. cid: Notation for a house. Will not of our use. So we will drop this column
2. dayhours: Represents Date, when house was sold.
3. price: It's our TARGET feature, that we have to predict based on other featues
4. room_bed: Represents number of bedrooms in a house
5. room_bath: Represents number of bathrooms
6. living_measure: Represents square footage of house
7. lot_measure: Represents square footage of lot
8. ceil: Represents number of floors in house
9. coast: Represents whether house has waterfront view. It seems to be a categorical variable. We will see in our further data analysis
10. sight: Represents how many times sight has been viewed.
11. condition: Represents the overall condition of the house. It's kind of rating given to the house.
12. quality: Represents grade given to the house based on grading system
13. ceil_measure: Represents square footage of house apart from basement
14. basement: Represents square footage of basement
15. yr_built: Represents the year when house was built
16. yr_renovated: Represents the year when house was last renovated
17. zipcode: Represents zipcode as name implies
18. lat: Represents Lattitude co-ordniates
19. long: Represents Longitude co-ordinates
20. living_measure15: Represents square footage of house, when measured in 2015 year as house area may or may not changed after
renovation if any happened
21. lot_measure15: Represents square footage of lot, when measured in 2015 year as lot area may or may not change after renovation if any
done
22. furnished: Tells whether house is furnished or not. It seems to be categorical variable as description implies
23. total_area: Represents total area i.e. area of both living and lot
In [6]:
# let's see the data types of the features

house_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 23 columns):
cid 21613 non-null int64
dayhours 21613 non-null object
price 21613 non-null int64
room_bed 21613 non-null int64
room_bath 21613 non-null float64
living_measure 21613 non-null int64
lot_measure 21613 non-null int64
ceil 21613 non-null float64
coast 21613 non-null int64
sight 21613 non-null int64
condition 21613 non-null int64
quality 21613 non-null int64
ceil_measure 21613 non-null int64
basement 21613 non-null int64
yr_built 21613 non-null int64
yr_renovated 21613 non-null int64
zipcode 21613 non-null int64
lat 21613 non-null float64
long 21613 non-null float64
living_measure15 21613 non-null int64
lot_measure15 21613 non-null int64
furnished 21613 non-null int64
total_area 21613 non-null int64
dtypes: float64(4), int64(18), object(1)
memory usage: 3.8+ MB
In the dataset, we have more than 21k records and 23 columns, out of which
4 features are of float type
18 features are of integer type
1 feature is of object type (we may need to convert this object type to specific datatype)
In [7]:
# let's check whether our dataset have any null/missing values

house_df.isnull().sum()
Out[7]:
cid 0
dayhours 0
price 0
room_bed 0
room_bath 0
living_measure 0
lot_measure 0
ceil 0
coast 0
sight 0
condition 0
quality 0
ceil_measure 0
basement 0
yr_built 0
yr_renovated 0
zipcode 0
lat 0
long 0
living_measure15 0
lot_measure15 0
furnished 0
total_area 0
dtype: int64
We don't have any null or missing values for any of the columns
In [8]:
# let's check whether there's any duplicate record in our dataset or not. If present, we have to remove them
house_df.duplicated().sum()
Out[8]:
0
We don't have any duplicate record in out dataset. So we can say we have more than 21k Unique records
In [9]:
# let's do the 5 - factor analysis of the features
house_df.describe().transpose()
Out[9]:
count mean std min 25% 50% 75% max
cid 21613.0 4.580302e+09 2.876566e+09 1.000102e+06 2.123049e+09 3.904930e+09 7.308900e+09 9.900000e+09
price 21613.0 5.401822e+05 3.673622e+05 7.500000e+04 3.219500e+05 4.500000e+05 6.450000e+05 7.700000e+06
room_bed 21613.0 3.370842e+00 9.300618e-01 0.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 3.300000e+01
room_bath 21613.0 2.114757e+00 7.701632e-01 0.000000e+00 1.750000e+00 2.250000e+00 2.500000e+00 8.000000e+00
living_measure 21613.0 2.079900e+03 9.184409e+02 2.900000e+02 1.427000e+03 1.910000e+03 2.550000e+03 1.354000e+04
lot_measure 21613.0 1.510697e+04 4.142051e+04 5.200000e+02 5.040000e+03 7.618000e+03 1.068800e+04 1.651359e+06
ceil 21613.0 1.494309e+00 5.399889e-01 1.000000e+00 1.000000e+00 1.500000e+00 2.000000e+00 3.500000e+00
coast 21613.0 7.541757e-03 8.651720e-02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
sight 21613.0 2.343034e-01 7.663176e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 4.000000e+00
condition 21613.0 3.409430e+00 6.507430e-01 1.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 5.000000e+00
quality 21613.0 7.656873e+00 1.175459e+00 1.000000e+00 7.000000e+00 7.000000e+00 8.000000e+00 1.300000e+01
ceil_measure 21613.0 1.788391e+03 8.280910e+02 2.900000e+02 1.190000e+03 1.560000e+03 2.210000e+03 9.410000e+03
basement 21613.0 2.915090e+02 4.425750e+02 0.000000e+00 0.000000e+00 0.000000e+00 5.600000e+02 4.820000e+03
yr_built 21613.0 1.971005e+03 2.937341e+01 1.900000e+03 1.951000e+03 1.975000e+03 1.997000e+03 2.015000e+03
yr_renovated 21613.0 8.440226e+01 4.016792e+02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.015000e+03
zipcode 21613.0 9.807794e+04 5.350503e+01 9.800100e+04 9.803300e+04 9.806500e+04 9.811800e+04 9.819900e+04
lat 21613.0 4.756005e+01 1.385637e-01 4.715590e+01 4.747100e+01 4.757180e+01 4.767800e+01 4.777760e+01
long 21613.0 -1.222139e+02 1.408283e-01 -1.225190e+02 -1.223280e+02 -1.222300e+02 -1.221250e+02 -1.213150e+02
living_measure15 21613.0 1.986552e+03 6.853913e+02 3.990000e+02 1.490000e+03 1.840000e+03 2.360000e+03 6.210000e+03
lot_measure15 21613.0 1.276846e+04 2.730418e+04 6.510000e+02 5.100000e+03 7.620000e+03 1.008300e+04 8.712000e+05
furnished 21613.0 1.966872e-01 3.975030e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00
total_area 21613.0 1.718687e+04 4.158908e+04 1.423000e+03 7.035000e+03 9.575000e+03 1.300000e+04 1.652659e+06

1. CID: House ID/Property ID.Not used for analysis
2. Dayhours: 5 factor analysis is reflecting for this column
3. price: Our taget column value is in 75k - 7700k range. As Mean > Median, it's Right-Skewed.
4. room_bed: Number of bedrooms range from 0 - 33. As Mean slightly > Median, it's slightly Right-Skewed.
5. room_bath: Number of bathrooms range from 0 - 8. As Mean slightly < Median, it's slightly Left-Skewed.
6. living_measure: Square footage of house range from 290 - 13,540. As Mean > Median, it's Right-Skewed.
7. lot_measure: Square footage of lot range from 520 - 16,51,359. As Mean almost double of Median, it's Hightly Right-Skewed.
8. ceil: Number of floors range from 1 - 3.5 As Mean ~ Median, it's almost Normal Distributed.
9. coast: As this value represent whether house has waterfront view or not. It's categorical column. From above analysis we got know, very
few houses has waterfront view.
10. sight: Value ranges from 0 - 4. As Mean > Median, it's Right-Skewed
11. condition: Represents rating of house which ranges from 1 - 5. As Mean > Median, it's Right-Skewed
12. quality: Representign grade given to house which range from 1 - 13. As Mean > Median, it's Right-Skewed.
13. ceil_measure: Square footage of house apart from basement ranges in 290 - 9,410. As Mean > Median, it's Right-Skewed.
14. basement: Square footage house basement ranges in 0 - 4,820. As Mean highlty > Median, it's Highly Right-Skewed.
15. yr_built: House built year ranges from 1900 - 2015. As Mean < Median, it's Left-Skewed.
16. yr_renovated: House renovation year only 2015. So this column can be used as Categorical Variable for knowing whether house is
renovated or not.
17. zipcode: House ZipCode ranges from 98001 - 98199. As Mean > Median, it's Right-Skewed.
18. lat: Lattitude ranges from 47.1559 - 47.7776 As Mean < Median, it's Left-Skewed.
19. long: Longittude ranges from -122.5190 to -121.315 As Mean > Median, it's Right-Skewed.
20. living_measure15: Value ragnes from 399 to 6,210. As Mean > Median, it's Right-Skewed.
21. lot_measure15: Value ragnes from 651 to 8,71,200. As Mean highly > Median, it's Highly Right-Skewed.
22. furnished: Representing whether house is furnished or not. It's a Categorical Variable
23. total_area Total area of house ranges from 1,423 to 16,52,659. As Mean is almost double of Median, it's Highly Right-Skewed
From above analysis we got to know,
Most columns distribution is Right-Skewed and only few features are Left-Skewed (like room_bath, yr_built, lat).
We have columns which are Categorical in nature are -> coast, yr_renovated, furnished
Exploratory Data Analysis

Let's do some visual data analysis of the features
Univariate Analysis - By BoxPlot
In [10]:
#let's first import the required libraries for the plots

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# size of plots to make it uniform throughout our analysis in the notebook

plotSizeX = 12
plotSizeY = 6
# let's boxplot all the numerical columns and see if there any outliers
for i in house_df.iloc[:, 2:].columns:
house_df.iloc[:, 1:].boxplot(column=i)
plt.show()
We can see, there are lot of features which have outliers. So we might need to treat those before building model
Analyzing Feature: cid

In [11]:
#cid - CID is appearing muliple times, it seems data contains house which is sold multiple times
cid_count=house_df.cid.value_counts()
cid_count[cid_count>1].shape
Out[11]:
(176,)
We have 176 properties that were sold more than once in the given data
Analyzing Feature: dayhours
In [12]:
#we will create new data frame that can be used for modeling
#We will convert the dayhours to 'month_year' as sale month-year is relevant for analysis
house_dfr=house_df.copy()
house_df.dayhours=house_df.dayhours.str.replace('T000000', "")
house_df.dayhours=pd.to_datetime(house_df.dayhours,format='%Y%m%d')
house_df['month_year']=house_df['dayhours'].apply(lambda x: x.strftime('%B-%Y'))
house_df['month_year'].head()
Out[12]:
0 November-2014
1 December-2014
2 April-2015
3 May-2014
4 April-2015
Name: month_year, dtype: object
We successfully converted dayhours feature to month_year for better analysis.
In [13]:
house_df['month_year'].value_counts()
Out[13]:
April-2015 2231
July-2014 2211
June-2014 2180
August-2014 1940
October-2014 1878
March-2015 1875
September-2014 1774
May-2014 1768
December-2014 1471
November-2014 1411
February-2015 1250
January-2015 978
May-2015 646
Name: month_year, dtype: int64
We can see, most houses sold in April, July month
In [14]:
house_df.groupby(['month_year'])['price'].agg('mean')
Out[14]:
month_year
April-2015 561933.463021
August-2014 536527.039691
December-2014 524602.893270
February-2015 507919.603200
January-2015 525963.251534
July-2014 544892.161013
June-2014 558123.736239
March-2015 544057.683200
May-2014 548166.600113
May-2015 558193.095975
November-2014 522058.861800
October-2014 539127.477636
September-2014 529315.868095
Name: price, dtype: float64
So the time line of the sale data of the properties is from May-2014 to May-2015 and April month have the highest mean price.
Analyzing Feature: Price (our Target)
In [15]:
house_df.price.describe()
Out[15]:
count 2.161300e+04
mean 5.401822e+05
std 3.673622e+05
min 7.500000e+04
25% 3.219500e+05
50% 4.500000e+05
75% 6.450000e+05
max 7.700000e+06
Name: price, dtype: float64
In [16]:
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df['price'])
Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x225ef84d550>
The Price is ranging from 75,000 to 77,00,000 and distribution is right-skewed.
Analyzing Feature: room_bed
In [17]:
house_df['room_bed'].value_counts()
Out[17]:
3 9824
4 6882
2 2760
5 1601
6 272
1 199
7 38
8 13
0 13
9 6
10 3
11 1
33 1
Name: room_bed, dtype: int64
The value of 33 seems to be outlier we need to check the data point before imputing the same
In [18]:
house_df[house_df['room_bed']==33]
Out[18]:
cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat long
2014-06-
750 2402100895 640000 33 1.75 1620 6000 1.0 0 0 ... 1947 0 98103 47.6878 -122.331
25
Will delete this data point after bivariate analysis as it looks to be an outlier as it has low price for 33 bed room property
In [19]:
sns.countplot(house_df.room_bed,color='green')
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x225ef14f780>
Most of the houses/properties have 3 or 4 bedrooms
Analyzing Feature: room_bath
In [20]:
sns.countplot(house_df.room_bath,color='green')
house_df['room_bath'].value_counts().sort_index()
Out[20]:
0.00 10
0.50 4
0.75 72
1.00 3852
1.25 9
1.50 1446
1.75 3048
2.00 1930
2.25 2047
2.50 5380
2.75 1185
3.00 753
3.25 589
3.50 731
3.75 155
4.00 136
4.25 79
4.50 100
4.75 23
5.00 21
5.25 13
5.50 10
5.75 4
6.00 6
6.25 2
6.50 2
6.75 2
7.50 1
7.75 1
8.00 2
Name: room_bath, dtype: int64
Majority of the properties have bathroom in the range of 1.0 to 2.5

In [21]:
print("Skewness is :",house_df.room_bath.skew())
sns.distplot(house_df.room_bath)
Skewness is : 0.511107573347417
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x225ef14f748>
Analyzing Feature: Living measure
In [22]:
#Data is skewed as visible from plot, as its distribution is normal

print("Skewness is :",house_df.living_measure.skew())
sns.distplot(house_df.living_measure)
house_df.living_measure.describe()
Skewness is : 1.471555426802092
Out[22]:
count 21613.000000
mean 2079.899736
std 918.440897
min 290.000000
25% 1427.000000
50% 1910.000000
75% 2550.000000
max 13540.000000
Name: living_measure, dtype: float64
Data distribution tells us, living_measure is right-skewed.
In [23]:
#Let's plot the boxplot for living_measure

sns.boxplot(house_df.living_measure)
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x225ef0f5b70>
There are many outliers in living measure. Need to review further to treat the same.
In [24]:
#checking the no. of data points with Living measure greater than 8000
house_df[house_df['living_measure']>8000]
Out[24]:
cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat
2014-09-
264 9208900037 6890000 6 7.75 9890 31374 2.0 0 4 ... 2001 0 98039 47.6305 -122
19
2014-06-
668 1924059029 4670000 5 6.75 9640 13068 1.0 1 4 ... 1983 2009 98040 47.5570 -122
17
2014-06-
1123 2303900035 2890000 5 6.25 8670 64033 2.0 0 4 ... 1965 2003 98177 47.7295 -122
11
2014-10-
4789 1247600105 5110000 5 5.25 8010 45517 2.0 1 4 ... 1999 0 98033 47.6767 -122
20
2014-10-
16785 6762700020 7700000 6 8.00 12050 27600 2.5 0 3 ... 1910 1987 98102 47.6298 -122
13
2014-07-
18393 6072800246 3300000 5 6.25 8020 21738 2.0 0 0 ... 2001 0 98006 47.5675 -122
02
2014-06-
19888 9808700762 7060000 5 4.50 10040 37325 2.0 1 2 ... 1940 2001 98004 47.6500 -122
11
2014-05-
20740 1225069038 2280000 7 8.00 13540 307752 3.0 0 4 ... 1999 0 98053 47.6675 -121
05
2014-08-
20917 2470100110 5570000 5 5.75 9200 35069 2.0 0 0 ... 2001 0 98039 47.6289 -122
04
We have only 9 properties/house which have more than 8k living_measure. So will treat these outliers.
Analyzing Feature: lot_measure

In [25]:
#Data is skewed as visible from plot

print("Skewness is :",house_df.lot_measure.skew())
sns.boxplot(house_df.lot_measure)
house_df.lot_measure.describe()
Skewness is : 13.06001895903175
Out[25]:
count 2.161300e+04
mean 1.510697e+04
std 4.142051e+04
min 5.200000e+02
25% 5.040000e+03
50% 7.618000e+03
75% 1.068800e+04
max 1.651359e+06
Name: lot_measure, dtype: float64
In [26]:
#checking the no. of data points with Lot measure greater than 1250000
house_df[house_df['lot_measure']>1250000]
Out[26]:
2015-03-
1113 1020069017 700000 4 1.0 1300 1651359 1.0 0 3 ... 1920 0 98022 47.2313 -122.02
27
We have only 1 property with more than 12,50,000 lot_measure. So we need to treat this.
Analyzing Feature: ceil
In [27]:
#let's see the ceil count for all the records

house_df.ceil.value_counts()
Out[27]:
1.0 10680
2.0 8241
1.5 1910
3.0 613
2.5 161
3.5 8
Name: ceil, dtype: int64
We can see, most houses have 1 floor

In [28]:
sns.countplot('ceil',data=house_df)
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x225ef19ff60>
Above grapth confirming the same, that most properties have 1 and 2 floors
Analyzing Feature: coast
In [29]:
#coast - most houses donot have waterfront view, very few are waterfront
house_df.coast.value_counts()
Out[29]:
0 21450
1 163
Name: coast, dtype: int64
Analyzing Feature: sight
In [30]:
#sight - most sights have not been viewed
house_df.sight.value_counts()
Out[30]:
0 19489
2 963
3 510
1 332
4 319
Name: sight, dtype: int64
Analyzing Feature: condition
In [31]:
#condition - Overall most houses are rated as 3 and above for its condition overall
house_df.condition.value_counts()
Out[31]:
3 14031
4 5679
5 1701
2 172
1 30
Name: condition, dtype: int64
Analyzing Feature: quality
In [32]:
#Quality - most properties have quality rating between 6 to 10

house_df.quality.value_counts()
sns.countplot('quality',data=house_df)
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x225eedbd358>
In [33]:
#checking the no. of data points with quality rating as 13

house_df[house_df['quality']==13]
Out[33]:
2014-09-
264 9208900037 6890000 6 7.75 9890 31374 2.0 0 4 ... 2001 0 98039 47.6305 -122
19
2014-06-
1123 2303900035 2890000 5 6.25 8670 64033 2.0 0 4 ... 1965 2003 98177 47.7295 -122
11
2015-01-
1583 2426039123 2420000 5 4.75 7880 24250 2.0 0 2 ... 1996 0 98177 47.7334 -122
30
2014-09-
7095 2303900100 3800000 3 4.25 5510 35000 2.0 0 4 ... 1997 0 98177 47.7296 -122
11
2015-04-
8509 4139900180 2340000 4 2.50 4500 35200 1.0 0 0 ... 1988 0 98006 47.5477 -122
20
2014-09-
9446 1068000375 3200000 6 5.00 7100 18200 2.5 0 0 ... 1933 2002 98199 47.6427 -122
23
2014-10-
10387 7237501190 1780000 4 3.25 4890 13402 2.0 0 0 ... 2004 0 98059 47.5303 -122
10
2014-11-
12320 1725059316 2390000 4 4.00 6330 13296 2.0 0 2 ... 2000 0 98033 47.6488 -122
20
2014-07-
12686 853200010 3800000 5 5.50 7050 42840 1.0 0 2 ... 1978 0 98004 47.6229 -122
01
2014-10-
16785 6762700020 7700000 6 8.00 12050 27600 2.5 0 3 ... 1910 1987 98102 47.6298 -122
13
2015-03-
17322 9831200500 2480000 5 3.75 6810 7500 2.5 0 0 ... 1922 0 98102 47.6285 -122
04
2014-12-
20892 3303850390 2980000 5 5.50 7400 18898 2.0 0 3 ... 2001 0 98006 47.5431 -122
12
2014-08-
20917 2470100110 5570000 5 5.75 9200 35069 2.0 0 0 ... 2001 0 98039 47.6289 -122
04
There are only 13 propeties which have the highest quality rating
Analyzing Feature: ceil_measure
In [34]:
#ceil_measure - its highly skewed

print("Skewness is :", house_df.ceil_measure.skew())
sns.distplot(house_df.ceil_measure)
house_df.ceil_measure.describe()
Skewness is : 1.4466644733818372
Out[34]:
count 21613.000000
mean 1788.390691
std 828.090978
min 290.000000
25% 1190.000000
50% 1560.000000
75% 2210.000000
max 9410.000000
Name: ceil_measure, dtype: float64
In [35]:
sns.factorplot(x='ceil',y='ceil_measure',data=house_df, size = 4, aspect = 2)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot

` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)
Out[35]:
<seaborn.axisgrid.FacetGrid at 0x225ef353f28>
There is no pattern in Ceil Vs Ceil_measure

The vertival lines at each point represent the inter quartile range of values at that point
Analyzing Feature: basement
In [36]:
#basement_measure
sns.distplot(house_df.basement)
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x225f1238080>
We can see 2 gaussians, which tells us there are propeties which don't have basements and some have the basements
In [37]:
house_df[house_df.basement==0].shape
Out[37]:
(13126, 24)
We have almost 60% of the properties without basement

In [38]:
#houses have zero measure of basement i.e. they donot have basements
#let's plot boxplot for properties which have basements only
house_df_base=house_df[house_df['basement']>0]
sns.boxplot(house_df_base['basement'])
Out[38]:
<matplotlib.axes._subplots.AxesSubplot at 0x225f0f92a20>
We can clearly see, there are outliers. We need to treat this before our model.
In [39]:
#checking the no. of data points with 'basement' greater than 4000
house_df[house_df['basement']>4000]
Out[39]:
2014-06-
668 1924059029 4670000 5 6.75 9640 13068 1.0 1 4 ... 1983 2009 98040 47.5570 -122
17
2014-05-
20740 1225069038 2280000 7 8.00 13540 307752 3.0 0 4 ... 1999 0 98053 47.6675 -121
05
We have only 2 properties with more than 4,000 measure basement

In [40]:
#Distribution of houses having basement

sns.distplot(house_df_base.basement)
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x225f102bd30>
Distribution having basement is right-skewed
Analyzing Feature: yr_built
In [41]:
#house range from new to very old

sns.distplot(house_df.yr_built)
Out[41]:
<matplotlib.axes._subplots.AxesSubplot at 0x225f125f5c0>
The built year of the properties range from 1900 to 2014 and we can see upward trend with time
Analyzing Feature: yr_renovated

In [42]:
house_df[house_df['yr_renovated']>0].shape
Out[42]:
(914, 24)
Only 914 houses were renovated out of 21613 houses
In [43]:
#yr_renovated - plot of houses which are renovated

house_df_reno=house_df[house_df['yr_renovated']>0]
sns.distplot(house_df_reno.yr_renovated)
Out[43]:
<matplotlib.axes._subplots.AxesSubplot at 0x225ef896208>
Now will create age column from columns : yr_built & yr_renovated
Analyzing Feature: Zipcode, Lat, Long
In [46]:
#For geographic visual
import geopandas as gpd
from shapely.geometry import Point, Polygon
#For current working directory
import os
cwd = os.getcwd()
In [47]:
## Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
USAZip=pd.read_excel("USA ZipCodes_1.xlsx",sheet_name="Sheet8")
USAZip.head()
Out[47]:
zipcode City County Type
0 98001 Auburn King Standard
2 98003 Federal Way King Standard
3 98004 Bellevue King Standard
In [48]:
house_df=house_df.merge(USAZip,how='left',on='zipcode')
#house_df.drop_duplicates()
In [49]:
#let's see the shape of our dataframe

house_df.shape
Out[49]:
(21613, 27)
Now we have 27 features
In [5]:
#Add the folder WA to your current working directory

usa = gpd.read_file(cwd+'\\WA\\WSDOT__City_Limits.shp')
usa.head()
gdf = gpd.GeoDataFrame(
house_df,geometry = [Point(xy) for xy in zip(house_df['long'], house_df['lat'])])
#We can now plot our ``GeoDataFrame``
ax=usa[usa.CityName.isin(house_df.City.unique())].plot(
color='white', edgecolor='black',figsize=(20,8))
plt.figure(figsize=(15,15))
gdf.plot(ax=ax, color='green', marker='o',markersize=0.1)
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ccf1142588>
<Figure size 1080x1080 with 0 Axes>
In [51]:
#let's see the columns of dataframe once again

house_df.columns
Out[51]:

'total_area', 'month_year', 'City', 'County', 'Type'],
dtype='object')
So we have 'City', 'Country', 'Type' as new feature in our dataframe
In [52]:
house_df.Type.value_counts()
Out[52]:
Standard 21613
Name: Type, dtype: int64
As the type is same for all the columns, we will remove this column in further analysis
In [53]:
house_df.City.value_counts()
Out[53]:
Seattle 8977
Renton 1597
Bellevue 1407
Kent 1203
Redmond 979
Kirkland 977
Auburn 912
Sammamish 800
Federal Way 779
Issaquah 733
Maple Valley 590
Woodinville 471
Snoqualmie 310
Kenmore 283
Mercer Island 282
Enumclaw 234
North Bend 221
Bothell 195
Duvall 190
Carnation 124
Vashon 118
Black Diamond 100
Fall City 81
Medina 50
Name: City, dtype: int64
So we have most properties in 'Seattle' city and least in 'Medina' city
Analyzing Feature: furnished
In [54]:
sns.countplot('furnished',data=house_df)
house_df.furnished.value_counts()
Out[54]:
0 17362
1 4251
Name: furnished, dtype: int64
Most properties are not furnished. Furnish column need to be converted into categorical column
BIVARIATE ANALYSIS
PairPlot
In [55]:
# let's plot all the variables and confirm our above deduction with more confidence
sns.pairplot(house_df, diag_kind = 'kde')
Out[55]:
<seaborn.axisgrid.PairGrid at 0x225f12a3a90>
From above pair plot, we observed/deduced below
1. price: price distribution is Right-Skewed as we deduced earlier from our 5-factor analysis
2. room_bed: our target variable (price) and room_bed plot is not linear. It's distribution have lot of gaussians
3. room_bath: It's plot with price has somewhat linear relationship. Distribution has number of gaussians.
4. living_measure: Plot against price has strong linear relationship. It also have linear relationship with room_bath variable. So might remove
one of these 2. Distribution is Right-Skewed.
5. lot_measure: No clear relationship with price.
6. ceil: No clear relationship with price. We can see, it's have 6 unique values only. Therefore, we can convert this column into categorical
column for values.
7. coast: No clear relationship with price. Clearly it's categorical variable with 2 unique values.
8. sight: No clear relationship with price. This has 5 unique values. Can be converted to Categorical variable.
9. condition: No clear relationship with price. This has 5 unique values. Can be converted to Categorical variable.
10. quality: Somewhat linear relationship with price. Has discrete values from 1 - 13. Can be converted to Categorical variable.
11. ceil_measure: Strong linear relationship with price. Also with room_bath and living_measure features. Distribution is Right-Skewed.
12. basement: No clear relationship with price.
13. yr_built: No clear relationship with price.
14. yr_renovated: No clear relationship with price. Have 2 unique values. Can be converted to Categorical Variable which tells whether house is
renovated or not.
15. zipcode, lat, long: No clear relationship with price or any other feature.
16. living_measure15: Somewhat linear relationship with target feature. It's same as living_measure. Therefore we can drop this variable.
17. lot_measure15: No clear relationship with price or any other feature.
18. furnished: No clear relationship with price or any other feature. 2 unique values so can be converted to Categorical Variable
19. total_area: No clear relationship with price. But it has Very Strong linear relationship with lot_measure. So one of it can be dropped.
In brief, below featues should be converted to Categorical Variable

ceil, coast, sight, condition, quality, yr_renovated, furnished
And below columns can be dropped after checking pearson factor
zipcode, lat, long, living_measure15, lot_measure15, total_area
In [56]:
# let's see corelatoin between the different features

house_corr = house_df.corr(method ='pearson')
house_corr
Out[56]:
cid price room_bed room_bath living_measure lot_measure ceil coast sight condition ... basement yr_built
cid 1.000000 -0.016797 0.001286 0.005160 -0.012258 -0.132109 0.018525 -0.002721 0.011592 -0.023783 ... -0.005151 0.021380
price -0.016797 1.000000 0.308338 0.525134 0.702044 0.089655 0.256786 0.266331 0.397346 0.036392 ... 0.323837 0.053982
room_bed 0.001286 0.308338 1.000000 0.515884 0.576671 0.031703 0.175429 -0.006582 0.079532 0.028472 ... 0.303093 0.154178
room_bath 0.005160 0.525134 0.515884 1.000000 0.754665 0.087740 0.500653 0.063744 0.187737 -0.124982 ... 0.283770 0.506019
living_measure -0.012258 0.702044 0.576671 0.754665 1.000000 0.172826 0.353949 0.103818 0.284611 -0.058753 ... 0.435043 0.318049
lot_measure -0.132109 0.089655 0.031703 0.087740 0.172826 1.000000 -0.005201 0.021604 0.074710 -0.008958 ... 0.015286 0.053080
ceil 0.018525 0.256786 0.175429 0.500653 0.353949 -0.005201 1.000000 0.023698 0.029444 -0.263768 ... -0.245705 0.489319
coast -0.002721 0.266331 -0.006582 0.063744 0.103818 0.021604 0.023698 1.000000 0.401857 0.016653 ... 0.080588 -0.026161
sight 0.011592 0.397346 0.079532 0.187737 0.284611 0.074710 0.029444 0.401857 1.000000 0.045990 ... 0.276947 -0.053440
condition -0.023783 0.036392 0.028472 -0.124982 -0.058753 -0.008958 -0.263768 0.016653 0.045990 1.000000 ... 0.174105 -0.361417
quality 0.008130 0.667463 0.356967 0.664983 0.762704 0.113621 0.458183 0.082775 0.251321 -0.144674 ... 0.168392 0.446963
ceil_measure -0.010842 0.605566 0.477600 0.685342 0.876597 0.183512 0.523885 0.072075 0.167649 -0.158214 ... -0.051943 0.423898
basement -0.005151 0.323837 0.303093 0.283770 0.435043 0.015286 -0.245705 0.080588 0.276947 0.174105 ... 1.000000 -0.133124
yr_built 0.021380 0.053982 0.154178 0.506019 0.318049 0.053080 0.489319 -0.026161 -0.053440 -0.361417 ... -0.133124 1.000000
yr_renovated -0.016907 0.126442 0.018841 0.050739 0.055363 0.007644 0.006338 0.092885 0.103917 -0.060618 ... 0.071323 -0.224874
zipcode -0.008224 -0.053168 -0.152668 -0.203866 -0.199430 -0.129574 -0.059121 0.030285 0.084827 0.003026 ... 0.074845 -0.346869
lat -0.001891 0.306919 -0.008931 0.024573 0.052529 -0.085683 0.049614 -0.014274 0.006157 -0.014941 ... 0.110538 -0.148122
long 0.020799 0.021571 0.129473 0.223042 0.240223 0.229521 0.125419 -0.041910 -0.078400 -0.106500 ... -0.144765 0.409356
living_measure15 -0.002901 0.585374 0.391638 0.568634 0.756420 0.144608 0.279885 0.086463 0.280439 -0.092824 ... 0.200355 0.326229
lot_measure15 -0.138798 0.082456 0.029244 0.087175 0.183286 0.718557 -0.011269 0.030703 0.072575 -0.003406 ... 0.017276 0.070958
furnished -0.010009 0.565991 0.259268 0.484923 0.632947 0.118883 0.347749 0.069882 0.220250 -0.121902 ... 0.092847 0.305225
total_area -0.131844 0.104796 0.044310 0.104050 0.194209 0.999763 0.002637 0.023809 0.080693 -0.010219 ... 0.024832 0.059889
We have linear relationships in below featues as we got to know from above matrix
1. price: room_bath, living_measure, quality, living_measure15, furnished

2. living_measure: price, room_bath. So we can consider dropping 'room_bath' variable.
3. quality: price, room_bath, living_measure
4. ceil_measure: price, room_bath, living_measure, quality
5. living_measure15: price, living_measure, quality. So we can consider dropping living_measure15 as well. As it's giving same info as
living_measure.
6. lot_measure15: lot_measure. Therefore, we can consider dropping lot_measure15, as it's giving same info.
7. furnished: quality
8. total_area: lot_measure, lot_measure15. Therefore, we can consider dropping total_area feature as well. As it's giving same info as
lot_measure.
We can plot heatmap and can easily confirm our above findings
In [57]:
# Plotting heatmap
plt.subplots(figsize =(15, 8))
sns.heatmap(house_corr,cmap="YlGnBu",annot=True)
Out[57]:
<matplotlib.axes._subplots.AxesSubplot at 0x2258d4f79b0>
Analyzing Bivariate for Feature: month_year

In [58]:
#month,year in which house is sold. Price is not influenced by it, though there are outliers and can be easily se
en.
house_df['month_year'] = pd.to_datetime(house_df['month_year'], format='%B-%Y')
house_df.sort_values(["month_year"], axis=0,
ascending=True, inplace=True)
house_df["month_year"] = house_df["month_year"].dt.strftime('%B-%Y')
sns.factorplot(x='month_year',y='price',data=house_df, size=4, aspect=2)

plt.xticks(rotation=90)
#groupby
house_df.groupby('month_year')['price'].agg(['mean','median','size'])

in `catplot`.
warnings.warn(msg)
Out[58]:
mean median size
month_year
April-2015 561933.463021 476500 2231
August-2014 536527.039691 442100 1940
December-2014 524602.893270 432500 1471
February-2015 507919.603200 425545 1250
January-2015 525963.251534 438500 978
July-2014 544892.161013 465000 2211
June-2014 558123.736239 465000 2180
March-2015 544057.683200 450000 1875
May-2014 548166.600113 465000 1768
May-2015 558193.095975 455000 646
November-2014 522058.861800 435000 1411
October-2014 539127.477636 446900 1878
September-2014 529315.868095 450000 1774
The mean price of the houses tend to be high during March,April, May as compared to that of September, October, November,December period.
Analyzing Bivariate for Feature: room_bed

In [59]:
#Room_bed - outliers can be seen easily. Mean and median of price increases with number bedrooms/house uptill a p
oint
#and then drops
sns.factorplot(x='room_bed',y='price',data=house_df, size=4, aspect=2)
#groupby
house_df.groupby('room_bed')['price'].agg(['mean','median','size'])

in `catplot`.
warnings.warn(msg)
Out[59]:
mean median size
room_bed
0 4.102231e+05 288000.0 13
1 3.176580e+05 299000.0 199
2 4.013877e+05 374000.0 2760
3 4.662766e+05 413000.0 9824
4 6.355647e+05 549997.5 6882
5 7.868741e+05 620000.0 1601
6 8.258535e+05 650000.0 272
7 9.514478e+05 728580.0 38
8 1.105077e+06 700000.0 13
9 8.939998e+05 817000.0 6
10 8.200000e+05 660000.0 3
11 5.200000e+05 520000.0 1
33 6.400000e+05 640000.0 1
There is clear increasing trend in price with room_bed
In [60]:
#room_bath - outliers can be seen easily. Overall mean and median price increares with increasing room_bath
sns.factorplot(x='room_bath',y='price',data=house_df,size=4, aspect=2)
#groupby
house_df.groupby('room_bath')['price'].agg(['mean','median','size'])
in `catplot`.
warnings.warn(msg)
Out[60]:
mean median size
room_bath
0.00 4.490950e+05 317500 10
0.50 2.373750e+05 264000 4
0.75 2.945209e+05 273500 72
1.00 3.470412e+05 320000 3852
1.25 6.217722e+05 516500 9
1.50 4.093457e+05 370000 1446
1.75 4.549158e+05 422900 3048
2.00 4.579050e+05 423250 1930
2.25 5.337688e+05 472500 2047
2.50 5.536618e+05 499950 5380
2.75 6.603505e+05 605000 1185
3.00 7.086619e+05 600000 753
3.25 9.707532e+05 835000 589
3.50 9.324017e+05 820000 731
3.75 1.198179e+06 1070000 155
4.00 1.268405e+06 1055000 136
4.25 1.526653e+06 1380000 79
4.50 1.334211e+06 1060000 100
4.75 2.022300e+06 2300000 23
5.00 1.674167e+06 1430000 21
5.25 1.817962e+06 1420000 13
5.50 2.522500e+06 2340000 10
5.75 2.492500e+06 1930000 4
6.00 2.948333e+06 2895000 6
6.25 3.095000e+06 3095000 2
6.50 1.710000e+06 1710000 2
6.75 2.735000e+06 2735000 2
7.50 4.500000e+05 450000 1
7.75 6.890000e+06 6890000 1
8.00 4.990000e+06 4990000 2
There is upward trend in price with increase in room_bath

Analyzing Bivariate for Feature: living_measure
In [61]:
#living_measure - price increases with increase in living measure

print(sns.scatterplot(house_df['living_measure'],house_df['price']))
house_df['living_measure'].describe()
AxesSubplot(0.125,0.125;0.775x0.755)
Out[61]:
count 21613.000000
mean 2079.899736
std 918.440897
min 290.000000
25% 1427.000000
50% 1910.000000
75% 2550.000000
max 13540.000000
Name: living_measure, dtype: float64
There is clear increment in price of the property with increment in the living measure But there seems to be one outlier to this trend. Need to
evaluate the same
Analyzing Bivariate for Feature: lot_measure

In [62]:
#lot_measure - there seems to be no relation between lot_measure and price

#lot_measure - data value range is very large so breaking it get better view.
print(sns.scatterplot(house_df['lot_measure'],house_df['price']))
house_df['lot_measure'].describe()
AxesSubplot(0.125,0.125;0.775x0.755)
Out[62]:
count 2.161300e+04
mean 1.510697e+04
std 4.142051e+04
min 5.200000e+02
25% 5.040000e+03
50% 7.618000e+03
75% 1.068800e+04
max 1.651359e+06
There doesnt seem to be no relation between lot_measure and price trend

In [63]:
#lot_measure <25000
x=house_df[house_df['lot_measure']<25000]
print(sns.scatterplot(x['lot_measure'],x['price']))
x['lot_measure'].describe()
AxesSubplot(0.125,0.125;0.775x0.755)
Out[63]:
count 19713.000000
mean 7762.510577
std 4252.549162
min 520.000000
25% 4997.000000
50% 7253.000000
75% 9620.000000
max 24969.000000
Almost 95% of the houses have <25000 lot_measure. But there is no clear trend between lot_measure and price
In [64]:
#lot_measure >100000 - price increases with increase in living measure

y=house_df[house_df['lot_measure']<=75000]
print(sns.scatterplot(y['lot_measure'],y['price']))
#y['lot_measure'].describe()
AxesSubplot(0.125,0.125;0.775x0.755)
Analyzing Bivariate for Feature: ceil

In [65]:
#ceil - median price increases initially and then falls

print(sns.factorplot(x='ceil',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('ceil')['price'].agg(['mean','median','size'])

in `catplot`.
warnings.warn(msg)
<seaborn.axisgrid.FacetGrid object at 0x000002259321B9B0>
Out[65]:
mean median size
ceil
1.0 4.422196e+05 390000 10680
1.5 5.590449e+05 524475 1910
2.0 6.490515e+05 542950 8241
2.5 1.061021e+06 799200 161
3.0 5.826201e+05 490000 613
3.5 9.339375e+05 534500 8
There is some slight upward trend in price with the ceil
Analyzing Bivariate for Feature: coast

In [66]:
#coast - mean and median of waterfront view is high however such houses are very small in compare to non-waterfro
nt
#Also, living_measure mean and median is greater for waterfront house.
print(sns.factorplot(x='coast',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('coast')['living_measure','price'].agg(['median','mean'])

in `catplot`.
warnings.warn(msg)
<seaborn.axisgrid.FacetGrid object at 0x0000022580B62208>
Out[66]:
living_measure price
median mean median mean
coast
0 1910 2071.587972 450000 5.316534e+05
1 2850 3173.687117 1400000 1.662524e+06
The house properties with water_front tend to have higher price compared to that of non-water_front properties
Analyzing Bivariate for Feature: sight

In [67]:
#sight - have outliers. The house sighted more have high price (mean and median) and have large living area as we
ll.
print(sns.factorplot(x='sight',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('sight')['price','living_measure'].agg(['mean','median','size'])

in `catplot`.
warnings.warn(msg)
<seaborn.axisgrid.FacetGrid object at 0x00000225960E3080>
Out[67]:
price living_measure
mean median size mean median size
sight
0 4.966235e+05 432500 19489 1997.761660 1850 19489
1 8.125186e+05 690944 332 2568.960843 2420 332
2 7.927462e+05 675000 963 2655.257529 2470 963
3 9.724684e+05 802500 510 3018.564706 2840 510
4 1.464363e+06 1190000 319 3351.473354 3050 319
Properties with higher price have more no.of sights compared to that of houses with lower price
In [68]:
#Sight - Viewed in relation with price and living_measure

#Costlier houses with large living area are sighted more.
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['sight'],palette='Paired',legend=
'full'))
AxesSubplot(0.125,0.125;0.775x0.755)
The above graph also justify that: Properties with higher price have more no.of sights compared to that of houses with lower price
Analyzing Bivariate for Feature: condition
In [69]:
#condition - as the condition rating increases its price and living measure mean and median also increases.
print(sns.factorplot(x='condition',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('condition')['price','living_measure'].agg(['mean','median','size'])
in `catplot`.
warnings.warn(msg)
<seaborn.axisgrid.FacetGrid object at 0x00000225FFCB87F0>
Out[69]:
condition
1 334431.666667 262500 30 1216.000000 1000 30
2 327316.215116 279000 172 1410.058140 1320 172
3 542097.086024 450000 14031 2149.042050 1970 14031
4 521300.705230 440000 5679 1950.991724 1820 5679
5 612577.742504 526000 1701 2022.911229 1880 1701
The price of the house increases with condition rating of the house
In [70]:
#Condition - Viewed in relation with price and living_measure. Most houses are rated as 3 or more.
#We can see some outliers as well
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['condition'],palette='Paired',leg
end='full'))
AxesSubplot(0.125,0.125;0.775x0.755)
So we found out that smaller houses are in better condition and better condition houses are having higher prices
Analyzing Bivariate for Feature: quality
In [71]:
#quality - with grade increase price and living_measure increase (mean and median)
print(sns.factorplot(x='quality',y='price',data=house_df, size = 4, aspect = 2))

#groupby
house_df.groupby('quality')['price','living_measure'].agg(['mean','median','size'])

in `catplot`.
warnings.warn(msg)
<seaborn.axisgrid.FacetGrid object at 0x000002258C021320>
Out[71]:
quality
1 1.420000e+05 142000.0 1 290.000000 290 1
3 2.056667e+05 262000.0 3 596.666667 600 3
4 2.143810e+05 205000.0 29 660.482759 660 29
5 2.485240e+05 228700.0 242 983.326446 905 242
6 3.019166e+05 275276.5 2038 1191.561335 1120 2038
7 4.025933e+05 375000.0 8981 1689.400401 1630 8981
8 5.428955e+05 510000.0 6068 2184.748517 2150 6068
9 7.737382e+05 720000.0 2615 2868.139962 2820 2615
10 1.072347e+06 914327.0 1134 3520.299824 3450 1134
11 1.497792e+06 1280000.0 399 4395.448622 4260 399
12 2.192500e+06 1820000.0 90 5471.588889 4965 90
13 3.710769e+06 2980000.0 13 7483.076923 7100 13
There is clear increase in price of the house with higher rating on quality
In [72]:
#quality - Viewed in relation with price and living_measure. Most houses are graded as 6 or more.
#We can see some outliers as well
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['quality'],palette='coolwarm_r',
legend='full'))
AxesSubplot(0.125,0.125;0.775x0.755)
Analyzing Bivariate for Feature: ceil_measure
In [73]:
#ceil_measure - price increases with increase in ceil measure

print(sns.scatterplot(house_df['ceil_measure'],house_df['price']))
house_df['ceil_measure'].describe()
AxesSubplot(0.125,0.125;0.775x0.755)
Out[73]:
count 21613.000000
mean 1788.390691
std 828.090978
min 290.000000
25% 1190.000000
50% 1560.000000
75% 2210.000000
max 9410.000000
There is upward trend in price with ceil_measure

Analyzing Bivariate for Feature: basement
In [74]:
#basement - price increases with increase in ceil measure

print(sns.scatterplot(house_df['basement'],house_df['price']))
house_df['basement'].describe()
AxesSubplot(0.125,0.125;0.775x0.755)
Out[74]:
count 21613.000000
mean 291.509045
std 442.575043
min 0.000000
25% 0.000000
50% 0.000000
75% 560.000000
max 4820.000000
Name: basement, dtype: float64
We will create the categorical variable for basement 'has_basement' for houses with basement and no basement.This categorical variable will
be used for further analysis.
In [75]:
#Binning Basement to analyse data

def create_basement_group(series):
if series == 0:
return "No"
elif series > 0:
return "Yes"
house_df['has_basement'] = house_df['basement'].apply(create_basement_group)
In [76]:
#basement - after binning we data shows with basement houses are costlier and have higher
#living measure (mean & median)
print(sns.factorplot(x='has_basement',y='price',data=house_df, size = 4, aspect = 2))
house_df.groupby('has_basement')['price','living_measure'].agg(['mean','median','size'])

in `catplot`.
warnings.warn(msg)
<seaborn.axisgrid.FacetGrid object at 0x0000022580B9A470>
Out[76]:
has_basement
No 486945.394789 411500 13126 1928.879628 1740 13126
Yes 622518.174384 515000 8487 2313.467539 2100 8487
The houses with basement has better price compared to that of houses without basement
In [77]:
#basement - have higher price & living measure

print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['has_basement']))
AxesSubplot(0.125,0.125;0.775x0.755)
In [78]:
#yr_built - outliers can be seen easily.

print(sns.scatterplot(house_df['yr_built'],house_df['living_measure']))
#groupby
house_df.groupby('yr_built')['price'].agg(['mean','median','size'])
AxesSubplot(0.125,0.125;0.775x0.755)
Out[78]:
mean median size
yr_built
1900 581536.632184 549000 87
1901 557108.344828 550000 29
1902 673192.592593 624000 27
1903 480958.195652 461000 46
1904 583867.755556 478000 45
1905 753443.932432 597500 74
1906 670027.663043 555000 92
1907 676324.476923 595000 65
1908 564499.848837 519475 86
1909 696448.989362 575500 94
1910 671671.835821 542500 134
1911 632584.246575 606000 73
1912 613193.227848 557510 79
1913 586066.271186 535000 59
1914 615246.074074 553300 54
1915 585036.921875 549500 64
1916 601041.620253 515000 79
1917 528126.785714 450000 56
1918 492346.875000 412450 120
1919 537887.556818 487900 88
1920 477761.030612 448500 98
1921 613224.210526 547500 76
1922 569794.147368 515000 95
1923 618653.773810 498376 84
1924 570419.928058 525000 139
1925 607316.606061 535000 165
1926 625443.377778 560000 180
1927 654154.208696 605000 115
1928 621920.198413 547500 126
1929 574396.842105 523475 114
... ... ... ...
1986 476989.069767 419500 215
1987 517565.010204 471500 294
1988 583930.400000 500000 270
1989 583063.403448 490000 290
1990 564133.384375 457500 320
1991 630630.647321 534150 224
1992 548205.924242 472500 198
1993 556760.455446 435000 202
1994 486864.040161 439000 249
1995 577933.757396 496000 169
1996 639673.528205 540000 195
1997 606173.887006 515000 177
1998 594280.146444 500000 239

1999 640431.177358 499900 265
2000 682003.619266 544250 218
2001 741340.042623 585000 305
2002 578818.481982 447500 222
2003 558791.367299 450500 422
2004 596095.004619 507000 433
2005 580895.468889 486000 450
2006 631041.548458 510500 454
2007 615193.292566 480000 417
2008 642037.716621 500000 367
2009 518462.186957 416375 230
2010 551678.384615 448500 143
2011 544648.384615 440000 130
2012 527436.982353 448475 170
2013 678599.582090 565000 201
2014 683792.685152 599000 559
2015 759970.947368 629500 38
We will create new variable: Houselandratio - This is proportion of living area in the total area of the house. We will explore the trend of price
against this houselandratio.
In [79]:
#HouseLandRatio - Computing new variable as ratio of living_measure/total_area

#Significes - Land used for construction of house
house_df["HouseLandRatio"]=np.round((house_df['living_measure']/house_df['total_area']),2)*100
house_df["HouseLandRatio"].head()
Out[79]:
17786 19.0
3782 16.0
10069 16.0
7114 24.0
10080 22.0
Name: HouseLandRatio, dtype: float64
Analyzing Bivariate for Feature: yr_renovated
In [80]:
#yr_renovated -
x=house_df[house_df['yr_renovated']>0]
print(sns.scatterplot(x['yr_renovated'],x['price']))
#groupby
x.groupby('yr_renovated')['price'].agg(['mean','median','size'])
AxesSubplot(0.125,0.125;0.775x0.755)
Out[80]:
mean median size
yr_renovated
1934 4.599500e+05 459950.0 1
1940 3.784000e+05 378400.0 2
1944 5.210000e+05 521000.0 1
1945 3.986667e+05 375000.0 3
1946 3.511375e+05 351137.5 2
1948 4.100000e+05 410000.0 1
1950 2.914500e+05 291450.0 2
1951 2.760000e+05 276000.0 1
1953 2.458167e+05 247500.0 3
1954 9.000000e+05 900000.0 1
1955 4.421667e+05 399000.0 3
1956 9.306667e+05 1140000.0 3
1957 2.915333e+05 249900.0 3
1958 5.595760e+05 397380.0 5
1959 3.975000e+05 397500.0 1
1960 4.771750e+05 299350.0 4
1962 6.150000e+05 615000.0 2
1963 4.977125e+05 402500.0 4
1964 3.567200e+05 325000.0 5
1965 7.822000e+05 580000.0 5
1967 2.686000e+05 268600.0 2
1968 4.835125e+05 425000.0 8
1969 5.291250e+05 555750.0 4
1970 5.230444e+05 450000.0 9
1971 4.182775e+05 418277.5 2
1972 6.197500e+05 522000.0 4
1973 4.172000e+05 440000.0 5
1974 4.025000e+05 310000.0 3
1975 5.052500e+05 521750.0 6
1976 4.016667e+05 335000.0 3
... ... ... ...
1986 6.230582e+05 520000.0 17
1987 1.206778e+06 624000.0 18
1988 7.227600e+05 588000.0 15
1989 6.397886e+05 560000.0 22
1990 7.491200e+05 730000.0 25
1991 9.650450e+05 792500.0 20
1992 6.967941e+05 599000.0 17
1993 8.480032e+05 805000.0 19
1994 9.430265e+05 780000.0 19
1995 8.055231e+05 536475.0 16
1996 7.496633e+05 710000.0 15
1997 6.203960e+05 569950.0 15
1998 7.737316e+05 526000.0 19
1999 1.030706e+06 840000.0 17
2000 8.090843e+05 755000.0 35
2001 1.089489e+06 675000.0 19
2002 1.216498e+06 890000.0 22
2003 9.923056e+05 767500.0 36
2004 7.820769e+05 721250.0 26

2005 8.151957e+05 744000.0 35
2006 7.890396e+05 654050.0 24
2007 8.389221e+05 797000.0 35
2008 1.034499e+06 801500.0 18
2009 9.006824e+05 521000.0 22
2010 9.926694e+05 845000.0 18
2011 6.074962e+05 577000.0 13
2012 6.251818e+05 515000.0 11
2013 6.649608e+05 560000.0 37
2014 6.550301e+05 575000.0 91
2015 6.591562e+05 651000.0 16
So most houses are renovated after 1980's. We will create new categorical variable 'has_renovated' to categorize the property as renovated and
non-renovated. For further ananlysis we will use this categorical variable.
In [81]:
#Lets try to group yr_renovated

#Binning Basement to analyse data
def create_renovated_group(series):
if series == 0:
return "No"
elif series > 0:
return "Yes"
house_df['has_renovated'] = house_df['yr_renovated'].apply(create_renovated_group)
In [84]:
#has_renovated - renovated have higher mean and median, however it does not confirm if the prices of house renova
ted
#actually increased or not.
#HouseLandRatio - Renovated house utilized more land area for construction of house
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['has_renovated']))
#groupby
house_df.groupby(['has_renovated'])['price','HouseLandRatio'].agg(['mean','median','size'])
AxesSubplot(0.125,0.125;0.775x0.755)
Out[84]:
price HouseLandRatio
has_renovated
No 530447.958597 448000 20699 22.067056 20.0 20699
Yes 760628.777899 600000 914 22.296499 21.0 914
Renovated properties have higher price than others with same living measure space.
In [85]:
#pd.crosstab(house_df['yearbuilt_group'],house_df['has_renovated'])
In [86]:
#has_renovated - have higher price & living measure

x=house_df[house_df['yr_built']<2000]
print(sns.scatterplot(x['living_measure'],x['price'],hue=x['has_renovated']))
AxesSubplot(0.125,0.125;0.775x0.755)
Analyzing Bivariate for Feature: furnished
In [87]:
#furnished - Furnished has higher price value and has greater living_measure
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['furnished']))
#groupby
house_df.groupby('furnished')['price','living_measure','HouseLandRatio'].agg(['mean','median','size'])
AxesSubplot(0.125,0.125;0.775x0.755)
Out[87]:
price living_measure HouseLandRatio
mean median size mean median size mean median size
furnished
0 437300.158968 401000 17362 1792.256652 1720 17362 21.508236 19.0 17362
1 960374.414961 810000 4251 3254.696072 3110 4251 24.398730 24.0 4251
Furnished houses have higher price than that of the Non-furnished houses
Analyzing Bivariate for Feature: city
In [88]:
#City - outliers can be seen easily.
print(sns.factorplot(x='City',y='price',data=house_df, size = 4, aspect = 2))

#groupby
house_df.groupby('City')['price'].agg(['mean','median','size']).sort_values(by='median',ascending=False)

in `catplot`.
warnings.warn(msg)
<seaborn.axisgrid.FacetGrid object at 0x0000022593C63C88>
Out[88]:
mean median size
City
Medina 2.161300e+06 1895000.0 50
Mercer Island 1.194874e+06 993750.0 282
Bellevue 8.984661e+05 749000.0 1407
Sammamish 7.328210e+05 688500.0 800
Redmond 6.589089e+05 625000.0 979
Issaquah 6.151222e+05 572000.0 733
Woodinville 6.174979e+05 570000.0 471
Kirkland 6.465428e+05 510000.0 977
Snoqualmie 5.280031e+05 500000.0 310
Bothell 4.903771e+05 470000.0 195
Vashon 4.874805e+05 463750.0 118
Fall City 5.806379e+05 460000.0 81
Seattle 5.350695e+05 453000.0 8977
Kenmore 4.624889e+05 445000.0 283
Carnation 4.556171e+05 415000.0 124
Duvall 4.248151e+05 401250.0 190
North Bend 4.395073e+05 399500.0 221
Black Diamond 4.236660e+05 359999.5 100
Renton 4.034685e+05 358000.0 1597
Maple Valley 3.668761e+05 342000.0 590
Kent 2.995499e+05 283200.0 1203
Enumclaw 3.157093e+05 279500.0 234
Auburn 2.914815e+05 270000.0 912
Federal Way 2.893913e+05 268000.0 779

From the above graph, few cities have higher average price of the houses compared to others. We need to further analyse why the price varies
among cities.
In [89]:
#City mean price distribution with average

city_price=pd.DataFrame(house_df.groupby('City')['price'].agg(['mean','median','size']))
indx=city_price.index
overall_price_mean=np.mean(house_df['price'])
overall_price_median=np.median(house_df['price'])
fig, ax1 = plt.subplots(figsize=(plotSizeX, plotSizeY))

barlist=ax1.bar(city_price.index,city_price['mean'],color='gray')
ax1.axhline(overall_price_mean, color="red")
ax1.text(1.02, overall_price_mean, "{0:.2f}".format(round(overall_price_mean,2)), va='center', ha="left", bbox=di
ct(facecolor="w",alpha=0.5),
transform=ax1.get_yaxis_transform())
plt.title("Cities and Mean Price")
plt.show()
As we can see from above grapgh, majorly below cities have higher mean house prices
1. Bellevue
2. Fall City
3. Federal Way
4. Kirkland
5. Medina
6. Mercer Island
7. Redmond
8. Sammanmish
9. Woodinville
In [90]:
#City median price distribution with average

fig, ax1 = plt.subplots(figsize=(plotSizeX, plotSizeY))
barlist=ax1.bar(city_price.index,city_price['median'],color='green')
ax1.axhline(overall_price_median, color="red")
ax1.text(1.02, overall_price_median, "{0:.2f}".format(round(overall_price_median,2)), va='center', ha="left", bbo
x=dict(facecolor="w",alpha=0.5),
transform=ax1.get_yaxis_transform())
plt.title("Cities and Median Price")

plt.show()
As we can see from above grapgh, majorly below cities have higher median house prices
1. Bellevue
2. Bothell
3. Issaquah
4. Kirkland
5. Medina
6. Mercer Island
7. Redmond
8. Sammanmish
9. Snoqualmie
10. Woodinville
In [91]:
#let's make the copy of the dataframe, before making any furhter changes
house_df_bdp=house_df.copy()
DATA PROCESSING
Treating Outlilers
We have seen outliers for columns room_bath(33 bed), living_measure, lot_measure, ceil_measure and Basement
In [92]:
def outlier_treatment(datacolumn):
sorted(datacolumn)
Q1,Q3 = np.percentile(datacolumn , [25,75])
IQR = Q3-Q1
lower_range = Q1-(1.5 * IQR)
upper_range = Q3+(1.5 * IQR)
return lower_range,upper_range
Using the above function, lets get the lowerbound and upperbound values
Treating outliers for column - ceil_measure
In [93]:
lowerbound,upperbound = outlier_treatment(house_df.ceil_measure)
print(lowerbound,upperbound)
-340.0 3740.0
Lets check which column is considered as an outlier
In [94]:
house_df[(house_df.ceil_measure < lowerbound) | (house_df.ceil_measure > upperbound)]
Out[94]:
cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_y
2014-05-
7142 7397300220 2750000 4 3.25 4430 21000 2.0 0 0 ... 20000 1 25430 May-2
29
2014-05-
10270 3221059044 799950 4 3.50 4220 196817 2.0 0 0 ... 195395 1 201037 May-2
23
2014-05-
9770 2424049029 3100000 6 4.25 6980 15682 3.0 0 4 ... 18367 1 22662 May-2
29
2014-05-
9909 1724069059 2000000 5 4.00 4580 4443 3.0 1 4 ... 4443 1 9023 May-2
24
2014-05-
3712 9359100500 1800000 4 3.25 4060 13000 2.0 0 3 ... 13800 1 17060 May-2
27
2014-05-
3628 4131900042 2000000 5 4.25 6490 10862 2.0 0 3 ... 14080 1 17352 May-2
16
2014-05-
18900 3521059134 900000 3 3.50 4080 217697 1.5 0 3 ... 217790 1 221777 May-2
23
2014-05-
13664 2481620310 1120000 4 2.25 4470 60373 2.0 0 0 ... 40450 1 64843 May-2
14
2014-05-
20740 1225069038 2280000 7 8.00 13540 307752 3.0 0 4 ... 217800 1 321292 May-2
05
2014-05-
10672 3892500150 1550000 3 2.50 4460 26027 2.0 0 0 ... 26027 1 30487 May-2
21
2014-05-
3964 1829300210 762300 4 2.50 3880 14550 2.0 0 0 ... 14045 1 18430 May-2
06
2014-05-
10294 525069127 1200000 4 3.50 4740 172497 2.0 0 0 ... 49658 1 177237 May-2
23
2014-05-
3996 3630200780 1050000 4 3.75 3860 5474 2.5 0 0 ... 5474 1 9334 May-2
22
2014-05-
10540 2524069097 2240000 5 6.50 7270 130017 2.0 0 0 ... 44890 1 137287 May-2
09
2014-05-
13827 824059042 1890000 5 3.50 4180 17935 2.0 0 0 ... 13760 1 22115 May-2
30
2014-05-
10462 7237550130 1300000 4 3.50 4380 74052 1.0 0 0 ... 62291 1 78432 May-2
20
2014-05-
3089 3616600250 1600000 3 3.25 3790 19000 2.0 0 4 ... 18628 1 22790 May-2
27
2014-05-
15646 98000960 1050000 4 3.25 4400 16625 2.0 0 0 ... 15523 1 21025 May-2
13
2014-05-
8153 425069020 1090000 4 2.50 4340 141570 2.5 0 0 ... 97138 1 145910 May-2
05
2014-05-
8484 4039800080 1360000 5 3.50 5960 13703 2.0 0 2 ... 17320 1 19663 May-2
29
2014-05-
15135 3625700010 1870000 5 4.00 4510 15175 2.0 0 0 ... 13500 1 19685 May-2
06
2014-05-
8769 2524049318 2000000 4 3.00 4260 18000 2.0 0 2 ... 17015 1 22260 May-2
28
2014-05-
15404 3276940100 1000000 4 3.00 4260 18687 2.0 0 0 ... 16772 1 22947 May-2
22
2014-05-
8358 4100500070 1710000 5 4.50 4590 14685 2.0 0 0 ... 9486 1 19275 May-2
27
2014-05-
9507 8691310840 833000 4 2.75 3780 10308 2.0 0 0 ... 10740 1 14088 May-2
09
2014-05-
7743 6613000935 2560000 4 2.50 5300 26211 2.0 1 2 ... 19281 1 31511 May-2
13
2014-05-
3330 5710000005 2150000 4 5.50 5060 10320 2.0 0 0 ... 10080 1 15380 May-2
22
2014-05-
19115 6648150040 1680000 5 3.25 4860 23723 2.0 0 2 ... 13860 1 28583 May-2
13
2014-05-
15697 3758900259 1040000 4 3.50 3900 8391 2.0 0 0 ... 12268 1 12291 May-2
07
2014-05-
3187 1853080640 966000 5 4.50 3810 8019 2.0 0 0 ... 7713 1 11829 May-2
14
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2015-04-
3232 7135520300 1300000 3 2.75 4120 16365 1.0 0 2 ... 14110 1 20485 April-2
07
2015-04-
15540 98000740 945000 5 3.50 4380 14925 2.0 0 0 ... 14633 1 19305 April-2
01
2015-04-
15481 2726059144 1040000 5 3.75 4570 10194 2.0 0 0 ... 7560 1 14764 April-2
10
2015-04-
8270 3401700150 1350000 5 3.00 5530 38816 1.5 0 2 ... 44417 1 44346 April-2
23
2015-04-
19626 3295610080 912000 4 2.75 4030 10888 2.0 0 0 ... 10756 1 14918 April-2
01
2015-04-
14955 3121500150 894000 4 2.50 3800 22029 2.0 0 0 ... 24979 1 25829 April-2
23
2015-04-
15084 2625069070 1390000 4 3.25 4860 181319 2.5 0 0 ... 181319 1 186179 April-2
10
2015-04-
2878 644000040 1780000 4 3.25 3950 10912 2.0 0 0 ... 10998 1 14862 April-2
29
2015-04-
8561 3585900500 1530000 4 4.25 4720 21000 3.0 0 4 ... 20000 1 25720 April-2
02
2015-04-
8682 3860900035 1940000 5 3.50 4230 16526 2.0 0 0 ... 12362 1 20756 April-2
15
2015-04-
15893 98300230 1460000 4 4.00 4620 130208 2.0 0 0 ... 131007 1 134828 April-2
28
2015-04-
7749 6790830090 1060000 4 3.50 4220 8417 3.0 0 0 ... 8435 1 12637 April-2
15
2015-04-
7941 2481630030 965000 4 2.50 3920 41206 2.0 0 0 ... 36562 1 45126 April-2
27
2015-04-
15862 7237550110 1180000 4 3.25 3750 74052 2.0 0 0 ... 74052 1 77802 April-2
24
2015-04-
19172 713500020 1390000 4 4.50 4490 24767 2.0 0 2 ... 32700 1 29257 April-2
21
2015-04-
7847 7853440140 802945 5 3.50 4000 9234 2.0 0 0 ... 6600 1 13234 April-2
09
2015-05-
13999 1126059201 1270000 5 3.25 4410 35192 2.0 0 2 ... 59677 1 39602 May-2
04
2015-05-
9320 1525059261 1900000 5 4.50 5160 44315 2.0 0 0 ... 44315 1 49475 May-2
05
2015-05-
2730 7853440050 771005 5 4.50 4000 6713 2.0 0 0 ... 6600 1 10713 May-2
05
2015-05-
5687 3751600409 510000 4 2.50 4073 17334 2.0 0 0 ... 9625 0 21407 May-2
08
2015-05-
5620 6065300370 4210000 5 6.00 7440 21540 2.0 0 0 ... 19329 1 28980 May-2
06
2015-05-
21004 3303960250 1050000 4 3.25 4020 11588 2.0 0 0 ... 8066 1 15608 May-2
07
2015-05-
15596 1925059254 3000000 5 4.00 6670 16481 2.0 0 0 ... 16607 1 23151 May-2
07
2015-05-
13440 1623089165 920000 4 3.75 4030 503989 2.0 0 0 ... 71874 1 508019 May-2
06
2015-05-
15586 1266200140 1850000 4 3.25 4160 10335 2.0 0 0 ... 10333 1 14495 May-2
06
2015-05-
9588 7237501380 1270000 4 3.50 4640 13404 2.0 0 0 ... 13590 1 18044 May-2
07
2015-05-
17098 2424059174 2000000 4 3.25 5640 35006 2.0 0 2 ... 35033 1 40646 May-2
08
2015-05-
13099 3024059057 1650000 4 4.50 5550 16065 2.0 0 0 ... 16488 1 21615 May-2
01
2015-05-
19121 4389201095 3650000 5 3.75 5020 8694 2.0 0 1 ... 11275 1 13714 May-2
11
2015-05-
13112 7960900060 2900000 4 3.25 5050 20100 1.5 0 2 ... 20060 1 25150 May-2
04
We got 611 records which are outliers
In [95]:
#dropping the record from the dataset

house_df.drop(house_df[ (house_df.ceil_measure > upperbound) | (house_df.ceil_measure < lowerbound) ].index, inpl
ace=True)
In [96]:
house_df.shape
Out[96]:
(21002, 30)
In [97]:
#ceil_measure
print("Skewness is :", house_df.ceil_measure.skew())
sns.distplot(house_df.ceil_measure)
house_df.ceil_measure.describe()
Skewness is : 0.8198869256569326
Out[97]:
count 21002.000000
mean 1712.238168
std 696.044073
min 290.000000
25% 1180.000000
50% 1540.000000
75% 2140.000000
max 3740.000000
After treating outliers of ceil_measure, the data has reduced by about 600(~3%) data points but data is nicely distributed
Treating outliers for column - basement
In [98]:
lowerbound_base,upperbound_base = outlier_treatment(house_df.basement)
print(lowerbound_base,upperbound_base)
-855.0 1425.0
In [99]:
house_df[(house_df.basement < lowerbound_base) | (house_df.basement > upperbound_base)]
Out[99]:
2014-05-
16357 3211270170 404000 4 3.00 4060 35621 1.0 0 0 ... 35259 1 39681 May-2
23
2014-05-
7386 5700003640 2100000 5 3.75 5340 10655 2.5 0 3 ... 9418 1 15995 May-2
19
2014-05-
9727 5119010090 549900 5 2.75 3060 7015 1.0 0 0 ... 7600 0 10075 May-2
10
2014-05-
16069 7663700968 565000 7 4.50 4140 9066 1.0 0 0 ... 1865 0 13206 May-2
28
2014-05-
1783 7430200100 1220000 4 3.50 4910 9444 1.5 0 0 ... 11063 1 14354 May-2
14
2014-05-
1145 7856410030 1030000 5 2.75 3190 16920 1.0 0 3 ... 13100 1 20110 May-2
05
2014-05-
13624 7855801610 1220000 4 2.50 3190 8684 1.0 0 3 ... 8684 1 11874 May-2
19
2014-05-
6610 1424059154 1270000 4 3.00 5520 8313 2.0 0 3 ... 8278 1 13833 May-2
16
2014-05-
13951 9322800210 879950 4 2.25 3500 13875 1.0 0 4 ... 15000 1 17375 May-2
20
2014-05-
13757 4219401236 1690000 3 1.75 3400 8965 1.0 0 2 ... 8500 1 12365 May-2
20
2014-05-
10529 7784400130 497300 6 2.75 3200 9200 1.0 0 2 ... 9500 0 12400 May-2
05
2014-05-
6832 486000510 1330000 4 3.00 3370 7920 1.0 0 3 ... 7380 1 11290 May-2
23
2014-05-
2479 1624049293 390000 5 3.75 2890 5000 1.0 0 0 ... 5117 0 7890 May-2
06
2014-05-
15539 7855200120 1370000 4 2.75 3720 9450 1.0 0 4 ... 8605 1 13170 May-2
09
2014-05-
2752 7922900040 1080000 4 3.00 3600 9200 1.0 0 4 ... 9775 1 12800 May-2
22
2014-05-
8532 3623500205 2450000 4 4.50 5030 11023 2.0 0 2 ... 11490 1 16053 May-2
13
2014-05-
14866 5152700060 465000 6 3.25 4250 23326 1.0 0 3 ... 15983 1 27576 May-2
28
2014-05-
3344 4058800215 430000 3 3.75 3890 7140 1.0 0 2 ... 7320 0 11030 May-2
28
2014-05-
3501 4122900190 1350000 5 1.75 3380 20021 1.0 0 0 ... 19809 0 23401 May-2
12
2014-05-
15783 217500140 464000 5 2.50 3400 8970 1.0 0 0 ... 8475 0 12370 May-2
13
2014-05-
9331 5152100060 472000 6 2.50 4410 14034 1.0 0 2 ... 13988 1 18444 May-2
29
2014-05-
9349 3342700405 585000 4 1.75 3000 42200 1.0 0 3 ... 9821 0 45200 May-2
22
2014-05-
9279 4139420590 1210000 4 3.50 4560 16643 1.0 0 3 ... 15177 1 21203 May-2
20
2014-05-
520 1313000220 675000 5 3.00 3410 9600 1.0 0 0 ... 9679 0 13010 May-2
13
2014-05-
769 7856410430 1390000 6 2.75 5700 20000 1.0 0 4 ... 15700 1 25700 May-2
30
2014-05-
18293 5425700205 1800000 4 3.50 4460 16953 1.0 0 0 ... 13370 1 21413 May-2
20
2014-05-
6088 9558050170 475000 4 2.50 3740 8700 1.0 0 0 ... 6333 1 12440 May-2
13
2014-05-
18107 1180008355 380000 5 1.75 3000 6000 1.0 0 0 ... 7125 0 9000 May-2
07
2014-05-
17068 8562710550 950000 5 3.75 5330 6000 2.0 0 2 ... 5797 1 11330 May-2
21
2014-05-
21501 2021201000 980000 4 3.00 3680 5854 1.0 0 3 ... 5000 1 9534 May-2
23
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2015-04-
16301 9542000275 675000 4 2.50 2420 18470 1.0 0 0 ... 13800 0 20890 April-2
06
2015-04-
7062 3982700250 799900 4 2.50 3030 7800 2.0 0 0 ... 7435 1 10830 April-2
23
2015-04-
18757 8085400376 2320000 4 3.50 5050 9520 2.0 0 0 ... 9248 1 14570 April-2
21
2015-04-
17329 2655500235 1610000 4 3.50 3920 19088 1.0 0 1 ... 13749 1 23008 April-2
10
2015-04-
5618 3524039202 1070000 3 2.25 2950 7232 1.0 0 2 ... 7140 0 10182 April-2
20
2015-04-
18042 5460600110 1050000 6 4.00 5310 12741 2.0 0 2 ... 12632 1 18051 April-2
23
2015-04-
17165 1736800520 662500 3 2.50 3560 9796 1.0 0 0 ... 8925 0 13356 April-2
03
2015-04-
17090 2141300080 707000 5 2.50 3050 13212 1.0 0 0 ... 10826 0 16262 April-2
24
2015-04-
15407 1373800330 1120000 4 2.50 3690 11191 1.0 0 3 ... 8160 1 14881 April-2
20
2015-04-
14911 4147200040 1090000 5 2.25 3650 13068 1.0 0 0 ... 13927 1 16718 April-2
14
2015-04-
2642 2425059074 740000 5 3.00 3655 51836 1.0 0 0 ... 8606 0 55491 April-2
10
2015-04-
2860 9808100150 3350000 5 3.75 5350 15360 1.0 0 1 ... 15940 1 20710 April-2
02
2015-04-
3498 9560500105 957000 4 2.25 2860 11545 1.0 0 0 ... 11396 0 14405 April-2
24
2015-04-
7793 629860010 1350000 4 3.50 4640 9827 2.0 0 2 ... 8207 1 14467 April-2
29
2015-05-
5853 7964410100 700000 4 3.50 5360 25800 1.0 0 0 ... 21781 1 31160 May-2
04
2015-05-
5500 4139420190 2480000 4 5.00 5310 16909 1.0 0 4 ... 15701 1 22219 May-2
12
2015-05-
12295 1742800430 463828 5 1.75 3250 13702 1.0 0 2 ... 11328 0 16952 May-2
04
2015-05-
5347 9541600490 931088 4 2.50 3510 17400 1.0 0 0 ... 12120 1 20910 May-2
05
2015-05-
13793 6065300840 2850000 4 4.00 5040 17208 1.0 0 0 ... 18647 1 22248 May-2
01
2015-05-
19013 1822079046 500000 3 2.00 3040 41072 1.0 0 0 ... 54014 0 44112 May-2
04
2015-05-
5617 1925069082 2200000 5 4.25 4640 22703 2.0 1 4 ... 14200 0 27343 May-2
11
2015-05-
7035 1180007375 625000 5 3.50 4010 6000 2.0 0 3 ... 6000 1 10010 May-2
12
2015-05-
4032 7878400022 390000 4 2.25 3060 7920 1.0 0 0 ... 7800 0 10980 May-2
06
2015-05-
1890 3336000050 435000 6 3.00 3560 4290 1.0 0 0 ... 6000 0 7850 May-2
01
2015-05-
19313 8835401250 1490000 6 2.75 4430 6440 2.0 0 3 ... 7314 1 10870 May-2
06
2015-05-
4404 3523069008 890000 4 3.25 4360 210254 1.0 0 0 ... 87120 1 214614 May-2
05
2015-05-
20299 3286800260 780000 5 2.50 3480 74052 1.0 0 0 ... 65775 0 77532 May-2
06
2015-05-
4712 1924059254 1300000 5 3.75 3490 15246 1.0 0 1 ... 15682 1 18736 May-2
08
2015-05-
8288 2524049108 1380000 5 4.25 4050 18827 1.0 0 2 ... 25120 1 22877 May-2
12
2015-05-
15391 2925059260 800000 5 2.50 3000 10560 1.0 0 0 ... 11616 0 13560 May-2
06
We got 408 records as outliers, let's drop these outliers

In [100]:

house_df.drop(house_df[ (house_df.basement > upperbound_base) | (house_df.basement < lowerbound_base) ].index, in
place=True)
In [101]:
house_df.shape
Out[101]:
(20594, 30)
In [102]:
#basement_measure
sns.distplot(house_df.basement)
Out[102]:
<matplotlib.axes._subplots.AxesSubplot at 0x22593a3e5f8>
After treating outliers of basement, we can see that 400(~2%) data points got imputed. Total about 5% data has been imputed after treating
ceil_measure and basement.
In [103]:
#Let's see the boxplot now for basement

sns.boxplot(house_df['basement'])
Out[103]:
<matplotlib.axes._subplots.AxesSubplot at 0x22593921d30>
Treating outliers for column - living_measure
In [104]:
lowerbound_lim,upperbound_lim = outlier_treatment(house_df.living_measure)
print(lowerbound_lim,upperbound_lim)
-160.0 4000.0
In [105]:
house_df[(house_df.living_measure < lowerbound_lim) | (house_df.living_measure > upperbound_lim)]
Out[105]:
2014-05-
10110 6669100070 900000 4 3.25 4700 38412 2.0 0 0 ... 35571 1 43112 May-2
12
2014-05-
7275 2926069083 900000 5 3.75 4130 226076 2.0 0 0 ... 55321 1 230206 May-2
07
2014-05-
10549 6819100020 1430000 4 4.25 4960 6000 2.5 0 0 ... 4080 1 10960 May-2
29
2014-05-
1438 5093300325 1610000 4 3.50 4390 11600 2.0 0 3 ... 12000 1 15990 May-2
23
2014-05-
13897 7853280350 809000 5 4.50 4630 6324 2.0 0 0 ... 6790 1 10954 May-2
12
2014-05-
10530 6169901185 490000 5 3.50 4460 2975 3.0 0 2 ... 4231 1 7435 May-2
20
2014-05-
6830 425079099 560000 3 3.00 4120 60392 2.0 0 2 ... 64033 1 64512 May-2
07
2014-05-
2969 7853280550 700000 4 3.50 4490 5099 2.0 0 0 ... 5537 1 9589 May-2
28
2014-05-
2864 251620090 2400000 4 3.25 4140 20734 1.0 0 1 ... 20008 1 24874 May-2
30
2014-05-
2680 587550280 625000 4 3.25 4240 25639 2.0 0 3 ... 24967 1 29879 May-2
30
2014-05-
5059 1338600225 1970000 8 3.50 4440 6480 2.0 0 3 ... 8640 1 10920 May-2
28
2014-05-
11490 526069024 950000 5 3.00 4530 258746 1.5 0 0 ... 83199 1 263276 May-2
12
2014-05-
17520 723000114 1400000 5 3.50 4010 8510 2.0 0 1 ... 6128 1 12520 May-2
05
2014-05-
4965 8562710250 890000 4 4.25 4420 5750 2.0 0 0 ... 5750 1 10170 May-2
05
2014-05-
4596 8562710520 890000 5 3.50 4490 6000 2.0 0 0 ... 6000 1 10490 May-2
05
2014-05-
21557 3758900075 1530000 5 4.50 4270 8076 2.0 0 0 ... 10631 1 12346 May-2
07
2014-05-
1071 1924069039 869000 5 3.25 4180 49222 2.0 0 0 ... 8029 0 53402 May-2
19
2014-06-
15797 3127200021 850000 4 3.50 4140 7089 2.0 0 0 ... 8896 1 11229 June-2
16
2014-06-
5235 293760050 1050000 4 4.25 4390 13833 2.0 0 3 ... 11652 1 18223 June-2
27
2014-06-
19273 3629890190 1300000 4 4.00 4270 6002 2.0 0 3 ... 5942 1 10272 June-2
06
2014-06-
17631 1702901180 665000 6 3.00 4250 4400 2.5 0 0 ... 4950 0 8650 June-2
11
2014-06-
4215 8043700300 2700000 4 3.25 4420 7850 2.0 1 4 ... 8525 1 12270 June-2
08
2014-06-
7571 3616600231 960000 4 3.00 4590 9150 2.0 0 0 ... 12348 1 13740 June-2
03
2014-06-
17402 8128600060 600000 4 3.25 4690 14930 2.0 0 2 ... 13320 1 19620 June-2
24
2014-06-
19073 5561300730 530000 4 3.25 4160 35654 2.0 0 0 ... 35675 0 39814 June-2
05
2014-06-
7431 5078400160 1800000 5 4.50 4400 15580 2.0 0 0 ... 14249 1 19980 June-2
05
2014-06-
21125 5700003630 1930000 5 4.25 4830 8050 2.5 0 2 ... 9194 1 12880 June-2
30
2014-06-
1552 1336800010 1340000 5 2.25 4200 5800 2.5 0 0 ... 5800 1 10000 June-2
13
2014-06-
14441 7853280570 765000 4 3.00 4410 5104 2.0 0 0 ... 5537 1 9514 June-2
04
2014-06-
372 7636800041 995000 3 4.50 4380 47044 2.0 1 3 ... 18512 1 51424 June-2
25
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2015-03-
11314 722059020 550000 6 4.50 4520 40164 2.0 0 0 ... 13068 1 44684 March-2
18
2015-03-
11089 745530180 870000 5 3.50 4495 10079 2.0 0 0 ... 10079 1 14574 March-2
17
2015-03-
11340 9362000080 1600000 5 3.50 4050 20925 2.0 0 3 ... 18321 1 24975 March-2
16
2015-03-
13844 1824079073 985000 5 4.25 4650 108464 2.0 0 0 ... 155509 1 113114 March-2
31
2015-03-
9911 3026059085 1290000 5 3.50 4090 290980 1.0 0 0 ... 9255 1 295070 March-2
17
2015-03-
17302 1924059319 1290000 5 4.00 4050 11358 2.0 0 0 ... 13555 1 15408 March-2
20
2015-03-
14676 1333300145 2230000 3 4.00 4200 30120 2.0 0 2 ... 12200 1 34320 March-2
04
2015-03-
11625 2600010220 1250000 4 2.50 4040 11350 2.0 0 2 ... 12382 1 15390 March-2
26
2015-03-
8631 3616600003 1680000 3 2.50 4090 16972 2.0 0 2 ... 16972 1 21062 March-2
02
2015-04-
14545 2579500101 1390000 4 3.50 4010 10880 2.0 0 3 ... 17310 1 14890 April-2
21
2015-04-
567 9185700485 2540000 4 3.50 4350 6000 2.0 0 0 ... 7200 1 10350 April-2
01
2015-04-
21062 3303980140 1150000 4 3.00 4160 13170 2.0 0 0 ... 13148 1 17330 April-2
02
2015-04-
12545 269000970 1300000 5 3.75 4450 7680 2.0 0 0 ... 6400 1 12130 April-2
02
2015-04-
1512 1118000340 3000000 5 3.75 4590 11265 2.0 0 0 ... 8996 1 15855 April-2
08
2015-04-
4292 1115300270 900000 6 3.75 4210 6105 2.0 0 0 ... 6368 1 10315 April-2
28
2015-04-
16585 6645950070 1450000 4 3.50 5000 38012 2.0 0 0 ... 18054 1 43012 April-2
01
2015-04-
16752 8562720420 1350000 4 3.50 4740 8611 2.0 0 3 ... 8321 1 13351 April-2
30
2015-04-
16743 1223089077 718000 3 1.75 4060 136290 1.0 0 0 ... 51836 0 140350 April-2
01
2015-04-
7550 2260300060 2580000 5 3.00 4780 20440 1.0 0 0 ... 20440 1 25220 April-2
10
2015-04-
5755 1069000070 2800000 5 3.25 4590 12793 2.0 0 2 ... 8609 1 17383 April-2
15
2015-04-
5706 4128500380 1200000 4 2.50 4280 12796 2.0 0 0 ... 9593 1 17076 April-2
27
2015-04-
5670 2254100090 887250 5 3.50 4320 7502 2.0 0 0 ... 7538 1 11822 April-2
07
2015-04-
5572 853200040 2410000 5 2.50 4600 23250 1.5 0 2 ... 20066 1 27850 April-2
28
2015-04-
7209 8562750060 825000 5 3.50 4140 6770 2.0 0 0 ... 5431 1 10910 April-2
20
2015-04-
19608 114101505 630000 5 3.50 4060 8309 2.0 0 0 ... 11711 1 12369 April-2
23
2015-04-
7997 5700004028 2450000 4 4.25 4250 6552 2.0 0 3 ... 8841 1 10802 April-2
17
2015-05-
17159 1118000320 3400000 4 4.00 4260 11765 2.0 0 0 ... 10408 1 16025 May-2
08
2015-05-
17742 5428000070 770000 5 3.50 4750 8234 2.0 0 2 ... 14496 1 12984 May-2
11
2015-05-
16333 2421059090 640000 4 2.50 4090 215186 2.0 0 0 ... 142005 0 219276 May-2
11
2015-05-
1152 1525069088 442500 5 3.25 4240 226097 2.0 0 0 ... 217800 0 230337 May-2
04
We got 178 records as outliers. Let's treat this by dropping

In [106]:

house_df.drop(house_df[ (house_df.living_measure > upperbound_lim) | (house_df.living_measure < lowerbound_lim) ]
.index, inplace=True)
In [107]:
#let's see the boxplot after dropping the outliers

sns.boxplot(house_df['living_measure'])
Out[107]:
<matplotlib.axes._subplots.AxesSubplot at 0x22593a3e240>
In [108]:
sns.distplot(house_df.living_measure)
Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0x22595886198>
By treating outliers of living_measure, we lost 178 data points more and data distribution looks normal
In [109]:
# shape of the data after imputing outliers in living_column

house_df.shape
Out[109]:
(20416, 30)
Treating outliers for column - lot_measure
In [110]:
lowerbound_lom,upperbound_lom = outlier_treatment(house_df.lot_measure)
print(lowerbound_lom,upperbound_lom)
-2774.875 17958.125
In [111]:
house_df[(house_df.lot_measure < lowerbound_lom) | (house_df.lot_measure > upperbound_lom)]
Out[111]:
2014-05-
10082 1121039059 503000 2 1.75 2860 59612 1.0 1 4 ... 59612 0 62472 May-2
22
2014-05-
14089 6070500055 599000 4 2.25 2260 29930 2.0 0 0 ... 29930 0 32190 May-2
06
2014-05-
1611 5561000190 437500 3 2.25 1970 35100 2.0 0 0 ... 35100 1 37070 May-2
02
2014-05-
14068 5111400086 110000 3 1.00 1250 53143 1.0 0 0 ... 217800 0 54393 May-2
12
2014-05-
14081 3022039071 800000 2 2.25 1730 31491 2.0 1 2 ... 12410 0 33221 May-2
30
2014-05-
20351 9808610190 782000 4 2.50 2830 20345 2.0 0 0 ... 13732 1 23175 May-2
09
2014-05-
9981 2324800350 860000 4 2.00 3740 32417 2.0 0 0 ... 32417 1 36157 May-2
06
2014-05-
16273 1823069279 499950 5 3.50 3200 43560 2.0 0 0 ... 43560 0 46760 May-2
20
2014-05-
16325 7214700160 610000 3 3.00 2480 45302 1.0 0 0 ... 14100 0 47782 May-2
09
2014-05-
10030 2025700730 287200 3 3.00 1850 19966 1.0 0 0 ... 6715 0 21816 May-2
02
2014-05-
3870 1330900250 550000 3 2.25 1980 40887 1.0 0 0 ... 35700 0 42867 May-2
15
2014-05-
16422 4047200380 460000 2 1.50 2730 19877 1.0 0 0 ... 19509 0 22607 May-2
26
2014-05-
3865 2924069132 527500 3 1.75 2310 78844 1.0 0 0 ... 6230 0 81154 May-2
27
2014-05-
1589 4045500510 420850 1 1.00 960 40946 1.0 0 0 ... 20350 0 41906 May-2
21
2014-05-
3824 320069049 305000 4 1.50 1590 131551 1.0 0 3 ... 108028 0 133141 May-2
14
2014-05-
18773 1321720140 370000 4 2.50 3090 18645 2.0 0 0 ... 20114 1 21735 May-2
28
2014-05-
14357 3210950080 486000 4 2.50 2150 39449 1.0 0 0 ... 35717 0 41599 May-2
14
2014-05-
14310 1921069082 560000 3 2.00 2560 216777 1.0 0 0 ... 108463 0 219337 May-2
12
2014-05-
16144 7574910780 766950 3 2.50 3030 30007 1.5 0 0 ... 34983 1 33037 May-2
14
2014-05-
9659 1023059365 520000 3 2.50 2460 54885 2.0 0 0 ... 21407 0 57345 May-2
06
2014-05-
7446 4012800010 360000 4 2.00 2680 18768 1.0 0 0 ... 15750 0 21448 May-2
06
2014-05-
18970 3523089019 480000 4 3.50 3370 435600 2.0 0 3 ... 114868 1 438970 May-2
19
2014-05-
7423 4188000670 749400 4 2.50 3240 20301 2.0 0 0 ... 23650 1 23541 May-2
15
2014-05-
16149 9368700031 195000 2 1.00 720 18000 1.0 0 0 ... 7925 0 18720 May-2
09
2014-05-
9913 8856000545 100000 2 1.00 910 22000 1.0 0 0 ... 9891 0 22910 May-2
07
2014-05-
7214 124069032 600000 3 1.75 1670 39639 1.0 0 0 ... 30492 0 41309 May-2
05
2014-05-
7264 2724089019 527550 1 0.75 820 59677 1.0 0 0 ... 14163 0 60497 May-2
23
2014-05-
7323 3761700251 600000 4 2.00 2510 38141 1.0 0 0 ... 11760 1 40651 May-2
28
2014-05-
1809 226059103 570000 3 1.75 1930 36210 1.0 0 0 ... 35060 0 38140 May-2
27
2014-05-
13684 1721069036 412000 3 1.75 1950 52256 1.0 0 0 ... 51836 0 54206 May-2
29
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2015-05-
19300 3924500130 460000 2 2.50 1880 40575 1.0 0 0 ... 32935 1 42455 May-2
06
2015-05-
3221 1775700011 390000 3 2.50 1410 26375 1.0 0 0 ... 12474 0 27785 May-2
12
2015-05-
20818 126039394 525000 4 2.75 2300 26650 1.0 0 0 ... 9879 0 28950 May-2
08
2015-05-
3195 9310300215 652500 4 1.75 3130 18253 2.0 0 0 ... 12220 0 21383 May-2
06
2015-05-
13297 322069010 435000 3 2.00 2570 233481 1.5 0 0 ... 157687 0 236051 May-2
08
2015-05-
11080 1823069088 492000 2 1.75 1300 22239 1.0 0 0 ... 14810 0 23539 May-2
04
2015-05-
11081 625069064 625000 3 2.25 2570 47480 1.0 0 0 ... 106722 1 50050 May-2
07
2015-05-
9892 2124069103 374000 3 1.75 1510 18439 1.0 0 0 ... 34326 0 19949 May-2
05
2015-05-
4291 2426049079 330000 3 1.00 1060 20040 1.0 0 0 ... 10800 0 21100 May-2
06
2015-05-
9880 8011100050 350000 2 1.00 1220 28703 1.0 0 0 ... 6720 0 29923 May-2
08
2015-05-
4246 2722059275 536000 3 2.75 2290 34548 2.0 0 3 ... 275299 0 36838 May-2
12
2015-05-
13312 8835800450 950000 3 2.50 2780 275033 1.0 0 0 ... 16340 1 277813 May-2
04
2015-05-
20752 1326069050 750000 2 2.00 2370 155130 1.0 0 0 ... 14475 0 157500 May-2
04
2015-05-
2011 2591720160 674950 3 2.75 3510 92347 2.0 0 0 ... 37070 1 95857 May-2
01
2015-05-
20375 302000375 250000 3 2.00 1050 18304 1.0 0 0 ... 15675 0 19354 May-2
06
2015-05-
15300 722039087 329000 2 1.00 990 57499 1.0 0 0 ... 27442 0 58489 May-2
04
2015-05-
19102 1774220070 550000 4 2.25 2590 36256 2.0 0 0 ... 35657 0 38846 May-2
07
2015-05-
9615 2316400285 495000 4 3.50 2490 18042 2.0 0 0 ... 21107 0 20532 May-2
13
2015-05-
14043 9406510130 448000 5 3.50 3740 24684 2.0 0 0 ... 26023 1 28424 May-2
05
2015-05-
10637 522079068 513000 3 2.50 2150 161607 2.0 0 0 ... 207781 0 163757 May-2
06
2015-05-
6109 251610020 1580000 4 2.75 3480 19991 2.0 0 2 ... 20271 1 23471 May-2
08
2015-05-
12897 4027701265 480000 3 1.75 2920 21375 1.0 0 0 ... 8482 0 24295 May-2
01
2015-05-
1591 4166600610 335000 3 2.00 1410 44866 1.0 0 0 ... 29152 0 46276 May-2
14
2015-05-
6116 122029066 490000 3 1.75 2020 215622 2.0 0 0 ... 215622 0 217642 May-2
08
2015-05-
7911 3585900460 1060000 6 2.75 2980 20000 1.0 0 4 ... 20000 0 22980 May-2
01
2015-05-
11329 2320069111 449999 4 1.75 2290 36900 1.5 0 2 ... 12434 0 39190 May-2
07
2015-05-
9752 2521059060 490000 3 2.25 2840 107157 2.0 0 0 ... 215622 1 109997 May-2
01
2015-05-
4012 6446200050 540000 3 1.75 2590 25992 1.0 0 0 ... 29250 0 28582 May-2
04
2015-05-
16888 3422059208 390000 3 2.50 1930 64904 1.0 0 0 ... 57500 0 66834 May-2
11
2015-05-
4579 1921069101 399000 3 1.75 2170 73616 1.0 0 0 ... 297514 0 75786 May-2
08
We got 2155 records which are outliers. Let's drop these outlier records.
In [112]:

house_df.drop(house_df[ (house_df.lot_measure > upperbound_lom) | (house_df.lot_measure < lowerbound_lom) ].index
, inplace=True)
In [113]:
#let's plot after treating outliers

sns.boxplot(house_df['lot_measure'])
Out[113]:
<matplotlib.axes._subplots.AxesSubplot at 0x22593975eb8>
In [114]:
house_df.shape
Out[114]:
(18288, 30)
Total outliers in the lot_measure are 2128 data points. But still we are going ahead with imputing the data. We will analyze later whether there is
any impact on the data set or not.
Treating outliers for column - room_bed
In [115]:
#As we know for room_bed = 33 was outlier from our earlier findings, let's see the record and drop it
house_df[house_df['room_bed']==33]
Out[115]:
cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year
2014-06-
750 2402100895 640000 33 1.75 1620 6000 1.0 0 0 ... 4700 0 7620 June-2014
25
In [116]:

house_df.drop(house_df[ (house_df.room_bed == 33) ].index, inplace=True)
In [117]:
house_df.shape
Out[117]:
(18287, 30)
In summary, after treating outliers, we have lost about 15% of the data. We will analyse the impact of this data loss during the model
evaluation.
In [118]:
#let's see the feature/columns and drop the unneccessary features

house_df.columns
Out[118]:

'total_area', 'month_year', 'City', 'County', 'Type', 'has_basement',
'HouseLandRatio', 'has_renovated'],
dtype='object')
As we already have this information in other features. We will drop the unwanted columns from new copied dataframe instance :
cid,dayhours,yr_renovated,zipcode,lat,long,county,type
In [119]:
#Let's create another dataframe for modeling

df_model=house_df.copy()
In [120]:
#let's check the new copy of dataframe by printing first few records
df_model.head()
Out[120]:
cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_ye
2014-05-
17786 7568700740 430000 3 2.75 2550 11160 2.0 0 0 ... 7440 0 13710 May-20
21
2014-05-
3782 2248000080 385500 3 2.00 1540 7947 1.0 0 0 ... 7950 0 9487 May-20
21
2014-05-
10069 7805450110 736000 4 2.50 2290 12047 2.0 0 0 ... 15666 1 14337 May-20
06
2014-05-
7114 2215500080 580000 5 2.00 1940 6000 1.0 0 0 ... 6000 0 7940 May-20
28
2014-05-
10080 1219000043 315000 5 1.75 2320 8100 1.0 0 0 ... 7271 0 10420 May-20
09
New instance of dataframe for model created successfully
In [121]:
#let's verify the columns

df_model.columns
Out[121]:

'total_area', 'month_year', 'City', 'County', 'Type', 'has_basement',
dtype='object')
In [122]:
#Dropping the feature not required in 1st Iteration

df_final=df_model.drop(['cid','dayhours','yr_renovated','zipcode','lat','long','County','Type'],axis=1)
In [123]:
df_final.shape
Out[123]:
(18287, 22)
In [124]:
df_final.head()
Out[124]:
price room_bed room_bath living_measure lot_measure ceil coast sight condition quality ... yr_built living_measure15 lot_measure15 furnished
17786 430000 3 2.75 2550 11160 2.0 0 0 3 8 ... 1994 1020 7440
3782 385500 3 2.00 1540 7947 1.0 0 0 3 7 ... 1961 1910 7950
10069 736000 4 2.50 2290 12047 2.0 0 0 4 9 ... 1988 3130 15666
7114 580000 5 2.00 1940 6000 1.0 0 0 5 7 ... 1945 1700 6000
10080 315000 5 1.75 2320 8100 1.0 0 0 4 7 ... 1956 1410 7271
In [125]:
df_final.columns
Out[125]:
Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',

'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure',
'basement', 'yr_built', 'living_measure15', 'lot_measure15',
'furnished', 'total_area', 'month_year', 'City', 'has_basement',
dtype='object')
Creating dummies for categorical variables: 'room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition', 'quality', 'furnished','City',
'has_basement', 'has_renovated'
In [126]:
# Getting dummies for columns ceil, coast, sight, condition, quality, yr_renovated, furnished
dff = pd.get_dummies(df_final, columns=['room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition', 'quality'
, 'furnished','City',
'has_basement', 'has_renovated'],drop_first=True)
In [127]:
# let's see the data types of the features

dff.shape
Out[127]:
(18287, 92)
In [128]:
dff.columns
Out[128]:
Index(['price', 'living_measure', 'lot_measure', 'ceil_measure', 'basement',

'yr_built', 'living_measure15', 'lot_measure15', 'total_area',
'month_year', 'HouseLandRatio', 'room_bed_1', 'room_bed_2',
'room_bed_3', 'room_bed_4', 'room_bed_5', 'room_bed_6', 'room_bed_7',
'room_bed_8', 'room_bed_9', 'room_bed_10', 'room_bed_11',
'room_bath_0.5', 'room_bath_0.75', 'room_bath_1.0', 'room_bath_1.25',
'room_bath_5.75', 'ceil_1.5', 'ceil_2.0', 'ceil_2.5', 'ceil_3.0',
'ceil_3.5', 'coast_1', 'sight_1', 'sight_2', 'sight_3', 'sight_4',
'condition_2', 'condition_3', 'condition_4', 'condition_5', 'quality_4',
'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
'quality_10', 'quality_11', 'quality_12', 'furnished_1',
'City_Bellevue', 'City_Black Diamond', 'City_Bothell', 'City_Carnation',
'City_Duvall', 'City_Enumclaw', 'City_Fall City', 'City_Federal Way',
'City_Issaquah', 'City_Kenmore', 'City_Kent', 'City_Kirkland',
'City_Maple Valley', 'City_Medina', 'City_Mercer Island',
'City_North Bend', 'City_Redmond', 'City_Renton', 'City_Sammamish',
'City_Seattle', 'City_Snoqualmie', 'City_Vashon', 'City_Woodinville',
'has_basement_Yes', 'has_renovated_Yes'],
dtype='object')
Ready for model building
'dff' is the data frame which is ready for modeling
In [129]:
dff.head()
Out[129]:
City_North
price living_measure lot_measure ceil_measure basement yr_built living_measure15 lot_measure15 total_area month_year ... City_Red
Bend
17786 430000 2550 11160 2550 0 1994 1020 7440 13710 May-2014 ... 0
3782 385500 1540 7947 1120 420 1961 1910 7950 9487 May-2014 ... 0
10069 736000 2290 12047 2290 0 1988 3130 15666 14337 May-2014 ... 0
7114 580000 1940 6000 970 970 1945 1700 6000 7940 May-2014 ... 0
10080 315000 2320 8100 1160 1160 1956 1410 7271 10420 May-2014 ... 0
In [130]:
#let's drop the month_year column as we already analyzed it

dff=dff.drop(['month_year'],axis=1)
In [131]:
#Creating X, y for training and testing set

X = dff.drop("price" , axis=1)
y = dff["price"]
In [132]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=10)
In [133]:
print(X_train.shape)
print(X_test.shape)
print(X_val.shape)
(11703, 90)
(3658, 90)
(2926, 90)
In [134]:
dff.head()
Out[134]:
City_North
price living_measure lot_measure ceil_measure basement yr_built living_measure15 lot_measure15 total_area HouseLandRatio ... City
Bend
17786 430000 2550 11160 2550 0 1994 1020 7440 13710 19.0 ... 0
3782 385500 1540 7947 1120 420 1961 1910 7950 9487 16.0 ... 0
10069 736000 2290 12047 2290 0 1988 3130 15666 14337 16.0 ... 0
7114 580000 1940 6000 970 970 1945 1700 6000 7940 24.0 ... 0
10080 315000 2320 8100 1160 1160 1956 1410 7271 10420 22.0 ... 0
Model building
Let's build the model and see their performances
Linear Regression (with Ridge and Lasso)
In [135]:
#importing the necessary libraries

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn import metrics

from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
In [136]:
LR1 = LinearRegression()
LR1.fit(X_train, y_train)
#predicting result over test data
y_LR1_predtr= LR1.predict(X_train)
y_LR1_predvl= LR1.predict(X_val)
LR1.coef_
Out[136]:
array([ 4.65340600e+01, -2.60438290e+01, 4.15085952e+01, 5.02547332e+00,

-2.04412513e+03, 5.35189660e+01, -1.85866571e+00, 2.04902213e+01,
1.71493571e+02, -1.52366392e+04, -1.71559917e+02, -7.41190980e+03,
-2.49409439e+04, -3.38428650e+04, -7.05629386e+04, -1.39570745e+05,
-6.13484472e+04, -5.01063903e+04, -1.54205825e+05, -2.24656830e+05,
1.25982909e+04, 8.20853745e+04, 9.07899656e+04, 2.27686059e+05,
9.03633165e+04, 1.00072705e+05, 1.08362140e+05, 1.12686599e+05,
1.12536191e+05, 1.13322504e+05, 1.30547366e+05, 1.77676262e+05,
1.64433901e+05, 2.85527951e+05, 1.71382012e+05, 1.61500051e+05,
1.74737226e+05, 9.19752797e+05, 1.55294652e+05, 2.95336027e+05,
-7.18864612e-09, 1.64060215e+04, 1.53741629e+04, 5.55177883e+04,
5.71678128e+04, 7.56908538e+04, 2.60659481e+05, 4.01215839e+04,
4.57278795e+04, 1.17418144e+05, 2.55845105e+05, 9.85063804e+04,
1.30265390e+05, 1.58016139e+05, 1.95914534e+05, -1.52225301e+05,
-1.50728574e+05, -1.30295440e+05, -5.38413267e+04, 1.41195146e+04,
-3.31240159e+05, -2.05150668e+05, 3.46837714e+04, 9.74678182e+05,
4.72971126e+05, 2.94909682e+05, 1.32458120e+05, 1.23710820e+05,
1.76625819e+05, 1.09165616e+05, 1.81816493e+04, 1.72163167e+05,
-1.34952467e+04, 1.66521627e+05, 1.21190592e+05, 1.50478273e+04,
2.33952638e+05, 4.07406448e+04, 8.02211265e+05, 4.19684384e+05,
1.32976614e+05, 2.32526660e+05, 6.01053338e+04, 1.61179525e+05,
1.73301885e+05, 1.04083333e+05, 8.65668543e+04, 1.56470485e+05,
2.84414383e+04, 3.11287808e+04])
In [137]:
#Model score and Deduction for each Model in a DataFrame

LR1_trscore=r2_score(y_train,y_LR1_predtr)
LR1_trRMSE=np.sqrt(mean_squared_error(y_train, y_LR1_predtr))
LR1_trMSE=mean_squared_error(y_train, y_LR1_predtr)
LR1_trMAE=mean_absolute_error(y_train, y_LR1_predtr)
LR1_vlscore=r2_score(y_val,y_LR1_predvl)
LR1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_LR1_predvl))
LR1_vlMSE=mean_squared_error(y_val, y_LR1_predvl)
LR1_vlMAE=mean_absolute_error(y_val, y_LR1_predvl)
Compa_df=pd.DataFrame({'Method':['Linear Reg Model1'],'Val Score':LR1_vlscore,'RMSE_vl': LR1_vlRMSE, 'MSE_vl': LR

1_vlMSE, 'MAE_vl': LR1_vlMAE,'train Score':LR1_trscore,'RMSE_tr': LR1_trRMSE, 'MSE_tr': LR1_trMSE, 'MAE_tr': LR1_
trMAE})
#Compa_df = Compa_df[['Method', 'Test Score', 'RMSE', 'MSE', 'MAE']]
Compa_df
Out[137]:
Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr
0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786
The linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively
In [138]:
sns.set(style="darkgrid", color_codes=True)
with sns.axes_style("white"):
sns.jointplot(x=y_val, y=y_LR1_predvl, kind="reg", color="k")
Lasso model
In [139]:
Lasso1 = Lasso(alpha=1)
Lasso1.fit(X_train, y_train)

y_Lasso1_predtr= Lasso1.predict(X_train)
y_Lasso1_predvl= Lasso1.predict(X_val)
Lasso1.coef_
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:492: Convergen
ceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting
data with very small alpha may cause precision problems.
ConvergenceWarning)
Out[139]:
array([ 9.65902931e+01, -3.93953142e+00, 1.35529095e+01, -2.29061776e+01,

-2.04163675e+03, 5.35553554e+01, -1.85807486e+00, -1.60949858e+00,
1.73465396e+02, 2.58275091e+04, 4.11967331e+04, 3.40052776e+04,
1.64885732e+04, 7.61896026e+03, -2.86912815e+04, -9.69275851e+04,
-1.65210854e+04, -4.12121513e+03, -1.00992716e+05, -1.71298527e+05,
-9.13241618e+04, -2.48925268e+04, -1.60169381e+04, 1.18516529e+05,
-1.64860326e+04, -6.79525494e+03, 1.46220219e+03, 5.75895038e+03,
5.60351070e+03, 6.32472345e+03, 2.34670403e+04, 7.06751057e+04,
5.73457320e+04, 1.78131982e+05, 6.38573909e+04, 5.27207711e+04,
6.65775444e+04, 8.00797041e+05, 4.05485728e+04, 1.84114085e+05,
0.00000000e+00, 1.64275884e+04, 1.53464259e+04, 5.53276759e+04,
5.70768344e+04, 7.27284380e+04, 2.60559411e+05, 3.99774418e+04,
4.57348433e+04, 1.17367493e+05, 2.55764846e+05, 9.44643505e+04,
1.25980055e+05, 1.53729964e+05, 1.91681487e+05, -1.98077420e+05,
-2.00695519e+05, -1.80295562e+05, -1.03865243e+05, -3.58821013e+04,
5.98641869e+04, 1.85986501e+05, 4.26066854e+05, 1.35396787e+06,
3.18911457e+04, 2.94465826e+05, 1.31572941e+05, 1.23199746e+05,
1.75754190e+05, 1.08625950e+05, 1.76695808e+04, 1.70928679e+05,
-1.38642363e+04, 1.66073186e+05, 1.20696485e+05, 1.45879439e+04,
2.33538555e+05, 4.03015196e+04, 8.01308223e+05, 4.19176853e+05,
1.32466072e+05, 2.32091780e+05, 5.96809785e+04, 1.60725157e+05,
1.72962017e+05, 1.03839270e+05, 8.54647044e+04, 1.55943713e+05,
2.84495484e+04, 3.11881238e+04])
In [140]:

Lasso1_trscore=r2_score(y_train,y_Lasso1_predtr)
Lasso1_trRMSE=np.sqrt(mean_squared_error(y_train, y_Lasso1_predtr))
Lasso1_trMSE=mean_squared_error(y_train, y_Lasso1_predtr)
Lasso1_trMAE=mean_absolute_error(y_train, y_Lasso1_predtr)
Lasso1_vlscore=r2_score(y_val,y_Lasso1_predvl)
Lasso1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_Lasso1_predvl))
Lasso1_vlMSE=mean_squared_error(y_val, y_Lasso1_predvl)
Lasso1_vlMAE=mean_absolute_error(y_val, y_Lasso1_predvl)
Lasso1_df=pd.DataFrame({'Method':['Linear-Reg Lasso1'],'Val Score':Lasso1_vlscore,'RMSE_vl': Lasso1_vlRMSE, 'MSE_

vl': Lasso1_vlMSE, 'MAE_vl': Lasso1_vlMAE,'train Score':Lasso1_trscore,'RMSE_tr': Lasso1_trRMSE, 'MSE_tr': Lasso1
_trMSE, 'MAE_tr': Lasso1_trMAE})
Compa_df = pd.concat([Compa_df, Lasso1_df])
Compa_df
Out[140]:
0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117
The lasso linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively. The coefficeints of
1 variable in lasso model is almost '0', signifying that the variable with '0' coefficient can be dropped.
In [141]:
sns.jointplot(x=y_val, y=y_Lasso1_predvl, kind="reg", color="k")
Ridge model
In [142]:
Ridge1 = Ridge(alpha=0.5)
Ridge1.fit(X_train, y_train)

y_Ridge1_predtr= Ridge1.predict(X_train)
y_Ridge1_predvl= Ridge1.predict(X_val)
Ridge1.coef_
Out[142]:
array([ 4.66834622e+01, -2.60918244e+01, 4.15530007e+01, 5.13138900e+00,

-2.04329070e+03, 5.40305072e+01, -1.83894732e+00, 2.05899911e+01,
1.99149390e+02, 4.75922037e+04, 6.34342640e+04, 5.62050467e+04,
3.84394798e+04, 2.95688617e+04, -6.95633001e+03, -7.29503951e+04,
2.07847666e+03, 1.15197429e+04, -6.08136975e+04, -1.07506532e+05,
-1.24763131e+05, -6.99157975e+04, -6.12127626e+04, 6.84637990e+04,
-6.18022272e+04, -5.21975799e+04, -4.39150711e+04, -3.96547935e+04,
-4.00644775e+04, -3.93829165e+04, -2.22301737e+04, 2.63190006e+04,
1.13660948e+04, 1.31445711e+05, 1.79480665e+04, 7.08429226e+03,
2.07235857e+04, 5.06032623e+05, 1.96912226e+03, 1.26354803e+05,
0.00000000e+00, 1.62595610e+04, 1.52860502e+04, 5.48436035e+04,
5.68361232e+04, 6.73075467e+04, 2.58145115e+05, 3.96081664e+04,
4.58498930e+04, 1.16570206e+05, 2.54669644e+05, 8.12334931e+04,
1.12963518e+05, 1.40760828e+05, 1.78571191e+05, -1.31486307e+05,
-1.40442366e+05, -1.20195568e+05, -4.38816320e+04, 2.39956530e+04,
-2.60492990e+05, -1.34329618e+05, 1.11752710e+05, 6.95080119e+05,
4.12010220e+05, 2.90368008e+05, 1.26135853e+05, 1.18985675e+05,
1.69604765e+05, 1.04432882e+05, 1.42752316e+04, 1.63147681e+05,
-1.74612788e+04, 1.62055462e+05, 1.16592568e+05, 1.10585170e+04,
2.29612234e+05, 3.67503018e+04, 7.71949916e+05, 4.13369607e+05,
1.28389461e+05, 2.28151004e+05, 5.60163760e+04, 1.56620270e+05,
1.69467429e+05, 9.98907258e+04, 8.00564608e+04, 1.51440054e+05,
2.85123132e+04, 3.11686377e+04])
In [143]:

Ridge1_trscore=r2_score(y_train,y_Ridge1_predtr)
Ridge1_trRMSE=np.sqrt(mean_squared_error(y_train, y_Ridge1_predtr))
Ridge1_trMSE=mean_squared_error(y_train, y_Ridge1_predtr)
Ridge1_trMAE=mean_absolute_error(y_train, y_Ridge1_predtr)
Ridge1_vlscore=r2_score(y_val,y_Ridge1_predvl)
Ridge1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_Ridge1_predvl))
Ridge1_vlMSE=mean_squared_error(y_val, y_Ridge1_predvl)
Ridge1_vlMAE=mean_absolute_error(y_val, y_Ridge1_predvl)
Ridge1_df=pd.DataFrame({'Method':['Linear-Reg Ridge1'],'Val Score':Ridge1_vlscore,'RMSE_vl': Ridge1_vlRMSE, 'MSE_

vl': Ridge1_vlMSE, 'MAE_vl': Ridge1_vlMAE,'train Score':Ridge1_trscore,'RMSE_tr': Ridge1_trRMSE, 'MSE_tr': Ridge1
_trMSE, 'MAE_tr': Ridge1_trMAE})
Compa_df = pd.concat([Compa_df, Ridge1_df])
Compa_df
Out[143]:
0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174
The Ridge linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively. The coefficeints of
variables in ridge model are all non-zero, indicating that non of the variables can be dropped.
In [144]:
sns.jointplot(x=y_val, y=y_Ridge1_predvl, kind="reg", color="k")
In summary, Linear models have performed almost with similar results in both regularized model and non-regularized models
KNN Regressor
In [145]:
from sklearn.neighbors import KNeighborsRegressor
In [146]:
knn1 = KNeighborsRegressor(n_neighbors=4,weights='distance')
knn1.fit(X_train, y_train)

y_knn1_predtr= knn1.predict(X_train)
y_knn1_predvl= knn1.predict(X_val)
In [147]:

knn1_trscore=r2_score(y_train,y_knn1_predtr)
knn1_trRMSE=np.sqrt(mean_squared_error(y_train, y_knn1_predtr))
knn1_trMSE=mean_squared_error(y_train, y_knn1_predtr)
knn1_trMAE=mean_absolute_error(y_train, y_knn1_predtr)
knn1_vlscore=r2_score(y_val,y_knn1_predvl)
knn1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_knn1_predvl))
knn1_vlMSE=mean_squared_error(y_val, y_knn1_predvl)
knn1_vlMAE=mean_absolute_error(y_val, y_knn1_predvl)
knn1_df=pd.DataFrame({'Method':['knn1'],'Val Score':knn1_vlscore,'RMSE_vl': knn1_vlRMSE, 'MSE_vl': knn1_vlMSE, 'M

AE_vl': knn1_vlMAE,'train Score':knn1_trscore,'RMSE_tr': knn1_trRMSE, 'MSE_tr': knn1_trMSE, 'MAE_tr': knn1_trMAE}
)
Compa_df = pd.concat([Compa_df, knn1_df])
Compa_df
Out[147]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

Though KNN regressor performed well in training set, the performance score in validation set is very less. This shows that the model is
overfitted in training set
Support vector regressor
In [148]:
from sklearn.svm import SVR
In [149]:
SVR1 = SVR(gamma='auto',C=10.0, epsilon=0.2,kernel='rbf')

SVR1.fit(X_train, y_train)
y_SVR1_predtr= SVR1.predict(X_train)
y_SVR1_predvl= SVR1.predict(X_val)
In [150]:

SVR1_trscore=r2_score(y_train,y_SVR1_predtr)
SVR1_trRMSE=np.sqrt(mean_squared_error(y_train, y_SVR1_predtr))
SVR1_trMSE=mean_squared_error(y_train, y_SVR1_predtr)
SVR1_trMAE=mean_absolute_error(y_train, y_SVR1_predtr)
SVR1_vlscore=r2_score(y_val,y_SVR1_predvl)
SVR1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_SVR1_predvl))
SVR1_vlMSE=mean_squared_error(y_val, y_SVR1_predvl)
SVR1_vlMAE=mean_absolute_error(y_val, y_SVR1_predvl)
SVR1_df=pd.DataFrame({'Method':['SVR1'],'Val Score':SVR1_vlscore,'RMSE_vl': SVR1_vlRMSE, 'MSE_vl': SVR1_vlMSE, 'M

AE_vl': SVR1_vlMAE,'train Score':SVR1_trscore,'RMSE_tr': SVR1_trRMSE, 'MSE_tr': SVR1_trMSE, 'MAE_tr': SVR1_trMAE}
)
Compa_df = pd.concat([Compa_df, SVR1_df])
Compa_df
Out[150]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
The above negative scores in SVR model is due to non-learning of the model in the training set which results in non-performance in validation
set
In [151]:
SVR2 = SVR(gamma='auto',C=0.1,kernel='linear')
SVR2.fit(X_train, y_train)
y_SVR2_predtr= SVR2.predict(X_train)
y_SVR2_predvl= SVR2.predict(X_val)

SVR2_trscore=r2_score(y_train,y_SVR2_predtr)
SVR2_trRMSE=np.sqrt(mean_squared_error(y_train, y_SVR2_predtr))
SVR2_trMSE=mean_squared_error(y_train, y_SVR2_predtr)
SVR2_trMAE=mean_absolute_error(y_train, y_SVR2_predtr)
SVR2_vlscore=r2_score(y_val,y_SVR2_predvl)
SVR2_vlRMSE=np.sqrt(mean_squared_error(y_val, y_SVR2_predvl))
SVR2_vlMSE=mean_squared_error(y_val, y_SVR2_predvl)
SVR2_vlMAE=mean_absolute_error(y_val, y_SVR2_predvl)
SVR2_df=pd.DataFrame({'Method':['SVR2'],'Val Score':SVR2_vlscore,'RMSE_vl': SVR2_vlRMSE, 'MSE_vl': SVR2_vlMSE, 'M

AE_vl': SVR2_vlMAE,'train Score':SVR2_trscore,'RMSE_tr': SVR2_trRMSE, 'MSE_tr': SVR2_trMSE, 'MAE_tr': SVR2_trMAE}
)
Compa_df = pd.concat([Compa_df, SVR2_df])
Compa_df
Out[151]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
The SVR model with modified parameters has not performed well with just ~0.45 in both training and validation data sets
Decision Tree Regressor
In [152]:
from sklearn.tree import DecisionTreeRegressor

In [153]:
DT1 = DecisionTreeRegressor()
DT1.fit(X_train, y_train)
y_DT1_predtr= DT1.predict(X_train)
y_DT1_predvl= DT1.predict(X_val)

DT1_trscore=r2_score(y_train,y_DT1_predtr)
DT1_trRMSE=np.sqrt(mean_squared_error(y_train, y_DT1_predtr))
DT1_trMSE=mean_squared_error(y_train, y_DT1_predtr)
DT1_trMAE=mean_absolute_error(y_train, y_DT1_predtr)
DT1_vlscore=r2_score(y_val,y_DT1_predvl)
DT1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_DT1_predvl))
DT1_vlMSE=mean_squared_error(y_val, y_DT1_predvl)
DT1_vlMAE=mean_absolute_error(y_val, y_DT1_predvl)
DT1_df=pd.DataFrame({'Method':['DT1'],'Val Score':DT1_vlscore,'RMSE_vl': DT1_vlRMSE, 'MSE_vl': DT1_vlMSE, 'MAE_vl

': DT1_vlMAE,'train Score':DT1_trscore,'RMSE_tr': DT1_trRMSE, 'MSE_tr': DT1_trMSE, 'MAE_tr': DT1_trMAE})
Compa_df = pd.concat([Compa_df, DT1_df])
Compa_df
Out[153]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
Above performance of initial Decision tree model shows overfit in training set with 0.99 score and low performance in validation set
In [154]:
DT2 = DecisionTreeRegressor(max_depth=10,min_samples_leaf=5)
DT2.fit(X_train, y_train)
y_DT2_predtr= DT2.predict(X_train)
y_DT2_predvl= DT2.predict(X_val)

DT2_trscore=r2_score(y_train,y_DT2_predtr)
DT2_trRMSE=np.sqrt(mean_squared_error(y_train, y_DT2_predtr))
DT2_trMSE=mean_squared_error(y_train, y_DT2_predtr)
DT2_trMAE=mean_absolute_error(y_train, y_DT2_predtr)
DT2_vlscore=r2_score(y_val,y_DT2_predvl)
DT2_vlRMSE=np.sqrt(mean_squared_error(y_val, y_DT2_predvl))
DT2_vlMSE=mean_squared_error(y_val, y_DT2_predvl)
DT2_vlMAE=mean_absolute_error(y_val, y_DT2_predvl)
DT2_df=pd.DataFrame({'Method':['DT2'],'Val Score':DT2_vlscore,'RMSE_vl': DT2_vlRMSE, 'MSE_vl': DT2_vlMSE, 'MAE_vl

': DT2_vlMAE,'train Score':DT2_trscore,'RMSE_tr': DT2_trRMSE, 'MSE_tr': DT2_trMSE, 'MAE_tr': DT2_trMAE})
Compa_df = pd.concat([Compa_df, DT2_df])
Compa_df
Out[154]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190
Above decision tree model with modified parameter has better performed on the training set and validation set compared to initial decision tree
model.But overall decision tree has not performed well than linear regression models.
In [155]:
sns.jointplot(x=y_val, y=y_DT2_predvl, kind="reg", color="k")
In summary, KNN regressor model and decision tree models have not performed well in comparison with linear regression models
Ensemble techniques
Boosting and Bagging
In [156]:
from sklearn.ensemble import GradientBoostingRegressor, BaggingRegressor
In [157]:
GB1=GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22)

GB1.fit(X_train, y_train)
y_GB1_predtr= GB1.predict(X_train)
y_GB1_predvl= GB1.predict(X_val)

GB1_trscore=r2_score(y_train,y_GB1_predtr)
GB1_trRMSE=np.sqrt(mean_squared_error(y_train, y_GB1_predtr))
GB1_trMSE=mean_squared_error(y_train, y_GB1_predtr)
GB1_trMAE=mean_absolute_error(y_train, y_GB1_predtr)
GB1_vlscore=r2_score(y_val,y_GB1_predvl)
GB1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_GB1_predvl))
GB1_vlMSE=mean_squared_error(y_val, y_GB1_predvl)
GB1_vlMAE=mean_absolute_error(y_val, y_GB1_predvl)
GB1_df=pd.DataFrame({'Method':['GB1'],'Val Score':GB1_vlscore,'RMSE_vl': GB1_vlRMSE, 'MSE_vl': GB1_vlMSE, 'MAE_vl

': GB1_vlMAE,'train Score':GB1_trscore,'RMSE_tr': GB1_trRMSE, 'MSE_tr': GB1_trMSE, 'MAE_tr': GB1_trMAE})
Compa_df = pd.concat([Compa_df, GB1_df])
Compa_df
Out[157]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190
0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644
Gradient boosting model has provided good scores in both training and validation sets
In [158]:
BGG1=BaggingRegressor(n_estimators=50, oob_score= True,random_state=14)

BGG1.fit(X_train, y_train)
y_BGG1_predtr= BGG1.predict(X_train)
y_BGG1_predvl= BGG1.predict(X_val)

BGG1_trscore=r2_score(y_train,y_BGG1_predtr)
BGG1_trRMSE=np.sqrt(mean_squared_error(y_train, y_BGG1_predtr))
BGG1_trMSE=mean_squared_error(y_train, y_BGG1_predtr)
BGG1_trMAE=mean_absolute_error(y_train, y_BGG1_predtr)
BGG1_vlscore=r2_score(y_val,y_BGG1_predvl)
BGG1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_BGG1_predvl))
BGG1_vlMSE=mean_squared_error(y_val, y_BGG1_predvl)
BGG1_vlMAE=mean_absolute_error(y_val, y_BGG1_predvl)
BGG1_df=pd.DataFrame({'Method':['BGG1'],'Val Score':BGG1_vlscore,'RMSE_vl': BGG1_vlRMSE, 'MSE_vl':BGG1_vlMSE, 'MA

E_vl': BGG1_vlMAE,'train Score':BGG1_trscore,'RMSE_tr': BGG1_trRMSE, 'MSE_tr': BGG1_trMSE, 'MAE_tr': BGG1_trMAE})
Compa_df = pd.concat([Compa_df, BGG1_df])
Compa_df
Out[158]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190
0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644
0 BGG1 0.769319 124738.101557 1.555959e+10 80102.360544 0.966466 46867.181534 2.196533e+09 29441.780117
Bagging model also performed well in training and validation sets.There seems to be overfitting in training set. We need to analyse further by
hypertuning
Random forest
In [159]:
from sklearn.ensemble import RandomForestRegressor

In [160]:
RF1=RandomForestRegressor()
RF1.fit(X_train, y_train)
y_RF1_predtr= RF1.predict(X_train)
y_RF1_predvl= RF1.predict(X_val)

RF1_trscore=r2_score(y_train,y_RF1_predtr)
RF1_trRMSE=np.sqrt(mean_squared_error(y_train, y_RF1_predtr))
RF1_trMSE=mean_squared_error(y_train, y_RF1_predtr)
RF1_trMAE=mean_absolute_error(y_train, y_RF1_predtr)
RF1_vlscore=r2_score(y_val,y_RF1_predvl)
RF1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_RF1_predvl))
RF1_vlMSE=mean_squared_error(y_val, y_RF1_predvl)
RF1_vlMAE=mean_absolute_error(y_val, y_RF1_predvl)
RF1_df=pd.DataFrame({'Method':['RF1'],'Val Score':RF1_vlscore,'RMSE_vl': RF1_vlRMSE, 'MSE_vl':RF1_vlMSE, 'MAE_vl'

: RF1_vlMAE,'train Score':RF1_trscore,'RMSE_tr': RF1_trRMSE, 'MSE_tr': RF1_trMSE, 'MAE_tr': RF1_trMAE})
Compa_df = pd.concat([Compa_df, RF1_df])
Compa_df
Out[160]:
0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707
0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170
0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504
0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707
0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190
0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644
0 BGG1 0.769319 124738.101557 1.555959e+10 80102.360544 0.966466 46867.181534 2.196533e+09 29441.780117
0 RF1 0.754483 128686.871977 1.656031e+10 82901.717082 0.954362 54674.891629 2.989344e+09 33099.588581
Random forest model has performed well in training and validation set. There is scope of further analysis on this model
Enseble models: in summary ensemble models have performed well on training and validation sets. These models will be selected for further
analysis with hypertuning and feature selection
In [161]:
#feature importance
rf_imp_feature_1=pd.DataFrame(RF1.feature_importances_, columns = ["Imp"], index = X_val.columns)
rf_imp_feature_1.sort_values(by="Imp",ascending=False)
rf_imp_feature_1['Imp'] = rf_imp_feature_1['Imp'].map('{0:.5f}'.format)
rf_imp_feature_1=rf_imp_feature_1.sort_values(by="Imp",ascending=False)
rf_imp_feature_1.Imp=rf_imp_feature_1.Imp.astype("float")
rf_imp_feature_1[:30].plot.bar(figsize=(plotSizeX, plotSizeY))
#First 20 features have an importance of 90.5% and first 30 have importance of 95.15
print("First 20 feature importance:\t",(rf_imp_feature_1[:20].sum())*100)
print("First 30 feature importance:\t",(rf_imp_feature_1[:30].sum())*100)
First 20 feature importance: Imp 90.184

dtype: float64
dtype: float64
Above are top 30 important features that account for 95% of variation in model. This need to be further analysed during hypertuning of the
models for better scores
Model performance Summary:
Ensemble methods are performing better than linear models. Of all the ensemble models, Gradient boosting regressor is giving better R2
score. we identified top 30 features that are explaining the 95% variation in model(Random Forest). Will further hypertune the model to improve
the model performance. Will further explore and evaluate the features while hyperturning the ensemble models
Building Function/Pipeline for models

In [162]:
rf_imp_feature_1[:30]
Out[162]:
Imp
furnished_1 0.28448
yr_built 0.14227
living_measure 0.09463
living_measure15 0.06691
quality_8 0.05062
HouseLandRatio 0.04008
lot_measure15 0.03731
City_Bellevue 0.02532
ceil_measure 0.02459
quality_9 0.02049
total_area 0.01527
lot_measure 0.01319
City_Seattle 0.01268
City_Kirkland 0.01245
City_Federal Way 0.01224
City_Kent 0.01089
City_Mercer Island 0.01047
sight_4 0.00945
quality_7 0.00942
basement 0.00908
City_Redmond 0.00830
coast_1 0.00648
City_Medina 0.00556
quality_10 0.00545
City_Renton 0.00521
room_bed_4 0.00393
City_Maple Valley 0.00388
City_Sammamish 0.00379
sight_3 0.00351
City_Issaquah 0.00303
In [163]:
from sklearn.pipeline import Pipeline
In [164]:
def result (model,pipe_model,X_train_set,y_train_set,X_val_set,y_val_set):

pipe_model.fit(X_train_set,y_train_set)
y_train_predict= pipe_model.predict(X_train_set)
y_val_predict= pipe_model.predict(X_val_set)
trscore=r2_score(y_train_set,y_train_predict)
trRMSE=np.sqrt(mean_squared_error(y_train_set,y_train_predict))
trMSE=mean_squared_error(y_train_set,y_train_predict)
trMAE=mean_absolute_error(y_train_set,y_train_predict)
vlscore=r2_score(y_val,y_val_predict)
vlRMSE=np.sqrt(mean_squared_error(y_val,y_val_predict))
vlMSE=mean_squared_error(y_val,y_val_predict)
vlMAE=mean_absolute_error(y_val,y_val_predict)
result_df=pd.DataFrame({'Method':[model],'val score':vlscore,'RMSE_val':vlRMSE,'MSE_val':vlMSE,'MSE_vl': vlMS
E,
'train Score':trscore,'RMSE_tr': trRMSE,'MSE_tr': trMSE, 'MAE_tr': trMAE})
return result_df
Above function will run the model and return the r2 score,rmse,mse of the model
In [165]:
#Creating empty dataframe to capture results

result_dff=pd.DataFrame()
pipe_LR = Pipeline([('LR', LinearRegression())])
result_dff=pd.concat([result_dff,result('LR',pipe_LR,X_train,y_train,X_val,y_val)])
pipe_knr = Pipeline([('KNNR', KNeighborsRegressor(n_neighbors=4,weights='distance'))])

result_dff=pd.concat([result_dff,result('KNNR',pipe_knr,X_train,y_train,X_val,y_val)])
pipe_DTR = Pipeline([('DTR', DecisionTreeRegressor())])

result_dff=pd.concat([result_dff,result('DTR',pipe_DTR,X_train,y_train,X_val,y_val)])
pipe_GBR = Pipeline([('GBR', GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22))

])
result_dff=pd.concat([result_dff,result('GBR',pipe_GBR,X_train,y_train,X_val,y_val)])
pipe_BGR = Pipeline([('BGR', BaggingRegressor(n_estimators=50, oob_score= True,random_state=14))])

result_dff=pd.concat([result_dff,result('BGR',pipe_BGR,X_train,y_train,X_val,y_val)])
pipe_RFR = Pipeline([('RFR', RandomForestRegressor())])

result_dff=pd.concat([result_dff,result('RFR',pipe_RFR,X_train,y_train,X_val,y_val)])
result_dff
Out[165]:
Method val score RMSE_val MSE_val MSE_vl train Score RMSE_tr MSE_tr MAE_tr
0 LR 0.718749 137733.698415 1.897057e+10 1.897057e+10 0.730112 132958.367261 1.767793e+10 92391.001786
0 KNNR 0.425008 196935.451160 3.878357e+10 3.878357e+10 0.998628 9480.192071 8.987404e+07 887.708707
0 DTR 0.537219 176677.375867 3.121490e+10 3.121490e+10 0.998628 9480.192071 8.987404e+07 887.708707
0 GBR 0.782471 121129.989228 1.467247e+10 1.467247e+10 0.820821 108334.766538 1.173642e+10 76533.619644
0 BGR 0.769319 124738.101557 1.555959e+10 1.555959e+10 0.966466 46867.181534 2.196533e+09 29441.780117
0 RFR 0.757473 127900.773592 1.635861e+10 1.635861e+10 0.955380 54061.682258 2.922665e+09 32834.525684
Above sequence of steps with pipeline function will run all the models and compile the scores in result_dff dataframe. We can see that the
above 2 steps are concise instead of running individual models and compiling the scores as earlier.
We can clearly see gradient boosting is giving better result in comparison with other ensemble methods. Also the score of 0.82 on training set
indicates no overfitting of the model
In [166]:
#Storing results of initial data set - dff
result_ds1=result_dff.copy()
result_ds1
Out[166]:
Method val score RMSE_val MSE_val MSE_vl train Score RMSE_tr MSE_tr MAE_tr
0 LR 0.718749 137733.698415 1.897057e+10 1.897057e+10 0.730112 132958.367261 1.767793e+10 92391.001786
0 KNNR 0.425008 196935.451160 3.878357e+10 3.878357e+10 0.998628 9480.192071 8.987404e+07 887.708707
0 DTR 0.537219 176677.375867 3.121490e+10 3.121490e+10 0.998628 9480.192071 8.987404e+07 887.708707
0 GBR 0.782471 121129.989228 1.467247e+10 1.467247e+10 0.820821 108334.766538 1.173642e+10 76533.619644
0 BGR 0.769319 124738.101557 1.555959e+10 1.555959e+10 0.966466 46867.181534 2.196533e+09 29441.780117
0 RFR 0.757473 127900.773592 1.635861e+10 1.635861e+10 0.955380 54061.682258 2.922665e+09 32834.525684
FEATURE SELECTION (PCA)
Now, we will explore the possibility of features reduction using PCA
In [167]:
dff.shape
Out[167]:
(18287, 91)
In [168]:
dff.columns
Out[168]:
Index(['price', 'living_measure', 'lot_measure', 'ceil_measure', 'basement',

'yr_built', 'living_measure15', 'lot_measure15', 'total_area',
'HouseLandRatio', 'room_bed_1', 'room_bed_2', 'room_bed_3',
'room_bed_4', 'room_bed_5', 'room_bed_6', 'room_bed_7', 'room_bed_8',
'room_bed_9', 'room_bed_10', 'room_bed_11', 'room_bath_0.5',
'ceil_1.5', 'ceil_2.0', 'ceil_2.5', 'ceil_3.0', 'ceil_3.5', 'coast_1',
'sight_1', 'sight_2', 'sight_3', 'sight_4', 'condition_2',
'condition_3', 'condition_4', 'condition_5', 'quality_4', 'quality_5',
'quality_11', 'quality_12', 'furnished_1', 'City_Bellevue',
'City_Black Diamond', 'City_Bothell', 'City_Carnation', 'City_Duvall',
'City_Enumclaw', 'City_Fall City', 'City_Federal Way', 'City_Issaquah',
'City_Kenmore', 'City_Kent', 'City_Kirkland', 'City_Maple Valley',
'City_Medina', 'City_Mercer Island', 'City_North Bend', 'City_Redmond',
'City_Renton', 'City_Sammamish', 'City_Seattle', 'City_Snoqualmie',
'City_Vashon', 'City_Woodinville', 'has_basement_Yes',
'has_renovated_Yes'],
dtype='object')
will drop the price column as it is the target variable
In [169]:
df_pca = dff.drop(['price'], axis = 1)
In [170]:
numerical_cols = df_pca.copy()
numerical_cols.shape
Out[170]:
(18287, 90)
In [171]:
# Let's first transform the entire X (independent variable data) to zscores.

# We will create the PCA dimensions on this distribution.
from scipy.stats import zscore
# As PCA for Independent columns of Numerical types, let's pass numerical_cols (16 numerical features)
numerical_cols = numerical_cols.apply(zscore)
cov_matrix = np.cov(numerical_cols.T)
print('Covariance Matrix \n%s', cov_matrix)
Covariance Matrix
%s [[ 1.00005469 0.20028185 0.84597846 ... 0.01415428 0.20094885
0.05257785]
[ 0.20028185 1.00005469 0.1663024 ... 0.08035946 -0.02988448
-0.00617414]
[ 0.84597846 0.1663024 1.00005469 ... 0.01649371 -0.27730605
0.01739462]
...
[ 0.01415428 0.08035946 0.01649371 ... 1.00005469 -0.0056238
-0.01445085]
[ 0.20094885 -0.02988448 -0.27730605 ... -0.0056238 1.00005469
0.04524435]
[ 0.05257785 -0.00617414 0.01739462 ... -0.01445085 0.04524435
1.00005469]]
As we can see, near the value to 1, more the features related.

In [172]:
eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)

print('Eigen Vectors \n%s', eigenvectors)
print('\n Eigen Values \n%s', eigenvalues)
Eigen Vectors
%s [[ 3.38140157e-01 -5.91272225e-02 2.10933458e-01 ... -5.27282174e-03
5.54142192e-03 -1.67124034e-04]
[ 7.12659835e-02 -4.34121260e-01 -8.88436080e-02 ... -1.68818774e-02
4.57078107e-03 -8.52995967e-03]
[ 3.49772357e-01 -8.00383876e-03 -4.05156781e-02 ... -4.68496505e-03
-1.39683527e-02 2.43243338e-03]
...
[ 1.29688720e-02 -3.80398560e-02 -3.41813657e-02 ... 1.07174062e-01
8.41737918e-02 -1.95413367e-01]
[-2.50075399e-02 -3.74622282e-02 4.39476539e-01 ... 3.99982110e-03
4.08181763e-02 2.32617269e-02]
[-3.18537004e-03 -6.23000043e-04 1.02661626e-01 ... 2.84579669e-02
-1.47963772e-02 -6.62692406e-02]]
Eigen Values
%s [ 6.40030103e+00 4.23053272e+00 3.02200570e+00 2.36069955e+00
1.72278028e+00 1.70533047e+00 5.17634008e-02 7.84864255e-02
1.23323929e-01 1.58239483e+00 1.94704947e-01 2.10588552e-01
2.45372409e-01 3.37764061e-01 3.52383334e-01 2.24756725e-03
9.93351422e-04 1.28503648e-04 8.54683326e-05 1.51669793e+00
3.97816689e-01 1.48400510e+00 4.25049450e-01 -5.20329656e-16
-1.83229560e-15 3.57406920e-15 1.39212554e+00 1.33812387e+00
5.71411667e-01 6.48215227e-01 6.60453404e-01 1.27455883e+00
6.90208644e-01 7.30900855e-01 1.22358633e+00 1.21781188e+00
7.54916613e-01 7.61951753e-01 7.89272221e-01 1.19439921e+00
1.18354682e+00 8.08765828e-01 8.31761100e-01 1.17521503e+00
1.16073113e+00 8.62975337e-01 1.14847039e+00 8.79158894e-01
1.11948938e+00 1.10960276e+00 8.90644524e-01 8.88567656e-01
9.01761603e-01 1.10493861e+00 9.16012433e-01 9.31041146e-01
1.09143428e+00 1.08460485e+00 1.08273453e+00 1.07118893e+00
9.33793856e-01 1.06368893e+00 9.41694315e-01 9.44273389e-01
9.49801385e-01 9.52927340e-01 1.05455290e+00 1.04955645e+00
1.04815072e+00 1.04163633e+00 9.69808694e-01 1.03813696e+00
1.03345195e+00 1.02768165e+00 1.02381893e+00 9.82887562e-01
9.81254198e-01 9.86986081e-01 1.01521697e+00 9.89390243e-01
9.93680575e-01 9.93261992e-01 1.00195909e+00 1.01363137e+00
1.01189827e+00 1.00051625e+00 1.00419597e+00 1.00622516e+00
1.00552678e+00 1.00928050e+00]
In [173]:
# Let's Sort eigenvalues in descending order
# Make a set of (eigenvalue, eigenvector) pairs

eig_pairs = [(eigenvalues[index], eigenvectors[:,index]) for index in range(len(eigenvalues))]
# Sort the (eigenvalue, eigenvector) pairs from highest to lowest with respect to eigenvalue
eig_pairs.sort()
eig_pairs.reverse()
print(eig_pairs)
# Extract the descending ordered eigenvalues and eigenvectors

eigvalues_sorted = [eig_pairs[index][0] for index in range(len(eigenvalues))]
eigvectors_sorted = [eig_pairs[index][1] for index in range(len(eigenvalues))]
# Let's confirm our sorting worked, print out eigenvalues

print('Eigenvalues in descending order: \n%s' %eigvalues_sorted)
[(6.400301029851477, array([ 3.38140157e-01, 7.12659835e-02, 3.49772357e-01, 1.21826278e-02,

2.45688226e-01, 3.13509594e-01, 5.85711636e-02, 1.32118243e-01,
1.24188241e-01, -5.39557358e-02, -1.55322767e-01, -7.38082454e-02,
1.69280354e-01, 6.71341866e-02, 1.76807150e-02, 8.40382554e-03,
2.25638942e-03, 3.61933268e-03, 1.42331836e-03, 5.17260476e-04,
-5.63803984e-03, -2.85596080e-02, -2.20914937e-01, -2.56073225e-03,
-6.59873013e-02, -6.94849495e-02, -5.10691490e-02, 2.99009064e-02,
1.96828062e-01, 7.27758185e-02, 4.47037234e-02, 6.43544191e-02,
8.98255374e-02, 3.57441062e-02, 2.39042699e-02, 1.82629160e-02,
2.35903040e-02, 9.52118613e-03, 6.36445584e-03, 6.71514053e-03,
4.22319789e-03, -8.25472478e-02, 2.65712150e-01, 2.24314600e-02,
-1.44008864e-03, -1.59380115e-04, 1.36351362e-02, 1.21584859e-02,
2.79651538e-02, 3.46589095e-02, 2.87868177e-02, -3.59443998e-02,
1.23735329e-01, -9.23789546e-02, -5.48045043e-02, -2.04647878e-02,
-5.44353495e-02, -1.55877412e-01, -1.66909198e-01, 9.78807036e-02,
2.09856238e-01, 1.36794862e-01, 5.65981168e-02, 8.75831765e-03,
2.64286457e-01, 4.83149865e-02, -8.63848741e-03, 2.05515336e-02,
-1.04425520e-02, 2.46487074e-02, -1.36322635e-02, -8.34913899e-03,
5.58571616e-03, 5.97895799e-02, 1.67219076e-02, 1.26648246e-02,
1.73286710e-02, 4.02906127e-02, 1.35579146e-02, 4.06324757e-02,
3.20240017e-03, 5.54293286e-02, 2.79650623e-02, 1.07319938e-01,
-1.80305747e-01, 5.49209909e-02, -5.88852015e-03, 1.29688720e-02,
-2.50075399e-02, -3.18537004e-03])), (4.23053271502051, array([-0.05912722, -0.43412126, -0.0
0800384, -0.09492954, 0.13056396,
-0.09447739, -0.370925 , -0.42105132, 0.38853903, 0.00818031,
0.09903886, -0.00322975, -0.05039931, -0.0397953 , -0.01135684,
-0.00195537, 0.00187349, 0.00561416, -0.00403777, 0.00329149,
0.00403127, -0.01266106, -0.02264596, 0.00982359, 0.00641089,
-0.1047631 , -0.01951081, -0.01220764, 0.1019734 , -0.03086617,
0.0212307 , 0.04710655, 0.04081646, -0.00117921, 0.00207246,
0.00066914, -0.00061282, -0.00468101, -0.00095167, 0.00073386,
0.0008508 , -0.00237293, 0.15398148, 0.03143851, 0.20703993,
0.0235899 , -0.04163645, -0.02784399, -0.01428115, -0.01787219,
-0.04086905, -0.02070046, 0.21897162, -0.20078997, -0.05272298,
-0.00838391, -0.0126264 , -0.02109097, -0.07477947, 0.10406576,
0.01781302, -0.03510238, -0.02017662, -0.00081656, -0.00543061,
-0.12229314, -0.02405586, -0.01865954, -0.03518784, -0.00616237,
-0.02599962, -0.02240527, -0.06022892, 0.0647382 , -0.03107846,
-0.03563074, -0.05546802, 0.03231613, -0.02937766, -0.07517532,
-0.03530176, 0.00217373, -0.04298141, -0.03445572, 0.1795959 ,
0.03400426, -0.04414761, -0.03803986, -0.03746223, -0.000623 ])), (3.0220056962108997, arra
y([ 0.21093346, -0.08884361, -0.04051568, 0.45923331, -0.20948573,
0.06094638, -0.08769888, -0.04338803, 0.18999279, -0.03429171,
-0.04303535, -0.14081992, 0.09630094, 0.14413304, 0.08075213,
0.03087204, 0.0216305 , 0.02874812, 0.01352604, 0.01099741,
-0.00046969, -0.0253926 , -0.12810462, 0.00801375, -0.00546132,
0.08028745, 0.05174082, 0.02460706, -0.17141934, 0.10063382,
0.10513547, 0.08968952, 0.09751819, 0.04988048, 0.04180795,
0.0360652 , 0.04631527, 0.01510422, 0.00364895, 0.00749181,
0.00784258, 0.14066273, -0.13476605, 0.0508906 , 0.01324093,
0.00143509, 0.04433479, 0.0745123 , 0.11139792, 0.10282392,
0.08280422, -0.02017733, -0.15120561, 0.08089571, 0.1413166 ,
-0.03014322, -0.04765138, -0.07345488, -0.00271769, 0.02248996,
0.03418131, 0.03990794, 0.02663322, 0.01611412, 0.05546425,
0.03544567, -0.03015266, -0.01667398, -0.0401713 , -0.05250393,
-0.04129285, -0.02118915, -0.04979088, -0.04044887, -0.021319 ,
-0.10458144, -0.01955768, -0.10198589, 0.00537009, 0.0366488 ,
-0.056401 , -0.05468041, -0.08192034, -0.05254563, 0.26984296,
-0.05563353, -0.00285831, -0.03418137, 0.43947654, 0.10266163])), (2.3606995503606907, arra
y([-0.00720377, 0.00224618, 0.07573216, -0.14547718, -0.20077718,
0.05062147, 0.01439989, 0.00073809, -0.07253922, 0.14716631,
0.26402065, -0.27661743, 0.05624118, 0.02404274, 0.00526146,
0.01216159, -0.00804137, 0.00257074, -0.00579735, 0.0020183 ,
0.01828875, 0.09307033, 0.32371789, 0.01903027, -0.05286308,
-0.17577786, -0.06178217, -0.14789745, -0.06749212, 0.02029172,
0.01273173, 0.05388027, 0.072458 , 0.05551253, 0.03184281,
0.03028663, 0.02775791, 0.0149579 , -0.00807084, 0.00573534,
0.00706833, 0.10759155, 0.00949538, 0.0310457 , -0.05454233,
-0.00721205, 0.10292123, 0.0113677 , 0.03206102, 0.04935804,
0.11456154, 0.0672181 , 0.02340023, -0.05185769, 0.01734833,
0.07277432, 0.13346116, 0.2975113 , -0.2129159 , -0.23715036,
0.24099786, 0.16208952, 0.06896703, 0.00896484, 0.30668776,
-0.03052305, 0.01273904, -0.03055353, 0.01560851, -0.02733405,
-0.00267958, 0.02118784, -0.08928568, -0.02238202, -0.04074563,
-0.10262501, -0.05898896, -0.06153782, 0.01533014, 0.04045805,
-0.03335646, -0.03231935, -0.03451797, 0.10004965, 0.16962266,
0.00513287, 0.03516863, -0.02673597, -0.16413819, 0.08959676])), (1.7227802802980245, arra
y([-2.75891943e-02, 6.22663979e-02, -5.87154230e-02, 5.16876256e-02,
4.84901490e-02, 1.72202394e-02, 1.03722364e-01, 5.34730987e-02,
2.21522986e-04, 1.75531777e-01, 7.48912126e-02, -2.40660821e-02,
-6.99276559e-02, -1.38633482e-02, 3.87175860e-02, 1.78451840e-02,
-6.46767863e-03, 1.31501431e-02, -1.18895907e-02, -2.37987660e-03,
2.57882460e-02, 2.03529963e-01, -2.22415917e-02, 4.43677778e-02,
-7.22222733e-02, -9.70041136e-03, -8.15912951e-02, 1.25976552e-01,
1.12053923e-02, -1.57737740e-02, 7.36220296e-03, 1.20741003e-02,
-1.82694555e-02, -3.03469013e-03, 2.37479559e-02, -1.61044922e-03,
6.17820745e-02, -5.39089932e-03, 1.58457297e-02, -9.48287976e-04,
2.06938171e-02, -1.19344878e-01, -1.05065660e-02, -8.78509029e-03,
1.16979339e-01, 1.86698328e-02, 3.82937199e-01, 2.08740410e-02,
1.00421405e-02, 3.70161361e-02, 3.50285374e-01, 4.35659169e-02,
8.36701642e-02, -5.79924748e-02, -6.94195526e-02, 1.50096333e-01,
1.05193372e-01, 8.15102561e-02, -3.43044936e-01, 4.55724591e-01,
-2.36634142e-01, -3.34803092e-02, 3.91109747e-02, -3.62507673e-03,
-2.18796507e-01, 2.77370246e-02, 1.42157352e-02, 2.65573237e-03,
5.98256022e-02, 3.65080662e-02, -7.91148709e-03, 3.45710951e-02,
4.41525866e-02, 3.01403507e-02, 2.24997611e-02, -6.09415475e-02,
-5.32191501e-02, -3.87528058e-02, 6.06045961e-03, 2.43959995e-02,
1.73155089e-02, 2.91906845e-02, -1.62743839e-02, -1.11980582e-02,
-5.78750293e-03, -2.82725675e-02, 1.74298356e-01, 4.41241036e-03,
5.27653264e-02, 8.14799322e-02])), (1.7053304684180373, array([-0.04304967, 0.03835547, -0
.049328 , 0.00682403, 0.06603838,
-0.00678647, 0.07094793, 0.0279412 , 0.01134168, -0.03214826,
-0.1074207 , 0.44510897, -0.38919037, -0.0100558 , -0.00243799,
-0.01259858, -0.00250479, 0.01319458, -0.0099384 , 0.00592628,
-0.02191463, -0.00796351, -0.02695272, 0.04413847, 0.07172762,
0.10503604, 0.00454758, -0.01146529, -0.12597964, -0.10725771,
-0.01547844, 0.12430449, 0.07892846, 0.04778133, 0.04080463,
0.0258032 , 0.044399 , 0.01198528, -0.00248636, 0.01644248,
-0.00839736, -0.10453813, -0.05859151, 0.02176298, 0.14312427,
-0.00625977, 0.25531019, 0.00308563, 0.02034035, 0.07405128,
0.25196789, -0.03486212, 0.21168428, -0.15449296, -0.11058998,
-0.02796998, -0.07348146, -0.09426926, 0.18688518, -0.28658732,
0.12769751, 0.15962844, 0.10054937, 0.0230538 , 0.21152038,
-0.03815552, 0.00385181, 0.01635949, 0.02444802, -0.0144477 ,
-0.00507272, 0.00141885, -0.00511901, -0.0331081 , -0.01035132,
-0.03981033, 0.05873867, -0.04542492, 0.01465507, 0.01155537,
0.03969387, -0.04650538, -0.12865868, 0.08653509, 0.08155747,
-0.01284676, 0.07491223, 0.04722783, 0.05148136, 0.08652505])), (1.5823948341725016, arra
y([-0.02579091, -0.10323643, 0.06233093, -0.15631637, -0.01142927,
0.04732042, -0.07678036, -0.10237832, 0.11412126, -0.06317361,
-0.05834357, 0.22674486, -0.12169029, -0.08095512, -0.09161384,
-0.00748711, -0.06773762, -0.05158664, 0.0104614 , -0.03208893,
-0.01127987, -0.07419635, -0.03121154, 0.04688368, 0.04296296,
0.01776535, -0.01940218, 0.10655432, 0.007984 , -0.06009444,
-0.08995488, 0.07290427, -0.04851162, -0.00111173, -0.03052541,
-0.00506415, -0.04276741, 0.01211364, -0.06485362, -0.03264406,
-0.01695773, 0.06618993, -0.01794634, 0.07672332, 0.18427271,
0.00645225, 0.1060034 , -0.02057187, -0.0020446 , 0.02221682,
0.11947425, -0.01927545, -0.48625332, 0.48400686, 0.0765737 ,
-0.07828707, 0.00855355, 0.04808957, -0.20662059, 0.08991486,
0.13986082, 0.02992472, 0.01678794, -0.00525954, 0.14152616,
0.17926009, 0.00538621, -0.05599342, -0.0716547 , -0.08164325,
0.02650792, -0.0351676 , 0.00165846, 0.09514562, -0.07842145,
0.03097681, -0.03501136, -0.0624873 , -0.00370757, 0.11001634,
-0.06511834, 0.00478484, 0.0232029 , -0.00802633, -0.05344581,
-0.03140204, 0.02211096, -0.04893219, -0.14940765, -0.12861492])), (1.5166979298471606, arra
y([ 0.03512953, -0.05998021, 0.10666813, -0.12151286, -0.0823675 ,
-0.01772739, -0.04739566, -0.04986913, 0.017378 , -0.07612813,
-0.06798112, -0.12944801, 0.22644149, -0.07273589, 0.06650317,
0.02817224, 0.02324206, 0.03927625, 0.01264389, 0.0187637 ,
-0.02279888, -0.08033275, -0.08179817, 0.02656226, 0.07572413,
-0.03952647, 0.13790474, -0.1450146 , 0.1294266 , 0.00847142,
0.01929373, -0.1169406 , -0.12399375, 0.02649847, 0.03841585,
-0.01188958, 0.08773861, -0.01351911, 0.0249923 , 0.01544068,
0.01212473, 0.23575178, 0.08854544, -0.02308546, -0.08859317,
-0.00819852, 0.41200847, -0.10110496, -0.11640363, -0.04605829,
0.39971337, -0.05494921, -0.07489579, -0.00541504, 0.15577549,
-0.12236071, -0.02983097, -0.15292934, 0.2874068 , -0.11035494,
-0.10694081, -0.04685742, 0.03030548, -0.03453339, -0.11326956,
-0.07205484, -0.02058402, -0.01707269, -0.05880667, 0.01668012,
0.0440469 , -0.03878011, -0.08398554, -0.07418455, -0.02799173,
0.10882041, -0.01572402, 0.13624518, -0.02676439, -0.05427701,
-0.01529219, -0.05176724, 0.05367334, -0.00107446, -0.00264562,
0.05796855, 0.11527288, -0.04911701, -0.15494206, 0.11725425])), (1.4840050960170488, arra
y([-0.01647521, -0.07834627, -0.03795622, 0.0359174 , 0.03689281,
-0.00897316, -0.04864121, -0.07710089, 0.05563915, 0.38743286,
-0.14582129, -0.01310922, 0.06974777, -0.01969408, -0.06385795,
-0.04212032, -0.00445327, -0.02754135, 0.01184245, -0.0039486 ,
0.02341597, 0.51007806, -0.16105521, 0.03972466, 0.0224396 ,
0.0429155 , 0.02903634, -0.02010313, 0.05042578, 0.00409658,
-0.02356482, 0.00522301, -0.00311867, -0.02191703, -0.04233229,
-0.01161743, -0.06236782, -0.0080576 , -0.02258844, -0.02122084,
-0.0362343 , -0.02652322, 0.03201913, -0.00857236, -0.02097777,
-0.01584929, 0.00656723, 0.01599246, -0.04447129, -0.00803429,
-0.01061251, 0.10817242, -0.12478379, 0.08557665, 0.0460519 ,
0.47181799, 0.16525984, -0.20280355, 0.17355343, -0.19177728,
0.1237945 , -0.04323163, -0.03597109, -0.00419303, 0.08125443,
-0.00628357, 0.04920573, -0.00522381, 0.17418589, -0.01709192,
0.00150704, 0.01570768, -0.04528673, 0.04077768, -0.0136485 ,
0.07800799, 0.04705179, 0.04622736, -0.02957357, -0.02562351,
-0.04611269, -0.03345367, 0.0315392 , -0.03166596, -0.06843772,
0.01994364, 0.02590641, -0.00907726, 0.03793356, -0.06581374])), (1.3921255385114626, arra
y([-7.44798782e-02, -2.04057215e-02, -1.70213024e-01, 1.59969471e-01,
1.16745291e-01, 1.22570505e-02, -8.60491578e-03, -3.35444803e-02,
-1.63747860e-02, -8.81691725e-02, 2.29829604e-01, -1.91980144e-01,
2.00935995e-01, -9.24163310e-02, -3.58916386e-01, -1.55938407e-01,
-8.01419058e-02, -1.67543214e-01, -1.58590080e-02, -3.33998688e-02,
1.42090660e-02, -1.01937834e-01, 3.97490694e-02, 3.11486801e-02,
-3.28015525e-02, 1.03256895e-01, -1.58237285e-01, 7.35541952e-02,
8.49211152e-02, 3.08494796e-02, -1.74544509e-01, -8.39287128e-03,
2.78487996e-02, -9.45944424e-02, -1.31726300e-01, -4.56532786e-02,
-2.77369029e-01, -1.65078423e-02, -1.47011482e-01, -1.20544364e-01,
-1.24713634e-01, -2.12675713e-01, -1.59140924e-06, -1.70492395e-01,
-4.71749996e-02, -2.64371816e-02, 1.93074103e-01, 5.10871246e-03,
-4.51392455e-02, -1.12503289e-02, 1.79238763e-01, -3.64441742e-02,
4.17966706e-02, 9.30520985e-03, -7.65341683e-02, -1.14372474e-01,
-3.05317310e-02, 3.57304046e-03, 6.67812385e-02, -1.18929855e-01,
1.15080118e-01, -2.72141011e-02, -6.35371089e-02, 1.76916211e-03,
7.65631431e-02, 1.08834391e-02, -3.97408678e-02, 9.80381325e-03,
-7.29636808e-02, -3.18072086e-02, -1.22301352e-01, -4.27613032e-02,
5.05198151e-02, 2.49743560e-02, 5.62910547e-02, 8.95831493e-03,
3.61783200e-02, 3.30025629e-02, -7.22496957e-02, -5.24225212e-02,
-3.43834525e-02, 3.51711601e-02, -1.93835937e-02, 3.89598662e-02,
-3.88440865e-02, -4.38094026e-03, 7.82669060e-02, 3.50383193e-02,
1.99411248e-01, -1.66581180e-01])), (1.338123871891159, array([-0.00548775, -0.03843159, -0.
03061153, 0.04333244, 0.13993639,
-0.03706519, -0.02074415, -0.03732315, 0.03333833, -0.01952747,
0.22987535, -0.21522097, -0.06384675, 0.08397588, 0.34627929,
-0.04155743, 0.15962213, 0.17622303, 0.03179456, -0.00631348,
0.01499474, -0.03142747, 0.07360389, 0.03405806, 0.05415617,
-0.04012176, -0.21879165, 0.10111445, -0.06312339, -0.02052279,
0.07695396, -0.03795474, 0.02675084, 0.09266905, 0.02364962,
0.00294061, 0.32037628, 0.04085849, 0.22018367, 0.15760833,
-0.07506903, -0.26341749, 0.045752 , 0.046454 , 0.0182172 ,
-0.00876758, 0.02950111, -0.07382824, -0.09627891, -0.03991851,
0.05040326, 0.00198497, -0.07373555, 0.24160928, -0.26190678,
-0.01717238, 0.05946142, -0.07260429, 0.15436931, -0.14236139,
-0.00075114, -0.02015509, 0.12668078, -0.02670258, 0.01202094,
0.07757646, -0.01062797, -0.00859107, -0.03864637, -0.04924234,
-0.12007782, -0.04770347, 0.01634402, 0.02039453, 0.00370991,
0.06774831, 0.07424346, 0.01216043, 0.06891506, -0.01504565,
-0.05261312, 0.05696704, 0.06306901, -0.0986121 , -0.09694006,
-0.02580626, 0.00048615, 0.025849 , 0.04712735, -0.18562838])), (1.2745588269563284, arra
y([-1.35554753e-03, -6.07757998e-03, 1.94705449e-02, -3.64873032e-02,
2.57177745e-02, -1.73766948e-03, -7.66767391e-03, -5.99583989e-03,
1.28081101e-02, 3.13125071e-03, 7.72361809e-02, -9.08219621e-02,
-3.02525331e-02, 1.04350856e-01, -1.07304336e-01, 5.79268206e-01,
-1.92585215e-02, -1.43456269e-01, 1.44049771e-02, 5.54833874e-03,
-3.02947376e-03, 1.53579737e-02, -1.61490155e-05, 1.64651561e-03,
1.70440407e-01, -9.39241300e-02, -1.00536529e-01, 1.19542486e-01,
-8.20623539e-02, 2.74726274e-02, -1.98189598e-02, -4.39066930e-02,
5.74034124e-02, 1.96783609e-02, 2.65304936e-01, 1.24948072e-01,
-2.00105962e-01, 4.30617383e-03, -4.64578459e-02, -2.17425881e-02,
4.94733434e-01, -2.54029837e-02, 4.39087477e-02, -1.04778047e-01,
2.20919184e-02, -5.46829160e-03, 8.31957137e-03, 1.71626353e-01,
-3.94012165e-02, -9.76566932e-02, 2.29359231e-02, -4.01148302e-02,
-2.68252725e-04, 1.11777672e-01, -1.67456052e-01, 3.17300476e-02,
-1.64373986e-02, -9.39181037e-02, 1.05514889e-01, -2.96055848e-02,
-5.97183383e-02, 7.01084344e-02, -5.40776784e-02, -2.26117483e-02,
-2.87330811e-02, 6.69012887e-02, -3.01012039e-02, -1.84687406e-02,
6.26595732e-04, 1.64181008e-02, -5.79387053e-02, -1.26763711e-02,
1.79376732e-02, -4.92679658e-02, 9.23604464e-03, 1.07068444e-02,
2.38723881e-02, 1.23594923e-02, 1.44704544e-02, 5.22613286e-03,
-2.06610793e-02, -5.53429277e-02, -2.80537402e-02, 3.68670983e-02,
7.42196857e-03, -9.91201404e-03, -2.55594189e-02, -2.38138885e-02,
-2.70204929e-02, 1.48195145e-02])), (1.2235863310684403, array([ 0.02162233, -0.02507778, 0
.00346684, 0.0337724 , 0.06339719,
-0.01546107, -0.00802168, -0.01952018, 0.01578492, -0.00374358,
-0.0030274 , 0.06604012, -0.29852924, 0.43570771, -0.01070749,
0.00227028, 0.27598502, -0.19355632, 0.01200504, 0.00137478,
0.01009588, -0.02943156, 0.04368344, -0.03455022, -0.09548836,
-0.02263805, -0.0043047 , -0.14906378, -0.03932275, 0.210181 ,
0.13741778, 0.03943821, 0.06420845, 0.01942264, 0.02616773,
-0.00049999, -0.2467183 , -0.06123914, 0.25875485, -0.07995135,
-0.04528393, -0.03659067, 0.03028853, -0.037632 , -0.06830891,
0.03123251, 0.08586152, -0.08303141, -0.02826377, -0.10886284,
0.07045707, -0.00895639, -0.05643817, -0.07005928, 0.21322098,
-0.07408769, 0.15301434, 0.04851492, -0.03619714, -0.00610947,
0.05533285, -0.0628817 , -0.20534149, -0.00580645, -0.02051046,
-0.03196354, 0.04860021, 0.05638867, -0.03977628, 0.01021369,
0.03052457, -0.01043534, -0.10746098, -0.01933172, -0.00225848,
-0.02649758, 0.02510083, -0.01685237, -0.10001644, -0.0289755 ,
0.01190971, 0.05171331, 0.29344975, -0.06493778, -0.10987834,
0.09982846, 0.07016599, -0.05809229, 0.00350068, -0.11836791])), (1.2178118832338365, arra
y([-1.82521939e-02, 3.98266974e-02, 4.37101951e-02, -1.09924289e-01,
3.81072863e-02, -1.04404282e-02, 5.66479026e-02, 3.40861744e-02,
3.19219948e-02, -2.82643180e-04, 4.60365840e-02, -1.36618844e-01,
5.69508913e-02, 1.79428208e-01, -6.80695887e-02, -2.57879383e-01,
4.86807903e-02, -1.58414203e-01, 1.91672236e-02, 1.04592425e-03,
-1.56431618e-02, 3.50710634e-02, -7.20908512e-02, 2.81295729e-02,
3.12644629e-01, -3.29212614e-01, 1.65765318e-01, 2.72942611e-01,
-2.49384356e-01, 1.43915743e-01, 7.70159238e-02, 3.60081867e-02,
-1.32977978e-02, 5.37143042e-02, -4.73669917e-02, -5.38248529e-02,
-1.66768399e-01, 3.17515143e-02, 2.62159366e-02, 5.40694050e-02,
-2.64704135e-01, 1.22058412e-01, -7.54748962e-02, -7.31124780e-02,
2.54286662e-01, 1.81960202e-02, -7.81860561e-03, -5.30373138e-02,
-2.94341276e-02, -2.58421362e-02, -3.85029912e-02, -4.36028096e-02,
4.67960202e-02, -4.67573892e-03, -6.04959326e-02, 3.44279004e-02,
-4.32294352e-02, -1.27615956e-01, 7.55820622e-02, 3.45839877e-02,
-1.19639293e-01, 1.45885832e-01, 4.67547906e-03, -7.37539521e-03,
-3.27779542e-02, 6.05062047e-02, -7.71285160e-02, 9.55809826e-04,
4.73524121e-02, 3.80451458e-02, 6.64970709e-02, 1.19055043e-02,
3.29736108e-02, -1.62921568e-01, 2.94258653e-02, -2.22492552e-02,
2.72471490e-03, -4.42259233e-02, 9.79541765e-02, 8.86809373e-02,
2.98735972e-02, -3.75493601e-02, -9.15437233e-02, 1.57466241e-01,
3.52801176e-02, -1.31400982e-01, 6.26552634e-04, -4.00723013e-02,
-1.13996015e-01, -5.55332480e-02])), (1.1943992057853123, array([-0.00931702, -0.00424866, -0
.01216005, 0.004068 , -0.02263428,
0.00471673, 0.03084057, -0.00579687, 0.01061565, 0.00286548,
-0.05429538, 0.0424345 , 0.17139521, -0.3992305 , 0.06337345,
0.03866668, 0.50312022, -0.17641882, -0.04637666, -0.05400847,
-0.00154834, 0.00562841, -0.01500136, 0.01718973, -0.02036575,
0.08496288, -0.03571607, 0.04040876, 0.05591341, -0.080265 ,
-0.17623252, -0.02222803, -0.03419361, 0.13241458, 0.04712548,
-0.03352416, -0.12270217, -0.00351138, 0.52701109, -0.04683177,
0.03675955, 0.04148061, -0.02656703, -0.07510523, 0.05351867,
-0.0249037 , -0.0318897 , 0.02718563, 0.01596879, 0.05101118,
-0.02903321, 0.06317588, -0.01326512, -0.03020909, 0.05275157,
0.01706168, -0.05856816, 0.03988794, -0.05196665, 0.01620932,
-0.04609711, 0.13712312, 0.03485711, 0.01156724, 0.03409038,
-0.05809907, -0.03100509, -0.06844818, 0.00742539, -0.0287335 ,
-0.01777249, -0.01164615, 0.05614855, 0.01500253, -0.0393923 ,
-0.04277097, 0.09562431, -0.04900554, -0.06115623, 0.00210678,
-0.00588897, -0.10569315, -0.05087745, 0.14847456, 0.06302812,
-0.06454742, -0.01456505, 0.02526081, 0.02779199, -0.06739155])), (1.1835468203512876, arra
y([ 0.03717678, -0.02535452, 0.02706796, 0.02121948, -0.00643199,
0.06230364, -0.04062001, -0.01679781, -0.00330309, 0.05101256,
-0.03300083, 0.02863528, -0.04126303, 0.08946389, -0.04040111,
-0.05012179, -0.1012712 , -0.14580506, -0.01925575, -0.05337933,
0.01891649, -0.00116377, 0.06500961, 0.03899285, 0.0213303 ,
0.07741037, -0.17050636, -0.16609609, 0.07488031, -0.08878289,
-0.1630476 , 0.11557036, 0.18380201, 0.082032 , 0.10960807,
0.03167498, -0.10141503, 0.23255009, -0.11216596, 0.22622171,
-0.11895868, 0.02902314, 0.07728901, -0.11490983, -0.10045019,
0.00619202, -0.04623066, -0.13040619, 0.17791992, 0.00567379,
-0.01654318, 0.03274446, -0.05062707, 0.01265202, 0.05737324,
-0.0064413 , 0.09038616, -0.01321635, 0.0524206 , 0.02059814,
-0.35873099, 0.27435224, 0.338374 , 0.07888839, -0.11605764,
-0.03570986, 0.01418727, -0.08400272, -0.02308667, 0.02757074,
-0.02740659, -0.04492104, -0.04322222, 0.0325053 , -0.09134924,
0.01900407, -0.05265527, 0.01393194, 0.1479187 , -0.02768956,
-0.01705519, -0.19410776, 0.13552946, 0.05032767, 0.01248076,
0.20909184, -0.08041327, 0.0217424 , 0.02548379, -0.14547926])), (1.175215028494778, array
([-0.01003806, 0.00929693, -0.00839073, -0.00384031, -0.02025381,
0.01108479, 0.03082652, 0.00684872, 0.0040802 , -0.00835238,
-0.00888267, 0.02343615, 0.00107126, -0.02599632, -0.01324817,
0.02040201, 0.06855982, -0.1738684 , -0.02001343, -0.00324887,
-0.00960466, -0.00413402, -0.00319067, 0.06760916, -0.03168774,
-0.04264339, 0.04942483, 0.03072312, 0.02012888, 0.09466353,
-0.06283886, 0.09247561, -0.16311953, -0.00957255, -0.18114619,
0.01839285, -0.12161602, 0.06257645, 0.07093488, 0.41841154,
0.09921985, 0.03727513, -0.0018813 , -0.13057926, 0.03516641,
0.00174659, 0.02598584, 0.16304889, 0.07714149, -0.05721265,
0.01428726, -0.04961955, 0.0233701 , -0.02559996, 0.0155587 ,
-0.01602477, -0.01613317, 0.00661517, -0.02768675, 0.00053859,
0.21964819, -0.41933767, 0.27741611, 0.0572856 , 0.03932133,
-0.01142053, -0.01319052, 0.03717639, -0.0075554 , 0.00712815,
0.03087199, 0.01612459, -0.04848844, -0.00117159, 0.03997727,
-0.06363599, 0.04734225, 0.01954812, 0.45531874, -0.07439785,
0.02489246, 0.03331119, 0.01755671, -0.22160939, 0.01271444,
0.02131253, 0.0366985 , 0.07920369, -0.01420712, 0.07780367])), (1.1607311253642982, arra
y([ 2.87769241e-02, -8.58732180e-03, 5.11720150e-02, -3.63311597e-02,
-1.14374568e-01, 4.54263578e-02, -5.78887157e-03, -2.58465735e-03,
-1.40020678e-02, 2.94326379e-02, -6.44111057e-02, 4.26378371e-02,
-4.62103817e-02, 1.10513405e-01, -6.83260112e-02, -1.32374633e-01,
1.57587691e-01, -4.88596364e-02, 6.47743069e-02, 1.21486343e-01,
-2.98135439e-02, 2.38420919e-02, 4.41077246e-02, -2.58264723e-03,
1.19040856e-01, 7.37694514e-02, -2.99108580e-01, -1.43489044e-01,
1.30655959e-01, -6.51230081e-02, 2.07728568e-01, 8.58640129e-03,
-7.58933110e-02, -2.93781059e-02, -1.13635802e-01, 1.40724954e-02,
-1.51357057e-01, -1.16784720e-03, 1.09990489e-01, -8.49939518e-02,
-1.03762028e-01, 1.61494462e-01, 8.54602535e-02, 1.05684727e-02,
-1.58935743e-01, 7.89426677e-02, 1.10384311e-02, -2.27832673e-02,
1.53458821e-01, 5.05888760e-02, 3.94104789e-04, -9.09718920e-02,
3.38157682e-02, 2.34985521e-01, -4.08935670e-01, 5.66997577e-02,
-1.70489242e-01, 6.78470832e-03, 1.46005574e-02, 5.25839206e-02,
5.42972822e-03, -6.47901863e-02, -1.28497758e-01, 4.30876884e-02,
-4.98891316e-02, 9.35210472e-02, 4.30546194e-03, 2.88052594e-02,
4.62027218e-02, 6.67473103e-02, -8.45943593e-02, 5.52658733e-02,
5.18162141e-02, -2.87932257e-02, 2.95775228e-02, 7.21685520e-03,
-1.11638155e-01, 9.76728264e-02, -1.98870673e-02, -1.41592603e-02,
1.73892059e-03, -9.33084113e-02, -2.54801086e-01, -1.00897644e-01,
1.03824118e-01, 2.08909237e-01, -1.09038150e-04, -2.96310066e-02,
-3.01645687e-02, 2.05984863e-01])), (1.148470393300462, array([-3.61964599e-05, 2.65794249e
-02, 3.60987797e-02, -6.30859519e-02,
3.53506277e-02, 2.35960207e-02, 3.72210055e-02, 2.50778801e-02,
2.32019288e-02, -5.01718666e-02, 8.58251787e-02, -6.55178810e-02,
1.59312468e-02, -1.04240857e-02, 2.31768437e-02, 4.78816367e-02,
-3.16944574e-02, 3.16321506e-02, -5.56196280e-02, -8.64990801e-02,
-1.94608535e-02, 2.91957870e-02, -1.19483897e-01, 7.98492218e-02,
2.69925041e-01, 2.97472967e-01, -1.46214798e-01, -3.88005261e-01,
1.58821192e-02, 2.46261316e-01, -1.12748529e-01, -3.44188737e-02,
-6.91164070e-02, -6.31484345e-02, 9.59373799e-03, -4.91259929e-02,
1.29723837e-01, 4.52989851e-02, -1.12190427e-02, -6.85997785e-02,
3.44038836e-02, 6.86329050e-02, -1.49008856e-01, -1.39422356e-01,
2.51729095e-01, 6.54074575e-02, -3.38979458e-02, -3.67616247e-02,
5.93783187e-02, 8.05105269e-02, -3.99044291e-02, -1.30776028e-02,
2.26413898e-02, -2.68176282e-03, -3.03462048e-02, 3.52784116e-02,
-2.99922388e-02, -6.90536366e-02, -2.80968997e-02, 7.02937675e-02,
4.56930898e-02, -8.37216294e-02, 8.48566660e-02, -1.34127177e-02,
1.41981688e-02, 9.92679438e-02, -3.34988956e-03, -2.94506250e-02,
1.32539787e-01, -6.00787755e-02, -2.51151415e-02, 9.89540495e-03,
-9.42027640e-02, -2.66558520e-01, 4.05104637e-02, -1.54805616e-01,
-3.76218661e-02, -1.74510560e-01, -1.30386733e-01, -1.52019447e-01,
1.40425150e-01, 2.42306623e-01, 8.44530920e-02, 6.53919613e-02,
6.36447079e-02, 5.02101913e-02, 8.80789722e-03, 6.75093163e-03,
-9.41898938e-02, -1.37811683e-01])), (1.1194893763589782, array([ 9.75212477e-03, -2.14980571
e-02, 5.02994180e-04, 1.70836411e-02,
4.84416567e-02, 5.29926893e-03, -1.76297741e-02, -1.84186047e-02,
-7.34638265e-04, -9.65676047e-02, 1.62105007e-01, -6.97746038e-02,
-9.07416941e-02, 1.00229032e-01, 7.87474923e-02, -1.30837342e-03,
3.72779209e-03, -1.66772165e-01, 5.80607165e-02, -5.92666217e-02,
-5.06231372e-02, 5.00420706e-02, -5.76942041e-02, -2.33076116e-02,
-2.28977381e-01, 4.53010866e-02, 4.30024819e-01, -1.87498337e-01,
2.95267449e-02, -5.09800705e-02, -1.76023158e-01, -4.50249163e-04,
9.18409746e-02, 5.71992742e-02, 8.82050093e-02, 4.67645473e-02,
-5.21500033e-02, 4.66011022e-02, 4.25656486e-02, 8.54790914e-02,
-2.34339574e-02, -6.76744639e-02, 2.23879271e-02, -7.27912296e-02,
-6.87466408e-02, -2.27698963e-02, 9.92739113e-03, -4.40896135e-02,
2.10350772e-02, -2.74350657e-02, 1.63940431e-02, 3.26246694e-02,
-4.88825022e-02, 1.08413925e-01, -9.60192428e-02, 1.04039975e-01,
-2.07286354e-01, 7.28417267e-02, 2.54386057e-03, 1.68173847e-03,
-6.82465550e-02, 9.38517009e-02, -2.04653192e-02, -1.25056428e-02,
-1.79577705e-02, 3.59411927e-02, -3.10549696e-02, -1.86131538e-02,
1.11532990e-01, -2.17296757e-02, 6.31582889e-02, 3.33960948e-02,
-8.06849400e-03, 4.77517866e-03, -1.55688143e-01, 1.15610590e-01,
-4.39679617e-03, 8.33406560e-02, 1.63903924e-02, -9.29607729e-03,
-5.89541465e-03, 4.54125479e-01, -3.76626344e-01, -1.84220010e-02,
-3.05357641e-02, 2.68322063e-02, 8.90767292e-03, -9.60404650e-02,
3.64751766e-03, 1.39357795e-02])), (1.1096027631760017, array([-0.03055394, 0.02006923, -0
.01184309, -0.03560009, 0.03543938,
-0.03474633, 0.07979204, 0.01308007, 0.02382763, -0.12809165,
0.17073957, -0.1205178 , 0.0791998 , -0.06903869, -0.05864014,
-0.03898209, 0.02089783, -0.09830454, 0.06454565, 0.13005107,
0.01883959, -0.01667412, -0.14323759, -0.10465112, 0.15049143,
0.20588045, 0.09559427, -0.24177582, 0.01675801, -0.32678463,
0.22266998, 0.06855058, 0.06707538, 0.11026669, -0.11583863,
0.03869523, -0.15933217, -0.04541662, 0.01445586, 0.07800317,
0.05061759, -0.20758707, -0.02736484, 0.1909889 , 0.09907509,
0.02327433, -0.02997833, 0.13052392, -0.08841585, 0.1226507 ,
-0.08356291, -0.00260784, 0.00180205, -0.00090463, -0.00061247,
-0.03144258, 0.13068261, -0.09081928, 0.01904294, 0.01296567,
-0.05249401, 0.10687873, -0.03820156, -0.00198812, -0.00053131,
0.01615025, 0.15613067, -0.08968868, 0.05123437, -0.08701044,
0.00300159, 0.00884269, -0.14801765, 0.05676924, 0.00349479,
-0.0093203 , -0.16312161, 0.01851605, 0.13961157, 0.22166984,
0.0680938 , -0.00547623, 0.20430917, -0.07466997, 0.00161567,
-0.15138955, 0.17433659, -0.09265044, -0.03864607, 0.16236077])), (1.1049386135422454, arra
y([ 0.00390238, 0.01124098, -0.00319226, 0.01276039, 0.02215105,
0.00251291, 0.03190408, 0.01135741, -0.01215839, -0.02391252,
0.06608213, -0.03203576, 0.01005787, 0.00093562, -0.1271413 ,
0.0043077 , 0.09066048, 0.13911112, -0.02572941, -0.06674581,
0.01267361, 0.00205398, -0.00689407, -0.01659854, 0.21426127,
-0.25949547, 0.18789231, -0.04147826, -0.04967394, 0.02929571,
-0.23312165, 0.31962601, 0.08978283, -0.06819931, -0.1871029 ,
0.05366093, 0.13418003, 0.11565465, 0.089745 , -0.28088451,
0.09355484, -0.05455608, 0.06352312, -0.00994818, -0.06054641,
-0.07468322, -0.00445091, 0.10529331, 0.15791134, 0.12185223,
-0.03846646, -0.07495017, 0.00653686, 0.05899556, -0.08369328,
-0.03960324, 0.072001 , -0.02163773, 0.03886216, -0.00853727,
-0.05292571, -0.05905547, 0.13561911, 0.25990598, -0.04319963,
-0.08102355, -0.00085609, -0.09659159, -0.01459406, 0.01992934,
0.19136588, -0.0134315 , 0.02139756, 0.10682446, -0.01656247,
-0.02167538, 0.13389442, -0.00706919, -0.22614349, 0.07601096,
0.04167494, -0.00923226, 0.0642536 , -0.25490954, -0.04759977,
0.16218892, 0.13475305, -0.05883087, 0.01153269, 0.06958206])), (1.0914342813543214, arra
y([ 0.02189241, -0.03443139, 0.00945862, 0.02380971, -0.00049311,
0.01258217, -0.0174738 , -0.02829601, 0.00979614, 0.1627378 ,
-0.1235408 , 0.07101763, 0.00283509, -0.03326187, -0.01674985,
-0.00679414, 0.01288757, -0.04424785, 0.05538182, 0.05505606,
0.12584791, -0.07862583, 0.04550909, 0.13851923, -0.17950897,
-0.06372055, 0.19302631, -0.06672413, -0.02183436, 0.1176496 ,
0.01215119, -0.07630287, 0.10243681, -0.08379255, -0.05316322,
-0.04903896, -0.01993359, -0.05983417, -0.00481602, 0.04853408,
0.04414344, 0.04041471, -0.00319666, -0.11173285, 0.00396367,
-0.04073454, -0.01805713, 0.03039337, -0.19989023, 0.36637598,
-0.0792469 , 0.14439615, -0.00384276, 0.17040613, -0.31548541,
-0.14895144, 0.25802145, -0.08622959, 0.01933251, -0.00363105,
-0.0267289 , 0.03013368, 0.01289024, -0.10870153, -0.00955271,
-0.16005998, -0.01934533, -0.00140748, -0.13589127, -0.0481737 ,
-0.00224047, -0.14533548, 0.28976179, -0.17671099, -0.07014446,
-0.14335317, -0.15035157, 0.02526417, 0.04124523, -0.07163569,
-0.00110598, 0.09289375, 0.14714881, 0.11619341, 0.01358176,
0.0716517 , 0.10363371, 0.04587821, 0.01845485, 0.09531802])), (1.0846048546102214, arra
y([ 1.36416745e-02, -3.76527152e-02, 5.28365265e-02, -6.71135070e-02,
-3.35727071e-02, -5.32886764e-03, -5.07830935e-02, -3.29187958e-02,
1.40798118e-02, -6.18918016e-02, -3.69930704e-02, -4.26516643e-03,
3.49837879e-02, 6.77725364e-02, -7.07306194e-02, -4.57095020e-02,
-4.05979345e-03, 1.08793143e-02, -1.85499966e-01, -5.90737535e-02,
-9.00830552e-02, 9.51727326e-03, -2.42416783e-03, -1.30326030e-01,
8.31577790e-02, 1.58379027e-01, -1.54707423e-01, -3.69055337e-02,
-1.34675257e-01, 2.16944049e-01, -1.67535471e-01, 1.42152860e-04,
1.40949060e-01, 1.29576748e-01, -5.52087481e-02, -4.38598682e-02,
-2.11872636e-02, -1.64992513e-01, -5.00899381e-02, 1.04220238e-01,
-1.85693009e-02, 1.69091507e-01, -6.76802710e-03, 2.43984780e-01,
-1.34055526e-01, -9.39195572e-02, 5.67596962e-02, 7.37155245e-02,
-2.13678509e-01, -2.10050985e-02, 4.96224151e-02, -4.66060006e-02,
1.94950335e-02, 6.18430541e-02, -1.17830163e-01, 9.37997796e-02,
-1.38472573e-01, 2.39545267e-02, 1.11274762e-02, 2.76187965e-02,
-6.56220134e-02, 5.16874758e-02, -2.63416474e-02, -1.08027231e-01,
-4.05131629e-02, -3.64657569e-01, -1.49822854e-03, -2.17486451e-02,
9.54450737e-02, -1.93217907e-03, -4.04227011e-02, 9.06092703e-02,
1.76503982e-01, 2.64354242e-01, -4.30849875e-03, -1.45214699e-01,
2.28849015e-01, -1.66326592e-01, 1.17558268e-01, 1.23194184e-01,
8.76691304e-03, 1.37073531e-01, 9.53029811e-02, -3.55104498e-02,
-7.08018442e-02, -3.27677377e-02, -4.75275678e-02, -5.51551190e-02,
-3.94044368e-02, 5.05800808e-02])), (1.0827345320601791, array([ 0.02608929, 0.01371157, -0
.01316454, 0.07103393, 0.06334471,
-0.01052515, 0.05838226, 0.01794481, 0.01605202, -0.08722451,
-0.00212057, 0.01563993, -0.040189 , 0.06106663, 0.04471129,
0.03679496, -0.03609148, 0.05505417, -0.01942624, -0.074719 ,
-0.07492881, 0.02029782, 0.01929931, -0.07387488, 0.10632031,
-0.12726389, -0.06496204, -0.16477549, 0.12185701, 0.03147709,
0.0849513 , 0.27178893, -0.19333652, -0.1181441 , 0.08843202,
-0.09415497, 0.04062823, -0.18361823, -0.01325183, 0.13323152,
0.04412477, -0.00660519, 0.01855727, -0.07165366, 0.03654398,
-0.01583086, -0.00077891, 0.09646896, -0.17793183, 0.19398858,
-0.06899618, 0.14358455, -0.08394391, 0.00725759, 0.09285048,
0.04445943, -0.15460279, 0.13528103, -0.03727375, -0.00432682,
-0.00749074, 0.00194684, -0.09744172, 0.22268596, -0.01705897,
-0.2809283 , -0.02343944, -0.01155507, 0.04431471, 0.02610905,
-0.09651504, -0.00723791, 0.11274514, -0.06014666, 0.07767799,
0.31518473, -0.08841039, -0.00263312, 0.09328507, 0.02644484,
-0.05111885, -0.13368294, -0.13157751, 0.13679515, 0.05311068,
0.01467942, 0.2654361 , -0.05453342, 0.05882195, -0.26311322])), (1.0711889349293784, arra
y([ 7.35032739e-03, -6.05294219e-03, -3.18961523e-02, 6.92206083e-02,
2.11146054e-02, -1.80449277e-02, 1.73558765e-02, -4.30268813e-03,
8.40429472e-03, -5.06717369e-02, -3.20152398e-02, 8.37848800e-02,
-6.93265601e-02, 6.89973963e-02, -1.43561936e-01, -7.60267861e-05,
1.08240437e-01, 9.97229941e-02, 6.25795502e-02, -2.68711516e-02,
-8.95563476e-02, 3.11804891e-02, 2.92291847e-03, 6.99087376e-02,
1.72306788e-01, -1.70191104e-01, 4.92745375e-02, -1.16659803e-01,
8.01172970e-02, 2.19604626e-02, 4.57797290e-03, -3.18535647e-01,
2.83039200e-01, -4.42865328e-02, -2.14172005e-01, 8.65819959e-02,
4.59897609e-02, 1.67839804e-01, 4.74444353e-02, -2.04096693e-03,
4.45972907e-02, -1.84784859e-02, 3.67002221e-02, -3.89093533e-02,
-5.44284701e-02, -7.35226162e-02, -1.79040400e-02, 1.03478098e-01,
-2.36485139e-01, 1.56970299e-01, 8.61790027e-04, -7.76936897e-02,
-9.22260724e-03, -2.88677542e-02, 8.94937864e-02, 7.70136302e-02,
-2.26364718e-01, 1.80732695e-01, -7.33757398e-02, 3.04025298e-03,
-1.29459403e-02, -9.29285890e-03, 1.67078601e-01, -3.92809793e-01,
3.15256265e-03, 1.38861430e-01, -8.32651408e-02, -4.50246873e-02,
7.95186158e-02, -6.99649803e-03, 4.56267492e-02, 1.02014106e-01,
-5.46238406e-02, 8.18037302e-02, 8.04374447e-02, 1.31697986e-01,
-1.52482944e-01, -2.38383139e-02, 1.89103034e-03, -1.06697907e-01,
-8.98353059e-02, -1.41209456e-01, 2.83590603e-02, -6.28238278e-03,
-6.59767662e-03, -4.96758182e-02, 6.35906472e-02, 6.19351502e-02,
6.61072339e-02, -6.05161345e-02])), (1.0636889264741323, array([-1.78741662e-02, -3.35305864
e-03, -4.63080385e-02, 4.79209043e-02,
-4.41204495e-02, 2.51674495e-02, 8.04611723e-03, -6.59300723e-03,
-2.84730843e-02, 2.41629079e-02, 3.87817790e-02, 6.71582876e-02,
-5.03348601e-02, -1.62125338e-01, 1.74122016e-01, -5.42585379e-02,
-8.18383340e-02, -3.52185647e-02, -3.77420648e-02, -5.34238699e-02,
1.17142345e-01, 1.94547224e-02, -7.08832856e-02, -1.27047112e-01,
3.39755313e-01, -1.35466949e-01, 1.04595107e-02, -1.30035681e-02,
7.39993858e-02, -5.81725018e-02, -1.44061875e-01, -1.99191906e-01,
1.79286967e-01, -1.21190591e-01, 8.01792571e-02, -7.64494994e-02,
1.66262543e-02, -2.39493290e-01, 3.83750787e-02, 1.83870406e-01,
-7.47679673e-02, -8.07918404e-02, 6.24524946e-02, -3.99129133e-02,
-1.42741438e-01, -7.21310505e-03, 4.96211043e-03, 8.48621898e-02,
2.15515680e-01, -8.26625518e-02, 2.15747714e-02, -4.48358208e-02,
-2.04433750e-04, -1.59766103e-02, 3.70072966e-02, -6.60417365e-02,
1.58815573e-01, -3.26744578e-02, -5.40505568e-02, 2.52737200e-02,
7.66776346e-02, 2.97530563e-02, -2.57170177e-01, -1.02254792e-01,
3.18499749e-02, 6.61156058e-02, 4.39845954e-03, -4.25431053e-02,
-1.01418002e-01, -5.75079706e-02, 1.35511675e-01, -6.17755409e-02,
3.27842198e-02, -1.01334015e-02, 1.06259242e-01, 3.39699852e-02,
1.81752213e-02, -2.13502123e-01, 1.44220602e-01, -1.17824890e-01,
-2.69799529e-02, 3.04793508e-02, -4.85665970e-02, 5.10212773e-02,
1.21739561e-03, 2.39552260e-01, 3.37575025e-02, -2.97237033e-01,
7.24032687e-02, 6.35448512e-02])), (1.054552899346843, array([ 0.00189664, -0.02418795, 0.
01666143, -0.02559334, -0.04723553,
0.08762733, -0.01844839, -0.02246402, -0.00971444, -0.00164947,
0.06009845, 0.05435715, -0.11430688, 0.05245795, -0.07366107,
-0.09214164, 0.03386417, 0.17952946, -0.00685296, -0.06151238,
-0.01797202, 0.02444293, 0.01174569, -0.09673668, -0.0569378 ,
0.06625694, -0.09712417, 0.08872945, 0.04657131, -0.03924046,
-0.08835877, 0.01618904, -0.02475417, 0.00304524, -0.16224386,
-0.05104083, 0.13126887, -0.09246795, 0.0206345 , 0.02608835,
0.03934495, 0.01252745, 0.08147797, -0.24957468, -0.0223957 ,
-0.07817746, -0.02509523, 0.3154523 , -0.13966946, 0.06831692,
-0.02458838, 0.00925475, -0.05591175, -0.02468803, 0.13381471,
0.02909932, -0.00545406, -0.03017539, 0.02461014, 0.02148193,
-0.09844483, 0.14146013, -0.10725505, -0.04139617, -0.03799853,
0.10957219, -0.02784257, -0.01238004, -0.05261667, 0.07543871,
-0.08444205, -0.0357948 , -0.23817986, -0.24027782, 0.01643644,
-0.32423113, 0.18525905, 0.38181714, 0.10277854, 0.21109554,
-0.12364024, 0.06989283, -0.04347523, 0.05079479, 0.03984204,
0.09982685, -0.03860539, -0.12303043, -0.03791448, -0.01685344])), (1.0495564454254236, arra
y([ 0.01064236, 0.01687834, 0.00774352, 0.00608319, -0.0663222 ,
0.02874585, -0.01472903, 0.01797059, -0.01661496, 0.00404952,
-0.01793021, 0.02676529, -0.03435304, 0.06351985, -0.08183986,
-0.05729709, 0.11622493, 0.1527865 , 0.00436274, -0.09122086,
0.05417958, -0.04566393, -0.01298813, -0.05333558, 0.05338584,
0.00566828, 0.08695468, -0.00570582, 0.01474224, -0.05766692,
-0.1724336 , 0.14035376, -0.03804171, -0.15738222, -0.08266736,
0.04978232, 0.05402185, 0.01735426, 0.1069416 , 0.0049564 ,
0.01182338, 0.08756753, 0.00285441, -0.00483389, -0.06538106,
-0.19215988, 0.02330812, 0.16395316, -0.11995921, -0.30957911,
0.08650203, -0.02370349, 0.04727246, -0.01419063, -0.05444469,
0.00958458, 0.10879339, -0.09002233, 0.0206424 , 0.0085553 ,
-0.01636564, 0.03688603, -0.02780243, 0.21259285, 0.00504157,
0.10054532, 0.06241784, 0.10927469, 0.06292144, 0.0116295 ,
-0.23382899, 0.05088776, 0.06485522, 0.10607442, -0.03182513,
0.03136658, -0.51409216, -0.09419313, 0.05395911, -0.09983506,
-0.09806359, 0.12642438, 0.1162335 , 0.16582754, 0.05573822,
-0.12724108, -0.25410565, -0.01573826, 0.01954463, 0.0113436 ])), (1.048150719936755, array
([ 6.32926518e-03, 1.93752966e-02, -6.53840904e-04, 1.27988606e-02,
-2.23016780e-02, 2.23062907e-02, -1.25351527e-02, 1.94998288e-02,
-1.66564719e-02, 1.12505578e-01, -2.63086807e-02, 1.07711355e-02,
-2.42326174e-02, 4.17755806e-02, -2.74605434e-02, 1.52271476e-03,
3.39426545e-02, -5.25063210e-02, -2.08645711e-01, -9.29383028e-02,
1.55201829e-01, -1.05959186e-02, 1.46360546e-02, 7.17564931e-02,
-2.21967333e-03, 3.61857593e-03, -9.09097678e-02, 8.68340523e-02,
5.21987840e-02, -1.96675177e-02, -1.77688729e-01, -6.89211306e-02,
-8.28044588e-03, 2.71875449e-01, -5.61288151e-02, 9.82268066e-02,
1.41919011e-02, -9.46419057e-02, -2.21997555e-02, -6.73477724e-02,
2.91777474e-04, 3.23841883e-02, -7.05894469e-03, -2.66193621e-02,
3.08786946e-02, -1.17226362e-01, -3.41913073e-02, -6.49851301e-02,
-3.64756074e-02, 1.53302816e-01, -2.73293681e-02, -9.04331988e-02,
3.10836892e-02, -5.49365525e-03, -1.92861272e-02, -7.27238605e-02,
1.20203541e-01, -5.49052633e-02, -8.10101525e-03, 2.82562220e-02,
2.27497961e-02, -4.56225291e-02, -4.50583496e-03, -8.53459705e-02,
-5.91057225e-03, -1.42297715e-01, 1.29745173e-02, 9.10621502e-02,
-1.10595308e-01, 1.60057000e-01, -6.90532574e-02, -2.61851164e-02,
-3.13303654e-01, -1.87659790e-01, 8.05732976e-02, 4.47115037e-01,
-1.32055824e-01, -1.90136154e-01, -8.59790525e-03, 3.21439334e-01,
1.76632464e-02, 1.51021356e-01, -9.00769966e-02, -1.30557213e-01,
6.19954135e-02, 4.48562221e-02, -8.80594436e-02, 1.19985457e-01,
7.29911054e-03, 5.90922916e-02])), (1.0416363339168457, array([ 1.46623820e-02, 1.54303991
e-02, 1.79455242e-02, -4.32272660e-03,
2.35530131e-02, -9.42976306e-03, 2.03551780e-02, 1.73751586e-02,
-2.37510925e-04, -4.17669585e-02, -2.87456320e-02, 1.30099513e-02,
6.80922864e-03, -8.69418294e-04, 5.81911122e-02, 6.02727274e-03,
-3.76176668e-02, -2.61469369e-02, -1.77654206e-02, -1.02570906e-01,
-1.40732267e-01, -2.30153512e-02, -7.99214816e-03, -8.46987746e-02,
6.93018879e-02, -6.99778433e-02, 4.52420607e-02, 3.34853621e-02,
2.43491611e-02, -1.19113095e-01, -7.75098025e-03, -1.21760579e-01,
1.14685497e-01, 2.25428210e-01, -1.10637738e-01, 3.40121251e-02,
1.57074247e-03, -8.49852459e-03, -3.95465819e-02, -5.60547606e-02,
5.59753541e-03, -1.02641664e-02, -1.56158744e-03, 8.08025724e-03,
3.88694509e-02, 5.18183511e-02, 2.87509168e-02, 4.34750839e-02,
1.31356215e-01, -1.76073092e-01, 6.86705263e-02, 4.02618874e-01,
-2.79742194e-02, -8.14257034e-02, 5.59803184e-02, 4.68295630e-02,
-7.85045448e-02, 1.19823169e-02, 1.26926437e-02, -2.64786421e-02,
9.56324796e-02, -1.10901569e-01, -1.39180410e-02, -6.63077191e-02,
2.49854692e-02, -2.51737936e-02, -2.19902848e-01, -6.11794604e-02,
5.36039258e-02, 6.68968645e-02, -2.45041001e-01, -8.18407346e-02,
3.21649268e-01, -1.64446235e-01, -1.29230020e-01, -8.38560239e-02,
-2.05545607e-01, 5.81414621e-02, -5.98264940e-02, 2.33343621e-01,
2.48402369e-01, -3.12717029e-02, 9.50193588e-02, -2.30041984e-01,
7.45476523e-02, 1.64200297e-01, -1.60027315e-02, -3.97527715e-02,
-3.80590486e-04, -1.18257727e-01])), (1.0381369601703594, array([-0.01452019, -0.01310388, -0
.00336089, -0.02087642, -0.02197353,
0.05608255, 0.01455196, -0.01515219, 0.00758196, 0.02941782,
0.06903329, 0.01999798, -0.07352511, -0.00069396, -0.0552507 ,
0.0230458 , 0.07026268, 0.06683153, 0.01384287, -0.12941273,
0.07686492, 0.00546715, 0.0082493 , -0.00252667, 0.01492917,
-0.00691446, -0.04071411, -0.01726711, -0.01130595, 0.22201318,
-0.01064372, -0.05323656, -0.16254632, 0.01430831, -0.04516928,
0.07372426, -0.0349304 , 0.102635 , 0.01170177, 0.0255023 ,
0.04308332, 0.00141148, 0.02590048, 0.21401737, -0.1421845 ,
0.12882908, -0.00345029, 0.02566968, 0.01811789, 0.0097195 ,
-0.02130792, 0.14354843, -0.0388769 , -0.03878133, 0.08394915,
-0.05331129, 0.11921897, -0.14265847, 0.07302256, 0.0005525 ,
-0.05354975, 0.0696055 , 0.02237221, -0.09216676, -0.01143389,
0.01236562, 0.02386611, -0.04510227, -0.08527455, -0.04482601,
0.10257254, -0.30206641, -0.05343934, 0.31309607, 0.32663363,
-0.14205531, -0.21014011, 0.1071922 , 0.02449223, 0.06901395,
0.31163194, 0.06347318, -0.34719675, 0.06869492, 0.02253956,
-0.01624024, 0.08994234, 0.10084055, -0.0057217 , -0.14646712])), (1.0334519547258465, arra
y([ 5.77656009e-04, -6.14921415e-03, 1.52996270e-02, -2.56453126e-02,
1.72353854e-02, 3.11267022e-02, -3.45374061e-02, -5.69263252e-03,
-6.27046938e-03, 1.14765169e-02, 8.08286924e-03, 1.94396853e-02,
2.26049114e-03, -9.07174977e-03, -1.92518361e-01, -4.57572252e-02,
1.19314884e-01, 2.15268187e-01, 2.73962655e-03, 1.88035665e-01,
-1.82317506e-02, 1.35226005e-02, 3.52944128e-02, 3.01395938e-02,
-1.35578444e-01, 2.14035930e-02, 2.08187780e-02, -8.50202768e-03,
-4.90208524e-02, 7.78140518e-02, 1.57332492e-01, -4.31036636e-02,
4.91241884e-02, -6.15175316e-02, -1.73908341e-01, -2.43889746e-01,
-2.10275312e-04, 1.67260936e-01, 8.27515080e-03, -3.85271443e-02,
1.40574031e-01, -6.91170120e-02, -3.03533770e-02, 4.36044513e-02,
6.53480795e-03, 3.48562972e-01, 3.81207646e-02, 2.12294650e-01,
1.37878922e-01, -2.19623701e-01, 4.09898494e-02, -1.31708774e-02,
-1.78900973e-02, 2.42607087e-03, 3.14801899e-02, 3.44814079e-02,
2.59723498e-03, -2.00335054e-02, 3.68773565e-03, 1.75328683e-02,
-6.44631385e-02, 6.46585338e-02, 5.75447016e-02, -1.56181005e-02,
-1.47293524e-02, -1.61766587e-01, 3.15482798e-03, -2.36391663e-01,
-7.59884246e-02, -1.98443256e-01, -4.50284225e-02, -1.03828833e-01,
3.53474695e-02, -1.75930249e-01, -2.24283935e-02, 3.15305407e-01,
1.11137397e-01, -1.87891103e-01, 1.00704819e-01, 6.24550923e-02,
-3.41999807e-02, 1.13475557e-01, -2.60730071e-02, 1.43323613e-01,
-1.56250938e-03, 1.02246588e-01, -1.27730473e-01, -3.73056009e-02,
-2.51357486e-02, 2.17889221e-02])), (1.027681645874375, array([-0.01310177, 0.01012866, -0.
03677743, 0.04007282, 0.02208434,
-0.0266942 , -0.05120676, 0.00704602, -0.03124247, 0.12697724,
-0.06647692, -0.00186713, -0.01114223, 0.08695685, -0.03312939,
-0.03077175, 0.0148754 , 0.02452668, 0.01383393, 0.00964574,
0.28695189, -0.07626678, -0.06120312, 0.2266083 , 0.12077144,
0.00223681, 0.03746042, -0.23434108, 0.07633301, 0.07424533,
-0.00093397, 0.04150766, -0.15909224, 0.23787766, -0.03534287,
0.0836789 , 0.00457522, 0.07283674, -0.06969312, -0.02903364,
-0.01943321, -0.09437936, -0.01090542, 0.01431064, 0.00078577,
-0.06842964, -0.00583048, 0.09406928, -0.11215129, -0.05293754,
0.04177536, 0.10423728, 0.01473155, -0.02739377, -0.01496355,
-0.05298254, 0.00411919, 0.11092835, -0.08741769, 0.01822837,
0.01908275, -0.02756407, 0.0354024 , -0.02845482, 0.00891159,
0.1017309 , 0.1356951 , -0.05839743, -0.1241226 , 0.19573594,
-0.0861197 , -0.07853395, 0.15779429, 0.01583231, -0.08672095,
0.0854994 , 0.26404478, 0.0554805 , 0.0046045 , -0.09116275,
0.14921383, -0.13585461, -0.09655757, 0.02634402, -0.0284895 ,
-0.23497157, -0.21359727, -0.42175165, 0.02789145, 0.04223427])), (1.0238189302225627, arra
y([-3.34466103e-03, 6.38920502e-03, -3.61879641e-02, 5.70146856e-02,
2.37921035e-02, 1.36315975e-02, -3.13055516e-02, 5.38838099e-03,
-2.36030724e-02, 2.13974787e-02, 3.25216864e-02, 2.21149601e-02,
-5.06359500e-02, -1.44759871e-04, -6.38409007e-03, 3.93762100e-02,
2.92729685e-02, 4.69509273e-03, -1.87429134e-01, -9.71013168e-02,
3.43571003e-02, 5.75364424e-02, -3.26306519e-02, 1.28664041e-01,
7.77140198e-02, -4.93662232e-02, 6.92259270e-03, -4.87507395e-02,
9.57992859e-02, -3.19971853e-02, -5.98961019e-03, -6.69245285e-02,
-1.58410107e-01, 1.68967514e-01, 9.27890488e-02, 4.63448002e-02,
-1.19614283e-02, 9.31119485e-02, -2.89798864e-02, -1.81792535e-02,
-2.19585018e-02, -7.84419532e-02, -2.71152907e-02, 6.45713023e-02,
-4.00682871e-02, 1.38066375e-01, 6.14912115e-03, -7.93693725e-02,
1.36834518e-01, 4.22202882e-03, 1.75654360e-02, -2.64308583e-01,
1.18320984e-02, 1.30152840e-02, 4.13198580e-02, 2.01704456e-02,
-5.94070552e-02, 6.11268842e-02, -2.01341437e-02, -3.41170818e-02,
3.44108864e-02, -1.73302164e-03, 2.84298457e-02, 3.41668876e-03,
3.49527597e-02, -1.89583555e-01, 6.14688462e-02, 9.06845883e-03,
4.80229754e-02, -1.62294049e-01, 2.15218250e-01, 3.15533805e-02,
3.14501896e-01, -1.74348423e-01, 2.61057759e-01, -1.32057252e-01,
-2.01552514e-01, 2.76094879e-01, -3.33620929e-03, 1.74634167e-01,
-3.68153250e-01, 9.10455881e-02, 1.19570172e-01, -1.59658147e-02,
-1.08267500e-02, -1.04737207e-01, -1.22950130e-01, -1.03392515e-01,
5.15375610e-02, -3.83322394e-02])), (1.015216974820168, array([-0.01618276, -0.02360014, -0.
02926011, 0.02127483, 0.00887336,
-0.01031012, -0.03890124, -0.02537713, -0.00709545, 0.0183462 ,
0.02564321, 0.03171079, -0.02552684, -0.05290231, -0.05858041,
0.01397781, 0.04483173, 0.10630102, -0.09710735, 0.21751889,
-0.16086921, 0.03953656, 0.01445865, 0.05711042, -0.05360371,
0.04744034, -0.10515641, 0.06592877, -0.06065161, 0.19837085,
-0.08167092, 0.001723 , -0.05500338, 0.0178873 , -0.02935533,
0.23903312, 0.05112242, -0.20404582, -0.00608127, 0.05924824,
-0.03380718, -0.13369095, 0.09619283, -0.16325832, -0.08844357,
0.05882671, 0.00166718, -0.12655043, 0.08310361, 0.10773846,
-0.0047094 , -0.0187615 , -0.0257083 , 0.03410192, -0.00164383,
0.06775397, -0.13520662, 0.03043049, 0.01271346, -0.02116568,
-0.00908414, 0.07359619, -0.09341584, 0.15139447, 0.01570467,
0.03237653, -0.36783292, -0.1593625 , -0.00612681, -0.02159892,
0.23910808, -0.0372705 , -0.09220093, 0.02165323, -0.11853159,
0.10044386, -0.19909785, 0.07842302, 0.10094837, -0.02818856,
0.28365203, -0.00366906, 0.21658786, 0.07471534, -0.0130536 ,
-0.20419061, -0.08081454, -0.1592173 , 0.02630263, 0.15241717])), (1.0136313723660826, arra
y([ 0.0047057 , 0.00537221, -0.00935277, 0.02499465, 0.01681789,
-0.00869954, -0.01181598, 0.00597274, 0.00296281, 0.05173311,
-0.02052558, 0.02072112, 0.01177928, -0.05292222, -0.06871935,
0.04976477, 0.04398419, 0.05734055, 0.28848477, -0.02145317,
-0.02022251, 0.03238485, 0.0190972 , 0.06581662, 0.02599537,
-0.04334498, -0.02015788, -0.01891913, -0.00523579, -0.00176664,
0.00598669, 0.01938069, 0.05690915, 0.25457602, 0.00047613,
-0.43171399, -0.04496656, -0.15158992, -0.0630216 , 0.01819853,
0.08063392, -0.05862379, 0.01054687, 0.15530031, -0.0435065 ,
-0.11169971, 0.00510733, -0.11902368, -0.03422134, -0.00074141,
0.02469103, -0.08633909, -0.00487503, 0.00313326, 0.03130132,
-0.00756799, -0.09661252, 0.04985106, 0.01110038, -0.0254607 ,
0.0017376 , 0.01130168, -0.05056147, 0.16722782, 0.00277896,
0.11068128, -0.00414049, -0.33462666, 0.0250716 , 0.44450508,
0.13689589, -0.1373213 , -0.01468817, -0.07279611, 0.0601332 ,
-0.1086719 , -0.10858592, -0.05833797, 0.07717707, -0.12419186,
-0.09329828, 0.13223833, 0.03631919, -0.06444136, 0.00906957,
-0.00439056, -0.04548441, 0.17891649, 0.03655766, -0.03796676])), (1.0118982712943438, arra
y([ 0.00694201, -0.0018722 , -0.00281566, 0.01770142, -0.0496463 ,
0.02999209, -0.00662784, -0.00043536, -0.00665007, -0.04269924,
0.01364174, 0.0757743 , -0.08204845, -0.03454817, 0.116922 ,
-0.07129044, -0.08191329, -0.15735237, 0.02548282, 0.28429731,
-0.17313292, 0.00367994, 0.01073114, -0.12907741, 0.05887494,
-0.09817751, 0.04700288, -0.04577495, 0.10941022, -0.01155122,
-0.01163151, -0.03089187, -0.18285806, -0.0729333 , 0.15773669,
0.25080006, -0.00395445, 0.17083143, -0.00613639, -0.07731857,
-0.16149356, 0.00252433, 0.04088276, -0.09682253, -0.06168423,
0.02003816, -0.00543857, 0.15537622, -0.00710586, -0.02816829,
-0.00369263, 0.15853635, -0.02918925, 0.0329961 , -0.05282505,
0.0276843 , 0.03692497, -0.0533023 , -0.00063083, 0.01558268,
0.02200682, -0.03344913, 0.06746109, -0.0555776 , 0.01370926,
-0.11782188, -0.07465837, -0.33355448, -0.07237854, 0.27166404,
-0.16659519, -0.0669088 , -0.07566776, 0.051166 , 0.26762813,
-0.06626345, 0.10173895, -0.15699645, -0.11477982, 0.02515217,
-0.21230598, 0.16731697, 0.12222344, 0.04058821, 0.02654391,
-0.14220437, -0.00807149, 0.07995315, 0.02476597, 0.07745563])), (1.0092805032957575, arra
y([-1.67124034e-04, -8.52995967e-03, 2.43243338e-03, -4.55422356e-03,
-4.47697903e-02, 4.11149142e-02, 4.35385290e-03, -8.08236351e-03,
-2.23270588e-02, 4.54276255e-02, 2.24447699e-02, 3.72905391e-02,
-4.58866899e-02, -4.43140590e-02, -3.17861262e-03, 1.63223223e-02,
-1.44613585e-02, -1.95174729e-03, 4.26371234e-02, 5.62827516e-02,
9.19523422e-02, 3.32038518e-03, -2.69015123e-02, 6.20466908e-02,
1.08340927e-01, -8.06021310e-03, -8.97465505e-03, 1.76064776e-02,
3.46254356e-02, -1.46048653e-01, 5.39475564e-02, -3.71279001e-02,
-1.13952238e-02, 2.91946391e-02, -4.12869437e-02, -2.42218191e-01,
-2.60603425e-02, 2.30730290e-01, -1.20611776e-02, -1.97548654e-02,
8.48527712e-02, 4.62356706e-02, 3.98832642e-02, 3.56493600e-02,
-1.63797314e-01, 1.28019247e-01, 5.54600893e-07, -2.04307386e-02,
5.26514477e-02, 3.96836522e-02, -3.91024288e-02, 4.98853407e-02,
-4.92069634e-03, 1.52943052e-02, -3.13390345e-02, -2.35876291e-02,
-4.92749417e-02, 2.72170140e-03, -8.76600831e-03, 1.43572974e-02,
-6.00816508e-03, 1.31966835e-02, 2.88326136e-02, 5.78179715e-02,
8.31240235e-03, -1.10802895e-01, -4.18330250e-01, 5.32012386e-01,
8.97901621e-02, 1.36166215e-01, -2.68026171e-02, -2.12739925e-01,
-1.58496722e-01, 7.91803374e-02, -5.45815088e-02, -5.92837221e-02,
6.01606163e-02, -1.03465752e-01, -9.80008579e-03, -4.96013262e-03,
-9.50121718e-02, 1.54394759e-01, 1.14727370e-01, 4.03985262e-02,
1.69140520e-02, -1.89851376e-01, 1.36970462e-01, -1.95413367e-01,
2.32617269e-02, -6.62692406e-02])), (1.0062251615105764, array([-0.00527282, -0.01688188, -0
.00468497, -0.00153288, 0.01518627,
-0.02070147, 0.01180818, -0.01694398, 0.00687768, -0.02536273,
0.02915883, -0.0167478 , 0.01175675, 0.00314778, -0.07007524,
0.01965608, 0.05609278, 0.0854933 , -0.30186512, 0.197778 ,
-0.0217644 , 0.05291083, -0.04169708, -0.32132285, 0.11409975,
0.04440943, -0.14455571, 0.03647878, -0.01699231, 0.07447113,
-0.00450323, -0.00360366, -0.01549301, -0.02353203, -0.0314328 ,
0.0648022 , 0.01905337, 0.16487256, 0.01712573, 0.02184085,
0.014108 , -0.04253202, 0.02446466, -0.06003886, 0.00268156,
0.05985967, -0.01170449, -0.06945482, -0.00672605, 0.007663 ,
-0.02848922, 0.05508542, -0.02811783, -0.05140486, 0.1162407 ,
-0.01263096, 0.0140447 , -0.0037905 , 0.010043 , -0.03004912,
0.02598565, 0.01165745, -0.00817854, -0.09190378, 0.024504 ,
0.06228976, 0.29323598, 0.11533381, -0.11551801, 0.29810874,
0.20107726, -0.17000041, 0.20522724, -0.05589527, -0.43514698,
-0.00324024, -0.10267147, -0.01583151, 0.04671075, -0.05257544,
-0.15347668, 0.09413035, -0.12768923, -0.01715487, -0.0009201 ,
-0.06927188, 0.16380177, 0.10717406, 0.00399982, 0.02845797])), (1.0055267813795965, arra
y([ 0.00554142, 0.00457078, -0.01396835, 0.03459154, -0.02500141,
-0.01279514, 0.03819387, 0.00537668, 0.01086488, -0.03214228,
-0.01963652, 0.04784108, -0.02883686, 0.03075782, -0.09864469,
0.02593862, 0.04063425, 0.12849444, 0.00061827, -0.37222894,
0.23468193, 0.0169766 , -0.01899214, -0.15736419, 0.03104597,
-0.08805389, 0.06859634, 0.02762659, 0.06709668, -0.02284565,
-0.05722747, -0.08353022, -0.03362415, 0.0023764 , 0.06809688,
0.01539331, -0.00184929, -0.00796171, -0.01225957, 0.00114309,
-0.04512352, 0.06240917, 0.00397843, 0.02635132, -0.07712403,
0.40446577, -0.00588481, -0.0807644 , 0.13050743, 0.02701916,
-0.04189661, -0.09141902, 0.01572212, 0.06521795, -0.10411642,
-0.02482952, 0.0830768 , -0.0319343 , -0.01365172, 0.00232728,
0.0491055 , -0.02366732, -0.04097945, -0.05273117, 0.02251404,
-0.08276872, -0.03561784, -0.23494425, -0.04202028, 0.13276423,
-0.22751433, 0.326788 , -0.11771476, 0.02225007, -0.22325178,
-0.01935321, 0.04904005, 0.13896845, 0.08632332, -0.0863012 ,
0.07668643, 0.03075034, 0.09717638, 0.07397294, 0.04806132,
-0.22366605, 0.19813719, 0.08417379, 0.04081818, -0.01479638])), (1.0041959740870794, arra
y([ 0.00822194, -0.02209017, 0.00179947, 0.01200197, -0.02659481,
-0.00764602, -0.01011892, -0.01927093, 0.00778091, -0.05763482,
-0.00875465, 0.04845326, -0.01277327, -0.02985779, -0.04904464,
0.04505036, 0.03809419, 0.02666303, -0.06238701, 0.1059027 ,
-0.24762375, 0.04243724, 0.01488936, -0.1477348 , -0.05574035,
-0.02966242, -0.02771328, 0.14853575, -0.01779685, 0.04024805,
-0.04920764, -0.03116516, -0.03367022, 0.14545037, 0.04058069,
-0.29717955, -0.00541922, 0.34204901, -0.03090687, -0.01240487,
0.05571747, 0.01055122, 0.03993054, 0.01027291, -0.02747238,
-0.24191898, -0.01100649, -0.16332822, 0.04035829, 0.12041442,
-0.03116228, 0.07143561, -0.01542356, 0.00947767, -0.01068069,
0.04649149, 0.06563955, -0.02545183, 0.00819118, -0.02578114,
0.02388872, -0.00179482, -0.00264004, 0.01443597, 0.02018351,
-0.00036709, 0.17338862, 0.02049706, -0.2299085 , -0.05434517,
-0.12314334, 0.36342366, -0.02171211, -0.07421302, 0.22051444,
0.02311083, -0.09116037, 0.04618421, 0.03368351, -0.12647943,
0.2145283 , 0.09262533, 0.02206381, 0.01459231, 0.00105971,
-0.01764596, 0.11883972, -0.33359707, 0.0272489 , 0.03302657])), (1.0019590929564808, arra
y([ 1.23686212e-02, -5.74858375e-03, 1.28667027e-02, 3.18889545e-04,
-1.11497815e-02, -7.92571051e-03, -1.29181317e-02, -3.05287058e-03,
-6.21565856e-03, 3.37501736e-02, -5.36172042e-02, 4.12248652e-02,
-2.63260342e-02, 2.97645063e-03, 6.91072991e-02, 8.48349472e-03,
-3.65125316e-02, -9.07608200e-02, -5.00614804e-02, 3.02119105e-01,
5.47601346e-01, 1.22451603e-03, -2.54262020e-02, -2.63624794e-01,
1.47799419e-01, -8.11331299e-03, -8.87887105e-02, -3.65799921e-02,
-2.21336504e-02, 2.27787137e-02, 5.50243628e-02, 1.89375610e-02,
1.02874548e-02, 8.69723865e-02, -5.44377833e-02, -9.23134139e-02,
-6.87575836e-03, 8.50482063e-02, -2.91144402e-02, -5.19158070e-04,
2.72972991e-02, 3.06226585e-02, 1.67418376e-02, -2.18970537e-02,
-4.24693396e-02, -2.24557075e-01, 1.05633996e-02, -1.17081779e-02,
-8.54985571e-02, -5.48092323e-02, 2.88630646e-02, 5.09284942e-02,
-8.70277980e-03, -4.38590457e-02, 6.95909530e-02, -1.31847092e-02,
-1.28531482e-02, 5.13641392e-02, -4.43967172e-02, 1.39497982e-02,
-2.62894441e-03, 4.11256283e-03, 1.50977741e-02, 1.02750976e-01,
5.59174440e-03, 2.15061961e-02, -3.03984631e-01, -1.64982643e-01,
-2.21324250e-02, -3.01645261e-01, 3.32622827e-02, 1.78737544e-01,
4.47817036e-02, -1.86339019e-02, -1.31331866e-02, 1.70534610e-02,
7.72245188e-03, 1.52724696e-01, -5.91228172e-02, -2.68464737e-02,
6.80235221e-03, 1.28991505e-01, -8.18517388e-02, -4.24656582e-02,
-1.03033094e-02, -2.74849429e-02, 5.25201610e-03, 2.37418675e-01,
9.48423952e-03, -5.23983741e-02])), (1.0005162465340969, array([ 3.04267717e-03, -1.29479763
e-02, -4.44458112e-03, 1.33631972e-02,
-9.33292211e-03, 1.54291205e-02, -4.73800601e-02, -1.16362669e-02,
3.04550598e-03, -3.17009338e-02, -3.74833233e-03, 3.23998573e-02,
5.86232357e-03, -7.50056174e-02, -1.70478255e-03, -9.50892730e-04,
8.94366331e-03, -2.46116099e-02, 4.01358936e-01, -1.37224653e-01,
-2.78080056e-02, 3.66459641e-02, 5.88338432e-02, -4.73700705e-01,
-7.82532804e-02, -3.20303058e-02, 6.81517973e-03, -4.04013209e-02,
3.08022972e-02, 1.42444360e-01, -4.70847004e-02, 3.52799485e-02,
-2.20051786e-01, 3.64519976e-01, -1.65562998e-01, 1.86948483e-01,
2.49965282e-02, 4.49718315e-04, -1.04893408e-01, -2.70135651e-02,
3.39406504e-02, -4.57270480e-02, -3.20001865e-02, -9.09485457e-02,
3.93327053e-02, 1.91371073e-01, -3.04620513e-03, -1.61414711e-02,
-1.02539124e-01, 1.02693088e-01, 4.55301940e-02, -5.51003928e-02,
3.75783992e-02, -1.67074257e-02, -2.24419941e-02, 4.82987324e-02,
5.09746900e-02, -2.54777753e-02, 6.10867595e-03, -7.68573197e-03,
-5.37906032e-02, 1.04949406e-01, -9.46918350e-03, 2.55699582e-02,
3.51676513e-03, 1.13913878e-01, -5.14912042e-02, 1.92816611e-01,
-2.01767844e-02, -1.29106051e-01, 5.41321727e-02, -8.14963631e-04,
9.72048527e-02, -2.91598197e-02, 6.02243306e-02, 4.79375887e-02,
3.04488822e-02, -2.10908861e-01, 1.83075881e-02, -1.12247174e-01,
-1.22270980e-01, -6.68043476e-02, 4.66913562e-02, -4.16616129e-02,
-3.00273118e-02, 9.55748449e-02, -7.54848923e-02, -2.59008996e-02,
2.91027290e-03, 4.74474999e-02])), (0.9936805751250474, array([-0.00764089, -0.00593278, 0
.02204877, -0.05256475, 0.00842695,
-0.01501379, 0.02522124, -0.00706479, 0.01803719, -0.02255416,
-0.02159876, -0.04619848, 0.0606624 , 0.06655174, -0.13064015,
0.04125599, 0.07308245, 0.14042212, -0.19026609, -0.08782066,
0.22890378, -0.02309647, 0.02434943, -0.18001576, -0.27168967,
0.16065591, 0.03658186, 0.08846365, -0.06826107, -0.06298364,
0.04706691, 0.03390692, -0.01736851, -0.09409377, 0.05434106,
0.12652597, -0.0080583 , 0.15550196, 0.03400027, 0.02964071,
-0.0185398 , 0.05566647, -0.07869831, 0.049827 , 0.15379865,
-0.15366605, -0.00384803, -0.12618657, 0.0222475 , 0.05088107,
-0.0265037 , -0.06968476, 0.04039596, -0.02097603, -0.0149297 ,
-0.06291175, 0.02400028, -0.0155054 , 0.03367925, -0.0381606 ,
0.03294177, -0.00982517, -0.04136651, -0.08259673, 0.01412005,
0.11410254, -0.23336836, -0.22037642, 0.33824349, 0.16398198,
-0.03704472, -0.17364007, 0.08558085, -0.01478625, 0.20608258,
0.05323019, 0.00586049, -0.08892334, 0.0996065 , -0.07159233,
-0.11680752, -0.08930236, 0.01112726, -0.05935329, -0.01780898,
0.14924434, 0.15771004, -0.2990459 , -0.06971069, -0.00463926])), (0.9932619924220245, arra
y([-4.80983034e-03, -8.33491749e-03, 1.43478570e-02, -3.39065901e-02,
8.83024216e-03, 7.12824218e-03, 4.75620170e-03, -8.78881996e-03,
1.12222071e-02, -5.61743165e-04, 6.82925646e-02, -5.90986511e-02,
1.59812288e-02, 2.54101713e-03, -1.04896277e-01, 3.51737986e-02,
6.67716613e-02, 1.29244810e-01, 2.98451160e-01, 7.20075288e-02,
1.32998042e-01, 8.42103921e-03, -5.60365919e-02, 2.59706507e-01,
1.64749395e-03, 1.48056990e-01, -8.53023881e-02, 4.35266088e-02,
-3.46030524e-02, -8.41578541e-02, 7.04725333e-02, 2.81031391e-02,
-6.20476632e-02, -1.09682870e-01, -3.49301258e-03, 2.10854352e-01,
5.52984391e-03, -2.87410233e-02, 3.55661568e-02, 4.25020090e-02,
9.28653091e-03, 9.60221857e-03, -1.80697323e-03, -4.10870665e-02,
2.65839708e-02, 1.96398351e-02, -1.76642116e-03, -5.26396067e-02,
-4.08825297e-05, -1.47565964e-02, -3.35712506e-02, 1.11012873e-01,
-2.84716492e-02, -8.03036754e-02, 1.45558330e-01, -9.02451607e-03,
1.73580715e-02, -5.84138883e-02, 7.40185841e-02, -4.79314092e-02,
-1.53805564e-02, 2.60951831e-02, 2.65729607e-02, -5.54298295e-02,
2.61356655e-03, -7.91097579e-02, -1.44960160e-01, 4.12691982e-02,
-2.39118825e-01, 2.14212567e-01, 1.71250062e-01, 4.78605901e-01,
1.71255685e-01, 6.99230903e-02, -9.17598680e-03, -9.36932301e-02,
-5.15256326e-02, -2.10854344e-01, 4.20936287e-02, 1.67529425e-01,
-8.86265705e-02, 1.31353716e-01, -9.90303711e-02, 2.10599781e-03,
-4.12338886e-02, 1.80605862e-01, 4.81498794e-02, -9.01894399e-02,
-3.02906660e-02, -9.57214123e-02])), (0.989390243013148, array([ 0.0027574 , 0.0061329 , -0.
02627846, 0.05095417, -0.00642473,
0.00568102, 0.02281071, 0.00631694, 0.00755118, -0.06691076,
0.06289873, 0.02585864, -0.03773079, -0.02254831, -0.16308598,
0.04149849, 0.08420201, 0.19496387, 0.39379716, 0.17782559,
0.11539291, 0.0197762 , -0.02419242, -0.05530074, 0.06499963,
-0.09218429, 0.01384134, 0.05434109, 0.07868315, 0.01245479,
-0.04835957, -0.06038021, -0.16823152, -0.02203941, 0.22528898,
0.04443168, 0.00703952, 0.03605971, -0.01016621, -0.03127999,
-0.11144527, -0.02134988, 0.04999845, -0.07390443, 0.00045527,
-0.29444954, -0.00354035, -0.11068436, 0.14758104, -0.02428335,
-0.01217597, -0.16673344, -0.03072165, 0.04867053, 0.02903683,
0.02513363, -0.04544139, 0.04575122, -0.04047531, 0.01634853,
0.04759149, -0.06510098, 0.02291705, -0.20123276, 0.00801083,
-0.26857335, 0.18454691, 0.05340024, 0.24677043, -0.10037036,
-0.09368322, -0.18139713, -0.02268909, -0.03296994, -0.1736654 ,
-0.070339 , 0.02794478, -0.01026421, 0.10866061, 0.16523264,
0.21315456, 0.01944449, 0.07190493, 0.07717728, 0.03061382,
0.07312778, -0.00948213, 0.03159338, 0.06226249, -0.01358497])), (0.9869860806882016, arra
y([-2.43715809e-02, -1.07128953e-02, -2.79427241e-02, 3.89261512e-03,
3.60937407e-02, 8.57047187e-03, -1.08440482e-02, -1.47852771e-02,
2.35617945e-02, -5.52860667e-02, 8.90308989e-02, -3.95984242e-02,
-9.60278839e-03, -1.10836591e-02, 6.15373627e-02, -8.70643248e-02,
-2.15993332e-02, -3.64849330e-02, -6.41388458e-02, -5.37994962e-02,
1.01884145e-01, 1.92638366e-02, -2.82527603e-02, -1.53857810e-01,
-1.07907271e-01, 6.36331411e-02, 9.89305735e-02, -1.33425549e-01,
2.25705919e-02, 1.88958506e-01, -1.01889068e-01, 7.94116198e-02,
-1.12291832e-01, -9.18573887e-02, 2.79692392e-01, -3.16952567e-01,
2.27300392e-03, 1.33228325e-01, 3.25689301e-02, -3.09505476e-02,
-5.68545611e-02, -2.02397441e-01, -6.35443881e-03, 6.98686485e-02,
8.01945161e-02, 9.92512023e-03, 1.27386606e-02, 2.51450191e-01,
-3.89108234e-02, -6.28715652e-02, 2.70404571e-02, -2.01667306e-02,
-3.59301358e-02, 1.05886343e-01, -1.00517967e-01, 5.56644863e-02,
-4.85317142e-02, 1.73674110e-03, 2.56392918e-02, -2.74969992e-02,
3.33287649e-02, -3.81662601e-02, 1.12145451e-02, -2.12696445e-01,
6.24958136e-03, 5.91309092e-04, -1.93620042e-01, 1.22308284e-01,
-8.50767120e-02, 2.07259697e-01, 2.06397007e-01, 1.49853779e-01,
-3.93533862e-02, -6.78712515e-03, -1.85078872e-02, 2.14694222e-03,
-1.44830363e-01, 8.07264254e-03, -3.51936040e-02, 1.82751221e-01,
9.46179819e-02, -3.26090474e-01, 6.50092847e-02, 3.40593128e-02,
4.38808117e-05, 1.04720971e-01, -1.48980575e-01, 8.79513418e-02,
4.67517524e-03, 1.49568000e-01])), (0.9828875619572957, array([ 0.01595755, -0.01989033, 0
.04337725, -0.04633459, 0.02648134,
-0.04697169, -0.0191114 , -0.015711 , 0.01704849, -0.01540329,
-0.06637341, 0.02710851, 0.04839863, -0.10418241, 0.16225215,
-0.01776017, -0.09851717, -0.18995589, 0.29041107, -0.06236597,
0.22727756, 0.00658669, -0.01409841, -0.14199334, -0.0466045 ,
0.13775554, -0.13467361, 0.13090866, -0.19777716, 0.10239725,
-0.00367906, -0.02341806, 0.30707283, -0.21449997, -0.08920904,
0.02326865, 0.06790668, 0.11017155, 0.06531824, 0.06059489,
0.01953891, 0.04596282, 0.09166934, -0.16995796, -0.01987271,
0.16604913, 0.03892869, 0.02906565, -0.04459147, -0.01199517,
0.03124182, -0.06696273, -0.02876974, 0.02837694, 0.02627329,
0.02119666, -0.11566776, 0.11415444, -0.01093605, -0.05076346,
0.02776756, -0.04904545, 0.0584223 , 0.11361133, 0.01445449,
-0.17196192, 0.24766112, -0.00709194, 0.07229663, 0.18348122,
0.05605101, -0.05606634, 0.01176013, -0.08243724, 0.17129387,
0.08871317, -0.12892466, 0.10690118, -0.15630681, 0.12557654,
0.12195274, 0.0135422 , 0.0194999 , -0.05461132, 0.01772277,
-0.16588352, -0.04162553, -0.18084166, -0.03946636, 0.00890658])), (0.9812541978117575, arra
y([-1.58853556e-03, -2.77244697e-04, 1.30495900e-02, -2.57070737e-02,
-2.38716681e-02, 1.76918901e-03, -1.73442425e-02, -5.66356465e-04,
-2.92004497e-02, 1.10917867e-01, -6.27488840e-02, -1.78243115e-02,
2.15038257e-02, 6.41694035e-02, -1.85885073e-02, -3.34623753e-02,
4.91507234e-03, 1.85513793e-03, 1.99405167e-01, -4.64871501e-01,
-1.81224305e-01, 3.78104386e-02, -1.92021780e-03, -1.00559003e-01,
5.91510263e-02, 1.22049864e-01, -1.62150573e-01, -3.66253270e-02,
3.51101906e-03, -5.65836709e-02, 1.34142652e-01, -1.98874144e-02,
9.32473244e-03, -1.65789870e-01, 3.90059070e-02, 8.92804163e-02,
-7.03717091e-02, 1.86738236e-01, 2.48520537e-02, -8.63464230e-04,
-3.74401876e-02, -7.57339151e-03, -2.88428766e-02, 1.22443436e-01,
2.08410599e-02, -3.60085882e-01, 1.84958173e-02, 4.52949299e-02,
8.45531569e-02, -6.45286314e-02, -9.57236707e-03, 9.57283766e-02,
4.89745997e-02, -3.24114570e-02, -6.36907782e-02, -1.65291836e-02,
-8.62304468e-03, -1.83196203e-03, -3.28560945e-02, 4.24690729e-02,
2.89588908e-02, -4.45793674e-02, -3.55242576e-02, -2.97437416e-03,
-3.18511701e-03, -7.89019068e-02, -1.62655658e-01, -9.17582531e-02,
-2.82301267e-01, -1.33934778e-01, 2.30107909e-01, -1.45985964e-01,
8.49375746e-03, -1.38999842e-01, -1.13966250e-01, 1.39946369e-01,
1.02468432e-01, 6.37333277e-02, 3.10513608e-02, 1.30701912e-02,
-5.85314632e-02, 9.30604487e-02, 2.89469803e-02, 2.54664458e-02,
1.47083627e-02, -1.94198651e-01, 2.29775840e-02, 4.11200067e-02,
-2.41273580e-02, 9.62378425e-02])), (0.9698086941074693, array([-0.01071378, 0.00486143, -0
.00196125, -0.01630911, -0.02259151,
0.01407496, 0.00847955, 0.00253303, -0.0309036 , 0.0140293 ,
0.03300172, 0.01524013, -0.05273072, -0.04567607, 0.23391132,
-0.07848098, -0.12769542, -0.21407682, 0.07116596, -0.03609987,
-0.10502449, 0.04005191, 0.05065142, 0.17250256, -0.05749271,
0.03372211, -0.13167271, 0.08150494, -0.04870796, 0.09539604,
0.05417698, 0.05531372, -0.17647055, 0.11591562, -0.06368605,
-0.09531928, 0.03512564, 0.09801424, -0.02272997, -0.03956482,
0.0484233 , 0.06916443, 0.01819234, -0.04563155, -0.08722159,
-0.07701787, -0.02717661, 0.26806801, 0.00531044, 0.04114546,
-0.09148475, -0.42609818, 0.08854235, -0.09061132, 0.12185133,
-0.04883606, 0.13125812, -0.09962217, 0.04209341, -0.00978047,
0.00611318, 0.01726034, -0.02390289, -0.15085736, 0.00502519,
0.05113877, -0.09804292, -0.19753755, 0.08767721, -0.01680878,
-0.09742722, 0.0012008 , 0.14237304, 0.16035753, -0.31354836,
0.09154538, -0.13866588, -0.00058064, -0.14211276, 0.03044497,
0.00991763, 0.05382957, -0.02379609, 0.01326394, 0.00805447,
0.04422099, 0.19718486, -0.13166665, -0.00919805, -0.05891948])), (0.9529273395960557, arra
y([ 1.01680239e-02, -1.88980519e-02, 2.99406811e-02, -3.35410551e-02,
2.60871214e-02, 1.87005328e-04, -4.50241781e-03, -1.58850270e-02,
1.78052925e-02, 3.46440672e-02, 3.00147401e-04, -8.67904552e-03,
2.08497368e-02, -4.51580137e-02, 4.06369697e-02, -2.56313711e-02,
-2.62486361e-02, 6.18857441e-02, -1.22065618e-01, 2.46349337e-02,
1.09344651e-01, 1.82070441e-01, 7.71152985e-03, 1.22815065e-02,
-1.33315954e-01, -6.85951998e-03, 1.24658808e-01, -6.65173562e-02,
1.06037135e-02, -6.65605576e-02, 8.48243247e-02, -1.20348568e-01,
6.09639452e-02, 8.70517651e-02, -3.45482804e-02, 2.23037440e-01,
-3.14108802e-02, 1.42127354e-02, -1.78074199e-02, 8.72644635e-03,
-6.32716232e-02, -3.20996837e-02, -1.54585621e-02, 6.23968567e-02,
6.59357152e-02, -1.16186844e-01, -1.81810459e-02, 1.96731998e-01,
4.51921052e-02, -1.51162617e-01, -2.85625471e-02, -3.77574702e-01,
-5.11642449e-02, 1.12291336e-01, 2.85677301e-02, 1.15832788e-01,
-2.29778840e-01, 7.90781789e-02, 1.51605150e-02, -2.54163196e-02,
-5.69090347e-02, 1.06857705e-01, -4.99492608e-02, 1.83663474e-01,
-1.06629346e-03, -6.12071463e-02, -7.51621854e-03, 1.39727334e-01,
-3.61690448e-01, 1.05352784e-01, -1.24486885e-01, -8.92559677e-02,
6.51931348e-02, -1.23212333e-01, 1.09853846e-01, -1.25294871e-01,
1.05331636e-02, -1.14362626e-01, -2.24907114e-03, -1.45530433e-01,
2.37937775e-01, -2.62807130e-02, 8.51306367e-02, 2.80731422e-02,
1.80940813e-02, 5.39997622e-02, 2.34065029e-01, 1.21637193e-01,
-4.51543427e-02, -1.09308849e-01])), (0.949801385112714, array([ 0.026247 , -0.00047457, 0.
01750291, 0.01778684, 0.02567284,
-0.04074609, 0.01524609, 0.00458663, 0.0247263 , -0.08198198,
0.03632847, -0.04948209, -0.0359562 , 0.14245157, 0.05160305,
-0.03531067, -0.02601146, 0.05392896, -0.08556124, -0.04018908,
-0.02069455, -0.01690763, -0.01912764, -0.04071749, -0.08931593,
0.0863003 , 0.00435261, 0.01343055, 0.04213723, -0.26182683,
0.12130055, 0.03108943, 0.02772986, 0.27414521, -0.17068434,
-0.04992821, 0.05622691, -0.08057179, -0.06151518, -0.10373174,
0.06792272, 0.13515434, 0.07802657, -0.37489407, 0.02056473,
-0.09680857, 0.00834453, 0.1153062 , 0.14089884, -0.18118251,
0.01663615, 0.06270688, -0.06002869, 0.12750963, -0.12121124,
-0.05682705, 0.13333067, -0.04704703, -0.01074914, 0.01353065,
0.04337889, -0.06263388, 0.00747122, -0.14375484, 0.00436615,
-0.2299231 , 0.04541104, -0.02688675, 0.13416425, 0.05508127,
0.37018287, 0.0655413 , -0.00761955, 0.08851763, 0.1076094 ,
-0.03038559, -0.00648776, -0.07013712, -0.03279395, -0.17809777,
0.10159005, -0.10103883, -0.04794133, 0.20119432, 0.03467796,
-0.1360069 , 0.00991476, 0.06485029, 0.01914473, -0.11337748])), (0.9442733887767248, arra
y([-0.00331595, 0.01580543, -0.01863078, 0.02641719, -0.01756419,
0.00863267, -0.05255544, 0.01428063, -0.03098377, 0.04713682,
-0.01223649, -0.0234114 , 0.01673431, 0.04137828, -0.04917881,
0.02671128, 0.03935362, -0.02912825, -0.15252448, -0.16400054,
0.0269342 , 0.01335285, 0.0356738 , -0.06873641, 0.03472405,
-0.08950344, -0.01980065, 0.07594835, 0.0620901 , -0.34044703,
0.22865382, 0.07859782, -0.07444977, -0.1105835 , 0.18634094,
-0.09233799, 0.00439686, 0.12022529, 0.00175177, -0.03072499,
-0.07645899, 0.05477025, 0.03649804, -0.21479675, -0.08708492,
0.15125139, 0.04209627, 0.00873448, -0.30818838, 0.252356 ,
0.07504972, -0.12498248, 0.011856 , -0.03461129, 0.07415387,
0.08630299, -0.09882293, -0.0011802 , 0.02215717, -0.01518408,
0.03271371, -0.03960954, -0.00258443, 0.0248585 , 0.00954393,
0.06160335, 0.02599234, -0.10098387, -0.06614956, -0.08452039,
0.09361263, -0.07846221, 0.06104535, 0.08054712, -0.04221732,
-0.12005225, 0.01158872, -0.22989874, 0.04004718, 0.02458765,
0.2916282 , 0.17946123, 0.03770972, -0.12429102, -0.01816835,
0.07699743, -0.30798999, -0.01072195, 0.02391488, -0.03918125])), (0.9416943150113966, arra
y([ 3.61742058e-03, -1.94654621e-02, -1.86145277e-02, 3.91588971e-02,
1.38525227e-02, 4.44739728e-03, 9.45754087e-04, -1.76770213e-02,
2.02461721e-02, -1.61417044e-01, 6.29136317e-02, 5.17455870e-02,
-2.53843405e-02, -1.11295932e-01, 1.30763281e-01, -3.64336748e-02,
-8.15317828e-02, -8.43985908e-02, -1.27475240e-01, -2.00239008e-01,
1.98557819e-01, -7.27339155e-02, -1.16408967e-02, 1.64224903e-01,
-1.59373307e-02, -1.04195288e-01, -2.05862256e-02, 1.60821247e-01,
3.37202388e-03, 1.16542567e-01, 1.55131159e-01, -1.04539639e-01,
-2.05624726e-01, 3.19145268e-02, -2.09991272e-01, 1.19720329e-01,
-7.45497298e-02, 1.99388651e-01, -4.70761420e-02, 1.88768117e-02,
9.38880282e-02, -9.86437441e-02, 6.49199923e-02, 1.32814904e-01,
-9.76731607e-02, -1.04217562e-01, 2.45565669e-02, 3.85863854e-03,
-2.37514855e-02, 2.17459287e-02, 1.12076101e-02, 2.49809124e-01,
-3.18286457e-02, 2.87709220e-02, -6.44097575e-02, 1.09883045e-01,
-2.07807208e-01, 3.93896455e-02, 5.11527104e-02, -2.75265015e-02,
-2.81618559e-02, 4.95628121e-02, -1.27479090e-01, 1.18502408e-01,
-2.04002732e-02, -4.26501972e-02, 1.92286785e-01, -7.34691849e-02,
1.73694750e-01, 1.13387762e-04, 1.12026592e-01, 6.61066412e-02,
-2.07059118e-01, -6.88492163e-02, -1.43136942e-01, -8.95962379e-02,
-2.53968635e-02, -1.38401224e-01, -2.66401891e-02, -3.33911002e-02,
-5.20087744e-02, 1.54766613e-02, 1.63449322e-01, 1.96457648e-01,
4.19831632e-03, 2.46611286e-01, -8.13378520e-03, 5.43401026e-04,
7.06501601e-02, 8.24856486e-02])), (0.9337938560734337, array([-2.78666149e-04, 2.86153266
e-02, -4.40329619e-02, 7.63570923e-02,
3.35212441e-02, 1.20383512e-02, -1.94893707e-02, 2.69527911e-02,
-5.20199285e-03, 6.19190179e-02, 1.12686446e-02, 2.49743679e-03,
-7.47851523e-04, -6.50827399e-02, 6.04982626e-02, 3.50763726e-02,
-2.70478021e-02, -9.41447301e-02, -4.79984129e-02, 7.06010461e-02,
-1.57413633e-01, 1.78383556e-02, 4.43158321e-02, -1.81734767e-01,
3.99219004e-03, -1.22994888e-01, 4.80848553e-02, -1.08445254e-01,
1.47732168e-01, -1.72394451e-02, 1.07269968e-01, -2.70397426e-02,
-2.00465373e-02, -2.29905336e-01, -3.09686448e-01, 1.01529457e-01,
-2.86786596e-02, 4.32508641e-02, 2.90138653e-02, 1.79205933e-01,
2.04568838e-01, -5.35395804e-02, -8.76425041e-02, 2.18448161e-01,
6.14465619e-02, -8.95877060e-02, 5.63716438e-02, -1.66880716e-01,
3.88398200e-02, 6.64525839e-02, 1.02053052e-01, -2.19362265e-01,
-3.47961229e-02, 7.50291476e-02, 4.47602226e-03, -9.31343903e-02,
2.14805378e-01, -4.51143042e-02, -2.67282618e-02, 6.72241207e-03,
-1.36953903e-02, 6.76453603e-02, -3.86191907e-02, -9.30012005e-03,
1.40203816e-02, -1.22670598e-01, -1.20398887e-01, 1.60151086e-02,
1.16043517e-01, 2.22964993e-01, 2.96600305e-03, 1.32948538e-01,
3.60556747e-03, -1.11277075e-01, -5.62721267e-02, -6.35223718e-02,
1.17300309e-01, 5.81474467e-02, -8.29515328e-02, 5.69520136e-02,
1.58439491e-01, -1.20199487e-02, -1.07114984e-01, 1.06543666e-01,
2.92226824e-02, -8.56989836e-03, -3.23945528e-01, -1.02605111e-02,
6.87594084e-02, -2.13386880e-01])), (0.931041145818125, array([-0.00585469, 0.01812109, 0.
00792955, -0.02462631, -0.01451196,
0.02830446, -0.00879729, 0.01597911, -0.03214228, -0.02254815,
0.01669926, -0.02767143, -0.00122349, 0.04274492, 0.0346502 ,
-0.00490572, -0.03843456, -0.11066957, -0.02813814, -0.03271021,
0.14880528, 0.06438382, 0.01505054, -0.04131551, -0.01495056,
-0.0418485 , 0.04085931, 0.0137256 , -0.01961479, -0.05199247,
0.02304338, 0.34772615, -0.13803342, -0.13670866, -0.21230403,
-0.09764183, 0.04755794, -0.22810261, 0.04591375, 0.02773834,
0.20986703, -0.00934168, -0.00424333, -0.04100463, -0.00221949,
-0.05967108, 0.01744279, -0.23425454, 0.25326177, 0.02664948,
-0.0048028 , 0.0316391 , 0.01544314, -0.01943449, -0.00194862,
0.17549254, -0.25550851, 0.00912207, 0.06729272, -0.02334901,
-0.06605862, 0.08964901, 0.06782998, -0.46031615, -0.01514615,
0.03390418, 0.04328068, -0.02500638, -0.26223213, -0.02173067,
-0.15071761, -0.11753421, 0.00625882, 0.06036338, 0.0669714 ,
-0.01291813, -0.05330965, -0.02390962, -0.0820911 , -0.04149348,
-0.0594575 , 0.04543746, 0.15142232, -0.00184985, 0.00565584,
-0.07747515, -0.04673792, -0.03077443, -0.02645897, 0.02890448])), (0.9160124331703235, arra
y([-0.04512716, -0.0385672 , -0.0206561 , -0.04705618, -0.04212048,
-0.01466519, -0.01829115, -0.04505448, 0.01255338, -0.04392882,
0.12015178, -0.03515198, -0.08976007, 0.08473334, 0.0292259 ,
-0.0184341 , -0.03226551, 0.0674858 , -0.02706551, -0.11339415,
-0.01596481, 0.04625842, -0.11063261, 0.06854895, 0.03595167,
0.10528534, 0.12790198, -0.10999625, -0.00719051, 0.08049302,
0.02848249, -0.19192094, -0.00162064, -0.01302792, -0.11972141,
-0.12932582, 0.0461804 , 0.16420206, 0.02892848, 0.14381952,
0.13948706, -0.05823622, 0.10181279, -0.3449726 , -0.04151092,
-0.0870399 , 0.01424011, -0.26918623, 0.06644839, 0.05707586,
-0.04335126, -0.05749606, -0.02655391, -0.01128123, 0.08300746,
0.01291072, 0.0271534 , -0.04632249, -0.03392848, 0.03656581,
0.05312645, 0.03488761, -0.20652097, 0.20327968, 0.03210534,
0.03886553, -0.02695497, -0.04171325, -0.03426775, -0.00089415,
-0.21073789, -0.04495686, 0.19303697, 0.16995306, 0.06502183,
-0.02814482, 0.10027648, -0.1487149 , -0.01703184, 0.27576032,
-0.05856994, -0.2091439 , -0.05783323, 0.0506891 , -0.0645663 ,
-0.09810442, -0.01194532, 0.20861665, -0.05955877, 0.23499399])), (0.9017616031502895, arra
y([-0.01204613, 0.03157169, 0.00173901, -0.02522276, -0.02255628,
0.08860315, 0.00054061, 0.02748577, -0.03401337, -0.15531234,
0.11397098, -0.00209653, -0.10203533, 0.1247588 , -0.06968457,
0.01105041, 0.07674368, -0.11018815, -0.01025301, 0.03199334,
0.09125823, 0.05356348, 0.03618184, -0.15826721, -0.10560225,
0.01025809, 0.03069366, 0.02691185, 0.08197771, 0.00534303,
0.056395 , -0.36746288, 0.0181807 , -0.02410655, 0.01469366,
-0.08056341, 0.03822262, -0.25860032, -0.0505407 , -0.04810135,
-0.00116654, 0.05863885, -0.08239288, 0.0771162 , 0.03592515,
-0.09057108, 0.00783212, 0.02333246, 0.09206801, 0.16894537,
-0.00259428, 0.00753512, 0.06255998, -0.03827773, -0.0486697 ,
0.16976284, -0.03187038, -0.14708599, 0.06945362, 0.03166831,
0.02937489, -0.24219907, 0.36128429, 0.25402849, -0.01963118,
0.04604779, 0.03255906, -0.05974847, -0.16830735, -0.01446133,
0.04296761, -0.069433 , -0.01778528, 0.14103144, -0.13537139,
-0.02075788, -0.02933113, -0.03012077, -0.1153027 , 0.12267673,
-0.12071696, -0.09378854, -0.01146882, 0.1814344 , 0.06288938,
-0.0125343 , -0.05389026, -0.21635202, -0.03533361, -0.13628257])), (0.890644523647056, array
([ 9.98843162e-03, 1.83753484e-02, -2.15618355e-02, 5.60384562e-02,
1.90754629e-03, -4.93206812e-02, 3.44008833e-02, 1.92579847e-02,
5.36319810e-03, 1.61861064e-01, -1.27813948e-01, -7.75819723e-02,
2.87565785e-02, 2.07566454e-01, 2.24892437e-02, 1.65387794e-02,
-6.30551458e-02, 5.50758222e-02, 8.31593820e-02, 1.60433647e-01,
-3.29652843e-02, -2.71413980e-03, -5.05083252e-03, -1.54274608e-02,
-2.22873819e-02, 3.65170615e-02, 2.51583619e-02, 5.74703905e-02,
1.47873828e-01, -1.60874936e-01, -3.53703355e-01, 4.66269914e-02,
-5.77386648e-02, 5.34731395e-02, -6.99392009e-02, -5.59667683e-03,
1.50569071e-02, 1.55346729e-01, 5.42385274e-02, 2.98574646e-01,
6.16475449e-02, 1.38453606e-01, -1.04225708e-01, 7.80865747e-02,
2.68362391e-02, 1.41388217e-01, -6.26077147e-05, -8.67876826e-02,
-1.64909818e-01, -1.17812824e-01, -3.00302056e-02, -2.76871244e-02,
1.37548277e-01, -3.01084262e-02, -1.82363579e-01, -4.33397819e-02,
-1.35010942e-01, 7.45199010e-02, 1.43697804e-04, -3.62370319e-03,
8.65683739e-02, -1.18813821e-01, -1.84208722e-01, -1.43973498e-03,
-1.65915130e-02, 8.35012256e-02, 4.85575694e-02, -1.41136110e-01,
-1.23841638e-01, -1.27825175e-01, 1.58779130e-01, -7.77355540e-02,
-1.07927145e-01, -9.25315807e-03, -4.69461310e-02, -8.77350934e-02,
-1.55584716e-02, -8.70564735e-02, -2.14612552e-01, 2.16155162e-01,
1.32878204e-02, -1.04393498e-01, 7.12127204e-02, 1.61393672e-01,
2.70368961e-02, 4.29979717e-02, 1.53933935e-01, -5.87435978e-02,
5.96815311e-02, -1.17995448e-01])), (0.8885676555317741, array([ 0.00274796, -0.02361445, 0
.05116237, -0.0842552 , -0.07213576,
0.00093838, -0.01098897, -0.02175948, -0.00658835, -0.15644416,
0.21212145, -0.08688107, -0.12772555, 0.19739457, 0.00065999,
-0.05611503, -0.09542822, -0.0790884 , 0.07532689, 0.04622811,
0.0199169 , 0.07703434, -0.12052577, 0.0575971 , 0.04508036,
0.1558777 , -0.00821302, 0.18890517, -0.03258032, -0.193194 ,
-0.25381167, 0.10040089, 0.02806504, 0.02635801, -0.03138048,
0.06964833, 0.02135525, 0.03181669, 0.13739875, -0.08807277,
0.02874702, -0.04365259, 0.11838292, 0.27886171, -0.20779585,
0.08171919, -0.00790738, -0.00744908, -0.0837399 , 0.20445817,
-0.02006053, -0.01835591, -0.07266937, -0.02869563, 0.17928846,
0.05813758, 0.0441099 , -0.07759448, -0.03301373, 0.08159966,
0.07785581, -0.16431664, 0.01670631, -0.0194537 , -0.01016957,
-0.06294145, -0.07168093, -0.00171829, 0.01843777, 0.00627979,
-0.04178049, -0.05313962, 0.17234462, -0.27315857, 0.04703748,
0.05148856, 0.05858007, 0.01267136, 0.0298947 , -0.26037273,
0.07794181, -0.12729389, -0.03197504, 0.21431478, 0.01807618,
0.02617634, -0.08542077, 0.11938408, -0.06302607, 0.11919231])), (0.8791588942184467, arra
y([-0.03117837, -0.01399989, -0.03340375, 0.00088924, 0.02028349,
-0.01951033, -0.03937196, -0.01919307, -0.02840681, 0.14604114,
-0.04894041, 0.00851742, 0.04925042, -0.10984964, -0.00350001,
0.05979627, 0.05638347, 0.05399111, -0.06529045, -0.07689977,
-0.1253334 , 0.04593203, -0.18111414, -0.11470011, 0.18498936,
0.15281119, 0.16061652, 0.03052356, -0.31618665, -0.00304363,
0.20344699, 0.11546958, -0.02264744, 0.10675221, 0.00771112,
0.05565993, -0.06029402, -0.07399017, -0.06291728, 0.01991941,
-0.03703263, -0.13888537, 0.06051865, -0.07519849, -0.16860843,
0.01692649, -0.02738467, -0.15388506, -0.15641782, -0.12933231,
-0.03753313, -0.08508395, -0.00065314, 0.04269448, -0.0422484 ,
-0.11940561, 0.03233509, 0.19252331, -0.12759107, 0.01688613,
0.05361439, -0.20145027, 0.26238912, -0.0368179 , -0.00501027,
-0.03522915, -0.08001625, -0.02695852, -0.07996612, 0.03635574,
-0.12670346, -0.03504763, -0.12295178, 0.02712415, -0.00411092,
-0.0300398 , -0.05640326, 0.10016602, -0.13261971, 0.04293086,
-0.05410893, 0.01850406, -0.05688299, 0.32093398, -0.04501955,
0.29384958, 0.01334824, -0.05077861, 0.02474069, 0.03883143])), (0.8629753367296573, arra
y([ 0.01166394, -0.04751393, 0.02226712, -0.01738976, -0.0372354 ,
0.00137357, -0.03142243, -0.04260486, 0.01422279, -0.1007145 ,
0.04963574, 0.04530055, -0.02641501, -0.04232224, -0.11693779,
-0.02743814, 0.15277544, 0.18836382, -0.03718525, -0.06302018,
0.02320708, -0.00930566, 0.04126506, 0.01878739, -0.04468988,
-0.06594267, -0.05484857, 0.04120591, 0.02869672, -0.01306566,
0.05889018, 0.05443091, -0.03439045, 0.02260685, 0.08644482,
-0.02284593, -0.18626504, -0.12175179, -0.12165186, 0.50096173,
-0.06641931, -0.01516339, 0.10534416, 0.01814603, -0.12160945,
-0.03934602, -0.0093016 , 0.14292649, 0.05077062, 0.00787034,
-0.01718144, 0.03468688, -0.03991159, 0.00653545, 0.04821644,
0.00913398, 0.06714187, -0.05698063, 0.0077161 , 0.0090317 ,
-0.03761309, 0.02592563, 0.14953846, -0.03722617, 0.00620756,
0.11938813, 0.00961706, 0.05972683, 0.13493806, -0.01354354,
0.01276413, 0.04573275, 0.08990901, -0.19106456, 0.04890457,
0.03858561, 0.05644118, -0.09668513, -0.548203 , -0.01880534,
0.07071005, 0.06407696, -0.00113584, -0.07058048, -0.02843993,
-0.13582121, 0.05628283, 0.03813165, -0.01339753, 0.13068332])), (0.8317611004894274, arra
y([ 0.00545156, -0.07008116, -0.03053225, 0.06334239, -0.00939906,
-0.05711216, 0.02420308, -0.06509471, 0.06767608, 0.16041921,
-0.06848737, 0.01600285, -0.01452638, 0.05829487, -0.00285141,
-0.0621527 , -0.03677003, -0.15963981, -0.03683498, -0.07947628,
-0.0100988 , -0.03931464, 0.10738339, -0.05091965, -0.08705204,
-0.1497019 , -0.1584345 , -0.05739527, 0.24238019, -0.00924365,
-0.01982462, 0.02188455, 0.02368897, -0.07832417, -0.02152104,
-0.02948498, 0.08748718, -0.13049489, 0.02790162, -0.20781542,
0.08399357, -0.12619883, 0.08092237, 0.03826677, -0.01464393,
0.00819209, -0.09482382, -0.14170098, -0.141553 , -0.26439855,
-0.08143666, -0.04212307, -0.09987588, 0.04378873, 0.11845842,
-0.10995148, -0.0611874 , -0.03675344, 0.10142608, -0.04646941,
-0.07001699, -0.03058676, 0.28216562, 0.02546009, -0.02403687,
0.12041728, -0.01333376, 0.00929337, 0.14548328, -0.04950643,
0.01893052, 0.06961991, 0.08997331, -0.16177626, 0.05114716,
-0.13466767, 0.02926301, -0.18830137, 0.10727718, 0.17320939,
0.11706974, 0.06495608, -0.08982689, 0.17787126, -0.04179447,
-0.10593689, 0.23824305, -0.06316331, 0.06327437, 0.28536944])), (0.8087658281973401, arra
y([-1.38037599e-02, -2.62538836e-02, -1.50064891e-02, 7.73353884e-04,
5.58215569e-03, 3.85305662e-02, 4.85734662e-02, -2.74253271e-02,
8.12131793e-03, -3.85170607e-02, -7.26463929e-03, 4.13665719e-02,
8.12804341e-03, -7.43367589e-02, 2.48403179e-02, 1.01314548e-01,
-5.67692097e-01, 3.38188062e-01, -1.75549813e-02, -3.81677187e-03,
1.21591794e-02, -1.34692259e-02, 8.01078060e-02, -2.71126200e-02,
-5.01367953e-02, -5.56049468e-02, -6.84804047e-03, -8.00401890e-02,
-1.29583624e-02, 7.48112337e-02, 5.96831837e-02, 3.49400558e-02,
3.59546531e-02, 1.17052541e-01, -3.22895551e-02, -1.20091688e-02,
-3.68001001e-01, -3.27377391e-02, 5.25519139e-01, -5.54732492e-02,
-9.28767847e-02, 2.24409625e-02, 1.66298363e-02, -4.90522909e-02,
-4.74724762e-02, -2.35151595e-02, 2.44117022e-02, -3.14640906e-02,
-2.60737436e-02, 3.45476294e-02, 1.28535207e-02, -1.72745096e-02,
1.94282952e-02, -3.27232993e-03, -2.34476222e-02, 5.73397163e-02,
9.32157164e-03, -7.73506024e-02, 2.56921031e-02, 2.98558779e-02,
-4.61180858e-03, -6.72791909e-02, 1.15526687e-01, -1.48512807e-02,
-1.64962985e-02, 6.39246379e-02, 1.09245648e-02, -4.17016748e-03,
-1.24036206e-02, -1.39811565e-02, 1.60576603e-02, 1.25968178e-02,
-3.82114539e-03, -2.84256479e-02, -1.49265660e-02, 2.17853452e-04,
-3.03888764e-04, -3.57882575e-02, 1.10530819e-02, 7.29136841e-02,
-2.73872115e-02, 2.33068073e-02, -2.51894381e-02, 4.08449419e-03,
1.14798768e-02, -6.14097124e-02, -4.28706108e-02, -4.81625998e-02,
1.97217871e-02, -5.97897486e-02])), (0.7892722212229611, array([ 0.01759243, 0.0244195 , -0
.01348355, 0.05594109, -0.02822828,
-0.03114862, -0.01025114, 0.02642081, -0.00898686, -0.42732225,
0.05497189, 0.09793574, -0.00615247, -0.19597648, 0.10631042,
0.27251081, 0.14753006, -0.00906053, -0.0057597 , -0.00494552,
0.05043376, 0.12632815, 0.11116636, 0.13092532, -0.06112112,
-0.09693479, -0.05446578, -0.14006325, 0.00753102, 0.02019177,
0.00181738, 0.21916386, 0.17045392, -0.02379436, -0.12168054,
0.00553675, -0.00742759, 0.07757025, -0.19196648, -0.05025151,
-0.231689 , 0.14095046, -0.02819515, -0.0134607 , -0.0707351 ,
0.00763897, -0.02964745, -0.16497223, -0.19827882, -0.18079722,
-0.0066776 , -0.05190613, 0.05535967, -0.00839952, -0.06733092,
0.25211957, 0.17666072, -0.13722392, -0.04466507, 0.09203293,
0.05474654, -0.09144626, -0.07607113, -0.06254102, -0.01293667,
0.0762188 , -0.07111797, -0.01021933, -0.10496458, 0.01394139,
0.05186032, -0.09578208, 0.00189994, -0.11034241, -0.0020095 ,
0.00307013, 0.04406047, 0.02847866, 0.08419341, 0.14161595,
-0.00614269, -0.04934289, -0.02857221, 0.07786195, -0.02494885,
-0.03030585, 0.04267903, -0.03400083, 0.06690435, 0.00336231])), (0.7619517528659735, arra
y([ 0.02689377, -0.05957791, -0.02589526, 0.09474024, -0.02102001,
0.03012182, -0.04118114, -0.05106918, 0.04332153, -0.29746521,
0.04675402, 0.06112399, -0.0113335 , 0.02307008, -0.06525496,
-0.27287067, -0.09734835, -0.0244697 , 0.00126079, -0.00036962,
-0.00616904, 0.22725556, -0.12359879, 0.05529858, 0.0652219 ,
0.06904022, 0.031982 , 0.08157967, 0.01682568, -0.01334902,
0.02608926, -0.14114573, -0.20349955, -0.08650184, 0.10832692,
0.00046514, 0.01815474, -0.09422992, 0.11832343, -0.03575713,
0.28709648, 0.24770688, -0.08384164, -0.01241468, -0.05904321,
0.01440831, -0.05734269, -0.09360715, -0.15191822, -0.17191821,
0.00443259, 0.02042583, 0.10592989, -0.01977457, -0.16017916,
0.00170861, 0.1962971 , 0.20708513, -0.15433359, -0.09634164,
-0.04421257, 0.22916174, 0.1700131 , 0.03785226, 0.1072855 ,
0.05827851, -0.01864558, -0.00840599, -0.03722675, 0.05147844,
0.07801918, -0.0726303 , 0.02782703, -0.07946549, 0.02506575,
0.09877713, 0.03717162, 0.07143827, 0.02336953, 0.02438579,
0.10413921, 0.00061904, 0.07713089, -0.18325313, -0.12507461,
0.06179492, 0.05677923, 0.00796665, 0.12786405, -0.01064299])), (0.754916612665571, array
([ 3.44821659e-02, 1.26829308e-01, 5.43327399e-03, 5.40251208e-02,
-6.31772239e-03, -7.31096931e-02, 1.00194762e-01, 1.26311627e-01,
-2.83309870e-02, -1.38370377e-01, -2.42896274e-02, 3.44449527e-03,
3.27200264e-02, 3.32972459e-02, 3.54149565e-02, -2.48360446e-01,
-1.19435469e-02, 1.12217435e-01, 2.54212469e-02, -3.23663707e-02,
3.06979595e-02, 1.09563881e-02, 6.97932703e-02, 4.88264866e-02,
2.38292722e-02, -1.34668990e-01, -1.80846307e-01, 8.10677045e-03,
1.07050838e-01, -2.81961937e-02, -4.41994556e-02, -4.31431671e-03,
1.40805904e-01, 9.43767166e-02, 9.03656398e-02, 3.75429265e-02,
-1.45878411e-01, 4.79768652e-02, 6.34142899e-03, 8.05501059e-02,
2.01874747e-01, -8.59169435e-02, -1.31662987e-01, -5.88583553e-02,
3.81207580e-01, 6.76882097e-02, -6.34175310e-02, -1.04675300e-01,
-1.41017598e-01, -5.40828206e-02, -2.52485663e-02, -4.05080992e-02,
-1.23833066e-01, 5.85499444e-02, 1.34698798e-01, 1.27605442e-01,
5.56295706e-02, -4.32121795e-02, 2.69422352e-02, -1.30918221e-02,
3.10371534e-02, -7.44338840e-02, -1.96521751e-02, 1.76668316e-02,
-1.26032960e-02, -2.26440295e-01, -1.03492450e-01, -5.83729646e-05,
-8.81297653e-02, 1.30297831e-02, -5.17166653e-02, -1.00603415e-01,
-4.24944869e-02, 2.26723469e-01, -2.88255891e-02, -3.69593029e-02,
-4.74195219e-02, 1.69271688e-01, -1.43065221e-01, -1.81965117e-01,
-3.44930629e-02, 1.20781113e-01, -6.03238202e-02, 4.87405557e-02,
6.82943198e-02, 1.60096454e-01, -6.43351316e-02, -7.49687169e-02,
-4.76429693e-03, 3.09960604e-01])), (0.7309008552453276, array([-3.12576183e-02, -3.00353136
e-02, -1.15434524e-02, -3.74192404e-02,
1.21194607e-02, 3.54390679e-02, -5.92220714e-02, -3.43419973e-02,
-1.62954981e-02, 3.61814212e-02, -4.15691648e-02, 1.62172077e-02,
3.11683856e-02, -9.99366833e-02, 2.28516084e-01, -4.59537346e-01,
1.41672965e-01, 1.77388278e-01, -2.70709877e-03, 3.68925400e-02,
4.19564801e-03, -1.19786356e-01, 9.82757133e-02, -2.37302746e-02,
-1.15006306e-02, -1.82167491e-02, 3.53555482e-02, -8.99522902e-02,
-1.05099269e-01, 1.40408070e-02, -3.63399133e-02, 9.96778586e-02,
1.44669392e-01, 1.55519585e-02, 2.29991872e-01, 1.48939095e-01,
-2.33116872e-01, 2.86760961e-02, -1.95486408e-01, -1.15693709e-01,
4.00805322e-01, 2.27205418e-02, 5.11989738e-02, -1.45252118e-02,
-1.52042970e-01, -1.81434799e-02, 8.15185742e-02, -4.65860405e-02,
4.41958474e-02, 8.88374795e-02, 1.01053499e-02, -2.93229234e-02,
3.75101120e-02, -2.43764479e-02, -1.78384967e-02, 1.00186522e-01,
-4.09938703e-02, -9.78378753e-02, 1.45384629e-02, 8.39263450e-02,
5.42777656e-02, -1.67621893e-01, -3.27909821e-02, -2.52681884e-02,
-4.20184070e-02, 1.35099267e-01, 3.70030499e-02, -2.57640682e-04,
2.09275366e-02, -1.31613049e-02, 1.01878028e-02, 3.99945612e-02,
-6.81091768e-03, -9.42781981e-02, 9.02301120e-03, -2.31518100e-02,
1.72141119e-02, -5.61493349e-02, 1.20112902e-01, 1.12640843e-01,
-1.63524514e-02, -3.38260736e-02, -3.09316007e-03, 4.91087253e-02,
-2.52284664e-02, -9.83192805e-02, 2.23107503e-02, 1.35065309e-02,
-2.10648854e-02, -2.23614183e-01])), (0.6902086442370304, array([-2.48394712e-02, -6.54456851
e-02, -6.04496784e-02, 5.97797123e-02,
1.04453000e-02, 4.43533564e-02, -2.34111646e-02, -6.65301057e-02,
1.68492557e-02, 1.19272280e-01, 1.81350517e-03, 6.97093156e-03,
2.12508055e-02, -1.60698027e-02, -1.93840577e-01, -1.31447806e-02,
-7.15273312e-02, -1.20965888e-01, 6.88178481e-03, -6.60825012e-04,
-1.85468037e-02, -5.97973938e-01, 9.73324139e-02, -1.66199196e-02,
8.01983100e-02, 2.45483333e-02, 3.08441845e-02, -4.48242011e-02,
-9.00496982e-02, -5.94900724e-02, 6.40474671e-02, -2.86411959e-02,
-3.14888366e-02, 6.02455867e-02, 4.18503179e-02, 1.80504114e-02,
2.18824714e-01, 8.52678194e-03, 1.26993359e-01, 8.53242763e-02,
2.12018084e-02, 7.16294977e-02, -8.95454956e-02, 6.51731177e-03,
-1.56303498e-02, -1.06026249e-04, -3.25688168e-02, 6.19167864e-04,
4.47910476e-02, -3.85713194e-02, 3.36407794e-02, -7.61597641e-02,
7.70892141e-03, -9.53573105e-03, 2.38271484e-02, 5.23502403e-01,
9.54240031e-02, -6.46799446e-02, -7.69163962e-02, 3.55080562e-02,
4.00248106e-02, 2.41773244e-02, -2.64112052e-02, 1.34953270e-02,
4.29121784e-02, 3.65390693e-03, -1.79335175e-02, 3.32345142e-02,
-2.99637256e-03, 9.53443834e-02, 4.16274493e-02, -1.70963714e-02,
2.96411797e-02, -3.01885058e-02, 4.34496481e-02, 6.73465056e-02,
1.56924088e-02, 1.05070539e-01, -2.55381324e-02, 2.46675269e-03,
5.41649939e-02, 3.12106893e-02, 9.12954266e-03, 8.48837089e-02,
-1.92830666e-01, 7.03777360e-02, 1.22657643e-01, 6.04117042e-02,
6.30270009e-02, 3.83421834e-02])), (0.6604534042250566, array([ 0.03493521, 0.00433818, 0
.00421347, 0.05698903, -0.0472373 ,
0.08277977, -0.04077489, 0.01079527, 0.04588983, 0.22098083,
0.20979844, -0.09531005, -0.20616871, -0.00620906, 0.38560093,
0.15308173, 0.12878625, 0.21170295, 0.01391932, 0.00681338,
-0.02000289, -0.21287615, -0.12193815, -0.05404867, 0.0496134 ,
0.09394178, 0.02479103, 0.11795467, 0.14014343, -0.02948085,
-0.12000781, -0.05256568, -0.06624699, -0.14857877, -0.17420537,
-0.08197003, -0.38631173, -0.04199609, -0.18837499, -0.1385434 ,
-0.10178701, 0.17064317, -0.10241735, -0.01967789, 0.11111213,
0.02463147, -0.00974989, 0.00286693, -0.04670847, 0.01330558,
-0.02012631, -0.01593084, 0.01209422, -0.02779087, 0.02695335,
0.08297272, -0.04745558, 0.06196023, -0.0145512 , -0.06916519,
-0.0275976 , 0.11834801, 0.15227742, 0.01652194, 0.06305596,
-0.02901072, 0.01837811, -0.01971193, 0.00842294, 0.04959294,
0.02020987, -0.00417164, 0.06980409, 0.03091795, 0.03161785,
0.09845365, 0.02155261, 0.07654164, 0.03378785, -0.02283066,
0.0701262 , -0.00271604, 0.07094519, -0.02398482, -0.15269574,
0.04210218, 0.04026346, 0.01763493, 0.02093315, 0.11314927])), (0.6482152267621847, arra
y([ 0.04976821, 0.10658763, 0.0956541 , -0.07532322, -0.05847024,
-0.05523187, 0.07456164, 0.11014023, -0.05021298, 0.06017025,
-0.0058527 , 0.02331819, -0.03819477, 0.01330611, -0.00425335,
-0.02067504, 0.00599168, 0.00552276, -0.01696796, -0.00386224,
-0.02471487, -0.21103374, -0.09226116, 0.019868 , -0.01843372,
0.10301034, 0.0947492 , 0.05973184, 0.00468052, 0.06264263,
-0.03086377, -0.07067192, -0.1095315 , -0.1011189 , -0.03080424,
-0.03838276, -0.05651225, -0.03669107, -0.0314335 , -0.02534076,
0.01485672, -0.23178877, 0.20503302, -0.03629108, -0.11232838,
-0.00681464, -0.11182121, -0.16201526, -0.22460152, -0.17242559,
0.05171118, -0.07330419, -0.00841712, 0.04114159, -0.03167827,
0.18503452, 0.05815386, -0.12078235, -0.07831803, 0.08114397,
0.00129601, 0.12729861, 0.06880663, -0.01546559, 0.07662265,
-0.15732585, -0.01931486, -0.07002065, -0.04065681, -0.0829991 ,
0.05445492, -0.00582084, -0.05749134, -0.08301152, -0.05502221,
-0.08062502, 0.00127886, -0.09811222, -0.0351346 , -0.09271583,
-0.07466982, -0.14215307, -0.07893433, -0.32281736, 0.48346038,
-0.04718825, 0.07673822, -0.0884037 , -0.05632293, -0.13510046])), (0.5714116669727295, arra
y([ 1.59600051e-02, 4.21114418e-02, 1.28461418e-03, 2.71529821e-02,
3.42196303e-02, 1.54971693e-01, 1.19960622e-01, 4.28047863e-02,
2.80131156e-02, 2.61894305e-01, 3.98224856e-01, 2.22430745e-02,
-2.12989146e-01, -2.37918870e-01, -1.51039961e-01, -7.93097234e-02,
-3.08705345e-02, -5.13930226e-02, -6.11104656e-03, -4.42445262e-03,
8.73600832e-03, 1.14425648e-01, 9.10932453e-02, -2.46844763e-02,
-4.11581153e-02, -4.09992595e-02, 2.33406505e-02, -8.65906300e-02,
-8.25769878e-02, 2.64222833e-04, 7.82106040e-02, 2.64207688e-02,
4.98407339e-02, 4.63747920e-02, 6.02988198e-02, 3.19919850e-02,
9.85422558e-02, 2.08581457e-02, 2.67063403e-02, 5.80787317e-02,
4.37215388e-02, 2.30841366e-01, -6.49068804e-02, -9.32392711e-03,
-3.29560958e-02, -6.30860877e-03, -1.62708094e-01, -9.22755011e-02,
-1.48081897e-01, -1.26421154e-01, 5.96946527e-02, 5.95035353e-02,
-2.80467760e-02, 8.64978493e-03, 2.13743205e-02, -2.22829559e-01,
-2.65768161e-01, -3.39638410e-01, 1.08326126e-02, 2.26939806e-01,
9.86318206e-02, 4.80225690e-03, -3.33788664e-02, -9.39573658e-03,
8.31612558e-02, -6.48915304e-02, 5.78423406e-02, -1.50498485e-02,
-4.49476406e-03, 1.04890517e-02, 2.80642735e-02, 3.19174967e-02,
1.41741391e-02, 6.45255334e-04, -1.34223998e-02, 1.02578208e-01,
2.84638364e-02, 9.67061410e-02, -4.95645113e-02, -6.37995458e-02,
-6.47332744e-03, -9.21554205e-02, 1.06503605e-01, -6.52962072e-02,
-2.19855513e-02, 4.75944087e-02, -5.60596598e-03, -3.71788394e-02,
9.41067553e-02, 2.11100251e-02])), (0.42504944989687043, array([-3.47767932e-02, 7.5845150
9e-02, -4.23429705e-02, 9.86716501e-03,
1.59940061e-01, -2.70717662e-01, 1.77383051e-01, 6.49096231e-02,
5.12056845e-02, 7.02072691e-02, 1.30791907e-01, 2.40911060e-03,
-6.06508944e-02, -7.60844667e-02, -6.15267202e-02, -1.58919873e-02,
-1.34459445e-02, -1.77913733e-02, -6.43477985e-03, -4.10655156e-03,
-9.60136167e-03, -8.54788290e-02, -1.57451652e-01, -3.58176329e-02,
-6.13724475e-02, -1.69618627e-02, -1.66075688e-02, 4.99590425e-03,
1.02490121e-01, 7.75399505e-02, 5.23546073e-02, 3.21318129e-02,
6.12294011e-02, 4.85299700e-02, 4.39752350e-02, 3.26550708e-02,
6.30899239e-02, -1.77241962e-03, -1.34126947e-03, 6.32237248e-03,
1.14247306e-02, 2.16675054e-01, 1.34977708e-01, 5.60382056e-02,
-2.14084863e-02, -4.14393775e-03, 5.55328275e-01, 3.27885606e-04,
-6.98772398e-03, -5.20532257e-02, -5.58848615e-01, -7.48481022e-03,
-3.12871441e-02, 2.89332523e-02, 1.22703678e-02, 4.22965525e-02,
1.80475928e-02, 2.41221529e-02, -1.97257540e-02, -2.97198520e-02,
1.06193706e-02, 4.19201509e-02, 3.43960546e-02, 1.42779862e-03,
3.66200453e-02, 4.72563016e-02, -2.81442245e-02, 1.92407077e-03,
-5.40808902e-02, -2.68162723e-02, -1.76657479e-02, -2.53211115e-02,
-1.78627652e-02, -2.81419084e-02, -2.40262612e-03, -2.32095273e-02,
-4.49570115e-03, -2.05890542e-02, -1.64679051e-04, -2.21925687e-02,
-3.75932025e-02, -1.62296398e-02, -5.76901005e-03, -4.50529220e-02,
6.19451479e-02, 2.40252178e-02, -1.40397374e-01, -2.13950623e-02,
7.27210216e-02, 6.17761287e-03])), (0.3978166894517955, array([-0.11005611, 0.05075236, -0
.09624746, -0.0346809 , 0.24849045,
-0.26368587, 0.28140826, 0.02678829, 0.06076458, 0.04763941,
0.11089526, -0.01547567, -0.03883996, -0.05911073, -0.02447412,
-0.03007646, -0.00050185, 0.00710358, -0.00213599, -0.00313048,
-0.01511068, -0.07451991, -0.29919188, -0.01310891, -0.09356173,
-0.01210994, -0.03275207, 0.04013982, 0.15592938, 0.132279 ,
0.08999955, 0.10598052, 0.10504277, 0.0389563 , 0.04564958,
0.02487026, 0.01432714, 0.01865657, -0.01893375, 0.00605514,
0.01792934, 0.28197809, 0.19678471, 0.07381823, -0.01360361,
-0.00726783, -0.38029588, 0.04512801, 0.09609548, 0.10470117,
0.38468441, -0.02207943, -0.01634265, 0.01960711, 0.00246179,
0.05325301, 0.08705457, 0.17774632, 0.04495314, -0.13924086,
-0.03897458, -0.05755354, -0.02677295, 0.00089107, -0.06797897,
0.06020379, -0.04883762, -0.00645087, -0.07838135, -0.04969979,
-0.04826861, -0.0359796 , -0.01252585, -0.01454904, -0.01312105,
-0.0546026 , -0.01316386, -0.07149409, 0.0161508 , 0.05153606,
-0.05670376, 0.01740911, -0.01657489, 0.04728431, 0.02907354,
-0.02996158, -0.04353655, -0.00926233, 0.01796126, 0.01685783])), (0.35238333407115874, arr
ay([ 2.72172870e-01, -5.71486439e-02, 2.58216217e-01, 5.05159254e-02,
-2.04062845e-01, 4.30256785e-01, 1.45763747e-01, -1.72881194e-03,
1.51499040e-01, 6.40095103e-02, 1.82131063e-01, 9.17543931e-02,
-1.26643385e-01, -1.76507321e-01, -1.16503634e-01, -4.72962966e-02,
-3.04675728e-02, -5.48197726e-02, -3.20158781e-02, -1.50510244e-02,
-1.22736041e-02, -3.80745058e-02, -6.50252743e-02, 1.23130614e-02,
-2.72930734e-04, 9.95479012e-03, 1.29080008e-02, 1.73148004e-02,
-4.74740301e-03, 2.57731002e-03, 1.92073566e-02, 2.77597161e-02,
3.68733793e-02, 2.46648624e-02, 1.48229781e-02, 1.21683841e-02,
2.71664328e-02, 8.47512627e-03, -8.89655041e-03, 7.07393823e-03,
2.64322939e-03, -6.02993028e-02, 8.05093040e-02, 9.54459142e-03,
1.44879125e-01, 7.48095238e-03, 4.13573441e-02, -6.52609644e-02,
-4.99949244e-02, -5.31302586e-02, -8.08456803e-02, -2.23092074e-02,
3.28519179e-02, -1.07182823e-02, -4.06721339e-02, 6.01641342e-02,
1.35307018e-01, 3.11889795e-01, 1.75319796e-01, -2.24787076e-01,
-1.42653467e-01, -2.41317120e-01, -1.45056322e-01, -3.60470006e-02,
-2.73935664e-01, 3.35921547e-02, -1.05881773e-02, -2.23898076e-02,
-8.04181260e-03, -4.15624181e-02, 1.37560250e-02, 1.81325894e-05,
4.01312448e-02, 1.56110706e-02, -1.81586891e-02, 6.98493209e-03,
2.21580621e-02, -5.27720713e-02, -1.73222731e-02, -5.76305220e-03,
3.77462940e-03, -1.69856197e-02, -1.57448527e-02, -7.42518745e-03,
3.44003188e-02, -1.07208876e-01, -3.22301219e-02, 3.40912034e-03,
-1.51647621e-02, -6.78213851e-02])), (0.337764060513702, array([-0.10267333, -0.29443803, -0.
0915461 , -0.02929046, -0.01193299,
0.13856508, 0.76303677, -0.2975754 , 0.17926268, -0.05497305,
-0.15320546, -0.03482298, 0.08583347, 0.10016258, 0.09579815,
0.03742155, 0.05589383, 0.01959031, 0.00804244, 0.00642888,
0.00549271, 0.02367011, 0.16449441, 0.00371798, 0.03479047,
0.03756304, 0.02415189, -0.0281378 , -0.09282584, -0.05453758,
-0.05188437, -0.05920363, -0.0772278 , -0.02662847, -0.03024676,
-0.01297158, -0.02741065, -0.0048173 , -0.09995176, -0.01941201,
-0.0087679 , -0.04618292, 0.01019038, -0.01331143, -0.05073854,
0.00250184, 0.01289621, -0.00670389, -0.01413956, -0.00291049,
-0.02149748, -0.02609368, 0.0018284 , -0.00596198, 0.01653413,
-0.00368353, -0.02215825, -0.04566043, -0.03991345, 0.04523418,
0.01758591, 0.0493502 , 0.02356299, 0.00854587, 0.04467478,
-0.03361031, -0.00296489, 0.00296015, -0.03464237, 0.0162329 ,
0.0092337 , -0.01099873, 0.01092503, -0.00199202, -0.00341072,
0.03307768, 0.0134801 , 0.03392747, -0.02049286, -0.02969398,
-0.00254682, 0.02844248, 0.01158971, -0.02635988, -0.00291389,
0.02692371, -0.12320224, -0.01920371, 0.01894774, 0.01093196])), (0.24537240875649707, arr
ay([ 0.33346055, 0.05539091, 0.28223949, 0.12145879, -0.07080231,
-0.54679222, 0.05434262, 0.11623824, 0.25216066, 0.04192859,
0.03877752, 0.02991654, -0.03311103, -0.04946977, -0.0402435 ,
-0.02243829, -0.01467914, -0.00918881, -0.0232997 , -0.01064298,
0.01051628, 0.05725223, 0.33682475, -0.00087466, 0.08132144,
0.09169357, 0.05327385, -0.08259453, -0.2000579 , -0.09646241,
-0.10150003, -0.12684152, -0.17657528, -0.06797636, -0.04195422,
-0.03898997, -0.07239586, -0.02050143, -0.02096775, -0.02422455,
-0.00828482, -0.00810232, 0.25424483, 0.02165218, 0.09415622,
0.00784175, -0.04497648, 0.03142562, 0.0495527 , 0.04873873,
0.07287478, 0.00271661, 0.01906564, -0.01831662, -0.00271585,
0.00710061, -0.00067786, -0.01882985, 0.00452397, 0.01880303,
-0.00790012, -0.01824323, 0.00445034, -0.00215321, -0.01528222,
0.06074812, -0.00679641, 0.02165929, -0.02416152, 0.01245209,
-0.00502446, -0.01888459, 0.01762397, 0.02308425, 0.00868448,
0.03326563, 0.02750681, 0.04028162, 0.01820977, 0.0451108 ,
-0.01384362, 0.03892947, 0.01754462, 0.05322128, -0.13580873,
0.02668102, -0.02897502, -0.00206738, 0.0692191 , -0.04783141])), (0.2105885520158932, arra
y([-3.10827511e-02, 1.13819233e-01, -2.59825677e-02, -1.18901436e-02,
6.81701205e-01, 3.06343714e-01, -1.26543027e-01, 1.01456929e-01,
9.27070651e-02, -1.32567215e-02, -4.05979079e-02, -1.17258673e-02,
2.21670119e-02, 2.38461476e-02, 4.04458947e-02, 2.00715548e-02,
1.69954253e-02, 1.89284684e-02, 5.34251756e-03, 5.52036179e-03,
7.37334954e-03, 3.87231506e-02, 2.99418215e-01, -4.13107676e-03,
8.31918236e-02, 8.00694576e-02, 4.75864449e-02, -5.59973043e-02,
-1.92155780e-01, -1.10033864e-01, -9.15510655e-02, -1.07284028e-01,
-1.41548109e-01, -4.97663771e-02, -3.94930698e-02, -2.29170359e-02,
-3.44465298e-02, -4.57614518e-03, -1.30644373e-02, -7.07281585e-03,
-1.16763094e-02, 1.65801688e-01, 8.82661150e-02, 3.80122928e-02,
-3.37033427e-02, -6.39047241e-04, 2.76063573e-03, -1.30101566e-02,
1.93777474e-03, -2.38012823e-03, -2.63197535e-02, 3.26343809e-03,
-9.28626432e-02, 3.89161237e-02, 9.71798149e-02, 2.76026138e-02,
7.80342774e-02, 9.49966701e-02, 2.93796276e-02, -6.03171537e-02,
-4.85341294e-02, -4.54767224e-02, -2.95147400e-02, -4.82907511e-03,
-7.11207984e-02, -1.90243245e-02, -2.03019810e-02, -1.67590552e-02,
-1.26001114e-02, -3.31330276e-02, -2.24885219e-02, 4.54737692e-03,
-2.60866996e-02, -3.48038740e-02, -1.82847898e-02, -6.46713273e-02,
-2.47287306e-02, -5.94244019e-02, -4.63910905e-03, -2.27185408e-02,
-2.22768472e-02, -5.12580769e-02, -6.51606223e-02, -4.44153155e-02,
1.35305829e-01, -5.71161658e-02, 1.20859100e-02, -2.49796205e-02,
1.07341383e-01, 1.86479045e-01])), (0.1947049465927752, array([-3.13869905e-01, 1.80486031
e-01, -2.65715965e-01, -1.14222059e-01,
-3.14658867e-01, 2.07415000e-01, -7.25748995e-02, 1.10132718e-01,
2.73851128e-02, -2.79589005e-02, -6.61198858e-02, -2.94104068e-02,
4.76871738e-02, 5.84940338e-02, 2.88044602e-02, 2.15043113e-02,
7.68030166e-03, -5.12020248e-03, -5.74688081e-03, -2.59236090e-04,
-7.45518168e-04, 6.28118296e-03, 5.26188188e-02, -1.97042248e-03,
-3.47192829e-03, 4.47694529e-02, 4.15371618e-02, -6.65222015e-02,
-6.05610881e-02, 6.49012625e-03, -7.36300556e-04, -2.45104237e-02,
-8.83109239e-03, 5.97309951e-03, -5.23943333e-03, 1.80728351e-03,
8.64929209e-03, -2.23466435e-03, 6.52532347e-03, 6.87686189e-03,
3.35876540e-03, 8.63268838e-02, 6.05110803e-01, 1.02871165e-01,
3.29692003e-01, 4.14976067e-02, -1.15587468e-02, -8.60571370e-03,
-1.93268840e-02, -1.58919638e-02, -1.88075817e-02, -4.20552528e-03,
1.36702370e-02, -8.55654201e-03, -1.25536852e-02, -1.46180593e-02,
-2.38149310e-02, -2.39224795e-02, -2.00389712e-04, 4.77005636e-03,
1.56076319e-02, 1.42342996e-02, 8.27201823e-03, 5.16918033e-04,
2.24203268e-02, 1.39543962e-02, 1.51098959e-03, 9.13492250e-03,
-6.92233410e-05, 5.79228782e-04, 2.28272273e-02, -1.40926361e-03,
3.42544640e-02, 1.98555635e-03, 1.44637639e-03, 4.26271148e-02,
2.96759892e-02, 2.97914223e-02, -1.21144395e-02, -5.41792673e-03,
4.83805949e-03, 6.05804192e-02, 4.77400799e-02, 1.42403755e-02,
-7.40234918e-02, 3.17133754e-02, 1.40296135e-02, 7.85101481e-03,
2.74875888e-01, -5.90879502e-02])), (0.12332392942132495, array([ 4.53435101e-02, -2.7803492
2e-03, -3.13445963e-01, 6.30711864e-01,
1.27697456e-02, 6.64868949e-02, -5.75087081e-03, 6.07346566e-03,
-2.61678153e-02, -9.93067403e-03, -7.91752836e-04, 1.70733274e-02,
-2.73025573e-03, -2.26305080e-02, -8.96305492e-03, -3.60634540e-04,
-8.49617743e-03, -9.80700990e-03, -2.24852279e-04, 4.09108102e-03,
2.05637292e-03, 9.24051832e-03, 2.42592699e-02, -3.64164001e-03,
1.68472969e-02, 2.54693542e-02, 5.05075770e-03, 5.16173513e-03,
-4.83468454e-02, -1.21900774e-02, -9.11780503e-03, -5.05512123e-03,
-7.41232540e-03, 3.91398550e-03, 3.06965071e-03, -5.90932528e-03,
1.12445860e-03, 4.66719907e-03, 7.76292483e-03, -4.68010209e-04,
4.57085333e-03, 6.98203998e-02, 2.23256968e-01, 6.03967088e-02,
5.41465239e-02, 6.50298816e-03, 1.54786976e-03, -6.12458026e-03,
-1.21478016e-02, -2.36115428e-02, -1.44891907e-02, 1.00069594e-02,
1.28721599e-02, -2.86915970e-03, -2.29801640e-02, -1.27419711e-02,
-1.79027671e-02, -2.41362676e-02, -7.78849976e-03, 1.25177575e-02,
1.22658786e-02, 2.01564609e-02, 6.24882932e-03, -2.94930421e-03,
2.19246690e-02, -1.21455484e-02, -3.64289500e-03, -2.47392662e-03,
-5.05322234e-03, -7.04776467e-03, 2.60680790e-03, 1.63317314e-03,
1.57057505e-03, -6.94289571e-03, -9.13210988e-03, -5.14828300e-03,
-2.05609675e-03, -1.55565554e-02, -1.56562532e-03, -6.52763569e-03,
-3.96162744e-03, -5.70459097e-04, 7.76754226e-04, -1.51831525e-03,
9.68083322e-04, 6.81558268e-04, -1.87228840e-03, -2.19265410e-03,
-6.51541156e-01, -1.35251471e-02])), (0.07848642552981931, array([ 1.84188359e-01, -3.0587017
1e-01, 1.49198471e-01, 7.87807951e-02,
1.22599538e-01, -4.03112767e-02, 3.34042236e-02, -2.53340913e-01,
-7.77881450e-01, 4.92023239e-03, 3.09791110e-02, 5.48255151e-03,
-2.82090547e-02, -1.27406253e-02, 1.15325868e-03, 4.43431341e-03,
6.29308236e-03, 1.45105664e-03, -1.82329674e-03, -7.60767616e-04,
-8.41740602e-04, 5.47909104e-03, 9.09192366e-03, -1.39850991e-03,
2.32559995e-02, 2.59952546e-02, 2.72665779e-02, -6.95731563e-03,
-3.79661683e-02, -3.82250001e-03, -7.30620065e-03, -8.43124545e-03,
-2.49057213e-02, -1.13018454e-02, -8.81137961e-03, -7.00195233e-03,
-1.75154527e-02, -4.03319983e-03, -6.31246022e-03, -3.38662606e-03,
-1.90484877e-03, 8.86981906e-02, 2.13060832e-01, 6.32707375e-02,
2.35421226e-01, 2.89764159e-02, 1.67968346e-03, -3.84168000e-03,
-3.03666587e-03, -1.83109292e-03, -8.57071972e-04, -4.31219406e-03,
-6.20497375e-03, 6.97048660e-03, 1.14633859e-02, -2.87600954e-04,
-3.78304190e-03, -1.95330813e-02, -5.22629574e-03, 2.04417140e-02,
4.52053030e-03, -7.15957141e-03, -2.63864754e-03, 2.18279169e-03,
2.14227840e-05, -5.32141382e-03, -5.85467921e-03, -1.59644921e-02,
2.70014938e-03, -1.34069282e-02, -1.25762248e-02, 2.80874041e-03,
-3.06062185e-02, 3.51151122e-04, -6.19744668e-03, -3.97887545e-02,
-2.64247400e-02, -3.69463581e-02, 7.20491728e-03, 3.56915543e-03,
-6.49136832e-03, -2.98564240e-02, -3.13811801e-02, -1.69712136e-02,
-3.22039697e-02, -2.11954676e-02, -2.95472599e-03, 3.49282285e-04,
1.61455654e-01, 1.24901710e-02])), (0.051763400830042, array([-1.88697768e-02, 1.99153916e
-02, -1.50992854e-02, -8.39538161e-03,
-6.13134826e-02, 1.27543697e-02, -3.74091591e-03, 1.51760514e-02,
6.56315244e-02, 1.01737069e-03, 8.89451723e-03, -1.99027424e-03,
-5.68317612e-03, -2.06171266e-04, -1.40521699e-03, -1.39625680e-03,
7.59116202e-04, -5.96010680e-04, 4.60939872e-04, 1.45203641e-04,
-3.86390842e-03, -3.24537563e-04, 3.86384430e-04, 7.11164706e-04,
-2.66644623e-03, -3.15822443e-03, -2.77303177e-03, 1.65647745e-03,
-2.20855060e-03, 1.16677604e-03, -1.06345034e-03, 4.03579179e-03,
6.48850641e-03, 3.26805997e-03, 2.17878795e-03, 1.10405224e-03,
1.41902792e-03, -1.79237047e-05, -2.92586910e-03, -4.04844594e-03,
6.73236709e-04, -8.25972214e-03, -1.64017263e-02, -4.90970826e-03,
-2.56431608e-03, -2.54648914e-04, 1.61465231e-03, 3.32298891e-03,
3.68933548e-04, 9.76030252e-05, -3.00978027e-04, 2.25115748e-03,
8.40532033e-03, -6.25251233e-03, -1.82606624e-03, -1.01219485e-03,
-1.22428377e-02, -1.11400675e-02, -5.78214933e-03, 8.73957810e-03,
4.98077086e-03, 9.74562202e-03, 4.18968632e-03, -2.00213912e-04,
1.00137419e-02, -2.75230220e-01, -5.60595261e-02, -1.12764402e-01,
-5.78889079e-02, -9.77460257e-02, -1.01178443e-01, -4.26695112e-02,
-2.24173542e-01, -1.99158429e-01, -1.34496154e-01, -2.61734132e-01,
-2.48290120e-01, -1.79693938e-01, -4.12659042e-02, -1.21238110e-01,
-9.89963288e-02, -2.26076602e-01, -3.01991152e-01, -2.02651620e-01,
-5.99178341e-01, -1.34591980e-01, -4.30616081e-02, -1.15515162e-01,
9.39791440e-03, -7.74882426e-03])), (0.0022475672476950743, array([ 8.73743164e-04, -1.66423
040e-03, 7.45416458e-04, 3.07977559e-04,
1.72861907e-03, -2.04540557e-03, 6.82206778e-04, -1.40305294e-03,
-2.72609621e-03, -1.42354356e-03, 7.09013571e-05, -2.25955695e-05,
7.30409774e-05, 2.09513937e-04, -6.28938505e-04, 8.33854395e-05,
2.71834078e-05, 2.73636459e-05, 2.77647997e-05, 4.63309898e-06,
2.73807988e-04, 1.05254109e-03, -2.15173325e-04, 1.10141058e-04,
-3.63394974e-04, 1.89895596e-04, 2.13911999e-04, -2.12449614e-04,
-1.48090129e-04, 3.83683190e-05, 3.95015250e-05, -6.40889697e-05,
-2.15752268e-04, 3.13663774e-05, 5.52593117e-05, -1.68659188e-06,
1.24672096e-04, -3.22821095e-05, 4.53533751e-06, 1.52531961e-05,
2.96810024e-05, 2.66726327e-04, -3.15592018e-04, 1.24963156e-04,
2.91945427e-04, 3.44997444e-05, -4.49126982e-04, -3.91767013e-04,
3.18626012e-04, 2.06652903e-04, -1.15869167e-03, -1.19228002e-01,
-6.73506700e-01, -6.19958284e-01, -3.84399876e-01, -2.63273327e-05,
-4.93419905e-03, -1.18597689e-03, 2.38658754e-04, 1.05118521e-03,
4.73676341e-04, 4.94709946e-04, 3.05776306e-04, 6.32243095e-05,
7.23131883e-04, -4.31812813e-04, 2.56914039e-04, -1.10376292e-04,
7.39428235e-05, -4.68659075e-05, 1.93069842e-05, 7.41033672e-05,
-3.28099516e-04, -2.30607646e-04, -7.37500178e-04, -2.47522772e-04,
-6.48623696e-04, -2.00731100e-04, 6.41467503e-05, 5.37530518e-05,
-7.53452293e-05, -3.79687489e-04, 1.58121247e-04, -1.08978707e-04,
-1.66192458e-03, 4.65935481e-05, 1.81179430e-04, -1.10096449e-04,
1.35285991e-03, 3.28324358e-04])), (0.0009933514215786233, array([ 1.53628210e-04, -6.22361
736e-04, -4.57054309e-06, 2.90935816e-04,
-2.10106863e-04, 1.00204884e-04, 2.17499715e-04, -5.57897504e-04,
-1.39568550e-03, 1.05339167e-01, 3.79239031e-01, 5.45831880e-01,
5.02591619e-01, 2.61126125e-01, 1.08363182e-01, 3.79714888e-02,
2.14380574e-02, 1.61903934e-02, 1.14422809e-02, 8.09929611e-03,
7.40576108e-03, 2.66280569e-02, 1.98802392e-01, 1.10899671e-02,
1.30437900e-01, 1.77213342e-01, 1.47040349e-01, 1.47554793e-01,
2.16513272e-01, 1.10090824e-01, 8.76644327e-02, 7.05251496e-02,
7.71524303e-02, 3.13236194e-02, 2.52734757e-02, 1.43036488e-02,
2.08533768e-02, 5.21462344e-03, 7.36003888e-03, 6.38283136e-03,
3.69168397e-03, 1.23179683e-05, 1.01732980e-03, 2.06129745e-04,
1.45626634e-03, 4.51948859e-03, 1.02505841e-05, 1.06353364e-06,
7.59423127e-04, 4.73252908e-05, 3.53150845e-05, 5.34980746e-04,
-9.21386327e-05, -1.81679508e-04, -1.39970011e-04, 1.93297011e-03,
5.53199015e-05, 1.82548204e-04, 7.77840325e-04, 4.21404278e-04,
2.32995847e-05, 3.22842741e-05, -9.13394568e-06, 6.90184011e-07,
3.49610347e-05, -4.14575111e-04, -1.47745097e-04, -1.84970570e-04,
-2.31609979e-04, -1.85588148e-04, -1.81753881e-04, -1.27181386e-04,
-3.94305437e-04, -3.18831514e-04, -2.19945839e-04, 7.05051980e-05,
-4.52457928e-04, -2.64528565e-05, -4.66528170e-05, -1.51408972e-04,
-2.01521049e-04, -3.48861044e-04, -5.67246400e-04, -2.76629167e-04,
-1.07199259e-03, 9.37205376e-04, -6.20598211e-05, -1.53413905e-04,
-1.32020174e-04, -1.10134760e-04])), (0.0001285036483623192, array([-1.28878123e-04, 2.02334
342e-05, -1.98617012e-04, 1.09363369e-04,
2.14764133e-04, 2.23521491e-04, -1.43039095e-04, -5.62481330e-06,
-2.80984771e-04, 5.46493983e-02, 1.96852465e-01, 2.83402156e-01,
2.60788652e-01, 1.35444226e-01, 5.62126533e-02, 1.96921722e-02,
1.11068787e-02, 8.39898897e-03, 5.94223765e-03, 4.20047060e-03,
-1.42471248e-02, -5.25585800e-02, -3.82741827e-01, -2.13698723e-02,
-2.50747229e-01, -3.41163519e-01, -2.83196435e-01, -2.84414624e-01,
-4.17342376e-01, -2.12227247e-01, -1.69255898e-01, -1.36025496e-01,
-1.48792418e-01, -6.03392197e-02, -4.87875548e-02, -2.75823143e-02,
-4.02682463e-02, -1.00689806e-02, -1.42375864e-02, -1.23355135e-02,
-7.12337590e-03, 9.68627459e-05, 1.52790204e-04, 4.97796947e-05,
1.16390050e-04, -1.09305830e-03, 2.32283741e-05, -3.69245654e-05,
-2.08640398e-04, -2.08468889e-05, -2.73325647e-05, 3.09570464e-04,
1.12712030e-05, 6.72111137e-06, 4.97753122e-05, 1.44149409e-03,
1.38348556e-03, 3.95272055e-03, 6.47066837e-03, 5.84691125e-03,
1.93258877e-03, 1.09953636e-03, 4.14830539e-04, 7.20333669e-05,
2.33433608e-03, 1.50046037e-04, -8.99742286e-07, 5.09254547e-05,
-3.69858670e-05, 3.66282565e-05, 7.63683400e-05, -7.89082876e-06,
1.09914137e-04, 7.57720383e-05, 6.72733706e-05, 1.36273518e-04,
1.23933000e-04, 2.45637766e-04, 2.42022541e-05, 6.39095337e-05,
4.85970167e-05, 1.10251470e-04, 1.34102319e-04, 8.35564438e-05,
4.47564514e-04, 1.36209850e-04, 5.51574641e-05, 6.13542319e-05,
-3.51101653e-05, 5.54513489e-05])), (8.546833259599872e-05, array([-2.08332763e-04, 2.31838
890e-04, -2.32036055e-04, 2.13627016e-05,
-3.21972859e-04, -3.28785753e-05, -1.44691509e-05, 1.78841317e-04,
3.08667982e-04, -1.27699843e-04, -2.38053889e-03, -3.47639669e-03,
-3.15684072e-03, -1.61671008e-03, -6.72242549e-04, -2.34562134e-04,
-1.34216500e-04, -1.01989746e-04, -6.92548070e-05, -4.77069908e-05,
1.28954766e-04, 1.82467528e-03, 3.69931946e-03, 1.86164953e-04,
2.40342441e-03, 3.29844614e-03, 2.71115400e-03, 2.74510258e-03,
4.07991636e-03, 2.06063416e-03, 1.66397221e-03, 1.34137927e-03,
1.48732389e-03, 6.03148471e-04, 4.85348750e-04, 2.75466231e-04,
4.07980815e-04, 1.01221155e-04, 1.47168736e-04, 1.25627378e-04,
7.56742638e-05, -5.29424996e-05, 8.91831971e-05, -8.22435331e-07,
3.13514097e-05, 1.22806426e-05, -7.69374333e-05, -8.32139535e-06,
-5.22312387e-05, -3.01101714e-05, -7.96754176e-06, -2.62440143e-05,
-3.81164553e-05, -8.87918641e-05, 1.50166487e-04, 4.25264035e-02,
1.30542379e-01, 3.84450432e-01, 6.32638568e-01, 5.75707304e-01,
1.90209340e-01, 1.06746617e-01, 3.98898116e-02, 6.44077334e-03,
2.28826068e-01, -5.74563925e-05, -5.80535539e-06, -5.59460932e-06,
-8.47758206e-05, -1.59189210e-05, 7.39386200e-04, -1.86751752e-05,
-2.45961895e-06, -3.30313631e-07, -2.55624742e-05, 8.58763777e-06,
-2.34121617e-05, 3.01494013e-05, -1.62412636e-05, -3.43000551e-05,
-5.75614077e-06, 7.25723622e-06, 1.39889108e-05, -1.43744435e-05,
-1.70305297e-04, 3.63750540e-05, -6.76543807e-05, -3.14403423e-05,
-9.10703227e-05, -8.62663594e-05])), (3.57406919508468e-15, array([ 2.11575486e-02, -1.582900
18e-02, -2.31031029e-02, -1.32339463e-02,
-1.89671464e-15, -4.50856253e-16, -2.47485440e-16, 1.67721240e-02,
-6.87602672e-18, 1.66117623e-13, 5.83720984e-13, 8.40294151e-13,
7.73710867e-13, 4.02040948e-13, 1.66488434e-13, 5.83168521e-14,
3.32222490e-14, 2.49726506e-14, 1.75715810e-14, 1.24680774e-14,
-3.05980931e-14, -1.02417681e-13, -8.22521604e-13, -4.60839231e-14,
-5.39098389e-13, -7.33497436e-13, -6.08832099e-13, -6.11607489e-13,
-8.97321584e-13, -4.56141407e-13, -3.63870351e-13, -2.92295232e-13,
-3.19878831e-13, -1.29643197e-13, -1.05080017e-13, -5.93866341e-14,
-8.60871852e-14, -2.14069617e-14, -3.08313969e-14, -2.67523519e-14,
-1.55999860e-14, -3.68177357e-16, 9.83300189e-16, -2.88448726e-17,
1.17248397e-15, -8.87031146e-16, -5.82693334e-16, -1.15822815e-16,
-7.13689134e-16, -2.60166268e-16, -2.90415081e-16, -6.04816455e-15,
-3.87908532e-14, -3.59788680e-14, -2.04042558e-14, 3.36160503e-13,
1.02335089e-12, 3.01489393e-12, 4.96135057e-12, 4.51465170e-12,
6.15185598e-01, 3.45170989e-01, 1.28946998e-01, 2.08244423e-02,
-6.95452578e-01, -1.17725314e-15, -2.14993045e-16, -1.88381812e-16,
-1.07716607e-15, -1.92010054e-16, 5.71067215e-15, -1.84127565e-16,
-6.35501344e-16, -4.92896282e-16, -3.09190140e-16, -2.45488394e-16,
-9.10601070e-16, 1.07212288e-16, -2.11995340e-16, -5.21567041e-16,
-4.83790506e-16, -7.36752799e-16, -9.05612969e-16, -6.78891034e-16,
-3.07509446e-15, 7.67017500e-16, -3.25664494e-16, -3.37512170e-16,
4.22647841e-16, -1.53183588e-16])), (-5.203296557290081e-16, array([ 3.95776120e-01, 5.9535
0761e-01, -2.60441848e-01, -1.49186602e-01,
1.92539649e-15, -7.15780497e-16, 1.14891672e-15, -6.30822900e-01,
-3.29257908e-15, 4.67313577e-14, 1.76804996e-13, 2.53917221e-13,
2.33699003e-13, 1.21103520e-13, 5.04820000e-14, 1.73684194e-14,
1.00907288e-14, 7.40941579e-15, 5.25033951e-15, 3.63889926e-15,
-1.92335010e-14, -7.64211538e-14, -5.14525154e-13, -2.85113552e-14,
-3.36988883e-13, -4.58831878e-13, -3.80916894e-13, -3.82484085e-13,
-5.61969388e-13, -2.85650684e-13, -2.27840712e-13, -1.83258941e-13,
-2.00699534e-13, -8.12421295e-14, -6.55676014e-14, -3.71603918e-14,
-5.46295728e-14, -1.35999504e-14, -1.93399333e-14, -1.67395955e-14,
-9.51656615e-15, 4.15643054e-16, 2.39123077e-16, -2.40023495e-16,
4.57368674e-16, -2.19071205e-15, 8.04654859e-16, 1.91213598e-16,
2.23220914e-16, 2.44529910e-16, 4.61959783e-17, 6.51377538e-15,
3.49687157e-14, 3.24735983e-14, 1.92711779e-14, -1.91410281e-13,
-5.90311992e-13, -1.74015128e-12, -2.86353645e-12, -2.60610537e-12,
-1.84157089e-02, -1.03327654e-02, -3.86005521e-03, -6.23384014e-04,
2.08185177e-02, 5.82569889e-16, -1.27445237e-17, 1.40344000e-17,
3.79983681e-16, 7.56614939e-17, -3.43738723e-15, 8.31006203e-17,
8.20325305e-17, 1.13494052e-16, 2.93170621e-16, -1.51020000e-16,
3.11752960e-16, 1.58851686e-16, 1.59089842e-16, 4.46618813e-17,
6.60833994e-17, -8.21858778e-17, -1.70396109e-16, -1.16447521e-16,
1.30103395e-15, -3.37879073e-16, 1.11958831e-16, 2.80088150e-16,
2.49661075e-15, 3.14516975e-16])), (-1.832295599444726e-15, array([-3.66771361e-01, 5.3013
3370e-01, 4.49762306e-01, 2.57633367e-01,
-1.00010513e-15, 1.36101936e-15, -5.96497519e-16, -5.61719731e-01,
2.96800585e-15, 2.21860401e-13, 8.01561027e-13, 1.15431277e-12,
1.06242717e-12, 5.51661224e-13, 2.28765043e-13, 7.99484576e-14,
4.53604094e-14, 3.40503053e-14, 2.42471615e-14, 1.70529159e-14,
-5.43693189e-14, -2.02956039e-13, -1.45906397e-12, -8.12234427e-14,
-9.55450657e-13, -1.30039646e-12, -1.07947378e-12, -1.08345113e-12,
-1.59000274e-12, -8.08602402e-13, -6.44860817e-13, -5.18191098e-13,
-5.66762590e-13, -2.29591484e-13, -1.85566519e-13, -1.05031958e-13,
-1.53445492e-13, -3.83317479e-14, -5.41923598e-14, -4.70300061e-14,
-2.70197749e-14, -8.38247062e-16, -2.00960536e-15, -6.14130512e-16,
-1.24468044e-15, -4.01922963e-15, 8.54756834e-16, 6.71081649e-17,
-5.35323451e-16, 1.09902452e-16, -4.81268135e-16, 4.21930429e-15,
1.59689958e-14, 1.45423480e-14, 8.54643474e-15, -6.70301174e-14,
-2.17659478e-13, -6.42007685e-13, -1.05641968e-12, -9.61851422e-13,
-9.68124537e-03, -5.43199492e-03, -2.02925349e-03, -3.27716605e-04,
1.09444159e-02, -2.06603474e-16, -1.48153166e-16, -2.55972266e-16,
-1.98805873e-16, -1.06059060e-16, -1.46806749e-15, -1.30672293e-16,
-2.75698019e-16, -4.80575050e-16, -2.11066013e-16, -3.44234921e-16,
-2.54309196e-16, 5.75283223e-16, -8.71629386e-17, -6.15334883e-16,
-1.03644619e-16, -5.13019067e-16, -6.55711395e-16, -8.17593232e-16,
-8.48519545e-17, 8.21246329e-17, 1.26381210e-16, -1.13566604e-16,
-1.49054402e-15, 7.83773982e-17]))]
Eigenvalues in descending order:
[6.400301029851477, 4.23053271502051, 3.0220056962108997, 2.3606995503606907, 1.7227802802980245, 1.
7053304684180373, 1.5823948341725016, 1.5166979298471606, 1.4840050960170488, 1.3921255385114626, 1.
338123871891159, 1.2745588269563284, 1.2235863310684403, 1.2178118832338365, 1.1943992057853123, 1.1
835468203512876, 1.175215028494778, 1.1607311253642982, 1.148470393300462, 1.1194893763589782, 1.109
6027631760017, 1.1049386135422454, 1.0914342813543214, 1.0846048546102214, 1.0827345320601791, 1.071
1889349293784, 1.0636889264741323, 1.054552899346843, 1.0495564454254236, 1.048150719936755, 1.04163
63339168457, 1.0381369601703594, 1.0334519547258465, 1.027681645874375, 1.0238189302225627, 1.015216
974820168, 1.0136313723660826, 1.0118982712943438, 1.0092805032957575, 1.0062251615105764, 1.0055267
813795965, 1.0041959740870794, 1.0019590929564808, 1.0005162465340969, 0.9936805751250474, 0.9932619
924220245, 0.989390243013148, 0.9869860806882016, 0.9828875619572957, 0.9812541978117575, 0.96980869
41074693, 0.9529273395960557, 0.949801385112714, 0.9442733887767248, 0.9416943150113966, 0.933793856
0734337, 0.931041145818125, 0.9160124331703235, 0.9017616031502895, 0.890644523647056, 0.88856765553
17741, 0.8791588942184467, 0.8629753367296573, 0.8317611004894274, 0.8087658281973401, 0.78927222122
29611, 0.7619517528659735, 0.754916612665571, 0.7309008552453276, 0.6902086442370304, 0.660453404225
0566, 0.6482152267621847, 0.5714116669727295, 0.42504944989687043, 0.3978166894517955, 0.35238333407
115874, 0.337764060513702, 0.24537240875649707, 0.2105885520158932, 0.1947049465927752, 0.1233239294
2132495, 0.07848642552981931, 0.051763400830042, 0.0022475672476950743, 0.0009933514215786233, 0.000
1285036483623192, 8.546833259599872e-05, 3.57406919508468e-15, -5.203296557290081e-16, -1.8322955994
44726e-15]
In [174]:
tot = sum(eigenvalues)
var_explained = [(i / tot) for i in sorted(eigenvalues, reverse=True)]
# an array of variance explained by each
# eigen vector... there will be 90 entries as there are 90 eigen vectors)

cum_var_exp = np.cumsum(var_explained)
# an array of cumulative variance. There will be 90 entries with 90 th entry cumulative reaching almost 100%
In [175]:
print(len(var_explained))
print((cum_var_exp))
90
[0.07111057 0.11811392 0.15168992 0.17791848 0.19705944 0.21600652
0.23358772 0.250439 0.26692704 0.28239426 0.29726149 0.31142248
0.32501714 0.33854764 0.35181802 0.36496782 0.37802505 0.39092136
0.40368144 0.41611953 0.42844778 0.4407242 0.45285059 0.46490109
0.47693082 0.48883227 0.50065039 0.512367 0.5240281 0.53567358
0.54724669 0.55878091 0.57026308 0.58168114 0.59305629 0.60433586
0.61559781 0.62684051 0.63805413 0.6492338 0.66040571 0.67156283
0.6826951 0.69381134 0.70485163 0.71588727 0.72687989 0.73784581
0.74876618 0.75966841 0.77044347 0.78103098 0.79158375 0.8020751
0.8125378 0.82291272 0.83325705 0.84343441 0.85345344 0.86334895
0.87322138 0.88298928 0.89257737 0.90181865 0.91080445 0.91957366
0.92803933 0.93642683 0.94454751 0.95221608 0.95955405 0.96675604
0.97310471 0.97782723 0.98224717 0.98616233 0.98991506 0.99264127
0.99498101 0.99714428 0.99851447 0.9993865 0.99996161 0.99998659
0.99999762 0.99999905 1. 1. 1. 1. ]
From above table we conclude that 96% variance is contributed by about 72 features
In [176]:
plt.bar(range(0,90), np.array(var_explained), alpha = 0.5, align='center', label='individual explained variance')
plt.step(range(0,90), np.array(cum_var_exp), where= 'mid', label='cumulative explained variance')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.legend(loc = 'best')
plt.show()
72 dimensions covering 97% variance in the data. So we can reduce to 72 dimension space
Now will recall the ensemble models from our initial run to check the feature selection using featureimp from individual models
In [177]:
#Building fuction to return the feature importances for the model
predictors = [x for x in dff.columns if x not in ['price']]
def modelfit(alg, dxtrain, dytrain, printFeatureImportance=True):

#feature importance
alg.fit(dxtrain,dytrain)
alg_imp_feature_1=pd.DataFrame(alg.feature_importances_, columns = ["Imp"], index = predictors)
alg_imp_feature_1.sort_values(by="Imp",ascending=False)
alg_imp_feature_1['Imp'] = alg_imp_feature_1['Imp'].map('{0:.5f}'.format)
alg_imp_feature_1=alg_imp_feature_1.sort_values(by="Imp",ascending=False)
alg_imp_feature_1.Imp=alg_imp_feature_1.Imp.astype("float")
feat_30list=list(alg_imp_feature_1.index[:30])
if printFeatureImportance:
alg_imp_feature_1[:30].plot.bar(figsize=(plotSizeX, plotSizeY))
#First 20 features have an importance of 90.5% and first 30 have importance of 95.15
print("First 25 feature importance:\t",(alg_imp_feature_1[:25].sum())*100)
print("First 30 feature importance:\t",(alg_imp_feature_1[:30].sum())*100)
return feat_30list
Will run above function with ensemble models: Gradient boosting, Random forest, Bagging
In [178]:
#Gradient boost model

modelfit(GB1,X_train,y_train)

dtype: float64
dtype: float64
Out[178]:
['furnished_1',
'living_measure',
'yr_built',
'living_measure15',
'quality_8',
'City_Bellevue',
'City_Seattle',
'lot_measure15',
'HouseLandRatio',
'City_Kent',
'quality_9',
'sight_4',
'City_Federal Way',
'coast_1',
'City_Mercer Island',
'City_Kirkland',
'City_Medina',
'City_Redmond',
'quality_11',
'ceil_measure',
'quality_7',
'City_Renton',
'City_Maple Valley',
'quality_6',
'total_area',
'quality_10',
'basement',
'City_Issaquah',
'City_Sammamish',
'condition_5']
The top 30 features are covering about 98% in gradient boosting model. This is very good coverage for just 30% of the variables
In [179]:
#Random Forest model

modelfit(RF1,X_train,y_train)

dtype: float64
dtype: float64
Out[179]:
['furnished_1',
'yr_built',
'living_measure',
'living_measure15',
'quality_8',
'HouseLandRatio',
'lot_measure15',
'quality_9',
'ceil_measure',
'City_Bellevue',
'total_area',
'lot_measure',
'City_Seattle',
'City_Kirkland',
'City_Kent',
'City_Federal Way',
'coast_1',
'basement',
'City_Mercer Island',
'quality_7',
'City_Redmond',
'sight_4',
'City_Renton',
'City_Maple Valley',
'City_Medina',
'City_Sammamish',
'quality_10',
'has_renovated_Yes',
'room_bath_2.5',
'room_bed_3']
The top 30 features are covering about 95% in random forest model
Now will extract the top 30 features from the above models
In [180]:
feat_list_GB1=modelfit(GB1,X_train,y_train, printFeatureImportance=False)
print(feat_list_GB1)
feat_list_RF1=modelfit(RF1,X_train,y_train, printFeatureImportance=False)
print(feat_list_RF1)
['furnished_1', 'living_measure', 'yr_built', 'living_measure15', 'quality_8', 'City_Bellevue', 'Cit

y_Seattle', 'lot_measure15', 'HouseLandRatio', 'City_Kent', 'quality_9', 'sight_4', 'City_Federal Wa
y', 'coast_1', 'City_Mercer Island', 'City_Kirkland', 'City_Medina', 'City_Redmond', 'quality_11', '
ceil_measure', 'quality_7', 'City_Renton', 'City_Maple Valley', 'quality_6', 'total_area', 'quality_
10', 'basement', 'City_Issaquah', 'City_Sammamish', 'condition_5']
['furnished_1', 'yr_built', 'living_measure', 'living_measure15', 'quality_8', 'HouseLandRatio', 'lo
t_measure15', 'quality_9', 'ceil_measure', 'City_Bellevue', 'total_area', 'lot_measure', 'basement',
'City_Kent', 'City_Kirkland', 'City_Federal Way', 'City_Seattle', 'quality_7', 'City_Mercer Island',
'City_Redmond', 'coast_1', 'City_Renton', 'sight_4', 'quality_10', 'City_Maple Valley', 'City_Sammam
ish', 'City_Medina', 'room_bed_4', 'condition_3', 'City_Issaquah']
From the above 2 feature list, we will consolidate all the features
In [181]:
Key_feat=list(set(feat_list_GB1).union(feat_list_RF1))
print(len(Key_feat))
print(Key_feat)
33
['City_Mercer Island', 'condition_5', 'City_Sammamish', 'yr_built', 'sight_4', 'City_Seattle', 'City
_Federal Way', 'City_Maple Valley', 'City_Bellevue', 'furnished_1', 'City_Kent', 'quality_9', 'City_
Redmond', 'City_Issaquah', 'quality_8', 'total_area', 'quality_7', 'ceil_measure', 'City_Medina', 'c
oast_1', 'condition_3', 'lot_measure15', 'HouseLandRatio', 'City_Kirkland', 'City_Renton', 'living_m
easure15', 'basement', 'room_bed_4', 'quality_6', 'lot_measure', 'quality_10', 'quality_11', 'living
_measure']
From two models we have 33 importance features. We will freeze on the above 33 list and make another dataframe (along with 'price')
In [182]:
dff33=dff[['price','basement', 'City_Bellevue', 'coast_1', 'HouseLandRatio', 'City_Seattle', 'quality_10', 'quali

ty_9', 'ceil_measure', 'City_Renton', 'City_Redmond', 'City_Federal Way', 'City_Mercer Island', 'yr_built', 'livi
ng_measure15', 'living_measure', 'City_Maple Valley', 'sight_3', 'total_area', 'City_Kirkland', 'sight_4', 'quali
ty_6', 'quality_7', 'City_Sammamish', 'quality_8', 'City_Kent', 'quality_12', 'lot_measure', 'condition_3', 'furn
ished_1', 'City_Issaquah', 'quality_11', 'City_Medina', 'lot_measure15']].copy()
In [183]:
dff33.shape
Out[183]:
(18287, 34)
In [184]:
dff33.head()
Out[184]:
price basement City_Bellevue coast_1 HouseLandRatio City_Seattle quality_10 quality_9 ceil_measure City_Renton ... quality_8 City_Kent qualit
17786 430000 0 0 0 19.0 1 0 0 2550 0 ... 1 0
3782 385500 420 0 0 16.0 0 0 0 1120 0 ... 0 0
10069 736000 0 1 0 16.0 0 0 1 2290 0 ... 0 0
7114 580000 970 0 0 24.0 1 0 0 970 0 ... 0 0
10080 315000 1160 0 0 22.0 1 0 0 1160 0 ... 0 0
In [185]:
X3 = dff33.drop("price" , axis=1)
y3 = dff33["price"]
X3_train, X3_test, y3_train, y3_test = train_test_split(X3, y3, test_size=0.2, random_state=10)

X3_train, X3_val, y3_train, y3_val = train_test_split(X3_train, y3_train, test_size=0.2, random_state=10)
print(X3_train.shape)
print(X3_test.shape)
print(X3_val.shape)
(11703, 33)
(3658, 33)
(2926, 33)
Eventhough PCA is helping us to reduce dimensions upto about 60 dimensions, we can see that in our random
forest model top 30 features are explaining the 95% variance in the regression and in gradient boosting model
top 30 features are covering 98% varience.
Hence we conclude that we will use features selection by considering the feature importance fucntion in
individual models. Thus we extracted 33 important features
HYPERTUNING with Gridsearch CV
In [186]:
from sklearn.model_selection import GridSearchCV

from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
Since we have better performance in gradient boosting model, we will hypertune the model for improving the score
Following are the parameters we tune for the gradient boosting model.
In [187]:
param_grid = {
'loss':['ls','lad','huber'],
'bootstrap': ['True','False'],
'max_depth': range(5,11,1),
'max_features': ['auto','sqrt'],
'learning_rate': [0.05,0.1,0.2,0.25],
'min_samples_leaf': [4,10,20],
'min_samples_split': [5,10,1000],
'n_estimators': [10,50,100,150,200],
'subsample':[0.8,1]
}
In [188]:
GBR_test=GradientBoostingRegressor(random_state=22)
First will tune each parameter separately
In [189]:
param_grid1 = {'n_estimators': range(50,401,50)}
In [190]:
grid_search1 = GridSearchCV(estimator = GBR_test, param_grid = param_grid1,

cv = 3, n_jobs = 2, verbose = 1)
In [191]:
grid_search1.fit(X_train,y_train)
grid_search1.best_params_
Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.

[Parallel(n_jobs=2)]: Done 24 out of 24 | elapsed: 57.6s finished
Out[191]:
{'n_estimators': 400}
In [192]:
grid_search1.best_params_, grid_search1.best_score_
Out[192]:
({'n_estimators': 400}, 0.7757647547223905)
n_estimators of 400 is best in range 50 to 400. Will test same until 1000
In [193]:



[Parallel(n_jobs=2)]: Done 12 out of 12 | elapsed: 1.3min finished
Out[193]:
GridSearchCV(cv=3, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'n_estimators': range(400, 1001, 200)},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)
In [194]:
grid_search2.cv_results_,grid_search2.best_params_, grid_search2.best_score_
Out[194]:
({'mean_fit_time': array([ 7.2032059 , 10.84747616, 14.41415413, 17.8464543 ]),

'std_fit_time': array([0.0866392 , 0.19536189, 0.14661922, 0.92177025]),
'mean_score_time': array([0.03063202, 0.04291979, 0.06155936, 0.07675632]),
'std_score_time': array([0.00097431, 0.00340029, 0.00733648, 0.01039824]),
'param_n_estimators': masked_array(data=[400, 600, 800, 1000],
mask=[False, False, False, False],
fill_value='?',
dtype=object),
'params': [{'n_estimators': 400},
{'n_estimators': 600},
{'n_estimators': 1000}],
'split0_test_score': array([0.77559185, 0.77864467, 0.77983937, 0.78052058]),
'split1_test_score': array([0.76537408, 0.77109939, 0.77235457, 0.7724209 ]),
'split2_test_score': array([0.78632834, 0.78828157, 0.78829273, 0.78811941]),
'mean_test_score': array([0.77576475, 0.77934188, 0.78016222, 0.78035363]),
'std_test_score': array([0.00855542, 0.0070319 , 0.00651073, 0.00640998]),
'rank_test_score': array([4, 3, 2, 1]),
'split0_train_score': array([0.86386211, 0.88101725, 0.89106634, 0.89835051]),
'mean_train_score': array([0.86142591, 0.87825044, 0.88820883, 0.89565627]),
'std_train_score': array([0.00275787, 0.00249875, 0.00218694, 0.00197394])},
0.7803536277850995)
In [195]:



Out[195]:
param_grid={'n_estimators': range(1000, 2000, 300)},
In [196]:
Out[196]:
({'n_estimators': 1000}, 0.7885965739886799)
n_estimators of 1000 is giving best result in range 400 to 1000
In [197]:
param_grid3 = {
'learning_rate': [0.1,0.2],
'min_samples_leaf': [5,10,20],
'min_samples_split': [5,10,20],
'n_estimators': [500,1000],
}
In [198]:


[Parallel(n_jobs=3)]: Done 44 tasks | elapsed: 5.1min
Out[198]:
param_grid={'learning_rate': [0.1, 0.2], 'min_samples_leaf': [5, 10, 20], 'min_samples_split'
: [5, 10, 20], 'n_estimators': [500, 1000]},
In [199]:
Out[199]:
({'learning_rate': 0.1,
'min_samples_leaf': 10,
'min_samples_split': 5,
'n_estimators': 1000},
0.7880978276736184)
In combination of 4 parameters above values are giving best result. We can see n_estimators of 1000 is best again. Now, will change the ranges
of other 3 parameters
In [200]:
param_grid4 = {
'max_depth': [5,10],
'min_samples_leaf': [5,8],
'min_samples_split': [20,30],
'n_estimators': [1000],
}
In [201]:


Out[201]:
param_grid={'learning_rate': [0.1, 0.15], 'max_depth': [5, 10], 'min_samples_leaf': [5, 8], '
min_samples_split': [20, 30], 'n_estimators': [1000]},
In [202]:
Out[202]:
'max_depth': 5,
0.7821899364744039)
Now the score has reduced compared to earlier run

In [203]:
param_grid5 = {
'learning_rate': [0.1],
'max_depth': [5],
'min_samples_leaf': [8,10],
}


Out[203]:
param_grid={'learning_rate': [0.1], 'max_depth': [5], 'min_samples_leaf': [8, 10], 'min_sampl
es_split': [30, 40], 'n_estimators': [1000]},
In [204]:
Out[204]:
'max_depth': 5,
0.7844535606632613)
Above score has improved from earlier runs

In [205]:
param_grid6 = {
'max_depth': [5],
'min_samples_leaf': [8],
}


Out[205]:
param_grid={'learning_rate': [0.1], 'max_depth': [5], 'min_samples_leaf': [8], 'min_samples_s
plit': [40, 50], 'n_estimators': [1000]},
In [206]:
Out[206]:
'max_depth': 5,
0.7828068526559553)
There is very marginal improvment in score. We are getting best score at min_samples_split of 40 among 30,40,50.
Will tune the final set of parameters along with above finalized ones
In [207]:
param_grid7 = {
'loss':['ls','lad','huber'],
'max_features': ['auto','sqrt'],
'max_depth': [5],
'min_samples_split': [40],
'subsample':[0.8,1]
}


Out[207]:
param_grid={'loss': ['ls', 'lad', 'huber'], 'max_features': ['auto', 'sqrt'], 'learning_rate'
: [0.1], 'max_depth': [5], 'min_samples_leaf': [8], 'min_samples_split': [40], 'n_estimators': [1000
], 'subsample': [0.8, 1]},
In [208]:
Out[208]:
'loss': 'huber',
'max_depth': 5,
'max_features': 'sqrt',
'n_estimators': 1000,
'subsample': 1},
0.7965973506104334)
There is improvement in the score. will try one more iteration with changing other parameters
In [209]:
param_gridF = {
'loss':['huber'],
'max_features': ['sqrt'],
'max_depth': [5,8],
'subsample':[1]
}
grid_searchF = GridSearchCV(estimator = GBR_test, param_grid = param_gridF,

grid_searchF.fit(X_train,y_train)
grid_searchF.best_params_,grid_searchF.best_score_

Out[209]:
'loss': 'huber',
'max_depth': 5,
'subsample': 1},
0.7958994895003749)
The above iteration gives best result of 0.799.

Final parameters that are giving best result on training set are:
'learning_rate': 0.1, 'loss': 'huber', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 50, 'n_estimators': 1000,
'subsample': 1 
Hypertuning using graph

In [210]:
min_samples_leafs = range(1, 15, 1)

train_results = []
val_results = []
for min_samples_leaf in min_samples_leafs:
GBR_test=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=40,
min_samples_leaf=min_samples_leaf,
max_depth=5,
random_state=22,
alpha=0.9,
)
GBR_test.fit(X_train,y_train)
y_GBR_predtr= GBR_test.predict(X_train)
y_GBR_predvl= GBR_test.predict(X_val)
result_leafs_tr=r2_score(y_GBR_predtr,y_train)
train_results.append(result_leafs_tr)
result_leafs_vl=r2_score(y_GBR_predvl,y_val)
val_results.append(result_leafs_vl)
from matplotlib.legend_handler import HandlerLine2D

line1, = plt.plot(min_samples_leafs,train_results,"b", label='Train r2')
line2, = plt.plot(min_samples_leafs, val_results,"r", label='val r2')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel("r2 score")
plt.xlabel("min samples leaf")
plt.show()
From above, min_samples_leaf of 6 is giving best score

In [211]:
min_samples_splits = [10,15,30,50,100,500,700,1000]
train_results_spt = []
val_results_spt = []
for min_samples_split in min_samples_splits:
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=min_samples_split,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)
result_spt_tr=r2_score(y_GBR_predtr,y_train)
train_results_spt.append(result_spt_tr)
result_spt_vl=r2_score(y_GBR_predvl,y_val)
val_results_spt.append(result_spt_vl)

line1, = plt.plot(min_samples_splits,train_results_spt,"b", label='Train R2')
line2, = plt.plot(min_samples_splits, val_results_spt,"r", label='Val R2')
plt.ylabel("R2 score")
plt.xlabel("min samples split")
plt.show()
From above, min_samples_splits of about 10 is giving best score. Will try expanding the range around 10
In [212]:
min_samples_splits = [10,15,20,30,40,50,60,70,80,90,100]
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)

plt.show()
From above, min_samples_splits of about 10 is giving best score

In [213]:
min_samples_splits = [7,8,9,10,11,12,13,14,15,20]
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)

plt.show()
From above, min_samples_splits of about 12 is giving best score

In [214]:
max_depths = range(3,11,1)
train_results_dpt = []
val_results_dpt = []
for max_depth in max_depths:
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_leaf=6,
max_depth=max_depth,
random_state=22,
alpha=0.9,
)
result_dpt_tr=r2_score(y_GBR_predtr,y_train)
train_results_dpt.append(result_dpt_tr)
result_dpt_vl=r2_score(y_GBR_predvl,y_val)
val_results_dpt.append(result_dpt_vl)

line1, = plt.plot(max_depths,train_results_dpt,"b", label='Train R2')
line2, = plt.plot(max_depths, val_results_dpt,"r", label='Val R2')
plt.xlabel("max depth")
plt.show()
From above, max_depth of about 6 is giving best score for validation set and not overfitting of training set
In [215]:
estimators = range(100,1500,100)
train_results_est = []
val_results_est = []
for n_estimators in estimators:
loss='huber',
learning_rate=0.1,
n_estimators=n_estimators,
subsample=1.0,
min_samples_leaf=6,
max_depth=9,
random_state=22,
alpha=0.9,
)
result_est_tr=r2_score(y_GBR_predtr,y_train)
train_results_est.append(result_est_tr)
result_est_vl=r2_score(y_GBR_predvl,y_val)
val_results_est.append(result_est_vl)

line1, = plt.plot(estimators,train_results_est,"b", label='Train R2')
line2, = plt.plot(estimators, val_results_est,"r", label='Val R2')
plt.xlabel("n_estimators")
plt.show()
From above, n_estimators of about 1000 is giving best score
In [217]:
param_gridF = {
'loss':['huber'],
'max_depth': [6],
'subsample':[1]
}

grid_searchF.best_score_

Out[217]:
0.7934419703161365
In [218]:
param_gridF = {
'loss':['huber'],
'max_depth': [5],
'subsample':[1]
}

grid_searchF.best_score_,grid_searchF.best_params_

Out[218]:
(0.7928868850462906,
{'learning_rate': 0.1,
'loss': 'huber',
'max_depth': 5,
'subsample': 1})
We can conclude from above that gridsearch CV is giving better results compared to that of tuning done by graphical method of individual
parameters
Final parameters that are giving best result on training set are:
'learning_rate': 0.1, 'loss': 'huber', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 50, 'n_estimators': 1000,
'subsample': 1 
CONFIDENCE INTERVAL
In [219]:
GBR_bestparam=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)
GBR_bestparam.fit(X_train,y_train)
y_GBRF_predtr= GBR_bestparam.predict(X_train)
y_GBRF_predvl= GBR_bestparam.predict(X_val)
y_GBRF_predts= GBR_bestparam.predict(X_test)
In [220]:

GBRF_trscore=r2_score(y_train,y_GBRF_predtr)
GBRF_trRMSE=np.sqrt(mean_squared_error(y_train, y_GBRF_predtr))
GBRF_trMSE=mean_squared_error(y_train, y_GBRF_predtr)
GBRF_trMAE=mean_absolute_error(y_train, y_GBRF_predtr)
GBRF_vlscore=r2_score(y_val,y_GBRF_predvl)
GBRF_vlRMSE=np.sqrt(mean_squared_error(y_val, y_GBRF_predvl))
GBRF_vlMSE=mean_squared_error(y_val, y_GBRF_predvl)
GBRF_vlMAE=mean_absolute_error(y_val, y_GBRF_predvl)
GBRF_tsscore=r2_score(y_test,y_GBRF_predts)
GBRF_tsRMSE=np.sqrt(mean_squared_error(y_test, y_GBRF_predts))
GBRF_tsMSE=mean_squared_error(y_test, y_GBRF_predts)
GBRF_tsMAE=mean_absolute_error(y_test, y_GBRF_predts)
GBRF_df=pd.DataFrame({'Method':['GBRF'],'Val Score':GBRF_vlscore,'RMSE_vl': GBRF_vlRMSE, 'MSE_vl': GBRF_vlMSE,'tr

ain Score':GBRF_trscore,'RMSE_tr': GBRF_trRMSE, 'MSE_tr': GBRF_trMSE,'test Score':GBRF_tsscore,'RMSE_ts': GBRF_ts
RMSE, 'MSE_ts': GBRF_tsMSE})
GBRF_df
Out[220]:
Method Val Score RMSE_vl MSE_vl train Score RMSE_tr MSE_tr test Score RMSE_ts MSE_ts
0 GBRF 0.80096 115867.988855 1.342539e+10 0.898909 81372.879729 6.621546e+09 0.793584 114695.310542 1.315501e+10
In [221]:
from sklearn.model_selection import KFold

num_folds = 50
seed = 7
kfold = KFold(n_splits=num_folds, random_state=seed)

model = GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22)
results = cross_val_score(GBR_bestparam, X, y, cv=kfold)
print(results)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))
[0.86054651 0.81529 0.80351765 0.86060958 0.79642892 0.85548539

0.78098527 0.77925365 0.81822936 0.82102096 0.87499995 0.81800409
0.81853737 0.82096864 0.82206478 0.85415595 0.7952127 0.77879311
0.85529758 0.83972439 0.76258618 0.80910137 0.80208101 0.82664724
0.7825543 0.8601369 0.77441922 0.78867005 0.84107987 0.79025948
0.84773597 0.76865873 0.78487112 0.80018574 0.82324413 0.82243794
0.74048912 0.82370621 0.82606705 0.83661657 0.79192532 0.8126131
0.79097264 0.81741328 0.76640402 0.77512715 0.78013298 0.7859921
0.73054971 0.76721522]
Accuracy: 80.798% (3.241%)
In [222]:
from matplotlib import pyplot

# plot scores
pyplot.hist(results)
pyplot.show()
# confidence intervals
alpha = 0.95 # for 95% confidence
p = ((1.0-alpha)/2.0) * 100 # tail regions on right and left .25 on each side indicated by P value (
border)
lower = max(0.0, np.percentile(results, p))
p = (alpha+((1.0-alpha)/2.0)) * 100
upper = min(1.0, np.percentile(results, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))
95.0 confidence interval 74.5% and 86.1%
Dataset-1 Final summary:

The ensemble models have performed well compared to that of linear,KNN,SVR models
The best performance is given by Gradient boosting model with training (score-0.89,RMSE-81372), Validation (score-0.80,RSME-115867),
Testing(score-0.79,RMSE-114695) The 95% confidence interval scores range from 0.72 to 0.85.
The top key features that drive the price of the property are: 'furnished_1', 'yr_built', 'living_measure','quality_8', 'HouseLandRatio',
'lot_measure15', 'quality_9', 'ceil_measure', 'total_area'.
The above data is also reinforced by the analysis done during bivariate analysis.
For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models.

Dataset-2
In [2]:
import geopandas as gpd

from shapely.geometry import Point, Polygon
#For current working directory
import os
cwd = os.getcwd()
In [224]:
## Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
USAZip=pd.read_excel("USA ZipCodes_1.xlsx",sheet_name="Sheet8")
USAZip.head()
Out[224]:
zipcode City County Type
2 98003 Federal Way King Standard
In [239]:
house_df = pd.read_csv('innercity.csv')
In [240]:
house_df1=house_df.merge(USAZip,how='left',on='zipcode')
#house_df.drop_duplicates()
house_df.shape
Out[240]:
(21613, 23)
In [5]:
#Add the folder WA to your current working directory

usa = gpd.read_file(cwd+'\\WA\\WSDOT__City_Limits.shp')
usa.head()
gdf = gpd.GeoDataFrame(
house_df,geometry = [Point(xy) for xy in zip(house_df['long'], house_df['lat'])])
#We can now plot our ``GeoDataFrame``
ax=usa[usa.CityName.isin(house_df.City.unique())].plot(
color='white', edgecolor='black',figsize=(20,8))
gdf.plot(ax=ax, color='green', marker='o',markersize=0.1)
Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ccf1142588>
In [241]:
#After analysis in p1 - Dropping 'cid','dayhours','basement','yr_built','yr_renovated','zipcode','lat','long','Co

unty','Type',
#'geometry','quality_group','month_year' columns.
cols=['cid','dayhours']
house_df_1=house_df.drop(cols, inplace = False, axis = 1)
The dataset worked earlier are giving r2 score on validation set in range 70%-75% with RMSE in range of 96000 to 155000. Trying with a
different dataset to see if this could be improved further.
For analysis in this iteration categorizing coast, furnished and quality. As in previous version tranformed many features but not got desired
result.
TREATING OUTLIERS
Removing data points which fall into below criteria:
1. living_measure greater than 9000

2. price greater than 4000000
3. romm_bed greater than 10
4. room_bath greater than 6
We have lost 20 records which is 0.09% of the data available. These records are extreme values for which we dont have much of data to provide
their better estimate. Hence removing them.
In [242]:
house_df_2=house_df_1[(house_df['living_measure']<=9000) & (house_df_1['price']<=4000000) &

(house_df_1['room_bed']<=10) & (house_df_1['room_bath']<=6) ]
house_df_2.shape
Out[242]:
(21593, 21)
In [243]:
house_df_2.columns
Out[243]:

'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure',
'basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long',
'living_measure15', 'lot_measure15', 'furnished', 'total_area'],
dtype='object')
In [252]:
# Convert into dummies

house_df_final = pd.get_dummies(house_df_2, columns=['coast', 'quality', 'furnished'],drop_first=True)
In [253]:
house_df_final.columns
Out[253]:

'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built',
'yr_renovated', 'zipcode', 'lat', 'long', 'living_measure15',
'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4',
'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'],
dtype='object')
In [254]:
house_df_final.shape
Out[254]:
(21593, 31)
In [268]:
#Final Data columns

house_df_final.columns
Out[268]:
'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built',
'yr_renovated', 'zipcode', 'lat', 'long', 'living_measure15',
'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4',
'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'],
dtype='object')
Shows the Data Correlation between Attributes with Heatmap
In [256]:
#total_area is highly correlated with lot_measure, ceil_measure is highly correlated with living_measure
house_corr_2 = house_df_final.corr(method ='pearson')
house_corr_2.to_excel("house_corr_2.xls")
sns.heatmap(house_corr_2,cmap="coolwarm", annot=True,annot_kws={"size":9},fmt='.2')
Out[256]:
<matplotlib.axes._subplots.AxesSubplot at 0x225943454a8>
In [257]:
#creating a copy of the final dataframe

dff2=house_df_final.copy()
In [258]:
df_train, df_test = train_test_split(dff2, test_size=0.2, random_state=10)

df_train, df_val = train_test_split(df_train, test_size=0.2, random_state=10)
In [259]:
print(df_train.shape)
print(df_test.shape)
print(df_val.shape)
(13819, 31)
(4319, 31)
(3455, 31)
In [260]:
# Split the 'df_train' set into X and y

X_train2 = df_train.drop(['price'],axis=1)
y_train2 = df_train['price']
len_train=len(X_train2)
X_train2.shape
y_train2.head()
Out[260]:
1320 330000
16628 245000
2923 369000
15818 532000
4665 506400
Name: price, dtype: int64
In [261]:
# Split the 'df_val' set into X and y

X_val2 = df_val.drop(['price'],axis=1)
y_val2 = df_val['price']
len_val=len(X_val2)
X_val2.shape
y_val2.head()
Out[261]:
6030 225000
16781 373500
17420 325000
4147 260000
17992 233000
In [262]:
# Split the 'df_test' set into X and y
X_test2 = df_test.drop(['price'],axis=1)
y_test2 = df_test['price']
X_test2.shape
len_test=len(X_test2)
y_test2.head()
Out[262]:
19155 510000
10450 264500
14277 266000
7601 735000
6563 600000
Will use XGboost model apart from models that used earlier on dataset-1
Creating Dataframe for Results and Function to compute the scores for each model on its Train and Validation
datasets
In [24]:
#Creating empty dataframe to capture results

result_dff=pd.DataFrame()
In [25]:
#Function to give results of the models for its train and validation dataset.
#as input it requries model name to display, algorithm, train indepedent variables, train dependent variable,
#validation indepedent variables, validation dependent variable.
def result (model,pipe_model,X_train_set,y_train_set,X_val_set,y_val_set):
pipe_model.fit(X_train_set,y_train_set)
y_train_predict= pipe_model.predict(X_train_set)
y_val_predict= pipe_model.predict(X_val_set)
trscore=r2_score(y_train_set,y_train_predict)
trRMSE=np.sqrt(mean_squared_error(y_train_set,y_train_predict))
trMSE=mean_squared_error(y_train_set,y_train_predict)
trMAE=mean_absolute_error(y_train_set,y_train_predict)
vlscore=r2_score(y_val,y_val_predict)
vlRMSE=np.sqrt(mean_squared_error(y_val,y_val_predict))
vlMSE=mean_squared_error(y_val,y_val_predict)
vlMAE=mean_absolute_error(y_val,y_val_predict)
result_df=pd.DataFrame({'Method':[model],'val score':vlscore,'RMSE_val':vlRMSE,'MSE_val':vlMSE,'MAE_vl': vlMA
E,
'train Score':trscore,'RMSE_tr': trRMSE,'MSE_tr': trMSE, 'MAE_tr': trMAE})
#Plot between actual and predicted values
sns.lineplot(range(len(y_val_set)),y_val_set,color='blue',linewidth=1.5)
sns.lineplot(range(len(y_val_set)),y_val_predict,color='hotpink',linewidth=.5)
plt.title('Actual and Predicted', fontsize=20) # Plot heading
plt.xlabel('Index', fontsize=10) # X-label
plt.ylabel('Values', fontsize=10) # Y-label
return result_df
LINEAR REGRESSION
In [26]:
#Starting with RFE first as there are many features

from sklearn.linear_model import LinearRegression
In [27]:
clf=LinearRegression()
pipe_lr = Pipeline([('LR', clf)])
result_dff=pd.concat([result_dff,result('Linear Reg',pipe_lr,X_train,y_train,X_val,y_val)])
result_dff
Out[27]:
Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
In [28]:
#checking the magnitude of coefficients

predictors = X_train.columns
coef = pd.Series(clf.coef_,predictors).sort_values()
coef.plot(kind='bar', title='Model Coefficients',color='darkblue',figsize=(10,5))
Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ccf527d438>
RIDGE REGRESSION
In [29]:
from sklearn.linear_model import Ridge

from sklearn.preprocessing import StandardScaler
In [30]:
clf=Ridge()
pipe_ridge = Pipeline([('Ridge', clf)])
result_dff=pd.concat([result_dff,result('Ridge_Reg_1',pipe_ridge,X_train,y_train,X_val,y_val)])
result_dff
Out[30]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

In [31]:

Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ccf5c6cd30>
In [32]:
#Iteration 2
clf=Ridge(alpha=0.08)
pipe_ridge_1 = Pipeline([('Ridge',clf )])
result_dff=pd.concat([result_dff,result('Ridge_Reg_2',pipe_ridge_1,X_train,y_train,X_val,y_val)])
result_dff
Out[32]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

In [33]:

Out[33]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ccf5d78358>
LASSO REGRESSION
In [34]:
from sklearn.linear_model import Lasso

In [35]:
clf=Lasso(alpha=10, max_iter=1000)
pipe_lasso_1 = Pipeline([('Lasso',clf )])
result_dff=pd.concat([result_dff,result('Lasso_Reg_1',pipe_lasso_1,X_train,y_train,X_val,y_val)])
result_dff
Out[35]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

In [36]:

coef = pd.Series(clf.coef_,predictors).sort_values(ascending=False)
coef
Out[36]:
quality_13 1282757.93547
quality_12 720634.25526
lat 603385.77684
coast_1 515951.63187
furnished_1 356060.28711
quality_11 254292.72718
quality_8 51062.02158
sight 48526.08682
quality_3 47977.01520
room_bath 44364.15660
condition 35706.89861
ceil 28507.08484
yr_renovated 23.50688
total_area 0.35066
quality_10 -0.00000
lot_measure -0.19029
lot_measure15 -0.29272
basement -8.54716
ceil_measure -15.67109
zipcode -512.28283
yr_built -2269.56051
quality_7 -16734.79835
room_bed -18988.08235
quality_6 -63515.14160
quality_4 -89548.16152
quality_5 -97142.30729
long -172566.03480
quality_9 -177720.13306
dtype: float64
KNN Regressor
In [37]:
from sklearn.neighbors import KNeighborsRegressor
pipe_knr = Pipeline([('KNNR', KNeighborsRegressor(n_neighbors=20,weights='distance'))])

result_dff=pd.concat([result_dff,result('KNN Reg',pipe_knr,X_train,y_train,X_val,y_val)])
result_dff
Out[37]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
Support Vector Regressor
In [38]:
#The model is not performing well at all.

#from sklearn.svm import SVR
#from sklearn.preprocessing import StandardScaler
#pipe_svr_1 = Pipeline([('scl', StandardScaler()),('SVR_1', SVR(kernel='rbf'))])

#result_dff=pd.concat([result_dff,result('SVR_1',pipe_svr_1,X_train_rfe,y_train,X_val_rfe,y_val)])
#result_dff
DECISION TREE
In [39]:
#Feature importance function

def feat_imp(model,X_data_set):
imp_feature_1=pd.DataFrame(model.feature_importances_, columns = ["Imp"], index = X_data_set.columns)
imp_feature_1=imp_feature_1.sort_values(by="Imp",ascending=False)
print(imp_feature_1)
#feature importance
imp_feature_1[:30].plot.bar(figsize=(15,5))
#First 20 and 30 feature importance sum

print("\nFirst 8 feature importance:\t",(imp_feature_1[:8].sum())*100)
print("\nFirst 12 feature importance:\t",(imp_feature_1[:12].sum())*100)
In [40]:
#Import library
from sklearn.tree import DecisionTreeRegressor
clf=DecisionTreeRegressor(random_state=1)
pipe_DT_1=Pipeline([('DT1',clf)])
result_dff=pd.concat([result_dff,result('DT1',pipe_DT_1,X_train,y_train,X_val,y_val)])
result_dff
Out[40]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
In [41]:
#Feature importance
feat_imp(clf,X_train)
Imp
furnished_1 0.33440
lat 0.17853
long 0.06748
coast_1 0.03510
yr_built 0.03233
lot_measure 0.01480
zipcode 0.01341
total_area 0.00832
quality_9 0.00781
room_bath 0.00697
sight 0.00633
quality_8 0.00496
basement 0.00436
condition 0.00266
quality_12 0.00247
quality_10 0.00206
room_bed 0.00199
ceil 0.00180
quality_13 0.00048
quality_11 0.00044
quality_7 0.00030
quality_6 0.00026
quality_5 0.00008
quality_4 0.00000
quality_3 0.00000

dtype: float64

dtype: float64
RANDOM FOREST REGRESSOR
In [42]:
from sklearn.ensemble import RandomForestRegressor
In [43]:
clf=RandomForestRegressor(random_state=2)
pipe_RF_1=Pipeline([('RF1',clf)])
result_dff=pd.concat([result_dff,result('RF1',pipe_RF_1,X_train,y_train,X_val,y_val)])
result_dff
Out[43]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
In [44]:
#Feature importance
Imp
furnished_1 0.30826
lat 0.17234
long 0.06825
yr_built 0.02564
coast_1 0.02493
sight 0.01985
zipcode 0.01531
quality_9 0.01243
total_area 0.01047
lot_measure 0.00850
room_bath 0.00705
basement 0.00688
quality_8 0.00417
room_bed 0.00380
condition 0.00321
quality_12 0.00262
ceil 0.00221
quality_11 0.00169
quality_10 0.00148
quality_13 0.00096
quality_7 0.00063
quality_6 0.00030
quality_5 0.00005
quality_4 0.00001
quality_3 0.00000

dtype: float64

dtype: float64

In [45]:
clf=RandomForestRegressor(n_estimators=50,max_depth=18,min_samples_leaf=10,random_state=3)
pipe_RF_2=Pipeline([('RF2',clf)])
result_dff=pd.concat([result_dff,result('RF2',pipe_RF_2,X_train,y_train,X_val,y_val)])
result_dff
Out[45]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
In [46]:
#Feature importance
Imp
furnished_1 0.34209
lat 0.18194
long 0.07106
yr_built 0.02336
sight 0.01984
zipcode 0.01135
quality_9 0.00908
coast_1 0.00864
total_area 0.00561
quality_8 0.00449
lot_measure 0.00336
room_bath 0.00277
basement 0.00172
quality_12 0.00139
condition 0.00123
quality_11 0.00095
room_bed 0.00073
quality_10 0.00073
quality_7 0.00044
ceil 0.00036
quality_6 0.00017
quality_5 0.00001
quality_4 0.00000
quality_3 0.00000
quality_13 0.00000

dtype: float64

dtype: float64
Gradient Boost Regressor

In [47]:
from sklearn.ensemble import GradientBoostingRegressor
clf=GradientBoostingRegressor(random_state=4)
pipe_GB_1=Pipeline([('GB1',clf)])
result_dff=pd.concat([result_dff,result('GB1',pipe_GB_1,X_train,y_train,X_val,y_val)])
result_dff
Out[47]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
In [48]:
#Feature importance
Imp
furnished_1 0.21738
lat 0.17507
long 0.06494
coast_1 0.03081
yr_built 0.03081
sight 0.02848
zipcode 0.01718
quality_9 0.01411
quality_12 0.00933
quality_8 0.00850
room_bath 0.00848
quality_11 0.00673
quality_13 0.00363
condition 0.00300
basement 0.00221
total_area 0.00147
lot_measure 0.00079
quality_7 0.00052
ceil 0.00048
quality_10 0.00046
room_bed 0.00037
quality_6 0.00017
quality_3 0.00000
quality_4 0.00000
quality_5 0.00000

dtype: float64

dtype: float64

In [49]:
clf=GradientBoostingRegressor(n_estimators=150,max_depth=5,random_state=5)
pipe_GB_2=Pipeline([('GB2',clf)])
result_dff=pd.concat([result_dff,result('GB2',pipe_GB_2,X_train,y_train,X_val,y_val)])
result_dff
Out[49]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
In [50]:
#Feature importance
Imp
furnished_1 0.22921
lat 0.17826
long 0.07063
yr_built 0.03118
coast_1 0.03031
quality_9 0.02114
sight 0.02033
zipcode 0.01644
quality_8 0.00939
quality_10 0.00815
total_area 0.00797
room_bath 0.00609
lot_measure 0.00417
basement 0.00412
quality_12 0.00352
quality_11 0.00347
condition 0.00311
quality_13 0.00196
room_bed 0.00107
ceil 0.00096
quality_7 0.00053
quality_6 0.00009
quality_5 0.00004
quality_3 0.00000
quality_4 0.00000

dtype: float64

dtype: float64
XGBOOST REGRESSOR
In [51]:
from xgboost.sklearn import XGBRegressor
clf=XGBRegressor(objective='reg:squarederror',random_state=6)
pipe_XGB_1=Pipeline([('XGB1',clf)])
result_dff=pd.concat([result_dff,result('XGB1',pipe_XGB_1,X_train,y_train,X_val,y_val)])
result_dff
Out[51]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
In [52]:
#Feature importance
Imp
furnished_1 0.44495
quality_9 0.15441
coast_1 0.04030
sight 0.03631
quality_8 0.03330
lat 0.03246
long 0.02696
quality_12 0.02049
yr_built 0.01917
room_bath 0.01360
zipcode 0.01226
quality_11 0.01098
quality_7 0.00875
quality_13 0.00664
condition 0.00428
basement 0.00252
lot_measure 0.00238
ceil 0.00213
total_area 0.00198
quality_6 0.00197
room_bed 0.00186
quality_3 0.00000
quality_4 0.00000
quality_5 0.00000
quality_10 0.00000

dtype: float32

dtype: float32

In [53]:
clf=XGBRegressor(n_estimators=150,max_depth=5,random_state=7)
pipe_XGB_2=Pipeline([('XGB2',clf)])
result_dff=pd.concat([result_dff,result('XGB2',pipe_XGB_2,X_train,y_train,X_val,y_val)])
result_dff
[18:09:21] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of re

g:squarederror.
Out[53]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
In [54]:
#Feature importance
Imp
furnished_1 0.59499
quality_9 0.06470
coast_1 0.04365
quality_8 0.03345
lat 0.03122
quality_10 0.03027
sight 0.02396
long 0.01961
quality_12 0.01526
yr_built 0.01016
quality_11 0.00679
quality_13 0.00676
zipcode 0.00626
quality_7 0.00439
condition 0.00407
total_area 0.00362
room_bath 0.00278
lot_measure 0.00217
basement 0.00205
quality_6 0.00153
ceil 0.00153
room_bed 0.00113
quality_5 0.00055
quality_4 0.00000
quality_3 0.00000

dtype: float32

dtype: float32
ADABOOST REGRESSOR
In [55]:
from sklearn.ensemble import AdaBoostRegressor
clf= AdaBoostRegressor(DecisionTreeRegressor(random_state=8))
pipe_ADAB_1=Pipeline([('ADAB1',clf)])
result_dff=pd.concat([result_dff,result('ADAB1',pipe_ADAB_1,X_train,y_train,X_val,y_val)])
result_dff
Out[55]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
In [56]:
#Feature importance
Imp
lat 0.09959
furnished_1 0.06601
long 0.06142
coast_1 0.04096
sight 0.03042
yr_built 0.01886
zipcode 0.01391
total_area 0.01116
room_bath 0.01004
lot_measure 0.00888
quality_11 0.00824
basement 0.00793
quality_12 0.00540
quality_13 0.00373
quality_9 0.00355
room_bed 0.00343
ceil 0.00261
condition 0.00235
quality_8 0.00226
quality_10 0.00209
quality_7 0.00055
quality_6 0.00017
quality_5 0.00002
quality_4 0.00000
quality_3 0.00000

dtype: float64

dtype: float64

In [57]:
clf= AdaBoostRegressor(DecisionTreeRegressor(max_depth=20),n_estimators=250,learning_rate=0.005,random_state=9)
pipe_ADAB_2=Pipeline([('ADAB2',clf)])
result_dff=pd.concat([result_dff,result('ADAB2',pipe_ADAB_2,X_train,y_train,X_val,y_val)])
result_dff
Out[57]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
In [58]:
#Feature importance
Imp
furnished_1 0.22982
lat 0.16848
long 0.07221
coast_1 0.02876
yr_built 0.02456
sight 0.01669
zipcode 0.01550
total_area 0.01060
lot_measure 0.00862
quality_9 0.00846
room_bath 0.00701
basement 0.00560
quality_8 0.00364
room_bed 0.00335
condition 0.00317
quality_12 0.00265
quality_11 0.00260
ceil 0.00218
quality_10 0.00175
quality_13 0.00100
quality_7 0.00080
quality_6 0.00032
quality_5 0.00008
quality_4 0.00001
quality_3 0.00000

dtype: float64

dtype: float64
BAGGING REGRESSION
In [59]:
from sklearn.ensemble import BaggingRegressor
clf= BaggingRegressor(random_state=10)
pipe_BAG_1=Pipeline([('BAG1',clf)])
result_dff=pd.concat([result_dff,result('BAG1',pipe_BAG_1,X_train,y_train,X_val,y_val)])
result_dff
Out[59]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

In [60]:
#Feature Importance
feature_importances = np.mean([ tree.feature_importances_ for tree in clf.estimators_], axis=0)
bg_imp_feature=pd.DataFrame(feature_importances, columns = ["Imp"],index=X_train.columns)
bg_imp_feature.sort_values(by="Imp",ascending=False)
Out[60]:
Imp
furnished_1 0.32952
lat 0.17412
long 0.06964
yr_built 0.03000
coast_1 0.02448
zipcode 0.01548
sight 0.01531
total_area 0.00974
quality_9 0.00967
lot_measure 0.00809
room_bath 0.00737
basement 0.00434
room_bed 0.00403
quality_8 0.00399
condition 0.00313
ceil 0.00228
quality_11 0.00182
quality_12 0.00148
quality_10 0.00137
quality_13 0.00084
quality_7 0.00068
quality_6 0.00042
quality_5 0.00010
quality_4 0.00001
quality_3 0.00000
In [61]:
clf= BaggingRegressor(DecisionTreeRegressor(max_depth=12),n_estimators=250,random_state=11)
pipe_BAG_2=Pipeline([('BAG2',clf)])
result_dff=pd.concat([result_dff,result('BAG2',pipe_BAG_2,X_train,y_train,X_val,y_val)])
result_dff
Out[61]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

In [62]:
#Feature Importance
pd.options.display.float_format = '{:.5f}'.format
feature_importances = np.mean([ tree.feature_importances_ for tree in clf.estimators_], axis=0)
bg_imp_feature=pd.DataFrame(feature_importances, columns = ["Imp"],index=X_train.columns)
bg_imp_feature.sort_values(by="Imp",ascending=False)
Out[62]:
Imp
furnished_1 0.31748
lat 0.17613
long 0.06834
coast_1 0.02891
yr_built 0.02585
sight 0.01504
zipcode 0.01456
quality_9 0.00935
total_area 0.00814
lot_measure 0.00665
room_bath 0.00590
basement 0.00445
quality_8 0.00431
quality_12 0.00272
quality_11 0.00231
condition 0.00229
room_bed 0.00227
ceil 0.00158
quality_10 0.00150
quality_13 0.00076
quality_7 0.00048
quality_6 0.00021
quality_5 0.00006
quality_4 0.00000
quality_3 0.00000
In [ ]:
Dataset-2 model performance Summary
We have used Linear Regression, Ridge and Lasso, KNN, Ensemble Techniques - Decision Trees, Random Forest, Bagging, AdaBoost,
Gradient Boost and XGBoost - its gradient boost with regularization and its faster. R2 score on validation in range 70%-87% with RMSE in range
76000-107000. The model is showing better results.Lets hypertune to see if results could be improved further. Will use Random Forest,
Gradient Boosting, XGBoost and AdaBoost hypertuning. Dropping features which are zero or very close to zero in all above 4 algos -
quality_12, quality_3, quality_4.
Kindly refer Excel sheet to compare the results.
In [ ]:
#Dropping features
X_train_ht=X_train.drop(['quality_5', 'quality_3', 'quality_4'],1)
X_test_ht=X_test.drop(['quality_5', 'quality_3', 'quality_4'],1)
X_val_ht=X_val.drop(['quality_5', 'quality_3', 'quality_4'],1)
In [ ]:
skf = KFold(n_splits=5, random_state=12)
RANDOM FOREST HYPERTUNE
In [65]:
#Tuning of Random Forest

RF_ht = RandomForestRegressor()
params = {"n_estimators": np.arange(76,84,1),"max_depth": np.arange(16,20,1),

"max_features":np.arange(6,9,1),'min_samples_leaf': range(5, 8, 1),
'min_samples_split': range(18, 20, 1)}
RF_GV_1 = GridSearchCV(estimator = RF_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs=2)

RF_GV_1.fit(X_train_ht,y_train)

[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 35.0s
Out[65]:
GridSearchCV(cv=KFold(n_splits=5, random_state=12, shuffle=False),

error_score='raise-deprecating',
estimator=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
oob_score=False, random_state=None, verbose=0, warm_start=False),
param_grid={'n_estimators': array([76, 77, 78, 79, 80, 81, 82, 83]), 'max_depth': array([16,
17, 18, 19]), 'max_features': array([6, 7, 8]), 'min_samples_leaf': range(5, 8), 'min_samples_split'
: range(18, 20)},
pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
In [66]:
# results of grid search CV

RF_results = pd.DataFrame(RF_GV_1.cv_results_)
#parameters best value

best_score_rf = RF_GV_1.best_score_
best_rf = RF_GV_1.best_params_
best_rf
Out[66]:
{'max_depth': 18,
'max_features': 8,
'n_estimators': 81}
In [67]:
rf_best = RandomForestRegressor(max_depth= 18, max_features= 8,n_estimators=80,min_samples_leaf=5,min_samples_spl

it=18,
random_state=14)
result_dff=pd.concat([result_dff,result('RF_ht',rf_best,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[67]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898

In [68]:
#Feature importance
feat_imp(rf_best,X_train_ht)
Imp
furnished_1 0.16841
lat 0.15958
long 0.05371
room_bath 0.04081
yr_built 0.03216
sight 0.02628
zipcode 0.02266
quality_9 0.02174
coast_1 0.01627
basement 0.01216
quality_8 0.01187
total_area 0.01125
quality_11 0.00995
lot_measure 0.00946
quality_10 0.00673
quality_7 0.00643
condition 0.00437
quality_12 0.00272
quality_6 0.00216
room_bed 0.00135
ceil 0.00118
quality_13 0.00057

dtype: float64

dtype: float64
GRADIENT BOOST HYPERTUNE

In [69]:
GB_ht=GradientBoostingRegressor()
params = {"n_estimators": [138,142,1],"learning_rate":[0.08,0.09],"max_depth": np.arange(8, 11,1),
"max_features":np.arange(5,8,1),'min_samples_leaf': range(16, 21, 1)}
GB_GV_1 = GridSearchCV(estimator = GB_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs=2)
GB_GV_1.fit(X_train_ht,y_train)

GB_results = pd.DataFrame(GB_GV_1.cv_results_)
best_score_rf = GB_GV_1.best_score_
best_gb = GB_GV_1.best_params_
best_gb

Out[69]:
{'learning_rate': 0.09,
'max_depth': 8,
'max_features': 7,
'n_estimators': 142}
In [70]:
gb_best = GradientBoostingRegressor(learning_rate= 0.09, n_estimators= 150,max_depth= 10,

max_features= 7,min_samples_leaf=19)
result_dff=pd.concat([result_dff,result('GB_ht',gb_best,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[70]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244

In [71]:
#Feature importance
feat_imp(gb_best,X_train_ht)
Imp
lat 0.15607
furnished_1 0.13874
long 0.06063
room_bath 0.04955
sight 0.03128
yr_built 0.02943
coast_1 0.02644
zipcode 0.02530
quality_9 0.01336
total_area 0.01053
lot_measure 0.01020
basement 0.00833
condition 0.00829
quality_7 0.00649
quality_12 0.00592
quality_8 0.00494
quality_11 0.00487
quality_10 0.00320
quality_6 0.00318
room_bed 0.00243
ceil 0.00125
quality_13 0.00000

dtype: float64

dtype: float64
ADABOOST HYPERTUNE
In [72]:
ADAB_ht=AdaBoostRegressor(DecisionTreeRegressor(max_depth=28))
params = {"n_estimators": [176,182,1],"learning_rate":[0.4,0.5,0.6],'loss':['linear','square']}
ADAB_GV_1 = GridSearchCV(estimator = ADAB_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs
=2)
ADAB_GV_1.fit(X_train_ht,y_train)

Out[72]:

estimator=AdaBoostRegressor(base_estimator=DecisionTreeRegressor(criterion='mse', max_depth=2
8, max_features=None,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best'),
learning_rate=1.0, loss='linear', n_estimators=50,
random_state=None),
param_grid={'n_estimators': [176, 182, 1], 'learning_rate': [0.4, 0.5, 0.6], 'loss': ['linear
', 'square']},
In [73]:

ADAB_results = pd.DataFrame(ADAB_GV_1.cv_results_)
best_score_rf = ADAB_GV_1.best_score_
best_adab = ADAB_GV_1.best_params_
best_adab
Out[73]:
{'learning_rate': 0.5, 'loss': 'linear', 'n_estimators': 176}
In [74]:
adab_best = AdaBoostRegressor(DecisionTreeRegressor(max_depth=28),n_estimators=180,learning_rate=0.5,loss='linear
',
random_state=15)
result_dff=pd.concat([result_dff,result('ADAB_ht',adab_best,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[74]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670

In [75]:
#Feature importance
feat_imp(adab_best,X_train_ht)
Imp
furnished_1 0.10561
lat 0.09726
long 0.05701
coast_1 0.03784
sight 0.02427
yr_built 0.01993
zipcode 0.01534
room_bath 0.01240
total_area 0.01094
lot_measure 0.00964
basement 0.00810
quality_9 0.00798
quality_11 0.00571
quality_12 0.00445
room_bed 0.00382
condition 0.00291
quality_10 0.00286
ceil 0.00268
quality_8 0.00253
quality_13 0.00252
quality_7 0.00073
quality_6 0.00017

dtype: float64

dtype: float64
XGBoost Regressor
In [76]:
#Regularization using GridSearchCV - 1st Iteration

XGB_ht_1=XGBRegressor(objective='reg:squarederror')
params1 = {
"colsample_bytree": [i/100.0 for i in range(66,74,2)],
"learning_rate": [0.2,0.22,0.24],
"n_estimators": [185,188,1],
"subsample": [i/100.0 for i in range(62,68,1)]
}
XGB_GV_1 = GridSearchCV(estimator = XGB_ht_1, param_grid = params1,
cv=skf,
verbose = 1,
return_train_score=True,n_jobs=2)
XGB_GV_1.fit(X_train_ht,y_train)

[Parallel(n_jobs=2)]: Done 1077 out of 1080 | elapsed: 7.5min remaining: 1.2s
Out[76]:

estimator=XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
importance_type='gain', learning_rate=0.1, max_delta_step=0,
max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
n_jobs=1, nthread=None, objective='reg:squarederror',
random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
seed=None, silent=None, subsample=1, verbosity=1),
param_grid={'colsample_bytree': [0.66, 0.68, 0.7, 0.72], 'learning_rate': [0.2, 0.22, 0.24],
'n_estimators': [185, 188, 1], 'subsample': [0.62, 0.63, 0.64, 0.65, 0.66, 0.67]},
In [77]:

XGB_results_1 = pd.DataFrame(XGB_GV_1.cv_results_)
best_score_xgb_1 = XGB_GV_1.best_score_
best_xgb_1 = XGB_GV_1.best_params_
best_xgb_1
Out[77]:
{'colsample_bytree': 0.68,
'learning_rate': 0.2,
'subsample': 0.67}
In [78]:
#Choosing best parameter from 1st Iteration

xgb_best_1 = XGBRegressor(colsample_bytree=0.7,learning_rate=0.22,n_estimators=186,subsample=0.65,objective='reg:
squarederror',
random_state=16)
result_dff=pd.concat([result_dff,result('xgb_1_ht',xgb_best_1,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[78]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670

In [79]:
#Feature importance
feat_imp(xgb_best_1,X_train_ht)
Imp
furnished_1 0.45991
quality_9 0.07530
lat 0.05495
sight 0.04581
coast_1 0.04306
quality_8 0.02980
long 0.02709
quality_12 0.02143
quality_6 0.01387
quality_11 0.01343
quality_13 0.01221
zipcode 0.01197
room_bath 0.01166
yr_built 0.01035
condition 0.01023
quality_10 0.00940
basement 0.00661
total_area 0.00550
ceil 0.00534
lot_measure 0.00348
room_bed 0.00313
quality_7 0.00312

dtype: float32

dtype: float32

In [80]:
#Regularization using GridSearchCV - 2nd Iteration
params2 = {
'min_child_weight':[6,7,8,9,10],"max_depth": [3,4,5],
}
xgb_best_2 = GridSearchCV(estimator = xgb_best_1, param_grid = params2,

cv=skf,
verbose = 1,
return_train_score=True,n_jobs=2)
xgb_best_2.fit(X_train_ht, y_train)

XGB_results_2 = pd.DataFrame(xgb_best_2.cv_results_)
XGB_results_2

best_score_xgb_2 = xgb_best_2.best_score_
best_xgb_2 = xgb_best_2.best_params_
best_xgb_2

Out[80]:
{'max_depth': 5, 'min_child_weight': 7}
In [81]:
#Choosing best parameter from 2nd Iteration

squarederror',
random_state=17,max_depth=4,min_child_weight=8)
result_dff
Out[81]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670
0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924

In [82]:
#Feature importance
Imp
furnished_1 0.46173
quality_9 0.08325
coast_1 0.06976
lat 0.04473
sight 0.02880
room_bath 0.02802
quality_8 0.02780
long 0.02034
quality_10 0.01916
quality_7 0.01823
yr_built 0.01647
quality_12 0.01277
zipcode 0.01129
quality_11 0.01026
quality_13 0.00990
condition 0.00637
lot_measure 0.00535
room_bed 0.00455
total_area 0.00408
ceil 0.00406
basement 0.00383
quality_6 0.00243

dtype: float32

dtype: float32

In [83]:
#Regularization using GridSearchCV - 3rd Iteration
params3 = {
'gamma':[i/1.0 for i in range(50,55,1)]
}
xgb_best_3 = GridSearchCV(estimator = xgb_best_2, param_grid = params3,

cv=skf,
verbose = 1,
return_train_score=True)
xgb_best_3.fit(X_train_ht, y_train)

XGB_results_3 = pd.DataFrame(xgb_best_3.cv_results_)
XGB_results_3

best_score_xgb_3 = xgb_best_3.best_score_
best_xgb_3 = xgb_best_3.best_params_
best_xgb_3
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.

Out[83]:
{'gamma': 50.0}
In [84]:
#Choosing best parameter from 3rd Iteration

squarederror',
random_state=18,max_depth=4,min_child_weight=8,reg_lambda=52)
result_dff
Out[84]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670
0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924
0 xgb_3_ht 0.89860 108356.33811 11741096009.16987 67404.86000 0.93004 92192.80765 8499513782.44646 60276.22610

In [85]:
#Feature importance
Imp
furnished_1 0.55026
coast_1 0.04955
sight 0.04906
lat 0.04026
quality_8 0.03380
long 0.01515
quality_6 0.01480
quality_11 0.01418
quality_12 0.01409
quality_9 0.00959
zipcode 0.00910
condition 0.00887
quality_10 0.00752
yr_built 0.00665
total_area 0.00638
quality_13 0.00578
room_bath 0.00527
room_bed 0.00497
ceil 0.00429
lot_measure 0.00399
basement 0.00379
quality_7 0.00128

dtype: float32

dtype: float32
We have executed many models and post comparing results we hyper tuned four models. All models are working well with R2 score greater
than 86% RMSE is below 132600.
But best of of all is Xtreme Gradient boost - which is enhanced version of gradient boost. It includes regularisation and is faster too. Its giving
R2 score of around 89.5% with RMSE of around 109000.
Moving forward this model can be improved further as dont have much data for very high priced houses. So when more data comes in we can
revisit our model and make mecessary changes to accommodate more variation in data to deliver better results, maybe try to decrease RMSE.
Finally lets run our model on test data, which we havent used till now and see how it performs.
Executing xgb_3_ht on test data set

In [86]:
result_dff=pd.concat([result_dff,result('xgb_test',xgb_best_3,X_test_ht,y_test,X_val_ht,y_val)])
result_dff
Out[86]:
0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426
0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158
0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012
0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034
0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898
0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898
0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866
0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388
0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443
0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107
0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605
0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395
0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105
0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662
0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947
0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140
0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898
0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244
0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670
0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670
0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924
0 xgb_3_ht 0.89860 108356.33811 11741096009.16987 67404.86000 0.93004 92192.80765 8499513782.44646 60276.22610
0 xgb_test 0.87484 120381.47322 14491699093.75940 72694.97007 0.94998 78343.92038 6137769859.95983 53777.32335

In [87]:
#Feature importance
feat_imp(xgb_best_3,X_test_ht)
Imp
furnished_1 0.53507
sight 0.05095
coast_1 0.03399
lat 0.03394
quality_9 0.02366
quality_8 0.02151
long 0.01854
quality_7 0.01645
room_bath 0.01410
condition 0.01068
yr_built 0.00925
quality_11 0.00810
zipcode 0.00625
total_area 0.00579
quality_6 0.00567
basement 0.00538
quality_12 0.00534
quality_10 0.00493
lot_measure 0.00478
ceil 0.00428
room_bed 0.00379
quality_13 0.00000

dtype: float32

dtype: float32
CALCULATING CONFIDENCE INTERVAL ON THE FINAL SELECTED MODEL at 95% ALPHA

In [88]:
from sklearn.model_selection import KFold

num_folds = 200
seed = 7
kfold = KFold(n_splits=num_folds, random_state=seed)

results = cross_val_score(xgb_best_3, X_test_ht, y_test, cv=kfold)
print(results)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))
[ 0.90391651 0.90916102 0.90837406 0.92102958 0.84137223 0.85154394

0.79054082 0.97225657 0.62255882 0.95655541 0.87971233 0.54938847
0.919873 0.90449203 0.91960521 0.82845636 0.85087562 0.84624357
0.84949297 0.79902964 0.88093633 0.7965441 0.85767605 0.89117899
0.87695964 0.81590065 0.77554087 0.82172976 0.89524705 0.60028268
0.91819488 0.7676954 0.92467382 0.76400042 -0.01087648 0.94301005
0.7988163 0.8973989 0.80375734 0.87449297 0.95865757 0.9275524
0.9097657 0.91836083 0.92456681 0.96787804 0.8355066 0.97563326
0.90399211 0.89793941 0.85086961 0.89391916 0.59636222 0.94398635
0.53656514 0.87802398 0.86956142 0.86946016 0.82775075 0.90893744
0.92036889 0.92163685 0.81946895 0.9143283 0.81252437 0.92824432
0.75878566 0.81404196 0.87121462 0.73438774 0.80718153 0.88708332
0.91354842 0.52667519 0.94112667 0.93731003 0.94483886 0.97033654
0.76244928 0.93123175 0.77286008 0.87546557 0.60705664 0.72760754
0.82665212 0.91951727 0.94649817 0.93530476 0.91908615 0.94478304
0.93804561 0.80743798 0.95095218 0.84086034 0.94263966 0.85434296
0.8939842 0.91195926 0.89329183 0.94217187 0.92094018 0.92534352
0.84231454 0.80070691 0.78969709 0.89154176 0.75224552 0.98563106
0.96707234 0.90153511 0.77089402 0.89182195 0.89960071 0.85305716
0.94549166 0.86431631 0.85722134 0.67693538 0.90097462 0.92198301
0.78518065 0.76819692 0.88903017 0.90340532 0.89964216 0.71263816
0.98670033 0.85944924 0.81788499 0.90645091 0.77838803 0.86403478
0.85040232 0.73824728 0.93391523 0.89215502 0.9170631 0.86449047
0.81659417 0.87965375 0.89630691 0.75384405 0.91273398 0.90846708
0.98175881 0.89090127 0.87495474 0.94566111 0.88549609 0.78429757
0.8835784 0.83106831 0.71277922 0.92337898 0.96179742 0.70433655
0.87525256 0.62843049 0.92354528 0.93623984 0.88524244 0.86559362
0.78977878 0.93659078 0.92459342 0.89326338 0.77853101 0.88929344
0.75543453 0.76270482 0.91536853 0.77264839 0.73741813 0.96582459
0.89034114 0.81234031 0.81053727 0.86102493 0.97418468 0.94098004
0.90470082 0.89779213 0.77860791 0.92766247 0.66861 0.30180163
0.7851057 0.91198086 0.87794581 0.84816996 0.93551467 0.97131443
0.93234322 0.74688263 0.69960959 0.93554804 0.94104945 0.92845367
0.82424248 0.77653242]
Accuracy: 85.137% (11.459%)
In [89]:
from matplotlib import pyplot

# plot scores
pyplot.hist(results)
pyplot.show()
# confidence intervals
alpha = 0.95 # for 95% confidence
p = ((1.0-alpha)/2.0) * 100 # tail regions on right and left .25 on each side indicated by P value (border)
lower = max(0.0, np.percentile(results, p))
p = (alpha+((1.0-alpha)/2.0)) * 100
upper = min(1.0, np.percentile(results, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))
print('Average accuracy result on test data is %.3f%%:' % (np.mean(results)*100))
95.0 confidence interval 59.5% and 97.2%

Average accuracy result on test data is 85.137%:
In [92]:
sns.jointplot(x=y_val, y=xgb_best_3.predict(X_val_ht), kind="reg", color="k")

plt.title('Actual and Predicted', fontsize=20) # Plot heading
plt.xlabel('Actual', fontsize=10) # X-label
plt.ylabel('Predicted', fontsize=10)
plt.tight_layout()
Dataset-2 Final summary

Finally we have the result, our final selected model is performing well on the test data R2 score of around 87.0% with RMSE of around 120000.

Most important feature for pricing is furnished.The furnished house is priced higher.
Some other important features that affect price the most are living measure, latitude, above average quality of house and coastal house. So,
one needs to thoroughly introspect its property on parameters suggested and list its price accordingly, similarly if one wants buy house -
needs to check the features suggested above in house and calculate the predicted price. The same can than be compared to listed price.
Dataset-1 Final summary:

The ensemble models have performed well compared to that of linear,KNN,SVR models
The best performance is given by Gradient boosting model with training (score-0.89,RMSE-81372), Validation (score-0.80,RSME-115867),
Testing(score-0.79,RMSE-114695) The 95% confidence interval scores range from 0.72 to 0.85.
The top key features that drive the price of the property are: 'furnished_1', 'yr_built', 'living_measure','quality_8', 'HouseLandRatio',
'lot_measure15', 'quality_9', 'ceil_measure', 'total_area'.
The above data is also reinforced by the analysis done during bivariate analysis.
For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models.

CONCLUSION:
We have build different models on 2 datasets. The performance (score and 95% confidence interval scores) of the model build on dataset-1 is
better than dataset-2 as the 95% confidence interval of dataset-1 is very narrow compared to that of dataset-2. Even though the score of
dataset-2 model is higher, the model has very vast range of performance scores.
The top key features to consider for pricing a property are:'furnished_1', 'yr_built', 'living_measure','quality_8', 'lot_measure15', 'quality_9',
'ceil_measure', 'total_area'. These are almost similar in both the models
So, one needs to thoroughly introspect its property on parameters suggested and list its price accordingly, similarly if one wants buy house -
needs to check the features suggested above in house and calculate the predicted price. The same can than be compared to listed price.
For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models. Making
polynomial features and improvising the model performance can also be explored further.
Pickle file Creation
First we will define the function for data-preprocessing that is required to run through the model. Then we will recall the same for predicting the
price(target) of the property.
The pickle file is created as per the steps followed for dataset-2.
In [9]:
#Defining Funcation to process all required steps as done in model

def model(data):
import pandas as pd
import numpy as np
X_test = pd.read_excel(data)
#Removing outliers
X_test_1=X_test[(X_test['living_measure']<=9000) & (X_test['price']<=4000000) &
(X_test['room_bed']<=10) & (X_test['room_bath']<=6)]
cols=['cid','dayhours']
X_test_1=X_test.drop(cols, inplace = False, axis = 1)
#columns to be converted to category

categ=['coast', 'furnished','quality']
#X_test_2=X_test_1[categ].astype('category')
# Concatenate X_test_dummy_1 variables with X_test_2

#X_test_final = pd.concat([X_test_1, X_test_2], axis=1)
X_test_final=X_test_1.copy()
for i in range(1,2):
X_test_final['coast_'+str(i)]=0
X_test_final['furnished_'+str(i)]=0
X_test_final['quality_'+str(i)]=0
if ((X_test_final['coast']==i).bool()):
X_test_final['coast_'+str(i)]=1
if ((X_test_final['furnished']==i).bool()):
X_test_final['furnished_'+str(i)]=1
if ((X_test_final['quality']==i).bool()):
X_test_final['quality_'+str(i)]=1
X_test_final=X_test_final.drop([ 'quality_3', 'quality_4', 'quality_1', 'quality_2', 'quality_5','price'],1)
# Drop categorical variable columns
X_test_final = X_test_final.drop(X_test_final[categ], axis=1)
return X_test_final
Test run on pickle file:
In [ ]:
import pickle
with open('model_pickle','wb') as f:
pickle.dump(xgb_best_3,f)
In [11]:
with open('model_pickle','rb') as f:
mp=pickle.load(f)
In [14]:
X_test=model('innercity.xlsx')
mp.predict(X_test)
#X_test.columns
Out[14]:
array([314002.16], dtype=float32)
We can see that with the given parameters, pickle file has run through the model and given predicted price of
the property
In [ ]:

Capstone Project Report

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Capstone Project Report

Uploaded by

Copyright:

Available Formats

HOUSE PRICE PREDICTION

Prerequisites for the running the file:

# loading the library required for data loading and processing

# read the data using pandas function from 'innercity.csv' file

We have more than 21k records having 23 features

# let's check out the columns/features we have in the dataset

Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',

These columns provide below information

# let's see the data types of the features

# let's check whether our dataset have any null/missing values

# let's do the 5 - factor analysis of the features

count mean std min 25% 50% 75% max

cid 21613.0 4.580302e+09 2.876566e+09 1.000102e+06 2.123049e+09 3.904930e+09 7.308900e+09 9.900000e+09

price 21613.0 5.401822e+05 3.673622e+05 7.500000e+04 3.219500e+05 4.500000e+05 6.450000e+05 7.700000e+06

room_bed 21613.0 3.370842e+00 9.300618e-01 0.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 3.300000e+01

room_bath 21613.0 2.114757e+00 7.701632e-01 0.000000e+00 1.750000e+00 2.250000e+00 2.500000e+00 8.000000e+00

living_measure 21613.0 2.079900e+03 9.184409e+02 2.900000e+02 1.427000e+03 1.910000e+03 2.550000e+03 1.354000e+04

lot_measure 21613.0 1.510697e+04 4.142051e+04 5.200000e+02 5.040000e+03 7.618000e+03 1.068800e+04 1.651359e+06

ceil 21613.0 1.494309e+00 5.399889e-01 1.000000e+00 1.000000e+00 1.500000e+00 2.000000e+00 3.500000e+00

coast 21613.0 7.541757e-03 8.651720e-02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00

sight 21613.0 2.343034e-01 7.663176e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 4.000000e+00

condition 21613.0 3.409430e+00 6.507430e-01 1.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 5.000000e+00

quality 21613.0 7.656873e+00 1.175459e+00 1.000000e+00 7.000000e+00 7.000000e+00 8.000000e+00 1.300000e+01

ceil_measure 21613.0 1.788391e+03 8.280910e+02 2.900000e+02 1.190000e+03 1.560000e+03 2.210000e+03 9.410000e+03

basement 21613.0 2.915090e+02 4.425750e+02 0.000000e+00 0.000000e+00 0.000000e+00 5.600000e+02 4.820000e+03

yr_built 21613.0 1.971005e+03 2.937341e+01 1.900000e+03 1.951000e+03 1.975000e+03 1.997000e+03 2.015000e+03

yr_renovated 21613.0 8.440226e+01 4.016792e+02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.015000e+03

zipcode 21613.0 9.807794e+04 5.350503e+01 9.800100e+04 9.803300e+04 9.806500e+04 9.811800e+04 9.819900e+04

lat 21613.0 4.756005e+01 1.385637e-01 4.715590e+01 4.747100e+01 4.757180e+01 4.767800e+01 4.777760e+01

long 21613.0 -1.222139e+02 1.408283e-01 -1.225190e+02 -1.223280e+02 -1.222300e+02 -1.221250e+02 -1.213150e+02

living_measure15 21613.0 1.986552e+03 6.853913e+02 3.990000e+02 1.490000e+03 1.840000e+03 2.360000e+03 6.210000e+03

lot_measure15 21613.0 1.276846e+04 2.730418e+04 6.510000e+02 5.100000e+03 7.620000e+03 1.008300e+04 8.712000e+05

furnished 21613.0 1.966872e-01 3.975030e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00

total_area 21613.0 1.718687e+04 4.158908e+04 1.423000e+03 7.035000e+03 9.575000e+03 1.300000e+04 1.652659e+06

From above analysis we got to know,

Exploratory Data Analysis

Univariate Analysis - By BoxPlot

#let's first import the required libraries for the plots

# size of plots to make it uniform throughout our analysis in the notebook

Analyzing Feature: cid

Analyzing Feature: dayhours

We successfully converted dayhours feature to month_year for better analysis.

We can see, most houses sold in April, July month

Analyzing Feature: Price (our Target)

The Price is ranging from 75,000 to 77,00,000 and distribution is right-skewed.

Analyzing Feature: room_bed

Most of the houses/properties have 3 or 4 bedrooms

Analyzing Feature: room_bath

Majority of the properties have bathroom in the range of 1.0 to 2.5

Analyzing Feature: Living measure

#Data is skewed as visible from plot, as its distribution is normal

#Let's plot the boxplot for living_measure

Analyzing Feature: lot_measure

#Data is skewed as visible from plot

Analyzing Feature: ceil

#let's see the ceil count for all the records

We can see, most houses have 1 floor

Analyzing Feature: coast

Analyzing Feature: sight

Analyzing Feature: condition

#Quality - most properties have quality rating between 6 to 10

#checking the no. of data points with quality rating as 13

#ceil_measure - its highly skewed

sns.factorplot(x='ceil',y='ceil_measure',data=house_df, size = 4, aspect = 2)