Download as pdf or txt
Download as pdf or txt
You are on page 1of 187

HOUSE PRICE PREDICTION

Structure:
1. Introduction
2. Data Loading
3. EDA - Univariate
4. EDA - Bivariate
5. Data Preprocessing
6. Model Building with Dataset-1
7. Hypertuning Dataset-1
8. Summary - Dataset-1
9. Model Building with Dataset-2
10. Hypertuning Dataset-2
11. Summary - Dataset -2
12. Conclusion
13. Pickle file creation

Note:
Dataset - 1 = 22 features
['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure', 'basement', 'yr_built',
'living_measure15', 'lot_measure15', 'furnished', 'total_area', 'month_year', 'City', 'has_basement', 'HouseLandRatio', 'has_renovated']

Dataset - 2 = 31 features (important features after imputing dummy and analyzing different models)
['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure', 'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat',
'long', 'living_measure15', 'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4', 'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'

Prerequisites for the running the file:


Below are 2 files needed to be added to you current working directory.

1. Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
2. Add the folder WA to your current working directory
3. Install below 2 libraries
conda install -c conda-forge/label/cf201901 geopandas
conda install -c conda-forge/label/cf201901 shapely

</b>

This Jupyter Notebook is done as part of PGPML Great Learning Programme for Capstone Project. Let's first, define the problem, objective of
this excercise.

We have the problem statment well defined in the given document which is as follows
INTRODUCTION
Problem Statement
As a house value is simply more than location and square footage. Like the features that make up a person, an educated party would want to
know all aspects that give a house its value. For example, if we want to sell a house and we don't know the price which we can take, as it can't
be too low or too high. To find house price we usually try to find similar properties in our neighbourhood and based on collected data we trying
to assess our house price.

Problem Definition
When any person/business wants to sell or buy a house, they always face this kind of issue as they don't know the price which they should
offer. Due to this they might be offering too low or high for the property. Therefore, we can analyze the available data of the properties in the
area and can predict the price. We need to find how these attributes influence the house prices Right pricing is very imporatnt aspect to sell
house. It is very important to understand what are the factors and how they influence the house price. Objective is to predict the right price of
the house based on the attributes

Objective
Build model which will predict the house price when required features passed to the model. So we will

Find out the significant features from the given features dataset which affects the house price the most.
Build best feasible model to predict the house price with 95% confidence level

Business Reason
As people don't know the features/aspects which commulate property price, we can provide them HouseBuyingSelling guiding services in the
area so they can buy or sell their property with most suitable price tag and they didn't lose their hard earned money by offering low price or
keep waiting for buyers by putting high prices.

DATA LOADING
First, we will load the data from the given csv(comma seperated values) file provided as part of the Capstone Project.

In [2]:

# loading the library required for data loading and processing


import pandas as pd
import numpy as np

#Supress warnings
import warnings
warnings.filterwarnings('ignore')

# read the data using pandas function from 'innercity.csv' file


house_df = pd.read_csv('innercity.csv')

In [3]:

# let's check whether data loaded successfully or not, by checking first few records
house_df.head()

Out[3]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... basement yr_built yr_renovated zipcode

0 3034200666 20141107T000000 808100 4 3.25 3020 13457 1.0 0 0 ... 0 1956 0 98133

1 8731981640 20141204T000000 277500 4 2.50 2550 7500 1.0 0 0 ... 800 1976 0 98023

2 5104530220 20150420T000000 404000 3 2.50 2370 4324 2.0 0 0 ... 0 2006 0 98038

3 6145600285 20140529T000000 300000 2 1.00 820 3844 1.0 0 0 ... 0 1916 0 98133

4 8924100111 20150424T000000 699000 2 1.50 1400 4050 1.0 0 0 ... 0 1954 0 98115

5 rows × 23 columns

Data is loaded successfully as we can see first 5 records from the dataset.
Data Understanding
After loading data into our pandas library dataframe, we can now try to understand the kind of data we have with us.

In [4]:

# print the number of records and features/aspects we have in the provided file
house_df.shape

Out[4]:

(21613, 23)

We have more than 21k records having 23 features

In [5]:

# let's check out the columns/features we have in the dataset

house_df.columns

Out[5]:

Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',


'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
'total_area'],
dtype='object')

From the above we can see the different columns we have in dataset.

These columns provide below information

1. cid: Notation for a house. Will not of our use. So we will drop this column
2. dayhours: Represents Date, when house was sold.
3. price: It's our TARGET feature, that we have to predict based on other featues
4. room_bed: Represents number of bedrooms in a house
5. room_bath: Represents number of bathrooms
6. living_measure: Represents square footage of house
7. lot_measure: Represents square footage of lot
8. ceil: Represents number of floors in house
9. coast: Represents whether house has waterfront view. It seems to be a categorical variable. We will see in our further data analysis
10. sight: Represents how many times sight has been viewed.
11. condition: Represents the overall condition of the house. It's kind of rating given to the house.
12. quality: Represents grade given to the house based on grading system
13. ceil_measure: Represents square footage of house apart from basement
14. basement: Represents square footage of basement
15. yr_built: Represents the year when house was built
16. yr_renovated: Represents the year when house was last renovated
17. zipcode: Represents zipcode as name implies
18. lat: Represents Lattitude co-ordniates
19. long: Represents Longitude co-ordinates
20. living_measure15: Represents square footage of house, when measured in 2015 year as house area may or may not changed after
renovation if any happened
21. lot_measure15: Represents square footage of lot, when measured in 2015 year as lot area may or may not change after renovation if any
done
22. furnished: Tells whether house is furnished or not. It seems to be categorical variable as description implies
23. total_area: Represents total area i.e. area of both living and lot
In [6]:

# let's see the data types of the features


house_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21613 entries, 0 to 21612
Data columns (total 23 columns):
cid 21613 non-null int64
dayhours 21613 non-null object
price 21613 non-null int64
room_bed 21613 non-null int64
room_bath 21613 non-null float64
living_measure 21613 non-null int64
lot_measure 21613 non-null int64
ceil 21613 non-null float64
coast 21613 non-null int64
sight 21613 non-null int64
condition 21613 non-null int64
quality 21613 non-null int64
ceil_measure 21613 non-null int64
basement 21613 non-null int64
yr_built 21613 non-null int64
yr_renovated 21613 non-null int64
zipcode 21613 non-null int64
lat 21613 non-null float64
long 21613 non-null float64
living_measure15 21613 non-null int64
lot_measure15 21613 non-null int64
furnished 21613 non-null int64
total_area 21613 non-null int64
dtypes: float64(4), int64(18), object(1)
memory usage: 3.8+ MB

In the dataset, we have more than 21k records and 23 columns, out of which
4 features are of float type
18 features are of integer type
1 feature is of object type (we may need to convert this object type to specific datatype)

In [7]:

# let's check whether our dataset have any null/missing values


house_df.isnull().sum()

Out[7]:

cid 0
dayhours 0
price 0
room_bed 0
room_bath 0
living_measure 0
lot_measure 0
ceil 0
coast 0
sight 0
condition 0
quality 0
ceil_measure 0
basement 0
yr_built 0
yr_renovated 0
zipcode 0
lat 0
long 0
living_measure15 0
lot_measure15 0
furnished 0
total_area 0
dtype: int64

We don't have any null or missing values for any of the columns

In [8]:

# let's check whether there's any duplicate record in our dataset or not. If present, we have to remove them
house_df.duplicated().sum()

Out[8]:

0
We don't have any duplicate record in out dataset. So we can say we have more than 21k Unique records

In [9]:

# let's do the 5 - factor analysis of the features

house_df.describe().transpose()

Out[9]:

count mean std min 25% 50% 75% max

cid 21613.0 4.580302e+09 2.876566e+09 1.000102e+06 2.123049e+09 3.904930e+09 7.308900e+09 9.900000e+09

price 21613.0 5.401822e+05 3.673622e+05 7.500000e+04 3.219500e+05 4.500000e+05 6.450000e+05 7.700000e+06

room_bed 21613.0 3.370842e+00 9.300618e-01 0.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 3.300000e+01

room_bath 21613.0 2.114757e+00 7.701632e-01 0.000000e+00 1.750000e+00 2.250000e+00 2.500000e+00 8.000000e+00

living_measure 21613.0 2.079900e+03 9.184409e+02 2.900000e+02 1.427000e+03 1.910000e+03 2.550000e+03 1.354000e+04

lot_measure 21613.0 1.510697e+04 4.142051e+04 5.200000e+02 5.040000e+03 7.618000e+03 1.068800e+04 1.651359e+06

ceil 21613.0 1.494309e+00 5.399889e-01 1.000000e+00 1.000000e+00 1.500000e+00 2.000000e+00 3.500000e+00

coast 21613.0 7.541757e-03 8.651720e-02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00

sight 21613.0 2.343034e-01 7.663176e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 4.000000e+00

condition 21613.0 3.409430e+00 6.507430e-01 1.000000e+00 3.000000e+00 3.000000e+00 4.000000e+00 5.000000e+00

quality 21613.0 7.656873e+00 1.175459e+00 1.000000e+00 7.000000e+00 7.000000e+00 8.000000e+00 1.300000e+01

ceil_measure 21613.0 1.788391e+03 8.280910e+02 2.900000e+02 1.190000e+03 1.560000e+03 2.210000e+03 9.410000e+03

basement 21613.0 2.915090e+02 4.425750e+02 0.000000e+00 0.000000e+00 0.000000e+00 5.600000e+02 4.820000e+03

yr_built 21613.0 1.971005e+03 2.937341e+01 1.900000e+03 1.951000e+03 1.975000e+03 1.997000e+03 2.015000e+03

yr_renovated 21613.0 8.440226e+01 4.016792e+02 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 2.015000e+03

zipcode 21613.0 9.807794e+04 5.350503e+01 9.800100e+04 9.803300e+04 9.806500e+04 9.811800e+04 9.819900e+04

lat 21613.0 4.756005e+01 1.385637e-01 4.715590e+01 4.747100e+01 4.757180e+01 4.767800e+01 4.777760e+01

long 21613.0 -1.222139e+02 1.408283e-01 -1.225190e+02 -1.223280e+02 -1.222300e+02 -1.221250e+02 -1.213150e+02

living_measure15 21613.0 1.986552e+03 6.853913e+02 3.990000e+02 1.490000e+03 1.840000e+03 2.360000e+03 6.210000e+03

lot_measure15 21613.0 1.276846e+04 2.730418e+04 6.510000e+02 5.100000e+03 7.620000e+03 1.008300e+04 8.712000e+05

furnished 21613.0 1.966872e-01 3.975030e-01 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 1.000000e+00

total_area 21613.0 1.718687e+04 4.158908e+04 1.423000e+03 7.035000e+03 9.575000e+03 1.300000e+04 1.652659e+06


1. CID: House ID/Property ID.Not used for analysis
2. Dayhours: 5 factor analysis is reflecting for this column
3. price: Our taget column value is in 75k - 7700k range. As Mean > Median, it's Right-Skewed.
4. room_bed: Number of bedrooms range from 0 - 33. As Mean slightly > Median, it's slightly Right-Skewed.
5. room_bath: Number of bathrooms range from 0 - 8. As Mean slightly < Median, it's slightly Left-Skewed.
6. living_measure: Square footage of house range from 290 - 13,540. As Mean > Median, it's Right-Skewed.
7. lot_measure: Square footage of lot range from 520 - 16,51,359. As Mean almost double of Median, it's Hightly Right-Skewed.
8. ceil: Number of floors range from 1 - 3.5 As Mean ~ Median, it's almost Normal Distributed.
9. coast: As this value represent whether house has waterfront view or not. It's categorical column. From above analysis we got know, very
few houses has waterfront view.
10. sight: Value ranges from 0 - 4. As Mean > Median, it's Right-Skewed
11. condition: Represents rating of house which ranges from 1 - 5. As Mean > Median, it's Right-Skewed
12. quality: Representign grade given to house which range from 1 - 13. As Mean > Median, it's Right-Skewed.
13. ceil_measure: Square footage of house apart from basement ranges in 290 - 9,410. As Mean > Median, it's Right-Skewed.
14. basement: Square footage house basement ranges in 0 - 4,820. As Mean highlty > Median, it's Highly Right-Skewed.
15. yr_built: House built year ranges from 1900 - 2015. As Mean < Median, it's Left-Skewed.
16. yr_renovated: House renovation year only 2015. So this column can be used as Categorical Variable for knowing whether house is
renovated or not.
17. zipcode: House ZipCode ranges from 98001 - 98199. As Mean > Median, it's Right-Skewed.
18. lat: Lattitude ranges from 47.1559 - 47.7776 As Mean < Median, it's Left-Skewed.
19. long: Longittude ranges from -122.5190 to -121.315 As Mean > Median, it's Right-Skewed.
20. living_measure15: Value ragnes from 399 to 6,210. As Mean > Median, it's Right-Skewed.
21. lot_measure15: Value ragnes from 651 to 8,71,200. As Mean highly > Median, it's Highly Right-Skewed.
22. furnished: Representing whether house is furnished or not. It's a Categorical Variable
23. total_area Total area of house ranges from 1,423 to 16,52,659. As Mean is almost double of Median, it's Highly Right-Skewed

From above analysis we got to know,

Most columns distribution is Right-Skewed and only few features are Left-Skewed (like room_bath, yr_built, lat).

We have columns which are Categorical in nature are -> coast, yr_renovated, furnished

Exploratory Data Analysis


Let's do some visual data analysis of the features

Univariate Analysis - By BoxPlot

In [10]:

#let's first import the required libraries for the plots


import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# size of plots to make it uniform throughout our analysis in the notebook


plotSizeX = 12
plotSizeY = 6
# let's boxplot all the numerical columns and see if there any outliers
for i in house_df.iloc[:, 2:].columns:
house_df.iloc[:, 1:].boxplot(column=i)
plt.show()
We can see, there are lot of features which have outliers. So we might need to treat those before building model

Analyzing Feature: cid


In [11]:

#cid - CID is appearing muliple times, it seems data contains house which is sold multiple times
cid_count=house_df.cid.value_counts()
cid_count[cid_count>1].shape

Out[11]:

(176,)

We have 176 properties that were sold more than once in the given data

Analyzing Feature: dayhours

In [12]:

#we will create new data frame that can be used for modeling
#We will convert the dayhours to 'month_year' as sale month-year is relevant for analysis

house_dfr=house_df.copy()
house_df.dayhours=house_df.dayhours.str.replace('T000000', "")
house_df.dayhours=pd.to_datetime(house_df.dayhours,format='%Y%m%d')
house_df['month_year']=house_df['dayhours'].apply(lambda x: x.strftime('%B-%Y'))
house_df['month_year'].head()

Out[12]:

0 November-2014
1 December-2014
2 April-2015
3 May-2014
4 April-2015
Name: month_year, dtype: object

We successfully converted dayhours feature to month_year for better analysis.

In [13]:
house_df['month_year'].value_counts()

Out[13]:

April-2015 2231
July-2014 2211
June-2014 2180
August-2014 1940
October-2014 1878
March-2015 1875
September-2014 1774
May-2014 1768
December-2014 1471
November-2014 1411
February-2015 1250
January-2015 978
May-2015 646
Name: month_year, dtype: int64

We can see, most houses sold in April, July month

In [14]:

house_df.groupby(['month_year'])['price'].agg('mean')

Out[14]:

month_year
April-2015 561933.463021
August-2014 536527.039691
December-2014 524602.893270
February-2015 507919.603200
January-2015 525963.251534
July-2014 544892.161013
June-2014 558123.736239
March-2015 544057.683200
May-2014 548166.600113
May-2015 558193.095975
November-2014 522058.861800
October-2014 539127.477636
September-2014 529315.868095
Name: price, dtype: float64
So the time line of the sale data of the properties is from May-2014 to May-2015 and April month have the highest mean price.

Analyzing Feature: Price (our Target)

In [15]:

house_df.price.describe()

Out[15]:

count 2.161300e+04
mean 5.401822e+05
std 3.673622e+05
min 7.500000e+04
25% 3.219500e+05
50% 4.500000e+05
75% 6.450000e+05
max 7.700000e+06
Name: price, dtype: float64

In [16]:

plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df['price'])

Out[16]:

<matplotlib.axes._subplots.AxesSubplot at 0x225ef84d550>

The Price is ranging from 75,000 to 77,00,000 and distribution is right-skewed.

Analyzing Feature: room_bed

In [17]:
house_df['room_bed'].value_counts()

Out[17]:

3 9824
4 6882
2 2760
5 1601
6 272
1 199
7 38
8 13
0 13
9 6
10 3
11 1
33 1
Name: room_bed, dtype: int64

The value of 33 seems to be outlier we need to check the data point before imputing the same
In [18]:

house_df[house_df['room_bed']==33]

Out[18]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat long

2014-06-
750 2402100895 640000 33 1.75 1620 6000 1.0 0 0 ... 1947 0 98103 47.6878 -122.331
25

1 rows × 24 columns

Will delete this data point after bivariate analysis as it looks to be an outlier as it has low price for 33 bed room property

In [19]:

plt.figure(figsize=(plotSizeX, plotSizeY))
sns.countplot(house_df.room_bed,color='green')

Out[19]:

<matplotlib.axes._subplots.AxesSubplot at 0x225ef14f780>

Most of the houses/properties have 3 or 4 bedrooms

Analyzing Feature: room_bath

In [20]:
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.countplot(house_df.room_bath,color='green')
house_df['room_bath'].value_counts().sort_index()
Out[20]:

0.00 10
0.50 4
0.75 72
1.00 3852
1.25 9
1.50 1446
1.75 3048
2.00 1930
2.25 2047
2.50 5380
2.75 1185
3.00 753
3.25 589
3.50 731
3.75 155
4.00 136
4.25 79
4.50 100
4.75 23
5.00 21
5.25 13
5.50 10
5.75 4
6.00 6
6.25 2
6.50 2
6.75 2
7.50 1
7.75 1
8.00 2
Name: room_bath, dtype: int64

Majority of the properties have bathroom in the range of 1.0 to 2.5


In [21]:

plt.figure(figsize=(plotSizeX, plotSizeY))
print("Skewness is :",house_df.room_bath.skew())
sns.distplot(house_df.room_bath)

Skewness is : 0.511107573347417

Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x225ef14f748>

Analyzing Feature: Living measure

In [22]:

#Data is skewed as visible from plot, as its distribution is normal


plt.figure(figsize=(plotSizeX, plotSizeY))
print("Skewness is :",house_df.living_measure.skew())
sns.distplot(house_df.living_measure)
house_df.living_measure.describe()

Skewness is : 1.471555426802092

Out[22]:
count 21613.000000
mean 2079.899736
std 918.440897
min 290.000000
25% 1427.000000
50% 1910.000000
75% 2550.000000
max 13540.000000
Name: living_measure, dtype: float64
Data distribution tells us, living_measure is right-skewed.

In [23]:

#Let's plot the boxplot for living_measure


plt.figure(figsize=(plotSizeX, plotSizeY))
sns.boxplot(house_df.living_measure)

Out[23]:

<matplotlib.axes._subplots.AxesSubplot at 0x225ef0f5b70>

There are many outliers in living measure. Need to review further to treat the same.

In [24]:

#checking the no. of data points with Living measure greater than 8000
house_df[house_df['living_measure']>8000]

Out[24]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat

2014-09-
264 9208900037 6890000 6 7.75 9890 31374 2.0 0 4 ... 2001 0 98039 47.6305 -122
19

2014-06-
668 1924059029 4670000 5 6.75 9640 13068 1.0 1 4 ... 1983 2009 98040 47.5570 -122
17

2014-06-
1123 2303900035 2890000 5 6.25 8670 64033 2.0 0 4 ... 1965 2003 98177 47.7295 -122
11

2014-10-
4789 1247600105 5110000 5 5.25 8010 45517 2.0 1 4 ... 1999 0 98033 47.6767 -122
20

2014-10-
16785 6762700020 7700000 6 8.00 12050 27600 2.5 0 3 ... 1910 1987 98102 47.6298 -122
13

2014-07-
18393 6072800246 3300000 5 6.25 8020 21738 2.0 0 0 ... 2001 0 98006 47.5675 -122
02

2014-06-
19888 9808700762 7060000 5 4.50 10040 37325 2.0 1 2 ... 1940 2001 98004 47.6500 -122
11

2014-05-
20740 1225069038 2280000 7 8.00 13540 307752 3.0 0 4 ... 1999 0 98053 47.6675 -121
05

2014-08-
20917 2470100110 5570000 5 5.75 9200 35069 2.0 0 0 ... 2001 0 98039 47.6289 -122
04

9 rows × 24 columns

We have only 9 properties/house which have more than 8k living_measure. So will treat these outliers.

Analyzing Feature: lot_measure


In [25]:

#Data is skewed as visible from plot


plt.figure(figsize=(plotSizeX, plotSizeY))
print("Skewness is :",house_df.lot_measure.skew())
sns.boxplot(house_df.lot_measure)
house_df.lot_measure.describe()

Skewness is : 13.06001895903175

Out[25]:
count 2.161300e+04
mean 1.510697e+04
std 4.142051e+04
min 5.200000e+02
25% 5.040000e+03
50% 7.618000e+03
75% 1.068800e+04
max 1.651359e+06
Name: lot_measure, dtype: float64

In [26]:

#checking the no. of data points with Lot measure greater than 1250000
house_df[house_df['lot_measure']>1250000]

Out[26]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat

2015-03-
1113 1020069017 700000 4 1.0 1300 1651359 1.0 0 3 ... 1920 0 98022 47.2313 -122.02
27

1 rows × 24 columns

We have only 1 property with more than 12,50,000 lot_measure. So we need to treat this.

Analyzing Feature: ceil

In [27]:

#let's see the ceil count for all the records


house_df.ceil.value_counts()

Out[27]:

1.0 10680
2.0 8241
1.5 1910
3.0 613
2.5 161
3.5 8
Name: ceil, dtype: int64

We can see, most houses have 1 floor


In [28]:

plt.figure(figsize=(plotSizeX, plotSizeY))
sns.countplot('ceil',data=house_df)

Out[28]:

<matplotlib.axes._subplots.AxesSubplot at 0x225ef19ff60>

Above grapth confirming the same, that most properties have 1 and 2 floors

Analyzing Feature: coast

In [29]:

#coast - most houses donot have waterfront view, very few are waterfront
house_df.coast.value_counts()

Out[29]:

0 21450
1 163
Name: coast, dtype: int64

Analyzing Feature: sight

In [30]:
#sight - most sights have not been viewed
house_df.sight.value_counts()

Out[30]:

0 19489
2 963
3 510
1 332
4 319
Name: sight, dtype: int64

Analyzing Feature: condition

In [31]:

#condition - Overall most houses are rated as 3 and above for its condition overall
house_df.condition.value_counts()

Out[31]:

3 14031
4 5679
5 1701
2 172
1 30
Name: condition, dtype: int64
Analyzing Feature: quality

In [32]:

#Quality - most properties have quality rating between 6 to 10


house_df.quality.value_counts()
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.countplot('quality',data=house_df)

Out[32]:

<matplotlib.axes._subplots.AxesSubplot at 0x225eedbd358>

In [33]:

#checking the no. of data points with quality rating as 13


house_df[house_df['quality']==13]

Out[33]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat

2014-09-
264 9208900037 6890000 6 7.75 9890 31374 2.0 0 4 ... 2001 0 98039 47.6305 -122
19

2014-06-
1123 2303900035 2890000 5 6.25 8670 64033 2.0 0 4 ... 1965 2003 98177 47.7295 -122
11

2015-01-
1583 2426039123 2420000 5 4.75 7880 24250 2.0 0 2 ... 1996 0 98177 47.7334 -122
30

2014-09-
7095 2303900100 3800000 3 4.25 5510 35000 2.0 0 4 ... 1997 0 98177 47.7296 -122
11

2015-04-
8509 4139900180 2340000 4 2.50 4500 35200 1.0 0 0 ... 1988 0 98006 47.5477 -122
20

2014-09-
9446 1068000375 3200000 6 5.00 7100 18200 2.5 0 0 ... 1933 2002 98199 47.6427 -122
23

2014-10-
10387 7237501190 1780000 4 3.25 4890 13402 2.0 0 0 ... 2004 0 98059 47.5303 -122
10

2014-11-
12320 1725059316 2390000 4 4.00 6330 13296 2.0 0 2 ... 2000 0 98033 47.6488 -122
20

2014-07-
12686 853200010 3800000 5 5.50 7050 42840 1.0 0 2 ... 1978 0 98004 47.6229 -122
01

2014-10-
16785 6762700020 7700000 6 8.00 12050 27600 2.5 0 3 ... 1910 1987 98102 47.6298 -122
13

2015-03-
17322 9831200500 2480000 5 3.75 6810 7500 2.5 0 0 ... 1922 0 98102 47.6285 -122
04

2014-12-
20892 3303850390 2980000 5 5.50 7400 18898 2.0 0 3 ... 2001 0 98006 47.5431 -122
12

2014-08-
20917 2470100110 5570000 5 5.75 9200 35069 2.0 0 0 ... 2001 0 98039 47.6289 -122
04

13 rows × 24 columns

There are only 13 propeties which have the highest quality rating
Analyzing Feature: ceil_measure

In [34]:

#ceil_measure - its highly skewed


print("Skewness is :", house_df.ceil_measure.skew())
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df.ceil_measure)
house_df.ceil_measure.describe()

Skewness is : 1.4466644733818372

Out[34]:
count 21613.000000
mean 1788.390691
std 828.090978
min 290.000000
25% 1190.000000
50% 1560.000000
75% 2210.000000
max 9410.000000
Name: ceil_measure, dtype: float64

In [35]:

sns.factorplot(x='ceil',y='ceil_measure',data=house_df, size = 4, aspect = 2)

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

Out[35]:
<seaborn.axisgrid.FacetGrid at 0x225ef353f28>

There is no pattern in Ceil Vs Ceil_measure


The vertival lines at each point represent the inter quartile range of values at that point

Analyzing Feature: basement

In [36]:

#basement_measure
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df.basement)

Out[36]:

<matplotlib.axes._subplots.AxesSubplot at 0x225f1238080>

We can see 2 gaussians, which tells us there are propeties which don't have basements and some have the basements

In [37]:

house_df[house_df.basement==0].shape

Out[37]:

(13126, 24)

We have almost 60% of the properties without basement


In [38]:

#houses have zero measure of basement i.e. they donot have basements
#let's plot boxplot for properties which have basements only
house_df_base=house_df[house_df['basement']>0]
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.boxplot(house_df_base['basement'])

Out[38]:

<matplotlib.axes._subplots.AxesSubplot at 0x225f0f92a20>

We can clearly see, there are outliers. We need to treat this before our model.

In [39]:

#checking the no. of data points with 'basement' greater than 4000
house_df[house_df['basement']>4000]

Out[39]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... yr_built yr_renovated zipcode lat

2014-06-
668 1924059029 4670000 5 6.75 9640 13068 1.0 1 4 ... 1983 2009 98040 47.5570 -122
17

2014-05-
20740 1225069038 2280000 7 8.00 13540 307752 3.0 0 4 ... 1999 0 98053 47.6675 -121
05

2 rows × 24 columns

We have only 2 properties with more than 4,000 measure basement


In [40]:

#Distribution of houses having basement


plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df_base.basement)

Out[40]:

<matplotlib.axes._subplots.AxesSubplot at 0x225f102bd30>

Distribution having basement is right-skewed

Analyzing Feature: yr_built

In [41]:

#house range from new to very old


plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df.yr_built)

Out[41]:

<matplotlib.axes._subplots.AxesSubplot at 0x225f125f5c0>

The built year of the properties range from 1900 to 2014 and we can see upward trend with time

Analyzing Feature: yr_renovated


In [42]:

house_df[house_df['yr_renovated']>0].shape

Out[42]:

(914, 24)

Only 914 houses were renovated out of 21613 houses

In [43]:

#yr_renovated - plot of houses which are renovated


house_df_reno=house_df[house_df['yr_renovated']>0]
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df_reno.yr_renovated)

Out[43]:

<matplotlib.axes._subplots.AxesSubplot at 0x225ef896208>

Now will create age column from columns : yr_built & yr_renovated

Analyzing Feature: Zipcode, Lat, Long

In [46]:
#For geographic visual
import geopandas as gpd
from shapely.geometry import Point, Polygon
#For current working directory
import os
cwd = os.getcwd()

In [47]:

## Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
USAZip=pd.read_excel("USA ZipCodes_1.xlsx",sheet_name="Sheet8")
USAZip.head()

Out[47]:

zipcode City County Type

0 98001 Auburn King Standard

1 98002 Auburn King Standard

2 98003 Federal Way King Standard

3 98004 Bellevue King Standard

4 98005 Bellevue King Standard

In [48]:

house_df=house_df.merge(USAZip,how='left',on='zipcode')
#house_df.drop_duplicates()
In [49]:

#let's see the shape of our dataframe


house_df.shape

Out[49]:

(21613, 27)

Now we have 27 features

In [5]:

#Add the folder WA to your current working directory


usa = gpd.read_file(cwd+'\\WA\\WSDOT__City_Limits.shp')
usa.head()
gdf = gpd.GeoDataFrame(
house_df,geometry = [Point(xy) for xy in zip(house_df['long'], house_df['lat'])])
#We can now plot our ``GeoDataFrame``
ax=usa[usa.CityName.isin(house_df.City.unique())].plot(
color='white', edgecolor='black',figsize=(20,8))
plt.figure(figsize=(15,15))
gdf.plot(ax=ax, color='green', marker='o',markersize=0.1)

Out[5]:

<matplotlib.axes._subplots.AxesSubplot at 0x1ccf1142588>

<Figure size 1080x1080 with 0 Axes>

In [51]:

#let's see the columns of dataframe once again


house_df.columns

Out[51]:

Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',


'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
'total_area', 'month_year', 'City', 'County', 'Type'],
dtype='object')

So we have 'City', 'Country', 'Type' as new feature in our dataframe

In [52]:
house_df.Type.value_counts()

Out[52]:

Standard 21613
Name: Type, dtype: int64
As the type is same for all the columns, we will remove this column in further analysis

In [53]:

house_df.City.value_counts()

Out[53]:

Seattle 8977
Renton 1597
Bellevue 1407
Kent 1203
Redmond 979
Kirkland 977
Auburn 912
Sammamish 800
Federal Way 779
Issaquah 733
Maple Valley 590
Woodinville 471
Snoqualmie 310
Kenmore 283
Mercer Island 282
Enumclaw 234
North Bend 221
Bothell 195
Duvall 190
Carnation 124
Vashon 118
Black Diamond 100
Fall City 81
Medina 50
Name: City, dtype: int64

So we have most properties in 'Seattle' city and least in 'Medina' city

Analyzing Feature: furnished

In [54]:

plt.figure(figsize=(plotSizeX, plotSizeY))
sns.countplot('furnished',data=house_df)
house_df.furnished.value_counts()

Out[54]:

0 17362
1 4251
Name: furnished, dtype: int64

Most properties are not furnished. Furnish column need to be converted into categorical column

BIVARIATE ANALYSIS
PairPlot

In [55]:

# let's plot all the variables and confirm our above deduction with more confidence
sns.pairplot(house_df, diag_kind = 'kde')

Out[55]:

<seaborn.axisgrid.PairGrid at 0x225f12a3a90>
From above pair plot, we observed/deduced below

1. price: price distribution is Right-Skewed as we deduced earlier from our 5-factor analysis
2. room_bed: our target variable (price) and room_bed plot is not linear. It's distribution have lot of gaussians
3. room_bath: It's plot with price has somewhat linear relationship. Distribution has number of gaussians.
4. living_measure: Plot against price has strong linear relationship. It also have linear relationship with room_bath variable. So might remove
one of these 2. Distribution is Right-Skewed.
5. lot_measure: No clear relationship with price.
6. ceil: No clear relationship with price. We can see, it's have 6 unique values only. Therefore, we can convert this column into categorical
column for values.
7. coast: No clear relationship with price. Clearly it's categorical variable with 2 unique values.
8. sight: No clear relationship with price. This has 5 unique values. Can be converted to Categorical variable.
9. condition: No clear relationship with price. This has 5 unique values. Can be converted to Categorical variable.
10. quality: Somewhat linear relationship with price. Has discrete values from 1 - 13. Can be converted to Categorical variable.
11. ceil_measure: Strong linear relationship with price. Also with room_bath and living_measure features. Distribution is Right-Skewed.
12. basement: No clear relationship with price.
13. yr_built: No clear relationship with price.
14. yr_renovated: No clear relationship with price. Have 2 unique values. Can be converted to Categorical Variable which tells whether house is
renovated or not.
15. zipcode, lat, long: No clear relationship with price or any other feature.
16. living_measure15: Somewhat linear relationship with target feature. It's same as living_measure. Therefore we can drop this variable.
17. lot_measure15: No clear relationship with price or any other feature.
18. furnished: No clear relationship with price or any other feature. 2 unique values so can be converted to Categorical Variable
19. total_area: No clear relationship with price. But it has Very Strong linear relationship with lot_measure. So one of it can be dropped.

In brief, below featues should be converted to Categorical Variable


ceil, coast, sight, condition, quality, yr_renovated, furnished
And below columns can be dropped after checking pearson factor
zipcode, lat, long, living_measure15, lot_measure15, total_area

In [56]:

# let's see corelatoin between the different features


house_corr = house_df.corr(method ='pearson')
house_corr

Out[56]:

cid price room_bed room_bath living_measure lot_measure ceil coast sight condition ... basement yr_built

cid 1.000000 -0.016797 0.001286 0.005160 -0.012258 -0.132109 0.018525 -0.002721 0.011592 -0.023783 ... -0.005151 0.021380

price -0.016797 1.000000 0.308338 0.525134 0.702044 0.089655 0.256786 0.266331 0.397346 0.036392 ... 0.323837 0.053982

room_bed 0.001286 0.308338 1.000000 0.515884 0.576671 0.031703 0.175429 -0.006582 0.079532 0.028472 ... 0.303093 0.154178

room_bath 0.005160 0.525134 0.515884 1.000000 0.754665 0.087740 0.500653 0.063744 0.187737 -0.124982 ... 0.283770 0.506019

living_measure -0.012258 0.702044 0.576671 0.754665 1.000000 0.172826 0.353949 0.103818 0.284611 -0.058753 ... 0.435043 0.318049

lot_measure -0.132109 0.089655 0.031703 0.087740 0.172826 1.000000 -0.005201 0.021604 0.074710 -0.008958 ... 0.015286 0.053080

ceil 0.018525 0.256786 0.175429 0.500653 0.353949 -0.005201 1.000000 0.023698 0.029444 -0.263768 ... -0.245705 0.489319

coast -0.002721 0.266331 -0.006582 0.063744 0.103818 0.021604 0.023698 1.000000 0.401857 0.016653 ... 0.080588 -0.026161

sight 0.011592 0.397346 0.079532 0.187737 0.284611 0.074710 0.029444 0.401857 1.000000 0.045990 ... 0.276947 -0.053440

condition -0.023783 0.036392 0.028472 -0.124982 -0.058753 -0.008958 -0.263768 0.016653 0.045990 1.000000 ... 0.174105 -0.361417

quality 0.008130 0.667463 0.356967 0.664983 0.762704 0.113621 0.458183 0.082775 0.251321 -0.144674 ... 0.168392 0.446963

ceil_measure -0.010842 0.605566 0.477600 0.685342 0.876597 0.183512 0.523885 0.072075 0.167649 -0.158214 ... -0.051943 0.423898

basement -0.005151 0.323837 0.303093 0.283770 0.435043 0.015286 -0.245705 0.080588 0.276947 0.174105 ... 1.000000 -0.133124

yr_built 0.021380 0.053982 0.154178 0.506019 0.318049 0.053080 0.489319 -0.026161 -0.053440 -0.361417 ... -0.133124 1.000000

yr_renovated -0.016907 0.126442 0.018841 0.050739 0.055363 0.007644 0.006338 0.092885 0.103917 -0.060618 ... 0.071323 -0.224874

zipcode -0.008224 -0.053168 -0.152668 -0.203866 -0.199430 -0.129574 -0.059121 0.030285 0.084827 0.003026 ... 0.074845 -0.346869

lat -0.001891 0.306919 -0.008931 0.024573 0.052529 -0.085683 0.049614 -0.014274 0.006157 -0.014941 ... 0.110538 -0.148122

long 0.020799 0.021571 0.129473 0.223042 0.240223 0.229521 0.125419 -0.041910 -0.078400 -0.106500 ... -0.144765 0.409356

living_measure15 -0.002901 0.585374 0.391638 0.568634 0.756420 0.144608 0.279885 0.086463 0.280439 -0.092824 ... 0.200355 0.326229

lot_measure15 -0.138798 0.082456 0.029244 0.087175 0.183286 0.718557 -0.011269 0.030703 0.072575 -0.003406 ... 0.017276 0.070958

furnished -0.010009 0.565991 0.259268 0.484923 0.632947 0.118883 0.347749 0.069882 0.220250 -0.121902 ... 0.092847 0.305225

total_area -0.131844 0.104796 0.044310 0.104050 0.194209 0.999763 0.002637 0.023809 0.080693 -0.010219 ... 0.024832 0.059889

22 rows × 22 columns
We have linear relationships in below featues as we got to know from above matrix

1. price: room_bath, living_measure, quality, living_measure15, furnished


2. living_measure: price, room_bath. So we can consider dropping 'room_bath' variable.
3. quality: price, room_bath, living_measure
4. ceil_measure: price, room_bath, living_measure, quality
5. living_measure15: price, living_measure, quality. So we can consider dropping living_measure15 as well. As it's giving same info as
living_measure.
6. lot_measure15: lot_measure. Therefore, we can consider dropping lot_measure15, as it's giving same info.
7. furnished: quality
8. total_area: lot_measure, lot_measure15. Therefore, we can consider dropping total_area feature as well. As it's giving same info as
lot_measure.

We can plot heatmap and can easily confirm our above findings

In [57]:

# Plotting heatmap
plt.subplots(figsize =(15, 8))
sns.heatmap(house_corr,cmap="YlGnBu",annot=True)

Out[57]:

<matplotlib.axes._subplots.AxesSubplot at 0x2258d4f79b0>

Analyzing Bivariate for Feature: month_year


In [58]:

#month,year in which house is sold. Price is not influenced by it, though there are outliers and can be easily se
en.
house_df['month_year'] = pd.to_datetime(house_df['month_year'], format='%B-%Y')

house_df.sort_values(["month_year"], axis=0,
ascending=True, inplace=True)
house_df["month_year"] = house_df["month_year"].dt.strftime('%B-%Y')

sns.factorplot(x='month_year',y='price',data=house_df, size=4, aspect=2)


plt.xticks(rotation=90)
#groupby
house_df.groupby('month_year')['price'].agg(['mean','median','size'])

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

Out[58]:

mean median size

month_year

April-2015 561933.463021 476500 2231

August-2014 536527.039691 442100 1940

December-2014 524602.893270 432500 1471

February-2015 507919.603200 425545 1250

January-2015 525963.251534 438500 978

July-2014 544892.161013 465000 2211

June-2014 558123.736239 465000 2180

March-2015 544057.683200 450000 1875

May-2014 548166.600113 465000 1768

May-2015 558193.095975 455000 646

November-2014 522058.861800 435000 1411

October-2014 539127.477636 446900 1878

September-2014 529315.868095 450000 1774

The mean price of the houses tend to be high during March,April, May as compared to that of September, October, November,December period.

Analyzing Bivariate for Feature: room_bed


In [59]:

#Room_bed - outliers can be seen easily. Mean and median of price increases with number bedrooms/house uptill a p
oint
#and then drops
sns.factorplot(x='room_bed',y='price',data=house_df, size=4, aspect=2)

#groupby
house_df.groupby('room_bed')['price'].agg(['mean','median','size'])

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

Out[59]:

mean median size

room_bed

0 4.102231e+05 288000.0 13

1 3.176580e+05 299000.0 199

2 4.013877e+05 374000.0 2760

3 4.662766e+05 413000.0 9824

4 6.355647e+05 549997.5 6882

5 7.868741e+05 620000.0 1601

6 8.258535e+05 650000.0 272

7 9.514478e+05 728580.0 38

8 1.105077e+06 700000.0 13

9 8.939998e+05 817000.0 6

10 8.200000e+05 660000.0 3

11 5.200000e+05 520000.0 1

33 6.400000e+05 640000.0 1

There is clear increasing trend in price with room_bed

In [60]:

#room_bath - outliers can be seen easily. Overall mean and median price increares with increasing room_bath
sns.factorplot(x='room_bath',y='price',data=house_df,size=4, aspect=2)
plt.xticks(rotation=90)
#groupby
house_df.groupby('room_bath')['price'].agg(['mean','median','size'])
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot
` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

Out[60]:

mean median size

room_bath

0.00 4.490950e+05 317500 10

0.50 2.373750e+05 264000 4

0.75 2.945209e+05 273500 72

1.00 3.470412e+05 320000 3852

1.25 6.217722e+05 516500 9

1.50 4.093457e+05 370000 1446

1.75 4.549158e+05 422900 3048

2.00 4.579050e+05 423250 1930

2.25 5.337688e+05 472500 2047

2.50 5.536618e+05 499950 5380

2.75 6.603505e+05 605000 1185

3.00 7.086619e+05 600000 753

3.25 9.707532e+05 835000 589

3.50 9.324017e+05 820000 731

3.75 1.198179e+06 1070000 155

4.00 1.268405e+06 1055000 136

4.25 1.526653e+06 1380000 79

4.50 1.334211e+06 1060000 100

4.75 2.022300e+06 2300000 23

5.00 1.674167e+06 1430000 21

5.25 1.817962e+06 1420000 13

5.50 2.522500e+06 2340000 10

5.75 2.492500e+06 1930000 4

6.00 2.948333e+06 2895000 6

6.25 3.095000e+06 3095000 2

6.50 1.710000e+06 1710000 2

6.75 2.735000e+06 2735000 2

7.50 4.500000e+05 450000 1

7.75 6.890000e+06 6890000 1

8.00 4.990000e+06 4990000 2

There is upward trend in price with increase in room_bath


Analyzing Bivariate for Feature: living_measure

In [61]:

#living_measure - price increases with increase in living measure


plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['living_measure'],house_df['price']))
house_df['living_measure'].describe()

AxesSubplot(0.125,0.125;0.775x0.755)

Out[61]:
count 21613.000000
mean 2079.899736
std 918.440897
min 290.000000
25% 1427.000000
50% 1910.000000
75% 2550.000000
max 13540.000000
Name: living_measure, dtype: float64

There is clear increment in price of the property with increment in the living measure But there seems to be one outlier to this trend. Need to
evaluate the same

Analyzing Bivariate for Feature: lot_measure


In [62]:

#lot_measure - there seems to be no relation between lot_measure and price


#lot_measure - data value range is very large so breaking it get better view.
plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['lot_measure'],house_df['price']))
house_df['lot_measure'].describe()

AxesSubplot(0.125,0.125;0.775x0.755)

Out[62]:

count 2.161300e+04
mean 1.510697e+04
std 4.142051e+04
min 5.200000e+02
25% 5.040000e+03
50% 7.618000e+03
75% 1.068800e+04
max 1.651359e+06
Name: lot_measure, dtype: float64

There doesnt seem to be no relation between lot_measure and price trend


In [63]:

#lot_measure <25000
plt.figure(figsize=(plotSizeX, plotSizeY))
x=house_df[house_df['lot_measure']<25000]
print(sns.scatterplot(x['lot_measure'],x['price']))
x['lot_measure'].describe()

AxesSubplot(0.125,0.125;0.775x0.755)

Out[63]:
count 19713.000000
mean 7762.510577
std 4252.549162
min 520.000000
25% 4997.000000
50% 7253.000000
75% 9620.000000
max 24969.000000
Name: lot_measure, dtype: float64

Almost 95% of the houses have <25000 lot_measure. But there is no clear trend between lot_measure and price

In [64]:

#lot_measure >100000 - price increases with increase in living measure


plt.figure(figsize=(plotSizeX, plotSizeY))
y=house_df[house_df['lot_measure']<=75000]
print(sns.scatterplot(y['lot_measure'],y['price']))
#y['lot_measure'].describe()

AxesSubplot(0.125,0.125;0.775x0.755)

Analyzing Bivariate for Feature: ceil


In [65]:

#ceil - median price increases initially and then falls


print(sns.factorplot(x='ceil',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('ceil')['price'].agg(['mean','median','size'])

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid object at 0x000002259321B9B0>

Out[65]:

mean median size

ceil

1.0 4.422196e+05 390000 10680

1.5 5.590449e+05 524475 1910

2.0 6.490515e+05 542950 8241

2.5 1.061021e+06 799200 161

3.0 5.826201e+05 490000 613

3.5 9.339375e+05 534500 8

There is some slight upward trend in price with the ceil

Analyzing Bivariate for Feature: coast


In [66]:

#coast - mean and median of waterfront view is high however such houses are very small in compare to non-waterfro
nt
#Also, living_measure mean and median is greater for waterfront house.
print(sns.factorplot(x='coast',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('coast')['living_measure','price'].agg(['median','mean'])

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid object at 0x0000022580B62208>

Out[66]:

living_measure price

median mean median mean

coast

0 1910 2071.587972 450000 5.316534e+05

1 2850 3173.687117 1400000 1.662524e+06

The house properties with water_front tend to have higher price compared to that of non-water_front properties

Analyzing Bivariate for Feature: sight


In [67]:

#sight - have outliers. The house sighted more have high price (mean and median) and have large living area as we
ll.
print(sns.factorplot(x='sight',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('sight')['price','living_measure'].agg(['mean','median','size'])

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid object at 0x00000225960E3080>

Out[67]:

price living_measure

mean median size mean median size

sight

0 4.966235e+05 432500 19489 1997.761660 1850 19489

1 8.125186e+05 690944 332 2568.960843 2420 332

2 7.927462e+05 675000 963 2655.257529 2470 963

3 9.724684e+05 802500 510 3018.564706 2840 510

4 1.464363e+06 1190000 319 3351.473354 3050 319

Properties with higher price have more no.of sights compared to that of houses with lower price
In [68]:

#Sight - Viewed in relation with price and living_measure


#Costlier houses with large living area are sighted more.
plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['sight'],palette='Paired',legend=
'full'))

AxesSubplot(0.125,0.125;0.775x0.755)

The above graph also justify that: Properties with higher price have more no.of sights compared to that of houses with lower price

Analyzing Bivariate for Feature: condition

In [69]:

#condition - as the condition rating increases its price and living measure mean and median also increases.
print(sns.factorplot(x='condition',y='price',data=house_df, size = 4, aspect = 2))
#groupby
house_df.groupby('condition')['price','living_measure'].agg(['mean','median','size'])
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot
` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid object at 0x00000225FFCB87F0>

Out[69]:

price living_measure

mean median size mean median size

condition

1 334431.666667 262500 30 1216.000000 1000 30

2 327316.215116 279000 172 1410.058140 1320 172

3 542097.086024 450000 14031 2149.042050 1970 14031

4 521300.705230 440000 5679 1950.991724 1820 5679

5 612577.742504 526000 1701 2022.911229 1880 1701

The price of the house increases with condition rating of the house

In [70]:

#Condition - Viewed in relation with price and living_measure. Most houses are rated as 3 or more.
#We can see some outliers as well
plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['condition'],palette='Paired',leg
end='full'))

AxesSubplot(0.125,0.125;0.775x0.755)

So we found out that smaller houses are in better condition and better condition houses are having higher prices
Analyzing Bivariate for Feature: quality

In [71]:

#quality - with grade increase price and living_measure increase (mean and median)

print(sns.factorplot(x='quality',y='price',data=house_df, size = 4, aspect = 2))


#groupby
house_df.groupby('quality')['price','living_measure'].agg(['mean','median','size'])

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid object at 0x000002258C021320>

Out[71]:

price living_measure

mean median size mean median size

quality

1 1.420000e+05 142000.0 1 290.000000 290 1

3 2.056667e+05 262000.0 3 596.666667 600 3

4 2.143810e+05 205000.0 29 660.482759 660 29

5 2.485240e+05 228700.0 242 983.326446 905 242

6 3.019166e+05 275276.5 2038 1191.561335 1120 2038

7 4.025933e+05 375000.0 8981 1689.400401 1630 8981

8 5.428955e+05 510000.0 6068 2184.748517 2150 6068

9 7.737382e+05 720000.0 2615 2868.139962 2820 2615

10 1.072347e+06 914327.0 1134 3520.299824 3450 1134

11 1.497792e+06 1280000.0 399 4395.448622 4260 399

12 2.192500e+06 1820000.0 90 5471.588889 4965 90

13 3.710769e+06 2980000.0 13 7483.076923 7100 13

There is clear increase in price of the house with higher rating on quality
In [72]:

#quality - Viewed in relation with price and living_measure. Most houses are graded as 6 or more.
#We can see some outliers as well
plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['quality'],palette='coolwarm_r',
legend='full'))

AxesSubplot(0.125,0.125;0.775x0.755)

Analyzing Bivariate for Feature: ceil_measure

In [73]:

#ceil_measure - price increases with increase in ceil measure


plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['ceil_measure'],house_df['price']))
house_df['ceil_measure'].describe()

AxesSubplot(0.125,0.125;0.775x0.755)

Out[73]:

count 21613.000000
mean 1788.390691
std 828.090978
min 290.000000
25% 1190.000000
50% 1560.000000
75% 2210.000000
max 9410.000000
Name: ceil_measure, dtype: float64

There is upward trend in price with ceil_measure


Analyzing Bivariate for Feature: basement

In [74]:

#basement - price increases with increase in ceil measure


plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['basement'],house_df['price']))
house_df['basement'].describe()

AxesSubplot(0.125,0.125;0.775x0.755)

Out[74]:

count 21613.000000
mean 291.509045
std 442.575043
min 0.000000
25% 0.000000
50% 0.000000
75% 560.000000
max 4820.000000
Name: basement, dtype: float64

We will create the categorical variable for basement 'has_basement' for houses with basement and no basement.This categorical variable will
be used for further analysis.

In [75]:

#Binning Basement to analyse data


def create_basement_group(series):
if series == 0:
return "No"
elif series > 0:
return "Yes"

house_df['has_basement'] = house_df['basement'].apply(create_basement_group)
In [76]:

#basement - after binning we data shows with basement houses are costlier and have higher
#living measure (mean & median)
print(sns.factorplot(x='has_basement',y='price',data=house_df, size = 4, aspect = 2))
house_df.groupby('has_basement')['price','living_measure'].agg(['mean','median','size'])

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid object at 0x0000022580B9A470>

Out[76]:

price living_measure

mean median size mean median size

has_basement

No 486945.394789 411500 13126 1928.879628 1740 13126

Yes 622518.174384 515000 8487 2313.467539 2100 8487

The houses with basement has better price compared to that of houses without basement

In [77]:

#basement - have higher price & living measure


plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['has_basement']))

AxesSubplot(0.125,0.125;0.775x0.755)
In [78]:

#yr_built - outliers can be seen easily.


plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['yr_built'],house_df['living_measure']))
#groupby
house_df.groupby('yr_built')['price'].agg(['mean','median','size'])

AxesSubplot(0.125,0.125;0.775x0.755)

Out[78]:

mean median size

yr_built

1900 581536.632184 549000 87

1901 557108.344828 550000 29

1902 673192.592593 624000 27

1903 480958.195652 461000 46

1904 583867.755556 478000 45

1905 753443.932432 597500 74

1906 670027.663043 555000 92

1907 676324.476923 595000 65

1908 564499.848837 519475 86

1909 696448.989362 575500 94

1910 671671.835821 542500 134

1911 632584.246575 606000 73

1912 613193.227848 557510 79

1913 586066.271186 535000 59

1914 615246.074074 553300 54

1915 585036.921875 549500 64

1916 601041.620253 515000 79

1917 528126.785714 450000 56

1918 492346.875000 412450 120

1919 537887.556818 487900 88

1920 477761.030612 448500 98

1921 613224.210526 547500 76

1922 569794.147368 515000 95

1923 618653.773810 498376 84

1924 570419.928058 525000 139

1925 607316.606061 535000 165

1926 625443.377778 560000 180

1927 654154.208696 605000 115

1928 621920.198413 547500 126

1929 574396.842105 523475 114

... ... ... ...

1986 476989.069767 419500 215

1987 517565.010204 471500 294

1988 583930.400000 500000 270

1989 583063.403448 490000 290

1990 564133.384375 457500 320

1991 630630.647321 534150 224

1992 548205.924242 472500 198

1993 556760.455446 435000 202

1994 486864.040161 439000 249

1995 577933.757396 496000 169

1996 639673.528205 540000 195

1997 606173.887006 515000 177

1998 594280.146444 500000 239


1999 640431.177358 499900 265

2000 682003.619266 544250 218

2001 741340.042623 585000 305

2002 578818.481982 447500 222

2003 558791.367299 450500 422

2004 596095.004619 507000 433

2005 580895.468889 486000 450

2006 631041.548458 510500 454

2007 615193.292566 480000 417

2008 642037.716621 500000 367

2009 518462.186957 416375 230

2010 551678.384615 448500 143

2011 544648.384615 440000 130

2012 527436.982353 448475 170

2013 678599.582090 565000 201

2014 683792.685152 599000 559

2015 759970.947368 629500 38

116 rows × 3 columns

We will create new variable: Houselandratio - This is proportion of living area in the total area of the house. We will explore the trend of price
against this houselandratio.

In [79]:

#HouseLandRatio - Computing new variable as ratio of living_measure/total_area


#Significes - Land used for construction of house
house_df["HouseLandRatio"]=np.round((house_df['living_measure']/house_df['total_area']),2)*100
house_df["HouseLandRatio"].head()

Out[79]:

17786 19.0
3782 16.0
10069 16.0
7114 24.0
10080 22.0
Name: HouseLandRatio, dtype: float64

Analyzing Bivariate for Feature: yr_renovated

In [80]:

#yr_renovated -
plt.figure(figsize=(plotSizeX, plotSizeY))
x=house_df[house_df['yr_renovated']>0]
print(sns.scatterplot(x['yr_renovated'],x['price']))
#groupby
x.groupby('yr_renovated')['price'].agg(['mean','median','size'])
AxesSubplot(0.125,0.125;0.775x0.755)

Out[80]:

mean median size

yr_renovated

1934 4.599500e+05 459950.0 1

1940 3.784000e+05 378400.0 2

1944 5.210000e+05 521000.0 1

1945 3.986667e+05 375000.0 3

1946 3.511375e+05 351137.5 2

1948 4.100000e+05 410000.0 1

1950 2.914500e+05 291450.0 2

1951 2.760000e+05 276000.0 1

1953 2.458167e+05 247500.0 3

1954 9.000000e+05 900000.0 1

1955 4.421667e+05 399000.0 3

1956 9.306667e+05 1140000.0 3

1957 2.915333e+05 249900.0 3

1958 5.595760e+05 397380.0 5

1959 3.975000e+05 397500.0 1

1960 4.771750e+05 299350.0 4

1962 6.150000e+05 615000.0 2

1963 4.977125e+05 402500.0 4

1964 3.567200e+05 325000.0 5

1965 7.822000e+05 580000.0 5

1967 2.686000e+05 268600.0 2

1968 4.835125e+05 425000.0 8

1969 5.291250e+05 555750.0 4

1970 5.230444e+05 450000.0 9

1971 4.182775e+05 418277.5 2

1972 6.197500e+05 522000.0 4

1973 4.172000e+05 440000.0 5

1974 4.025000e+05 310000.0 3

1975 5.052500e+05 521750.0 6

1976 4.016667e+05 335000.0 3

... ... ... ...

1986 6.230582e+05 520000.0 17

1987 1.206778e+06 624000.0 18

1988 7.227600e+05 588000.0 15

1989 6.397886e+05 560000.0 22

1990 7.491200e+05 730000.0 25

1991 9.650450e+05 792500.0 20

1992 6.967941e+05 599000.0 17

1993 8.480032e+05 805000.0 19

1994 9.430265e+05 780000.0 19

1995 8.055231e+05 536475.0 16

1996 7.496633e+05 710000.0 15

1997 6.203960e+05 569950.0 15

1998 7.737316e+05 526000.0 19

1999 1.030706e+06 840000.0 17

2000 8.090843e+05 755000.0 35

2001 1.089489e+06 675000.0 19

2002 1.216498e+06 890000.0 22

2003 9.923056e+05 767500.0 36

2004 7.820769e+05 721250.0 26


2005 8.151957e+05 744000.0 35

2006 7.890396e+05 654050.0 24

2007 8.389221e+05 797000.0 35

2008 1.034499e+06 801500.0 18

2009 9.006824e+05 521000.0 22

2010 9.926694e+05 845000.0 18

2011 6.074962e+05 577000.0 13

2012 6.251818e+05 515000.0 11

2013 6.649608e+05 560000.0 37

2014 6.550301e+05 575000.0 91

2015 6.591562e+05 651000.0 16

69 rows × 3 columns

So most houses are renovated after 1980's. We will create new categorical variable 'has_renovated' to categorize the property as renovated and
non-renovated. For further ananlysis we will use this categorical variable.

In [81]:

#Lets try to group yr_renovated


#Binning Basement to analyse data
def create_renovated_group(series):
if series == 0:
return "No"
elif series > 0:
return "Yes"

house_df['has_renovated'] = house_df['yr_renovated'].apply(create_renovated_group)
In [84]:

#has_renovated - renovated have higher mean and median, however it does not confirm if the prices of house renova
ted
#actually increased or not.
#HouseLandRatio - Renovated house utilized more land area for construction of house
plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['has_renovated']))
#groupby
house_df.groupby(['has_renovated'])['price','HouseLandRatio'].agg(['mean','median','size'])

AxesSubplot(0.125,0.125;0.775x0.755)

Out[84]:

price HouseLandRatio

mean median size mean median size

has_renovated

No 530447.958597 448000 20699 22.067056 20.0 20699

Yes 760628.777899 600000 914 22.296499 21.0 914

Renovated properties have higher price than others with same living measure space.

In [85]:

#pd.crosstab(house_df['yearbuilt_group'],house_df['has_renovated'])
In [86]:

#has_renovated - have higher price & living measure


plt.figure(figsize=(plotSizeX, plotSizeY))
x=house_df[house_df['yr_built']<2000]
print(sns.scatterplot(x['living_measure'],x['price'],hue=x['has_renovated']))

AxesSubplot(0.125,0.125;0.775x0.755)

Analyzing Bivariate for Feature: furnished

In [87]:

#furnished - Furnished has higher price value and has greater living_measure
plt.figure(figsize=(plotSizeX, plotSizeY))
print(sns.scatterplot(house_df['living_measure'],house_df['price'],hue=house_df['furnished']))
#groupby
house_df.groupby('furnished')['price','living_measure','HouseLandRatio'].agg(['mean','median','size'])

AxesSubplot(0.125,0.125;0.775x0.755)

Out[87]:

price living_measure HouseLandRatio

mean median size mean median size mean median size

furnished

0 437300.158968 401000 17362 1792.256652 1720 17362 21.508236 19.0 17362

1 960374.414961 810000 4251 3254.696072 3110 4251 24.398730 24.0 4251

Furnished houses have higher price than that of the Non-furnished houses
Analyzing Bivariate for Feature: city

In [88]:

#City - outliers can be seen easily.

print(sns.factorplot(x='City',y='price',data=house_df, size = 4, aspect = 2))


plt.xticks(rotation=90)
#groupby
house_df.groupby('City')['price'].agg(['mean','median','size']).sort_values(by='median',ascending=False)

C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3666: UserWarning: The `factorplot


` function has been renamed to `catplot`. The original name will be removed in a future release. Ple
ase update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'`
in `catplot`.
warnings.warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\categorical.py:3672: UserWarning: The `size` para
mter has been renamed to `height`; please update your code.
warnings.warn(msg, UserWarning)

<seaborn.axisgrid.FacetGrid object at 0x0000022593C63C88>

Out[88]:

mean median size

City

Medina 2.161300e+06 1895000.0 50

Mercer Island 1.194874e+06 993750.0 282

Bellevue 8.984661e+05 749000.0 1407

Sammamish 7.328210e+05 688500.0 800

Redmond 6.589089e+05 625000.0 979

Issaquah 6.151222e+05 572000.0 733

Woodinville 6.174979e+05 570000.0 471

Kirkland 6.465428e+05 510000.0 977

Snoqualmie 5.280031e+05 500000.0 310

Bothell 4.903771e+05 470000.0 195

Vashon 4.874805e+05 463750.0 118

Fall City 5.806379e+05 460000.0 81

Seattle 5.350695e+05 453000.0 8977

Kenmore 4.624889e+05 445000.0 283

Carnation 4.556171e+05 415000.0 124

Duvall 4.248151e+05 401250.0 190

North Bend 4.395073e+05 399500.0 221

Black Diamond 4.236660e+05 359999.5 100

Renton 4.034685e+05 358000.0 1597

Maple Valley 3.668761e+05 342000.0 590

Kent 2.995499e+05 283200.0 1203

Enumclaw 3.157093e+05 279500.0 234

Auburn 2.914815e+05 270000.0 912

Federal Way 2.893913e+05 268000.0 779


From the above graph, few cities have higher average price of the houses compared to others. We need to further analyse why the price varies
among cities.

In [89]:

#City mean price distribution with average


city_price=pd.DataFrame(house_df.groupby('City')['price'].agg(['mean','median','size']))

indx=city_price.index
overall_price_mean=np.mean(house_df['price'])
overall_price_median=np.median(house_df['price'])

fig, ax1 = plt.subplots(figsize=(plotSizeX, plotSizeY))


barlist=ax1.bar(city_price.index,city_price['mean'],color='gray')
plt.xticks(rotation=90)
ax1.axhline(overall_price_mean, color="red")
ax1.text(1.02, overall_price_mean, "{0:.2f}".format(round(overall_price_mean,2)), va='center', ha="left", bbox=di
ct(facecolor="w",alpha=0.5),
transform=ax1.get_yaxis_transform())
plt.title("Cities and Mean Price")
plt.show()
As we can see from above grapgh, majorly below cities have higher mean house prices

1. Bellevue
2. Fall City
3. Federal Way
4. Kirkland
5. Medina
6. Mercer Island
7. Redmond
8. Sammanmish
9. Woodinville

In [90]:

#City median price distribution with average


fig, ax1 = plt.subplots(figsize=(plotSizeX, plotSizeY))
barlist=ax1.bar(city_price.index,city_price['median'],color='green')
plt.xticks(rotation=90)
ax1.axhline(overall_price_median, color="red")
ax1.text(1.02, overall_price_median, "{0:.2f}".format(round(overall_price_median,2)), va='center', ha="left", bbo
x=dict(facecolor="w",alpha=0.5),
transform=ax1.get_yaxis_transform())

plt.title("Cities and Median Price")


plt.show()

As we can see from above grapgh, majorly below cities have higher median house prices

1. Bellevue
2. Bothell
3. Issaquah
4. Kirkland
5. Medina
6. Mercer Island
7. Redmond
8. Sammanmish
9. Snoqualmie
10. Woodinville

In [91]:
#let's make the copy of the dataframe, before making any furhter changes
house_df_bdp=house_df.copy()

DATA PROCESSING
Treating Outlilers
We have seen outliers for columns room_bath(33 bed), living_measure, lot_measure, ceil_measure and Basement

In [92]:

def outlier_treatment(datacolumn):
sorted(datacolumn)
Q1,Q3 = np.percentile(datacolumn , [25,75])
IQR = Q3-Q1
lower_range = Q1-(1.5 * IQR)
upper_range = Q3+(1.5 * IQR)
return lower_range,upper_range

Using the above function, lets get the lowerbound and upperbound values

Treating outliers for column - ceil_measure

In [93]:

lowerbound,upperbound = outlier_treatment(house_df.ceil_measure)
print(lowerbound,upperbound)

-340.0 3740.0

Lets check which column is considered as an outlier

In [94]:

house_df[(house_df.ceil_measure < lowerbound) | (house_df.ceil_measure > upperbound)]

Out[94]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_y

2014-05-
7142 7397300220 2750000 4 3.25 4430 21000 2.0 0 0 ... 20000 1 25430 May-2
29

2014-05-
10270 3221059044 799950 4 3.50 4220 196817 2.0 0 0 ... 195395 1 201037 May-2
23

2014-05-
9770 2424049029 3100000 6 4.25 6980 15682 3.0 0 4 ... 18367 1 22662 May-2
29

2014-05-
9909 1724069059 2000000 5 4.00 4580 4443 3.0 1 4 ... 4443 1 9023 May-2
24

2014-05-
3712 9359100500 1800000 4 3.25 4060 13000 2.0 0 3 ... 13800 1 17060 May-2
27

2014-05-
3628 4131900042 2000000 5 4.25 6490 10862 2.0 0 3 ... 14080 1 17352 May-2
16

2014-05-
18900 3521059134 900000 3 3.50 4080 217697 1.5 0 3 ... 217790 1 221777 May-2
23

2014-05-
13664 2481620310 1120000 4 2.25 4470 60373 2.0 0 0 ... 40450 1 64843 May-2
14

2014-05-
20740 1225069038 2280000 7 8.00 13540 307752 3.0 0 4 ... 217800 1 321292 May-2
05

2014-05-
10672 3892500150 1550000 3 2.50 4460 26027 2.0 0 0 ... 26027 1 30487 May-2
21

2014-05-
3964 1829300210 762300 4 2.50 3880 14550 2.0 0 0 ... 14045 1 18430 May-2
06

2014-05-
10294 525069127 1200000 4 3.50 4740 172497 2.0 0 0 ... 49658 1 177237 May-2
23

2014-05-
3996 3630200780 1050000 4 3.75 3860 5474 2.5 0 0 ... 5474 1 9334 May-2
22

2014-05-
10540 2524069097 2240000 5 6.50 7270 130017 2.0 0 0 ... 44890 1 137287 May-2
09

2014-05-
13827 824059042 1890000 5 3.50 4180 17935 2.0 0 0 ... 13760 1 22115 May-2
30

2014-05-
10462 7237550130 1300000 4 3.50 4380 74052 1.0 0 0 ... 62291 1 78432 May-2
20

2014-05-
3089 3616600250 1600000 3 3.25 3790 19000 2.0 0 4 ... 18628 1 22790 May-2
27

2014-05-
15646 98000960 1050000 4 3.25 4400 16625 2.0 0 0 ... 15523 1 21025 May-2
13

2014-05-
8153 425069020 1090000 4 2.50 4340 141570 2.5 0 0 ... 97138 1 145910 May-2
05
2014-05-
8484 4039800080 1360000 5 3.50 5960 13703 2.0 0 2 ... 17320 1 19663 May-2
29

2014-05-
15135 3625700010 1870000 5 4.00 4510 15175 2.0 0 0 ... 13500 1 19685 May-2
06

2014-05-
8769 2524049318 2000000 4 3.00 4260 18000 2.0 0 2 ... 17015 1 22260 May-2
28

2014-05-
15404 3276940100 1000000 4 3.00 4260 18687 2.0 0 0 ... 16772 1 22947 May-2
22

2014-05-
8358 4100500070 1710000 5 4.50 4590 14685 2.0 0 0 ... 9486 1 19275 May-2
27

2014-05-
9507 8691310840 833000 4 2.75 3780 10308 2.0 0 0 ... 10740 1 14088 May-2
09

2014-05-
7743 6613000935 2560000 4 2.50 5300 26211 2.0 1 2 ... 19281 1 31511 May-2
13

2014-05-
3330 5710000005 2150000 4 5.50 5060 10320 2.0 0 0 ... 10080 1 15380 May-2
22

2014-05-
19115 6648150040 1680000 5 3.25 4860 23723 2.0 0 2 ... 13860 1 28583 May-2
13

2014-05-
15697 3758900259 1040000 4 3.50 3900 8391 2.0 0 0 ... 12268 1 12291 May-2
07

2014-05-
3187 1853080640 966000 5 4.50 3810 8019 2.0 0 0 ... 7713 1 11829 May-2
14

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

2015-04-
3232 7135520300 1300000 3 2.75 4120 16365 1.0 0 2 ... 14110 1 20485 April-2
07

2015-04-
15540 98000740 945000 5 3.50 4380 14925 2.0 0 0 ... 14633 1 19305 April-2
01

2015-04-
15481 2726059144 1040000 5 3.75 4570 10194 2.0 0 0 ... 7560 1 14764 April-2
10

2015-04-
8270 3401700150 1350000 5 3.00 5530 38816 1.5 0 2 ... 44417 1 44346 April-2
23

2015-04-
19626 3295610080 912000 4 2.75 4030 10888 2.0 0 0 ... 10756 1 14918 April-2
01

2015-04-
14955 3121500150 894000 4 2.50 3800 22029 2.0 0 0 ... 24979 1 25829 April-2
23

2015-04-
15084 2625069070 1390000 4 3.25 4860 181319 2.5 0 0 ... 181319 1 186179 April-2
10

2015-04-
2878 644000040 1780000 4 3.25 3950 10912 2.0 0 0 ... 10998 1 14862 April-2
29

2015-04-
8561 3585900500 1530000 4 4.25 4720 21000 3.0 0 4 ... 20000 1 25720 April-2
02

2015-04-
8682 3860900035 1940000 5 3.50 4230 16526 2.0 0 0 ... 12362 1 20756 April-2
15

2015-04-
15893 98300230 1460000 4 4.00 4620 130208 2.0 0 0 ... 131007 1 134828 April-2
28

2015-04-
7749 6790830090 1060000 4 3.50 4220 8417 3.0 0 0 ... 8435 1 12637 April-2
15

2015-04-
7941 2481630030 965000 4 2.50 3920 41206 2.0 0 0 ... 36562 1 45126 April-2
27

2015-04-
15862 7237550110 1180000 4 3.25 3750 74052 2.0 0 0 ... 74052 1 77802 April-2
24

2015-04-
19172 713500020 1390000 4 4.50 4490 24767 2.0 0 2 ... 32700 1 29257 April-2
21

2015-04-
7847 7853440140 802945 5 3.50 4000 9234 2.0 0 0 ... 6600 1 13234 April-2
09

2015-05-
13999 1126059201 1270000 5 3.25 4410 35192 2.0 0 2 ... 59677 1 39602 May-2
04

2015-05-
9320 1525059261 1900000 5 4.50 5160 44315 2.0 0 0 ... 44315 1 49475 May-2
05

2015-05-
2730 7853440050 771005 5 4.50 4000 6713 2.0 0 0 ... 6600 1 10713 May-2
05

2015-05-
5687 3751600409 510000 4 2.50 4073 17334 2.0 0 0 ... 9625 0 21407 May-2
08

2015-05-
5620 6065300370 4210000 5 6.00 7440 21540 2.0 0 0 ... 19329 1 28980 May-2
06

2015-05-
21004 3303960250 1050000 4 3.25 4020 11588 2.0 0 0 ... 8066 1 15608 May-2
07

2015-05-
15596 1925059254 3000000 5 4.00 6670 16481 2.0 0 0 ... 16607 1 23151 May-2
07

2015-05-
13440 1623089165 920000 4 3.75 4030 503989 2.0 0 0 ... 71874 1 508019 May-2
06
2015-05-
15586 1266200140 1850000 4 3.25 4160 10335 2.0 0 0 ... 10333 1 14495 May-2
06

2015-05-
9588 7237501380 1270000 4 3.50 4640 13404 2.0 0 0 ... 13590 1 18044 May-2
07

2015-05-
17098 2424059174 2000000 4 3.25 5640 35006 2.0 0 2 ... 35033 1 40646 May-2
08

2015-05-
13099 3024059057 1650000 4 4.50 5550 16065 2.0 0 0 ... 16488 1 21615 May-2
01

2015-05-
19121 4389201095 3650000 5 3.75 5020 8694 2.0 0 1 ... 11275 1 13714 May-2
11

2015-05-
13112 7960900060 2900000 4 3.25 5050 20100 1.5 0 2 ... 20060 1 25150 May-2
04

611 rows × 30 columns

We got 611 records which are outliers

In [95]:

#dropping the record from the dataset


house_df.drop(house_df[ (house_df.ceil_measure > upperbound) | (house_df.ceil_measure < lowerbound) ].index, inpl
ace=True)

In [96]:

house_df.shape

Out[96]:

(21002, 30)

In [97]:

#ceil_measure
print("Skewness is :", house_df.ceil_measure.skew())
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df.ceil_measure)
house_df.ceil_measure.describe()

Skewness is : 0.8198869256569326

Out[97]:

count 21002.000000
mean 1712.238168
std 696.044073
min 290.000000
25% 1180.000000
50% 1540.000000
75% 2140.000000
max 3740.000000
Name: ceil_measure, dtype: float64

After treating outliers of ceil_measure, the data has reduced by about 600(~3%) data points but data is nicely distributed
Treating outliers for column - basement

In [98]:

lowerbound_base,upperbound_base = outlier_treatment(house_df.basement)
print(lowerbound_base,upperbound_base)

-855.0 1425.0

In [99]:
house_df[(house_df.basement < lowerbound_base) | (house_df.basement > upperbound_base)]

Out[99]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_y

2014-05-
16357 3211270170 404000 4 3.00 4060 35621 1.0 0 0 ... 35259 1 39681 May-2
23

2014-05-
7386 5700003640 2100000 5 3.75 5340 10655 2.5 0 3 ... 9418 1 15995 May-2
19

2014-05-
9727 5119010090 549900 5 2.75 3060 7015 1.0 0 0 ... 7600 0 10075 May-2
10

2014-05-
16069 7663700968 565000 7 4.50 4140 9066 1.0 0 0 ... 1865 0 13206 May-2
28

2014-05-
1783 7430200100 1220000 4 3.50 4910 9444 1.5 0 0 ... 11063 1 14354 May-2
14

2014-05-
1145 7856410030 1030000 5 2.75 3190 16920 1.0 0 3 ... 13100 1 20110 May-2
05

2014-05-
13624 7855801610 1220000 4 2.50 3190 8684 1.0 0 3 ... 8684 1 11874 May-2
19

2014-05-
6610 1424059154 1270000 4 3.00 5520 8313 2.0 0 3 ... 8278 1 13833 May-2
16

2014-05-
13951 9322800210 879950 4 2.25 3500 13875 1.0 0 4 ... 15000 1 17375 May-2
20

2014-05-
13757 4219401236 1690000 3 1.75 3400 8965 1.0 0 2 ... 8500 1 12365 May-2
20

2014-05-
10529 7784400130 497300 6 2.75 3200 9200 1.0 0 2 ... 9500 0 12400 May-2
05

2014-05-
6832 486000510 1330000 4 3.00 3370 7920 1.0 0 3 ... 7380 1 11290 May-2
23

2014-05-
2479 1624049293 390000 5 3.75 2890 5000 1.0 0 0 ... 5117 0 7890 May-2
06

2014-05-
15539 7855200120 1370000 4 2.75 3720 9450 1.0 0 4 ... 8605 1 13170 May-2
09

2014-05-
2752 7922900040 1080000 4 3.00 3600 9200 1.0 0 4 ... 9775 1 12800 May-2
22

2014-05-
8532 3623500205 2450000 4 4.50 5030 11023 2.0 0 2 ... 11490 1 16053 May-2
13

2014-05-
14866 5152700060 465000 6 3.25 4250 23326 1.0 0 3 ... 15983 1 27576 May-2
28

2014-05-
3344 4058800215 430000 3 3.75 3890 7140 1.0 0 2 ... 7320 0 11030 May-2
28

2014-05-
3501 4122900190 1350000 5 1.75 3380 20021 1.0 0 0 ... 19809 0 23401 May-2
12

2014-05-
15783 217500140 464000 5 2.50 3400 8970 1.0 0 0 ... 8475 0 12370 May-2
13

2014-05-
9331 5152100060 472000 6 2.50 4410 14034 1.0 0 2 ... 13988 1 18444 May-2
29

2014-05-
9349 3342700405 585000 4 1.75 3000 42200 1.0 0 3 ... 9821 0 45200 May-2
22

2014-05-
9279 4139420590 1210000 4 3.50 4560 16643 1.0 0 3 ... 15177 1 21203 May-2
20

2014-05-
520 1313000220 675000 5 3.00 3410 9600 1.0 0 0 ... 9679 0 13010 May-2
13

2014-05-
769 7856410430 1390000 6 2.75 5700 20000 1.0 0 4 ... 15700 1 25700 May-2
30

2014-05-
18293 5425700205 1800000 4 3.50 4460 16953 1.0 0 0 ... 13370 1 21413 May-2
20

2014-05-
6088 9558050170 475000 4 2.50 3740 8700 1.0 0 0 ... 6333 1 12440 May-2
13

2014-05-
18107 1180008355 380000 5 1.75 3000 6000 1.0 0 0 ... 7125 0 9000 May-2
07
2014-05-
17068 8562710550 950000 5 3.75 5330 6000 2.0 0 2 ... 5797 1 11330 May-2
21

2014-05-
21501 2021201000 980000 4 3.00 3680 5854 1.0 0 3 ... 5000 1 9534 May-2
23

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

2015-04-
16301 9542000275 675000 4 2.50 2420 18470 1.0 0 0 ... 13800 0 20890 April-2
06

2015-04-
7062 3982700250 799900 4 2.50 3030 7800 2.0 0 0 ... 7435 1 10830 April-2
23

2015-04-
18757 8085400376 2320000 4 3.50 5050 9520 2.0 0 0 ... 9248 1 14570 April-2
21

2015-04-
17329 2655500235 1610000 4 3.50 3920 19088 1.0 0 1 ... 13749 1 23008 April-2
10

2015-04-
5618 3524039202 1070000 3 2.25 2950 7232 1.0 0 2 ... 7140 0 10182 April-2
20

2015-04-
18042 5460600110 1050000 6 4.00 5310 12741 2.0 0 2 ... 12632 1 18051 April-2
23

2015-04-
17165 1736800520 662500 3 2.50 3560 9796 1.0 0 0 ... 8925 0 13356 April-2
03

2015-04-
17090 2141300080 707000 5 2.50 3050 13212 1.0 0 0 ... 10826 0 16262 April-2
24

2015-04-
15407 1373800330 1120000 4 2.50 3690 11191 1.0 0 3 ... 8160 1 14881 April-2
20

2015-04-
14911 4147200040 1090000 5 2.25 3650 13068 1.0 0 0 ... 13927 1 16718 April-2
14

2015-04-
2642 2425059074 740000 5 3.00 3655 51836 1.0 0 0 ... 8606 0 55491 April-2
10

2015-04-
2860 9808100150 3350000 5 3.75 5350 15360 1.0 0 1 ... 15940 1 20710 April-2
02

2015-04-
3498 9560500105 957000 4 2.25 2860 11545 1.0 0 0 ... 11396 0 14405 April-2
24

2015-04-
7793 629860010 1350000 4 3.50 4640 9827 2.0 0 2 ... 8207 1 14467 April-2
29

2015-05-
5853 7964410100 700000 4 3.50 5360 25800 1.0 0 0 ... 21781 1 31160 May-2
04

2015-05-
5500 4139420190 2480000 4 5.00 5310 16909 1.0 0 4 ... 15701 1 22219 May-2
12

2015-05-
12295 1742800430 463828 5 1.75 3250 13702 1.0 0 2 ... 11328 0 16952 May-2
04

2015-05-
5347 9541600490 931088 4 2.50 3510 17400 1.0 0 0 ... 12120 1 20910 May-2
05

2015-05-
13793 6065300840 2850000 4 4.00 5040 17208 1.0 0 0 ... 18647 1 22248 May-2
01

2015-05-
19013 1822079046 500000 3 2.00 3040 41072 1.0 0 0 ... 54014 0 44112 May-2
04

2015-05-
5617 1925069082 2200000 5 4.25 4640 22703 2.0 1 4 ... 14200 0 27343 May-2
11

2015-05-
7035 1180007375 625000 5 3.50 4010 6000 2.0 0 3 ... 6000 1 10010 May-2
12

2015-05-
4032 7878400022 390000 4 2.25 3060 7920 1.0 0 0 ... 7800 0 10980 May-2
06

2015-05-
1890 3336000050 435000 6 3.00 3560 4290 1.0 0 0 ... 6000 0 7850 May-2
01

2015-05-
19313 8835401250 1490000 6 2.75 4430 6440 2.0 0 3 ... 7314 1 10870 May-2
06

2015-05-
4404 3523069008 890000 4 3.25 4360 210254 1.0 0 0 ... 87120 1 214614 May-2
05

2015-05-
20299 3286800260 780000 5 2.50 3480 74052 1.0 0 0 ... 65775 0 77532 May-2
06

2015-05-
4712 1924059254 1300000 5 3.75 3490 15246 1.0 0 1 ... 15682 1 18736 May-2
08

2015-05-
8288 2524049108 1380000 5 4.25 4050 18827 1.0 0 2 ... 25120 1 22877 May-2
12

2015-05-
15391 2925059260 800000 5 2.50 3000 10560 1.0 0 0 ... 11616 0 13560 May-2
06

408 rows × 30 columns

We got 408 records as outliers, let's drop these outliers


In [100]:

#dropping the record from the dataset


house_df.drop(house_df[ (house_df.basement > upperbound_base) | (house_df.basement < lowerbound_base) ].index, in
place=True)

In [101]:

house_df.shape

Out[101]:

(20594, 30)

In [102]:

#basement_measure
plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df.basement)

Out[102]:

<matplotlib.axes._subplots.AxesSubplot at 0x22593a3e5f8>

After treating outliers of basement, we can see that 400(~2%) data points got imputed. Total about 5% data has been imputed after treating
ceil_measure and basement.

In [103]:

#Let's see the boxplot now for basement


plt.figure(figsize=(plotSizeX, plotSizeY))
sns.boxplot(house_df['basement'])

Out[103]:

<matplotlib.axes._subplots.AxesSubplot at 0x22593921d30>
Treating outliers for column - living_measure

In [104]:

lowerbound_lim,upperbound_lim = outlier_treatment(house_df.living_measure)
print(lowerbound_lim,upperbound_lim)

-160.0 4000.0

In [105]:
house_df[(house_df.living_measure < lowerbound_lim) | (house_df.living_measure > upperbound_lim)]

Out[105]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_y

2014-05-
10110 6669100070 900000 4 3.25 4700 38412 2.0 0 0 ... 35571 1 43112 May-2
12

2014-05-
7275 2926069083 900000 5 3.75 4130 226076 2.0 0 0 ... 55321 1 230206 May-2
07

2014-05-
10549 6819100020 1430000 4 4.25 4960 6000 2.5 0 0 ... 4080 1 10960 May-2
29

2014-05-
1438 5093300325 1610000 4 3.50 4390 11600 2.0 0 3 ... 12000 1 15990 May-2
23

2014-05-
13897 7853280350 809000 5 4.50 4630 6324 2.0 0 0 ... 6790 1 10954 May-2
12

2014-05-
10530 6169901185 490000 5 3.50 4460 2975 3.0 0 2 ... 4231 1 7435 May-2
20

2014-05-
6830 425079099 560000 3 3.00 4120 60392 2.0 0 2 ... 64033 1 64512 May-2
07

2014-05-
2969 7853280550 700000 4 3.50 4490 5099 2.0 0 0 ... 5537 1 9589 May-2
28

2014-05-
2864 251620090 2400000 4 3.25 4140 20734 1.0 0 1 ... 20008 1 24874 May-2
30

2014-05-
2680 587550280 625000 4 3.25 4240 25639 2.0 0 3 ... 24967 1 29879 May-2
30

2014-05-
5059 1338600225 1970000 8 3.50 4440 6480 2.0 0 3 ... 8640 1 10920 May-2
28

2014-05-
11490 526069024 950000 5 3.00 4530 258746 1.5 0 0 ... 83199 1 263276 May-2
12

2014-05-
17520 723000114 1400000 5 3.50 4010 8510 2.0 0 1 ... 6128 1 12520 May-2
05

2014-05-
4965 8562710250 890000 4 4.25 4420 5750 2.0 0 0 ... 5750 1 10170 May-2
05

2014-05-
4596 8562710520 890000 5 3.50 4490 6000 2.0 0 0 ... 6000 1 10490 May-2
05

2014-05-
21557 3758900075 1530000 5 4.50 4270 8076 2.0 0 0 ... 10631 1 12346 May-2
07

2014-05-
1071 1924069039 869000 5 3.25 4180 49222 2.0 0 0 ... 8029 0 53402 May-2
19

2014-06-
15797 3127200021 850000 4 3.50 4140 7089 2.0 0 0 ... 8896 1 11229 June-2
16

2014-06-
5235 293760050 1050000 4 4.25 4390 13833 2.0 0 3 ... 11652 1 18223 June-2
27

2014-06-
19273 3629890190 1300000 4 4.00 4270 6002 2.0 0 3 ... 5942 1 10272 June-2
06

2014-06-
17631 1702901180 665000 6 3.00 4250 4400 2.5 0 0 ... 4950 0 8650 June-2
11

2014-06-
4215 8043700300 2700000 4 3.25 4420 7850 2.0 1 4 ... 8525 1 12270 June-2
08

2014-06-
7571 3616600231 960000 4 3.00 4590 9150 2.0 0 0 ... 12348 1 13740 June-2
03

2014-06-
17402 8128600060 600000 4 3.25 4690 14930 2.0 0 2 ... 13320 1 19620 June-2
24

2014-06-
19073 5561300730 530000 4 3.25 4160 35654 2.0 0 0 ... 35675 0 39814 June-2
05

2014-06-
7431 5078400160 1800000 5 4.50 4400 15580 2.0 0 0 ... 14249 1 19980 June-2
05

2014-06-
21125 5700003630 1930000 5 4.25 4830 8050 2.5 0 2 ... 9194 1 12880 June-2
30

2014-06-
1552 1336800010 1340000 5 2.25 4200 5800 2.5 0 0 ... 5800 1 10000 June-2
13
2014-06-
14441 7853280570 765000 4 3.00 4410 5104 2.0 0 0 ... 5537 1 9514 June-2
04

2014-06-
372 7636800041 995000 3 4.50 4380 47044 2.0 1 3 ... 18512 1 51424 June-2
25

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

2015-03-
11314 722059020 550000 6 4.50 4520 40164 2.0 0 0 ... 13068 1 44684 March-2
18

2015-03-
11089 745530180 870000 5 3.50 4495 10079 2.0 0 0 ... 10079 1 14574 March-2
17

2015-03-
11340 9362000080 1600000 5 3.50 4050 20925 2.0 0 3 ... 18321 1 24975 March-2
16

2015-03-
13844 1824079073 985000 5 4.25 4650 108464 2.0 0 0 ... 155509 1 113114 March-2
31

2015-03-
9911 3026059085 1290000 5 3.50 4090 290980 1.0 0 0 ... 9255 1 295070 March-2
17

2015-03-
17302 1924059319 1290000 5 4.00 4050 11358 2.0 0 0 ... 13555 1 15408 March-2
20

2015-03-
14676 1333300145 2230000 3 4.00 4200 30120 2.0 0 2 ... 12200 1 34320 March-2
04

2015-03-
11625 2600010220 1250000 4 2.50 4040 11350 2.0 0 2 ... 12382 1 15390 March-2
26

2015-03-
8631 3616600003 1680000 3 2.50 4090 16972 2.0 0 2 ... 16972 1 21062 March-2
02

2015-04-
14545 2579500101 1390000 4 3.50 4010 10880 2.0 0 3 ... 17310 1 14890 April-2
21

2015-04-
567 9185700485 2540000 4 3.50 4350 6000 2.0 0 0 ... 7200 1 10350 April-2
01

2015-04-
21062 3303980140 1150000 4 3.00 4160 13170 2.0 0 0 ... 13148 1 17330 April-2
02

2015-04-
12545 269000970 1300000 5 3.75 4450 7680 2.0 0 0 ... 6400 1 12130 April-2
02

2015-04-
1512 1118000340 3000000 5 3.75 4590 11265 2.0 0 0 ... 8996 1 15855 April-2
08

2015-04-
4292 1115300270 900000 6 3.75 4210 6105 2.0 0 0 ... 6368 1 10315 April-2
28

2015-04-
16585 6645950070 1450000 4 3.50 5000 38012 2.0 0 0 ... 18054 1 43012 April-2
01

2015-04-
16752 8562720420 1350000 4 3.50 4740 8611 2.0 0 3 ... 8321 1 13351 April-2
30

2015-04-
16743 1223089077 718000 3 1.75 4060 136290 1.0 0 0 ... 51836 0 140350 April-2
01

2015-04-
7550 2260300060 2580000 5 3.00 4780 20440 1.0 0 0 ... 20440 1 25220 April-2
10

2015-04-
5755 1069000070 2800000 5 3.25 4590 12793 2.0 0 2 ... 8609 1 17383 April-2
15

2015-04-
5706 4128500380 1200000 4 2.50 4280 12796 2.0 0 0 ... 9593 1 17076 April-2
27

2015-04-
5670 2254100090 887250 5 3.50 4320 7502 2.0 0 0 ... 7538 1 11822 April-2
07

2015-04-
5572 853200040 2410000 5 2.50 4600 23250 1.5 0 2 ... 20066 1 27850 April-2
28

2015-04-
7209 8562750060 825000 5 3.50 4140 6770 2.0 0 0 ... 5431 1 10910 April-2
20

2015-04-
19608 114101505 630000 5 3.50 4060 8309 2.0 0 0 ... 11711 1 12369 April-2
23

2015-04-
7997 5700004028 2450000 4 4.25 4250 6552 2.0 0 3 ... 8841 1 10802 April-2
17

2015-05-
17159 1118000320 3400000 4 4.00 4260 11765 2.0 0 0 ... 10408 1 16025 May-2
08

2015-05-
17742 5428000070 770000 5 3.50 4750 8234 2.0 0 2 ... 14496 1 12984 May-2
11

2015-05-
16333 2421059090 640000 4 2.50 4090 215186 2.0 0 0 ... 142005 0 219276 May-2
11

2015-05-
1152 1525069088 442500 5 3.25 4240 226097 2.0 0 0 ... 217800 0 230337 May-2
04

178 rows × 30 columns

We got 178 records as outliers. Let's treat this by dropping


In [106]:

#dropping the record from the dataset


house_df.drop(house_df[ (house_df.living_measure > upperbound_lim) | (house_df.living_measure < lowerbound_lim) ]
.index, inplace=True)

In [107]:

#let's see the boxplot after dropping the outliers


plt.figure(figsize=(plotSizeX, plotSizeY))
sns.boxplot(house_df['living_measure'])

Out[107]:

<matplotlib.axes._subplots.AxesSubplot at 0x22593a3e240>

In [108]:

plt.figure(figsize=(plotSizeX, plotSizeY))
sns.distplot(house_df.living_measure)

Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0x22595886198>

By treating outliers of living_measure, we lost 178 data points more and data distribution looks normal

In [109]:

# shape of the data after imputing outliers in living_column


house_df.shape

Out[109]:
(20416, 30)
Treating outliers for column - lot_measure

In [110]:

lowerbound_lom,upperbound_lom = outlier_treatment(house_df.lot_measure)
print(lowerbound_lom,upperbound_lom)

-2774.875 17958.125

In [111]:

house_df[(house_df.lot_measure < lowerbound_lom) | (house_df.lot_measure > upperbound_lom)]

Out[111]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_y

2014-05-
10082 1121039059 503000 2 1.75 2860 59612 1.0 1 4 ... 59612 0 62472 May-2
22

2014-05-
14089 6070500055 599000 4 2.25 2260 29930 2.0 0 0 ... 29930 0 32190 May-2
06

2014-05-
1611 5561000190 437500 3 2.25 1970 35100 2.0 0 0 ... 35100 1 37070 May-2
02

2014-05-
14068 5111400086 110000 3 1.00 1250 53143 1.0 0 0 ... 217800 0 54393 May-2
12

2014-05-
14081 3022039071 800000 2 2.25 1730 31491 2.0 1 2 ... 12410 0 33221 May-2
30

2014-05-
20351 9808610190 782000 4 2.50 2830 20345 2.0 0 0 ... 13732 1 23175 May-2
09

2014-05-
9981 2324800350 860000 4 2.00 3740 32417 2.0 0 0 ... 32417 1 36157 May-2
06

2014-05-
16273 1823069279 499950 5 3.50 3200 43560 2.0 0 0 ... 43560 0 46760 May-2
20

2014-05-
16325 7214700160 610000 3 3.00 2480 45302 1.0 0 0 ... 14100 0 47782 May-2
09

2014-05-
10030 2025700730 287200 3 3.00 1850 19966 1.0 0 0 ... 6715 0 21816 May-2
02

2014-05-
3870 1330900250 550000 3 2.25 1980 40887 1.0 0 0 ... 35700 0 42867 May-2
15

2014-05-
16422 4047200380 460000 2 1.50 2730 19877 1.0 0 0 ... 19509 0 22607 May-2
26

2014-05-
3865 2924069132 527500 3 1.75 2310 78844 1.0 0 0 ... 6230 0 81154 May-2
27

2014-05-
1589 4045500510 420850 1 1.00 960 40946 1.0 0 0 ... 20350 0 41906 May-2
21

2014-05-
3824 320069049 305000 4 1.50 1590 131551 1.0 0 3 ... 108028 0 133141 May-2
14

2014-05-
18773 1321720140 370000 4 2.50 3090 18645 2.0 0 0 ... 20114 1 21735 May-2
28

2014-05-
14357 3210950080 486000 4 2.50 2150 39449 1.0 0 0 ... 35717 0 41599 May-2
14

2014-05-
14310 1921069082 560000 3 2.00 2560 216777 1.0 0 0 ... 108463 0 219337 May-2
12

2014-05-
16144 7574910780 766950 3 2.50 3030 30007 1.5 0 0 ... 34983 1 33037 May-2
14

2014-05-
9659 1023059365 520000 3 2.50 2460 54885 2.0 0 0 ... 21407 0 57345 May-2
06

2014-05-
7446 4012800010 360000 4 2.00 2680 18768 1.0 0 0 ... 15750 0 21448 May-2
06

2014-05-
18970 3523089019 480000 4 3.50 3370 435600 2.0 0 3 ... 114868 1 438970 May-2
19

2014-05-
7423 4188000670 749400 4 2.50 3240 20301 2.0 0 0 ... 23650 1 23541 May-2
15

2014-05-
16149 9368700031 195000 2 1.00 720 18000 1.0 0 0 ... 7925 0 18720 May-2
09

2014-05-
9913 8856000545 100000 2 1.00 910 22000 1.0 0 0 ... 9891 0 22910 May-2
07

2014-05-
7214 124069032 600000 3 1.75 1670 39639 1.0 0 0 ... 30492 0 41309 May-2
05

2014-05-
7264 2724089019 527550 1 0.75 820 59677 1.0 0 0 ... 14163 0 60497 May-2
23

2014-05-
7323 3761700251 600000 4 2.00 2510 38141 1.0 0 0 ... 11760 1 40651 May-2
28
2014-05-
1809 226059103 570000 3 1.75 1930 36210 1.0 0 0 ... 35060 0 38140 May-2
27

2014-05-
13684 1721069036 412000 3 1.75 1950 52256 1.0 0 0 ... 51836 0 54206 May-2
29

... ... ... ... ... ... ... ... ... ... ... ... ... ... ...

2015-05-
19300 3924500130 460000 2 2.50 1880 40575 1.0 0 0 ... 32935 1 42455 May-2
06

2015-05-
3221 1775700011 390000 3 2.50 1410 26375 1.0 0 0 ... 12474 0 27785 May-2
12

2015-05-
20818 126039394 525000 4 2.75 2300 26650 1.0 0 0 ... 9879 0 28950 May-2
08

2015-05-
3195 9310300215 652500 4 1.75 3130 18253 2.0 0 0 ... 12220 0 21383 May-2
06

2015-05-
13297 322069010 435000 3 2.00 2570 233481 1.5 0 0 ... 157687 0 236051 May-2
08

2015-05-
11080 1823069088 492000 2 1.75 1300 22239 1.0 0 0 ... 14810 0 23539 May-2
04

2015-05-
11081 625069064 625000 3 2.25 2570 47480 1.0 0 0 ... 106722 1 50050 May-2
07

2015-05-
9892 2124069103 374000 3 1.75 1510 18439 1.0 0 0 ... 34326 0 19949 May-2
05

2015-05-
4291 2426049079 330000 3 1.00 1060 20040 1.0 0 0 ... 10800 0 21100 May-2
06

2015-05-
9880 8011100050 350000 2 1.00 1220 28703 1.0 0 0 ... 6720 0 29923 May-2
08

2015-05-
4246 2722059275 536000 3 2.75 2290 34548 2.0 0 3 ... 275299 0 36838 May-2
12

2015-05-
13312 8835800450 950000 3 2.50 2780 275033 1.0 0 0 ... 16340 1 277813 May-2
04

2015-05-
20752 1326069050 750000 2 2.00 2370 155130 1.0 0 0 ... 14475 0 157500 May-2
04

2015-05-
2011 2591720160 674950 3 2.75 3510 92347 2.0 0 0 ... 37070 1 95857 May-2
01

2015-05-
20375 302000375 250000 3 2.00 1050 18304 1.0 0 0 ... 15675 0 19354 May-2
06

2015-05-
15300 722039087 329000 2 1.00 990 57499 1.0 0 0 ... 27442 0 58489 May-2
04

2015-05-
19102 1774220070 550000 4 2.25 2590 36256 2.0 0 0 ... 35657 0 38846 May-2
07

2015-05-
9615 2316400285 495000 4 3.50 2490 18042 2.0 0 0 ... 21107 0 20532 May-2
13

2015-05-
14043 9406510130 448000 5 3.50 3740 24684 2.0 0 0 ... 26023 1 28424 May-2
05

2015-05-
10637 522079068 513000 3 2.50 2150 161607 2.0 0 0 ... 207781 0 163757 May-2
06

2015-05-
6109 251610020 1580000 4 2.75 3480 19991 2.0 0 2 ... 20271 1 23471 May-2
08

2015-05-
12897 4027701265 480000 3 1.75 2920 21375 1.0 0 0 ... 8482 0 24295 May-2
01

2015-05-
1591 4166600610 335000 3 2.00 1410 44866 1.0 0 0 ... 29152 0 46276 May-2
14

2015-05-
6116 122029066 490000 3 1.75 2020 215622 2.0 0 0 ... 215622 0 217642 May-2
08

2015-05-
7911 3585900460 1060000 6 2.75 2980 20000 1.0 0 4 ... 20000 0 22980 May-2
01

2015-05-
11329 2320069111 449999 4 1.75 2290 36900 1.5 0 2 ... 12434 0 39190 May-2
07

2015-05-
9752 2521059060 490000 3 2.25 2840 107157 2.0 0 0 ... 215622 1 109997 May-2
01

2015-05-
4012 6446200050 540000 3 1.75 2590 25992 1.0 0 0 ... 29250 0 28582 May-2
04

2015-05-
16888 3422059208 390000 3 2.50 1930 64904 1.0 0 0 ... 57500 0 66834 May-2
11

2015-05-
4579 1921069101 399000 3 1.75 2170 73616 1.0 0 0 ... 297514 0 75786 May-2
08

2128 rows × 30 columns

We got 2155 records which are outliers. Let's drop these outlier records.
In [112]:

#dropping the record from the dataset


house_df.drop(house_df[ (house_df.lot_measure > upperbound_lom) | (house_df.lot_measure < lowerbound_lom) ].index
, inplace=True)

In [113]:

#let's plot after treating outliers


plt.figure(figsize=(plotSizeX, plotSizeY))
sns.boxplot(house_df['lot_measure'])

Out[113]:

<matplotlib.axes._subplots.AxesSubplot at 0x22593975eb8>

In [114]:

house_df.shape

Out[114]:

(18288, 30)

Total outliers in the lot_measure are 2128 data points. But still we are going ahead with imputing the data. We will analyze later whether there is
any impact on the data set or not.

Treating outliers for column - room_bed

In [115]:

#As we know for room_bed = 33 was outlier from our earlier findings, let's see the record and drop it
house_df[house_df['room_bed']==33]

Out[115]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_year

2014-06-
750 2402100895 640000 33 1.75 1620 6000 1.0 0 0 ... 4700 0 7620 June-2014
25

1 rows × 30 columns

In [116]:

#dropping the record from the dataset


house_df.drop(house_df[ (house_df.room_bed == 33) ].index, inplace=True)

In [117]:

house_df.shape

Out[117]:

(18287, 30)
In summary, after treating outliers, we have lost about 15% of the data. We will analyse the impact of this data loss during the model
evaluation.

In [118]:

#let's see the feature/columns and drop the unneccessary features


house_df.columns

Out[118]:

Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',


'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
'total_area', 'month_year', 'City', 'County', 'Type', 'has_basement',
'HouseLandRatio', 'has_renovated'],
dtype='object')

As we already have this information in other features. We will drop the unwanted columns from new copied dataframe instance :
cid,dayhours,yr_renovated,zipcode,lat,long,county,type

In [119]:

#Let's create another dataframe for modeling


df_model=house_df.copy()

In [120]:

#let's check the new copy of dataframe by printing first few records
df_model.head()

Out[120]:

cid dayhours price room_bed room_bath living_measure lot_measure ceil coast sight ... lot_measure15 furnished total_area month_ye

2014-05-
17786 7568700740 430000 3 2.75 2550 11160 2.0 0 0 ... 7440 0 13710 May-20
21

2014-05-
3782 2248000080 385500 3 2.00 1540 7947 1.0 0 0 ... 7950 0 9487 May-20
21

2014-05-
10069 7805450110 736000 4 2.50 2290 12047 2.0 0 0 ... 15666 1 14337 May-20
06

2014-05-
7114 2215500080 580000 5 2.00 1940 6000 1.0 0 0 ... 6000 0 7940 May-20
28

2014-05-
10080 1219000043 315000 5 1.75 2320 8100 1.0 0 0 ... 7271 0 10420 May-20
09

5 rows × 30 columns

New instance of dataframe for model created successfully

In [121]:

#let's verify the columns


df_model.columns

Out[121]:

Index(['cid', 'dayhours', 'price', 'room_bed', 'room_bath', 'living_measure',


'lot_measure', 'ceil', 'coast', 'sight', 'condition', 'quality',
'ceil_measure', 'basement', 'yr_built', 'yr_renovated', 'zipcode',
'lat', 'long', 'living_measure15', 'lot_measure15', 'furnished',
'total_area', 'month_year', 'City', 'County', 'Type', 'has_basement',
'HouseLandRatio', 'has_renovated'],
dtype='object')

In [122]:

#Dropping the feature not required in 1st Iteration


df_final=df_model.drop(['cid','dayhours','yr_renovated','zipcode','lat','long','County','Type'],axis=1)

In [123]:

df_final.shape

Out[123]:

(18287, 22)
In [124]:

df_final.head()

Out[124]:

price room_bed room_bath living_measure lot_measure ceil coast sight condition quality ... yr_built living_measure15 lot_measure15 furnished

17786 430000 3 2.75 2550 11160 2.0 0 0 3 8 ... 1994 1020 7440

3782 385500 3 2.00 1540 7947 1.0 0 0 3 7 ... 1961 1910 7950

10069 736000 4 2.50 2290 12047 2.0 0 0 4 9 ... 1988 3130 15666

7114 580000 5 2.00 1940 6000 1.0 0 0 5 7 ... 1945 1700 6000

10080 315000 5 1.75 2320 8100 1.0 0 0 4 7 ... 1956 1410 7271

5 rows × 22 columns

In [125]:

df_final.columns

Out[125]:

Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',


'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure',
'basement', 'yr_built', 'living_measure15', 'lot_measure15',
'furnished', 'total_area', 'month_year', 'City', 'has_basement',
'HouseLandRatio', 'has_renovated'],
dtype='object')

Creating dummies for categorical variables: 'room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition', 'quality', 'furnished','City',
'has_basement', 'has_renovated'

In [126]:

# Getting dummies for columns ceil, coast, sight, condition, quality, yr_renovated, furnished
dff = pd.get_dummies(df_final, columns=['room_bed', 'room_bath', 'ceil', 'coast', 'sight', 'condition', 'quality'
, 'furnished','City',
'has_basement', 'has_renovated'],drop_first=True)

In [127]:

# let's see the data types of the features


dff.shape

Out[127]:

(18287, 92)

In [128]:

dff.columns

Out[128]:

Index(['price', 'living_measure', 'lot_measure', 'ceil_measure', 'basement',


'yr_built', 'living_measure15', 'lot_measure15', 'total_area',
'month_year', 'HouseLandRatio', 'room_bed_1', 'room_bed_2',
'room_bed_3', 'room_bed_4', 'room_bed_5', 'room_bed_6', 'room_bed_7',
'room_bed_8', 'room_bed_9', 'room_bed_10', 'room_bed_11',
'room_bath_0.5', 'room_bath_0.75', 'room_bath_1.0', 'room_bath_1.25',
'room_bath_1.5', 'room_bath_1.75', 'room_bath_2.0', 'room_bath_2.25',
'room_bath_2.5', 'room_bath_2.75', 'room_bath_3.0', 'room_bath_3.25',
'room_bath_3.5', 'room_bath_3.75', 'room_bath_4.0', 'room_bath_4.25',
'room_bath_4.5', 'room_bath_4.75', 'room_bath_5.0', 'room_bath_5.25',
'room_bath_5.75', 'ceil_1.5', 'ceil_2.0', 'ceil_2.5', 'ceil_3.0',
'ceil_3.5', 'coast_1', 'sight_1', 'sight_2', 'sight_3', 'sight_4',
'condition_2', 'condition_3', 'condition_4', 'condition_5', 'quality_4',
'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
'quality_10', 'quality_11', 'quality_12', 'furnished_1',
'City_Bellevue', 'City_Black Diamond', 'City_Bothell', 'City_Carnation',
'City_Duvall', 'City_Enumclaw', 'City_Fall City', 'City_Federal Way',
'City_Issaquah', 'City_Kenmore', 'City_Kent', 'City_Kirkland',
'City_Maple Valley', 'City_Medina', 'City_Mercer Island',
'City_North Bend', 'City_Redmond', 'City_Renton', 'City_Sammamish',
'City_Seattle', 'City_Snoqualmie', 'City_Vashon', 'City_Woodinville',
'has_basement_Yes', 'has_renovated_Yes'],
dtype='object')
Ready for model building

'dff' is the data frame which is ready for modeling

In [129]:

dff.head()

Out[129]:

City_North
price living_measure lot_measure ceil_measure basement yr_built living_measure15 lot_measure15 total_area month_year ... City_Red
Bend

17786 430000 2550 11160 2550 0 1994 1020 7440 13710 May-2014 ... 0

3782 385500 1540 7947 1120 420 1961 1910 7950 9487 May-2014 ... 0

10069 736000 2290 12047 2290 0 1988 3130 15666 14337 May-2014 ... 0

7114 580000 1940 6000 970 970 1945 1700 6000 7940 May-2014 ... 0

10080 315000 2320 8100 1160 1160 1956 1410 7271 10420 May-2014 ... 0

5 rows × 92 columns

In [130]:

#let's drop the month_year column as we already analyzed it


dff=dff.drop(['month_year'],axis=1)

In [131]:

#Creating X, y for training and testing set


X = dff.drop("price" , axis=1)
y = dff["price"]

In [132]:

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=10)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=10)

In [133]:

print(X_train.shape)
print(X_test.shape)
print(X_val.shape)

(11703, 90)
(3658, 90)
(2926, 90)

In [134]:

dff.head()

Out[134]:

City_North
price living_measure lot_measure ceil_measure basement yr_built living_measure15 lot_measure15 total_area HouseLandRatio ... City
Bend

17786 430000 2550 11160 2550 0 1994 1020 7440 13710 19.0 ... 0

3782 385500 1540 7947 1120 420 1961 1910 7950 9487 16.0 ... 0

10069 736000 2290 12047 2290 0 1988 3130 15666 14337 16.0 ... 0

7114 580000 1940 6000 970 970 1945 1700 6000 7940 24.0 ... 0

10080 315000 2320 8100 1160 1160 1956 1410 7271 10420 22.0 ... 0

5 rows × 91 columns

Model building
Let's build the model and see their performances
Linear Regression (with Ridge and Lasso)

In [135]:

#importing the necessary libraries


from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso

from sklearn import metrics


from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

In [136]:

LR1 = LinearRegression()
LR1.fit(X_train, y_train)
#predicting result over test data
y_LR1_predtr= LR1.predict(X_train)
y_LR1_predvl= LR1.predict(X_val)

LR1.coef_

Out[136]:

array([ 4.65340600e+01, -2.60438290e+01, 4.15085952e+01, 5.02547332e+00,


-2.04412513e+03, 5.35189660e+01, -1.85866571e+00, 2.04902213e+01,
1.71493571e+02, -1.52366392e+04, -1.71559917e+02, -7.41190980e+03,
-2.49409439e+04, -3.38428650e+04, -7.05629386e+04, -1.39570745e+05,
-6.13484472e+04, -5.01063903e+04, -1.54205825e+05, -2.24656830e+05,
1.25982909e+04, 8.20853745e+04, 9.07899656e+04, 2.27686059e+05,
9.03633165e+04, 1.00072705e+05, 1.08362140e+05, 1.12686599e+05,
1.12536191e+05, 1.13322504e+05, 1.30547366e+05, 1.77676262e+05,
1.64433901e+05, 2.85527951e+05, 1.71382012e+05, 1.61500051e+05,
1.74737226e+05, 9.19752797e+05, 1.55294652e+05, 2.95336027e+05,
-7.18864612e-09, 1.64060215e+04, 1.53741629e+04, 5.55177883e+04,
5.71678128e+04, 7.56908538e+04, 2.60659481e+05, 4.01215839e+04,
4.57278795e+04, 1.17418144e+05, 2.55845105e+05, 9.85063804e+04,
1.30265390e+05, 1.58016139e+05, 1.95914534e+05, -1.52225301e+05,
-1.50728574e+05, -1.30295440e+05, -5.38413267e+04, 1.41195146e+04,
-3.31240159e+05, -2.05150668e+05, 3.46837714e+04, 9.74678182e+05,
4.72971126e+05, 2.94909682e+05, 1.32458120e+05, 1.23710820e+05,
1.76625819e+05, 1.09165616e+05, 1.81816493e+04, 1.72163167e+05,
-1.34952467e+04, 1.66521627e+05, 1.21190592e+05, 1.50478273e+04,
2.33952638e+05, 4.07406448e+04, 8.02211265e+05, 4.19684384e+05,
1.32976614e+05, 2.32526660e+05, 6.01053338e+04, 1.61179525e+05,
1.73301885e+05, 1.04083333e+05, 8.65668543e+04, 1.56470485e+05,
2.84414383e+04, 3.11287808e+04])

In [137]:

#Model score and Deduction for each Model in a DataFrame


LR1_trscore=r2_score(y_train,y_LR1_predtr)
LR1_trRMSE=np.sqrt(mean_squared_error(y_train, y_LR1_predtr))
LR1_trMSE=mean_squared_error(y_train, y_LR1_predtr)
LR1_trMAE=mean_absolute_error(y_train, y_LR1_predtr)

LR1_vlscore=r2_score(y_val,y_LR1_predvl)
LR1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_LR1_predvl))
LR1_vlMSE=mean_squared_error(y_val, y_LR1_predvl)
LR1_vlMAE=mean_absolute_error(y_val, y_LR1_predvl)

Compa_df=pd.DataFrame({'Method':['Linear Reg Model1'],'Val Score':LR1_vlscore,'RMSE_vl': LR1_vlRMSE, 'MSE_vl': LR


1_vlMSE, 'MAE_vl': LR1_vlMAE,'train Score':LR1_trscore,'RMSE_tr': LR1_trRMSE, 'MSE_tr': LR1_trMSE, 'MAE_tr': LR1_
trMAE})

#Compa_df = Compa_df[['Method', 'Test Score', 'RMSE', 'MSE', 'MAE']]

Compa_df

Out[137]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

The linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively
In [138]:

sns.set(style="darkgrid", color_codes=True)

with sns.axes_style("white"):
sns.jointplot(x=y_val, y=y_LR1_predvl, kind="reg", color="k")

Lasso model

In [139]:
Lasso1 = Lasso(alpha=1)
Lasso1.fit(X_train, y_train)

#predicting result over test data


y_Lasso1_predtr= Lasso1.predict(X_train)
y_Lasso1_predvl= Lasso1.predict(X_val)

Lasso1.coef_

C:\ProgramData\Anaconda3\lib\site-packages\sklearn\linear_model\coordinate_descent.py:492: Convergen
ceWarning: Objective did not converge. You might want to increase the number of iterations. Fitting
data with very small alpha may cause precision problems.
ConvergenceWarning)

Out[139]:

array([ 9.65902931e+01, -3.93953142e+00, 1.35529095e+01, -2.29061776e+01,


-2.04163675e+03, 5.35553554e+01, -1.85807486e+00, -1.60949858e+00,
1.73465396e+02, 2.58275091e+04, 4.11967331e+04, 3.40052776e+04,
1.64885732e+04, 7.61896026e+03, -2.86912815e+04, -9.69275851e+04,
-1.65210854e+04, -4.12121513e+03, -1.00992716e+05, -1.71298527e+05,
-9.13241618e+04, -2.48925268e+04, -1.60169381e+04, 1.18516529e+05,
-1.64860326e+04, -6.79525494e+03, 1.46220219e+03, 5.75895038e+03,
5.60351070e+03, 6.32472345e+03, 2.34670403e+04, 7.06751057e+04,
5.73457320e+04, 1.78131982e+05, 6.38573909e+04, 5.27207711e+04,
6.65775444e+04, 8.00797041e+05, 4.05485728e+04, 1.84114085e+05,
0.00000000e+00, 1.64275884e+04, 1.53464259e+04, 5.53276759e+04,
5.70768344e+04, 7.27284380e+04, 2.60559411e+05, 3.99774418e+04,
4.57348433e+04, 1.17367493e+05, 2.55764846e+05, 9.44643505e+04,
1.25980055e+05, 1.53729964e+05, 1.91681487e+05, -1.98077420e+05,
-2.00695519e+05, -1.80295562e+05, -1.03865243e+05, -3.58821013e+04,
5.98641869e+04, 1.85986501e+05, 4.26066854e+05, 1.35396787e+06,
3.18911457e+04, 2.94465826e+05, 1.31572941e+05, 1.23199746e+05,
1.75754190e+05, 1.08625950e+05, 1.76695808e+04, 1.70928679e+05,
-1.38642363e+04, 1.66073186e+05, 1.20696485e+05, 1.45879439e+04,
2.33538555e+05, 4.03015196e+04, 8.01308223e+05, 4.19176853e+05,
1.32466072e+05, 2.32091780e+05, 5.96809785e+04, 1.60725157e+05,
1.72962017e+05, 1.03839270e+05, 8.54647044e+04, 1.55943713e+05,
2.84495484e+04, 3.11881238e+04])
In [140]:

#Model score and Deduction for each Model in a DataFrame


Lasso1_trscore=r2_score(y_train,y_Lasso1_predtr)
Lasso1_trRMSE=np.sqrt(mean_squared_error(y_train, y_Lasso1_predtr))
Lasso1_trMSE=mean_squared_error(y_train, y_Lasso1_predtr)
Lasso1_trMAE=mean_absolute_error(y_train, y_Lasso1_predtr)

Lasso1_vlscore=r2_score(y_val,y_Lasso1_predvl)
Lasso1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_Lasso1_predvl))
Lasso1_vlMSE=mean_squared_error(y_val, y_Lasso1_predvl)
Lasso1_vlMAE=mean_absolute_error(y_val, y_Lasso1_predvl)

Lasso1_df=pd.DataFrame({'Method':['Linear-Reg Lasso1'],'Val Score':Lasso1_vlscore,'RMSE_vl': Lasso1_vlRMSE, 'MSE_


vl': Lasso1_vlMSE, 'MAE_vl': Lasso1_vlMAE,'train Score':Lasso1_trscore,'RMSE_tr': Lasso1_trRMSE, 'MSE_tr': Lasso1
_trMSE, 'MAE_tr': Lasso1_trMAE})
Compa_df = pd.concat([Compa_df, Lasso1_df])

Compa_df

Out[140]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

The lasso linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively. The coefficeints of
1 variable in lasso model is almost '0', signifying that the variable with '0' coefficient can be dropped.

In [141]:

sns.set(style="darkgrid", color_codes=True)

with sns.axes_style("white"):
sns.jointplot(x=y_val, y=y_Lasso1_predvl, kind="reg", color="k")

Ridge model
In [142]:

Ridge1 = Ridge(alpha=0.5)
Ridge1.fit(X_train, y_train)

#predicting result over test data


y_Ridge1_predtr= Ridge1.predict(X_train)
y_Ridge1_predvl= Ridge1.predict(X_val)

Ridge1.coef_

Out[142]:

array([ 4.66834622e+01, -2.60918244e+01, 4.15530007e+01, 5.13138900e+00,


-2.04329070e+03, 5.40305072e+01, -1.83894732e+00, 2.05899911e+01,
1.99149390e+02, 4.75922037e+04, 6.34342640e+04, 5.62050467e+04,
3.84394798e+04, 2.95688617e+04, -6.95633001e+03, -7.29503951e+04,
2.07847666e+03, 1.15197429e+04, -6.08136975e+04, -1.07506532e+05,
-1.24763131e+05, -6.99157975e+04, -6.12127626e+04, 6.84637990e+04,
-6.18022272e+04, -5.21975799e+04, -4.39150711e+04, -3.96547935e+04,
-4.00644775e+04, -3.93829165e+04, -2.22301737e+04, 2.63190006e+04,
1.13660948e+04, 1.31445711e+05, 1.79480665e+04, 7.08429226e+03,
2.07235857e+04, 5.06032623e+05, 1.96912226e+03, 1.26354803e+05,
0.00000000e+00, 1.62595610e+04, 1.52860502e+04, 5.48436035e+04,
5.68361232e+04, 6.73075467e+04, 2.58145115e+05, 3.96081664e+04,
4.58498930e+04, 1.16570206e+05, 2.54669644e+05, 8.12334931e+04,
1.12963518e+05, 1.40760828e+05, 1.78571191e+05, -1.31486307e+05,
-1.40442366e+05, -1.20195568e+05, -4.38816320e+04, 2.39956530e+04,
-2.60492990e+05, -1.34329618e+05, 1.11752710e+05, 6.95080119e+05,
4.12010220e+05, 2.90368008e+05, 1.26135853e+05, 1.18985675e+05,
1.69604765e+05, 1.04432882e+05, 1.42752316e+04, 1.63147681e+05,
-1.74612788e+04, 1.62055462e+05, 1.16592568e+05, 1.10585170e+04,
2.29612234e+05, 3.67503018e+04, 7.71949916e+05, 4.13369607e+05,
1.28389461e+05, 2.28151004e+05, 5.60163760e+04, 1.56620270e+05,
1.69467429e+05, 9.98907258e+04, 8.00564608e+04, 1.51440054e+05,
2.85123132e+04, 3.11686377e+04])

In [143]:

#Model score and Deduction for each Model in a DataFrame


Ridge1_trscore=r2_score(y_train,y_Ridge1_predtr)
Ridge1_trRMSE=np.sqrt(mean_squared_error(y_train, y_Ridge1_predtr))
Ridge1_trMSE=mean_squared_error(y_train, y_Ridge1_predtr)
Ridge1_trMAE=mean_absolute_error(y_train, y_Ridge1_predtr)

Ridge1_vlscore=r2_score(y_val,y_Ridge1_predvl)
Ridge1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_Ridge1_predvl))
Ridge1_vlMSE=mean_squared_error(y_val, y_Ridge1_predvl)
Ridge1_vlMAE=mean_absolute_error(y_val, y_Ridge1_predvl)

Ridge1_df=pd.DataFrame({'Method':['Linear-Reg Ridge1'],'Val Score':Ridge1_vlscore,'RMSE_vl': Ridge1_vlRMSE, 'MSE_


vl': Ridge1_vlMSE, 'MAE_vl': Ridge1_vlMAE,'train Score':Ridge1_trscore,'RMSE_tr': Ridge1_trRMSE, 'MSE_tr': Ridge1
_trMSE, 'MAE_tr': Ridge1_trMAE})
Compa_df = pd.concat([Compa_df, Ridge1_df])

Compa_df

Out[143]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

The Ridge linear regression model performed with scores 0.73 & .72 in training data set and validation data set respectively. The coefficeints of
variables in ridge model are all non-zero, indicating that non of the variables can be dropped.
In [144]:

sns.set(style="darkgrid", color_codes=True)

with sns.axes_style("white"):
sns.jointplot(x=y_val, y=y_Ridge1_predvl, kind="reg", color="k")

In summary, Linear models have performed almost with similar results in both regularized model and non-regularized models

KNN Regressor

In [145]:

from sklearn.neighbors import KNeighborsRegressor

In [146]:

knn1 = KNeighborsRegressor(n_neighbors=4,weights='distance')
knn1.fit(X_train, y_train)

#predicting result over test data


y_knn1_predtr= knn1.predict(X_train)
y_knn1_predvl= knn1.predict(X_val)

In [147]:

#Model score and Deduction for each Model in a DataFrame


knn1_trscore=r2_score(y_train,y_knn1_predtr)
knn1_trRMSE=np.sqrt(mean_squared_error(y_train, y_knn1_predtr))
knn1_trMSE=mean_squared_error(y_train, y_knn1_predtr)
knn1_trMAE=mean_absolute_error(y_train, y_knn1_predtr)

knn1_vlscore=r2_score(y_val,y_knn1_predvl)
knn1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_knn1_predvl))
knn1_vlMSE=mean_squared_error(y_val, y_knn1_predvl)
knn1_vlMAE=mean_absolute_error(y_val, y_knn1_predvl)

knn1_df=pd.DataFrame({'Method':['knn1'],'Val Score':knn1_vlscore,'RMSE_vl': knn1_vlRMSE, 'MSE_vl': knn1_vlMSE, 'M


AE_vl': knn1_vlMAE,'train Score':knn1_trscore,'RMSE_tr': knn1_trRMSE, 'MSE_tr': knn1_trMSE, 'MAE_tr': knn1_trMAE}
)
Compa_df = pd.concat([Compa_df, knn1_df])

Compa_df

Out[147]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707


Though KNN regressor performed well in training set, the performance score in validation set is very less. This shows that the model is
overfitted in training set

Support vector regressor

In [148]:

from sklearn.svm import SVR

In [149]:

SVR1 = SVR(gamma='auto',C=10.0, epsilon=0.2,kernel='rbf')


SVR1.fit(X_train, y_train)

y_SVR1_predtr= SVR1.predict(X_train)
y_SVR1_predvl= SVR1.predict(X_val)

In [150]:

#Model score and Deduction for each Model in a DataFrame


SVR1_trscore=r2_score(y_train,y_SVR1_predtr)
SVR1_trRMSE=np.sqrt(mean_squared_error(y_train, y_SVR1_predtr))
SVR1_trMSE=mean_squared_error(y_train, y_SVR1_predtr)
SVR1_trMAE=mean_absolute_error(y_train, y_SVR1_predtr)

SVR1_vlscore=r2_score(y_val,y_SVR1_predvl)
SVR1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_SVR1_predvl))
SVR1_vlMSE=mean_squared_error(y_val, y_SVR1_predvl)
SVR1_vlMAE=mean_absolute_error(y_val, y_SVR1_predvl)

SVR1_df=pd.DataFrame({'Method':['SVR1'],'Val Score':SVR1_vlscore,'RMSE_vl': SVR1_vlRMSE, 'MSE_vl': SVR1_vlMSE, 'M


AE_vl': SVR1_vlMAE,'train Score':SVR1_trscore,'RMSE_tr': SVR1_trRMSE, 'MSE_tr': SVR1_trMSE, 'MAE_tr': SVR1_trMAE}
)
Compa_df = pd.concat([Compa_df, SVR1_df])

Compa_df

Out[150]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

The above negative scores in SVR model is due to non-learning of the model in the training set which results in non-performance in validation
set
In [151]:

SVR2 = SVR(gamma='auto',C=0.1,kernel='linear')
SVR2.fit(X_train, y_train)

y_SVR2_predtr= SVR2.predict(X_train)
y_SVR2_predvl= SVR2.predict(X_val)

#Model score and Deduction for each Model in a DataFrame


SVR2_trscore=r2_score(y_train,y_SVR2_predtr)
SVR2_trRMSE=np.sqrt(mean_squared_error(y_train, y_SVR2_predtr))
SVR2_trMSE=mean_squared_error(y_train, y_SVR2_predtr)
SVR2_trMAE=mean_absolute_error(y_train, y_SVR2_predtr)

SVR2_vlscore=r2_score(y_val,y_SVR2_predvl)
SVR2_vlRMSE=np.sqrt(mean_squared_error(y_val, y_SVR2_predvl))
SVR2_vlMSE=mean_squared_error(y_val, y_SVR2_predvl)
SVR2_vlMAE=mean_absolute_error(y_val, y_SVR2_predvl)

SVR2_df=pd.DataFrame({'Method':['SVR2'],'Val Score':SVR2_vlscore,'RMSE_vl': SVR2_vlRMSE, 'MSE_vl': SVR2_vlMSE, 'M


AE_vl': SVR2_vlMAE,'train Score':SVR2_trscore,'RMSE_tr': SVR2_trRMSE, 'MSE_tr': SVR2_trMSE, 'MAE_tr': SVR2_trMAE}
)
Compa_df = pd.concat([Compa_df, SVR2_df])

Compa_df

Out[151]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504

The SVR model with modified parameters has not performed well with just ~0.45 in both training and validation data sets

Decision Tree Regressor

In [152]:

from sklearn.tree import DecisionTreeRegressor


In [153]:

DT1 = DecisionTreeRegressor()
DT1.fit(X_train, y_train)

y_DT1_predtr= DT1.predict(X_train)
y_DT1_predvl= DT1.predict(X_val)

#Model score and Deduction for each Model in a DataFrame


DT1_trscore=r2_score(y_train,y_DT1_predtr)
DT1_trRMSE=np.sqrt(mean_squared_error(y_train, y_DT1_predtr))
DT1_trMSE=mean_squared_error(y_train, y_DT1_predtr)
DT1_trMAE=mean_absolute_error(y_train, y_DT1_predtr)

DT1_vlscore=r2_score(y_val,y_DT1_predvl)
DT1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_DT1_predvl))
DT1_vlMSE=mean_squared_error(y_val, y_DT1_predvl)
DT1_vlMAE=mean_absolute_error(y_val, y_DT1_predvl)

DT1_df=pd.DataFrame({'Method':['DT1'],'Val Score':DT1_vlscore,'RMSE_vl': DT1_vlRMSE, 'MSE_vl': DT1_vlMSE, 'MAE_vl


': DT1_vlMAE,'train Score':DT1_trscore,'RMSE_tr': DT1_trRMSE, 'MSE_tr': DT1_trMSE, 'MAE_tr': DT1_trMAE})
Compa_df = pd.concat([Compa_df, DT1_df])

Compa_df

Out[153]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504

0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707

Above performance of initial Decision tree model shows overfit in training set with 0.99 score and low performance in validation set
In [154]:

DT2 = DecisionTreeRegressor(max_depth=10,min_samples_leaf=5)
DT2.fit(X_train, y_train)

y_DT2_predtr= DT2.predict(X_train)
y_DT2_predvl= DT2.predict(X_val)

#Model score and Deduction for each Model in a DataFrame


DT2_trscore=r2_score(y_train,y_DT2_predtr)
DT2_trRMSE=np.sqrt(mean_squared_error(y_train, y_DT2_predtr))
DT2_trMSE=mean_squared_error(y_train, y_DT2_predtr)
DT2_trMAE=mean_absolute_error(y_train, y_DT2_predtr)

DT2_vlscore=r2_score(y_val,y_DT2_predvl)
DT2_vlRMSE=np.sqrt(mean_squared_error(y_val, y_DT2_predvl))
DT2_vlMSE=mean_squared_error(y_val, y_DT2_predvl)
DT2_vlMAE=mean_absolute_error(y_val, y_DT2_predvl)

DT2_df=pd.DataFrame({'Method':['DT2'],'Val Score':DT2_vlscore,'RMSE_vl': DT2_vlRMSE, 'MSE_vl': DT2_vlMSE, 'MAE_vl


': DT2_vlMAE,'train Score':DT2_trscore,'RMSE_tr': DT2_trRMSE, 'MSE_tr': DT2_trMSE, 'MAE_tr': DT2_trMAE})
Compa_df = pd.concat([Compa_df, DT2_df])

Compa_df

Out[154]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504

0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707

0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190

Above decision tree model with modified parameter has better performed on the training set and validation set compared to initial decision tree
model.But overall decision tree has not performed well than linear regression models.

In [155]:

sns.set(style="darkgrid", color_codes=True)

with sns.axes_style("white"):
sns.jointplot(x=y_val, y=y_DT2_predvl, kind="reg", color="k")

In summary, KNN regressor model and decision tree models have not performed well in comparison with linear regression models
Ensemble techniques

Boosting and Bagging

In [156]:

from sklearn.ensemble import GradientBoostingRegressor, BaggingRegressor

In [157]:

GB1=GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22)


GB1.fit(X_train, y_train)

y_GB1_predtr= GB1.predict(X_train)
y_GB1_predvl= GB1.predict(X_val)

#Model score and Deduction for each Model in a DataFrame


GB1_trscore=r2_score(y_train,y_GB1_predtr)
GB1_trRMSE=np.sqrt(mean_squared_error(y_train, y_GB1_predtr))
GB1_trMSE=mean_squared_error(y_train, y_GB1_predtr)
GB1_trMAE=mean_absolute_error(y_train, y_GB1_predtr)

GB1_vlscore=r2_score(y_val,y_GB1_predvl)
GB1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_GB1_predvl))
GB1_vlMSE=mean_squared_error(y_val, y_GB1_predvl)
GB1_vlMAE=mean_absolute_error(y_val, y_GB1_predvl)

GB1_df=pd.DataFrame({'Method':['GB1'],'Val Score':GB1_vlscore,'RMSE_vl': GB1_vlRMSE, 'MSE_vl': GB1_vlMSE, 'MAE_vl


': GB1_vlMAE,'train Score':GB1_trscore,'RMSE_tr': GB1_trRMSE, 'MSE_tr': GB1_trMSE, 'MAE_tr': GB1_trMAE})
Compa_df = pd.concat([Compa_df, GB1_df])

Compa_df

Out[157]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504

0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707

0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190

0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644

Gradient boosting model has provided good scores in both training and validation sets
In [158]:

BGG1=BaggingRegressor(n_estimators=50, oob_score= True,random_state=14)


BGG1.fit(X_train, y_train)

y_BGG1_predtr= BGG1.predict(X_train)
y_BGG1_predvl= BGG1.predict(X_val)

#Model score and Deduction for each Model in a DataFrame


BGG1_trscore=r2_score(y_train,y_BGG1_predtr)
BGG1_trRMSE=np.sqrt(mean_squared_error(y_train, y_BGG1_predtr))
BGG1_trMSE=mean_squared_error(y_train, y_BGG1_predtr)
BGG1_trMAE=mean_absolute_error(y_train, y_BGG1_predtr)

BGG1_vlscore=r2_score(y_val,y_BGG1_predvl)
BGG1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_BGG1_predvl))
BGG1_vlMSE=mean_squared_error(y_val, y_BGG1_predvl)
BGG1_vlMAE=mean_absolute_error(y_val, y_BGG1_predvl)

BGG1_df=pd.DataFrame({'Method':['BGG1'],'Val Score':BGG1_vlscore,'RMSE_vl': BGG1_vlRMSE, 'MSE_vl':BGG1_vlMSE, 'MA


E_vl': BGG1_vlMAE,'train Score':BGG1_trscore,'RMSE_tr': BGG1_trRMSE, 'MSE_tr': BGG1_trMSE, 'MAE_tr': BGG1_trMAE})
Compa_df = pd.concat([Compa_df, BGG1_df])

Compa_df

Out[158]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504

0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707

0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190

0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644

0 BGG1 0.769319 124738.101557 1.555959e+10 80102.360544 0.966466 46867.181534 2.196533e+09 29441.780117

Bagging model also performed well in training and validation sets.There seems to be overfitting in training set. We need to analyse further by
hypertuning

Random forest

In [159]:

from sklearn.ensemble import RandomForestRegressor


In [160]:

RF1=RandomForestRegressor()
RF1.fit(X_train, y_train)

y_RF1_predtr= RF1.predict(X_train)
y_RF1_predvl= RF1.predict(X_val)

#Model score and Deduction for each Model in a DataFrame


RF1_trscore=r2_score(y_train,y_RF1_predtr)
RF1_trRMSE=np.sqrt(mean_squared_error(y_train, y_RF1_predtr))
RF1_trMSE=mean_squared_error(y_train, y_RF1_predtr)
RF1_trMAE=mean_absolute_error(y_train, y_RF1_predtr)

RF1_vlscore=r2_score(y_val,y_RF1_predvl)
RF1_vlRMSE=np.sqrt(mean_squared_error(y_val, y_RF1_predvl))
RF1_vlMSE=mean_squared_error(y_val, y_RF1_predvl)
RF1_vlMAE=mean_absolute_error(y_val, y_RF1_predvl)

RF1_df=pd.DataFrame({'Method':['RF1'],'Val Score':RF1_vlscore,'RMSE_vl': RF1_vlRMSE, 'MSE_vl':RF1_vlMSE, 'MAE_vl'


: RF1_vlMAE,'train Score':RF1_trscore,'RMSE_tr': RF1_trRMSE, 'MSE_tr': RF1_trMSE, 'MAE_tr': RF1_trMAE})
Compa_df = pd.concat([Compa_df, RF1_df])

Compa_df

Out[160]:

Method Val Score RMSE_vl MSE_vl MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg Model1 0.718749 137733.698415 1.897057e+10 93994.455301 0.730112 132958.367261 1.767793e+10 92391.001786

0 Linear-Reg Lasso1 0.719117 137643.639712 1.894577e+10 93939.441186 0.730092 132963.180396 1.767921e+10 92403.854117

0 Linear-Reg Ridge1 0.718929 137689.597398 1.895843e+10 93992.809617 0.729789 133037.735155 1.769904e+10 92497.255174

0 knn1 0.425008 196935.451160 3.878357e+10 138494.383286 0.998628 9480.192071 8.987404e+07 887.708707

0 SVR1 -0.055489 266820.956555 7.119342e+10 183639.593215 -0.046405 261802.341726 6.854047e+10 179434.350170

0 SVR2 0.458252 191157.623415 3.654124e+10 132876.663665 0.454410 189041.408746 3.573665e+10 130250.868504

0 DT1 0.542495 175667.376246 3.085903e+10 109891.238551 0.998628 9480.192071 8.987404e+07 887.708707

0 DT2 0.637513 156364.920550 2.444999e+10 102458.587308 0.794647 115977.718333 1.345083e+10 82537.840190

0 GB1 0.782471 121129.989228 1.467247e+10 82824.319932 0.820821 108334.766538 1.173642e+10 76533.619644

0 BGG1 0.769319 124738.101557 1.555959e+10 80102.360544 0.966466 46867.181534 2.196533e+09 29441.780117

0 RF1 0.754483 128686.871977 1.656031e+10 82901.717082 0.954362 54674.891629 2.989344e+09 33099.588581

Random forest model has performed well in training and validation set. There is scope of further analysis on this model

Enseble models: in summary ensemble models have performed well on training and validation sets. These models will be selected for further
analysis with hypertuning and feature selection
In [161]:

#feature importance
rf_imp_feature_1=pd.DataFrame(RF1.feature_importances_, columns = ["Imp"], index = X_val.columns)
rf_imp_feature_1.sort_values(by="Imp",ascending=False)
rf_imp_feature_1['Imp'] = rf_imp_feature_1['Imp'].map('{0:.5f}'.format)
rf_imp_feature_1=rf_imp_feature_1.sort_values(by="Imp",ascending=False)
rf_imp_feature_1.Imp=rf_imp_feature_1.Imp.astype("float")

rf_imp_feature_1[:30].plot.bar(figsize=(plotSizeX, plotSizeY))

#First 20 features have an importance of 90.5% and first 30 have importance of 95.15
print("First 20 feature importance:\t",(rf_imp_feature_1[:20].sum())*100)
print("First 30 feature importance:\t",(rf_imp_feature_1[:30].sum())*100)

First 20 feature importance: Imp 90.184


dtype: float64
First 30 feature importance: Imp 95.098
dtype: float64

Above are top 30 important features that account for 95% of variation in model. This need to be further analysed during hypertuning of the
models for better scores

Model performance Summary:

Ensemble methods are performing better than linear models. Of all the ensemble models, Gradient boosting regressor is giving better R2
score. we identified top 30 features that are explaining the 95% variation in model(Random Forest). Will further hypertune the model to improve
the model performance. Will further explore and evaluate the features while hyperturning the ensemble models

Building Function/Pipeline for models


In [162]:

rf_imp_feature_1[:30]

Out[162]:

Imp

furnished_1 0.28448

yr_built 0.14227

living_measure 0.09463

living_measure15 0.06691

quality_8 0.05062

HouseLandRatio 0.04008

lot_measure15 0.03731

City_Bellevue 0.02532

ceil_measure 0.02459

quality_9 0.02049

total_area 0.01527

lot_measure 0.01319

City_Seattle 0.01268

City_Kirkland 0.01245

City_Federal Way 0.01224

City_Kent 0.01089

City_Mercer Island 0.01047

sight_4 0.00945

quality_7 0.00942

basement 0.00908

City_Redmond 0.00830

coast_1 0.00648

City_Medina 0.00556

quality_10 0.00545

City_Renton 0.00521

room_bed_4 0.00393

City_Maple Valley 0.00388

City_Sammamish 0.00379

sight_3 0.00351

City_Issaquah 0.00303

In [163]:

from sklearn.pipeline import Pipeline

In [164]:

def result (model,pipe_model,X_train_set,y_train_set,X_val_set,y_val_set):


pipe_model.fit(X_train_set,y_train_set)
#predicting result over test data
y_train_predict= pipe_model.predict(X_train_set)
y_val_predict= pipe_model.predict(X_val_set)

trscore=r2_score(y_train_set,y_train_predict)
trRMSE=np.sqrt(mean_squared_error(y_train_set,y_train_predict))
trMSE=mean_squared_error(y_train_set,y_train_predict)
trMAE=mean_absolute_error(y_train_set,y_train_predict)

vlscore=r2_score(y_val,y_val_predict)
vlRMSE=np.sqrt(mean_squared_error(y_val,y_val_predict))
vlMSE=mean_squared_error(y_val,y_val_predict)
vlMAE=mean_absolute_error(y_val,y_val_predict)
result_df=pd.DataFrame({'Method':[model],'val score':vlscore,'RMSE_val':vlRMSE,'MSE_val':vlMSE,'MSE_vl': vlMS
E,
'train Score':trscore,'RMSE_tr': trRMSE,'MSE_tr': trMSE, 'MAE_tr': trMAE})
return result_df

Above function will run the model and return the r2 score,rmse,mse of the model
In [165]:

#Creating empty dataframe to capture results


result_dff=pd.DataFrame()
pipe_LR = Pipeline([('LR', LinearRegression())])
result_dff=pd.concat([result_dff,result('LR',pipe_LR,X_train,y_train,X_val,y_val)])

pipe_knr = Pipeline([('KNNR', KNeighborsRegressor(n_neighbors=4,weights='distance'))])


result_dff=pd.concat([result_dff,result('KNNR',pipe_knr,X_train,y_train,X_val,y_val)])

pipe_DTR = Pipeline([('DTR', DecisionTreeRegressor())])


result_dff=pd.concat([result_dff,result('DTR',pipe_DTR,X_train,y_train,X_val,y_val)])

pipe_GBR = Pipeline([('GBR', GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22))


])
result_dff=pd.concat([result_dff,result('GBR',pipe_GBR,X_train,y_train,X_val,y_val)])

pipe_BGR = Pipeline([('BGR', BaggingRegressor(n_estimators=50, oob_score= True,random_state=14))])


result_dff=pd.concat([result_dff,result('BGR',pipe_BGR,X_train,y_train,X_val,y_val)])

pipe_RFR = Pipeline([('RFR', RandomForestRegressor())])


result_dff=pd.concat([result_dff,result('RFR',pipe_RFR,X_train,y_train,X_val,y_val)])

result_dff

Out[165]:

Method val score RMSE_val MSE_val MSE_vl train Score RMSE_tr MSE_tr MAE_tr

0 LR 0.718749 137733.698415 1.897057e+10 1.897057e+10 0.730112 132958.367261 1.767793e+10 92391.001786

0 KNNR 0.425008 196935.451160 3.878357e+10 3.878357e+10 0.998628 9480.192071 8.987404e+07 887.708707

0 DTR 0.537219 176677.375867 3.121490e+10 3.121490e+10 0.998628 9480.192071 8.987404e+07 887.708707

0 GBR 0.782471 121129.989228 1.467247e+10 1.467247e+10 0.820821 108334.766538 1.173642e+10 76533.619644

0 BGR 0.769319 124738.101557 1.555959e+10 1.555959e+10 0.966466 46867.181534 2.196533e+09 29441.780117

0 RFR 0.757473 127900.773592 1.635861e+10 1.635861e+10 0.955380 54061.682258 2.922665e+09 32834.525684

Above sequence of steps with pipeline function will run all the models and compile the scores in result_dff dataframe. We can see that the
above 2 steps are concise instead of running individual models and compiling the scores as earlier.

We can clearly see gradient boosting is giving better result in comparison with other ensemble methods. Also the score of 0.82 on training set
indicates no overfitting of the model

In [166]:

#Storing results of initial data set - dff

result_ds1=result_dff.copy()
result_ds1

Out[166]:

Method val score RMSE_val MSE_val MSE_vl train Score RMSE_tr MSE_tr MAE_tr

0 LR 0.718749 137733.698415 1.897057e+10 1.897057e+10 0.730112 132958.367261 1.767793e+10 92391.001786

0 KNNR 0.425008 196935.451160 3.878357e+10 3.878357e+10 0.998628 9480.192071 8.987404e+07 887.708707

0 DTR 0.537219 176677.375867 3.121490e+10 3.121490e+10 0.998628 9480.192071 8.987404e+07 887.708707

0 GBR 0.782471 121129.989228 1.467247e+10 1.467247e+10 0.820821 108334.766538 1.173642e+10 76533.619644

0 BGR 0.769319 124738.101557 1.555959e+10 1.555959e+10 0.966466 46867.181534 2.196533e+09 29441.780117

0 RFR 0.757473 127900.773592 1.635861e+10 1.635861e+10 0.955380 54061.682258 2.922665e+09 32834.525684

FEATURE SELECTION (PCA)

Now, we will explore the possibility of features reduction using PCA

In [167]:

dff.shape

Out[167]:

(18287, 91)
In [168]:

dff.columns

Out[168]:

Index(['price', 'living_measure', 'lot_measure', 'ceil_measure', 'basement',


'yr_built', 'living_measure15', 'lot_measure15', 'total_area',
'HouseLandRatio', 'room_bed_1', 'room_bed_2', 'room_bed_3',
'room_bed_4', 'room_bed_5', 'room_bed_6', 'room_bed_7', 'room_bed_8',
'room_bed_9', 'room_bed_10', 'room_bed_11', 'room_bath_0.5',
'room_bath_0.75', 'room_bath_1.0', 'room_bath_1.25', 'room_bath_1.5',
'room_bath_1.75', 'room_bath_2.0', 'room_bath_2.25', 'room_bath_2.5',
'room_bath_2.75', 'room_bath_3.0', 'room_bath_3.25', 'room_bath_3.5',
'room_bath_3.75', 'room_bath_4.0', 'room_bath_4.25', 'room_bath_4.5',
'room_bath_4.75', 'room_bath_5.0', 'room_bath_5.25', 'room_bath_5.75',
'ceil_1.5', 'ceil_2.0', 'ceil_2.5', 'ceil_3.0', 'ceil_3.5', 'coast_1',
'sight_1', 'sight_2', 'sight_3', 'sight_4', 'condition_2',
'condition_3', 'condition_4', 'condition_5', 'quality_4', 'quality_5',
'quality_6', 'quality_7', 'quality_8', 'quality_9', 'quality_10',
'quality_11', 'quality_12', 'furnished_1', 'City_Bellevue',
'City_Black Diamond', 'City_Bothell', 'City_Carnation', 'City_Duvall',
'City_Enumclaw', 'City_Fall City', 'City_Federal Way', 'City_Issaquah',
'City_Kenmore', 'City_Kent', 'City_Kirkland', 'City_Maple Valley',
'City_Medina', 'City_Mercer Island', 'City_North Bend', 'City_Redmond',
'City_Renton', 'City_Sammamish', 'City_Seattle', 'City_Snoqualmie',
'City_Vashon', 'City_Woodinville', 'has_basement_Yes',
'has_renovated_Yes'],
dtype='object')

will drop the price column as it is the target variable

In [169]:

df_pca = dff.drop(['price'], axis = 1)

In [170]:

numerical_cols = df_pca.copy()

numerical_cols.shape

Out[170]:

(18287, 90)

In [171]:

# Let's first transform the entire X (independent variable data) to zscores.


# We will create the PCA dimensions on this distribution.
from scipy.stats import zscore

# As PCA for Independent columns of Numerical types, let's pass numerical_cols (16 numerical features)
numerical_cols = numerical_cols.apply(zscore)

cov_matrix = np.cov(numerical_cols.T)
print('Covariance Matrix \n%s', cov_matrix)

Covariance Matrix
%s [[ 1.00005469 0.20028185 0.84597846 ... 0.01415428 0.20094885
0.05257785]
[ 0.20028185 1.00005469 0.1663024 ... 0.08035946 -0.02988448
-0.00617414]
[ 0.84597846 0.1663024 1.00005469 ... 0.01649371 -0.27730605
0.01739462]
...
[ 0.01415428 0.08035946 0.01649371 ... 1.00005469 -0.0056238
-0.01445085]
[ 0.20094885 -0.02988448 -0.27730605 ... -0.0056238 1.00005469
0.04524435]
[ 0.05257785 -0.00617414 0.01739462 ... -0.01445085 0.04524435
1.00005469]]

As we can see, near the value to 1, more the features related.


In [172]:

eigenvalues, eigenvectors = np.linalg.eig(cov_matrix)


print('Eigen Vectors \n%s', eigenvectors)
print('\n Eigen Values \n%s', eigenvalues)

Eigen Vectors
%s [[ 3.38140157e-01 -5.91272225e-02 2.10933458e-01 ... -5.27282174e-03
5.54142192e-03 -1.67124034e-04]
[ 7.12659835e-02 -4.34121260e-01 -8.88436080e-02 ... -1.68818774e-02
4.57078107e-03 -8.52995967e-03]
[ 3.49772357e-01 -8.00383876e-03 -4.05156781e-02 ... -4.68496505e-03
-1.39683527e-02 2.43243338e-03]
...
[ 1.29688720e-02 -3.80398560e-02 -3.41813657e-02 ... 1.07174062e-01
8.41737918e-02 -1.95413367e-01]
[-2.50075399e-02 -3.74622282e-02 4.39476539e-01 ... 3.99982110e-03
4.08181763e-02 2.32617269e-02]
[-3.18537004e-03 -6.23000043e-04 1.02661626e-01 ... 2.84579669e-02
-1.47963772e-02 -6.62692406e-02]]

Eigen Values
%s [ 6.40030103e+00 4.23053272e+00 3.02200570e+00 2.36069955e+00
1.72278028e+00 1.70533047e+00 5.17634008e-02 7.84864255e-02
1.23323929e-01 1.58239483e+00 1.94704947e-01 2.10588552e-01
2.45372409e-01 3.37764061e-01 3.52383334e-01 2.24756725e-03
9.93351422e-04 1.28503648e-04 8.54683326e-05 1.51669793e+00
3.97816689e-01 1.48400510e+00 4.25049450e-01 -5.20329656e-16
-1.83229560e-15 3.57406920e-15 1.39212554e+00 1.33812387e+00
5.71411667e-01 6.48215227e-01 6.60453404e-01 1.27455883e+00
6.90208644e-01 7.30900855e-01 1.22358633e+00 1.21781188e+00
7.54916613e-01 7.61951753e-01 7.89272221e-01 1.19439921e+00
1.18354682e+00 8.08765828e-01 8.31761100e-01 1.17521503e+00
1.16073113e+00 8.62975337e-01 1.14847039e+00 8.79158894e-01
1.11948938e+00 1.10960276e+00 8.90644524e-01 8.88567656e-01
9.01761603e-01 1.10493861e+00 9.16012433e-01 9.31041146e-01
1.09143428e+00 1.08460485e+00 1.08273453e+00 1.07118893e+00
9.33793856e-01 1.06368893e+00 9.41694315e-01 9.44273389e-01
9.49801385e-01 9.52927340e-01 1.05455290e+00 1.04955645e+00
1.04815072e+00 1.04163633e+00 9.69808694e-01 1.03813696e+00
1.03345195e+00 1.02768165e+00 1.02381893e+00 9.82887562e-01
9.81254198e-01 9.86986081e-01 1.01521697e+00 9.89390243e-01
9.93680575e-01 9.93261992e-01 1.00195909e+00 1.01363137e+00
1.01189827e+00 1.00051625e+00 1.00419597e+00 1.00622516e+00
1.00552678e+00 1.00928050e+00]

In [173]:
# Let's Sort eigenvalues in descending order

# Make a set of (eigenvalue, eigenvector) pairs


eig_pairs = [(eigenvalues[index], eigenvectors[:,index]) for index in range(len(eigenvalues))]

# Sort the (eigenvalue, eigenvector) pairs from highest to lowest with respect to eigenvalue
eig_pairs.sort()

eig_pairs.reverse()
print(eig_pairs)

# Extract the descending ordered eigenvalues and eigenvectors


eigvalues_sorted = [eig_pairs[index][0] for index in range(len(eigenvalues))]
eigvectors_sorted = [eig_pairs[index][1] for index in range(len(eigenvalues))]

# Let's confirm our sorting worked, print out eigenvalues


print('Eigenvalues in descending order: \n%s' %eigvalues_sorted)

[(6.400301029851477, array([ 3.38140157e-01, 7.12659835e-02, 3.49772357e-01, 1.21826278e-02,


2.45688226e-01, 3.13509594e-01, 5.85711636e-02, 1.32118243e-01,
1.24188241e-01, -5.39557358e-02, -1.55322767e-01, -7.38082454e-02,
1.69280354e-01, 6.71341866e-02, 1.76807150e-02, 8.40382554e-03,
2.25638942e-03, 3.61933268e-03, 1.42331836e-03, 5.17260476e-04,
-5.63803984e-03, -2.85596080e-02, -2.20914937e-01, -2.56073225e-03,
-6.59873013e-02, -6.94849495e-02, -5.10691490e-02, 2.99009064e-02,
1.96828062e-01, 7.27758185e-02, 4.47037234e-02, 6.43544191e-02,
8.98255374e-02, 3.57441062e-02, 2.39042699e-02, 1.82629160e-02,
2.35903040e-02, 9.52118613e-03, 6.36445584e-03, 6.71514053e-03,
4.22319789e-03, -8.25472478e-02, 2.65712150e-01, 2.24314600e-02,
-1.44008864e-03, -1.59380115e-04, 1.36351362e-02, 1.21584859e-02,
2.79651538e-02, 3.46589095e-02, 2.87868177e-02, -3.59443998e-02,
1.23735329e-01, -9.23789546e-02, -5.48045043e-02, -2.04647878e-02,
-5.44353495e-02, -1.55877412e-01, -1.66909198e-01, 9.78807036e-02,
2.09856238e-01, 1.36794862e-01, 5.65981168e-02, 8.75831765e-03,
2.64286457e-01, 4.83149865e-02, -8.63848741e-03, 2.05515336e-02,
-1.04425520e-02, 2.46487074e-02, -1.36322635e-02, -8.34913899e-03,
5.58571616e-03, 5.97895799e-02, 1.67219076e-02, 1.26648246e-02,
1.73286710e-02, 4.02906127e-02, 1.35579146e-02, 4.06324757e-02,
3.20240017e-03, 5.54293286e-02, 2.79650623e-02, 1.07319938e-01,
-1.80305747e-01, 5.49209909e-02, -5.88852015e-03, 1.29688720e-02,
-2.50075399e-02, -3.18537004e-03])), (4.23053271502051, array([-0.05912722, -0.43412126, -0.0
0800384, -0.09492954, 0.13056396,
-0.09447739, -0.370925 , -0.42105132, 0.38853903, 0.00818031,
0.09903886, -0.00322975, -0.05039931, -0.0397953 , -0.01135684,
-0.00195537, 0.00187349, 0.00561416, -0.00403777, 0.00329149,
0.00403127, -0.01266106, -0.02264596, 0.00982359, 0.00641089,
-0.1047631 , -0.01951081, -0.01220764, 0.1019734 , -0.03086617,
0.0212307 , 0.04710655, 0.04081646, -0.00117921, 0.00207246,
0.00066914, -0.00061282, -0.00468101, -0.00095167, 0.00073386,
0.0008508 , -0.00237293, 0.15398148, 0.03143851, 0.20703993,
0.0235899 , -0.04163645, -0.02784399, -0.01428115, -0.01787219,
-0.04086905, -0.02070046, 0.21897162, -0.20078997, -0.05272298,
-0.00838391, -0.0126264 , -0.02109097, -0.07477947, 0.10406576,
0.01781302, -0.03510238, -0.02017662, -0.00081656, -0.00543061,
-0.12229314, -0.02405586, -0.01865954, -0.03518784, -0.00616237,
-0.02599962, -0.02240527, -0.06022892, 0.0647382 , -0.03107846,
-0.03563074, -0.05546802, 0.03231613, -0.02937766, -0.07517532,
-0.03530176, 0.00217373, -0.04298141, -0.03445572, 0.1795959 ,
0.03400426, -0.04414761, -0.03803986, -0.03746223, -0.000623 ])), (3.0220056962108997, arra
y([ 0.21093346, -0.08884361, -0.04051568, 0.45923331, -0.20948573,
0.06094638, -0.08769888, -0.04338803, 0.18999279, -0.03429171,
-0.04303535, -0.14081992, 0.09630094, 0.14413304, 0.08075213,
0.03087204, 0.0216305 , 0.02874812, 0.01352604, 0.01099741,
-0.00046969, -0.0253926 , -0.12810462, 0.00801375, -0.00546132,
0.08028745, 0.05174082, 0.02460706, -0.17141934, 0.10063382,
0.10513547, 0.08968952, 0.09751819, 0.04988048, 0.04180795,
0.0360652 , 0.04631527, 0.01510422, 0.00364895, 0.00749181,
0.00784258, 0.14066273, -0.13476605, 0.0508906 , 0.01324093,
0.00143509, 0.04433479, 0.0745123 , 0.11139792, 0.10282392,
0.08280422, -0.02017733, -0.15120561, 0.08089571, 0.1413166 ,
-0.03014322, -0.04765138, -0.07345488, -0.00271769, 0.02248996,
0.03418131, 0.03990794, 0.02663322, 0.01611412, 0.05546425,
0.03544567, -0.03015266, -0.01667398, -0.0401713 , -0.05250393,
-0.04129285, -0.02118915, -0.04979088, -0.04044887, -0.021319 ,
-0.10458144, -0.01955768, -0.10198589, 0.00537009, 0.0366488 ,
-0.056401 , -0.05468041, -0.08192034, -0.05254563, 0.26984296,
-0.05563353, -0.00285831, -0.03418137, 0.43947654, 0.10266163])), (2.3606995503606907, arra
y([-0.00720377, 0.00224618, 0.07573216, -0.14547718, -0.20077718,
0.05062147, 0.01439989, 0.00073809, -0.07253922, 0.14716631,
0.26402065, -0.27661743, 0.05624118, 0.02404274, 0.00526146,
0.01216159, -0.00804137, 0.00257074, -0.00579735, 0.0020183 ,
0.01828875, 0.09307033, 0.32371789, 0.01903027, -0.05286308,
-0.17577786, -0.06178217, -0.14789745, -0.06749212, 0.02029172,
0.01273173, 0.05388027, 0.072458 , 0.05551253, 0.03184281,
0.03028663, 0.02775791, 0.0149579 , -0.00807084, 0.00573534,
0.00706833, 0.10759155, 0.00949538, 0.0310457 , -0.05454233,
-0.00721205, 0.10292123, 0.0113677 , 0.03206102, 0.04935804,
0.11456154, 0.0672181 , 0.02340023, -0.05185769, 0.01734833,
0.07277432, 0.13346116, 0.2975113 , -0.2129159 , -0.23715036,
0.24099786, 0.16208952, 0.06896703, 0.00896484, 0.30668776,
-0.03052305, 0.01273904, -0.03055353, 0.01560851, -0.02733405,
-0.00267958, 0.02118784, -0.08928568, -0.02238202, -0.04074563,
-0.10262501, -0.05898896, -0.06153782, 0.01533014, 0.04045805,
-0.03335646, -0.03231935, -0.03451797, 0.10004965, 0.16962266,
0.00513287, 0.03516863, -0.02673597, -0.16413819, 0.08959676])), (1.7227802802980245, arra
y([-2.75891943e-02, 6.22663979e-02, -5.87154230e-02, 5.16876256e-02,
4.84901490e-02, 1.72202394e-02, 1.03722364e-01, 5.34730987e-02,
2.21522986e-04, 1.75531777e-01, 7.48912126e-02, -2.40660821e-02,
-6.99276559e-02, -1.38633482e-02, 3.87175860e-02, 1.78451840e-02,
-6.46767863e-03, 1.31501431e-02, -1.18895907e-02, -2.37987660e-03,
2.57882460e-02, 2.03529963e-01, -2.22415917e-02, 4.43677778e-02,
-7.22222733e-02, -9.70041136e-03, -8.15912951e-02, 1.25976552e-01,
1.12053923e-02, -1.57737740e-02, 7.36220296e-03, 1.20741003e-02,
-1.82694555e-02, -3.03469013e-03, 2.37479559e-02, -1.61044922e-03,
6.17820745e-02, -5.39089932e-03, 1.58457297e-02, -9.48287976e-04,
2.06938171e-02, -1.19344878e-01, -1.05065660e-02, -8.78509029e-03,
1.16979339e-01, 1.86698328e-02, 3.82937199e-01, 2.08740410e-02,
1.00421405e-02, 3.70161361e-02, 3.50285374e-01, 4.35659169e-02,
8.36701642e-02, -5.79924748e-02, -6.94195526e-02, 1.50096333e-01,
1.05193372e-01, 8.15102561e-02, -3.43044936e-01, 4.55724591e-01,
-2.36634142e-01, -3.34803092e-02, 3.91109747e-02, -3.62507673e-03,
-2.18796507e-01, 2.77370246e-02, 1.42157352e-02, 2.65573237e-03,
5.98256022e-02, 3.65080662e-02, -7.91148709e-03, 3.45710951e-02,
4.41525866e-02, 3.01403507e-02, 2.24997611e-02, -6.09415475e-02,
-5.32191501e-02, -3.87528058e-02, 6.06045961e-03, 2.43959995e-02,
1.73155089e-02, 2.91906845e-02, -1.62743839e-02, -1.11980582e-02,
-5.78750293e-03, -2.82725675e-02, 1.74298356e-01, 4.41241036e-03,
5.27653264e-02, 8.14799322e-02])), (1.7053304684180373, array([-0.04304967, 0.03835547, -0
.049328 , 0.00682403, 0.06603838,
-0.00678647, 0.07094793, 0.0279412 , 0.01134168, -0.03214826,
-0.1074207 , 0.44510897, -0.38919037, -0.0100558 , -0.00243799,
-0.01259858, -0.00250479, 0.01319458, -0.0099384 , 0.00592628,
-0.02191463, -0.00796351, -0.02695272, 0.04413847, 0.07172762,
0.10503604, 0.00454758, -0.01146529, -0.12597964, -0.10725771,
-0.01547844, 0.12430449, 0.07892846, 0.04778133, 0.04080463,
0.0258032 , 0.044399 , 0.01198528, -0.00248636, 0.01644248,
-0.00839736, -0.10453813, -0.05859151, 0.02176298, 0.14312427,
-0.00625977, 0.25531019, 0.00308563, 0.02034035, 0.07405128,
0.25196789, -0.03486212, 0.21168428, -0.15449296, -0.11058998,
-0.02796998, -0.07348146, -0.09426926, 0.18688518, -0.28658732,
0.12769751, 0.15962844, 0.10054937, 0.0230538 , 0.21152038,
-0.03815552, 0.00385181, 0.01635949, 0.02444802, -0.0144477 ,
-0.00507272, 0.00141885, -0.00511901, -0.0331081 , -0.01035132,
-0.03981033, 0.05873867, -0.04542492, 0.01465507, 0.01155537,
0.03969387, -0.04650538, -0.12865868, 0.08653509, 0.08155747,
-0.01284676, 0.07491223, 0.04722783, 0.05148136, 0.08652505])), (1.5823948341725016, arra
y([-0.02579091, -0.10323643, 0.06233093, -0.15631637, -0.01142927,
0.04732042, -0.07678036, -0.10237832, 0.11412126, -0.06317361,
-0.05834357, 0.22674486, -0.12169029, -0.08095512, -0.09161384,
-0.00748711, -0.06773762, -0.05158664, 0.0104614 , -0.03208893,
-0.01127987, -0.07419635, -0.03121154, 0.04688368, 0.04296296,
0.01776535, -0.01940218, 0.10655432, 0.007984 , -0.06009444,
-0.08995488, 0.07290427, -0.04851162, -0.00111173, -0.03052541,
-0.00506415, -0.04276741, 0.01211364, -0.06485362, -0.03264406,
-0.01695773, 0.06618993, -0.01794634, 0.07672332, 0.18427271,
0.00645225, 0.1060034 , -0.02057187, -0.0020446 , 0.02221682,
0.11947425, -0.01927545, -0.48625332, 0.48400686, 0.0765737 ,
-0.07828707, 0.00855355, 0.04808957, -0.20662059, 0.08991486,
0.13986082, 0.02992472, 0.01678794, -0.00525954, 0.14152616,
0.17926009, 0.00538621, -0.05599342, -0.0716547 , -0.08164325,
0.02650792, -0.0351676 , 0.00165846, 0.09514562, -0.07842145,
0.03097681, -0.03501136, -0.0624873 , -0.00370757, 0.11001634,
-0.06511834, 0.00478484, 0.0232029 , -0.00802633, -0.05344581,
-0.03140204, 0.02211096, -0.04893219, -0.14940765, -0.12861492])), (1.5166979298471606, arra
y([ 0.03512953, -0.05998021, 0.10666813, -0.12151286, -0.0823675 ,
-0.01772739, -0.04739566, -0.04986913, 0.017378 , -0.07612813,
-0.06798112, -0.12944801, 0.22644149, -0.07273589, 0.06650317,
0.02817224, 0.02324206, 0.03927625, 0.01264389, 0.0187637 ,
-0.02279888, -0.08033275, -0.08179817, 0.02656226, 0.07572413,
-0.03952647, 0.13790474, -0.1450146 , 0.1294266 , 0.00847142,
0.01929373, -0.1169406 , -0.12399375, 0.02649847, 0.03841585,
-0.01188958, 0.08773861, -0.01351911, 0.0249923 , 0.01544068,
0.01212473, 0.23575178, 0.08854544, -0.02308546, -0.08859317,
-0.00819852, 0.41200847, -0.10110496, -0.11640363, -0.04605829,
0.39971337, -0.05494921, -0.07489579, -0.00541504, 0.15577549,
-0.12236071, -0.02983097, -0.15292934, 0.2874068 , -0.11035494,
-0.10694081, -0.04685742, 0.03030548, -0.03453339, -0.11326956,
-0.07205484, -0.02058402, -0.01707269, -0.05880667, 0.01668012,
0.0440469 , -0.03878011, -0.08398554, -0.07418455, -0.02799173,
0.10882041, -0.01572402, 0.13624518, -0.02676439, -0.05427701,
-0.01529219, -0.05176724, 0.05367334, -0.00107446, -0.00264562,
0.05796855, 0.11527288, -0.04911701, -0.15494206, 0.11725425])), (1.4840050960170488, arra
y([-0.01647521, -0.07834627, -0.03795622, 0.0359174 , 0.03689281,
-0.00897316, -0.04864121, -0.07710089, 0.05563915, 0.38743286,
-0.14582129, -0.01310922, 0.06974777, -0.01969408, -0.06385795,
-0.04212032, -0.00445327, -0.02754135, 0.01184245, -0.0039486 ,
0.02341597, 0.51007806, -0.16105521, 0.03972466, 0.0224396 ,
0.0429155 , 0.02903634, -0.02010313, 0.05042578, 0.00409658,
-0.02356482, 0.00522301, -0.00311867, -0.02191703, -0.04233229,
-0.01161743, -0.06236782, -0.0080576 , -0.02258844, -0.02122084,
-0.0362343 , -0.02652322, 0.03201913, -0.00857236, -0.02097777,
-0.01584929, 0.00656723, 0.01599246, -0.04447129, -0.00803429,
-0.01061251, 0.10817242, -0.12478379, 0.08557665, 0.0460519 ,
0.47181799, 0.16525984, -0.20280355, 0.17355343, -0.19177728,
0.1237945 , -0.04323163, -0.03597109, -0.00419303, 0.08125443,
-0.00628357, 0.04920573, -0.00522381, 0.17418589, -0.01709192,
0.00150704, 0.01570768, -0.04528673, 0.04077768, -0.0136485 ,
0.07800799, 0.04705179, 0.04622736, -0.02957357, -0.02562351,
-0.04611269, -0.03345367, 0.0315392 , -0.03166596, -0.06843772,
0.01994364, 0.02590641, -0.00907726, 0.03793356, -0.06581374])), (1.3921255385114626, arra
y([-7.44798782e-02, -2.04057215e-02, -1.70213024e-01, 1.59969471e-01,
1.16745291e-01, 1.22570505e-02, -8.60491578e-03, -3.35444803e-02,
-1.63747860e-02, -8.81691725e-02, 2.29829604e-01, -1.91980144e-01,
2.00935995e-01, -9.24163310e-02, -3.58916386e-01, -1.55938407e-01,
-8.01419058e-02, -1.67543214e-01, -1.58590080e-02, -3.33998688e-02,
1.42090660e-02, -1.01937834e-01, 3.97490694e-02, 3.11486801e-02,
-3.28015525e-02, 1.03256895e-01, -1.58237285e-01, 7.35541952e-02,
8.49211152e-02, 3.08494796e-02, -1.74544509e-01, -8.39287128e-03,
2.78487996e-02, -9.45944424e-02, -1.31726300e-01, -4.56532786e-02,
-2.77369029e-01, -1.65078423e-02, -1.47011482e-01, -1.20544364e-01,
-1.24713634e-01, -2.12675713e-01, -1.59140924e-06, -1.70492395e-01,
-4.71749996e-02, -2.64371816e-02, 1.93074103e-01, 5.10871246e-03,
-4.51392455e-02, -1.12503289e-02, 1.79238763e-01, -3.64441742e-02,
4.17966706e-02, 9.30520985e-03, -7.65341683e-02, -1.14372474e-01,
-3.05317310e-02, 3.57304046e-03, 6.67812385e-02, -1.18929855e-01,
1.15080118e-01, -2.72141011e-02, -6.35371089e-02, 1.76916211e-03,
7.65631431e-02, 1.08834391e-02, -3.97408678e-02, 9.80381325e-03,
-7.29636808e-02, -3.18072086e-02, -1.22301352e-01, -4.27613032e-02,
5.05198151e-02, 2.49743560e-02, 5.62910547e-02, 8.95831493e-03,
3.61783200e-02, 3.30025629e-02, -7.22496957e-02, -5.24225212e-02,
-3.43834525e-02, 3.51711601e-02, -1.93835937e-02, 3.89598662e-02,
-3.88440865e-02, -4.38094026e-03, 7.82669060e-02, 3.50383193e-02,
1.99411248e-01, -1.66581180e-01])), (1.338123871891159, array([-0.00548775, -0.03843159, -0.
03061153, 0.04333244, 0.13993639,
-0.03706519, -0.02074415, -0.03732315, 0.03333833, -0.01952747,
0.22987535, -0.21522097, -0.06384675, 0.08397588, 0.34627929,
-0.04155743, 0.15962213, 0.17622303, 0.03179456, -0.00631348,
0.01499474, -0.03142747, 0.07360389, 0.03405806, 0.05415617,
-0.04012176, -0.21879165, 0.10111445, -0.06312339, -0.02052279,
0.07695396, -0.03795474, 0.02675084, 0.09266905, 0.02364962,
0.00294061, 0.32037628, 0.04085849, 0.22018367, 0.15760833,
-0.07506903, -0.26341749, 0.045752 , 0.046454 , 0.0182172 ,
-0.00876758, 0.02950111, -0.07382824, -0.09627891, -0.03991851,
0.05040326, 0.00198497, -0.07373555, 0.24160928, -0.26190678,
-0.01717238, 0.05946142, -0.07260429, 0.15436931, -0.14236139,
-0.00075114, -0.02015509, 0.12668078, -0.02670258, 0.01202094,
0.07757646, -0.01062797, -0.00859107, -0.03864637, -0.04924234,
-0.12007782, -0.04770347, 0.01634402, 0.02039453, 0.00370991,
0.06774831, 0.07424346, 0.01216043, 0.06891506, -0.01504565,
-0.05261312, 0.05696704, 0.06306901, -0.0986121 , -0.09694006,
-0.02580626, 0.00048615, 0.025849 , 0.04712735, -0.18562838])), (1.2745588269563284, arra
y([-1.35554753e-03, -6.07757998e-03, 1.94705449e-02, -3.64873032e-02,
2.57177745e-02, -1.73766948e-03, -7.66767391e-03, -5.99583989e-03,
1.28081101e-02, 3.13125071e-03, 7.72361809e-02, -9.08219621e-02,
-3.02525331e-02, 1.04350856e-01, -1.07304336e-01, 5.79268206e-01,
-1.92585215e-02, -1.43456269e-01, 1.44049771e-02, 5.54833874e-03,
-3.02947376e-03, 1.53579737e-02, -1.61490155e-05, 1.64651561e-03,
1.70440407e-01, -9.39241300e-02, -1.00536529e-01, 1.19542486e-01,
-8.20623539e-02, 2.74726274e-02, -1.98189598e-02, -4.39066930e-02,
5.74034124e-02, 1.96783609e-02, 2.65304936e-01, 1.24948072e-01,
-2.00105962e-01, 4.30617383e-03, -4.64578459e-02, -2.17425881e-02,
4.94733434e-01, -2.54029837e-02, 4.39087477e-02, -1.04778047e-01,
2.20919184e-02, -5.46829160e-03, 8.31957137e-03, 1.71626353e-01,
-3.94012165e-02, -9.76566932e-02, 2.29359231e-02, -4.01148302e-02,
-2.68252725e-04, 1.11777672e-01, -1.67456052e-01, 3.17300476e-02,
-1.64373986e-02, -9.39181037e-02, 1.05514889e-01, -2.96055848e-02,
-5.97183383e-02, 7.01084344e-02, -5.40776784e-02, -2.26117483e-02,
-2.87330811e-02, 6.69012887e-02, -3.01012039e-02, -1.84687406e-02,
6.26595732e-04, 1.64181008e-02, -5.79387053e-02, -1.26763711e-02,
1.79376732e-02, -4.92679658e-02, 9.23604464e-03, 1.07068444e-02,
2.38723881e-02, 1.23594923e-02, 1.44704544e-02, 5.22613286e-03,
-2.06610793e-02, -5.53429277e-02, -2.80537402e-02, 3.68670983e-02,
7.42196857e-03, -9.91201404e-03, -2.55594189e-02, -2.38138885e-02,
-2.70204929e-02, 1.48195145e-02])), (1.2235863310684403, array([ 0.02162233, -0.02507778, 0
.00346684, 0.0337724 , 0.06339719,
-0.01546107, -0.00802168, -0.01952018, 0.01578492, -0.00374358,
-0.0030274 , 0.06604012, -0.29852924, 0.43570771, -0.01070749,
0.00227028, 0.27598502, -0.19355632, 0.01200504, 0.00137478,
0.01009588, -0.02943156, 0.04368344, -0.03455022, -0.09548836,
-0.02263805, -0.0043047 , -0.14906378, -0.03932275, 0.210181 ,
0.13741778, 0.03943821, 0.06420845, 0.01942264, 0.02616773,
-0.00049999, -0.2467183 , -0.06123914, 0.25875485, -0.07995135,
-0.04528393, -0.03659067, 0.03028853, -0.037632 , -0.06830891,
0.03123251, 0.08586152, -0.08303141, -0.02826377, -0.10886284,
0.07045707, -0.00895639, -0.05643817, -0.07005928, 0.21322098,
-0.07408769, 0.15301434, 0.04851492, -0.03619714, -0.00610947,
0.05533285, -0.0628817 , -0.20534149, -0.00580645, -0.02051046,
-0.03196354, 0.04860021, 0.05638867, -0.03977628, 0.01021369,
0.03052457, -0.01043534, -0.10746098, -0.01933172, -0.00225848,
-0.02649758, 0.02510083, -0.01685237, -0.10001644, -0.0289755 ,
0.01190971, 0.05171331, 0.29344975, -0.06493778, -0.10987834,
0.09982846, 0.07016599, -0.05809229, 0.00350068, -0.11836791])), (1.2178118832338365, arra
y([-1.82521939e-02, 3.98266974e-02, 4.37101951e-02, -1.09924289e-01,
3.81072863e-02, -1.04404282e-02, 5.66479026e-02, 3.40861744e-02,
3.19219948e-02, -2.82643180e-04, 4.60365840e-02, -1.36618844e-01,
5.69508913e-02, 1.79428208e-01, -6.80695887e-02, -2.57879383e-01,
4.86807903e-02, -1.58414203e-01, 1.91672236e-02, 1.04592425e-03,
-1.56431618e-02, 3.50710634e-02, -7.20908512e-02, 2.81295729e-02,
3.12644629e-01, -3.29212614e-01, 1.65765318e-01, 2.72942611e-01,
-2.49384356e-01, 1.43915743e-01, 7.70159238e-02, 3.60081867e-02,
-1.32977978e-02, 5.37143042e-02, -4.73669917e-02, -5.38248529e-02,
-1.66768399e-01, 3.17515143e-02, 2.62159366e-02, 5.40694050e-02,
-2.64704135e-01, 1.22058412e-01, -7.54748962e-02, -7.31124780e-02,
2.54286662e-01, 1.81960202e-02, -7.81860561e-03, -5.30373138e-02,
-2.94341276e-02, -2.58421362e-02, -3.85029912e-02, -4.36028096e-02,
4.67960202e-02, -4.67573892e-03, -6.04959326e-02, 3.44279004e-02,
-4.32294352e-02, -1.27615956e-01, 7.55820622e-02, 3.45839877e-02,
-1.19639293e-01, 1.45885832e-01, 4.67547906e-03, -7.37539521e-03,
-3.27779542e-02, 6.05062047e-02, -7.71285160e-02, 9.55809826e-04,
4.73524121e-02, 3.80451458e-02, 6.64970709e-02, 1.19055043e-02,
3.29736108e-02, -1.62921568e-01, 2.94258653e-02, -2.22492552e-02,
2.72471490e-03, -4.42259233e-02, 9.79541765e-02, 8.86809373e-02,
2.98735972e-02, -3.75493601e-02, -9.15437233e-02, 1.57466241e-01,
3.52801176e-02, -1.31400982e-01, 6.26552634e-04, -4.00723013e-02,
-1.13996015e-01, -5.55332480e-02])), (1.1943992057853123, array([-0.00931702, -0.00424866, -0
.01216005, 0.004068 , -0.02263428,
0.00471673, 0.03084057, -0.00579687, 0.01061565, 0.00286548,
-0.05429538, 0.0424345 , 0.17139521, -0.3992305 , 0.06337345,
0.03866668, 0.50312022, -0.17641882, -0.04637666, -0.05400847,
-0.00154834, 0.00562841, -0.01500136, 0.01718973, -0.02036575,
0.08496288, -0.03571607, 0.04040876, 0.05591341, -0.080265 ,
-0.17623252, -0.02222803, -0.03419361, 0.13241458, 0.04712548,
-0.03352416, -0.12270217, -0.00351138, 0.52701109, -0.04683177,
0.03675955, 0.04148061, -0.02656703, -0.07510523, 0.05351867,
-0.0249037 , -0.0318897 , 0.02718563, 0.01596879, 0.05101118,
-0.02903321, 0.06317588, -0.01326512, -0.03020909, 0.05275157,
0.01706168, -0.05856816, 0.03988794, -0.05196665, 0.01620932,
-0.04609711, 0.13712312, 0.03485711, 0.01156724, 0.03409038,
-0.05809907, -0.03100509, -0.06844818, 0.00742539, -0.0287335 ,
-0.01777249, -0.01164615, 0.05614855, 0.01500253, -0.0393923 ,
-0.04277097, 0.09562431, -0.04900554, -0.06115623, 0.00210678,
-0.00588897, -0.10569315, -0.05087745, 0.14847456, 0.06302812,
-0.06454742, -0.01456505, 0.02526081, 0.02779199, -0.06739155])), (1.1835468203512876, arra
y([ 0.03717678, -0.02535452, 0.02706796, 0.02121948, -0.00643199,
0.06230364, -0.04062001, -0.01679781, -0.00330309, 0.05101256,
-0.03300083, 0.02863528, -0.04126303, 0.08946389, -0.04040111,
-0.05012179, -0.1012712 , -0.14580506, -0.01925575, -0.05337933,
0.01891649, -0.00116377, 0.06500961, 0.03899285, 0.0213303 ,
0.07741037, -0.17050636, -0.16609609, 0.07488031, -0.08878289,
-0.1630476 , 0.11557036, 0.18380201, 0.082032 , 0.10960807,
0.03167498, -0.10141503, 0.23255009, -0.11216596, 0.22622171,
-0.11895868, 0.02902314, 0.07728901, -0.11490983, -0.10045019,
0.00619202, -0.04623066, -0.13040619, 0.17791992, 0.00567379,
-0.01654318, 0.03274446, -0.05062707, 0.01265202, 0.05737324,
-0.0064413 , 0.09038616, -0.01321635, 0.0524206 , 0.02059814,
-0.35873099, 0.27435224, 0.338374 , 0.07888839, -0.11605764,
-0.03570986, 0.01418727, -0.08400272, -0.02308667, 0.02757074,
-0.02740659, -0.04492104, -0.04322222, 0.0325053 , -0.09134924,
0.01900407, -0.05265527, 0.01393194, 0.1479187 , -0.02768956,
-0.01705519, -0.19410776, 0.13552946, 0.05032767, 0.01248076,
0.20909184, -0.08041327, 0.0217424 , 0.02548379, -0.14547926])), (1.175215028494778, array
([-0.01003806, 0.00929693, -0.00839073, -0.00384031, -0.02025381,
0.01108479, 0.03082652, 0.00684872, 0.0040802 , -0.00835238,
-0.00888267, 0.02343615, 0.00107126, -0.02599632, -0.01324817,
0.02040201, 0.06855982, -0.1738684 , -0.02001343, -0.00324887,
-0.00960466, -0.00413402, -0.00319067, 0.06760916, -0.03168774,
-0.04264339, 0.04942483, 0.03072312, 0.02012888, 0.09466353,
-0.06283886, 0.09247561, -0.16311953, -0.00957255, -0.18114619,
0.01839285, -0.12161602, 0.06257645, 0.07093488, 0.41841154,
0.09921985, 0.03727513, -0.0018813 , -0.13057926, 0.03516641,
0.00174659, 0.02598584, 0.16304889, 0.07714149, -0.05721265,
0.01428726, -0.04961955, 0.0233701 , -0.02559996, 0.0155587 ,
-0.01602477, -0.01613317, 0.00661517, -0.02768675, 0.00053859,
0.21964819, -0.41933767, 0.27741611, 0.0572856 , 0.03932133,
-0.01142053, -0.01319052, 0.03717639, -0.0075554 , 0.00712815,
0.03087199, 0.01612459, -0.04848844, -0.00117159, 0.03997727,
-0.06363599, 0.04734225, 0.01954812, 0.45531874, -0.07439785,
0.02489246, 0.03331119, 0.01755671, -0.22160939, 0.01271444,
0.02131253, 0.0366985 , 0.07920369, -0.01420712, 0.07780367])), (1.1607311253642982, arra
y([ 2.87769241e-02, -8.58732180e-03, 5.11720150e-02, -3.63311597e-02,
-1.14374568e-01, 4.54263578e-02, -5.78887157e-03, -2.58465735e-03,
-1.40020678e-02, 2.94326379e-02, -6.44111057e-02, 4.26378371e-02,
-4.62103817e-02, 1.10513405e-01, -6.83260112e-02, -1.32374633e-01,
1.57587691e-01, -4.88596364e-02, 6.47743069e-02, 1.21486343e-01,
-2.98135439e-02, 2.38420919e-02, 4.41077246e-02, -2.58264723e-03,
1.19040856e-01, 7.37694514e-02, -2.99108580e-01, -1.43489044e-01,
1.30655959e-01, -6.51230081e-02, 2.07728568e-01, 8.58640129e-03,
-7.58933110e-02, -2.93781059e-02, -1.13635802e-01, 1.40724954e-02,
-1.51357057e-01, -1.16784720e-03, 1.09990489e-01, -8.49939518e-02,
-1.03762028e-01, 1.61494462e-01, 8.54602535e-02, 1.05684727e-02,
-1.58935743e-01, 7.89426677e-02, 1.10384311e-02, -2.27832673e-02,
1.53458821e-01, 5.05888760e-02, 3.94104789e-04, -9.09718920e-02,
3.38157682e-02, 2.34985521e-01, -4.08935670e-01, 5.66997577e-02,
-1.70489242e-01, 6.78470832e-03, 1.46005574e-02, 5.25839206e-02,
5.42972822e-03, -6.47901863e-02, -1.28497758e-01, 4.30876884e-02,
-4.98891316e-02, 9.35210472e-02, 4.30546194e-03, 2.88052594e-02,
4.62027218e-02, 6.67473103e-02, -8.45943593e-02, 5.52658733e-02,
5.18162141e-02, -2.87932257e-02, 2.95775228e-02, 7.21685520e-03,
-1.11638155e-01, 9.76728264e-02, -1.98870673e-02, -1.41592603e-02,
1.73892059e-03, -9.33084113e-02, -2.54801086e-01, -1.00897644e-01,
1.03824118e-01, 2.08909237e-01, -1.09038150e-04, -2.96310066e-02,
-3.01645687e-02, 2.05984863e-01])), (1.148470393300462, array([-3.61964599e-05, 2.65794249e
-02, 3.60987797e-02, -6.30859519e-02,
3.53506277e-02, 2.35960207e-02, 3.72210055e-02, 2.50778801e-02,
2.32019288e-02, -5.01718666e-02, 8.58251787e-02, -6.55178810e-02,
1.59312468e-02, -1.04240857e-02, 2.31768437e-02, 4.78816367e-02,
-3.16944574e-02, 3.16321506e-02, -5.56196280e-02, -8.64990801e-02,
-1.94608535e-02, 2.91957870e-02, -1.19483897e-01, 7.98492218e-02,
2.69925041e-01, 2.97472967e-01, -1.46214798e-01, -3.88005261e-01,
1.58821192e-02, 2.46261316e-01, -1.12748529e-01, -3.44188737e-02,
-6.91164070e-02, -6.31484345e-02, 9.59373799e-03, -4.91259929e-02,
1.29723837e-01, 4.52989851e-02, -1.12190427e-02, -6.85997785e-02,
3.44038836e-02, 6.86329050e-02, -1.49008856e-01, -1.39422356e-01,
2.51729095e-01, 6.54074575e-02, -3.38979458e-02, -3.67616247e-02,
5.93783187e-02, 8.05105269e-02, -3.99044291e-02, -1.30776028e-02,
2.26413898e-02, -2.68176282e-03, -3.03462048e-02, 3.52784116e-02,
-2.99922388e-02, -6.90536366e-02, -2.80968997e-02, 7.02937675e-02,
4.56930898e-02, -8.37216294e-02, 8.48566660e-02, -1.34127177e-02,
1.41981688e-02, 9.92679438e-02, -3.34988956e-03, -2.94506250e-02,
1.32539787e-01, -6.00787755e-02, -2.51151415e-02, 9.89540495e-03,
-9.42027640e-02, -2.66558520e-01, 4.05104637e-02, -1.54805616e-01,
-3.76218661e-02, -1.74510560e-01, -1.30386733e-01, -1.52019447e-01,
1.40425150e-01, 2.42306623e-01, 8.44530920e-02, 6.53919613e-02,
6.36447079e-02, 5.02101913e-02, 8.80789722e-03, 6.75093163e-03,
-9.41898938e-02, -1.37811683e-01])), (1.1194893763589782, array([ 9.75212477e-03, -2.14980571
e-02, 5.02994180e-04, 1.70836411e-02,
4.84416567e-02, 5.29926893e-03, -1.76297741e-02, -1.84186047e-02,
-7.34638265e-04, -9.65676047e-02, 1.62105007e-01, -6.97746038e-02,
-9.07416941e-02, 1.00229032e-01, 7.87474923e-02, -1.30837342e-03,
3.72779209e-03, -1.66772165e-01, 5.80607165e-02, -5.92666217e-02,
-5.06231372e-02, 5.00420706e-02, -5.76942041e-02, -2.33076116e-02,
-2.28977381e-01, 4.53010866e-02, 4.30024819e-01, -1.87498337e-01,
2.95267449e-02, -5.09800705e-02, -1.76023158e-01, -4.50249163e-04,
9.18409746e-02, 5.71992742e-02, 8.82050093e-02, 4.67645473e-02,
-5.21500033e-02, 4.66011022e-02, 4.25656486e-02, 8.54790914e-02,
-2.34339574e-02, -6.76744639e-02, 2.23879271e-02, -7.27912296e-02,
-6.87466408e-02, -2.27698963e-02, 9.92739113e-03, -4.40896135e-02,
2.10350772e-02, -2.74350657e-02, 1.63940431e-02, 3.26246694e-02,
-4.88825022e-02, 1.08413925e-01, -9.60192428e-02, 1.04039975e-01,
-2.07286354e-01, 7.28417267e-02, 2.54386057e-03, 1.68173847e-03,
-6.82465550e-02, 9.38517009e-02, -2.04653192e-02, -1.25056428e-02,
-1.79577705e-02, 3.59411927e-02, -3.10549696e-02, -1.86131538e-02,
1.11532990e-01, -2.17296757e-02, 6.31582889e-02, 3.33960948e-02,
-8.06849400e-03, 4.77517866e-03, -1.55688143e-01, 1.15610590e-01,
-4.39679617e-03, 8.33406560e-02, 1.63903924e-02, -9.29607729e-03,
-5.89541465e-03, 4.54125479e-01, -3.76626344e-01, -1.84220010e-02,
-3.05357641e-02, 2.68322063e-02, 8.90767292e-03, -9.60404650e-02,
3.64751766e-03, 1.39357795e-02])), (1.1096027631760017, array([-0.03055394, 0.02006923, -0
.01184309, -0.03560009, 0.03543938,
-0.03474633, 0.07979204, 0.01308007, 0.02382763, -0.12809165,
0.17073957, -0.1205178 , 0.0791998 , -0.06903869, -0.05864014,
-0.03898209, 0.02089783, -0.09830454, 0.06454565, 0.13005107,
0.01883959, -0.01667412, -0.14323759, -0.10465112, 0.15049143,
0.20588045, 0.09559427, -0.24177582, 0.01675801, -0.32678463,
0.22266998, 0.06855058, 0.06707538, 0.11026669, -0.11583863,
0.03869523, -0.15933217, -0.04541662, 0.01445586, 0.07800317,
0.05061759, -0.20758707, -0.02736484, 0.1909889 , 0.09907509,
0.02327433, -0.02997833, 0.13052392, -0.08841585, 0.1226507 ,
-0.08356291, -0.00260784, 0.00180205, -0.00090463, -0.00061247,
-0.03144258, 0.13068261, -0.09081928, 0.01904294, 0.01296567,
-0.05249401, 0.10687873, -0.03820156, -0.00198812, -0.00053131,
0.01615025, 0.15613067, -0.08968868, 0.05123437, -0.08701044,
0.00300159, 0.00884269, -0.14801765, 0.05676924, 0.00349479,
-0.0093203 , -0.16312161, 0.01851605, 0.13961157, 0.22166984,
0.0680938 , -0.00547623, 0.20430917, -0.07466997, 0.00161567,
-0.15138955, 0.17433659, -0.09265044, -0.03864607, 0.16236077])), (1.1049386135422454, arra
y([ 0.00390238, 0.01124098, -0.00319226, 0.01276039, 0.02215105,
0.00251291, 0.03190408, 0.01135741, -0.01215839, -0.02391252,
0.06608213, -0.03203576, 0.01005787, 0.00093562, -0.1271413 ,
0.0043077 , 0.09066048, 0.13911112, -0.02572941, -0.06674581,
0.01267361, 0.00205398, -0.00689407, -0.01659854, 0.21426127,
-0.25949547, 0.18789231, -0.04147826, -0.04967394, 0.02929571,
-0.23312165, 0.31962601, 0.08978283, -0.06819931, -0.1871029 ,
0.05366093, 0.13418003, 0.11565465, 0.089745 , -0.28088451,
0.09355484, -0.05455608, 0.06352312, -0.00994818, -0.06054641,
-0.07468322, -0.00445091, 0.10529331, 0.15791134, 0.12185223,
-0.03846646, -0.07495017, 0.00653686, 0.05899556, -0.08369328,
-0.03960324, 0.072001 , -0.02163773, 0.03886216, -0.00853727,
-0.05292571, -0.05905547, 0.13561911, 0.25990598, -0.04319963,
-0.08102355, -0.00085609, -0.09659159, -0.01459406, 0.01992934,
0.19136588, -0.0134315 , 0.02139756, 0.10682446, -0.01656247,
-0.02167538, 0.13389442, -0.00706919, -0.22614349, 0.07601096,
0.04167494, -0.00923226, 0.0642536 , -0.25490954, -0.04759977,
0.16218892, 0.13475305, -0.05883087, 0.01153269, 0.06958206])), (1.0914342813543214, arra
y([ 0.02189241, -0.03443139, 0.00945862, 0.02380971, -0.00049311,
0.01258217, -0.0174738 , -0.02829601, 0.00979614, 0.1627378 ,
-0.1235408 , 0.07101763, 0.00283509, -0.03326187, -0.01674985,
-0.00679414, 0.01288757, -0.04424785, 0.05538182, 0.05505606,
0.12584791, -0.07862583, 0.04550909, 0.13851923, -0.17950897,
-0.06372055, 0.19302631, -0.06672413, -0.02183436, 0.1176496 ,
0.01215119, -0.07630287, 0.10243681, -0.08379255, -0.05316322,
-0.04903896, -0.01993359, -0.05983417, -0.00481602, 0.04853408,
0.04414344, 0.04041471, -0.00319666, -0.11173285, 0.00396367,
-0.04073454, -0.01805713, 0.03039337, -0.19989023, 0.36637598,
-0.0792469 , 0.14439615, -0.00384276, 0.17040613, -0.31548541,
-0.14895144, 0.25802145, -0.08622959, 0.01933251, -0.00363105,
-0.0267289 , 0.03013368, 0.01289024, -0.10870153, -0.00955271,
-0.16005998, -0.01934533, -0.00140748, -0.13589127, -0.0481737 ,
-0.00224047, -0.14533548, 0.28976179, -0.17671099, -0.07014446,
-0.14335317, -0.15035157, 0.02526417, 0.04124523, -0.07163569,
-0.00110598, 0.09289375, 0.14714881, 0.11619341, 0.01358176,
0.0716517 , 0.10363371, 0.04587821, 0.01845485, 0.09531802])), (1.0846048546102214, arra
y([ 1.36416745e-02, -3.76527152e-02, 5.28365265e-02, -6.71135070e-02,
-3.35727071e-02, -5.32886764e-03, -5.07830935e-02, -3.29187958e-02,
1.40798118e-02, -6.18918016e-02, -3.69930704e-02, -4.26516643e-03,
3.49837879e-02, 6.77725364e-02, -7.07306194e-02, -4.57095020e-02,
-4.05979345e-03, 1.08793143e-02, -1.85499966e-01, -5.90737535e-02,
-9.00830552e-02, 9.51727326e-03, -2.42416783e-03, -1.30326030e-01,
8.31577790e-02, 1.58379027e-01, -1.54707423e-01, -3.69055337e-02,
-1.34675257e-01, 2.16944049e-01, -1.67535471e-01, 1.42152860e-04,
1.40949060e-01, 1.29576748e-01, -5.52087481e-02, -4.38598682e-02,
-2.11872636e-02, -1.64992513e-01, -5.00899381e-02, 1.04220238e-01,
-1.85693009e-02, 1.69091507e-01, -6.76802710e-03, 2.43984780e-01,
-1.34055526e-01, -9.39195572e-02, 5.67596962e-02, 7.37155245e-02,
-2.13678509e-01, -2.10050985e-02, 4.96224151e-02, -4.66060006e-02,
1.94950335e-02, 6.18430541e-02, -1.17830163e-01, 9.37997796e-02,
-1.38472573e-01, 2.39545267e-02, 1.11274762e-02, 2.76187965e-02,
-6.56220134e-02, 5.16874758e-02, -2.63416474e-02, -1.08027231e-01,
-4.05131629e-02, -3.64657569e-01, -1.49822854e-03, -2.17486451e-02,
9.54450737e-02, -1.93217907e-03, -4.04227011e-02, 9.06092703e-02,
1.76503982e-01, 2.64354242e-01, -4.30849875e-03, -1.45214699e-01,
2.28849015e-01, -1.66326592e-01, 1.17558268e-01, 1.23194184e-01,
8.76691304e-03, 1.37073531e-01, 9.53029811e-02, -3.55104498e-02,
-7.08018442e-02, -3.27677377e-02, -4.75275678e-02, -5.51551190e-02,
-3.94044368e-02, 5.05800808e-02])), (1.0827345320601791, array([ 0.02608929, 0.01371157, -0
.01316454, 0.07103393, 0.06334471,
-0.01052515, 0.05838226, 0.01794481, 0.01605202, -0.08722451,
-0.00212057, 0.01563993, -0.040189 , 0.06106663, 0.04471129,
0.03679496, -0.03609148, 0.05505417, -0.01942624, -0.074719 ,
-0.07492881, 0.02029782, 0.01929931, -0.07387488, 0.10632031,
-0.12726389, -0.06496204, -0.16477549, 0.12185701, 0.03147709,
0.0849513 , 0.27178893, -0.19333652, -0.1181441 , 0.08843202,
-0.09415497, 0.04062823, -0.18361823, -0.01325183, 0.13323152,
0.04412477, -0.00660519, 0.01855727, -0.07165366, 0.03654398,
-0.01583086, -0.00077891, 0.09646896, -0.17793183, 0.19398858,
-0.06899618, 0.14358455, -0.08394391, 0.00725759, 0.09285048,
0.04445943, -0.15460279, 0.13528103, -0.03727375, -0.00432682,
-0.00749074, 0.00194684, -0.09744172, 0.22268596, -0.01705897,
-0.2809283 , -0.02343944, -0.01155507, 0.04431471, 0.02610905,
-0.09651504, -0.00723791, 0.11274514, -0.06014666, 0.07767799,
0.31518473, -0.08841039, -0.00263312, 0.09328507, 0.02644484,
-0.05111885, -0.13368294, -0.13157751, 0.13679515, 0.05311068,
0.01467942, 0.2654361 , -0.05453342, 0.05882195, -0.26311322])), (1.0711889349293784, arra
y([ 7.35032739e-03, -6.05294219e-03, -3.18961523e-02, 6.92206083e-02,
2.11146054e-02, -1.80449277e-02, 1.73558765e-02, -4.30268813e-03,
8.40429472e-03, -5.06717369e-02, -3.20152398e-02, 8.37848800e-02,
-6.93265601e-02, 6.89973963e-02, -1.43561936e-01, -7.60267861e-05,
1.08240437e-01, 9.97229941e-02, 6.25795502e-02, -2.68711516e-02,
-8.95563476e-02, 3.11804891e-02, 2.92291847e-03, 6.99087376e-02,
1.72306788e-01, -1.70191104e-01, 4.92745375e-02, -1.16659803e-01,
8.01172970e-02, 2.19604626e-02, 4.57797290e-03, -3.18535647e-01,
2.83039200e-01, -4.42865328e-02, -2.14172005e-01, 8.65819959e-02,
4.59897609e-02, 1.67839804e-01, 4.74444353e-02, -2.04096693e-03,
4.45972907e-02, -1.84784859e-02, 3.67002221e-02, -3.89093533e-02,
-5.44284701e-02, -7.35226162e-02, -1.79040400e-02, 1.03478098e-01,
-2.36485139e-01, 1.56970299e-01, 8.61790027e-04, -7.76936897e-02,
-9.22260724e-03, -2.88677542e-02, 8.94937864e-02, 7.70136302e-02,
-2.26364718e-01, 1.80732695e-01, -7.33757398e-02, 3.04025298e-03,
-1.29459403e-02, -9.29285890e-03, 1.67078601e-01, -3.92809793e-01,
3.15256265e-03, 1.38861430e-01, -8.32651408e-02, -4.50246873e-02,
7.95186158e-02, -6.99649803e-03, 4.56267492e-02, 1.02014106e-01,
-5.46238406e-02, 8.18037302e-02, 8.04374447e-02, 1.31697986e-01,
-1.52482944e-01, -2.38383139e-02, 1.89103034e-03, -1.06697907e-01,
-8.98353059e-02, -1.41209456e-01, 2.83590603e-02, -6.28238278e-03,
-6.59767662e-03, -4.96758182e-02, 6.35906472e-02, 6.19351502e-02,
6.61072339e-02, -6.05161345e-02])), (1.0636889264741323, array([-1.78741662e-02, -3.35305864
e-03, -4.63080385e-02, 4.79209043e-02,
-4.41204495e-02, 2.51674495e-02, 8.04611723e-03, -6.59300723e-03,
-2.84730843e-02, 2.41629079e-02, 3.87817790e-02, 6.71582876e-02,
-5.03348601e-02, -1.62125338e-01, 1.74122016e-01, -5.42585379e-02,
-8.18383340e-02, -3.52185647e-02, -3.77420648e-02, -5.34238699e-02,
1.17142345e-01, 1.94547224e-02, -7.08832856e-02, -1.27047112e-01,
3.39755313e-01, -1.35466949e-01, 1.04595107e-02, -1.30035681e-02,
7.39993858e-02, -5.81725018e-02, -1.44061875e-01, -1.99191906e-01,
1.79286967e-01, -1.21190591e-01, 8.01792571e-02, -7.64494994e-02,
1.66262543e-02, -2.39493290e-01, 3.83750787e-02, 1.83870406e-01,
-7.47679673e-02, -8.07918404e-02, 6.24524946e-02, -3.99129133e-02,
-1.42741438e-01, -7.21310505e-03, 4.96211043e-03, 8.48621898e-02,
2.15515680e-01, -8.26625518e-02, 2.15747714e-02, -4.48358208e-02,
-2.04433750e-04, -1.59766103e-02, 3.70072966e-02, -6.60417365e-02,
1.58815573e-01, -3.26744578e-02, -5.40505568e-02, 2.52737200e-02,
7.66776346e-02, 2.97530563e-02, -2.57170177e-01, -1.02254792e-01,
3.18499749e-02, 6.61156058e-02, 4.39845954e-03, -4.25431053e-02,
-1.01418002e-01, -5.75079706e-02, 1.35511675e-01, -6.17755409e-02,
3.27842198e-02, -1.01334015e-02, 1.06259242e-01, 3.39699852e-02,
1.81752213e-02, -2.13502123e-01, 1.44220602e-01, -1.17824890e-01,
-2.69799529e-02, 3.04793508e-02, -4.85665970e-02, 5.10212773e-02,
1.21739561e-03, 2.39552260e-01, 3.37575025e-02, -2.97237033e-01,
7.24032687e-02, 6.35448512e-02])), (1.054552899346843, array([ 0.00189664, -0.02418795, 0.
01666143, -0.02559334, -0.04723553,
0.08762733, -0.01844839, -0.02246402, -0.00971444, -0.00164947,
0.06009845, 0.05435715, -0.11430688, 0.05245795, -0.07366107,
-0.09214164, 0.03386417, 0.17952946, -0.00685296, -0.06151238,
-0.01797202, 0.02444293, 0.01174569, -0.09673668, -0.0569378 ,
0.06625694, -0.09712417, 0.08872945, 0.04657131, -0.03924046,
-0.08835877, 0.01618904, -0.02475417, 0.00304524, -0.16224386,
-0.05104083, 0.13126887, -0.09246795, 0.0206345 , 0.02608835,
0.03934495, 0.01252745, 0.08147797, -0.24957468, -0.0223957 ,
-0.07817746, -0.02509523, 0.3154523 , -0.13966946, 0.06831692,
-0.02458838, 0.00925475, -0.05591175, -0.02468803, 0.13381471,
0.02909932, -0.00545406, -0.03017539, 0.02461014, 0.02148193,
-0.09844483, 0.14146013, -0.10725505, -0.04139617, -0.03799853,
0.10957219, -0.02784257, -0.01238004, -0.05261667, 0.07543871,
-0.08444205, -0.0357948 , -0.23817986, -0.24027782, 0.01643644,
-0.32423113, 0.18525905, 0.38181714, 0.10277854, 0.21109554,
-0.12364024, 0.06989283, -0.04347523, 0.05079479, 0.03984204,
0.09982685, -0.03860539, -0.12303043, -0.03791448, -0.01685344])), (1.0495564454254236, arra
y([ 0.01064236, 0.01687834, 0.00774352, 0.00608319, -0.0663222 ,
0.02874585, -0.01472903, 0.01797059, -0.01661496, 0.00404952,
-0.01793021, 0.02676529, -0.03435304, 0.06351985, -0.08183986,
-0.05729709, 0.11622493, 0.1527865 , 0.00436274, -0.09122086,
0.05417958, -0.04566393, -0.01298813, -0.05333558, 0.05338584,
0.00566828, 0.08695468, -0.00570582, 0.01474224, -0.05766692,
-0.1724336 , 0.14035376, -0.03804171, -0.15738222, -0.08266736,
0.04978232, 0.05402185, 0.01735426, 0.1069416 , 0.0049564 ,
0.01182338, 0.08756753, 0.00285441, -0.00483389, -0.06538106,
-0.19215988, 0.02330812, 0.16395316, -0.11995921, -0.30957911,
0.08650203, -0.02370349, 0.04727246, -0.01419063, -0.05444469,
0.00958458, 0.10879339, -0.09002233, 0.0206424 , 0.0085553 ,
-0.01636564, 0.03688603, -0.02780243, 0.21259285, 0.00504157,
0.10054532, 0.06241784, 0.10927469, 0.06292144, 0.0116295 ,
-0.23382899, 0.05088776, 0.06485522, 0.10607442, -0.03182513,
0.03136658, -0.51409216, -0.09419313, 0.05395911, -0.09983506,
-0.09806359, 0.12642438, 0.1162335 , 0.16582754, 0.05573822,
-0.12724108, -0.25410565, -0.01573826, 0.01954463, 0.0113436 ])), (1.048150719936755, array
([ 6.32926518e-03, 1.93752966e-02, -6.53840904e-04, 1.27988606e-02,
-2.23016780e-02, 2.23062907e-02, -1.25351527e-02, 1.94998288e-02,
-1.66564719e-02, 1.12505578e-01, -2.63086807e-02, 1.07711355e-02,
-2.42326174e-02, 4.17755806e-02, -2.74605434e-02, 1.52271476e-03,
3.39426545e-02, -5.25063210e-02, -2.08645711e-01, -9.29383028e-02,
1.55201829e-01, -1.05959186e-02, 1.46360546e-02, 7.17564931e-02,
-2.21967333e-03, 3.61857593e-03, -9.09097678e-02, 8.68340523e-02,
5.21987840e-02, -1.96675177e-02, -1.77688729e-01, -6.89211306e-02,
-8.28044588e-03, 2.71875449e-01, -5.61288151e-02, 9.82268066e-02,
1.41919011e-02, -9.46419057e-02, -2.21997555e-02, -6.73477724e-02,
2.91777474e-04, 3.23841883e-02, -7.05894469e-03, -2.66193621e-02,
3.08786946e-02, -1.17226362e-01, -3.41913073e-02, -6.49851301e-02,
-3.64756074e-02, 1.53302816e-01, -2.73293681e-02, -9.04331988e-02,
3.10836892e-02, -5.49365525e-03, -1.92861272e-02, -7.27238605e-02,
1.20203541e-01, -5.49052633e-02, -8.10101525e-03, 2.82562220e-02,
2.27497961e-02, -4.56225291e-02, -4.50583496e-03, -8.53459705e-02,
-5.91057225e-03, -1.42297715e-01, 1.29745173e-02, 9.10621502e-02,
-1.10595308e-01, 1.60057000e-01, -6.90532574e-02, -2.61851164e-02,
-3.13303654e-01, -1.87659790e-01, 8.05732976e-02, 4.47115037e-01,
-1.32055824e-01, -1.90136154e-01, -8.59790525e-03, 3.21439334e-01,
1.76632464e-02, 1.51021356e-01, -9.00769966e-02, -1.30557213e-01,
6.19954135e-02, 4.48562221e-02, -8.80594436e-02, 1.19985457e-01,
7.29911054e-03, 5.90922916e-02])), (1.0416363339168457, array([ 1.46623820e-02, 1.54303991
e-02, 1.79455242e-02, -4.32272660e-03,
2.35530131e-02, -9.42976306e-03, 2.03551780e-02, 1.73751586e-02,
-2.37510925e-04, -4.17669585e-02, -2.87456320e-02, 1.30099513e-02,
6.80922864e-03, -8.69418294e-04, 5.81911122e-02, 6.02727274e-03,
-3.76176668e-02, -2.61469369e-02, -1.77654206e-02, -1.02570906e-01,
-1.40732267e-01, -2.30153512e-02, -7.99214816e-03, -8.46987746e-02,
6.93018879e-02, -6.99778433e-02, 4.52420607e-02, 3.34853621e-02,
2.43491611e-02, -1.19113095e-01, -7.75098025e-03, -1.21760579e-01,
1.14685497e-01, 2.25428210e-01, -1.10637738e-01, 3.40121251e-02,
1.57074247e-03, -8.49852459e-03, -3.95465819e-02, -5.60547606e-02,
5.59753541e-03, -1.02641664e-02, -1.56158744e-03, 8.08025724e-03,
3.88694509e-02, 5.18183511e-02, 2.87509168e-02, 4.34750839e-02,
1.31356215e-01, -1.76073092e-01, 6.86705263e-02, 4.02618874e-01,
-2.79742194e-02, -8.14257034e-02, 5.59803184e-02, 4.68295630e-02,
-7.85045448e-02, 1.19823169e-02, 1.26926437e-02, -2.64786421e-02,
9.56324796e-02, -1.10901569e-01, -1.39180410e-02, -6.63077191e-02,
2.49854692e-02, -2.51737936e-02, -2.19902848e-01, -6.11794604e-02,
5.36039258e-02, 6.68968645e-02, -2.45041001e-01, -8.18407346e-02,
3.21649268e-01, -1.64446235e-01, -1.29230020e-01, -8.38560239e-02,
-2.05545607e-01, 5.81414621e-02, -5.98264940e-02, 2.33343621e-01,
2.48402369e-01, -3.12717029e-02, 9.50193588e-02, -2.30041984e-01,
7.45476523e-02, 1.64200297e-01, -1.60027315e-02, -3.97527715e-02,
-3.80590486e-04, -1.18257727e-01])), (1.0381369601703594, array([-0.01452019, -0.01310388, -0
.00336089, -0.02087642, -0.02197353,
0.05608255, 0.01455196, -0.01515219, 0.00758196, 0.02941782,
0.06903329, 0.01999798, -0.07352511, -0.00069396, -0.0552507 ,
0.0230458 , 0.07026268, 0.06683153, 0.01384287, -0.12941273,
0.07686492, 0.00546715, 0.0082493 , -0.00252667, 0.01492917,
-0.00691446, -0.04071411, -0.01726711, -0.01130595, 0.22201318,
-0.01064372, -0.05323656, -0.16254632, 0.01430831, -0.04516928,
0.07372426, -0.0349304 , 0.102635 , 0.01170177, 0.0255023 ,
0.04308332, 0.00141148, 0.02590048, 0.21401737, -0.1421845 ,
0.12882908, -0.00345029, 0.02566968, 0.01811789, 0.0097195 ,
-0.02130792, 0.14354843, -0.0388769 , -0.03878133, 0.08394915,
-0.05331129, 0.11921897, -0.14265847, 0.07302256, 0.0005525 ,
-0.05354975, 0.0696055 , 0.02237221, -0.09216676, -0.01143389,
0.01236562, 0.02386611, -0.04510227, -0.08527455, -0.04482601,
0.10257254, -0.30206641, -0.05343934, 0.31309607, 0.32663363,
-0.14205531, -0.21014011, 0.1071922 , 0.02449223, 0.06901395,
0.31163194, 0.06347318, -0.34719675, 0.06869492, 0.02253956,
-0.01624024, 0.08994234, 0.10084055, -0.0057217 , -0.14646712])), (1.0334519547258465, arra
y([ 5.77656009e-04, -6.14921415e-03, 1.52996270e-02, -2.56453126e-02,
1.72353854e-02, 3.11267022e-02, -3.45374061e-02, -5.69263252e-03,
-6.27046938e-03, 1.14765169e-02, 8.08286924e-03, 1.94396853e-02,
2.26049114e-03, -9.07174977e-03, -1.92518361e-01, -4.57572252e-02,
1.19314884e-01, 2.15268187e-01, 2.73962655e-03, 1.88035665e-01,
-1.82317506e-02, 1.35226005e-02, 3.52944128e-02, 3.01395938e-02,
-1.35578444e-01, 2.14035930e-02, 2.08187780e-02, -8.50202768e-03,
-4.90208524e-02, 7.78140518e-02, 1.57332492e-01, -4.31036636e-02,
4.91241884e-02, -6.15175316e-02, -1.73908341e-01, -2.43889746e-01,
-2.10275312e-04, 1.67260936e-01, 8.27515080e-03, -3.85271443e-02,
1.40574031e-01, -6.91170120e-02, -3.03533770e-02, 4.36044513e-02,
6.53480795e-03, 3.48562972e-01, 3.81207646e-02, 2.12294650e-01,
1.37878922e-01, -2.19623701e-01, 4.09898494e-02, -1.31708774e-02,
-1.78900973e-02, 2.42607087e-03, 3.14801899e-02, 3.44814079e-02,
2.59723498e-03, -2.00335054e-02, 3.68773565e-03, 1.75328683e-02,
-6.44631385e-02, 6.46585338e-02, 5.75447016e-02, -1.56181005e-02,
-1.47293524e-02, -1.61766587e-01, 3.15482798e-03, -2.36391663e-01,
-7.59884246e-02, -1.98443256e-01, -4.50284225e-02, -1.03828833e-01,
3.53474695e-02, -1.75930249e-01, -2.24283935e-02, 3.15305407e-01,
1.11137397e-01, -1.87891103e-01, 1.00704819e-01, 6.24550923e-02,
-3.41999807e-02, 1.13475557e-01, -2.60730071e-02, 1.43323613e-01,
-1.56250938e-03, 1.02246588e-01, -1.27730473e-01, -3.73056009e-02,
-2.51357486e-02, 2.17889221e-02])), (1.027681645874375, array([-0.01310177, 0.01012866, -0.
03677743, 0.04007282, 0.02208434,
-0.0266942 , -0.05120676, 0.00704602, -0.03124247, 0.12697724,
-0.06647692, -0.00186713, -0.01114223, 0.08695685, -0.03312939,
-0.03077175, 0.0148754 , 0.02452668, 0.01383393, 0.00964574,
0.28695189, -0.07626678, -0.06120312, 0.2266083 , 0.12077144,
0.00223681, 0.03746042, -0.23434108, 0.07633301, 0.07424533,
-0.00093397, 0.04150766, -0.15909224, 0.23787766, -0.03534287,
0.0836789 , 0.00457522, 0.07283674, -0.06969312, -0.02903364,
-0.01943321, -0.09437936, -0.01090542, 0.01431064, 0.00078577,
-0.06842964, -0.00583048, 0.09406928, -0.11215129, -0.05293754,
0.04177536, 0.10423728, 0.01473155, -0.02739377, -0.01496355,
-0.05298254, 0.00411919, 0.11092835, -0.08741769, 0.01822837,
0.01908275, -0.02756407, 0.0354024 , -0.02845482, 0.00891159,
0.1017309 , 0.1356951 , -0.05839743, -0.1241226 , 0.19573594,
-0.0861197 , -0.07853395, 0.15779429, 0.01583231, -0.08672095,
0.0854994 , 0.26404478, 0.0554805 , 0.0046045 , -0.09116275,
0.14921383, -0.13585461, -0.09655757, 0.02634402, -0.0284895 ,
-0.23497157, -0.21359727, -0.42175165, 0.02789145, 0.04223427])), (1.0238189302225627, arra
y([-3.34466103e-03, 6.38920502e-03, -3.61879641e-02, 5.70146856e-02,
2.37921035e-02, 1.36315975e-02, -3.13055516e-02, 5.38838099e-03,
-2.36030724e-02, 2.13974787e-02, 3.25216864e-02, 2.21149601e-02,
-5.06359500e-02, -1.44759871e-04, -6.38409007e-03, 3.93762100e-02,
2.92729685e-02, 4.69509273e-03, -1.87429134e-01, -9.71013168e-02,
3.43571003e-02, 5.75364424e-02, -3.26306519e-02, 1.28664041e-01,
7.77140198e-02, -4.93662232e-02, 6.92259270e-03, -4.87507395e-02,
9.57992859e-02, -3.19971853e-02, -5.98961019e-03, -6.69245285e-02,
-1.58410107e-01, 1.68967514e-01, 9.27890488e-02, 4.63448002e-02,
-1.19614283e-02, 9.31119485e-02, -2.89798864e-02, -1.81792535e-02,
-2.19585018e-02, -7.84419532e-02, -2.71152907e-02, 6.45713023e-02,
-4.00682871e-02, 1.38066375e-01, 6.14912115e-03, -7.93693725e-02,
1.36834518e-01, 4.22202882e-03, 1.75654360e-02, -2.64308583e-01,
1.18320984e-02, 1.30152840e-02, 4.13198580e-02, 2.01704456e-02,
-5.94070552e-02, 6.11268842e-02, -2.01341437e-02, -3.41170818e-02,
3.44108864e-02, -1.73302164e-03, 2.84298457e-02, 3.41668876e-03,
3.49527597e-02, -1.89583555e-01, 6.14688462e-02, 9.06845883e-03,
4.80229754e-02, -1.62294049e-01, 2.15218250e-01, 3.15533805e-02,
3.14501896e-01, -1.74348423e-01, 2.61057759e-01, -1.32057252e-01,
-2.01552514e-01, 2.76094879e-01, -3.33620929e-03, 1.74634167e-01,
-3.68153250e-01, 9.10455881e-02, 1.19570172e-01, -1.59658147e-02,
-1.08267500e-02, -1.04737207e-01, -1.22950130e-01, -1.03392515e-01,
5.15375610e-02, -3.83322394e-02])), (1.015216974820168, array([-0.01618276, -0.02360014, -0.
02926011, 0.02127483, 0.00887336,
-0.01031012, -0.03890124, -0.02537713, -0.00709545, 0.0183462 ,
0.02564321, 0.03171079, -0.02552684, -0.05290231, -0.05858041,
0.01397781, 0.04483173, 0.10630102, -0.09710735, 0.21751889,
-0.16086921, 0.03953656, 0.01445865, 0.05711042, -0.05360371,
0.04744034, -0.10515641, 0.06592877, -0.06065161, 0.19837085,
-0.08167092, 0.001723 , -0.05500338, 0.0178873 , -0.02935533,
0.23903312, 0.05112242, -0.20404582, -0.00608127, 0.05924824,
-0.03380718, -0.13369095, 0.09619283, -0.16325832, -0.08844357,
0.05882671, 0.00166718, -0.12655043, 0.08310361, 0.10773846,
-0.0047094 , -0.0187615 , -0.0257083 , 0.03410192, -0.00164383,
0.06775397, -0.13520662, 0.03043049, 0.01271346, -0.02116568,
-0.00908414, 0.07359619, -0.09341584, 0.15139447, 0.01570467,
0.03237653, -0.36783292, -0.1593625 , -0.00612681, -0.02159892,
0.23910808, -0.0372705 , -0.09220093, 0.02165323, -0.11853159,
0.10044386, -0.19909785, 0.07842302, 0.10094837, -0.02818856,
0.28365203, -0.00366906, 0.21658786, 0.07471534, -0.0130536 ,
-0.20419061, -0.08081454, -0.1592173 , 0.02630263, 0.15241717])), (1.0136313723660826, arra
y([ 0.0047057 , 0.00537221, -0.00935277, 0.02499465, 0.01681789,
-0.00869954, -0.01181598, 0.00597274, 0.00296281, 0.05173311,
-0.02052558, 0.02072112, 0.01177928, -0.05292222, -0.06871935,
0.04976477, 0.04398419, 0.05734055, 0.28848477, -0.02145317,
-0.02022251, 0.03238485, 0.0190972 , 0.06581662, 0.02599537,
-0.04334498, -0.02015788, -0.01891913, -0.00523579, -0.00176664,
0.00598669, 0.01938069, 0.05690915, 0.25457602, 0.00047613,
-0.43171399, -0.04496656, -0.15158992, -0.0630216 , 0.01819853,
0.08063392, -0.05862379, 0.01054687, 0.15530031, -0.0435065 ,
-0.11169971, 0.00510733, -0.11902368, -0.03422134, -0.00074141,
0.02469103, -0.08633909, -0.00487503, 0.00313326, 0.03130132,
-0.00756799, -0.09661252, 0.04985106, 0.01110038, -0.0254607 ,
0.0017376 , 0.01130168, -0.05056147, 0.16722782, 0.00277896,
0.11068128, -0.00414049, -0.33462666, 0.0250716 , 0.44450508,
0.13689589, -0.1373213 , -0.01468817, -0.07279611, 0.0601332 ,
-0.1086719 , -0.10858592, -0.05833797, 0.07717707, -0.12419186,
-0.09329828, 0.13223833, 0.03631919, -0.06444136, 0.00906957,
-0.00439056, -0.04548441, 0.17891649, 0.03655766, -0.03796676])), (1.0118982712943438, arra
y([ 0.00694201, -0.0018722 , -0.00281566, 0.01770142, -0.0496463 ,
0.02999209, -0.00662784, -0.00043536, -0.00665007, -0.04269924,
0.01364174, 0.0757743 , -0.08204845, -0.03454817, 0.116922 ,
-0.07129044, -0.08191329, -0.15735237, 0.02548282, 0.28429731,
-0.17313292, 0.00367994, 0.01073114, -0.12907741, 0.05887494,
-0.09817751, 0.04700288, -0.04577495, 0.10941022, -0.01155122,
-0.01163151, -0.03089187, -0.18285806, -0.0729333 , 0.15773669,
0.25080006, -0.00395445, 0.17083143, -0.00613639, -0.07731857,
-0.16149356, 0.00252433, 0.04088276, -0.09682253, -0.06168423,
0.02003816, -0.00543857, 0.15537622, -0.00710586, -0.02816829,
-0.00369263, 0.15853635, -0.02918925, 0.0329961 , -0.05282505,
0.0276843 , 0.03692497, -0.0533023 , -0.00063083, 0.01558268,
0.02200682, -0.03344913, 0.06746109, -0.0555776 , 0.01370926,
-0.11782188, -0.07465837, -0.33355448, -0.07237854, 0.27166404,
-0.16659519, -0.0669088 , -0.07566776, 0.051166 , 0.26762813,
-0.06626345, 0.10173895, -0.15699645, -0.11477982, 0.02515217,
-0.21230598, 0.16731697, 0.12222344, 0.04058821, 0.02654391,
-0.14220437, -0.00807149, 0.07995315, 0.02476597, 0.07745563])), (1.0092805032957575, arra
y([-1.67124034e-04, -8.52995967e-03, 2.43243338e-03, -4.55422356e-03,
-4.47697903e-02, 4.11149142e-02, 4.35385290e-03, -8.08236351e-03,
-2.23270588e-02, 4.54276255e-02, 2.24447699e-02, 3.72905391e-02,
-4.58866899e-02, -4.43140590e-02, -3.17861262e-03, 1.63223223e-02,
-1.44613585e-02, -1.95174729e-03, 4.26371234e-02, 5.62827516e-02,
9.19523422e-02, 3.32038518e-03, -2.69015123e-02, 6.20466908e-02,
1.08340927e-01, -8.06021310e-03, -8.97465505e-03, 1.76064776e-02,
3.46254356e-02, -1.46048653e-01, 5.39475564e-02, -3.71279001e-02,
-1.13952238e-02, 2.91946391e-02, -4.12869437e-02, -2.42218191e-01,
-2.60603425e-02, 2.30730290e-01, -1.20611776e-02, -1.97548654e-02,
8.48527712e-02, 4.62356706e-02, 3.98832642e-02, 3.56493600e-02,
-1.63797314e-01, 1.28019247e-01, 5.54600893e-07, -2.04307386e-02,
5.26514477e-02, 3.96836522e-02, -3.91024288e-02, 4.98853407e-02,
-4.92069634e-03, 1.52943052e-02, -3.13390345e-02, -2.35876291e-02,
-4.92749417e-02, 2.72170140e-03, -8.76600831e-03, 1.43572974e-02,
-6.00816508e-03, 1.31966835e-02, 2.88326136e-02, 5.78179715e-02,
8.31240235e-03, -1.10802895e-01, -4.18330250e-01, 5.32012386e-01,
8.97901621e-02, 1.36166215e-01, -2.68026171e-02, -2.12739925e-01,
-1.58496722e-01, 7.91803374e-02, -5.45815088e-02, -5.92837221e-02,
6.01606163e-02, -1.03465752e-01, -9.80008579e-03, -4.96013262e-03,
-9.50121718e-02, 1.54394759e-01, 1.14727370e-01, 4.03985262e-02,
1.69140520e-02, -1.89851376e-01, 1.36970462e-01, -1.95413367e-01,
2.32617269e-02, -6.62692406e-02])), (1.0062251615105764, array([-0.00527282, -0.01688188, -0
.00468497, -0.00153288, 0.01518627,
-0.02070147, 0.01180818, -0.01694398, 0.00687768, -0.02536273,
0.02915883, -0.0167478 , 0.01175675, 0.00314778, -0.07007524,
0.01965608, 0.05609278, 0.0854933 , -0.30186512, 0.197778 ,
-0.0217644 , 0.05291083, -0.04169708, -0.32132285, 0.11409975,
0.04440943, -0.14455571, 0.03647878, -0.01699231, 0.07447113,
-0.00450323, -0.00360366, -0.01549301, -0.02353203, -0.0314328 ,
0.0648022 , 0.01905337, 0.16487256, 0.01712573, 0.02184085,
0.014108 , -0.04253202, 0.02446466, -0.06003886, 0.00268156,
0.05985967, -0.01170449, -0.06945482, -0.00672605, 0.007663 ,
-0.02848922, 0.05508542, -0.02811783, -0.05140486, 0.1162407 ,
-0.01263096, 0.0140447 , -0.0037905 , 0.010043 , -0.03004912,
0.02598565, 0.01165745, -0.00817854, -0.09190378, 0.024504 ,
0.06228976, 0.29323598, 0.11533381, -0.11551801, 0.29810874,
0.20107726, -0.17000041, 0.20522724, -0.05589527, -0.43514698,
-0.00324024, -0.10267147, -0.01583151, 0.04671075, -0.05257544,
-0.15347668, 0.09413035, -0.12768923, -0.01715487, -0.0009201 ,
-0.06927188, 0.16380177, 0.10717406, 0.00399982, 0.02845797])), (1.0055267813795965, arra
y([ 0.00554142, 0.00457078, -0.01396835, 0.03459154, -0.02500141,
-0.01279514, 0.03819387, 0.00537668, 0.01086488, -0.03214228,
-0.01963652, 0.04784108, -0.02883686, 0.03075782, -0.09864469,
0.02593862, 0.04063425, 0.12849444, 0.00061827, -0.37222894,
0.23468193, 0.0169766 , -0.01899214, -0.15736419, 0.03104597,
-0.08805389, 0.06859634, 0.02762659, 0.06709668, -0.02284565,
-0.05722747, -0.08353022, -0.03362415, 0.0023764 , 0.06809688,
0.01539331, -0.00184929, -0.00796171, -0.01225957, 0.00114309,
-0.04512352, 0.06240917, 0.00397843, 0.02635132, -0.07712403,
0.40446577, -0.00588481, -0.0807644 , 0.13050743, 0.02701916,
-0.04189661, -0.09141902, 0.01572212, 0.06521795, -0.10411642,
-0.02482952, 0.0830768 , -0.0319343 , -0.01365172, 0.00232728,
0.0491055 , -0.02366732, -0.04097945, -0.05273117, 0.02251404,
-0.08276872, -0.03561784, -0.23494425, -0.04202028, 0.13276423,
-0.22751433, 0.326788 , -0.11771476, 0.02225007, -0.22325178,
-0.01935321, 0.04904005, 0.13896845, 0.08632332, -0.0863012 ,
0.07668643, 0.03075034, 0.09717638, 0.07397294, 0.04806132,
-0.22366605, 0.19813719, 0.08417379, 0.04081818, -0.01479638])), (1.0041959740870794, arra
y([ 0.00822194, -0.02209017, 0.00179947, 0.01200197, -0.02659481,
-0.00764602, -0.01011892, -0.01927093, 0.00778091, -0.05763482,
-0.00875465, 0.04845326, -0.01277327, -0.02985779, -0.04904464,
0.04505036, 0.03809419, 0.02666303, -0.06238701, 0.1059027 ,
-0.24762375, 0.04243724, 0.01488936, -0.1477348 , -0.05574035,
-0.02966242, -0.02771328, 0.14853575, -0.01779685, 0.04024805,
-0.04920764, -0.03116516, -0.03367022, 0.14545037, 0.04058069,
-0.29717955, -0.00541922, 0.34204901, -0.03090687, -0.01240487,
0.05571747, 0.01055122, 0.03993054, 0.01027291, -0.02747238,
-0.24191898, -0.01100649, -0.16332822, 0.04035829, 0.12041442,
-0.03116228, 0.07143561, -0.01542356, 0.00947767, -0.01068069,
0.04649149, 0.06563955, -0.02545183, 0.00819118, -0.02578114,
0.02388872, -0.00179482, -0.00264004, 0.01443597, 0.02018351,
-0.00036709, 0.17338862, 0.02049706, -0.2299085 , -0.05434517,
-0.12314334, 0.36342366, -0.02171211, -0.07421302, 0.22051444,
0.02311083, -0.09116037, 0.04618421, 0.03368351, -0.12647943,
0.2145283 , 0.09262533, 0.02206381, 0.01459231, 0.00105971,
-0.01764596, 0.11883972, -0.33359707, 0.0272489 , 0.03302657])), (1.0019590929564808, arra
y([ 1.23686212e-02, -5.74858375e-03, 1.28667027e-02, 3.18889545e-04,
-1.11497815e-02, -7.92571051e-03, -1.29181317e-02, -3.05287058e-03,
-6.21565856e-03, 3.37501736e-02, -5.36172042e-02, 4.12248652e-02,
-2.63260342e-02, 2.97645063e-03, 6.91072991e-02, 8.48349472e-03,
-3.65125316e-02, -9.07608200e-02, -5.00614804e-02, 3.02119105e-01,
5.47601346e-01, 1.22451603e-03, -2.54262020e-02, -2.63624794e-01,
1.47799419e-01, -8.11331299e-03, -8.87887105e-02, -3.65799921e-02,
-2.21336504e-02, 2.27787137e-02, 5.50243628e-02, 1.89375610e-02,
1.02874548e-02, 8.69723865e-02, -5.44377833e-02, -9.23134139e-02,
-6.87575836e-03, 8.50482063e-02, -2.91144402e-02, -5.19158070e-04,
2.72972991e-02, 3.06226585e-02, 1.67418376e-02, -2.18970537e-02,
-4.24693396e-02, -2.24557075e-01, 1.05633996e-02, -1.17081779e-02,
-8.54985571e-02, -5.48092323e-02, 2.88630646e-02, 5.09284942e-02,
-8.70277980e-03, -4.38590457e-02, 6.95909530e-02, -1.31847092e-02,
-1.28531482e-02, 5.13641392e-02, -4.43967172e-02, 1.39497982e-02,
-2.62894441e-03, 4.11256283e-03, 1.50977741e-02, 1.02750976e-01,
5.59174440e-03, 2.15061961e-02, -3.03984631e-01, -1.64982643e-01,
-2.21324250e-02, -3.01645261e-01, 3.32622827e-02, 1.78737544e-01,
4.47817036e-02, -1.86339019e-02, -1.31331866e-02, 1.70534610e-02,
7.72245188e-03, 1.52724696e-01, -5.91228172e-02, -2.68464737e-02,
6.80235221e-03, 1.28991505e-01, -8.18517388e-02, -4.24656582e-02,
-1.03033094e-02, -2.74849429e-02, 5.25201610e-03, 2.37418675e-01,
9.48423952e-03, -5.23983741e-02])), (1.0005162465340969, array([ 3.04267717e-03, -1.29479763
e-02, -4.44458112e-03, 1.33631972e-02,
-9.33292211e-03, 1.54291205e-02, -4.73800601e-02, -1.16362669e-02,
3.04550598e-03, -3.17009338e-02, -3.74833233e-03, 3.23998573e-02,
5.86232357e-03, -7.50056174e-02, -1.70478255e-03, -9.50892730e-04,
8.94366331e-03, -2.46116099e-02, 4.01358936e-01, -1.37224653e-01,
-2.78080056e-02, 3.66459641e-02, 5.88338432e-02, -4.73700705e-01,
-7.82532804e-02, -3.20303058e-02, 6.81517973e-03, -4.04013209e-02,
3.08022972e-02, 1.42444360e-01, -4.70847004e-02, 3.52799485e-02,
-2.20051786e-01, 3.64519976e-01, -1.65562998e-01, 1.86948483e-01,
2.49965282e-02, 4.49718315e-04, -1.04893408e-01, -2.70135651e-02,
3.39406504e-02, -4.57270480e-02, -3.20001865e-02, -9.09485457e-02,
3.93327053e-02, 1.91371073e-01, -3.04620513e-03, -1.61414711e-02,
-1.02539124e-01, 1.02693088e-01, 4.55301940e-02, -5.51003928e-02,
3.75783992e-02, -1.67074257e-02, -2.24419941e-02, 4.82987324e-02,
5.09746900e-02, -2.54777753e-02, 6.10867595e-03, -7.68573197e-03,
-5.37906032e-02, 1.04949406e-01, -9.46918350e-03, 2.55699582e-02,
3.51676513e-03, 1.13913878e-01, -5.14912042e-02, 1.92816611e-01,
-2.01767844e-02, -1.29106051e-01, 5.41321727e-02, -8.14963631e-04,
9.72048527e-02, -2.91598197e-02, 6.02243306e-02, 4.79375887e-02,
3.04488822e-02, -2.10908861e-01, 1.83075881e-02, -1.12247174e-01,
-1.22270980e-01, -6.68043476e-02, 4.66913562e-02, -4.16616129e-02,
-3.00273118e-02, 9.55748449e-02, -7.54848923e-02, -2.59008996e-02,
2.91027290e-03, 4.74474999e-02])), (0.9936805751250474, array([-0.00764089, -0.00593278, 0
.02204877, -0.05256475, 0.00842695,
-0.01501379, 0.02522124, -0.00706479, 0.01803719, -0.02255416,
-0.02159876, -0.04619848, 0.0606624 , 0.06655174, -0.13064015,
0.04125599, 0.07308245, 0.14042212, -0.19026609, -0.08782066,
0.22890378, -0.02309647, 0.02434943, -0.18001576, -0.27168967,
0.16065591, 0.03658186, 0.08846365, -0.06826107, -0.06298364,
0.04706691, 0.03390692, -0.01736851, -0.09409377, 0.05434106,
0.12652597, -0.0080583 , 0.15550196, 0.03400027, 0.02964071,
-0.0185398 , 0.05566647, -0.07869831, 0.049827 , 0.15379865,
-0.15366605, -0.00384803, -0.12618657, 0.0222475 , 0.05088107,
-0.0265037 , -0.06968476, 0.04039596, -0.02097603, -0.0149297 ,
-0.06291175, 0.02400028, -0.0155054 , 0.03367925, -0.0381606 ,
0.03294177, -0.00982517, -0.04136651, -0.08259673, 0.01412005,
0.11410254, -0.23336836, -0.22037642, 0.33824349, 0.16398198,
-0.03704472, -0.17364007, 0.08558085, -0.01478625, 0.20608258,
0.05323019, 0.00586049, -0.08892334, 0.0996065 , -0.07159233,
-0.11680752, -0.08930236, 0.01112726, -0.05935329, -0.01780898,
0.14924434, 0.15771004, -0.2990459 , -0.06971069, -0.00463926])), (0.9932619924220245, arra
y([-4.80983034e-03, -8.33491749e-03, 1.43478570e-02, -3.39065901e-02,
8.83024216e-03, 7.12824218e-03, 4.75620170e-03, -8.78881996e-03,
1.12222071e-02, -5.61743165e-04, 6.82925646e-02, -5.90986511e-02,
1.59812288e-02, 2.54101713e-03, -1.04896277e-01, 3.51737986e-02,
6.67716613e-02, 1.29244810e-01, 2.98451160e-01, 7.20075288e-02,
1.32998042e-01, 8.42103921e-03, -5.60365919e-02, 2.59706507e-01,
1.64749395e-03, 1.48056990e-01, -8.53023881e-02, 4.35266088e-02,
-3.46030524e-02, -8.41578541e-02, 7.04725333e-02, 2.81031391e-02,
-6.20476632e-02, -1.09682870e-01, -3.49301258e-03, 2.10854352e-01,
5.52984391e-03, -2.87410233e-02, 3.55661568e-02, 4.25020090e-02,
9.28653091e-03, 9.60221857e-03, -1.80697323e-03, -4.10870665e-02,
2.65839708e-02, 1.96398351e-02, -1.76642116e-03, -5.26396067e-02,
-4.08825297e-05, -1.47565964e-02, -3.35712506e-02, 1.11012873e-01,
-2.84716492e-02, -8.03036754e-02, 1.45558330e-01, -9.02451607e-03,
1.73580715e-02, -5.84138883e-02, 7.40185841e-02, -4.79314092e-02,
-1.53805564e-02, 2.60951831e-02, 2.65729607e-02, -5.54298295e-02,
2.61356655e-03, -7.91097579e-02, -1.44960160e-01, 4.12691982e-02,
-2.39118825e-01, 2.14212567e-01, 1.71250062e-01, 4.78605901e-01,
1.71255685e-01, 6.99230903e-02, -9.17598680e-03, -9.36932301e-02,
-5.15256326e-02, -2.10854344e-01, 4.20936287e-02, 1.67529425e-01,
-8.86265705e-02, 1.31353716e-01, -9.90303711e-02, 2.10599781e-03,
-4.12338886e-02, 1.80605862e-01, 4.81498794e-02, -9.01894399e-02,
-3.02906660e-02, -9.57214123e-02])), (0.989390243013148, array([ 0.0027574 , 0.0061329 , -0.
02627846, 0.05095417, -0.00642473,
0.00568102, 0.02281071, 0.00631694, 0.00755118, -0.06691076,
0.06289873, 0.02585864, -0.03773079, -0.02254831, -0.16308598,
0.04149849, 0.08420201, 0.19496387, 0.39379716, 0.17782559,
0.11539291, 0.0197762 , -0.02419242, -0.05530074, 0.06499963,
-0.09218429, 0.01384134, 0.05434109, 0.07868315, 0.01245479,
-0.04835957, -0.06038021, -0.16823152, -0.02203941, 0.22528898,
0.04443168, 0.00703952, 0.03605971, -0.01016621, -0.03127999,
-0.11144527, -0.02134988, 0.04999845, -0.07390443, 0.00045527,
-0.29444954, -0.00354035, -0.11068436, 0.14758104, -0.02428335,
-0.01217597, -0.16673344, -0.03072165, 0.04867053, 0.02903683,
0.02513363, -0.04544139, 0.04575122, -0.04047531, 0.01634853,
0.04759149, -0.06510098, 0.02291705, -0.20123276, 0.00801083,
-0.26857335, 0.18454691, 0.05340024, 0.24677043, -0.10037036,
-0.09368322, -0.18139713, -0.02268909, -0.03296994, -0.1736654 ,
-0.070339 , 0.02794478, -0.01026421, 0.10866061, 0.16523264,
0.21315456, 0.01944449, 0.07190493, 0.07717728, 0.03061382,
0.07312778, -0.00948213, 0.03159338, 0.06226249, -0.01358497])), (0.9869860806882016, arra
y([-2.43715809e-02, -1.07128953e-02, -2.79427241e-02, 3.89261512e-03,
3.60937407e-02, 8.57047187e-03, -1.08440482e-02, -1.47852771e-02,
2.35617945e-02, -5.52860667e-02, 8.90308989e-02, -3.95984242e-02,
-9.60278839e-03, -1.10836591e-02, 6.15373627e-02, -8.70643248e-02,
-2.15993332e-02, -3.64849330e-02, -6.41388458e-02, -5.37994962e-02,
1.01884145e-01, 1.92638366e-02, -2.82527603e-02, -1.53857810e-01,
-1.07907271e-01, 6.36331411e-02, 9.89305735e-02, -1.33425549e-01,
2.25705919e-02, 1.88958506e-01, -1.01889068e-01, 7.94116198e-02,
-1.12291832e-01, -9.18573887e-02, 2.79692392e-01, -3.16952567e-01,
2.27300392e-03, 1.33228325e-01, 3.25689301e-02, -3.09505476e-02,
-5.68545611e-02, -2.02397441e-01, -6.35443881e-03, 6.98686485e-02,
8.01945161e-02, 9.92512023e-03, 1.27386606e-02, 2.51450191e-01,
-3.89108234e-02, -6.28715652e-02, 2.70404571e-02, -2.01667306e-02,
-3.59301358e-02, 1.05886343e-01, -1.00517967e-01, 5.56644863e-02,
-4.85317142e-02, 1.73674110e-03, 2.56392918e-02, -2.74969992e-02,
3.33287649e-02, -3.81662601e-02, 1.12145451e-02, -2.12696445e-01,
6.24958136e-03, 5.91309092e-04, -1.93620042e-01, 1.22308284e-01,
-8.50767120e-02, 2.07259697e-01, 2.06397007e-01, 1.49853779e-01,
-3.93533862e-02, -6.78712515e-03, -1.85078872e-02, 2.14694222e-03,
-1.44830363e-01, 8.07264254e-03, -3.51936040e-02, 1.82751221e-01,
9.46179819e-02, -3.26090474e-01, 6.50092847e-02, 3.40593128e-02,
4.38808117e-05, 1.04720971e-01, -1.48980575e-01, 8.79513418e-02,
4.67517524e-03, 1.49568000e-01])), (0.9828875619572957, array([ 0.01595755, -0.01989033, 0
.04337725, -0.04633459, 0.02648134,
-0.04697169, -0.0191114 , -0.015711 , 0.01704849, -0.01540329,
-0.06637341, 0.02710851, 0.04839863, -0.10418241, 0.16225215,
-0.01776017, -0.09851717, -0.18995589, 0.29041107, -0.06236597,
0.22727756, 0.00658669, -0.01409841, -0.14199334, -0.0466045 ,
0.13775554, -0.13467361, 0.13090866, -0.19777716, 0.10239725,
-0.00367906, -0.02341806, 0.30707283, -0.21449997, -0.08920904,
0.02326865, 0.06790668, 0.11017155, 0.06531824, 0.06059489,
0.01953891, 0.04596282, 0.09166934, -0.16995796, -0.01987271,
0.16604913, 0.03892869, 0.02906565, -0.04459147, -0.01199517,
0.03124182, -0.06696273, -0.02876974, 0.02837694, 0.02627329,
0.02119666, -0.11566776, 0.11415444, -0.01093605, -0.05076346,
0.02776756, -0.04904545, 0.0584223 , 0.11361133, 0.01445449,
-0.17196192, 0.24766112, -0.00709194, 0.07229663, 0.18348122,
0.05605101, -0.05606634, 0.01176013, -0.08243724, 0.17129387,
0.08871317, -0.12892466, 0.10690118, -0.15630681, 0.12557654,
0.12195274, 0.0135422 , 0.0194999 , -0.05461132, 0.01772277,
-0.16588352, -0.04162553, -0.18084166, -0.03946636, 0.00890658])), (0.9812541978117575, arra
y([-1.58853556e-03, -2.77244697e-04, 1.30495900e-02, -2.57070737e-02,
-2.38716681e-02, 1.76918901e-03, -1.73442425e-02, -5.66356465e-04,
-2.92004497e-02, 1.10917867e-01, -6.27488840e-02, -1.78243115e-02,
2.15038257e-02, 6.41694035e-02, -1.85885073e-02, -3.34623753e-02,
4.91507234e-03, 1.85513793e-03, 1.99405167e-01, -4.64871501e-01,
-1.81224305e-01, 3.78104386e-02, -1.92021780e-03, -1.00559003e-01,
5.91510263e-02, 1.22049864e-01, -1.62150573e-01, -3.66253270e-02,
3.51101906e-03, -5.65836709e-02, 1.34142652e-01, -1.98874144e-02,
9.32473244e-03, -1.65789870e-01, 3.90059070e-02, 8.92804163e-02,
-7.03717091e-02, 1.86738236e-01, 2.48520537e-02, -8.63464230e-04,
-3.74401876e-02, -7.57339151e-03, -2.88428766e-02, 1.22443436e-01,
2.08410599e-02, -3.60085882e-01, 1.84958173e-02, 4.52949299e-02,
8.45531569e-02, -6.45286314e-02, -9.57236707e-03, 9.57283766e-02,
4.89745997e-02, -3.24114570e-02, -6.36907782e-02, -1.65291836e-02,
-8.62304468e-03, -1.83196203e-03, -3.28560945e-02, 4.24690729e-02,
2.89588908e-02, -4.45793674e-02, -3.55242576e-02, -2.97437416e-03,
-3.18511701e-03, -7.89019068e-02, -1.62655658e-01, -9.17582531e-02,
-2.82301267e-01, -1.33934778e-01, 2.30107909e-01, -1.45985964e-01,
8.49375746e-03, -1.38999842e-01, -1.13966250e-01, 1.39946369e-01,
1.02468432e-01, 6.37333277e-02, 3.10513608e-02, 1.30701912e-02,
-5.85314632e-02, 9.30604487e-02, 2.89469803e-02, 2.54664458e-02,
1.47083627e-02, -1.94198651e-01, 2.29775840e-02, 4.11200067e-02,
-2.41273580e-02, 9.62378425e-02])), (0.9698086941074693, array([-0.01071378, 0.00486143, -0
.00196125, -0.01630911, -0.02259151,
0.01407496, 0.00847955, 0.00253303, -0.0309036 , 0.0140293 ,
0.03300172, 0.01524013, -0.05273072, -0.04567607, 0.23391132,
-0.07848098, -0.12769542, -0.21407682, 0.07116596, -0.03609987,
-0.10502449, 0.04005191, 0.05065142, 0.17250256, -0.05749271,
0.03372211, -0.13167271, 0.08150494, -0.04870796, 0.09539604,
0.05417698, 0.05531372, -0.17647055, 0.11591562, -0.06368605,
-0.09531928, 0.03512564, 0.09801424, -0.02272997, -0.03956482,
0.0484233 , 0.06916443, 0.01819234, -0.04563155, -0.08722159,
-0.07701787, -0.02717661, 0.26806801, 0.00531044, 0.04114546,
-0.09148475, -0.42609818, 0.08854235, -0.09061132, 0.12185133,
-0.04883606, 0.13125812, -0.09962217, 0.04209341, -0.00978047,
0.00611318, 0.01726034, -0.02390289, -0.15085736, 0.00502519,
0.05113877, -0.09804292, -0.19753755, 0.08767721, -0.01680878,
-0.09742722, 0.0012008 , 0.14237304, 0.16035753, -0.31354836,
0.09154538, -0.13866588, -0.00058064, -0.14211276, 0.03044497,
0.00991763, 0.05382957, -0.02379609, 0.01326394, 0.00805447,
0.04422099, 0.19718486, -0.13166665, -0.00919805, -0.05891948])), (0.9529273395960557, arra
y([ 1.01680239e-02, -1.88980519e-02, 2.99406811e-02, -3.35410551e-02,
2.60871214e-02, 1.87005328e-04, -4.50241781e-03, -1.58850270e-02,
1.78052925e-02, 3.46440672e-02, 3.00147401e-04, -8.67904552e-03,
2.08497368e-02, -4.51580137e-02, 4.06369697e-02, -2.56313711e-02,
-2.62486361e-02, 6.18857441e-02, -1.22065618e-01, 2.46349337e-02,
1.09344651e-01, 1.82070441e-01, 7.71152985e-03, 1.22815065e-02,
-1.33315954e-01, -6.85951998e-03, 1.24658808e-01, -6.65173562e-02,
1.06037135e-02, -6.65605576e-02, 8.48243247e-02, -1.20348568e-01,
6.09639452e-02, 8.70517651e-02, -3.45482804e-02, 2.23037440e-01,
-3.14108802e-02, 1.42127354e-02, -1.78074199e-02, 8.72644635e-03,
-6.32716232e-02, -3.20996837e-02, -1.54585621e-02, 6.23968567e-02,
6.59357152e-02, -1.16186844e-01, -1.81810459e-02, 1.96731998e-01,
4.51921052e-02, -1.51162617e-01, -2.85625471e-02, -3.77574702e-01,
-5.11642449e-02, 1.12291336e-01, 2.85677301e-02, 1.15832788e-01,
-2.29778840e-01, 7.90781789e-02, 1.51605150e-02, -2.54163196e-02,
-5.69090347e-02, 1.06857705e-01, -4.99492608e-02, 1.83663474e-01,
-1.06629346e-03, -6.12071463e-02, -7.51621854e-03, 1.39727334e-01,
-3.61690448e-01, 1.05352784e-01, -1.24486885e-01, -8.92559677e-02,
6.51931348e-02, -1.23212333e-01, 1.09853846e-01, -1.25294871e-01,
1.05331636e-02, -1.14362626e-01, -2.24907114e-03, -1.45530433e-01,
2.37937775e-01, -2.62807130e-02, 8.51306367e-02, 2.80731422e-02,
1.80940813e-02, 5.39997622e-02, 2.34065029e-01, 1.21637193e-01,
-4.51543427e-02, -1.09308849e-01])), (0.949801385112714, array([ 0.026247 , -0.00047457, 0.
01750291, 0.01778684, 0.02567284,
-0.04074609, 0.01524609, 0.00458663, 0.0247263 , -0.08198198,
0.03632847, -0.04948209, -0.0359562 , 0.14245157, 0.05160305,
-0.03531067, -0.02601146, 0.05392896, -0.08556124, -0.04018908,
-0.02069455, -0.01690763, -0.01912764, -0.04071749, -0.08931593,
0.0863003 , 0.00435261, 0.01343055, 0.04213723, -0.26182683,
0.12130055, 0.03108943, 0.02772986, 0.27414521, -0.17068434,
-0.04992821, 0.05622691, -0.08057179, -0.06151518, -0.10373174,
0.06792272, 0.13515434, 0.07802657, -0.37489407, 0.02056473,
-0.09680857, 0.00834453, 0.1153062 , 0.14089884, -0.18118251,
0.01663615, 0.06270688, -0.06002869, 0.12750963, -0.12121124,
-0.05682705, 0.13333067, -0.04704703, -0.01074914, 0.01353065,
0.04337889, -0.06263388, 0.00747122, -0.14375484, 0.00436615,
-0.2299231 , 0.04541104, -0.02688675, 0.13416425, 0.05508127,
0.37018287, 0.0655413 , -0.00761955, 0.08851763, 0.1076094 ,
-0.03038559, -0.00648776, -0.07013712, -0.03279395, -0.17809777,
0.10159005, -0.10103883, -0.04794133, 0.20119432, 0.03467796,
-0.1360069 , 0.00991476, 0.06485029, 0.01914473, -0.11337748])), (0.9442733887767248, arra
y([-0.00331595, 0.01580543, -0.01863078, 0.02641719, -0.01756419,
0.00863267, -0.05255544, 0.01428063, -0.03098377, 0.04713682,
-0.01223649, -0.0234114 , 0.01673431, 0.04137828, -0.04917881,
0.02671128, 0.03935362, -0.02912825, -0.15252448, -0.16400054,
0.0269342 , 0.01335285, 0.0356738 , -0.06873641, 0.03472405,
-0.08950344, -0.01980065, 0.07594835, 0.0620901 , -0.34044703,
0.22865382, 0.07859782, -0.07444977, -0.1105835 , 0.18634094,
-0.09233799, 0.00439686, 0.12022529, 0.00175177, -0.03072499,
-0.07645899, 0.05477025, 0.03649804, -0.21479675, -0.08708492,
0.15125139, 0.04209627, 0.00873448, -0.30818838, 0.252356 ,
0.07504972, -0.12498248, 0.011856 , -0.03461129, 0.07415387,
0.08630299, -0.09882293, -0.0011802 , 0.02215717, -0.01518408,
0.03271371, -0.03960954, -0.00258443, 0.0248585 , 0.00954393,
0.06160335, 0.02599234, -0.10098387, -0.06614956, -0.08452039,
0.09361263, -0.07846221, 0.06104535, 0.08054712, -0.04221732,
-0.12005225, 0.01158872, -0.22989874, 0.04004718, 0.02458765,
0.2916282 , 0.17946123, 0.03770972, -0.12429102, -0.01816835,
0.07699743, -0.30798999, -0.01072195, 0.02391488, -0.03918125])), (0.9416943150113966, arra
y([ 3.61742058e-03, -1.94654621e-02, -1.86145277e-02, 3.91588971e-02,
1.38525227e-02, 4.44739728e-03, 9.45754087e-04, -1.76770213e-02,
2.02461721e-02, -1.61417044e-01, 6.29136317e-02, 5.17455870e-02,
-2.53843405e-02, -1.11295932e-01, 1.30763281e-01, -3.64336748e-02,
-8.15317828e-02, -8.43985908e-02, -1.27475240e-01, -2.00239008e-01,
1.98557819e-01, -7.27339155e-02, -1.16408967e-02, 1.64224903e-01,
-1.59373307e-02, -1.04195288e-01, -2.05862256e-02, 1.60821247e-01,
3.37202388e-03, 1.16542567e-01, 1.55131159e-01, -1.04539639e-01,
-2.05624726e-01, 3.19145268e-02, -2.09991272e-01, 1.19720329e-01,
-7.45497298e-02, 1.99388651e-01, -4.70761420e-02, 1.88768117e-02,
9.38880282e-02, -9.86437441e-02, 6.49199923e-02, 1.32814904e-01,
-9.76731607e-02, -1.04217562e-01, 2.45565669e-02, 3.85863854e-03,
-2.37514855e-02, 2.17459287e-02, 1.12076101e-02, 2.49809124e-01,
-3.18286457e-02, 2.87709220e-02, -6.44097575e-02, 1.09883045e-01,
-2.07807208e-01, 3.93896455e-02, 5.11527104e-02, -2.75265015e-02,
-2.81618559e-02, 4.95628121e-02, -1.27479090e-01, 1.18502408e-01,
-2.04002732e-02, -4.26501972e-02, 1.92286785e-01, -7.34691849e-02,
1.73694750e-01, 1.13387762e-04, 1.12026592e-01, 6.61066412e-02,
-2.07059118e-01, -6.88492163e-02, -1.43136942e-01, -8.95962379e-02,
-2.53968635e-02, -1.38401224e-01, -2.66401891e-02, -3.33911002e-02,
-5.20087744e-02, 1.54766613e-02, 1.63449322e-01, 1.96457648e-01,
4.19831632e-03, 2.46611286e-01, -8.13378520e-03, 5.43401026e-04,
7.06501601e-02, 8.24856486e-02])), (0.9337938560734337, array([-2.78666149e-04, 2.86153266
e-02, -4.40329619e-02, 7.63570923e-02,
3.35212441e-02, 1.20383512e-02, -1.94893707e-02, 2.69527911e-02,
-5.20199285e-03, 6.19190179e-02, 1.12686446e-02, 2.49743679e-03,
-7.47851523e-04, -6.50827399e-02, 6.04982626e-02, 3.50763726e-02,
-2.70478021e-02, -9.41447301e-02, -4.79984129e-02, 7.06010461e-02,
-1.57413633e-01, 1.78383556e-02, 4.43158321e-02, -1.81734767e-01,
3.99219004e-03, -1.22994888e-01, 4.80848553e-02, -1.08445254e-01,
1.47732168e-01, -1.72394451e-02, 1.07269968e-01, -2.70397426e-02,
-2.00465373e-02, -2.29905336e-01, -3.09686448e-01, 1.01529457e-01,
-2.86786596e-02, 4.32508641e-02, 2.90138653e-02, 1.79205933e-01,
2.04568838e-01, -5.35395804e-02, -8.76425041e-02, 2.18448161e-01,
6.14465619e-02, -8.95877060e-02, 5.63716438e-02, -1.66880716e-01,
3.88398200e-02, 6.64525839e-02, 1.02053052e-01, -2.19362265e-01,
-3.47961229e-02, 7.50291476e-02, 4.47602226e-03, -9.31343903e-02,
2.14805378e-01, -4.51143042e-02, -2.67282618e-02, 6.72241207e-03,
-1.36953903e-02, 6.76453603e-02, -3.86191907e-02, -9.30012005e-03,
1.40203816e-02, -1.22670598e-01, -1.20398887e-01, 1.60151086e-02,
1.16043517e-01, 2.22964993e-01, 2.96600305e-03, 1.32948538e-01,
3.60556747e-03, -1.11277075e-01, -5.62721267e-02, -6.35223718e-02,
1.17300309e-01, 5.81474467e-02, -8.29515328e-02, 5.69520136e-02,
1.58439491e-01, -1.20199487e-02, -1.07114984e-01, 1.06543666e-01,
2.92226824e-02, -8.56989836e-03, -3.23945528e-01, -1.02605111e-02,
6.87594084e-02, -2.13386880e-01])), (0.931041145818125, array([-0.00585469, 0.01812109, 0.
00792955, -0.02462631, -0.01451196,
0.02830446, -0.00879729, 0.01597911, -0.03214228, -0.02254815,
0.01669926, -0.02767143, -0.00122349, 0.04274492, 0.0346502 ,
-0.00490572, -0.03843456, -0.11066957, -0.02813814, -0.03271021,
0.14880528, 0.06438382, 0.01505054, -0.04131551, -0.01495056,
-0.0418485 , 0.04085931, 0.0137256 , -0.01961479, -0.05199247,
0.02304338, 0.34772615, -0.13803342, -0.13670866, -0.21230403,
-0.09764183, 0.04755794, -0.22810261, 0.04591375, 0.02773834,
0.20986703, -0.00934168, -0.00424333, -0.04100463, -0.00221949,
-0.05967108, 0.01744279, -0.23425454, 0.25326177, 0.02664948,
-0.0048028 , 0.0316391 , 0.01544314, -0.01943449, -0.00194862,
0.17549254, -0.25550851, 0.00912207, 0.06729272, -0.02334901,
-0.06605862, 0.08964901, 0.06782998, -0.46031615, -0.01514615,
0.03390418, 0.04328068, -0.02500638, -0.26223213, -0.02173067,
-0.15071761, -0.11753421, 0.00625882, 0.06036338, 0.0669714 ,
-0.01291813, -0.05330965, -0.02390962, -0.0820911 , -0.04149348,
-0.0594575 , 0.04543746, 0.15142232, -0.00184985, 0.00565584,
-0.07747515, -0.04673792, -0.03077443, -0.02645897, 0.02890448])), (0.9160124331703235, arra
y([-0.04512716, -0.0385672 , -0.0206561 , -0.04705618, -0.04212048,
-0.01466519, -0.01829115, -0.04505448, 0.01255338, -0.04392882,
0.12015178, -0.03515198, -0.08976007, 0.08473334, 0.0292259 ,
-0.0184341 , -0.03226551, 0.0674858 , -0.02706551, -0.11339415,
-0.01596481, 0.04625842, -0.11063261, 0.06854895, 0.03595167,
0.10528534, 0.12790198, -0.10999625, -0.00719051, 0.08049302,
0.02848249, -0.19192094, -0.00162064, -0.01302792, -0.11972141,
-0.12932582, 0.0461804 , 0.16420206, 0.02892848, 0.14381952,
0.13948706, -0.05823622, 0.10181279, -0.3449726 , -0.04151092,
-0.0870399 , 0.01424011, -0.26918623, 0.06644839, 0.05707586,
-0.04335126, -0.05749606, -0.02655391, -0.01128123, 0.08300746,
0.01291072, 0.0271534 , -0.04632249, -0.03392848, 0.03656581,
0.05312645, 0.03488761, -0.20652097, 0.20327968, 0.03210534,
0.03886553, -0.02695497, -0.04171325, -0.03426775, -0.00089415,
-0.21073789, -0.04495686, 0.19303697, 0.16995306, 0.06502183,
-0.02814482, 0.10027648, -0.1487149 , -0.01703184, 0.27576032,
-0.05856994, -0.2091439 , -0.05783323, 0.0506891 , -0.0645663 ,
-0.09810442, -0.01194532, 0.20861665, -0.05955877, 0.23499399])), (0.9017616031502895, arra
y([-0.01204613, 0.03157169, 0.00173901, -0.02522276, -0.02255628,
0.08860315, 0.00054061, 0.02748577, -0.03401337, -0.15531234,
0.11397098, -0.00209653, -0.10203533, 0.1247588 , -0.06968457,
0.01105041, 0.07674368, -0.11018815, -0.01025301, 0.03199334,
0.09125823, 0.05356348, 0.03618184, -0.15826721, -0.10560225,
0.01025809, 0.03069366, 0.02691185, 0.08197771, 0.00534303,
0.056395 , -0.36746288, 0.0181807 , -0.02410655, 0.01469366,
-0.08056341, 0.03822262, -0.25860032, -0.0505407 , -0.04810135,
-0.00116654, 0.05863885, -0.08239288, 0.0771162 , 0.03592515,
-0.09057108, 0.00783212, 0.02333246, 0.09206801, 0.16894537,
-0.00259428, 0.00753512, 0.06255998, -0.03827773, -0.0486697 ,
0.16976284, -0.03187038, -0.14708599, 0.06945362, 0.03166831,
0.02937489, -0.24219907, 0.36128429, 0.25402849, -0.01963118,
0.04604779, 0.03255906, -0.05974847, -0.16830735, -0.01446133,
0.04296761, -0.069433 , -0.01778528, 0.14103144, -0.13537139,
-0.02075788, -0.02933113, -0.03012077, -0.1153027 , 0.12267673,
-0.12071696, -0.09378854, -0.01146882, 0.1814344 , 0.06288938,
-0.0125343 , -0.05389026, -0.21635202, -0.03533361, -0.13628257])), (0.890644523647056, array
([ 9.98843162e-03, 1.83753484e-02, -2.15618355e-02, 5.60384562e-02,
1.90754629e-03, -4.93206812e-02, 3.44008833e-02, 1.92579847e-02,
5.36319810e-03, 1.61861064e-01, -1.27813948e-01, -7.75819723e-02,
2.87565785e-02, 2.07566454e-01, 2.24892437e-02, 1.65387794e-02,
-6.30551458e-02, 5.50758222e-02, 8.31593820e-02, 1.60433647e-01,
-3.29652843e-02, -2.71413980e-03, -5.05083252e-03, -1.54274608e-02,
-2.22873819e-02, 3.65170615e-02, 2.51583619e-02, 5.74703905e-02,
1.47873828e-01, -1.60874936e-01, -3.53703355e-01, 4.66269914e-02,
-5.77386648e-02, 5.34731395e-02, -6.99392009e-02, -5.59667683e-03,
1.50569071e-02, 1.55346729e-01, 5.42385274e-02, 2.98574646e-01,
6.16475449e-02, 1.38453606e-01, -1.04225708e-01, 7.80865747e-02,
2.68362391e-02, 1.41388217e-01, -6.26077147e-05, -8.67876826e-02,
-1.64909818e-01, -1.17812824e-01, -3.00302056e-02, -2.76871244e-02,
1.37548277e-01, -3.01084262e-02, -1.82363579e-01, -4.33397819e-02,
-1.35010942e-01, 7.45199010e-02, 1.43697804e-04, -3.62370319e-03,
8.65683739e-02, -1.18813821e-01, -1.84208722e-01, -1.43973498e-03,
-1.65915130e-02, 8.35012256e-02, 4.85575694e-02, -1.41136110e-01,
-1.23841638e-01, -1.27825175e-01, 1.58779130e-01, -7.77355540e-02,
-1.07927145e-01, -9.25315807e-03, -4.69461310e-02, -8.77350934e-02,
-1.55584716e-02, -8.70564735e-02, -2.14612552e-01, 2.16155162e-01,
1.32878204e-02, -1.04393498e-01, 7.12127204e-02, 1.61393672e-01,
2.70368961e-02, 4.29979717e-02, 1.53933935e-01, -5.87435978e-02,
5.96815311e-02, -1.17995448e-01])), (0.8885676555317741, array([ 0.00274796, -0.02361445, 0
.05116237, -0.0842552 , -0.07213576,
0.00093838, -0.01098897, -0.02175948, -0.00658835, -0.15644416,
0.21212145, -0.08688107, -0.12772555, 0.19739457, 0.00065999,
-0.05611503, -0.09542822, -0.0790884 , 0.07532689, 0.04622811,
0.0199169 , 0.07703434, -0.12052577, 0.0575971 , 0.04508036,
0.1558777 , -0.00821302, 0.18890517, -0.03258032, -0.193194 ,
-0.25381167, 0.10040089, 0.02806504, 0.02635801, -0.03138048,
0.06964833, 0.02135525, 0.03181669, 0.13739875, -0.08807277,
0.02874702, -0.04365259, 0.11838292, 0.27886171, -0.20779585,
0.08171919, -0.00790738, -0.00744908, -0.0837399 , 0.20445817,
-0.02006053, -0.01835591, -0.07266937, -0.02869563, 0.17928846,
0.05813758, 0.0441099 , -0.07759448, -0.03301373, 0.08159966,
0.07785581, -0.16431664, 0.01670631, -0.0194537 , -0.01016957,
-0.06294145, -0.07168093, -0.00171829, 0.01843777, 0.00627979,
-0.04178049, -0.05313962, 0.17234462, -0.27315857, 0.04703748,
0.05148856, 0.05858007, 0.01267136, 0.0298947 , -0.26037273,
0.07794181, -0.12729389, -0.03197504, 0.21431478, 0.01807618,
0.02617634, -0.08542077, 0.11938408, -0.06302607, 0.11919231])), (0.8791588942184467, arra
y([-0.03117837, -0.01399989, -0.03340375, 0.00088924, 0.02028349,
-0.01951033, -0.03937196, -0.01919307, -0.02840681, 0.14604114,
-0.04894041, 0.00851742, 0.04925042, -0.10984964, -0.00350001,
0.05979627, 0.05638347, 0.05399111, -0.06529045, -0.07689977,
-0.1253334 , 0.04593203, -0.18111414, -0.11470011, 0.18498936,
0.15281119, 0.16061652, 0.03052356, -0.31618665, -0.00304363,
0.20344699, 0.11546958, -0.02264744, 0.10675221, 0.00771112,
0.05565993, -0.06029402, -0.07399017, -0.06291728, 0.01991941,
-0.03703263, -0.13888537, 0.06051865, -0.07519849, -0.16860843,
0.01692649, -0.02738467, -0.15388506, -0.15641782, -0.12933231,
-0.03753313, -0.08508395, -0.00065314, 0.04269448, -0.0422484 ,
-0.11940561, 0.03233509, 0.19252331, -0.12759107, 0.01688613,
0.05361439, -0.20145027, 0.26238912, -0.0368179 , -0.00501027,
-0.03522915, -0.08001625, -0.02695852, -0.07996612, 0.03635574,
-0.12670346, -0.03504763, -0.12295178, 0.02712415, -0.00411092,
-0.0300398 , -0.05640326, 0.10016602, -0.13261971, 0.04293086,
-0.05410893, 0.01850406, -0.05688299, 0.32093398, -0.04501955,
0.29384958, 0.01334824, -0.05077861, 0.02474069, 0.03883143])), (0.8629753367296573, arra
y([ 0.01166394, -0.04751393, 0.02226712, -0.01738976, -0.0372354 ,
0.00137357, -0.03142243, -0.04260486, 0.01422279, -0.1007145 ,
0.04963574, 0.04530055, -0.02641501, -0.04232224, -0.11693779,
-0.02743814, 0.15277544, 0.18836382, -0.03718525, -0.06302018,
0.02320708, -0.00930566, 0.04126506, 0.01878739, -0.04468988,
-0.06594267, -0.05484857, 0.04120591, 0.02869672, -0.01306566,
0.05889018, 0.05443091, -0.03439045, 0.02260685, 0.08644482,
-0.02284593, -0.18626504, -0.12175179, -0.12165186, 0.50096173,
-0.06641931, -0.01516339, 0.10534416, 0.01814603, -0.12160945,
-0.03934602, -0.0093016 , 0.14292649, 0.05077062, 0.00787034,
-0.01718144, 0.03468688, -0.03991159, 0.00653545, 0.04821644,
0.00913398, 0.06714187, -0.05698063, 0.0077161 , 0.0090317 ,
-0.03761309, 0.02592563, 0.14953846, -0.03722617, 0.00620756,
0.11938813, 0.00961706, 0.05972683, 0.13493806, -0.01354354,
0.01276413, 0.04573275, 0.08990901, -0.19106456, 0.04890457,
0.03858561, 0.05644118, -0.09668513, -0.548203 , -0.01880534,
0.07071005, 0.06407696, -0.00113584, -0.07058048, -0.02843993,
-0.13582121, 0.05628283, 0.03813165, -0.01339753, 0.13068332])), (0.8317611004894274, arra
y([ 0.00545156, -0.07008116, -0.03053225, 0.06334239, -0.00939906,
-0.05711216, 0.02420308, -0.06509471, 0.06767608, 0.16041921,
-0.06848737, 0.01600285, -0.01452638, 0.05829487, -0.00285141,
-0.0621527 , -0.03677003, -0.15963981, -0.03683498, -0.07947628,
-0.0100988 , -0.03931464, 0.10738339, -0.05091965, -0.08705204,
-0.1497019 , -0.1584345 , -0.05739527, 0.24238019, -0.00924365,
-0.01982462, 0.02188455, 0.02368897, -0.07832417, -0.02152104,
-0.02948498, 0.08748718, -0.13049489, 0.02790162, -0.20781542,
0.08399357, -0.12619883, 0.08092237, 0.03826677, -0.01464393,
0.00819209, -0.09482382, -0.14170098, -0.141553 , -0.26439855,
-0.08143666, -0.04212307, -0.09987588, 0.04378873, 0.11845842,
-0.10995148, -0.0611874 , -0.03675344, 0.10142608, -0.04646941,
-0.07001699, -0.03058676, 0.28216562, 0.02546009, -0.02403687,
0.12041728, -0.01333376, 0.00929337, 0.14548328, -0.04950643,
0.01893052, 0.06961991, 0.08997331, -0.16177626, 0.05114716,
-0.13466767, 0.02926301, -0.18830137, 0.10727718, 0.17320939,
0.11706974, 0.06495608, -0.08982689, 0.17787126, -0.04179447,
-0.10593689, 0.23824305, -0.06316331, 0.06327437, 0.28536944])), (0.8087658281973401, arra
y([-1.38037599e-02, -2.62538836e-02, -1.50064891e-02, 7.73353884e-04,
5.58215569e-03, 3.85305662e-02, 4.85734662e-02, -2.74253271e-02,
8.12131793e-03, -3.85170607e-02, -7.26463929e-03, 4.13665719e-02,
8.12804341e-03, -7.43367589e-02, 2.48403179e-02, 1.01314548e-01,
-5.67692097e-01, 3.38188062e-01, -1.75549813e-02, -3.81677187e-03,
1.21591794e-02, -1.34692259e-02, 8.01078060e-02, -2.71126200e-02,
-5.01367953e-02, -5.56049468e-02, -6.84804047e-03, -8.00401890e-02,
-1.29583624e-02, 7.48112337e-02, 5.96831837e-02, 3.49400558e-02,
3.59546531e-02, 1.17052541e-01, -3.22895551e-02, -1.20091688e-02,
-3.68001001e-01, -3.27377391e-02, 5.25519139e-01, -5.54732492e-02,
-9.28767847e-02, 2.24409625e-02, 1.66298363e-02, -4.90522909e-02,
-4.74724762e-02, -2.35151595e-02, 2.44117022e-02, -3.14640906e-02,
-2.60737436e-02, 3.45476294e-02, 1.28535207e-02, -1.72745096e-02,
1.94282952e-02, -3.27232993e-03, -2.34476222e-02, 5.73397163e-02,
9.32157164e-03, -7.73506024e-02, 2.56921031e-02, 2.98558779e-02,
-4.61180858e-03, -6.72791909e-02, 1.15526687e-01, -1.48512807e-02,
-1.64962985e-02, 6.39246379e-02, 1.09245648e-02, -4.17016748e-03,
-1.24036206e-02, -1.39811565e-02, 1.60576603e-02, 1.25968178e-02,
-3.82114539e-03, -2.84256479e-02, -1.49265660e-02, 2.17853452e-04,
-3.03888764e-04, -3.57882575e-02, 1.10530819e-02, 7.29136841e-02,
-2.73872115e-02, 2.33068073e-02, -2.51894381e-02, 4.08449419e-03,
1.14798768e-02, -6.14097124e-02, -4.28706108e-02, -4.81625998e-02,
1.97217871e-02, -5.97897486e-02])), (0.7892722212229611, array([ 0.01759243, 0.0244195 , -0
.01348355, 0.05594109, -0.02822828,
-0.03114862, -0.01025114, 0.02642081, -0.00898686, -0.42732225,
0.05497189, 0.09793574, -0.00615247, -0.19597648, 0.10631042,
0.27251081, 0.14753006, -0.00906053, -0.0057597 , -0.00494552,
0.05043376, 0.12632815, 0.11116636, 0.13092532, -0.06112112,
-0.09693479, -0.05446578, -0.14006325, 0.00753102, 0.02019177,
0.00181738, 0.21916386, 0.17045392, -0.02379436, -0.12168054,
0.00553675, -0.00742759, 0.07757025, -0.19196648, -0.05025151,
-0.231689 , 0.14095046, -0.02819515, -0.0134607 , -0.0707351 ,
0.00763897, -0.02964745, -0.16497223, -0.19827882, -0.18079722,
-0.0066776 , -0.05190613, 0.05535967, -0.00839952, -0.06733092,
0.25211957, 0.17666072, -0.13722392, -0.04466507, 0.09203293,
0.05474654, -0.09144626, -0.07607113, -0.06254102, -0.01293667,
0.0762188 , -0.07111797, -0.01021933, -0.10496458, 0.01394139,
0.05186032, -0.09578208, 0.00189994, -0.11034241, -0.0020095 ,
0.00307013, 0.04406047, 0.02847866, 0.08419341, 0.14161595,
-0.00614269, -0.04934289, -0.02857221, 0.07786195, -0.02494885,
-0.03030585, 0.04267903, -0.03400083, 0.06690435, 0.00336231])), (0.7619517528659735, arra
y([ 0.02689377, -0.05957791, -0.02589526, 0.09474024, -0.02102001,
0.03012182, -0.04118114, -0.05106918, 0.04332153, -0.29746521,
0.04675402, 0.06112399, -0.0113335 , 0.02307008, -0.06525496,
-0.27287067, -0.09734835, -0.0244697 , 0.00126079, -0.00036962,
-0.00616904, 0.22725556, -0.12359879, 0.05529858, 0.0652219 ,
0.06904022, 0.031982 , 0.08157967, 0.01682568, -0.01334902,
0.02608926, -0.14114573, -0.20349955, -0.08650184, 0.10832692,
0.00046514, 0.01815474, -0.09422992, 0.11832343, -0.03575713,
0.28709648, 0.24770688, -0.08384164, -0.01241468, -0.05904321,
0.01440831, -0.05734269, -0.09360715, -0.15191822, -0.17191821,
0.00443259, 0.02042583, 0.10592989, -0.01977457, -0.16017916,
0.00170861, 0.1962971 , 0.20708513, -0.15433359, -0.09634164,
-0.04421257, 0.22916174, 0.1700131 , 0.03785226, 0.1072855 ,
0.05827851, -0.01864558, -0.00840599, -0.03722675, 0.05147844,
0.07801918, -0.0726303 , 0.02782703, -0.07946549, 0.02506575,
0.09877713, 0.03717162, 0.07143827, 0.02336953, 0.02438579,
0.10413921, 0.00061904, 0.07713089, -0.18325313, -0.12507461,
0.06179492, 0.05677923, 0.00796665, 0.12786405, -0.01064299])), (0.754916612665571, array
([ 3.44821659e-02, 1.26829308e-01, 5.43327399e-03, 5.40251208e-02,
-6.31772239e-03, -7.31096931e-02, 1.00194762e-01, 1.26311627e-01,
-2.83309870e-02, -1.38370377e-01, -2.42896274e-02, 3.44449527e-03,
3.27200264e-02, 3.32972459e-02, 3.54149565e-02, -2.48360446e-01,
-1.19435469e-02, 1.12217435e-01, 2.54212469e-02, -3.23663707e-02,
3.06979595e-02, 1.09563881e-02, 6.97932703e-02, 4.88264866e-02,
2.38292722e-02, -1.34668990e-01, -1.80846307e-01, 8.10677045e-03,
1.07050838e-01, -2.81961937e-02, -4.41994556e-02, -4.31431671e-03,
1.40805904e-01, 9.43767166e-02, 9.03656398e-02, 3.75429265e-02,
-1.45878411e-01, 4.79768652e-02, 6.34142899e-03, 8.05501059e-02,
2.01874747e-01, -8.59169435e-02, -1.31662987e-01, -5.88583553e-02,
3.81207580e-01, 6.76882097e-02, -6.34175310e-02, -1.04675300e-01,
-1.41017598e-01, -5.40828206e-02, -2.52485663e-02, -4.05080992e-02,
-1.23833066e-01, 5.85499444e-02, 1.34698798e-01, 1.27605442e-01,
5.56295706e-02, -4.32121795e-02, 2.69422352e-02, -1.30918221e-02,
3.10371534e-02, -7.44338840e-02, -1.96521751e-02, 1.76668316e-02,
-1.26032960e-02, -2.26440295e-01, -1.03492450e-01, -5.83729646e-05,
-8.81297653e-02, 1.30297831e-02, -5.17166653e-02, -1.00603415e-01,
-4.24944869e-02, 2.26723469e-01, -2.88255891e-02, -3.69593029e-02,
-4.74195219e-02, 1.69271688e-01, -1.43065221e-01, -1.81965117e-01,
-3.44930629e-02, 1.20781113e-01, -6.03238202e-02, 4.87405557e-02,
6.82943198e-02, 1.60096454e-01, -6.43351316e-02, -7.49687169e-02,
-4.76429693e-03, 3.09960604e-01])), (0.7309008552453276, array([-3.12576183e-02, -3.00353136
e-02, -1.15434524e-02, -3.74192404e-02,
1.21194607e-02, 3.54390679e-02, -5.92220714e-02, -3.43419973e-02,
-1.62954981e-02, 3.61814212e-02, -4.15691648e-02, 1.62172077e-02,
3.11683856e-02, -9.99366833e-02, 2.28516084e-01, -4.59537346e-01,
1.41672965e-01, 1.77388278e-01, -2.70709877e-03, 3.68925400e-02,
4.19564801e-03, -1.19786356e-01, 9.82757133e-02, -2.37302746e-02,
-1.15006306e-02, -1.82167491e-02, 3.53555482e-02, -8.99522902e-02,
-1.05099269e-01, 1.40408070e-02, -3.63399133e-02, 9.96778586e-02,
1.44669392e-01, 1.55519585e-02, 2.29991872e-01, 1.48939095e-01,
-2.33116872e-01, 2.86760961e-02, -1.95486408e-01, -1.15693709e-01,
4.00805322e-01, 2.27205418e-02, 5.11989738e-02, -1.45252118e-02,
-1.52042970e-01, -1.81434799e-02, 8.15185742e-02, -4.65860405e-02,
4.41958474e-02, 8.88374795e-02, 1.01053499e-02, -2.93229234e-02,
3.75101120e-02, -2.43764479e-02, -1.78384967e-02, 1.00186522e-01,
-4.09938703e-02, -9.78378753e-02, 1.45384629e-02, 8.39263450e-02,
5.42777656e-02, -1.67621893e-01, -3.27909821e-02, -2.52681884e-02,
-4.20184070e-02, 1.35099267e-01, 3.70030499e-02, -2.57640682e-04,
2.09275366e-02, -1.31613049e-02, 1.01878028e-02, 3.99945612e-02,
-6.81091768e-03, -9.42781981e-02, 9.02301120e-03, -2.31518100e-02,
1.72141119e-02, -5.61493349e-02, 1.20112902e-01, 1.12640843e-01,
-1.63524514e-02, -3.38260736e-02, -3.09316007e-03, 4.91087253e-02,
-2.52284664e-02, -9.83192805e-02, 2.23107503e-02, 1.35065309e-02,
-2.10648854e-02, -2.23614183e-01])), (0.6902086442370304, array([-2.48394712e-02, -6.54456851
e-02, -6.04496784e-02, 5.97797123e-02,
1.04453000e-02, 4.43533564e-02, -2.34111646e-02, -6.65301057e-02,
1.68492557e-02, 1.19272280e-01, 1.81350517e-03, 6.97093156e-03,
2.12508055e-02, -1.60698027e-02, -1.93840577e-01, -1.31447806e-02,
-7.15273312e-02, -1.20965888e-01, 6.88178481e-03, -6.60825012e-04,
-1.85468037e-02, -5.97973938e-01, 9.73324139e-02, -1.66199196e-02,
8.01983100e-02, 2.45483333e-02, 3.08441845e-02, -4.48242011e-02,
-9.00496982e-02, -5.94900724e-02, 6.40474671e-02, -2.86411959e-02,
-3.14888366e-02, 6.02455867e-02, 4.18503179e-02, 1.80504114e-02,
2.18824714e-01, 8.52678194e-03, 1.26993359e-01, 8.53242763e-02,
2.12018084e-02, 7.16294977e-02, -8.95454956e-02, 6.51731177e-03,
-1.56303498e-02, -1.06026249e-04, -3.25688168e-02, 6.19167864e-04,
4.47910476e-02, -3.85713194e-02, 3.36407794e-02, -7.61597641e-02,
7.70892141e-03, -9.53573105e-03, 2.38271484e-02, 5.23502403e-01,
9.54240031e-02, -6.46799446e-02, -7.69163962e-02, 3.55080562e-02,
4.00248106e-02, 2.41773244e-02, -2.64112052e-02, 1.34953270e-02,
4.29121784e-02, 3.65390693e-03, -1.79335175e-02, 3.32345142e-02,
-2.99637256e-03, 9.53443834e-02, 4.16274493e-02, -1.70963714e-02,
2.96411797e-02, -3.01885058e-02, 4.34496481e-02, 6.73465056e-02,
1.56924088e-02, 1.05070539e-01, -2.55381324e-02, 2.46675269e-03,
5.41649939e-02, 3.12106893e-02, 9.12954266e-03, 8.48837089e-02,
-1.92830666e-01, 7.03777360e-02, 1.22657643e-01, 6.04117042e-02,
6.30270009e-02, 3.83421834e-02])), (0.6604534042250566, array([ 0.03493521, 0.00433818, 0
.00421347, 0.05698903, -0.0472373 ,
0.08277977, -0.04077489, 0.01079527, 0.04588983, 0.22098083,
0.20979844, -0.09531005, -0.20616871, -0.00620906, 0.38560093,
0.15308173, 0.12878625, 0.21170295, 0.01391932, 0.00681338,
-0.02000289, -0.21287615, -0.12193815, -0.05404867, 0.0496134 ,
0.09394178, 0.02479103, 0.11795467, 0.14014343, -0.02948085,
-0.12000781, -0.05256568, -0.06624699, -0.14857877, -0.17420537,
-0.08197003, -0.38631173, -0.04199609, -0.18837499, -0.1385434 ,
-0.10178701, 0.17064317, -0.10241735, -0.01967789, 0.11111213,
0.02463147, -0.00974989, 0.00286693, -0.04670847, 0.01330558,
-0.02012631, -0.01593084, 0.01209422, -0.02779087, 0.02695335,
0.08297272, -0.04745558, 0.06196023, -0.0145512 , -0.06916519,
-0.0275976 , 0.11834801, 0.15227742, 0.01652194, 0.06305596,
-0.02901072, 0.01837811, -0.01971193, 0.00842294, 0.04959294,
0.02020987, -0.00417164, 0.06980409, 0.03091795, 0.03161785,
0.09845365, 0.02155261, 0.07654164, 0.03378785, -0.02283066,
0.0701262 , -0.00271604, 0.07094519, -0.02398482, -0.15269574,
0.04210218, 0.04026346, 0.01763493, 0.02093315, 0.11314927])), (0.6482152267621847, arra
y([ 0.04976821, 0.10658763, 0.0956541 , -0.07532322, -0.05847024,
-0.05523187, 0.07456164, 0.11014023, -0.05021298, 0.06017025,
-0.0058527 , 0.02331819, -0.03819477, 0.01330611, -0.00425335,
-0.02067504, 0.00599168, 0.00552276, -0.01696796, -0.00386224,
-0.02471487, -0.21103374, -0.09226116, 0.019868 , -0.01843372,
0.10301034, 0.0947492 , 0.05973184, 0.00468052, 0.06264263,
-0.03086377, -0.07067192, -0.1095315 , -0.1011189 , -0.03080424,
-0.03838276, -0.05651225, -0.03669107, -0.0314335 , -0.02534076,
0.01485672, -0.23178877, 0.20503302, -0.03629108, -0.11232838,
-0.00681464, -0.11182121, -0.16201526, -0.22460152, -0.17242559,
0.05171118, -0.07330419, -0.00841712, 0.04114159, -0.03167827,
0.18503452, 0.05815386, -0.12078235, -0.07831803, 0.08114397,
0.00129601, 0.12729861, 0.06880663, -0.01546559, 0.07662265,
-0.15732585, -0.01931486, -0.07002065, -0.04065681, -0.0829991 ,
0.05445492, -0.00582084, -0.05749134, -0.08301152, -0.05502221,
-0.08062502, 0.00127886, -0.09811222, -0.0351346 , -0.09271583,
-0.07466982, -0.14215307, -0.07893433, -0.32281736, 0.48346038,
-0.04718825, 0.07673822, -0.0884037 , -0.05632293, -0.13510046])), (0.5714116669727295, arra
y([ 1.59600051e-02, 4.21114418e-02, 1.28461418e-03, 2.71529821e-02,
3.42196303e-02, 1.54971693e-01, 1.19960622e-01, 4.28047863e-02,
2.80131156e-02, 2.61894305e-01, 3.98224856e-01, 2.22430745e-02,
-2.12989146e-01, -2.37918870e-01, -1.51039961e-01, -7.93097234e-02,
-3.08705345e-02, -5.13930226e-02, -6.11104656e-03, -4.42445262e-03,
8.73600832e-03, 1.14425648e-01, 9.10932453e-02, -2.46844763e-02,
-4.11581153e-02, -4.09992595e-02, 2.33406505e-02, -8.65906300e-02,
-8.25769878e-02, 2.64222833e-04, 7.82106040e-02, 2.64207688e-02,
4.98407339e-02, 4.63747920e-02, 6.02988198e-02, 3.19919850e-02,
9.85422558e-02, 2.08581457e-02, 2.67063403e-02, 5.80787317e-02,
4.37215388e-02, 2.30841366e-01, -6.49068804e-02, -9.32392711e-03,
-3.29560958e-02, -6.30860877e-03, -1.62708094e-01, -9.22755011e-02,
-1.48081897e-01, -1.26421154e-01, 5.96946527e-02, 5.95035353e-02,
-2.80467760e-02, 8.64978493e-03, 2.13743205e-02, -2.22829559e-01,
-2.65768161e-01, -3.39638410e-01, 1.08326126e-02, 2.26939806e-01,
9.86318206e-02, 4.80225690e-03, -3.33788664e-02, -9.39573658e-03,
8.31612558e-02, -6.48915304e-02, 5.78423406e-02, -1.50498485e-02,
-4.49476406e-03, 1.04890517e-02, 2.80642735e-02, 3.19174967e-02,
1.41741391e-02, 6.45255334e-04, -1.34223998e-02, 1.02578208e-01,
2.84638364e-02, 9.67061410e-02, -4.95645113e-02, -6.37995458e-02,
-6.47332744e-03, -9.21554205e-02, 1.06503605e-01, -6.52962072e-02,
-2.19855513e-02, 4.75944087e-02, -5.60596598e-03, -3.71788394e-02,
9.41067553e-02, 2.11100251e-02])), (0.42504944989687043, array([-3.47767932e-02, 7.5845150
9e-02, -4.23429705e-02, 9.86716501e-03,
1.59940061e-01, -2.70717662e-01, 1.77383051e-01, 6.49096231e-02,
5.12056845e-02, 7.02072691e-02, 1.30791907e-01, 2.40911060e-03,
-6.06508944e-02, -7.60844667e-02, -6.15267202e-02, -1.58919873e-02,
-1.34459445e-02, -1.77913733e-02, -6.43477985e-03, -4.10655156e-03,
-9.60136167e-03, -8.54788290e-02, -1.57451652e-01, -3.58176329e-02,
-6.13724475e-02, -1.69618627e-02, -1.66075688e-02, 4.99590425e-03,
1.02490121e-01, 7.75399505e-02, 5.23546073e-02, 3.21318129e-02,
6.12294011e-02, 4.85299700e-02, 4.39752350e-02, 3.26550708e-02,
6.30899239e-02, -1.77241962e-03, -1.34126947e-03, 6.32237248e-03,
1.14247306e-02, 2.16675054e-01, 1.34977708e-01, 5.60382056e-02,
-2.14084863e-02, -4.14393775e-03, 5.55328275e-01, 3.27885606e-04,
-6.98772398e-03, -5.20532257e-02, -5.58848615e-01, -7.48481022e-03,
-3.12871441e-02, 2.89332523e-02, 1.22703678e-02, 4.22965525e-02,
1.80475928e-02, 2.41221529e-02, -1.97257540e-02, -2.97198520e-02,
1.06193706e-02, 4.19201509e-02, 3.43960546e-02, 1.42779862e-03,
3.66200453e-02, 4.72563016e-02, -2.81442245e-02, 1.92407077e-03,
-5.40808902e-02, -2.68162723e-02, -1.76657479e-02, -2.53211115e-02,
-1.78627652e-02, -2.81419084e-02, -2.40262612e-03, -2.32095273e-02,
-4.49570115e-03, -2.05890542e-02, -1.64679051e-04, -2.21925687e-02,
-3.75932025e-02, -1.62296398e-02, -5.76901005e-03, -4.50529220e-02,
6.19451479e-02, 2.40252178e-02, -1.40397374e-01, -2.13950623e-02,
7.27210216e-02, 6.17761287e-03])), (0.3978166894517955, array([-0.11005611, 0.05075236, -0
.09624746, -0.0346809 , 0.24849045,
-0.26368587, 0.28140826, 0.02678829, 0.06076458, 0.04763941,
0.11089526, -0.01547567, -0.03883996, -0.05911073, -0.02447412,
-0.03007646, -0.00050185, 0.00710358, -0.00213599, -0.00313048,
-0.01511068, -0.07451991, -0.29919188, -0.01310891, -0.09356173,
-0.01210994, -0.03275207, 0.04013982, 0.15592938, 0.132279 ,
0.08999955, 0.10598052, 0.10504277, 0.0389563 , 0.04564958,
0.02487026, 0.01432714, 0.01865657, -0.01893375, 0.00605514,
0.01792934, 0.28197809, 0.19678471, 0.07381823, -0.01360361,
-0.00726783, -0.38029588, 0.04512801, 0.09609548, 0.10470117,
0.38468441, -0.02207943, -0.01634265, 0.01960711, 0.00246179,
0.05325301, 0.08705457, 0.17774632, 0.04495314, -0.13924086,
-0.03897458, -0.05755354, -0.02677295, 0.00089107, -0.06797897,
0.06020379, -0.04883762, -0.00645087, -0.07838135, -0.04969979,
-0.04826861, -0.0359796 , -0.01252585, -0.01454904, -0.01312105,
-0.0546026 , -0.01316386, -0.07149409, 0.0161508 , 0.05153606,
-0.05670376, 0.01740911, -0.01657489, 0.04728431, 0.02907354,
-0.02996158, -0.04353655, -0.00926233, 0.01796126, 0.01685783])), (0.35238333407115874, arr
ay([ 2.72172870e-01, -5.71486439e-02, 2.58216217e-01, 5.05159254e-02,
-2.04062845e-01, 4.30256785e-01, 1.45763747e-01, -1.72881194e-03,
1.51499040e-01, 6.40095103e-02, 1.82131063e-01, 9.17543931e-02,
-1.26643385e-01, -1.76507321e-01, -1.16503634e-01, -4.72962966e-02,
-3.04675728e-02, -5.48197726e-02, -3.20158781e-02, -1.50510244e-02,
-1.22736041e-02, -3.80745058e-02, -6.50252743e-02, 1.23130614e-02,
-2.72930734e-04, 9.95479012e-03, 1.29080008e-02, 1.73148004e-02,
-4.74740301e-03, 2.57731002e-03, 1.92073566e-02, 2.77597161e-02,
3.68733793e-02, 2.46648624e-02, 1.48229781e-02, 1.21683841e-02,
2.71664328e-02, 8.47512627e-03, -8.89655041e-03, 7.07393823e-03,
2.64322939e-03, -6.02993028e-02, 8.05093040e-02, 9.54459142e-03,
1.44879125e-01, 7.48095238e-03, 4.13573441e-02, -6.52609644e-02,
-4.99949244e-02, -5.31302586e-02, -8.08456803e-02, -2.23092074e-02,
3.28519179e-02, -1.07182823e-02, -4.06721339e-02, 6.01641342e-02,
1.35307018e-01, 3.11889795e-01, 1.75319796e-01, -2.24787076e-01,
-1.42653467e-01, -2.41317120e-01, -1.45056322e-01, -3.60470006e-02,
-2.73935664e-01, 3.35921547e-02, -1.05881773e-02, -2.23898076e-02,
-8.04181260e-03, -4.15624181e-02, 1.37560250e-02, 1.81325894e-05,
4.01312448e-02, 1.56110706e-02, -1.81586891e-02, 6.98493209e-03,
2.21580621e-02, -5.27720713e-02, -1.73222731e-02, -5.76305220e-03,
3.77462940e-03, -1.69856197e-02, -1.57448527e-02, -7.42518745e-03,
3.44003188e-02, -1.07208876e-01, -3.22301219e-02, 3.40912034e-03,
-1.51647621e-02, -6.78213851e-02])), (0.337764060513702, array([-0.10267333, -0.29443803, -0.
0915461 , -0.02929046, -0.01193299,
0.13856508, 0.76303677, -0.2975754 , 0.17926268, -0.05497305,
-0.15320546, -0.03482298, 0.08583347, 0.10016258, 0.09579815,
0.03742155, 0.05589383, 0.01959031, 0.00804244, 0.00642888,
0.00549271, 0.02367011, 0.16449441, 0.00371798, 0.03479047,
0.03756304, 0.02415189, -0.0281378 , -0.09282584, -0.05453758,
-0.05188437, -0.05920363, -0.0772278 , -0.02662847, -0.03024676,
-0.01297158, -0.02741065, -0.0048173 , -0.09995176, -0.01941201,
-0.0087679 , -0.04618292, 0.01019038, -0.01331143, -0.05073854,
0.00250184, 0.01289621, -0.00670389, -0.01413956, -0.00291049,
-0.02149748, -0.02609368, 0.0018284 , -0.00596198, 0.01653413,
-0.00368353, -0.02215825, -0.04566043, -0.03991345, 0.04523418,
0.01758591, 0.0493502 , 0.02356299, 0.00854587, 0.04467478,
-0.03361031, -0.00296489, 0.00296015, -0.03464237, 0.0162329 ,
0.0092337 , -0.01099873, 0.01092503, -0.00199202, -0.00341072,
0.03307768, 0.0134801 , 0.03392747, -0.02049286, -0.02969398,
-0.00254682, 0.02844248, 0.01158971, -0.02635988, -0.00291389,
0.02692371, -0.12320224, -0.01920371, 0.01894774, 0.01093196])), (0.24537240875649707, arr
ay([ 0.33346055, 0.05539091, 0.28223949, 0.12145879, -0.07080231,
-0.54679222, 0.05434262, 0.11623824, 0.25216066, 0.04192859,
0.03877752, 0.02991654, -0.03311103, -0.04946977, -0.0402435 ,
-0.02243829, -0.01467914, -0.00918881, -0.0232997 , -0.01064298,
0.01051628, 0.05725223, 0.33682475, -0.00087466, 0.08132144,
0.09169357, 0.05327385, -0.08259453, -0.2000579 , -0.09646241,
-0.10150003, -0.12684152, -0.17657528, -0.06797636, -0.04195422,
-0.03898997, -0.07239586, -0.02050143, -0.02096775, -0.02422455,
-0.00828482, -0.00810232, 0.25424483, 0.02165218, 0.09415622,
0.00784175, -0.04497648, 0.03142562, 0.0495527 , 0.04873873,
0.07287478, 0.00271661, 0.01906564, -0.01831662, -0.00271585,
0.00710061, -0.00067786, -0.01882985, 0.00452397, 0.01880303,
-0.00790012, -0.01824323, 0.00445034, -0.00215321, -0.01528222,
0.06074812, -0.00679641, 0.02165929, -0.02416152, 0.01245209,
-0.00502446, -0.01888459, 0.01762397, 0.02308425, 0.00868448,
0.03326563, 0.02750681, 0.04028162, 0.01820977, 0.0451108 ,
-0.01384362, 0.03892947, 0.01754462, 0.05322128, -0.13580873,
0.02668102, -0.02897502, -0.00206738, 0.0692191 , -0.04783141])), (0.2105885520158932, arra
y([-3.10827511e-02, 1.13819233e-01, -2.59825677e-02, -1.18901436e-02,
6.81701205e-01, 3.06343714e-01, -1.26543027e-01, 1.01456929e-01,
9.27070651e-02, -1.32567215e-02, -4.05979079e-02, -1.17258673e-02,
2.21670119e-02, 2.38461476e-02, 4.04458947e-02, 2.00715548e-02,
1.69954253e-02, 1.89284684e-02, 5.34251756e-03, 5.52036179e-03,
7.37334954e-03, 3.87231506e-02, 2.99418215e-01, -4.13107676e-03,
8.31918236e-02, 8.00694576e-02, 4.75864449e-02, -5.59973043e-02,
-1.92155780e-01, -1.10033864e-01, -9.15510655e-02, -1.07284028e-01,
-1.41548109e-01, -4.97663771e-02, -3.94930698e-02, -2.29170359e-02,
-3.44465298e-02, -4.57614518e-03, -1.30644373e-02, -7.07281585e-03,
-1.16763094e-02, 1.65801688e-01, 8.82661150e-02, 3.80122928e-02,
-3.37033427e-02, -6.39047241e-04, 2.76063573e-03, -1.30101566e-02,
1.93777474e-03, -2.38012823e-03, -2.63197535e-02, 3.26343809e-03,
-9.28626432e-02, 3.89161237e-02, 9.71798149e-02, 2.76026138e-02,
7.80342774e-02, 9.49966701e-02, 2.93796276e-02, -6.03171537e-02,
-4.85341294e-02, -4.54767224e-02, -2.95147400e-02, -4.82907511e-03,
-7.11207984e-02, -1.90243245e-02, -2.03019810e-02, -1.67590552e-02,
-1.26001114e-02, -3.31330276e-02, -2.24885219e-02, 4.54737692e-03,
-2.60866996e-02, -3.48038740e-02, -1.82847898e-02, -6.46713273e-02,
-2.47287306e-02, -5.94244019e-02, -4.63910905e-03, -2.27185408e-02,
-2.22768472e-02, -5.12580769e-02, -6.51606223e-02, -4.44153155e-02,
1.35305829e-01, -5.71161658e-02, 1.20859100e-02, -2.49796205e-02,
1.07341383e-01, 1.86479045e-01])), (0.1947049465927752, array([-3.13869905e-01, 1.80486031
e-01, -2.65715965e-01, -1.14222059e-01,
-3.14658867e-01, 2.07415000e-01, -7.25748995e-02, 1.10132718e-01,
2.73851128e-02, -2.79589005e-02, -6.61198858e-02, -2.94104068e-02,
4.76871738e-02, 5.84940338e-02, 2.88044602e-02, 2.15043113e-02,
7.68030166e-03, -5.12020248e-03, -5.74688081e-03, -2.59236090e-04,
-7.45518168e-04, 6.28118296e-03, 5.26188188e-02, -1.97042248e-03,
-3.47192829e-03, 4.47694529e-02, 4.15371618e-02, -6.65222015e-02,
-6.05610881e-02, 6.49012625e-03, -7.36300556e-04, -2.45104237e-02,
-8.83109239e-03, 5.97309951e-03, -5.23943333e-03, 1.80728351e-03,
8.64929209e-03, -2.23466435e-03, 6.52532347e-03, 6.87686189e-03,
3.35876540e-03, 8.63268838e-02, 6.05110803e-01, 1.02871165e-01,
3.29692003e-01, 4.14976067e-02, -1.15587468e-02, -8.60571370e-03,
-1.93268840e-02, -1.58919638e-02, -1.88075817e-02, -4.20552528e-03,
1.36702370e-02, -8.55654201e-03, -1.25536852e-02, -1.46180593e-02,
-2.38149310e-02, -2.39224795e-02, -2.00389712e-04, 4.77005636e-03,
1.56076319e-02, 1.42342996e-02, 8.27201823e-03, 5.16918033e-04,
2.24203268e-02, 1.39543962e-02, 1.51098959e-03, 9.13492250e-03,
-6.92233410e-05, 5.79228782e-04, 2.28272273e-02, -1.40926361e-03,
3.42544640e-02, 1.98555635e-03, 1.44637639e-03, 4.26271148e-02,
2.96759892e-02, 2.97914223e-02, -1.21144395e-02, -5.41792673e-03,
4.83805949e-03, 6.05804192e-02, 4.77400799e-02, 1.42403755e-02,
-7.40234918e-02, 3.17133754e-02, 1.40296135e-02, 7.85101481e-03,
2.74875888e-01, -5.90879502e-02])), (0.12332392942132495, array([ 4.53435101e-02, -2.7803492
2e-03, -3.13445963e-01, 6.30711864e-01,
1.27697456e-02, 6.64868949e-02, -5.75087081e-03, 6.07346566e-03,
-2.61678153e-02, -9.93067403e-03, -7.91752836e-04, 1.70733274e-02,
-2.73025573e-03, -2.26305080e-02, -8.96305492e-03, -3.60634540e-04,
-8.49617743e-03, -9.80700990e-03, -2.24852279e-04, 4.09108102e-03,
2.05637292e-03, 9.24051832e-03, 2.42592699e-02, -3.64164001e-03,
1.68472969e-02, 2.54693542e-02, 5.05075770e-03, 5.16173513e-03,
-4.83468454e-02, -1.21900774e-02, -9.11780503e-03, -5.05512123e-03,
-7.41232540e-03, 3.91398550e-03, 3.06965071e-03, -5.90932528e-03,
1.12445860e-03, 4.66719907e-03, 7.76292483e-03, -4.68010209e-04,
4.57085333e-03, 6.98203998e-02, 2.23256968e-01, 6.03967088e-02,
5.41465239e-02, 6.50298816e-03, 1.54786976e-03, -6.12458026e-03,
-1.21478016e-02, -2.36115428e-02, -1.44891907e-02, 1.00069594e-02,
1.28721599e-02, -2.86915970e-03, -2.29801640e-02, -1.27419711e-02,
-1.79027671e-02, -2.41362676e-02, -7.78849976e-03, 1.25177575e-02,
1.22658786e-02, 2.01564609e-02, 6.24882932e-03, -2.94930421e-03,
2.19246690e-02, -1.21455484e-02, -3.64289500e-03, -2.47392662e-03,
-5.05322234e-03, -7.04776467e-03, 2.60680790e-03, 1.63317314e-03,
1.57057505e-03, -6.94289571e-03, -9.13210988e-03, -5.14828300e-03,
-2.05609675e-03, -1.55565554e-02, -1.56562532e-03, -6.52763569e-03,
-3.96162744e-03, -5.70459097e-04, 7.76754226e-04, -1.51831525e-03,
9.68083322e-04, 6.81558268e-04, -1.87228840e-03, -2.19265410e-03,
-6.51541156e-01, -1.35251471e-02])), (0.07848642552981931, array([ 1.84188359e-01, -3.0587017
1e-01, 1.49198471e-01, 7.87807951e-02,
1.22599538e-01, -4.03112767e-02, 3.34042236e-02, -2.53340913e-01,
-7.77881450e-01, 4.92023239e-03, 3.09791110e-02, 5.48255151e-03,
-2.82090547e-02, -1.27406253e-02, 1.15325868e-03, 4.43431341e-03,
6.29308236e-03, 1.45105664e-03, -1.82329674e-03, -7.60767616e-04,
-8.41740602e-04, 5.47909104e-03, 9.09192366e-03, -1.39850991e-03,
2.32559995e-02, 2.59952546e-02, 2.72665779e-02, -6.95731563e-03,
-3.79661683e-02, -3.82250001e-03, -7.30620065e-03, -8.43124545e-03,
-2.49057213e-02, -1.13018454e-02, -8.81137961e-03, -7.00195233e-03,
-1.75154527e-02, -4.03319983e-03, -6.31246022e-03, -3.38662606e-03,
-1.90484877e-03, 8.86981906e-02, 2.13060832e-01, 6.32707375e-02,
2.35421226e-01, 2.89764159e-02, 1.67968346e-03, -3.84168000e-03,
-3.03666587e-03, -1.83109292e-03, -8.57071972e-04, -4.31219406e-03,
-6.20497375e-03, 6.97048660e-03, 1.14633859e-02, -2.87600954e-04,
-3.78304190e-03, -1.95330813e-02, -5.22629574e-03, 2.04417140e-02,
4.52053030e-03, -7.15957141e-03, -2.63864754e-03, 2.18279169e-03,
2.14227840e-05, -5.32141382e-03, -5.85467921e-03, -1.59644921e-02,
2.70014938e-03, -1.34069282e-02, -1.25762248e-02, 2.80874041e-03,
-3.06062185e-02, 3.51151122e-04, -6.19744668e-03, -3.97887545e-02,
-2.64247400e-02, -3.69463581e-02, 7.20491728e-03, 3.56915543e-03,
-6.49136832e-03, -2.98564240e-02, -3.13811801e-02, -1.69712136e-02,
-3.22039697e-02, -2.11954676e-02, -2.95472599e-03, 3.49282285e-04,
1.61455654e-01, 1.24901710e-02])), (0.051763400830042, array([-1.88697768e-02, 1.99153916e
-02, -1.50992854e-02, -8.39538161e-03,
-6.13134826e-02, 1.27543697e-02, -3.74091591e-03, 1.51760514e-02,
6.56315244e-02, 1.01737069e-03, 8.89451723e-03, -1.99027424e-03,
-5.68317612e-03, -2.06171266e-04, -1.40521699e-03, -1.39625680e-03,
7.59116202e-04, -5.96010680e-04, 4.60939872e-04, 1.45203641e-04,
-3.86390842e-03, -3.24537563e-04, 3.86384430e-04, 7.11164706e-04,
-2.66644623e-03, -3.15822443e-03, -2.77303177e-03, 1.65647745e-03,
-2.20855060e-03, 1.16677604e-03, -1.06345034e-03, 4.03579179e-03,
6.48850641e-03, 3.26805997e-03, 2.17878795e-03, 1.10405224e-03,
1.41902792e-03, -1.79237047e-05, -2.92586910e-03, -4.04844594e-03,
6.73236709e-04, -8.25972214e-03, -1.64017263e-02, -4.90970826e-03,
-2.56431608e-03, -2.54648914e-04, 1.61465231e-03, 3.32298891e-03,
3.68933548e-04, 9.76030252e-05, -3.00978027e-04, 2.25115748e-03,
8.40532033e-03, -6.25251233e-03, -1.82606624e-03, -1.01219485e-03,
-1.22428377e-02, -1.11400675e-02, -5.78214933e-03, 8.73957810e-03,
4.98077086e-03, 9.74562202e-03, 4.18968632e-03, -2.00213912e-04,
1.00137419e-02, -2.75230220e-01, -5.60595261e-02, -1.12764402e-01,
-5.78889079e-02, -9.77460257e-02, -1.01178443e-01, -4.26695112e-02,
-2.24173542e-01, -1.99158429e-01, -1.34496154e-01, -2.61734132e-01,
-2.48290120e-01, -1.79693938e-01, -4.12659042e-02, -1.21238110e-01,
-9.89963288e-02, -2.26076602e-01, -3.01991152e-01, -2.02651620e-01,
-5.99178341e-01, -1.34591980e-01, -4.30616081e-02, -1.15515162e-01,
9.39791440e-03, -7.74882426e-03])), (0.0022475672476950743, array([ 8.73743164e-04, -1.66423
040e-03, 7.45416458e-04, 3.07977559e-04,
1.72861907e-03, -2.04540557e-03, 6.82206778e-04, -1.40305294e-03,
-2.72609621e-03, -1.42354356e-03, 7.09013571e-05, -2.25955695e-05,
7.30409774e-05, 2.09513937e-04, -6.28938505e-04, 8.33854395e-05,
2.71834078e-05, 2.73636459e-05, 2.77647997e-05, 4.63309898e-06,
2.73807988e-04, 1.05254109e-03, -2.15173325e-04, 1.10141058e-04,
-3.63394974e-04, 1.89895596e-04, 2.13911999e-04, -2.12449614e-04,
-1.48090129e-04, 3.83683190e-05, 3.95015250e-05, -6.40889697e-05,
-2.15752268e-04, 3.13663774e-05, 5.52593117e-05, -1.68659188e-06,
1.24672096e-04, -3.22821095e-05, 4.53533751e-06, 1.52531961e-05,
2.96810024e-05, 2.66726327e-04, -3.15592018e-04, 1.24963156e-04,
2.91945427e-04, 3.44997444e-05, -4.49126982e-04, -3.91767013e-04,
3.18626012e-04, 2.06652903e-04, -1.15869167e-03, -1.19228002e-01,
-6.73506700e-01, -6.19958284e-01, -3.84399876e-01, -2.63273327e-05,
-4.93419905e-03, -1.18597689e-03, 2.38658754e-04, 1.05118521e-03,
4.73676341e-04, 4.94709946e-04, 3.05776306e-04, 6.32243095e-05,
7.23131883e-04, -4.31812813e-04, 2.56914039e-04, -1.10376292e-04,
7.39428235e-05, -4.68659075e-05, 1.93069842e-05, 7.41033672e-05,
-3.28099516e-04, -2.30607646e-04, -7.37500178e-04, -2.47522772e-04,
-6.48623696e-04, -2.00731100e-04, 6.41467503e-05, 5.37530518e-05,
-7.53452293e-05, -3.79687489e-04, 1.58121247e-04, -1.08978707e-04,
-1.66192458e-03, 4.65935481e-05, 1.81179430e-04, -1.10096449e-04,
1.35285991e-03, 3.28324358e-04])), (0.0009933514215786233, array([ 1.53628210e-04, -6.22361
736e-04, -4.57054309e-06, 2.90935816e-04,
-2.10106863e-04, 1.00204884e-04, 2.17499715e-04, -5.57897504e-04,
-1.39568550e-03, 1.05339167e-01, 3.79239031e-01, 5.45831880e-01,
5.02591619e-01, 2.61126125e-01, 1.08363182e-01, 3.79714888e-02,
2.14380574e-02, 1.61903934e-02, 1.14422809e-02, 8.09929611e-03,
7.40576108e-03, 2.66280569e-02, 1.98802392e-01, 1.10899671e-02,
1.30437900e-01, 1.77213342e-01, 1.47040349e-01, 1.47554793e-01,
2.16513272e-01, 1.10090824e-01, 8.76644327e-02, 7.05251496e-02,
7.71524303e-02, 3.13236194e-02, 2.52734757e-02, 1.43036488e-02,
2.08533768e-02, 5.21462344e-03, 7.36003888e-03, 6.38283136e-03,
3.69168397e-03, 1.23179683e-05, 1.01732980e-03, 2.06129745e-04,
1.45626634e-03, 4.51948859e-03, 1.02505841e-05, 1.06353364e-06,
7.59423127e-04, 4.73252908e-05, 3.53150845e-05, 5.34980746e-04,
-9.21386327e-05, -1.81679508e-04, -1.39970011e-04, 1.93297011e-03,
5.53199015e-05, 1.82548204e-04, 7.77840325e-04, 4.21404278e-04,
2.32995847e-05, 3.22842741e-05, -9.13394568e-06, 6.90184011e-07,
3.49610347e-05, -4.14575111e-04, -1.47745097e-04, -1.84970570e-04,
-2.31609979e-04, -1.85588148e-04, -1.81753881e-04, -1.27181386e-04,
-3.94305437e-04, -3.18831514e-04, -2.19945839e-04, 7.05051980e-05,
-4.52457928e-04, -2.64528565e-05, -4.66528170e-05, -1.51408972e-04,
-2.01521049e-04, -3.48861044e-04, -5.67246400e-04, -2.76629167e-04,
-1.07199259e-03, 9.37205376e-04, -6.20598211e-05, -1.53413905e-04,
-1.32020174e-04, -1.10134760e-04])), (0.0001285036483623192, array([-1.28878123e-04, 2.02334
342e-05, -1.98617012e-04, 1.09363369e-04,
2.14764133e-04, 2.23521491e-04, -1.43039095e-04, -5.62481330e-06,
-2.80984771e-04, 5.46493983e-02, 1.96852465e-01, 2.83402156e-01,
2.60788652e-01, 1.35444226e-01, 5.62126533e-02, 1.96921722e-02,
1.11068787e-02, 8.39898897e-03, 5.94223765e-03, 4.20047060e-03,
-1.42471248e-02, -5.25585800e-02, -3.82741827e-01, -2.13698723e-02,
-2.50747229e-01, -3.41163519e-01, -2.83196435e-01, -2.84414624e-01,
-4.17342376e-01, -2.12227247e-01, -1.69255898e-01, -1.36025496e-01,
-1.48792418e-01, -6.03392197e-02, -4.87875548e-02, -2.75823143e-02,
-4.02682463e-02, -1.00689806e-02, -1.42375864e-02, -1.23355135e-02,
-7.12337590e-03, 9.68627459e-05, 1.52790204e-04, 4.97796947e-05,
1.16390050e-04, -1.09305830e-03, 2.32283741e-05, -3.69245654e-05,
-2.08640398e-04, -2.08468889e-05, -2.73325647e-05, 3.09570464e-04,
1.12712030e-05, 6.72111137e-06, 4.97753122e-05, 1.44149409e-03,
1.38348556e-03, 3.95272055e-03, 6.47066837e-03, 5.84691125e-03,
1.93258877e-03, 1.09953636e-03, 4.14830539e-04, 7.20333669e-05,
2.33433608e-03, 1.50046037e-04, -8.99742286e-07, 5.09254547e-05,
-3.69858670e-05, 3.66282565e-05, 7.63683400e-05, -7.89082876e-06,
1.09914137e-04, 7.57720383e-05, 6.72733706e-05, 1.36273518e-04,
1.23933000e-04, 2.45637766e-04, 2.42022541e-05, 6.39095337e-05,
4.85970167e-05, 1.10251470e-04, 1.34102319e-04, 8.35564438e-05,
4.47564514e-04, 1.36209850e-04, 5.51574641e-05, 6.13542319e-05,
-3.51101653e-05, 5.54513489e-05])), (8.546833259599872e-05, array([-2.08332763e-04, 2.31838
890e-04, -2.32036055e-04, 2.13627016e-05,
-3.21972859e-04, -3.28785753e-05, -1.44691509e-05, 1.78841317e-04,
3.08667982e-04, -1.27699843e-04, -2.38053889e-03, -3.47639669e-03,
-3.15684072e-03, -1.61671008e-03, -6.72242549e-04, -2.34562134e-04,
-1.34216500e-04, -1.01989746e-04, -6.92548070e-05, -4.77069908e-05,
1.28954766e-04, 1.82467528e-03, 3.69931946e-03, 1.86164953e-04,
2.40342441e-03, 3.29844614e-03, 2.71115400e-03, 2.74510258e-03,
4.07991636e-03, 2.06063416e-03, 1.66397221e-03, 1.34137927e-03,
1.48732389e-03, 6.03148471e-04, 4.85348750e-04, 2.75466231e-04,
4.07980815e-04, 1.01221155e-04, 1.47168736e-04, 1.25627378e-04,
7.56742638e-05, -5.29424996e-05, 8.91831971e-05, -8.22435331e-07,
3.13514097e-05, 1.22806426e-05, -7.69374333e-05, -8.32139535e-06,
-5.22312387e-05, -3.01101714e-05, -7.96754176e-06, -2.62440143e-05,
-3.81164553e-05, -8.87918641e-05, 1.50166487e-04, 4.25264035e-02,
1.30542379e-01, 3.84450432e-01, 6.32638568e-01, 5.75707304e-01,
1.90209340e-01, 1.06746617e-01, 3.98898116e-02, 6.44077334e-03,
2.28826068e-01, -5.74563925e-05, -5.80535539e-06, -5.59460932e-06,
-8.47758206e-05, -1.59189210e-05, 7.39386200e-04, -1.86751752e-05,
-2.45961895e-06, -3.30313631e-07, -2.55624742e-05, 8.58763777e-06,
-2.34121617e-05, 3.01494013e-05, -1.62412636e-05, -3.43000551e-05,
-5.75614077e-06, 7.25723622e-06, 1.39889108e-05, -1.43744435e-05,
-1.70305297e-04, 3.63750540e-05, -6.76543807e-05, -3.14403423e-05,
-9.10703227e-05, -8.62663594e-05])), (3.57406919508468e-15, array([ 2.11575486e-02, -1.582900
18e-02, -2.31031029e-02, -1.32339463e-02,
-1.89671464e-15, -4.50856253e-16, -2.47485440e-16, 1.67721240e-02,
-6.87602672e-18, 1.66117623e-13, 5.83720984e-13, 8.40294151e-13,
7.73710867e-13, 4.02040948e-13, 1.66488434e-13, 5.83168521e-14,
3.32222490e-14, 2.49726506e-14, 1.75715810e-14, 1.24680774e-14,
-3.05980931e-14, -1.02417681e-13, -8.22521604e-13, -4.60839231e-14,
-5.39098389e-13, -7.33497436e-13, -6.08832099e-13, -6.11607489e-13,
-8.97321584e-13, -4.56141407e-13, -3.63870351e-13, -2.92295232e-13,
-3.19878831e-13, -1.29643197e-13, -1.05080017e-13, -5.93866341e-14,
-8.60871852e-14, -2.14069617e-14, -3.08313969e-14, -2.67523519e-14,
-1.55999860e-14, -3.68177357e-16, 9.83300189e-16, -2.88448726e-17,
1.17248397e-15, -8.87031146e-16, -5.82693334e-16, -1.15822815e-16,
-7.13689134e-16, -2.60166268e-16, -2.90415081e-16, -6.04816455e-15,
-3.87908532e-14, -3.59788680e-14, -2.04042558e-14, 3.36160503e-13,
1.02335089e-12, 3.01489393e-12, 4.96135057e-12, 4.51465170e-12,
6.15185598e-01, 3.45170989e-01, 1.28946998e-01, 2.08244423e-02,
-6.95452578e-01, -1.17725314e-15, -2.14993045e-16, -1.88381812e-16,
-1.07716607e-15, -1.92010054e-16, 5.71067215e-15, -1.84127565e-16,
-6.35501344e-16, -4.92896282e-16, -3.09190140e-16, -2.45488394e-16,
-9.10601070e-16, 1.07212288e-16, -2.11995340e-16, -5.21567041e-16,
-4.83790506e-16, -7.36752799e-16, -9.05612969e-16, -6.78891034e-16,
-3.07509446e-15, 7.67017500e-16, -3.25664494e-16, -3.37512170e-16,
4.22647841e-16, -1.53183588e-16])), (-5.203296557290081e-16, array([ 3.95776120e-01, 5.9535
0761e-01, -2.60441848e-01, -1.49186602e-01,
1.92539649e-15, -7.15780497e-16, 1.14891672e-15, -6.30822900e-01,
-3.29257908e-15, 4.67313577e-14, 1.76804996e-13, 2.53917221e-13,
2.33699003e-13, 1.21103520e-13, 5.04820000e-14, 1.73684194e-14,
1.00907288e-14, 7.40941579e-15, 5.25033951e-15, 3.63889926e-15,
-1.92335010e-14, -7.64211538e-14, -5.14525154e-13, -2.85113552e-14,
-3.36988883e-13, -4.58831878e-13, -3.80916894e-13, -3.82484085e-13,
-5.61969388e-13, -2.85650684e-13, -2.27840712e-13, -1.83258941e-13,
-2.00699534e-13, -8.12421295e-14, -6.55676014e-14, -3.71603918e-14,
-5.46295728e-14, -1.35999504e-14, -1.93399333e-14, -1.67395955e-14,
-9.51656615e-15, 4.15643054e-16, 2.39123077e-16, -2.40023495e-16,
4.57368674e-16, -2.19071205e-15, 8.04654859e-16, 1.91213598e-16,
2.23220914e-16, 2.44529910e-16, 4.61959783e-17, 6.51377538e-15,
3.49687157e-14, 3.24735983e-14, 1.92711779e-14, -1.91410281e-13,
-5.90311992e-13, -1.74015128e-12, -2.86353645e-12, -2.60610537e-12,
-1.84157089e-02, -1.03327654e-02, -3.86005521e-03, -6.23384014e-04,
2.08185177e-02, 5.82569889e-16, -1.27445237e-17, 1.40344000e-17,
3.79983681e-16, 7.56614939e-17, -3.43738723e-15, 8.31006203e-17,
8.20325305e-17, 1.13494052e-16, 2.93170621e-16, -1.51020000e-16,
3.11752960e-16, 1.58851686e-16, 1.59089842e-16, 4.46618813e-17,
6.60833994e-17, -8.21858778e-17, -1.70396109e-16, -1.16447521e-16,
1.30103395e-15, -3.37879073e-16, 1.11958831e-16, 2.80088150e-16,
2.49661075e-15, 3.14516975e-16])), (-1.832295599444726e-15, array([-3.66771361e-01, 5.3013
3370e-01, 4.49762306e-01, 2.57633367e-01,
-1.00010513e-15, 1.36101936e-15, -5.96497519e-16, -5.61719731e-01,
2.96800585e-15, 2.21860401e-13, 8.01561027e-13, 1.15431277e-12,
1.06242717e-12, 5.51661224e-13, 2.28765043e-13, 7.99484576e-14,
4.53604094e-14, 3.40503053e-14, 2.42471615e-14, 1.70529159e-14,
-5.43693189e-14, -2.02956039e-13, -1.45906397e-12, -8.12234427e-14,
-9.55450657e-13, -1.30039646e-12, -1.07947378e-12, -1.08345113e-12,
-1.59000274e-12, -8.08602402e-13, -6.44860817e-13, -5.18191098e-13,
-5.66762590e-13, -2.29591484e-13, -1.85566519e-13, -1.05031958e-13,
-1.53445492e-13, -3.83317479e-14, -5.41923598e-14, -4.70300061e-14,
-2.70197749e-14, -8.38247062e-16, -2.00960536e-15, -6.14130512e-16,
-1.24468044e-15, -4.01922963e-15, 8.54756834e-16, 6.71081649e-17,
-5.35323451e-16, 1.09902452e-16, -4.81268135e-16, 4.21930429e-15,
1.59689958e-14, 1.45423480e-14, 8.54643474e-15, -6.70301174e-14,
-2.17659478e-13, -6.42007685e-13, -1.05641968e-12, -9.61851422e-13,
-9.68124537e-03, -5.43199492e-03, -2.02925349e-03, -3.27716605e-04,
1.09444159e-02, -2.06603474e-16, -1.48153166e-16, -2.55972266e-16,
-1.98805873e-16, -1.06059060e-16, -1.46806749e-15, -1.30672293e-16,
-2.75698019e-16, -4.80575050e-16, -2.11066013e-16, -3.44234921e-16,
-2.54309196e-16, 5.75283223e-16, -8.71629386e-17, -6.15334883e-16,
-1.03644619e-16, -5.13019067e-16, -6.55711395e-16, -8.17593232e-16,
-8.48519545e-17, 8.21246329e-17, 1.26381210e-16, -1.13566604e-16,
-1.49054402e-15, 7.83773982e-17]))]
Eigenvalues in descending order:
[6.400301029851477, 4.23053271502051, 3.0220056962108997, 2.3606995503606907, 1.7227802802980245, 1.
7053304684180373, 1.5823948341725016, 1.5166979298471606, 1.4840050960170488, 1.3921255385114626, 1.
338123871891159, 1.2745588269563284, 1.2235863310684403, 1.2178118832338365, 1.1943992057853123, 1.1
835468203512876, 1.175215028494778, 1.1607311253642982, 1.148470393300462, 1.1194893763589782, 1.109
6027631760017, 1.1049386135422454, 1.0914342813543214, 1.0846048546102214, 1.0827345320601791, 1.071
1889349293784, 1.0636889264741323, 1.054552899346843, 1.0495564454254236, 1.048150719936755, 1.04163
63339168457, 1.0381369601703594, 1.0334519547258465, 1.027681645874375, 1.0238189302225627, 1.015216
974820168, 1.0136313723660826, 1.0118982712943438, 1.0092805032957575, 1.0062251615105764, 1.0055267
813795965, 1.0041959740870794, 1.0019590929564808, 1.0005162465340969, 0.9936805751250474, 0.9932619
924220245, 0.989390243013148, 0.9869860806882016, 0.9828875619572957, 0.9812541978117575, 0.96980869
41074693, 0.9529273395960557, 0.949801385112714, 0.9442733887767248, 0.9416943150113966, 0.933793856
0734337, 0.931041145818125, 0.9160124331703235, 0.9017616031502895, 0.890644523647056, 0.88856765553
17741, 0.8791588942184467, 0.8629753367296573, 0.8317611004894274, 0.8087658281973401, 0.78927222122
29611, 0.7619517528659735, 0.754916612665571, 0.7309008552453276, 0.6902086442370304, 0.660453404225
0566, 0.6482152267621847, 0.5714116669727295, 0.42504944989687043, 0.3978166894517955, 0.35238333407
115874, 0.337764060513702, 0.24537240875649707, 0.2105885520158932, 0.1947049465927752, 0.1233239294
2132495, 0.07848642552981931, 0.051763400830042, 0.0022475672476950743, 0.0009933514215786233, 0.000
1285036483623192, 8.546833259599872e-05, 3.57406919508468e-15, -5.203296557290081e-16, -1.8322955994
44726e-15]

In [174]:
tot = sum(eigenvalues)
var_explained = [(i / tot) for i in sorted(eigenvalues, reverse=True)]

# an array of variance explained by each

# eigen vector... there will be 90 entries as there are 90 eigen vectors)


cum_var_exp = np.cumsum(var_explained)
# an array of cumulative variance. There will be 90 entries with 90 th entry cumulative reaching almost 100%

In [175]:

print(len(var_explained))

print((cum_var_exp))

90
[0.07111057 0.11811392 0.15168992 0.17791848 0.19705944 0.21600652
0.23358772 0.250439 0.26692704 0.28239426 0.29726149 0.31142248
0.32501714 0.33854764 0.35181802 0.36496782 0.37802505 0.39092136
0.40368144 0.41611953 0.42844778 0.4407242 0.45285059 0.46490109
0.47693082 0.48883227 0.50065039 0.512367 0.5240281 0.53567358
0.54724669 0.55878091 0.57026308 0.58168114 0.59305629 0.60433586
0.61559781 0.62684051 0.63805413 0.6492338 0.66040571 0.67156283
0.6826951 0.69381134 0.70485163 0.71588727 0.72687989 0.73784581
0.74876618 0.75966841 0.77044347 0.78103098 0.79158375 0.8020751
0.8125378 0.82291272 0.83325705 0.84343441 0.85345344 0.86334895
0.87322138 0.88298928 0.89257737 0.90181865 0.91080445 0.91957366
0.92803933 0.93642683 0.94454751 0.95221608 0.95955405 0.96675604
0.97310471 0.97782723 0.98224717 0.98616233 0.98991506 0.99264127
0.99498101 0.99714428 0.99851447 0.9993865 0.99996161 0.99998659
0.99999762 0.99999905 1. 1. 1. 1. ]

From above table we conclude that 96% variance is contributed by about 72 features
In [176]:

plt.figure(figsize=(plotSizeX, plotSizeY))
plt.bar(range(0,90), np.array(var_explained), alpha = 0.5, align='center', label='individual explained variance')
plt.step(range(0,90), np.array(cum_var_exp), where= 'mid', label='cumulative explained variance')
plt.ylabel('Explained variance ratio')
plt.xlabel('Principal components')
plt.legend(loc = 'best')
plt.show()

72 dimensions covering 97% variance in the data. So we can reduce to 72 dimension space

Now will recall the ensemble models from our initial run to check the feature selection using featureimp from individual models

In [177]:
#Building fuction to return the feature importances for the model
predictors = [x for x in dff.columns if x not in ['price']]

def modelfit(alg, dxtrain, dytrain, printFeatureImportance=True):


#feature importance
alg.fit(dxtrain,dytrain)
alg_imp_feature_1=pd.DataFrame(alg.feature_importances_, columns = ["Imp"], index = predictors)
alg_imp_feature_1.sort_values(by="Imp",ascending=False)
alg_imp_feature_1['Imp'] = alg_imp_feature_1['Imp'].map('{0:.5f}'.format)
alg_imp_feature_1=alg_imp_feature_1.sort_values(by="Imp",ascending=False)
alg_imp_feature_1.Imp=alg_imp_feature_1.Imp.astype("float")

feat_30list=list(alg_imp_feature_1.index[:30])

if printFeatureImportance:
alg_imp_feature_1[:30].plot.bar(figsize=(plotSizeX, plotSizeY))
#First 20 features have an importance of 90.5% and first 30 have importance of 95.15
print("First 25 feature importance:\t",(alg_imp_feature_1[:25].sum())*100)
print("First 30 feature importance:\t",(alg_imp_feature_1[:30].sum())*100)

return feat_30list

Will run above function with ensemble models: Gradient boosting, Random forest, Bagging
In [178]:

#Gradient boost model


modelfit(GB1,X_train,y_train)

First 25 feature importance: Imp 96.698


dtype: float64
First 30 feature importance: Imp 98.305
dtype: float64

Out[178]:
['furnished_1',
'living_measure',
'yr_built',
'living_measure15',
'quality_8',
'City_Bellevue',
'City_Seattle',
'lot_measure15',
'HouseLandRatio',
'City_Kent',
'quality_9',
'sight_4',
'City_Federal Way',
'coast_1',
'City_Mercer Island',
'City_Kirkland',
'City_Medina',
'City_Redmond',
'quality_11',
'ceil_measure',
'quality_7',
'City_Renton',
'City_Maple Valley',
'quality_6',
'total_area',
'quality_10',
'basement',
'City_Issaquah',
'City_Sammamish',
'condition_5']

The top 30 features are covering about 98% in gradient boosting model. This is very good coverage for just 30% of the variables
In [179]:

#Random Forest model


modelfit(RF1,X_train,y_train)

First 25 feature importance: Imp 93.273


dtype: float64
First 30 feature importance: Imp 95.008
dtype: float64

Out[179]:

['furnished_1',
'yr_built',
'living_measure',
'living_measure15',
'quality_8',
'HouseLandRatio',
'lot_measure15',
'quality_9',
'ceil_measure',
'City_Bellevue',
'total_area',
'lot_measure',
'City_Seattle',
'City_Kirkland',
'City_Kent',
'City_Federal Way',
'coast_1',
'basement',
'City_Mercer Island',
'quality_7',
'City_Redmond',
'sight_4',
'City_Renton',
'City_Maple Valley',
'City_Medina',
'City_Sammamish',
'quality_10',
'has_renovated_Yes',
'room_bath_2.5',
'room_bed_3']

The top 30 features are covering about 95% in random forest model

Now will extract the top 30 features from the above models
In [180]:

feat_list_GB1=modelfit(GB1,X_train,y_train, printFeatureImportance=False)
print(feat_list_GB1)

feat_list_RF1=modelfit(RF1,X_train,y_train, printFeatureImportance=False)
print(feat_list_RF1)

['furnished_1', 'living_measure', 'yr_built', 'living_measure15', 'quality_8', 'City_Bellevue', 'Cit


y_Seattle', 'lot_measure15', 'HouseLandRatio', 'City_Kent', 'quality_9', 'sight_4', 'City_Federal Wa
y', 'coast_1', 'City_Mercer Island', 'City_Kirkland', 'City_Medina', 'City_Redmond', 'quality_11', '
ceil_measure', 'quality_7', 'City_Renton', 'City_Maple Valley', 'quality_6', 'total_area', 'quality_
10', 'basement', 'City_Issaquah', 'City_Sammamish', 'condition_5']
['furnished_1', 'yr_built', 'living_measure', 'living_measure15', 'quality_8', 'HouseLandRatio', 'lo
t_measure15', 'quality_9', 'ceil_measure', 'City_Bellevue', 'total_area', 'lot_measure', 'basement',
'City_Kent', 'City_Kirkland', 'City_Federal Way', 'City_Seattle', 'quality_7', 'City_Mercer Island',
'City_Redmond', 'coast_1', 'City_Renton', 'sight_4', 'quality_10', 'City_Maple Valley', 'City_Sammam
ish', 'City_Medina', 'room_bed_4', 'condition_3', 'City_Issaquah']

From the above 2 feature list, we will consolidate all the features

In [181]:
Key_feat=list(set(feat_list_GB1).union(feat_list_RF1))
print(len(Key_feat))
print(Key_feat)

33
['City_Mercer Island', 'condition_5', 'City_Sammamish', 'yr_built', 'sight_4', 'City_Seattle', 'City
_Federal Way', 'City_Maple Valley', 'City_Bellevue', 'furnished_1', 'City_Kent', 'quality_9', 'City_
Redmond', 'City_Issaquah', 'quality_8', 'total_area', 'quality_7', 'ceil_measure', 'City_Medina', 'c
oast_1', 'condition_3', 'lot_measure15', 'HouseLandRatio', 'City_Kirkland', 'City_Renton', 'living_m
easure15', 'basement', 'room_bed_4', 'quality_6', 'lot_measure', 'quality_10', 'quality_11', 'living
_measure']

From two models we have 33 importance features. We will freeze on the above 33 list and make another dataframe (along with 'price')

In [182]:

dff33=dff[['price','basement', 'City_Bellevue', 'coast_1', 'HouseLandRatio', 'City_Seattle', 'quality_10', 'quali


ty_9', 'ceil_measure', 'City_Renton', 'City_Redmond', 'City_Federal Way', 'City_Mercer Island', 'yr_built', 'livi
ng_measure15', 'living_measure', 'City_Maple Valley', 'sight_3', 'total_area', 'City_Kirkland', 'sight_4', 'quali
ty_6', 'quality_7', 'City_Sammamish', 'quality_8', 'City_Kent', 'quality_12', 'lot_measure', 'condition_3', 'furn
ished_1', 'City_Issaquah', 'quality_11', 'City_Medina', 'lot_measure15']].copy()

In [183]:
dff33.shape

Out[183]:

(18287, 34)

In [184]:

dff33.head()

Out[184]:

price basement City_Bellevue coast_1 HouseLandRatio City_Seattle quality_10 quality_9 ceil_measure City_Renton ... quality_8 City_Kent qualit

17786 430000 0 0 0 19.0 1 0 0 2550 0 ... 1 0

3782 385500 420 0 0 16.0 0 0 0 1120 0 ... 0 0

10069 736000 0 1 0 16.0 0 0 1 2290 0 ... 0 0

7114 580000 970 0 0 24.0 1 0 0 970 0 ... 0 0

10080 315000 1160 0 0 22.0 1 0 0 1160 0 ... 0 0

5 rows × 34 columns
In [185]:

X3 = dff33.drop("price" , axis=1)
y3 = dff33["price"]

X3_train, X3_test, y3_train, y3_test = train_test_split(X3, y3, test_size=0.2, random_state=10)


X3_train, X3_val, y3_train, y3_val = train_test_split(X3_train, y3_train, test_size=0.2, random_state=10)

print(X3_train.shape)
print(X3_test.shape)
print(X3_val.shape)

(11703, 33)
(3658, 33)
(2926, 33)

Eventhough PCA is helping us to reduce dimensions upto about 60 dimensions, we can see that in our random
forest model top 30 features are explaining the 95% variance in the regression and in gradient boosting model
top 30 features are covering 98% varience.

Hence we conclude that we will use features selection by considering the feature importance fucntion in
individual models. Thus we extracted 33 important features

HYPERTUNING with Gridsearch CV

In [186]:

from sklearn.model_selection import GridSearchCV


from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score

Since we have better performance in gradient boosting model, we will hypertune the model for improving the score

Following are the parameters we tune for the gradient boosting model.

In [187]:

param_grid = {
'loss':['ls','lad','huber'],
'bootstrap': ['True','False'],
'max_depth': range(5,11,1),
'max_features': ['auto','sqrt'],
'learning_rate': [0.05,0.1,0.2,0.25],
'min_samples_leaf': [4,10,20],
'min_samples_split': [5,10,1000],
'n_estimators': [10,50,100,150,200],
'subsample':[0.8,1]
}

In [188]:

GBR_test=GradientBoostingRegressor(random_state=22)

First will tune each parameter separately

In [189]:
param_grid1 = {'n_estimators': range(50,401,50)}

In [190]:

grid_search1 = GridSearchCV(estimator = GBR_test, param_grid = param_grid1,


cv = 3, n_jobs = 2, verbose = 1)
In [191]:

grid_search1.fit(X_train,y_train)
grid_search1.best_params_

Fitting 3 folds for each of 8 candidates, totalling 24 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 24 out of 24 | elapsed: 57.6s finished

Out[191]:

{'n_estimators': 400}

In [192]:

grid_search1.best_params_, grid_search1.best_score_

Out[192]:
({'n_estimators': 400}, 0.7757647547223905)

n_estimators of 400 is best in range 50 to 400. Will test same until 1000

In [193]:

param_grid2 = {'n_estimators': range(400,1001,200)}


GBR_test=GradientBoostingRegressor(random_state=22)

grid_search2 = GridSearchCV(estimator = GBR_test, param_grid = param_grid2,


cv = 3, n_jobs = 2, verbose = 1)
grid_search2.fit(X_train,y_train)

Fitting 3 folds for each of 4 candidates, totalling 12 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 12 out of 12 | elapsed: 1.3min finished

Out[193]:
GridSearchCV(cv=3, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'n_estimators': range(400, 1001, 200)},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)

In [194]:

grid_search2.cv_results_,grid_search2.best_params_, grid_search2.best_score_

Out[194]:

({'mean_fit_time': array([ 7.2032059 , 10.84747616, 14.41415413, 17.8464543 ]),


'std_fit_time': array([0.0866392 , 0.19536189, 0.14661922, 0.92177025]),
'mean_score_time': array([0.03063202, 0.04291979, 0.06155936, 0.07675632]),
'std_score_time': array([0.00097431, 0.00340029, 0.00733648, 0.01039824]),
'param_n_estimators': masked_array(data=[400, 600, 800, 1000],
mask=[False, False, False, False],
fill_value='?',
dtype=object),
'params': [{'n_estimators': 400},
{'n_estimators': 600},
{'n_estimators': 800},
{'n_estimators': 1000}],
'split0_test_score': array([0.77559185, 0.77864467, 0.77983937, 0.78052058]),
'split1_test_score': array([0.76537408, 0.77109939, 0.77235457, 0.7724209 ]),
'split2_test_score': array([0.78632834, 0.78828157, 0.78829273, 0.78811941]),
'mean_test_score': array([0.77576475, 0.77934188, 0.78016222, 0.78035363]),
'std_test_score': array([0.00855542, 0.0070319 , 0.00651073, 0.00640998]),
'rank_test_score': array([4, 3, 2, 1]),
'split0_train_score': array([0.86386211, 0.88101725, 0.89106634, 0.89835051]),
'split1_train_score': array([0.86284551, 0.87877078, 0.88780479, 0.89494197]),
'split2_train_score': array([0.85757011, 0.87496329, 0.88575537, 0.89367633]),
'mean_train_score': array([0.86142591, 0.87825044, 0.88820883, 0.89565627]),
'std_train_score': array([0.00275787, 0.00249875, 0.00218694, 0.00197394])},
{'n_estimators': 1000},
0.7803536277850995)
In [195]:

param_grid2 = {'n_estimators': range(1000,2000,300)}


GBR_test=GradientBoostingRegressor(random_state=22)

grid_search2 = GridSearchCV(estimator = GBR_test, param_grid = param_grid2,


cv = 5, n_jobs = 3, verbose = 1)
grid_search2.fit(X_train,y_train)

Fitting 5 folds for each of 4 candidates, totalling 20 fits

[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.


[Parallel(n_jobs=3)]: Done 20 out of 20 | elapsed: 4.1min finished

Out[195]:

GridSearchCV(cv=5, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=3,
param_grid={'n_estimators': range(1000, 2000, 300)},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)

In [196]:

grid_search2.best_params_, grid_search2.best_score_

Out[196]:
({'n_estimators': 1000}, 0.7885965739886799)

n_estimators of 1000 is giving best result in range 400 to 1000

In [197]:

param_grid3 = {
'learning_rate': [0.1,0.2],
'min_samples_leaf': [5,10,20],
'min_samples_split': [5,10,20],
'n_estimators': [500,1000],
}

In [198]:

GBR_test=GradientBoostingRegressor(random_state=22)

grid_search3 = GridSearchCV(estimator = GBR_test, param_grid = param_grid3,


cv = 5, n_jobs = 3, verbose = 1)
grid_search3.fit(X_train,y_train)

Fitting 5 folds for each of 36 candidates, totalling 180 fits

[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.


[Parallel(n_jobs=3)]: Done 44 tasks | elapsed: 5.1min
[Parallel(n_jobs=3)]: Done 180 out of 180 | elapsed: 20.3min finished

Out[198]:

GridSearchCV(cv=5, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=3,
param_grid={'learning_rate': [0.1, 0.2], 'min_samples_leaf': [5, 10, 20], 'min_samples_split'
: [5, 10, 20], 'n_estimators': [500, 1000]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)
In [199]:

grid_search3.best_params_, grid_search3.best_score_

Out[199]:

({'learning_rate': 0.1,
'min_samples_leaf': 10,
'min_samples_split': 5,
'n_estimators': 1000},
0.7880978276736184)

In combination of 4 parameters above values are giving best result. We can see n_estimators of 1000 is best again. Now, will change the ranges
of other 3 parameters

In [200]:

param_grid4 = {
'learning_rate': [0.1,0.15],
'max_depth': [5,10],
'min_samples_leaf': [5,8],
'min_samples_split': [20,30],
'n_estimators': [1000],
}

In [201]:

GBR_test=GradientBoostingRegressor(random_state=22)

grid_search4 = GridSearchCV(estimator = GBR_test, param_grid = param_grid4,


cv = 5, n_jobs = 3, verbose = 1)
grid_search4.fit(X_train,y_train)

Fitting 5 folds for each of 16 candidates, totalling 80 fits

[Parallel(n_jobs=3)]: Using backend LokyBackend with 3 concurrent workers.


[Parallel(n_jobs=3)]: Done 44 tasks | elapsed: 23.3min
[Parallel(n_jobs=3)]: Done 80 out of 80 | elapsed: 45.2min finished

Out[201]:

GridSearchCV(cv=5, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=3,
param_grid={'learning_rate': [0.1, 0.15], 'max_depth': [5, 10], 'min_samples_leaf': [5, 8], '
min_samples_split': [20, 30], 'n_estimators': [1000]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)

In [202]:

grid_search4.best_params_, grid_search4.best_score_

Out[202]:

({'learning_rate': 0.1,
'max_depth': 5,
'min_samples_leaf': 8,
'min_samples_split': 20,
'n_estimators': 1000},
0.7821899364744039)

Now the score has reduced compared to earlier run


In [203]:

param_grid5 = {
'learning_rate': [0.1],
'max_depth': [5],
'min_samples_leaf': [8,10],
'min_samples_split': [30,40],
'n_estimators': [1000],
}

GBR_test=GradientBoostingRegressor(random_state=22)

grid_search5 = GridSearchCV(estimator = GBR_test, param_grid = param_grid5,


cv = 5, n_jobs = 2, verbose = 1)
grid_search5.fit(X_train,y_train)

Fitting 5 folds for each of 4 candidates, totalling 20 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 20 out of 20 | elapsed: 7.6min finished

Out[203]:

GridSearchCV(cv=5, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'learning_rate': [0.1], 'max_depth': [5], 'min_samples_leaf': [8, 10], 'min_sampl
es_split': [30, 40], 'n_estimators': [1000]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)

In [204]:

grid_search5.best_params_, grid_search5.best_score_

Out[204]:

({'learning_rate': 0.1,
'max_depth': 5,
'min_samples_leaf': 10,
'min_samples_split': 40,
'n_estimators': 1000},
0.7844535606632613)

Above score has improved from earlier runs


In [205]:

param_grid6 = {
'learning_rate': [0.1],
'max_depth': [5],
'min_samples_leaf': [8],
'min_samples_split': [40,50],
'n_estimators': [1000],
}

GBR_test=GradientBoostingRegressor(random_state=22)

grid_search6 = GridSearchCV(estimator = GBR_test, param_grid = param_grid6,


cv = 5, n_jobs = 2, verbose = 1)
grid_search6.fit(X_train,y_train)

Fitting 5 folds for each of 2 candidates, totalling 10 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 3.6min finished

Out[205]:

GridSearchCV(cv=5, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'learning_rate': [0.1], 'max_depth': [5], 'min_samples_leaf': [8], 'min_samples_s
plit': [40, 50], 'n_estimators': [1000]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)

In [206]:

grid_search6.best_params_, grid_search6.best_score_

Out[206]:
({'learning_rate': 0.1,
'max_depth': 5,
'min_samples_leaf': 8,
'min_samples_split': 50,
'n_estimators': 1000},
0.7828068526559553)

There is very marginal improvment in score. We are getting best score at min_samples_split of 40 among 30,40,50.

Will tune the final set of parameters along with above finalized ones
In [207]:

param_grid7 = {
'loss':['ls','lad','huber'],
'max_features': ['auto','sqrt'],
'learning_rate': [0.1],
'max_depth': [5],
'min_samples_leaf': [8],
'min_samples_split': [40],
'n_estimators': [1000],
'subsample':[0.8,1]
}

GBR_test=GradientBoostingRegressor(random_state=22)

grid_search7 = GridSearchCV(estimator = GBR_test, param_grid = param_grid7,


cv = 5, n_jobs = 2, verbose = 1)
grid_search7.fit(X_train,y_train)

Fitting 5 folds for each of 12 candidates, totalling 60 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 12.2min
[Parallel(n_jobs=2)]: Done 60 out of 60 | elapsed: 14.8min finished

Out[207]:

GridSearchCV(cv=5, error_score='raise-deprecating',
estimator=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
learning_rate=0.1, loss='ls', max_depth=3, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_sampl...te=22, subsample=1.0, tol=0.0001,
validation_fraction=0.1, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'loss': ['ls', 'lad', 'huber'], 'max_features': ['auto', 'sqrt'], 'learning_rate'
: [0.1], 'max_depth': [5], 'min_samples_leaf': [8], 'min_samples_split': [40], 'n_estimators': [1000
], 'subsample': [0.8, 1]},
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring=None, verbose=1)

In [208]:

grid_search7.best_params_, grid_search7.best_score_

Out[208]:
({'learning_rate': 0.1,
'loss': 'huber',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 8,
'min_samples_split': 40,
'n_estimators': 1000,
'subsample': 1},
0.7965973506104334)

There is improvement in the score. will try one more iteration with changing other parameters
In [209]:

param_gridF = {
'loss':['huber'],
'max_features': ['sqrt'],
'learning_rate': [0.1,0.2],
'max_depth': [5,8],
'min_samples_leaf': [5],
'min_samples_split': [40,50],
'n_estimators': [1000],
'subsample':[1]
}

GBR_test=GradientBoostingRegressor(random_state=22)

grid_searchF = GridSearchCV(estimator = GBR_test, param_grid = param_gridF,


cv = 5, n_jobs = 2, verbose = 1)
grid_searchF.fit(X_train,y_train)
grid_searchF.best_params_,grid_searchF.best_score_

Fitting 5 folds for each of 8 candidates, totalling 40 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 40 out of 40 | elapsed: 6.0min finished

Out[209]:

({'learning_rate': 0.1,
'loss': 'huber',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 5,
'min_samples_split': 40,
'n_estimators': 1000,
'subsample': 1},
0.7958994895003749)

The above iteration gives best result of 0.799.


Final parameters that are giving best result on training set are:

'learning_rate': 0.1, 'loss': 'huber', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 50, 'n_estimators': 1000,
'subsample': 1 </b>

Hypertuning using graph


In [210]:

min_samples_leafs = range(1, 15, 1)


train_results = []
val_results = []
for min_samples_leaf in min_samples_leafs:
GBR_test=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=40,
min_samples_leaf=min_samples_leaf,
max_depth=5,
random_state=22,
alpha=0.9,
)
GBR_test.fit(X_train,y_train)
y_GBR_predtr= GBR_test.predict(X_train)
y_GBR_predvl= GBR_test.predict(X_val)

result_leafs_tr=r2_score(y_GBR_predtr,y_train)
train_results.append(result_leafs_tr)
result_leafs_vl=r2_score(y_GBR_predvl,y_val)
val_results.append(result_leafs_vl)

from matplotlib.legend_handler import HandlerLine2D


line1, = plt.plot(min_samples_leafs,train_results,"b", label='Train r2')
line2, = plt.plot(min_samples_leafs, val_results,"r", label='val r2')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel("r2 score")
plt.xlabel("min samples leaf")
plt.show()

From above, min_samples_leaf of 6 is giving best score


In [211]:

min_samples_splits = [10,15,30,50,100,500,700,1000]
train_results_spt = []
val_results_spt = []
for min_samples_split in min_samples_splits:
GBR_test=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=min_samples_split,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)
GBR_test.fit(X_train,y_train)
y_GBR_predtr= GBR_test.predict(X_train)
y_GBR_predvl= GBR_test.predict(X_val)

result_spt_tr=r2_score(y_GBR_predtr,y_train)
train_results_spt.append(result_spt_tr)
result_spt_vl=r2_score(y_GBR_predvl,y_val)
val_results_spt.append(result_spt_vl)

from matplotlib.legend_handler import HandlerLine2D


line1, = plt.plot(min_samples_splits,train_results_spt,"b", label='Train R2')
line2, = plt.plot(min_samples_splits, val_results_spt,"r", label='Val R2')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel("R2 score")
plt.xlabel("min samples split")
plt.show()

From above, min_samples_splits of about 10 is giving best score. Will try expanding the range around 10
In [212]:

min_samples_splits = [10,15,20,30,40,50,60,70,80,90,100]
train_results_spt = []
val_results_spt = []
for min_samples_split in min_samples_splits:
GBR_test=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=min_samples_split,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)
GBR_test.fit(X_train,y_train)
y_GBR_predtr= GBR_test.predict(X_train)
y_GBR_predvl= GBR_test.predict(X_val)

result_spt_tr=r2_score(y_GBR_predtr,y_train)
train_results_spt.append(result_spt_tr)
result_spt_vl=r2_score(y_GBR_predvl,y_val)
val_results_spt.append(result_spt_vl)

from matplotlib.legend_handler import HandlerLine2D


line1, = plt.plot(min_samples_splits,train_results_spt,"b", label='Train R2')
line2, = plt.plot(min_samples_splits, val_results_spt,"r", label='Val R2')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel("R2 score")
plt.xlabel("min samples split")
plt.show()

From above, min_samples_splits of about 10 is giving best score


In [213]:

min_samples_splits = [7,8,9,10,11,12,13,14,15,20]
train_results_spt = []
val_results_spt = []
for min_samples_split in min_samples_splits:
GBR_test=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=min_samples_split,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)
GBR_test.fit(X_train,y_train)
y_GBR_predtr= GBR_test.predict(X_train)
y_GBR_predvl= GBR_test.predict(X_val)

result_spt_tr=r2_score(y_GBR_predtr,y_train)
train_results_spt.append(result_spt_tr)
result_spt_vl=r2_score(y_GBR_predvl,y_val)
val_results_spt.append(result_spt_vl)

from matplotlib.legend_handler import HandlerLine2D


line1, = plt.plot(min_samples_splits,train_results_spt,"b", label='Train R2')
line2, = plt.plot(min_samples_splits, val_results_spt,"r", label='Val R2')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel("R2 score")
plt.xlabel("min samples split")
plt.show()

From above, min_samples_splits of about 12 is giving best score


In [214]:

max_depths = range(3,11,1)
train_results_dpt = []
val_results_dpt = []
for max_depth in max_depths:
GBR_test=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=10,
min_samples_leaf=6,
max_depth=max_depth,
random_state=22,
alpha=0.9,
)
GBR_test.fit(X_train,y_train)
y_GBR_predtr= GBR_test.predict(X_train)
y_GBR_predvl= GBR_test.predict(X_val)

result_dpt_tr=r2_score(y_GBR_predtr,y_train)
train_results_dpt.append(result_dpt_tr)
result_dpt_vl=r2_score(y_GBR_predvl,y_val)
val_results_dpt.append(result_dpt_vl)

from matplotlib.legend_handler import HandlerLine2D


line1, = plt.plot(max_depths,train_results_dpt,"b", label='Train R2')
line2, = plt.plot(max_depths, val_results_dpt,"r", label='Val R2')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel("R2 score")
plt.xlabel("max depth")
plt.show()

From above, max_depth of about 6 is giving best score for validation set and not overfitting of training set
In [215]:

estimators = range(100,1500,100)
train_results_est = []
val_results_est = []
for n_estimators in estimators:
GBR_test=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=n_estimators,
subsample=1.0,
min_samples_split=30,
min_samples_leaf=6,
max_depth=9,
random_state=22,
alpha=0.9,
)
GBR_test.fit(X_train,y_train)
y_GBR_predtr= GBR_test.predict(X_train)
y_GBR_predvl= GBR_test.predict(X_val)

result_est_tr=r2_score(y_GBR_predtr,y_train)
train_results_est.append(result_est_tr)
result_est_vl=r2_score(y_GBR_predvl,y_val)
val_results_est.append(result_est_vl)

from matplotlib.legend_handler import HandlerLine2D


line1, = plt.plot(estimators,train_results_est,"b", label='Train R2')
line2, = plt.plot(estimators, val_results_est,"r", label='Val R2')
plt.legend(handler_map={line1: HandlerLine2D(numpoints=2)})
plt.ylabel("R2 score")
plt.xlabel("n_estimators")
plt.show()

From above, n_estimators of about 1000 is giving best score

In [217]:
param_gridF = {
'loss':['huber'],
'max_features': ['sqrt'],
'learning_rate': [0.1],
'max_depth': [6],
'min_samples_leaf': [6],
'min_samples_split': [12],
'n_estimators': [1000],
'subsample':[1]
}

GBR_test=GradientBoostingRegressor(random_state=22)

grid_searchF = GridSearchCV(estimator = GBR_test, param_grid = param_gridF,


cv = 5, n_jobs = 2, verbose = 1)
grid_searchF.fit(X_train,y_train)
grid_searchF.best_score_

Fitting 5 folds for each of 1 candidates, totalling 5 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 5 out of 5 | elapsed: 58.4s finished

Out[217]:

0.7934419703161365
In [218]:

param_gridF = {
'loss':['huber'],
'max_features': ['sqrt'],
'learning_rate': [0.1],
'max_depth': [5],
'min_samples_leaf': [5],
'min_samples_split': [50],
'n_estimators': [1000],
'subsample':[1]
}

GBR_test=GradientBoostingRegressor(random_state=22)

grid_searchF = GridSearchCV(estimator = GBR_test, param_grid = param_gridF,


cv = 5, n_jobs = 2, verbose = 1)
grid_searchF.fit(X_train,y_train)
grid_searchF.best_score_,grid_searchF.best_params_

Fitting 5 folds for each of 1 candidates, totalling 5 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 5 out of 5 | elapsed: 35.2s finished

Out[218]:

(0.7928868850462906,
{'learning_rate': 0.1,
'loss': 'huber',
'max_depth': 5,
'max_features': 'sqrt',
'min_samples_leaf': 5,
'min_samples_split': 50,
'n_estimators': 1000,
'subsample': 1})

We can conclude from above that gridsearch CV is giving better results compared to that of tuning done by graphical method of individual
parameters

Final parameters that are giving best result on training set are:

'learning_rate': 0.1, 'loss': 'huber', 'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 50, 'n_estimators': 1000,
'subsample': 1 </b>

CONFIDENCE INTERVAL
In [219]:

GBR_bestparam=GradientBoostingRegressor(
loss='huber',
learning_rate=0.1,
n_estimators=1000,
subsample=1.0,
min_samples_split=50,
min_samples_leaf=5,
max_depth=5,
random_state=22,
alpha=0.9,
)
GBR_bestparam.fit(X_train,y_train)
y_GBRF_predtr= GBR_bestparam.predict(X_train)
y_GBRF_predvl= GBR_bestparam.predict(X_val)
y_GBRF_predts= GBR_bestparam.predict(X_test)
In [220]:

#Model score and Deduction for each Model in a DataFrame


GBRF_trscore=r2_score(y_train,y_GBRF_predtr)
GBRF_trRMSE=np.sqrt(mean_squared_error(y_train, y_GBRF_predtr))
GBRF_trMSE=mean_squared_error(y_train, y_GBRF_predtr)
GBRF_trMAE=mean_absolute_error(y_train, y_GBRF_predtr)

GBRF_vlscore=r2_score(y_val,y_GBRF_predvl)
GBRF_vlRMSE=np.sqrt(mean_squared_error(y_val, y_GBRF_predvl))
GBRF_vlMSE=mean_squared_error(y_val, y_GBRF_predvl)
GBRF_vlMAE=mean_absolute_error(y_val, y_GBRF_predvl)

GBRF_tsscore=r2_score(y_test,y_GBRF_predts)
GBRF_tsRMSE=np.sqrt(mean_squared_error(y_test, y_GBRF_predts))
GBRF_tsMSE=mean_squared_error(y_test, y_GBRF_predts)
GBRF_tsMAE=mean_absolute_error(y_test, y_GBRF_predts)

GBRF_df=pd.DataFrame({'Method':['GBRF'],'Val Score':GBRF_vlscore,'RMSE_vl': GBRF_vlRMSE, 'MSE_vl': GBRF_vlMSE,'tr


ain Score':GBRF_trscore,'RMSE_tr': GBRF_trRMSE, 'MSE_tr': GBRF_trMSE,'test Score':GBRF_tsscore,'RMSE_ts': GBRF_ts
RMSE, 'MSE_ts': GBRF_tsMSE})

GBRF_df

Out[220]:

Method Val Score RMSE_vl MSE_vl train Score RMSE_tr MSE_tr test Score RMSE_ts MSE_ts

0 GBRF 0.80096 115867.988855 1.342539e+10 0.898909 81372.879729 6.621546e+09 0.793584 114695.310542 1.315501e+10

In [221]:

from sklearn.model_selection import KFold


from sklearn.model_selection import cross_val_score

num_folds = 50
seed = 7

kfold = KFold(n_splits=num_folds, random_state=seed)


model = GradientBoostingRegressor(n_estimators = 200, learning_rate = 0.1, random_state=22)
results = cross_val_score(GBR_bestparam, X, y, cv=kfold)
print(results)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

[0.86054651 0.81529 0.80351765 0.86060958 0.79642892 0.85548539


0.78098527 0.77925365 0.81822936 0.82102096 0.87499995 0.81800409
0.81853737 0.82096864 0.82206478 0.85415595 0.7952127 0.77879311
0.85529758 0.83972439 0.76258618 0.80910137 0.80208101 0.82664724
0.7825543 0.8601369 0.77441922 0.78867005 0.84107987 0.79025948
0.84773597 0.76865873 0.78487112 0.80018574 0.82324413 0.82243794
0.74048912 0.82370621 0.82606705 0.83661657 0.79192532 0.8126131
0.79097264 0.81741328 0.76640402 0.77512715 0.78013298 0.7859921
0.73054971 0.76721522]
Accuracy: 80.798% (3.241%)
In [222]:

from matplotlib import pyplot


# plot scores
pyplot.hist(results)
pyplot.show()
# confidence intervals
alpha = 0.95 # for 95% confidence
p = ((1.0-alpha)/2.0) * 100 # tail regions on right and left .25 on each side indicated by P value (
border)
lower = max(0.0, np.percentile(results, p))
p = (alpha+((1.0-alpha)/2.0)) * 100
upper = min(1.0, np.percentile(results, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))

95.0 confidence interval 74.5% and 86.1%

Dataset-1 Final summary:


The ensemble models have performed well compared to that of linear,KNN,SVR models
The best performance is given by Gradient boosting model with training (score-0.89,RMSE-81372), Validation (score-0.80,RSME-115867),
Testing(score-0.79,RMSE-114695) The 95% confidence interval scores range from 0.72 to 0.85.
The top key features that drive the price of the property are: 'furnished_1', 'yr_built', 'living_measure','quality_8', 'HouseLandRatio',
'lot_measure15', 'quality_9', 'ceil_measure', 'total_area'.
The above data is also reinforced by the analysis done during bivariate analysis.

For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models.
</b>

Dataset-2
In [2]:

import geopandas as gpd


from shapely.geometry import Point, Polygon
#For current working directory
import os
cwd = os.getcwd()

In [224]:

## Need to add file USA ZipCodes_1.xlsx to current working directory to access this data
USAZip=pd.read_excel("USA ZipCodes_1.xlsx",sheet_name="Sheet8")
USAZip.head()

Out[224]:

zipcode City County Type

0 98001 Auburn King Standard

1 98002 Auburn King Standard

2 98003 Federal Way King Standard

3 98004 Bellevue King Standard

4 98005 Bellevue King Standard

In [239]:

house_df = pd.read_csv('innercity.csv')
In [240]:

house_df1=house_df.merge(USAZip,how='left',on='zipcode')
#house_df.drop_duplicates()

house_df.shape

Out[240]:

(21613, 23)

In [5]:

#Add the folder WA to your current working directory


usa = gpd.read_file(cwd+'\\WA\\WSDOT__City_Limits.shp')
usa.head()
gdf = gpd.GeoDataFrame(
house_df,geometry = [Point(xy) for xy in zip(house_df['long'], house_df['lat'])])
#We can now plot our ``GeoDataFrame``
ax=usa[usa.CityName.isin(house_df.City.unique())].plot(
color='white', edgecolor='black',figsize=(20,8))
plt.figure(figsize=(15,15))
gdf.plot(ax=ax, color='green', marker='o',markersize=0.1)

Out[5]:

<matplotlib.axes._subplots.AxesSubplot at 0x1ccf1142588>

<Figure size 1080x1080 with 0 Axes>

In [241]:

#After analysis in p1 - Dropping 'cid','dayhours','basement','yr_built','yr_renovated','zipcode','lat','long','Co


unty','Type',
#'geometry','quality_group','month_year' columns.
cols=['cid','dayhours']
house_df_1=house_df.drop(cols, inplace = False, axis = 1)
The dataset worked earlier are giving r2 score on validation set in range 70%-75% with RMSE in range of 96000 to 155000. Trying with a
different dataset to see if this could be improved further.

For analysis in this iteration categorizing coast, furnished and quality. As in previous version tranformed many features but not got desired
result.

TREATING OUTLIERS
Removing data points which fall into below criteria:

1. living_measure greater than 9000


2. price greater than 4000000
3. romm_bed greater than 10
4. room_bath greater than 6

We have lost 20 records which is 0.09% of the data available. These records are extreme values for which we dont have much of data to provide
their better estimate. Hence removing them.

In [242]:

house_df_2=house_df_1[(house_df['living_measure']<=9000) & (house_df_1['price']<=4000000) &


(house_df_1['room_bed']<=10) & (house_df_1['room_bath']<=6) ]
house_df_2.shape

Out[242]:

(21593, 21)

In [243]:
house_df_2.columns

Out[243]:

Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',


'ceil', 'coast', 'sight', 'condition', 'quality', 'ceil_measure',
'basement', 'yr_built', 'yr_renovated', 'zipcode', 'lat', 'long',
'living_measure15', 'lot_measure15', 'furnished', 'total_area'],
dtype='object')

In [252]:

# Convert into dummies


house_df_final = pd.get_dummies(house_df_2, columns=['coast', 'quality', 'furnished'],drop_first=True)

In [253]:

house_df_final.columns

Out[253]:

Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',


'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built',
'yr_renovated', 'zipcode', 'lat', 'long', 'living_measure15',
'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4',
'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'],
dtype='object')

In [254]:

house_df_final.shape

Out[254]:

(21593, 31)

In [268]:

#Final Data columns


house_df_final.columns

Out[268]:
Index(['price', 'room_bed', 'room_bath', 'living_measure', 'lot_measure',
'ceil', 'sight', 'condition', 'ceil_measure', 'basement', 'yr_built',
'yr_renovated', 'zipcode', 'lat', 'long', 'living_measure15',
'lot_measure15', 'total_area', 'coast_1', 'quality_3', 'quality_4',
'quality_5', 'quality_6', 'quality_7', 'quality_8', 'quality_9',
'quality_10', 'quality_11', 'quality_12', 'quality_13', 'furnished_1'],
dtype='object')
Shows the Data Correlation between Attributes with Heatmap

In [256]:

#total_area is highly correlated with lot_measure, ceil_measure is highly correlated with living_measure
house_corr_2 = house_df_final.corr(method ='pearson')
house_corr_2.to_excel("house_corr_2.xls")

plt.figure(figsize=(35,20))
sns.heatmap(house_corr_2,cmap="coolwarm", annot=True,annot_kws={"size":9},fmt='.2')

Out[256]:

<matplotlib.axes._subplots.AxesSubplot at 0x225943454a8>

In [257]:

#creating a copy of the final dataframe


dff2=house_df_final.copy()

In [258]:

df_train, df_test = train_test_split(dff2, test_size=0.2, random_state=10)


df_train, df_val = train_test_split(df_train, test_size=0.2, random_state=10)

In [259]:

print(df_train.shape)
print(df_test.shape)
print(df_val.shape)

(13819, 31)
(4319, 31)
(3455, 31)
In [260]:

# Split the 'df_train' set into X and y


X_train2 = df_train.drop(['price'],axis=1)
y_train2 = df_train['price']
len_train=len(X_train2)
X_train2.shape
y_train2.head()

Out[260]:

1320 330000
16628 245000
2923 369000
15818 532000
4665 506400
Name: price, dtype: int64

In [261]:

# Split the 'df_val' set into X and y


X_val2 = df_val.drop(['price'],axis=1)
y_val2 = df_val['price']
len_val=len(X_val2)
X_val2.shape
y_val2.head()

Out[261]:

6030 225000
16781 373500
17420 325000
4147 260000
17992 233000
Name: price, dtype: int64

In [262]:
# Split the 'df_test' set into X and y
X_test2 = df_test.drop(['price'],axis=1)
y_test2 = df_test['price']
X_test2.shape
len_test=len(X_test2)
y_test2.head()

Out[262]:

19155 510000
10450 264500
14277 266000
7601 735000
6563 600000
Name: price, dtype: int64

Will use XGboost model apart from models that used earlier on dataset-1

Creating Dataframe for Results and Function to compute the scores for each model on its Train and Validation
datasets

In [24]:

#Creating empty dataframe to capture results


result_dff=pd.DataFrame()
In [25]:

#Function to give results of the models for its train and validation dataset.
#as input it requries model name to display, algorithm, train indepedent variables, train dependent variable,
#validation indepedent variables, validation dependent variable.
def result (model,pipe_model,X_train_set,y_train_set,X_val_set,y_val_set):
pipe_model.fit(X_train_set,y_train_set)
#predicting result over test data
y_train_predict= pipe_model.predict(X_train_set)
y_val_predict= pipe_model.predict(X_val_set)

trscore=r2_score(y_train_set,y_train_predict)
trRMSE=np.sqrt(mean_squared_error(y_train_set,y_train_predict))
trMSE=mean_squared_error(y_train_set,y_train_predict)
trMAE=mean_absolute_error(y_train_set,y_train_predict)

vlscore=r2_score(y_val,y_val_predict)
vlRMSE=np.sqrt(mean_squared_error(y_val,y_val_predict))
vlMSE=mean_squared_error(y_val,y_val_predict)
vlMAE=mean_absolute_error(y_val,y_val_predict)
result_df=pd.DataFrame({'Method':[model],'val score':vlscore,'RMSE_val':vlRMSE,'MSE_val':vlMSE,'MAE_vl': vlMA
E,
'train Score':trscore,'RMSE_tr': trRMSE,'MSE_tr': trMSE, 'MAE_tr': trMAE})
#Plot between actual and predicted values
plt.figure(figsize=(18,10))
sns.lineplot(range(len(y_val_set)),y_val_set,color='blue',linewidth=1.5)
sns.lineplot(range(len(y_val_set)),y_val_predict,color='hotpink',linewidth=.5)
plt.title('Actual and Predicted', fontsize=20) # Plot heading
plt.xlabel('Index', fontsize=10) # X-label
plt.ylabel('Values', fontsize=10) # Y-label

return result_df

LINEAR REGRESSION

In [26]:

#Starting with RFE first as there are many features


from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
In [27]:

clf=LinearRegression()
pipe_lr = Pipeline([('LR', clf)])
result_dff=pd.concat([result_dff,result('Linear Reg',pipe_lr,X_train,y_train,X_val,y_val)])
result_dff

Out[27]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

In [28]:

#checking the magnitude of coefficients


predictors = X_train.columns
coef = pd.Series(clf.coef_,predictors).sort_values()
coef.plot(kind='bar', title='Model Coefficients',color='darkblue',figsize=(10,5))

Out[28]:

<matplotlib.axes._subplots.AxesSubplot at 0x1ccf527d438>

RIDGE REGRESSION
In [29]:

from sklearn.linear_model import Ridge


from sklearn.pipeline import Pipeline
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler

In [30]:

clf=Ridge()
pipe_ridge = Pipeline([('Ridge', clf)])
result_dff=pd.concat([result_dff,result('Ridge_Reg_1',pipe_ridge,X_train,y_train,X_val,y_val)])
result_dff

Out[30]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158


In [31]:

#checking the magnitude of coefficients


predictors = X_train.columns
coef = pd.Series(clf.coef_,predictors).sort_values()
coef.plot(kind='bar', title='Model Coefficients',color='darkblue',figsize=(10,5))

Out[31]:

<matplotlib.axes._subplots.AxesSubplot at 0x1ccf5c6cd30>

In [32]:

#Iteration 2
clf=Ridge(alpha=0.08)
pipe_ridge_1 = Pipeline([('Ridge',clf )])
result_dff=pd.concat([result_dff,result('Ridge_Reg_2',pipe_ridge_1,X_train,y_train,X_val,y_val)])
result_dff

Out[32]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012


In [33]:

#checking the magnitude of coefficients


predictors = X_train.columns
coef = pd.Series(clf.coef_,predictors).sort_values()
coef.plot(kind='bar', title='Model Coefficients',color='darkblue',figsize=(10,5))

Out[33]:

<matplotlib.axes._subplots.AxesSubplot at 0x1ccf5d78358>

LASSO REGRESSION

In [34]:

from sklearn.linear_model import Lasso


In [35]:

clf=Lasso(alpha=10, max_iter=1000)
pipe_lasso_1 = Pipeline([('Lasso',clf )])
result_dff=pd.concat([result_dff,result('Lasso_Reg_1',pipe_lasso_1,X_train,y_train,X_val,y_val)])
result_dff

Out[35]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034


In [36]:

#checking the magnitude of coefficients


predictors = X_train.columns
coef = pd.Series(clf.coef_,predictors).sort_values(ascending=False)
coef

Out[36]:

quality_13 1282757.93547
quality_12 720634.25526
lat 603385.77684
coast_1 515951.63187
furnished_1 356060.28711
quality_11 254292.72718
quality_8 51062.02158
sight 48526.08682
quality_3 47977.01520
room_bath 44364.15660
condition 35706.89861
ceil 28507.08484
living_measure 126.78842
living_measure15 33.97555
yr_renovated 23.50688
total_area 0.35066
quality_10 -0.00000
lot_measure -0.19029
lot_measure15 -0.29272
basement -8.54716
ceil_measure -15.67109
zipcode -512.28283
yr_built -2269.56051
quality_7 -16734.79835
room_bed -18988.08235
quality_6 -63515.14160
quality_4 -89548.16152
quality_5 -97142.30729
long -172566.03480
quality_9 -177720.13306
dtype: float64

KNN Regressor
In [37]:

from sklearn.neighbors import KNeighborsRegressor

pipe_knr = Pipeline([('KNNR', KNeighborsRegressor(n_neighbors=20,weights='distance'))])


result_dff=pd.concat([result_dff,result('KNN Reg',pipe_knr,X_train,y_train,X_val,y_val)])
result_dff

Out[37]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

Support Vector Regressor

In [38]:

#The model is not performing well at all.


#from sklearn.svm import SVR
#from sklearn.preprocessing import StandardScaler

#pipe_svr_1 = Pipeline([('scl', StandardScaler()),('SVR_1', SVR(kernel='rbf'))])


#result_dff=pd.concat([result_dff,result('SVR_1',pipe_svr_1,X_train_rfe,y_train,X_val_rfe,y_val)])
#result_dff

DECISION TREE

In [39]:

#Feature importance function


def feat_imp(model,X_data_set):
imp_feature_1=pd.DataFrame(model.feature_importances_, columns = ["Imp"], index = X_data_set.columns)
imp_feature_1=imp_feature_1.sort_values(by="Imp",ascending=False)
print(imp_feature_1)

#feature importance
plt.figure(figsize=(10,10))
imp_feature_1[:30].plot.bar(figsize=(15,5))

#First 20 and 30 feature importance sum


print("\nFirst 8 feature importance:\t",(imp_feature_1[:8].sum())*100)
print("\nFirst 12 feature importance:\t",(imp_feature_1[:12].sum())*100)
In [40]:

#Import library
from sklearn.tree import DecisionTreeRegressor

clf=DecisionTreeRegressor(random_state=1)
pipe_DT_1=Pipeline([('DT1',clf)])
result_dff=pd.concat([result_dff,result('DT1',pipe_DT_1,X_train,y_train,X_val,y_val)])
result_dff

Out[40]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

In [41]:

#Feature importance
feat_imp(clf,X_train)
Imp
furnished_1 0.33440
living_measure 0.19412
lat 0.17853
long 0.06748
coast_1 0.03510
ceil_measure 0.03389
yr_built 0.03233
living_measure15 0.03192
lot_measure 0.01480
zipcode 0.01341
lot_measure15 0.01192
total_area 0.00832
quality_9 0.00781
room_bath 0.00697
sight 0.00633
quality_8 0.00496
basement 0.00436
condition 0.00266
quality_12 0.00247
quality_10 0.00206
room_bed 0.00199
ceil 0.00180
yr_renovated 0.00080
quality_13 0.00048
quality_11 0.00044
quality_7 0.00030
quality_6 0.00026
quality_5 0.00008
quality_4 0.00000
quality_3 0.00000

First 8 feature importance: Imp 90.77687


dtype: float64

First 12 feature importance: Imp 95.62215


dtype: float64

<Figure size 720x720 with 0 Axes>

RANDOM FOREST REGRESSOR

In [42]:
from sklearn.ensemble import RandomForestRegressor
In [43]:

clf=RandomForestRegressor(random_state=2)
pipe_RF_1=Pipeline([('RF1',clf)])
result_dff=pd.concat([result_dff,result('RF1',pipe_RF_1,X_train,y_train,X_val,y_val)])
result_dff

Out[43]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

In [44]:

#Feature importance
feat_imp(clf,X_train)
Imp
furnished_1 0.30826
living_measure 0.23477
lat 0.17234
long 0.06825
living_measure15 0.03089
yr_built 0.02564
coast_1 0.02493
sight 0.01985
ceil_measure 0.01696
zipcode 0.01531
lot_measure15 0.01387
quality_9 0.01243
total_area 0.01047
lot_measure 0.00850
room_bath 0.00705
basement 0.00688
quality_8 0.00417
room_bed 0.00380
condition 0.00321
quality_12 0.00262
yr_renovated 0.00247
ceil 0.00221
quality_11 0.00169
quality_10 0.00148
quality_13 0.00096
quality_7 0.00063
quality_6 0.00030
quality_5 0.00005
quality_4 0.00001
quality_3 0.00000

First 8 feature importance: Imp 88.49277


dtype: float64

First 12 feature importance: Imp 94.34987


dtype: float64

<Figure size 720x720 with 0 Axes>


In [45]:

clf=RandomForestRegressor(n_estimators=50,max_depth=18,min_samples_leaf=10,random_state=3)
pipe_RF_2=Pipeline([('RF2',clf)])
result_dff=pd.concat([result_dff,result('RF2',pipe_RF_2,X_train,y_train,X_val,y_val)])
result_dff

Out[45]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

In [46]:

#Feature importance
feat_imp(clf,X_train)
Imp
furnished_1 0.34209
living_measure 0.25693
lat 0.18194
long 0.07106
living_measure15 0.02514
yr_built 0.02336
sight 0.01984
ceil_measure 0.01841
zipcode 0.01135
quality_9 0.00908
coast_1 0.00864
lot_measure15 0.00801
total_area 0.00561
quality_8 0.00449
lot_measure 0.00336
room_bath 0.00277
basement 0.00172
quality_12 0.00139
condition 0.00123
quality_11 0.00095
room_bed 0.00073
quality_10 0.00073
quality_7 0.00044
ceil 0.00036
yr_renovated 0.00022
quality_6 0.00017
quality_5 0.00001
quality_4 0.00000
quality_3 0.00000
quality_13 0.00000

First 8 feature importance: Imp 93.87443


dtype: float64

First 12 feature importance: Imp 97.58247


dtype: float64

<Figure size 720x720 with 0 Axes>

Gradient Boost Regressor


In [47]:

from sklearn.ensemble import GradientBoostingRegressor

clf=GradientBoostingRegressor(random_state=4)
pipe_GB_1=Pipeline([('GB1',clf)])
result_dff=pd.concat([result_dff,result('GB1',pipe_GB_1,X_train,y_train,X_val,y_val)])
result_dff

Out[47]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

In [48]:

#Feature importance
feat_imp(clf,X_train)
Imp
living_measure 0.32718
furnished_1 0.21738
lat 0.17507
long 0.06494
living_measure15 0.03217
coast_1 0.03081
yr_built 0.03081
sight 0.02848
zipcode 0.01718
quality_9 0.01411
ceil_measure 0.01139
quality_12 0.00933
quality_8 0.00850
room_bath 0.00848
quality_11 0.00673
quality_13 0.00363
lot_measure15 0.00331
condition 0.00300
basement 0.00221
total_area 0.00147
yr_renovated 0.00103
lot_measure 0.00079
quality_7 0.00052
ceil 0.00048
quality_10 0.00046
room_bed 0.00037
quality_6 0.00017
quality_3 0.00000
quality_4 0.00000
quality_5 0.00000

First 8 feature importance: Imp 90.68436


dtype: float64

First 12 feature importance: Imp 95.88511


dtype: float64

<Figure size 720x720 with 0 Axes>


In [49]:

clf=GradientBoostingRegressor(n_estimators=150,max_depth=5,random_state=5)
pipe_GB_2=Pipeline([('GB2',clf)])
result_dff=pd.concat([result_dff,result('GB2',pipe_GB_2,X_train,y_train,X_val,y_val)])
result_dff

Out[49]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

In [50]:

#Feature importance
feat_imp(clf,X_train)
Imp
living_measure 0.28697
furnished_1 0.22921
lat 0.17826
long 0.07063
living_measure15 0.04054
yr_built 0.03118
coast_1 0.03031
quality_9 0.02114
sight 0.02033
zipcode 0.01644
ceil_measure 0.01361
quality_8 0.00939
quality_10 0.00815
total_area 0.00797
room_bath 0.00609
lot_measure15 0.00533
lot_measure 0.00417
basement 0.00412
quality_12 0.00352
quality_11 0.00347
condition 0.00311
quality_13 0.00196
yr_renovated 0.00142
room_bed 0.00107
ceil 0.00096
quality_7 0.00053
quality_6 0.00009
quality_5 0.00004
quality_3 0.00000
quality_4 0.00000

First 8 feature importance: Imp 88.82445


dtype: float64

First 12 feature importance: Imp 94.80237


dtype: float64

<Figure size 720x720 with 0 Axes>

XGBOOST REGRESSOR
In [51]:

from xgboost.sklearn import XGBRegressor

clf=XGBRegressor(objective='reg:squarederror',random_state=6)
pipe_XGB_1=Pipeline([('XGB1',clf)])
result_dff=pd.concat([result_dff,result('XGB1',pipe_XGB_1,X_train,y_train,X_val,y_val)])
result_dff

Out[51]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

In [52]:

#Feature importance
feat_imp(clf,X_train)
Imp
furnished_1 0.44495
quality_9 0.15441
living_measure 0.08844
coast_1 0.04030
sight 0.03631
quality_8 0.03330
lat 0.03246
long 0.02696
quality_12 0.02049
yr_built 0.01917
living_measure15 0.01869
room_bath 0.01360
zipcode 0.01226
quality_11 0.01098
quality_7 0.00875
ceil_measure 0.00861
quality_13 0.00664
condition 0.00428
lot_measure15 0.00364
yr_renovated 0.00294
basement 0.00252
lot_measure 0.00238
ceil 0.00213
total_area 0.00198
quality_6 0.00197
room_bed 0.00186
quality_3 0.00000
quality_4 0.00000
quality_5 0.00000
quality_10 0.00000

First 8 feature importance: Imp 85.71182


dtype: float32

First 12 feature importance: Imp 92.90607


dtype: float32

<Figure size 720x720 with 0 Axes>


In [53]:

clf=XGBRegressor(n_estimators=150,max_depth=5,random_state=7)
pipe_XGB_2=Pipeline([('XGB2',clf)])
result_dff=pd.concat([result_dff,result('XGB2',pipe_XGB_2,X_train,y_train,X_val,y_val)])
result_dff

[18:09:21] WARNING: src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of re


g:squarederror.

Out[53]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

In [54]:

#Feature importance
feat_imp(clf,X_train)
Imp
furnished_1 0.59499
living_measure 0.06767
quality_9 0.06470
coast_1 0.04365
quality_8 0.03345
lat 0.03122
quality_10 0.03027
sight 0.02396
long 0.01961
quality_12 0.01526
living_measure15 0.01122
yr_built 0.01016
quality_11 0.00679
quality_13 0.00676
zipcode 0.00626
ceil_measure 0.00527
quality_7 0.00439
condition 0.00407
total_area 0.00362
room_bath 0.00278
lot_measure15 0.00274
yr_renovated 0.00219
lot_measure 0.00217
basement 0.00205
quality_6 0.00153
ceil 0.00153
room_bed 0.00113
quality_5 0.00055
quality_4 0.00000
quality_3 0.00000

First 8 feature importance: Imp 88.99167


dtype: float32

First 12 feature importance: Imp 94.61620


dtype: float32

<Figure size 720x720 with 0 Axes>

ADABOOST REGRESSOR
In [55]:

from sklearn.ensemble import AdaBoostRegressor

clf= AdaBoostRegressor(DecisionTreeRegressor(random_state=8))
pipe_ADAB_1=Pipeline([('ADAB1',clf)])
result_dff=pd.concat([result_dff,result('ADAB1',pipe_ADAB_1,X_train,y_train,X_val,y_val)])
result_dff

Out[55]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

In [56]:

#Feature importance
feat_imp(clf,X_train)
Imp
living_measure 0.50994
lat 0.09959
furnished_1 0.06601
long 0.06142
coast_1 0.04096
living_measure15 0.04011
sight 0.03042
ceil_measure 0.02662
yr_built 0.01886
lot_measure15 0.01721
zipcode 0.01391
total_area 0.01116
room_bath 0.01004
lot_measure 0.00888
quality_11 0.00824
basement 0.00793
quality_12 0.00540
quality_13 0.00373
quality_9 0.00355
room_bed 0.00343
ceil 0.00261
yr_renovated 0.00252
condition 0.00235
quality_8 0.00226
quality_10 0.00209
quality_7 0.00055
quality_6 0.00017
quality_5 0.00002
quality_4 0.00000
quality_3 0.00000

First 8 feature importance: Imp 87.50772


dtype: float64

First 12 feature importance: Imp 93.62163


dtype: float64

<Figure size 720x720 with 0 Axes>


In [57]:

clf= AdaBoostRegressor(DecisionTreeRegressor(max_depth=20),n_estimators=250,learning_rate=0.005,random_state=9)
pipe_ADAB_2=Pipeline([('ADAB2',clf)])
result_dff=pd.concat([result_dff,result('ADAB2',pipe_ADAB_2,X_train,y_train,X_val,y_val)])
result_dff

Out[57]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

In [58]:

#Feature importance
feat_imp(clf,X_train)
Imp
living_measure 0.31020
furnished_1 0.22982
lat 0.16848
long 0.07221
living_measure15 0.03353
coast_1 0.02876
yr_built 0.02456
ceil_measure 0.02078
sight 0.01669
zipcode 0.01550
lot_measure15 0.01533
total_area 0.01060
lot_measure 0.00862
quality_9 0.00846
room_bath 0.00701
basement 0.00560
quality_8 0.00364
room_bed 0.00335
condition 0.00317
quality_12 0.00265
quality_11 0.00260
yr_renovated 0.00229
ceil 0.00218
quality_10 0.00175
quality_13 0.00100
quality_7 0.00080
quality_6 0.00032
quality_5 0.00008
quality_4 0.00001
quality_3 0.00000

First 8 feature importance: Imp 88.83441


dtype: float64

First 12 feature importance: Imp 94.64739


dtype: float64

<Figure size 720x720 with 0 Axes>

BAGGING REGRESSION
In [59]:

from sklearn.ensemble import BaggingRegressor

clf= BaggingRegressor(random_state=10)
pipe_BAG_1=Pipeline([('BAG1',clf)])
result_dff=pd.concat([result_dff,result('BAG1',pipe_BAG_1,X_train,y_train,X_val,y_val)])
result_dff

Out[59]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947


In [60]:

#Feature Importance
feature_importances = np.mean([ tree.feature_importances_ for tree in clf.estimators_], axis=0)
bg_imp_feature=pd.DataFrame(feature_importances, columns = ["Imp"],index=X_train.columns)
bg_imp_feature.sort_values(by="Imp",ascending=False)

Out[60]:

Imp

furnished_1 0.32952

living_measure 0.21044

lat 0.17412

long 0.06964

living_measure15 0.03440

yr_built 0.03000

coast_1 0.02448

ceil_measure 0.01991

zipcode 0.01548

sight 0.01531

lot_measure15 0.01498

total_area 0.00974

quality_9 0.00967

lot_measure 0.00809

room_bath 0.00737

basement 0.00434

room_bed 0.00403

quality_8 0.00399

condition 0.00313

yr_renovated 0.00237

ceil 0.00228

quality_11 0.00182

quality_12 0.00148

quality_10 0.00137

quality_13 0.00084

quality_7 0.00068

quality_6 0.00042

quality_5 0.00010

quality_4 0.00001

quality_3 0.00000

In [61]:

clf= BaggingRegressor(DecisionTreeRegressor(max_depth=12),n_estimators=250,random_state=11)
pipe_BAG_2=Pipeline([('BAG2',clf)])
result_dff=pd.concat([result_dff,result('BAG2',pipe_BAG_2,X_train,y_train,X_val,y_val)])
result_dff
Out[61]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140


In [62]:

#Feature Importance
pd.options.display.float_format = '{:.5f}'.format
feature_importances = np.mean([ tree.feature_importances_ for tree in clf.estimators_], axis=0)
bg_imp_feature=pd.DataFrame(feature_importances, columns = ["Imp"],index=X_train.columns)
bg_imp_feature.sort_values(by="Imp",ascending=False)

Out[62]:

Imp

furnished_1 0.31748

living_measure 0.23735

lat 0.17613

long 0.06834

living_measure15 0.02947

coast_1 0.02891

yr_built 0.02585

ceil_measure 0.01984

sight 0.01504

zipcode 0.01456

lot_measure15 0.01222

quality_9 0.00935

total_area 0.00814

lot_measure 0.00665

room_bath 0.00590

basement 0.00445

quality_8 0.00431

quality_12 0.00272

quality_11 0.00231

condition 0.00229

room_bed 0.00227

yr_renovated 0.00182

ceil 0.00158

quality_10 0.00150

quality_13 0.00076

quality_7 0.00048

quality_6 0.00021

quality_5 0.00006

quality_4 0.00000

quality_3 0.00000

In [ ]:

Dataset-2 model performance Summary

We have used Linear Regression, Ridge and Lasso, KNN, Ensemble Techniques - Decision Trees, Random Forest, Bagging, AdaBoost,
Gradient Boost and XGBoost - its gradient boost with regularization and its faster. R2 score on validation in range 70%-87% with RMSE in range
76000-107000. The model is showing better results.Lets hypertune to see if results could be improved further. Will use Random Forest,
Gradient Boosting, XGBoost and AdaBoost hypertuning. Dropping features which are zero or very close to zero in all above 4 algos -
quality_12, quality_3, quality_4.

Kindly refer Excel sheet to compare the results.

In [ ]:

#Dropping features
X_train_ht=X_train.drop(['quality_5', 'quality_3', 'quality_4'],1)
X_test_ht=X_test.drop(['quality_5', 'quality_3', 'quality_4'],1)
X_val_ht=X_val.drop(['quality_5', 'quality_3', 'quality_4'],1)
In [ ]:

skf = KFold(n_splits=5, random_state=12)

RANDOM FOREST HYPERTUNE

In [65]:

#Tuning of Random Forest


RF_ht = RandomForestRegressor()

params = {"n_estimators": np.arange(76,84,1),"max_depth": np.arange(16,20,1),


"max_features":np.arange(6,9,1),'min_samples_leaf': range(5, 8, 1),
'min_samples_split': range(18, 20, 1)}

RF_GV_1 = GridSearchCV(estimator = RF_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs=2)


RF_GV_1.fit(X_train_ht,y_train)

Fitting 5 folds for each of 576 candidates, totalling 2880 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 35.0s
[Parallel(n_jobs=2)]: Done 196 tasks | elapsed: 2.3min
[Parallel(n_jobs=2)]: Done 446 tasks | elapsed: 5.3min
[Parallel(n_jobs=2)]: Done 796 tasks | elapsed: 10.0min
[Parallel(n_jobs=2)]: Done 1246 tasks | elapsed: 15.5min
[Parallel(n_jobs=2)]: Done 1796 tasks | elapsed: 22.5min
[Parallel(n_jobs=2)]: Done 2446 tasks | elapsed: 30.7min
[Parallel(n_jobs=2)]: Done 2880 out of 2880 | elapsed: 36.6min finished

Out[65]:

GridSearchCV(cv=KFold(n_splits=5, random_state=12, shuffle=False),


error_score='raise-deprecating',
estimator=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
max_features='auto', max_leaf_nodes=None,
min_impurity_decrease=0.0, min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0, n_estimators='warn', n_jobs=None,
oob_score=False, random_state=None, verbose=0, warm_start=False),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'n_estimators': array([76, 77, 78, 79, 80, 81, 82, 83]), 'max_depth': array([16,
17, 18, 19]), 'max_features': array([6, 7, 8]), 'min_samples_leaf': range(5, 8), 'min_samples_split'
: range(18, 20)},
pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
scoring=None, verbose=1)

In [66]:

# results of grid search CV


RF_results = pd.DataFrame(RF_GV_1.cv_results_)

#parameters best value


best_score_rf = RF_GV_1.best_score_
best_rf = RF_GV_1.best_params_
best_rf

Out[66]:

{'max_depth': 18,
'max_features': 8,
'min_samples_leaf': 5,
'min_samples_split': 18,
'n_estimators': 81}
In [67]:

rf_best = RandomForestRegressor(max_depth= 18, max_features= 8,n_estimators=80,min_samples_leaf=5,min_samples_spl


it=18,
random_state=14)

result_dff=pd.concat([result_dff,result('RF_ht',rf_best,X_train_ht,y_train,X_val_ht,y_val)])
result_dff

Out[67]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898


In [68]:

#Feature importance
feat_imp(rf_best,X_train_ht)

Imp
living_measure 0.20762
furnished_1 0.16841
lat 0.15958
living_measure15 0.08067
ceil_measure 0.07752
long 0.05371
room_bath 0.04081
yr_built 0.03216
sight 0.02628
zipcode 0.02266
quality_9 0.02174
coast_1 0.01627
basement 0.01216
quality_8 0.01187
total_area 0.01125
lot_measure15 0.01110
quality_11 0.00995
lot_measure 0.00946
quality_10 0.00673
quality_7 0.00643
condition 0.00437
quality_12 0.00272
quality_6 0.00216
room_bed 0.00135
yr_renovated 0.00130
ceil 0.00118
quality_13 0.00057

First 8 feature importance: Imp 82.04737


dtype: float64

First 12 feature importance: Imp 90.74157


dtype: float64

<Figure size 720x720 with 0 Axes>

GRADIENT BOOST HYPERTUNE


In [69]:

GB_ht=GradientBoostingRegressor()
params = {"n_estimators": [138,142,1],"learning_rate":[0.08,0.09],"max_depth": np.arange(8, 11,1),
"max_features":np.arange(5,8,1),'min_samples_leaf': range(16, 21, 1)}
GB_GV_1 = GridSearchCV(estimator = GB_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs=2)
GB_GV_1.fit(X_train_ht,y_train)

# results of grid search CV


GB_results = pd.DataFrame(GB_GV_1.cv_results_)
#parameters best value
best_score_rf = GB_GV_1.best_score_
best_gb = GB_GV_1.best_params_
best_gb

Fitting 5 folds for each of 270 candidates, totalling 1350 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 20.2s
[Parallel(n_jobs=2)]: Done 196 tasks | elapsed: 1.5min
[Parallel(n_jobs=2)]: Done 446 tasks | elapsed: 3.9min
[Parallel(n_jobs=2)]: Done 796 tasks | elapsed: 7.1min
[Parallel(n_jobs=2)]: Done 1246 tasks | elapsed: 11.2min
[Parallel(n_jobs=2)]: Done 1350 out of 1350 | elapsed: 12.4min finished

Out[69]:

{'learning_rate': 0.09,
'max_depth': 8,
'max_features': 7,
'min_samples_leaf': 17,
'n_estimators': 142}

In [70]:

gb_best = GradientBoostingRegressor(learning_rate= 0.09, n_estimators= 150,max_depth= 10,


max_features= 7,min_samples_leaf=19)

result_dff=pd.concat([result_dff,result('GB_ht',gb_best,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[70]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898

0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244


In [71]:

#Feature importance
feat_imp(gb_best,X_train_ht)

Imp
living_measure 0.21739
lat 0.15607
furnished_1 0.13874
living_measure15 0.10908
long 0.06063
ceil_measure 0.05424
room_bath 0.04955
sight 0.03128
yr_built 0.02943
coast_1 0.02644
zipcode 0.02530
lot_measure15 0.01702
quality_9 0.01336
total_area 0.01053
lot_measure 0.01020
basement 0.00833
condition 0.00829
quality_7 0.00649
quality_12 0.00592
quality_8 0.00494
quality_11 0.00487
quality_10 0.00320
quality_6 0.00318
room_bed 0.00243
yr_renovated 0.00185
ceil 0.00125
quality_13 0.00000

First 8 feature importance: Imp 81.69770


dtype: float64

First 12 feature importance: Imp 91.51647


dtype: float64

<Figure size 720x720 with 0 Axes>

ADABOOST HYPERTUNE
In [72]:

ADAB_ht=AdaBoostRegressor(DecisionTreeRegressor(max_depth=28))
params = {"n_estimators": [176,182,1],"learning_rate":[0.4,0.5,0.6],'loss':['linear','square']}
ADAB_GV_1 = GridSearchCV(estimator = ADAB_ht, param_grid = params,cv=skf,verbose=1,return_train_score=True,n_jobs
=2)
ADAB_GV_1.fit(X_train_ht,y_train)

Fitting 5 folds for each of 18 candidates, totalling 90 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 3.9min
[Parallel(n_jobs=2)]: Done 90 out of 90 | elapsed: 7.5min finished

Out[72]:

GridSearchCV(cv=KFold(n_splits=5, random_state=12, shuffle=False),


error_score='raise-deprecating',
estimator=AdaBoostRegressor(base_estimator=DecisionTreeRegressor(criterion='mse', max_depth=2
8, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best'),
learning_rate=1.0, loss='linear', n_estimators=50,
random_state=None),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'n_estimators': [176, 182, 1], 'learning_rate': [0.4, 0.5, 0.6], 'loss': ['linear
', 'square']},
pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
scoring=None, verbose=1)

In [73]:

# results of grid search CV


ADAB_results = pd.DataFrame(ADAB_GV_1.cv_results_)
#parameters best value
best_score_rf = ADAB_GV_1.best_score_
best_adab = ADAB_GV_1.best_params_
best_adab

Out[73]:

{'learning_rate': 0.5, 'loss': 'linear', 'n_estimators': 176}

In [74]:

adab_best = AdaBoostRegressor(DecisionTreeRegressor(max_depth=28),n_estimators=180,learning_rate=0.5,loss='linear
',
random_state=15)

result_dff=pd.concat([result_dff,result('ADAB_ht',adab_best,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[74]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898

0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244

0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670


In [75]:

#Feature importance
feat_imp(adab_best,X_train_ht)

Imp
living_measure 0.48898
furnished_1 0.10561
lat 0.09726
long 0.05701
coast_1 0.03784
living_measure15 0.03747
sight 0.02427
ceil_measure 0.02000
yr_built 0.01993
lot_measure15 0.01562
zipcode 0.01534
room_bath 0.01240
total_area 0.01094
lot_measure 0.00964
basement 0.00810
quality_9 0.00798
quality_11 0.00571
quality_12 0.00445
room_bed 0.00382
yr_renovated 0.00322
condition 0.00291
quality_10 0.00286
ceil 0.00268
quality_8 0.00253
quality_13 0.00252
quality_7 0.00073
quality_6 0.00017

First 8 feature importance: Imp 86.84492


dtype: float64

First 12 feature importance: Imp 93.17415


dtype: float64

<Figure size 720x720 with 0 Axes>

XGBoost Regressor
In [76]:

#Regularization using GridSearchCV - 1st Iteration


XGB_ht_1=XGBRegressor(objective='reg:squarederror')
params1 = {
"colsample_bytree": [i/100.0 for i in range(66,74,2)],
"learning_rate": [0.2,0.22,0.24],
"n_estimators": [185,188,1],
"subsample": [i/100.0 for i in range(62,68,1)]
}
XGB_GV_1 = GridSearchCV(estimator = XGB_ht_1, param_grid = params1,
cv=skf,
verbose = 1,
return_train_score=True,n_jobs=2)
XGB_GV_1.fit(X_train_ht,y_train)

Fitting 5 folds for each of 216 candidates, totalling 1080 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 27.5s
[Parallel(n_jobs=2)]: Done 257 tasks | elapsed: 1.8min
[Parallel(n_jobs=2)]: Done 617 tasks | elapsed: 4.3min
[Parallel(n_jobs=2)]: Done 1077 out of 1080 | elapsed: 7.5min remaining: 1.2s
[Parallel(n_jobs=2)]: Done 1080 out of 1080 | elapsed: 7.5min finished

Out[76]:

GridSearchCV(cv=KFold(n_splits=5, random_state=12, shuffle=False),


error_score='raise-deprecating',
estimator=XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0,
importance_type='gain', learning_rate=0.1, max_delta_step=0,
max_depth=3, min_child_weight=1, missing=None, n_estimators=100,
n_jobs=1, nthread=None, objective='reg:squarederror',
random_state=0, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
seed=None, silent=None, subsample=1, verbosity=1),
fit_params=None, iid='warn', n_jobs=2,
param_grid={'colsample_bytree': [0.66, 0.68, 0.7, 0.72], 'learning_rate': [0.2, 0.22, 0.24],
'n_estimators': [185, 188, 1], 'subsample': [0.62, 0.63, 0.64, 0.65, 0.66, 0.67]},
pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
scoring=None, verbose=1)

In [77]:

# results of grid search CV


XGB_results_1 = pd.DataFrame(XGB_GV_1.cv_results_)
#parameters best value
best_score_xgb_1 = XGB_GV_1.best_score_
best_xgb_1 = XGB_GV_1.best_params_
best_xgb_1

Out[77]:

{'colsample_bytree': 0.68,
'learning_rate': 0.2,
'n_estimators': 185,
'subsample': 0.67}

In [78]:

#Choosing best parameter from 1st Iteration


xgb_best_1 = XGBRegressor(colsample_bytree=0.7,learning_rate=0.22,n_estimators=186,subsample=0.65,objective='reg:
squarederror',
random_state=16)

result_dff=pd.concat([result_dff,result('xgb_1_ht',xgb_best_1,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[78]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898

0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244

0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670

0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670


In [79]:

#Feature importance
feat_imp(xgb_best_1,X_train_ht)

Imp
furnished_1 0.45991
living_measure 0.08495
quality_9 0.07530
lat 0.05495
sight 0.04581
coast_1 0.04306
quality_8 0.02980
long 0.02709
quality_12 0.02143
living_measure15 0.01880
quality_6 0.01387
quality_11 0.01343
quality_13 0.01221
zipcode 0.01197
room_bath 0.01166
yr_built 0.01035
condition 0.01023
quality_10 0.00940
ceil_measure 0.00728
lot_measure15 0.00666
basement 0.00661
total_area 0.00550
ceil 0.00534
yr_renovated 0.00469
lot_measure 0.00348
room_bed 0.00313
quality_7 0.00312

First 8 feature importance: Imp 82.08646


dtype: float32

First 12 feature importance: Imp 88.83791


dtype: float32

<Figure size 720x720 with 0 Axes>


In [80]:

#Regularization using GridSearchCV - 2nd Iteration

params2 = {
'min_child_weight':[6,7,8,9,10],"max_depth": [3,4,5],
}

xgb_best_2 = GridSearchCV(estimator = xgb_best_1, param_grid = params2,


cv=skf,
verbose = 1,
return_train_score=True,n_jobs=2)

xgb_best_2.fit(X_train_ht, y_train)

# results of grid search CV


XGB_results_2 = pd.DataFrame(xgb_best_2.cv_results_)
XGB_results_2

#parameters best value


best_score_xgb_2 = xgb_best_2.best_score_
best_xgb_2 = xgb_best_2.best_params_
best_xgb_2

Fitting 5 folds for each of 15 candidates, totalling 75 fits

[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.


[Parallel(n_jobs=2)]: Done 46 tasks | elapsed: 31.6s
[Parallel(n_jobs=2)]: Done 75 out of 75 | elapsed: 59.3s finished

Out[80]:

{'max_depth': 5, 'min_child_weight': 7}

In [81]:

#Choosing best parameter from 2nd Iteration


xgb_best_2 = XGBRegressor(colsample_bytree=0.7,learning_rate=0.22,n_estimators=186,subsample=0.65,objective='reg:
squarederror',
random_state=17,max_depth=4,min_child_weight=8)
result_dff=pd.concat([result_dff,result('xgb_2_ht',xgb_best_2,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[81]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898

0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244

0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670

0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670

0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924


In [82]:

#Feature importance
feat_imp(xgb_best_2,X_train_ht)

Imp
furnished_1 0.46173
quality_9 0.08325
living_measure 0.07545
coast_1 0.06976
lat 0.04473
sight 0.02880
room_bath 0.02802
quality_8 0.02780
long 0.02034
quality_10 0.01916
quality_7 0.01823
yr_built 0.01647
living_measure15 0.01506
quality_12 0.01277
zipcode 0.01129
quality_11 0.01026
quality_13 0.00990
lot_measure15 0.00685
condition 0.00637
ceil_measure 0.00625
lot_measure 0.00535
room_bed 0.00455
total_area 0.00408
ceil 0.00406
basement 0.00383
yr_renovated 0.00322
quality_6 0.00243

First 8 feature importance: Imp 81.95339


dtype: float32

First 12 feature importance: Imp 89.37298


dtype: float32

<Figure size 720x720 with 0 Axes>


In [83]:

#Regularization using GridSearchCV - 3rd Iteration

params3 = {
'gamma':[i/1.0 for i in range(50,55,1)]
}

xgb_best_3 = GridSearchCV(estimator = xgb_best_2, param_grid = params3,


cv=skf,
verbose = 1,
return_train_score=True)

xgb_best_3.fit(X_train_ht, y_train)

# results of grid search CV


XGB_results_3 = pd.DataFrame(xgb_best_3.cv_results_)
XGB_results_3

#parameters best value


best_score_xgb_3 = xgb_best_3.best_score_
best_xgb_3 = xgb_best_3.best_params_
best_xgb_3

Fitting 5 folds for each of 5 candidates, totalling 25 fits

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


[Parallel(n_jobs=1)]: Done 25 out of 25 | elapsed: 39.0s finished

Out[83]:

{'gamma': 50.0}

In [84]:

#Choosing best parameter from 3rd Iteration


xgb_best_3 = XGBRegressor(colsample_bytree=0.7,learning_rate=0.22,n_estimators=186,subsample=0.65,objective='reg:
squarederror',
random_state=18,max_depth=4,min_child_weight=8,reg_lambda=52)
result_dff=pd.concat([result_dff,result('xgb_3_ht',xgb_best_3,X_train_ht,y_train,X_val_ht,y_val)])
result_dff
Out[84]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898

0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244

0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670

0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670

0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924

0 xgb_3_ht 0.89860 108356.33811 11741096009.16987 67404.86000 0.93004 92192.80765 8499513782.44646 60276.22610


In [85]:

#Feature importance
feat_imp(xgb_best_3,X_train_ht)

Imp
furnished_1 0.55026
living_measure 0.11252
coast_1 0.04955
sight 0.04906
lat 0.04026
quality_8 0.03380
long 0.01515
quality_6 0.01480
quality_11 0.01418
quality_12 0.01409
living_measure15 0.01227
quality_9 0.00959
zipcode 0.00910
condition 0.00887
quality_10 0.00752
ceil_measure 0.00718
yr_built 0.00665
total_area 0.00638
yr_renovated 0.00607
quality_13 0.00578
room_bath 0.00527
room_bed 0.00497
ceil 0.00429
lot_measure 0.00399
basement 0.00379
lot_measure15 0.00335
quality_7 0.00128

First 8 feature importance: Imp 86.54022


dtype: float32

First 12 feature importance: Imp 91.55212


dtype: float32

<Figure size 720x720 with 0 Axes>

We have executed many models and post comparing results we hyper tuned four models. All models are working well with R2 score greater
than 86% RMSE is below 132600.

But best of of all is Xtreme Gradient boost - which is enhanced version of gradient boost. It includes regularisation and is faster too. Its giving
R2 score of around 89.5% with RMSE of around 109000.

Moving forward this model can be improved further as dont have much data for very high priced houses. So when more data comes in we can
revisit our model and make mecessary changes to accommodate more variation in data to deliver better results, maybe try to decrease RMSE.

Finally lets run our model on test data, which we havent used till now and see how it performs.

Executing xgb_3_ht on test data set


In [86]:

result_dff=pd.concat([result_dff,result('xgb_test',xgb_best_3,X_test_ht,y_test,X_val_ht,y_val)])
result_dff

Out[86]:

Method val score RMSE_val MSE_val MAE_vl train Score RMSE_tr MSE_tr MAE_tr

0 Linear Reg 0.71763 180818.45737 32695314526.97058 117107.93415 0.72770 181882.36852 33081195977.82660 116936.92426

0 Ridge_Reg_1 0.71810 180667.04663 32640581737.12561 117098.44994 0.72763 181906.28901 33089897982.93209 116967.24158

0 Ridge_Reg_2 0.71767 180805.99847 32690809082.18658 117107.34765 0.72770 181882.56950 33081269088.65812 116939.62012

0 Lasso_Reg_1 0.71779 180767.09977 32676744360.84573 117084.23084 0.72769 181885.74852 33082425513.47692 116934.69034

0 KNN Reg 0.49520 241764.35295 58450002356.08145 151384.98932 0.99935 8894.71112 79115885.95018 727.31898

0 DT1 0.73194 176176.01494 31037988240.89530 98457.42069 0.99935 8894.71112 79115885.95018 727.31898

0 RF1 0.86124 126756.92547 16067318155.28575 72879.33149 0.97466 55482.79772 3078340842.83595 29823.87866

0 RF2 0.84822 132566.13475 17573780083.03317 73418.66929 0.90321 108435.52475 11758263028.16183 58027.92388

0 GB1 0.87310 121214.62351 14692984951.50343 75637.11090 0.89488 113007.60730 12770719307.93105 72392.61443

0 GB2 0.89502 110253.77869 12155895714.51771 65759.78139 0.95298 75577.21752 5711915808.16859 52291.11107

0 XGB1 0.86956 122894.91019 15103158949.78271 75968.92396 0.89173 114686.85442 13153074577.05989 72521.95605

0 XGB2 0.89601 109731.66479 12041038257.43423 65855.25071 0.94929 78488.91326 6160509504.93648 53128.68395

0 ADAB1 0.87908 118325.81807 14000999222.57086 68781.99128 0.99723 18355.19915 336913335.99659 7470.87105

0 ADAB2 0.87267 121421.26846 14743124435.21639 69328.10223 0.99907 10652.94303 113485195.21390 2048.17662

0 BAG1 0.85817 128147.07266 16421672232.30286 73506.65535 0.97328 56979.84577 3246702823.86075 30296.58947

0 BAG2 0.87271 121403.44959 14738797572.78922 71079.17635 0.95589 73206.74410 5359227381.64863 48075.77140

0 RF_ht 0.86043 127122.92645 16160238428.30151 72573.17472 0.90876 105283.74570 11084667108.75019 57545.17898

0 GB_ht 0.89602 109723.26914 12039195791.19686 64651.56480 0.96775 62595.11219 3918148069.74015 41729.66244

0 ADAB_ht 0.89201 111821.98299 12504155880.21770 67601.23755 0.99349 28117.84264 790613074.89588 14412.70670

0 xgb_1_ht 0.88282 116484.15375 13568558075.71564 70335.11906 0.93052 91871.93035 8440451586.23162 61776.77670

0 xgb_2_ht 0.88877 113483.76321 12878564511.84933 69351.16691 0.94593 81048.97730 6568936720.98572 55372.52924

0 xgb_3_ht 0.89860 108356.33811 11741096009.16987 67404.86000 0.93004 92192.80765 8499513782.44646 60276.22610

0 xgb_test 0.87484 120381.47322 14491699093.75940 72694.97007 0.94998 78343.92038 6137769859.95983 53777.32335


In [87]:

#Feature importance
feat_imp(xgb_best_3,X_test_ht)

Imp
furnished_1 0.53507
living_measure 0.13423
sight 0.05095
coast_1 0.03399
lat 0.03394
quality_9 0.02366
quality_8 0.02151
long 0.01854
quality_7 0.01645
ceil_measure 0.01541
room_bath 0.01410
living_measure15 0.01192
condition 0.01068
yr_renovated 0.00998
yr_built 0.00925
quality_11 0.00810
zipcode 0.00625
lot_measure15 0.00602
total_area 0.00579
quality_6 0.00567
basement 0.00538
quality_12 0.00534
quality_10 0.00493
lot_measure 0.00478
ceil 0.00428
room_bed 0.00379
quality_13 0.00000

First 8 feature importance: Imp 85.18983


dtype: float32

First 12 feature importance: Imp 90.97836


dtype: float32

<Figure size 720x720 with 0 Axes>

CALCULATING CONFIDENCE INTERVAL ON THE FINAL SELECTED MODEL at 95% ALPHA


In [88]:

from sklearn.model_selection import KFold


from sklearn.model_selection import cross_val_score

num_folds = 200
seed = 7

kfold = KFold(n_splits=num_folds, random_state=seed)


results = cross_val_score(xgb_best_3, X_test_ht, y_test, cv=kfold)
print(results)
print("Accuracy: %.3f%% (%.3f%%)" % (results.mean()*100.0, results.std()*100.0))

[ 0.90391651 0.90916102 0.90837406 0.92102958 0.84137223 0.85154394


0.79054082 0.97225657 0.62255882 0.95655541 0.87971233 0.54938847
0.919873 0.90449203 0.91960521 0.82845636 0.85087562 0.84624357
0.84949297 0.79902964 0.88093633 0.7965441 0.85767605 0.89117899
0.87695964 0.81590065 0.77554087 0.82172976 0.89524705 0.60028268
0.91819488 0.7676954 0.92467382 0.76400042 -0.01087648 0.94301005
0.7988163 0.8973989 0.80375734 0.87449297 0.95865757 0.9275524
0.9097657 0.91836083 0.92456681 0.96787804 0.8355066 0.97563326
0.90399211 0.89793941 0.85086961 0.89391916 0.59636222 0.94398635
0.53656514 0.87802398 0.86956142 0.86946016 0.82775075 0.90893744
0.92036889 0.92163685 0.81946895 0.9143283 0.81252437 0.92824432
0.75878566 0.81404196 0.87121462 0.73438774 0.80718153 0.88708332
0.91354842 0.52667519 0.94112667 0.93731003 0.94483886 0.97033654
0.76244928 0.93123175 0.77286008 0.87546557 0.60705664 0.72760754
0.82665212 0.91951727 0.94649817 0.93530476 0.91908615 0.94478304
0.93804561 0.80743798 0.95095218 0.84086034 0.94263966 0.85434296
0.8939842 0.91195926 0.89329183 0.94217187 0.92094018 0.92534352
0.84231454 0.80070691 0.78969709 0.89154176 0.75224552 0.98563106
0.96707234 0.90153511 0.77089402 0.89182195 0.89960071 0.85305716
0.94549166 0.86431631 0.85722134 0.67693538 0.90097462 0.92198301
0.78518065 0.76819692 0.88903017 0.90340532 0.89964216 0.71263816
0.98670033 0.85944924 0.81788499 0.90645091 0.77838803 0.86403478
0.85040232 0.73824728 0.93391523 0.89215502 0.9170631 0.86449047
0.81659417 0.87965375 0.89630691 0.75384405 0.91273398 0.90846708
0.98175881 0.89090127 0.87495474 0.94566111 0.88549609 0.78429757
0.8835784 0.83106831 0.71277922 0.92337898 0.96179742 0.70433655
0.87525256 0.62843049 0.92354528 0.93623984 0.88524244 0.86559362
0.78977878 0.93659078 0.92459342 0.89326338 0.77853101 0.88929344
0.75543453 0.76270482 0.91536853 0.77264839 0.73741813 0.96582459
0.89034114 0.81234031 0.81053727 0.86102493 0.97418468 0.94098004
0.90470082 0.89779213 0.77860791 0.92766247 0.66861 0.30180163
0.7851057 0.91198086 0.87794581 0.84816996 0.93551467 0.97131443
0.93234322 0.74688263 0.69960959 0.93554804 0.94104945 0.92845367
0.82424248 0.77653242]
Accuracy: 85.137% (11.459%)

In [89]:

from matplotlib import pyplot


# plot scores
pyplot.hist(results)
pyplot.show()
# confidence intervals
alpha = 0.95 # for 95% confidence
p = ((1.0-alpha)/2.0) * 100 # tail regions on right and left .25 on each side indicated by P value (border)
lower = max(0.0, np.percentile(results, p))
p = (alpha+((1.0-alpha)/2.0)) * 100
upper = min(1.0, np.percentile(results, p))
print('%.1f confidence interval %.1f%% and %.1f%%' % (alpha*100, lower*100, upper*100))
print('Average accuracy result on test data is %.3f%%:' % (np.mean(results)*100))

95.0 confidence interval 59.5% and 97.2%


Average accuracy result on test data is 85.137%:
In [92]:

sns.set(style="darkgrid", color_codes=True)

with sns.axes_style("white"):

sns.jointplot(x=y_val, y=xgb_best_3.predict(X_val_ht), kind="reg", color="k")


plt.title('Actual and Predicted', fontsize=20) # Plot heading
plt.xlabel('Actual', fontsize=10) # X-label
plt.ylabel('Predicted', fontsize=10)
plt.tight_layout()

Dataset-2 Final summary


Finally we have the result, our final selected model is performing well on the test data R2 score of around 87.0% with RMSE of around 120000.
</i>
Most important feature for pricing is furnished.The furnished house is priced higher.

Some other important features that affect price the most are living measure, latitude, above average quality of house and coastal house. So,
one needs to thoroughly introspect its property on parameters suggested and list its price accordingly, similarly if one wants buy house -
needs to check the features suggested above in house and calculate the predicted price. The same can than be compared to listed price.

Dataset-1 Final summary:


The ensemble models have performed well compared to that of linear,KNN,SVR models
The best performance is given by Gradient boosting model with training (score-0.89,RMSE-81372), Validation (score-0.80,RSME-115867),
Testing(score-0.79,RMSE-114695) The 95% confidence interval scores range from 0.72 to 0.85.
The top key features that drive the price of the property are: 'furnished_1', 'yr_built', 'living_measure','quality_8', 'HouseLandRatio',
'lot_measure15', 'quality_9', 'ceil_measure', 'total_area'.
The above data is also reinforced by the analysis done during bivariate analysis.

For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models.
</b>

CONCLUSION:
We have build different models on 2 datasets. The performance (score and 95% confidence interval scores) of the model build on dataset-1 is
better than dataset-2 as the 95% confidence interval of dataset-1 is very narrow compared to that of dataset-2. Even though the score of
dataset-2 model is higher, the model has very vast range of performance scores.

The top key features to consider for pricing a property are:'furnished_1', 'yr_built', 'living_measure','quality_8', 'lot_measure15', 'quality_9',
'ceil_measure', 'total_area'. These are almost similar in both the models

So, one needs to thoroughly introspect its property on parameters suggested and list its price accordingly, similarly if one wants buy house -
needs to check the features suggested above in house and calculate the predicted price. The same can than be compared to listed price.

For further improvization, the datasets can be made by treating outliers in different ways and hypertuning the ensemble models. Making
polynomial features and improvising the model performance can also be explored further.
Pickle file Creation
First we will define the function for data-preprocessing that is required to run through the model. Then we will recall the same for predicting the
price(target) of the property.

The pickle file is created as per the steps followed for dataset-2.

In [9]:

#Defining Funcation to process all required steps as done in model


def model(data):
import pandas as pd
import numpy as np

X_test = pd.read_excel(data)

#Removing outliers
X_test_1=X_test[(X_test['living_measure']<=9000) & (X_test['price']<=4000000) &
(X_test['room_bed']<=10) & (X_test['room_bath']<=6)]

cols=['cid','dayhours']
X_test_1=X_test.drop(cols, inplace = False, axis = 1)

#columns to be converted to category


categ=['coast', 'furnished','quality']
#X_test_2=X_test_1[categ].astype('category')

# Concatenate X_test_dummy_1 variables with X_test_2


#X_test_final = pd.concat([X_test_1, X_test_2], axis=1)
X_test_final=X_test_1.copy()

for i in range(1,2):
X_test_final['coast_'+str(i)]=0
X_test_final['furnished_'+str(i)]=0

for i in range(1,14):
X_test_final['quality_'+str(i)]=0

for i in range(1,2):
if ((X_test_final['coast']==i).bool()):
X_test_final['coast_'+str(i)]=1

for i in range(1,2):
if ((X_test_final['furnished']==i).bool()):
X_test_final['furnished_'+str(i)]=1

for i in range(1,14):
if ((X_test_final['quality']==i).bool()):
X_test_final['quality_'+str(i)]=1
X_test_final=X_test_final.drop([ 'quality_3', 'quality_4', 'quality_1', 'quality_2', 'quality_5','price'],1)
# Drop categorical variable columns
X_test_final = X_test_final.drop(X_test_final[categ], axis=1)

return X_test_final

Test run on pickle file:

In [ ]:

import pickle
with open('model_pickle','wb') as f:
pickle.dump(xgb_best_3,f)

In [11]:

with open('model_pickle','rb') as f:
mp=pickle.load(f)

In [14]:

X_test=model('innercity.xlsx')
mp.predict(X_test)
#X_test.columns

Out[14]:

array([314002.16], dtype=float32)
We can see that with the given parameters, pickle file has run through the model and given predicted price of
the property

In [ ]:

You might also like