MAJOR PROJECT (Sanket Patil) PDF
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt # visualizing data
%matplotlib inline
import seaborn as sns
In [8]: df.shape
In [5]: df.head()
Out[5]:
   User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status           State      Zone       Occupation  Product_Category  Orders   Amount  Status
0  1002903  Sanskriti   P00125942       F      26-35   28               0     Maharashtra   Western       Healthcare              Auto       1  23952.0     NaN
1  1000732     Kartik   P00110942       F      26-35   35               1  Andhra Pradesh  Southern             Govt              Auto       3  23934.0     NaN
2  1001990      Bindu   P00118542       F      26-35   35               1   Uttar Pradesh   Central       Automobile              Auto       3  23924.0     NaN
3  1001425     Sudevi   P00237842       M       0-17   16               0       Karnataka  Southern     Construction              Auto       2  23912.0     NaN
4  1000588       Joni   P00057942       M      26-35   28               1         Gujarat   Western  Food Processing              Auto       2  23877.0     NaN
In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11251 entries, 0 to 11250
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User_ID 11251 non-null int64
1 Cust_name 11251 non-null object
2 Product_ID 11251 non-null object
3 Gender 11251 non-null object
4 Age Group 11251 non-null object
5 Age 11251 non-null int64
6 Marital_Status 11251 non-null int64
7 State 11251 non-null object
8 Zone 11251 non-null object
9 Occupation 11251 non-null object
10 Product_Category 11251 non-null object
11 Orders 11251 non-null int64
12 Amount 11239 non-null float64
13 Status 0 non-null float64
14 unnamed1 0 non-null float64
dtypes: float64(3), int64(4), object(8)
memory usage: 1.3+ MB
Out[7]: User_ID 0
Cust_name 0
Product_ID 0
Gender 0
Age Group 0
Age 0
Marital_Status 0
State 0
Zone 0
Occupation 0
Product_Category 0
Orders 0
Amount 12
dtype: int64
In [10]: df['Amount'].dtypes
Out[10]: dtype('int32')
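The `df.info()` output shows `Status` and `unnamed1` with no non-null values and 12 missing `Amount` entries, while the later null count and the `int32` dtype suggest these were cleaned up in cells that are not shown. A minimal sketch of such a cleanup on a small made-up frame (the sample rows and the explicit `int32` cast are assumptions, not the notebook's own cells):

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the structure reported by df.info().
raw = pd.DataFrame({
    "User_ID": [1, 2, 3, 4],
    "Amount": [23952.0, np.nan, 23924.0, 188.0],
    "Status": [np.nan] * 4,     # entirely empty column
    "unnamed1": [np.nan] * 4,   # entirely empty column
})

# Drop the two all-null columns, then the rows with a missing Amount.
clean = raw.drop(columns=["Status", "unnamed1"]).dropna(subset=["Amount"])

# Cast Amount to an integer type once no NaN values remain.
clean["Amount"] = clean["Amount"].astype("int32")

print(clean.isnull().sum())
print(clean["Amount"].dtype)
```

Casting before the missing rows are dropped would fail, because NaN cannot be represented in an integer column.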
In [11]: df.columns
Out[12]:
       User_ID    Cust_name  Product_ID  Gender  Age Group  Age  Shaadi           State      Zone       Occupation  Product_Category  Orders  Amount
    2  1001990        Bindu   P00118542       F      26-35   35       1   Uttar Pradesh   Central       Automobile              Auto       3   23924
    4  1000588         Joni   P00057942       M      26-35   28       1         Gujarat   Western  Food Processing              Auto       2   23877
  ...      ...          ...         ...     ...        ...  ...     ...             ...       ...              ...               ...     ...     ...
11246  1000695      Manning   P00296942       M      18-25   19       1     Maharashtra   Western         Chemical            Office       4     370
11247  1004089  Reichenbach   P00171342       M      26-35   33       0         Haryana  Northern       Healthcare        Veterinary       3     367
11248  1001209        Oshin   P00201342       F      36-45   40       0  Madhya Pradesh   Central          Textile            Office       4     213
11249  1004023       Noonan   P00059442       M      36-45   37       0       Karnataka  Southern      Agriculture            Office       3     206
11250  1002744      Brumley   P00281742       F      18-25   19       0     Maharashtra   Western       Healthcare            Office       3     188
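The `Shaadi` header in this output indicates that `Marital_Status` was renamed for display. Because later cells still refer to `Marital_Status`, the rename was presumably not applied in place. A small sketch of that behaviour on made-up rows:

```python
import pandas as pd

demo = pd.DataFrame({"User_ID": [1001990, 1000588], "Marital_Status": [1, 1]})

# rename() returns a new frame by default; the original keeps its column names.
renamed = demo.rename(columns={"Marital_Status": "Shaadi"})

print(renamed.columns.tolist())  # ['User_ID', 'Shaadi']
print(demo.columns.tolist())     # ['User_ID', 'Marital_Status']
```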
In [13]: # describe() method returns description of the data in the DataFrame (i.e. count, mean, std, etc)
df.describe()
From the above graphs we can see that most of the buyers are female, and that the purchasing power of female buyers is also greater than that of male buyers
Age
In [17]: ax = sns.countplot(data = df, x = 'Age Group', hue = 'Gender')
From the above graph we can see that most of the buyers are in the 26-35 age group, and that most of them are female
State
In [19]: # total number of orders from top 10 states
sns.set(rc={'figure.figsize':(15,5)})
sns.barplot(data = sales_state, x = 'State',y= 'Orders')
sns.set(rc={'figure.figsize':(15,5)})
sns.barplot(data = sales_state, x = 'State',y= 'Amount')
From the above graphs we can see that most of the orders and the highest total sales amounts come from Uttar Pradesh, Maharashtra and Karnataka, in that order
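`sales_state` is used in the cells above, but the cell that builds it is not shown. One plausible construction, assuming it is a per-state total sorted in descending order and limited to the top 10 (the helper name `top_n_by` and the sample rows are hypothetical):

```python
import pandas as pd

def top_n_by(frame, key, value, n=10):
    """Sum `value` per `key` and return the n largest groups."""
    totals = frame.groupby(key, as_index=False)[value].sum()
    return totals.sort_values(by=value, ascending=False).head(n)

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "State": ["Uttar Pradesh", "Maharashtra", "Karnataka", "Uttar Pradesh", "Gujarat"],
    "Orders": [3, 1, 2, 4, 5],
    "Amount": [23924, 23952, 23912, 500, 23877],
})

orders_by_state = top_n_by(sample, "State", "Orders", n=3)
amount_by_state = top_n_by(sample, "State", "Amount", n=3)
print(orders_by_state)
print(amount_by_state)
```

Either result can be passed to `sns.barplot` with `x='State'` and the summed column as `y`, in the same way as the cells above.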
Marital Status
In [21]: ax = sns.countplot(data = df, x = 'Marital_Status')
sns.set(rc={'figure.figsize':(7,5)})
for bars in ax.containers:
    ax.bar_label(bars)
sns.set(rc={'figure.figsize':(6,5)})
sns.barplot(data = sales_state, x = 'Marital_Status',y= 'Amount', hue='Gender')
From the above graphs we can see that most of the buyers are married women, and that they have high purchasing power
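The second plot reads `Marital_Status`, `Gender` and `Amount` from `sales_state`, so that frame must have been rebuilt with a different grouping before the cell ran. A sketch of one way to do this, assuming the total `Amount` per marital status and gender is what was plotted (the sample rows are made up):

```python
import pandas as pd

sample = pd.DataFrame({
    "Marital_Status": [0, 0, 1, 1, 0],
    "Gender": ["F", "M", "F", "M", "F"],
    "Amount": [100, 40, 60, 30, 50],
})

# Total Amount for every (Marital_Status, Gender) pair, largest first.
by_status_gender = (
    sample.groupby(["Marital_Status", "Gender"], as_index=False)["Amount"]
    .sum()
    .sort_values(by="Amount", ascending=False)
)
print(by_status_gender)
```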
Occupation
In [23]: sns.set(rc={'figure.figsize':(20,5)})
ax = sns.countplot(data = df, x = 'Occupation')
sns.set(rc={'figure.figsize':(20,5)})
sns.barplot(data = sales_state, x = 'Occupation',y= 'Amount')
From the above graphs we can see that most of the buyers work in the IT, Healthcare and Aviation sectors
Product Category
In [25]: sns.set(rc={'figure.figsize':(20,5)})
ax = sns.countplot(data = df, x = 'Product_Category')
sns.set(rc={'figure.figsize':(20,5)})
sns.barplot(data = sales_state, x = 'Product_Category',y= 'Amount')
From the above graphs we can see that most of the products sold are from the Food, Clothing and Electronics categories
sns.set(rc={'figure.figsize':(20,5)})
sns.barplot(data = sales_state, x = 'Product_ID',y= 'Orders')
Regressor
In [15]: import pandas as pd
In [16]: df=pd.read_csv('Boston.csv')
In [17]: df
Out[17]: Unnamed: 0 crim zn indus chas nox rm age dis rad tax ptratio black lstat medv
0 1 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0
1 2 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6
2 3 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
3 4 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
4 5 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 502 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 22.4
502 503 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 20.6
503 504 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 23.9
504 505 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 22.0
505 506 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88 11.9
In [18]: df.head()
Out[18]: Unnamed: 0 crim zn indus chas nox rm age dis rad tax ptratio black lstat medv
0 1 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0
1 2 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6
2 3 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
3 4 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
4 5 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2
In [21]: x.shape
In [43]: x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=0,test_size=0.25)
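The cells that define `x` and `y` and import `train_test_split` are not shown. Judging from the `x_train` columns, the features appear to be every column except `medv`. The sketch below assumes that and uses a random stand-in frame with the same 506 rows to check the resulting split sizes:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for Boston.csv: 506 rows, a few feature columns plus the target.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(rng.random((506, 4)), columns=["crim", "rm", "lstat", "medv"])

x = df_demo.drop(columns=["medv"])  # features (assumed: all columns but medv)
y = df_demo["medv"]                 # target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=0, test_size=0.25
)
print(x_train.shape, x_test.shape)  # (379, 3) (127, 3)
```

With 506 rows and `test_size=0.25`, scikit-learn rounds the test set up to 127 rows, which matches the `(127,)` shape reported for `y_pred`.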
In [44]: x_train
Out[44]: Unnamed: 0 crim zn indus chas nox rm age dis rad tax ptratio black lstat
245 246 0.19133 22.0 5.86 0 0.431 5.605 70.2 7.9549 7 330 19.1 389.13 18.46
59 60 0.10328 25.0 5.13 0 0.453 5.927 47.2 6.9320 8 284 19.7 396.90 9.22
276 277 0.10469 40.0 6.41 1 0.447 7.267 49.0 4.7872 4 254 17.6 389.25 6.05
395 396 8.71675 0.0 18.10 0 0.693 6.471 98.8 1.7257 24 666 20.2 391.98 17.12
416 417 10.83420 0.0 18.10 0 0.679 6.782 90.8 1.8195 24 666 20.2 21.57 25.79
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
323 324 0.28392 0.0 7.38 0 0.493 5.708 74.3 4.7211 5 287 19.6 391.13 11.74
192 193 0.08664 45.0 3.44 0 0.437 7.178 26.3 6.4798 5 398 15.2 390.49 2.87
117 118 0.15098 0.0 10.01 0 0.547 6.021 82.6 2.7474 6 432 17.8 394.51 10.30
47 48 0.22927 0.0 6.91 0 0.448 6.030 85.5 5.6894 3 233 17.9 392.74 18.80
172 173 0.13914 0.0 4.05 0 0.510 5.572 88.5 2.5961 5 296 16.6 396.90 14.69
In [45]: x_train.head()
Out[45]: Unnamed: 0 crim zn indus chas nox rm age dis rad tax ptratio black lstat
245 246 0.19133 22.0 5.86 0 0.431 5.605 70.2 7.9549 7 330 19.1 389.13 18.46
59 60 0.10328 25.0 5.13 0 0.453 5.927 47.2 6.9320 8 284 19.7 396.90 9.22
276 277 0.10469 40.0 6.41 1 0.447 7.267 49.0 4.7872 4 254 17.6 389.25 6.05
395 396 8.71675 0.0 18.10 0 0.693 6.471 98.8 1.7257 24 666 20.2 391.98 17.12
416 417 10.83420 0.0 18.10 0 0.679 6.782 90.8 1.8195 24 666 20.2 21.57 25.79
In [46]: x_train.shape
In [47]: x_test.shape
In [50]: regressor.fit(x_train,y_train)
Out[50]: ▾ LinearRegression
LinearRegression()
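`regressor` is created in a cell that is not shown; the repr above indicates a default `LinearRegression()`. A minimal sketch of fitting one and reading `coef_` and `intercept_`, using synthetic data with a known linear relationship (the data are an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: target = 2*a - 3*b + 5
rng = np.random.default_rng(0)
features = rng.random((50, 2))
target = 2 * features[:, 0] - 3 * features[:, 1] + 5

regressor = LinearRegression()
regressor.fit(features, target)

print(regressor.coef_)       # one weight per feature column
print(regressor.intercept_)  # constant term
```

On noise-free data the fitted weights recover the generating coefficients, which makes the meaning of the two attributes easy to check.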
In [51]: regressor.coef_
In [52]: regressor.intercept_
Out[52]: 36.950771141093725
In [53]: #predictions
y_pred=regressor.predict(x_test)
In [54]: y_pred.shape
Out[54]: (127,)
In [55]: result=pd.DataFrame({'Actual':y_test,'Predicted':y_pred})
In [56]: result
78 21.2 21.421473
49 19.4 17.587109
In [57]: residual_errors=abs(y_test-y_pred)
In [58]: residual_errors
In [59]: residual_errors;
Out[60]: 3.660052718913954
In [62]: mean_absolute_percentage_error(y_test,y_pred)
Out[62]: 0.175030596645347
In [63]: regressor.score(x_test,y_test)
Out[63]: 0.6367909663749035
Out[64]: 0.6367909663749035
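These values are presumably the mean absolute error, the mean absolute percentage error and R²; the cells behind `Out[60]` and `Out[64]` are not shown, so this reading is an assumption. For a regressor, `score` returns R², which would explain why `Out[63]` and `Out[64]` are identical if the latter came from `r2_score`. A small sketch on made-up numbers:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    r2_score,
)

actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 36.0])

# Mean of |actual - predicted|, the same quantity as residual_errors.mean().
mae = mean_absolute_error(actual, predicted)
# Mean of |actual - predicted| / |actual|, reported as a fraction, not a percent.
mape = mean_absolute_percentage_error(actual, predicted)
# 1 - SS_res / SS_tot
r2 = r2_score(actual, predicted)

print(mae, mape, r2)
```

Read this way, the notebook's figures would mean an average error of about 3.66 in the units of `medv`, about 17.5% relative error, and roughly 64% of the variance explained.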
In [69]: new=[[0.7258,0,8.64,0,0.538,5.727,69.6,3.7965,4,307,22,391.95,11.28,23.65]]
In [70]: new
Out[70]: [[0.7258,
0,
8.64,
0,
0.538,
5.727,
69.6,
3.7965,
4,
307,
22,
391.95,
11.28,
23.65]]
In [72]: regressor.predict(new)
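`x_train` above has 14 columns, beginning with `Unnamed: 0`, so a plain list of 14 numbers is matched to those columns by position. A sketch of a small helper (hypothetical, named `make_sample` here) that labels a new row with the training column names so the mapping is explicit; the tiny training set is made up:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def make_sample(feature_names, values):
    """Build a one-row frame whose columns match the training features."""
    if len(values) != len(feature_names):
        raise ValueError(
            f"expected {len(feature_names)} values, got {len(values)}"
        )
    return pd.DataFrame([values], columns=list(feature_names))

# Tiny made-up training set standing in for x_train / y_train.
train_x = pd.DataFrame({"Unnamed: 0": [1, 2, 3, 4], "rm": [6.5, 6.4, 7.1, 6.9]})
train_y = pd.Series([24.0, 21.6, 34.7, 33.4])
model = LinearRegression().fit(train_x, train_y)

sample = make_sample(train_x.columns, [5, 7.0])
print(sample)
print(model.predict(sample))
```

Printing the labelled row before predicting makes it easy to spot a value that has landed under the wrong column.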
Classifier
In [86]: import pandas as pd
In [90]: df=pd.read_csv('Social_Network_Ads.csv')
In [91]: df
In [96]: x_train
In [97]: y_train
Out[97]: 250 0
63 1
312 0
159 1
283 1
..
323 1
192 0
117 0
47 0
172 0
Name: Purchased, Length: 300, dtype: int64
In [100]: classifier.fit(x_train,y_train)
Out[100]: ▾ LogisticRegression
LogisticRegression()
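The cells that build `x_train`, `scaler` and `classifier` are not shown. The reported shapes (`(300, 2)` for training, 100 test rows) and the `scaler.transform` calls suggest two numeric features, a 75/25 split and a fitted scaler. The sketch below assumes a `StandardScaler`, the feature names `Age` and `EstimatedSalary`, and `random_state=0`; all of these, along with the synthetic data, are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Social_Network_Ads.csv (400 rows).
rng = np.random.default_rng(0)
ads = pd.DataFrame({
    "Age": rng.integers(18, 61, size=400),
    "EstimatedSalary": rng.integers(15000, 150001, size=400),
})
# Made-up rule so the label depends on both features.
ads["Purchased"] = ((ads["Age"] > 40) | (ads["EstimatedSalary"] > 100000)).astype(int)

x = ads[["Age", "EstimatedSalary"]]
y = ads["Purchased"]
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.transform(x_test)

classifier = LogisticRegression()
classifier.fit(x_train, y_train)
print(x_train.shape, x_test.shape)  # (300, 2) (100, 2)
```

Fitting the scaler on the training rows alone keeps information from the test rows out of the preprocessing step.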
In [102]: # prediction
y_pred = classifier.predict(x_test)
In [101]: y_train.shape
Out[101]: (300,)
In [103]: x_train.shape
Out[103]: (300, 2)
In [104]: y_pred
Out[104]: array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1], dtype=int64)
In [105]: y_test
Out[105]: 132 0
309 0
341 0
196 0
246 0
..
146 1
135 0
390 1
264 1
364 1
Name: Purchased, Length: 100, dtype: int64
Out[106]: 0.89
In [109]: print(classification_report(y_test,y_pred))
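`Out[106]` is presumably the accuracy on the test set, though its cell is not shown. A small sketch on made-up labels of how accuracy relates to the confusion matrix and to `classification_report`:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

actual    = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
predicted = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(actual, predicted)
# Fraction of predictions that match the true label.
acc = accuracy_score(actual, predicted)

print(cm)
print(acc)
print(classification_report(actual, predicted))
```

Accuracy is the sum of the diagonal of the confusion matrix divided by the number of samples; the report adds per-class precision, recall and F1.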
In [110]: new1=[[26,34000]]
new2=[[57,138000]]
In [111]: classifier.predict(scaler.transform(new1))
In [112]: classifier.predict(scaler.transform(new2))
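The raw values are passed through `scaler.transform` because the classifier was fitted on scaled features. As an alternative that the notebook does not use, scikit-learn's `make_pipeline` can bundle the scaler and the model so that raw inputs are scaled automatically. A sketch on synthetic data, assuming a `StandardScaler`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic (age, salary) rows with a made-up label rule.
rng = np.random.default_rng(1)
raw_x = np.column_stack([
    rng.integers(18, 61, size=200),
    rng.integers(15000, 150001, size=200),
]).astype(float)
raw_y = (raw_x[:, 0] > 40).astype(int)

# Manual route, as in the cells above: scale first, then predict.
scaler = StandardScaler().fit(raw_x)
classifier = LogisticRegression().fit(scaler.transform(raw_x), raw_y)

# Pipeline route: scaling happens inside predict().
pipeline = make_pipeline(StandardScaler(), LogisticRegression()).fit(raw_x, raw_y)

new1 = [[26, 34000]]
new2 = [[57, 138000]]
manual = classifier.predict(scaler.transform(new1 + new2))
piped = pipeline.predict(new1 + new2)
print(manual, piped)
```

Both routes apply the same fitted scaling, so they give the same predictions; the pipeline only removes the chance of forgetting the transform step.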