SMDM Project Gopala Satish Kumar Jupyter Notebook G8 DSBA
1 Problem 1
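[8]: # Imports used by the cells in this notebook
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import HTML
data1=pd.read_csv('C:/Users/kumar/Desktop/python projects/SMDM/porject/Wholesale+Customers+Data.csv')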
datahead=data1.head()
datahe=HTML(datahead.to_html(classes='table table-bordered'))
datahe
1.1 Heatmap
[95]: # Correlate numeric columns only; Channel and Region are object columns
corr=data1.select_dtypes('number').corr()
plt.subplots(figsize=(8,8))
sns.heatmap(corr,annot=True,cmap='YlGnBu')
plt.show()
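A minimal sketch of what the correlation step computes, assuming a small made-up frame (the `toy` values below are hypothetical and are not taken from the wholesale data). It shows that text columns have to be excluded before calling `corr()`, and that perfectly linear columns give coefficients of 1 and -1.

```python
import pandas as pd

# Hypothetical toy data; only the column names echo the wholesale dataset.
toy = pd.DataFrame({
    "Region": ["a", "b", "c", "d"],   # text column, not usable in a correlation
    "Milk": [1, 2, 3, 4],
    "Grocery": [2, 4, 6, 8],          # exactly 2 * Milk
    "Frozen": [4, 3, 2, 1],           # exactly 5 - Milk
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = toy.select_dtypes("number").corr()
print(corr)
```

Selecting numeric columns explicitly avoids relying on how a given pandas version treats non-numeric columns inside `corr()`.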
[86]: datadescribe=data1.describe(include='all')
datade=HTML(datadescribe.to_html(classes='table table-bordered'))
datade
[63]: data1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 440 entries, 0 to 439
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Buyer/Spender 440 non-null int64
1 Channel 440 non-null object
2 Region 440 non-null object
3 Fresh 440 non-null int64
4 Milk 440 non-null int64
5 Grocery 440 non-null int64
6 Frozen 440 non-null int64
7 Detergents_Paper 440 non-null int64
8 Delicatessen 440 non-null int64
dtypes: int64(7), object(2)
memory usage: 31.1+ KB
1.2 Pairplot
[97]: sns.pairplot(data1)
1.3 1.1 Total of all variables by Channel and Region
[88]: # aggfunc sums each column within every Channel/Region pair (totals, not means)
datapivot=pd.pivot_table(data=data1,index=['Channel','Region'],aggfunc='sum')
datapi=HTML(datapivot.to_html(classes='table table-bordered'))
datapi
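The difference between a total and a mean in `pd.pivot_table` can be seen on a few rows. This is only a sketch: the `toy` frame and its Channel/Region labels are assumptions made for illustration, not rows from the wholesale file.

```python
import pandas as pd

# Hypothetical rows that only mimic the Channel/Region/Fresh/Milk columns.
toy = pd.DataFrame({
    "Channel": ["Retail", "Retail", "Hotel", "Hotel"],
    "Region": ["Lisbon", "Lisbon", "Lisbon", "Other"],
    "Fresh": [100, 300, 50, 70],
    "Milk": [10, 30, 20, 40],
})

# aggfunc="sum" returns group totals; aggfunc="mean" returns group averages.
totals = pd.pivot_table(toy, index=["Channel", "Region"], aggfunc="sum")
means = pd.pivot_table(toy, index=["Channel", "Region"], aggfunc="mean")

print(totals)
print(means)
```

Totals answer "which group spent the most overall", while means are less sensitive to how many customers each group contains.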
1.4 1.2 Box plot
[32]: fig,axes=plt.subplots(3,2,figsize=(15,15))
sns.boxplot(ax=axes[0,0],x=data1['Region'],y=data1['Fresh'],hue=data1['Channel'])
sns.boxplot(ax=axes[0,1],x=data1['Region'],y=data1['Milk'],hue=data1['Channel'])
sns.boxplot(ax=axes[1,0],x=data1['Region'],y=data1['Grocery'],hue=data1['Channel'])
sns.boxplot(ax=axes[1,1],x=data1['Region'],y=data1['Frozen'],hue=data1['Channel'])
sns.boxplot(ax=axes[2,0],x=data1['Region'],y=data1['Detergents_Paper'],hue=data1['Channel'])
sns.boxplot(ax=axes[2,1],x=data1['Region'],y=data1['Delicatessen'],hue=data1['Channel'])
1.5 1.3 Mean and Median
[119]: round(data1.mean(numeric_only=True))
[143]: round(data1.median(numeric_only=True))
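Comparing the mean with the median is a quick check for skew. The sketch below assumes a made-up `toy` frame with one large value; the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical data: one very large purchase pulls the mean upward.
toy = pd.DataFrame({
    "Channel": ["Retail", "Hotel", "Hotel", "Retail", "Hotel"],
    "Fresh": [10, 20, 30, 40, 400],
})

# numeric_only=True skips the text column instead of raising an error.
mean_fresh = toy.mean(numeric_only=True)
median_fresh = toy.median(numeric_only=True)

# A mean well above the median suggests a right-skewed distribution.
gap = mean_fresh - median_fresh
print(gap)
```

Here the mean is 100 and the median is 30, so the single outlier accounts for a gap of 70.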
[138]: datatemp.head()
[9]: fig,axes=plt.subplots(3,2,figsize=(15,15))
datatemp=data1.drop(['Buyer/Spender'],axis=1)
sns.distplot(ax=axes[0,0],x=datatemp['Fresh'],axlabel='Fresh');
sns.distplot(ax=axes[0,1],x=datatemp['Milk'],axlabel='Milk');
sns.distplot(ax=axes[1,0],x=datatemp['Grocery'],axlabel='Grocery');
sns.distplot(ax=axes[1,1],x=datatemp['Frozen'],axlabel='Frozen');
sns.distplot(ax=axes[2,0],x=datatemp['Detergents_Paper'],axlabel='Detergents_Paper');
sns.distplot(ax=axes[2,1],x=datatemp['Delicatessen'],axlabel='Delicatessen');
plt.show()
C:\Users\kumar\anaconda3\lib\site-packages\seaborn\distributions.py:2557:
FutureWarning: `distplot` is a deprecated function and will be removed in a
future version. Please adapt your code to use either `displot` (a figure-level
function with similar flexibility) or `histplot` (an axes-level function for
histograms).
warnings.warn(msg, FutureWarning)
[76]: datatemp=data1.drop(['Buyer/Spender'],axis=1)
data_region=pd.pivot_table(data=datatemp,index=['Region']).T
data_region.to_csv('data_region.csv',index=True)
data_region
2 Problem 2
2.1 Sample of dataset
[6]: data2=pd.read_csv('C:/Users/kumar/Desktop/python projects/SMDM/porject/Survey-1.csv')
data2head=data2.head()
data2he=HTML(data2head.to_html(classes='table table-bordered'))
data2he
data2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 62 entries, 0 to 61
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 62 non-null int64
1 Gender 62 non-null object
2 Age 62 non-null int64
3 Class 62 non-null object
4 Major 62 non-null object
5 Grad Intention 62 non-null object
6 GPA 62 non-null float64
7 Employment 62 non-null object
8 Salary 62 non-null float64
9 Social Networking 62 non-null int64
10 Satisfaction 62 non-null int64
11 Spending 62 non-null int64
12 Computer 62 non-null object
13 Text Messages 62 non-null int64
dtypes: float64(2), int64(6), object(6)
memory usage: 6.9+ KB
2.4 2.1 Contingency table
2.4.1 2.1.1 Gender v/s Major
data2_ct1=HTML(data2_crosstab1.to_html(classes='table table-bordered'))
data2_ct1
data2_ct2=HTML(data2_crosstab2.to_html(classes='table table-bordered'))
data2_ct2
data2_ct3=HTML(data2_crosstab3.to_html(classes='table table-bordered'))
data2_ct3
data2_ct4=HTML(data2_crosstab4.to_html(classes='table table-bordered'))
data2_ct4
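The cells that build `data2_crosstab1` to `data2_crosstab4` are not shown above. As a hedged illustration of how such a contingency table can be produced, the sketch below uses `pd.crosstab` on a made-up frame; the `toy` rows and the choice of `margins=True` are assumptions, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical survey rows; only the column names follow the heading above.
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
    "Major": ["Accounting", "CIS", "CIS", "Accounting", "CIS"],
})

# Rows are Gender, columns are Major; margins=True adds "All" totals.
gender_major = pd.crosstab(toy["Gender"], toy["Major"], margins=True)
print(gender_major)
```

Dividing a row by its "All" total then gives conditional proportions such as the share of one gender in each major.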
2.6 2.7
[35]: data2lessgpa=data2[data2['GPA']<3]
data2lessgpa.groupby(['Gender'])['Gender'].count()
[35]: Gender
Female 8
Male 9
Name: Gender, dtype: int64
[42]: data2moresalary=data2[data2['Salary']>50]
data2moresalary.groupby(['Gender'])['Gender'].count()
[42]: Gender
Female 13
Male 10
Name: Gender, dtype: int64
[44]: data2equalsalary=data2[data2['Salary']==50]
data2equalsalary.groupby(['Gender'])['Gender'].count()
[44]: Gender
Female 5
Male 4
Name: Gender, dtype: int64
[47]: data2.mean(numeric_only=True)
[47]: ID 31.500000
Age 21.129032
GPA 3.129032
Salary 48.548387
Social Networking 1.516129
Satisfaction 3.741935
Spending 482.016129
Text Messages 246.209677
dtype: float64
[49]: data2.median(numeric_only=True)
[49]: ID 31.50
Age 21.00
GPA 3.15
Salary 50.00
Social Networking 1.00
Satisfaction 4.00
Spending 500.00
Text Messages 200.00
dtype: float64
[51]: fig,axes=plt.subplots(2,2,figsize=(15,15))
sns.distplot(ax=axes[0,0],x=data2['GPA'],axlabel='GPA')
sns.distplot(ax=axes[0,1],x=data2['Salary'],axlabel='Salary')
sns.distplot(ax=axes[1,0],x=data2['Spending'],axlabel='Spending')
sns.distplot(ax=axes[1,1],x=data2['Text Messages'],axlabel='Text Messages')
plt.show()
C:\Users\kumar\anaconda3\lib\site-packages\seaborn\distributions.py:2557:
FutureWarning: `distplot` is a deprecated function and will be removed in a
future version. Please adapt your code to use either `displot` (a figure-level
function with similar flexibility) or `histplot` (an axes-level function for
histograms).
warnings.warn(msg, FutureWarning)
3 Problem 3
3.0.1 Sample of given data
[2]: data3.head()
[2]: A B
0 0.44 0.14
1 0.61 0.15
2 0.47 0.31
3 0.30 0.16
4 0.15 0.37
3.0.2 Nulls and datatypes of the given data
[6]: data3.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36 entries, 0 to 35
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 A 36 non-null float64
1 B 31 non-null float64
dtypes: float64(2)
memory usage: 704.0 bytes
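The `info()` output shows 36 non-null values in column A but only 31 in column B, so B has 5 missing entries. How to handle them is not stated in the notebook; the sketch below, on a made-up `toy` frame, only illustrates how missing values can be counted and how pandas treats them in a mean.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in column B, mimicking the A/B layout.
toy = pd.DataFrame({
    "A": [0.44, 0.61, 0.47, 0.30],
    "B": [0.14, np.nan, 0.31, np.nan],
})

# Number of missing values per column.
missing_per_column = toy.isna().sum()

# Series.mean() skips NaN by default, so only the two present values count.
b_mean_ignoring_nan = toy["B"].mean()

# dropna() removes every row that has any missing value, including its A value.
complete_rows = toy.dropna()
print(missing_per_column)
```

Because `dropna()` on the whole frame would also discard valid A values, analysing each column separately may be preferable when the two samples are independent.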