Group-6 Assignment 1
[6]: df.head()
1 1 1.0 No
2 5 5.0 Yes
3 1 2.0 No
4 4 3.0 No
[7]: df.tail()
[9]: df.sample(10)
94 Student 3 Indian Often Often 2
9 Student 4 Chinese Often Often 5
49 Student 5 Japanese Socially Socially 2
153 Professional 5 Filipino Socially Often 4
112 Professional 5 Chinese Socially Never 5
172 Professional 4 Indian Never Never 1
[10]: df.columns
[10]: Index(['User ID', 'Area code', 'Location', 'Gender', 'YOB', 'Marital Status',
'Activity', 'Budget', 'Cuisines', 'Alcohol ', 'Smoker', 'Food Rating',
'Service Rating', 'Overall Rating', 'Often A S'],
dtype='object')
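Note that the column list above contains 'Alcohol ' with a trailing space, so attribute-style access such as df.Alcohol would fail on this frame. A minimal sketch of normalising the names follows; the `demo` frame and its values are hypothetical stand-ins, and only the column names mirror the dataset.

```python
import pandas as pd

# Stand-in frame (hypothetical values); only the column names mirror the notebook's data.
demo = pd.DataFrame({"Alcohol ": ["Often", "Never"], "Smoker": ["Never", "Often"]})

# Strip leading/trailing whitespace from every column name.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```

Applying the same `str.strip()` call to `df.columns` should leave the other fourteen names unchanged, since none of them shows padding in the output above.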
[ ]:
df.info()
df.describe()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User ID 200 non-null int64
1 Area code 200 non-null int64
2 Location 200 non-null object
3 Gender 200 non-null object
4 YOB 200 non-null int64
5 Marital Status 200 non-null object
6 Activity 200 non-null object
7 Budget 200 non-null int64
8 Cuisines 200 non-null object
9 Alcohol 200 non-null object
10 Smoker 200 non-null object
11 Food Rating 200 non-null int64
12 Service Rating 200 non-null int64
13 Overall Rating 200 non-null float64
14 Often A S 200 non-null object
dtypes: float64(1), int64(6), object(8)
memory usage: 23.6+ KB
[13]: df.duplicated().sum()
[13]: 0
[15]: #Datatypes
df.dtypes
Budget int64
Cuisines object
Alcohol object
Smoker object
Food Rating int64
Service Rating int64
Overall Rating float64
Often A S object
dtype: object
[16]: df.Cuisines.value_counts()
[16]: Japanese 36
French 34
Filipino 34
Indian 32
Chinese 24
Seafood 22
Italian 18
Name: Cuisines, dtype: int64
[17]: df.Cuisines.value_counts().plot(kind="bar")
plt.title("Cuisines")
plt.xlabel("Cuisine type")
plt.xticks(rotation=0)
plt.ylabel("Counts")
plt.show()
[18]: #Find null values
df.isnull().sum()
[18]: User ID 0
Area code 0
Location 0
Gender 0
YOB 0
Marital Status 0
Activity 0
Budget 0
Cuisines 0
Alcohol 0
Smoker 0
Food Rating 0
Service Rating 0
Overall Rating 0
Often A S 0
dtype: int64
plt.show()
Skewness: 0.05531324050499118
Kurtosis: -0.7037624403662659
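The cell that produced the skewness and kurtosis values is not preserved in this export, so the column it was computed on is unknown. As a rough sketch only, values of this kind can be obtained with pandas' `Series.skew()` and `Series.kurt()`; the `ratings` series below is hypothetical and is not the notebook's data.

```python
import pandas as pd

# Hypothetical ratings, symmetric around 3; not taken from the dataset.
ratings = pd.Series([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0])

skewness = ratings.skew()   # sample skewness; near 0 for symmetric data
kurtosis = ratings.kurt()   # excess kurtosis; 0 corresponds to a normal distribution

print("Skewness:", skewness)
print("Kurtosis:", kurtosis)
```

Read this way, a skewness near 0 and a negative excess kurtosis, as in the output above, would suggest a roughly symmetric distribution with lighter tails than a normal one.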
[21]: # Correlation (numeric columns only; the object columns cannot be correlated)
df.select_dtypes("number").corr()
                Service Rating  Overall Rating
User ID               0.111227        0.076208
Area code            -0.011942       -0.008142
YOB                   0.043651        0.057508
Budget               -0.135542       -0.058049
Food Rating           0.079056        0.709562
Service Rating        1.000000        0.758532
Overall Rating        0.758532        1.000000
[22]: sns.heatmap(df.select_dtypes("number").corr())
[22]: <AxesSubplot:>