Mall Customer Data Analysis PDF

12/6/21, 1:19 PM Mall Customer Data Analysis.ipynb - Colaboratory
Mall Customer Data Analysis
By Sarthak Aneja and Sakshi Chaudhary.
Introduction: The goal of this project is to find the relation between a customer's age and annual income and that customer's spending score. The data has the columns customer ID, gender, age, annual income and spending score. We build a model that uses the age and annual income of a customer to predict the spending score. First we read the data and check the records it contains. Then we check whether any null values are present; if there are, we replace or remove them. Then we rename the data frame columns, scale the raw data, and compute descriptive statistics.
Project planning:
• Read client data and check records.
• Check null values if they exist and remove/replace null values if required.
• Rename data frame columns if required.
• Scale raw data as per model requirement.
• Perform descriptive statistics and calculate mean, median etc.
• Create box plot for numerical columns.
• Group data and create box plot for grouped data if required.
• Check correlation between variables and draw correlation matrix.
• Draw histogram of data and check density (KDE) if required.
• Check type of data for regression or classification.
• Perform train and test split for client data and fit into required model.
• Create model as per requirement and perform classification/regression/clustering.
• Try to apply some other model and check the best model.
• Create confusion matrix and classification report for these models.
• Write your conclusion.
1- Read Client Data and Check Records.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv("Mall_Customers.csv")
data.head(10)
   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
5           6  Female   22                  17                      76
6           7  Female   35                  18                       6
7           8  Female   23                  18                      94
8           9    Male   64                  19                       3
9          10  Female   30                  19                      72
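As a small illustration of the "check records" part of this step, the sketch below loads the ten rows printed above from an in-memory string (an assumption made so the example runs without Mall_Customers.csv) and inspects shape, column types and duplicate rows. The names SAMPLE_CSV, sample and n_duplicates are illustrative and do not come from the notebook.

```python
import io

import pandas as pd

# The ten rows shown by data.head(10) above, typed in as an inline CSV.
# Assumption: the real notebook reads the full Mall_Customers.csv instead.
SAMPLE_CSV = """CustomerID,Genre,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
5,Female,31,17,40
6,Female,22,17,76
7,Female,35,18,6
8,Female,23,18,94
9,Male,64,19,3
10,Female,30,19,72
"""

sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Basic record checks: size, column types, fully duplicated rows.
print(sample.shape)
print(sample.dtypes)
n_duplicates = int(sample.duplicated().sum())
print("duplicated rows:", n_duplicates)
```

The same three checks can be run on the full data frame once the CSV file is available.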
2- Check Null Values if they exist and remove/replace Null values if required.
data.isnull().sum()
CustomerID                0
Genre                     0
Age                       0
Annual Income (k$)        0
Spending Score (1-100)    0
dtype: int64
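The data set here has no null values, so nothing needs to be removed or replaced. For reference, a minimal sketch of both options on a hypothetical frame with two gaps (the frame demo and its values are made up for illustration, not taken from the mall data):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
demo = pd.DataFrame({
    "Age": [19, 21, np.nan, 23],
    "AnnualIncome(k$)": [15, 15, 16, np.nan],
})

# Count the missing values per column, as in the cell above.
null_counts = demo.isnull().sum()

# Option 1: remove every row that contains a missing value.
dropped = demo.dropna()

# Option 2: replace each missing value with the median of its column.
filled = demo.fillna(demo.median())

print(null_counts)
print(dropped)
print(filled)
```

Which option fits depends on how many rows are affected; dropping is simple but loses data, while filling keeps every row at the cost of inventing values.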
3- Rename Data Frame column names if required.
data.rename(columns = {'Genre':'Gender','Spending Score (1-100)':'SpendingScore','Annual I
data.head(3)
   CustomerID  Gender  Age  AnnualIncome(k$)  SpendingScore
0           1    Male   19                15             39
1           2    Male   21                15             81
2           3  Female   20                16              6
4- Scale Raw data as per Model requirement.
x = data.iloc[:, [2, 3, 4]]  # select the columns Age, AnnualIncome(k$), SpendingScore
x
     Age  AnnualIncome(k$)  SpendingScore
0     19                15             39
1     21                15             81
2     20                16              6
3     23                16             77
4     31                17             40
..   ...               ...            ...
195   35               120             79
196   45               126             28
197   32               126             74
198   32               137             18
199   30               137             83
200 rows × 3 columns
x.dtypes
Age                 int64
AnnualIncome(k$)    int64
SpendingScore       int64
dtype: object
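The cell above only selects the three numeric columns; no scaling is applied yet. If scaling were needed, one common choice is standardisation (z-scores), sketched below with plain pandas on the first five rows shown above. The names x_sample and x_scaled are illustrative, and choosing z-scores is an assumption, not something the notebook specifies.

```python
import pandas as pd

# First five rows of x as printed above.
x_sample = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31],
    "AnnualIncome(k$)": [15, 15, 16, 16, 17],
    "SpendingScore": [39, 81, 6, 77, 40],
})

# Standardise each column: subtract the column mean and divide by the
# population standard deviation (ddof=0).
x_scaled = (x_sample - x_sample.mean()) / x_sample.std(ddof=0)

print(x_scaled.round(3))
```

After this transformation every column has mean 0 and standard deviation 1, so columns with large units (income) no longer dominate columns with small units (age) in distance-based models. Ordinary linear regression does not strictly need it.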
5- Perform descriptive statistics and calculate mean, median etc.
import statistics as st
df = pd.DataFrame(x)
df.mean()
Age                 38.85
AnnualIncome(k$)    60.56
SpendingScore       50.20
dtype: float64
st.median(x)
'AnnualIncome(k$)'
st.mode(x)
'Age'
df.mode()
    Age  AnnualIncome(k$)  SpendingScore
0  32.0                54           42.0
1   NaN                78            NaN
df.std()
Age                 13.969007
AnnualIncome(k$)    26.264721
SpendingScore       25.823522
dtype: float64
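Note that st.median(x) and st.mode(x) above return column names rather than numbers. The statistics functions iterate over their argument, and iterating over a DataFrame yields its column labels, so the "median" is simply the middle label in sorted order. A short sketch on the first five rows of x (the names x_sample, label_median, column_medians and age_median are illustrative) shows the difference and two ways to get real medians:

```python
import statistics as st

import pandas as pd

# First five rows of x as printed earlier in the notebook.
x_sample = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31],
    "AnnualIncome(k$)": [15, 15, 16, 16, 17],
    "SpendingScore": [39, 81, 6, 77, 40],
})

# Iterating a DataFrame yields column labels, so this is the middle label.
label_median = st.median(x_sample)

# Per-column medians with pandas.
column_medians = x_sample.median()

# The statistics module works as expected on a single column.
age_median = st.median(x_sample["Age"])

print(label_median)
print(column_medians)
print(age_median)
```

On the full data, df.median() gives the per-column medians in the same way that df.mean() and df.mode() are used above.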
6- Create boxplot for numerical columns.
sns.boxplot(data=x)
<AxesSubplot:>
7- Group data and Create boxplot for Grouped Data if required.
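The notebook shows no cell for this step. A minimal sketch of what grouping could look like, assuming the data is grouped by Gender and summarised on SpendingScore (both choices are assumptions), using the ten rows from the first table:

```python
import pandas as pd

# Ten rows from the data.head(10) output, with the renamed columns.
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Female",
               "Female", "Female", "Female", "Male", "Female"],
    "Age": [19, 21, 20, 23, 31, 22, 35, 23, 64, 30],
    "SpendingScore": [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
})

# Summarise the spending score within each gender group; these are the
# quantities a grouped box plot displays graphically.
summary = sample.groupby("Gender")["SpendingScore"].agg(
    ["count", "median", "min", "max"]
)

print(summary)
```

For the plot itself, a call such as sns.boxplot(data=sample, x="Gender", y="SpendingScore") would draw one box per group.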
8- Check Correlation between variables and draw correlation matrix.
data.corr()
                  CustomerID       Age  AnnualIncome(k$)  SpendingScore
CustomerID          1.000000 -0.026763          0.977548       0.013835
Age                -0.026763  1.000000         -0.012398      -0.327227
AnnualIncome(k$)    0.977548 -0.012398          1.000000       0.009903
SpendingScore       0.013835 -0.327227          0.009903       1.000000
correlation = data.Age.corr(data.SpendingScore)
correlation
-0.32722684603909014
sns.heatmap(data.corr(),annot=True)
<AxesSubplot:>
sns.heatmap(x.corr(),annot=True)
<AxesSubplot:>
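The values in the matrix are Pearson correlation coefficients, the default of Series.corr and DataFrame.corr. The sketch below computes the coefficient by hand for Age and SpendingScore on the ten rows from the first table and compares it with pandas; pearson_r is an illustrative helper, not part of the notebook. Because only ten rows are used, the value differs from the -0.327 obtained on all 200 rows.

```python
import numpy as np
import pandas as pd

# Age and SpendingScore from the ten rows of data.head(10).
age = pd.Series([19, 21, 20, 23, 31, 22, 35, 23, 64, 30], name="Age")
score = pd.Series([39, 81, 6, 77, 40, 76, 6, 94, 3, 72], name="SpendingScore")


def pearson_r(a: pd.Series, b: pd.Series) -> float:
    """Pearson r: covariance divided by the product of the standard deviations."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return float(numerator / denominator)


r_manual = pearson_r(age, score)
r_pandas = age.corr(score)

print(r_manual, r_pandas)
```

In recent pandas versions, data.corr() on a frame that still contains the text column Gender may need numeric_only=True; the call on x, which holds only numeric columns, is unaffected.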
9- Draw histogram for data and check density (KDE) if required.
colors = 'red', 'green'
plt.hist([data['SpendingScore'], data['Age']], color=colors, edgecolor='black',
         bins=int(90/10))
(array([[17., 20., 12., 30., 43., 21., 24., 16., 17.],
        [ 0., 25., 59., 47., 40., 19., 10.,  0.,  0.]]),
 array([ 1.        , 11.88888889, 22.77777778, 33.66666667, 44.55555556,
        55.44444444, 66.33333333, 77.22222222, 88.11111111, 99.        ]),
 <a list of 2 BarContainer objects>)
x.columns
Index(['Age', 'AnnualIncome(k$)', 'SpendingScore'], dtype='object')
x.columns = x.columns.str.replace(' ', '_')
x
     Age  AnnualIncome(k$)  SpendingScore
0     19                15             39
1     21                15             81
2     20                16              6
3     23                16             77
4     31                17             40
..   ...               ...            ...
195   35               120             79
196   45               126             28
197   32               126             74
198   32               137             18
199   30               137             83
200 rows × 3 columns
sns.displot(data=data, x="AnnualIncome(k$)", hue='Gender')
<seaborn.axisgrid.FacetGrid at 0x1f8ff324220>
sns.distplot((data['Age']), hist=True, kde=True,
             color='red',
             hist_kws={'edgecolor': 'black'},
             kde_kws={'linewidth': 4})
sns.distplot((data['SpendingScore']), hist=True, kde=True,
             color='green', hist_kws={'edgecolor': 'black'},
             kde_kws={'linewidth': 4})
C:\ProgramData\Anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWa
warnings.warn(msg, FutureWarning)
<AxesSubplot:xlabel='SpendingScore', ylabel='Density'>
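The second array in the plt.hist output holds the bin edges. With bins=int(90/10), that is 9 bins, the edges are equally spaced between the smallest and largest value across both plotted columns. The sketch below reproduces them with NumPy, assuming that range is 1 to 99 as read off the first and last edge in the output; the names low, high, n_bins, edges and widths are illustrative.

```python
import numpy as np

# Assumption: overall minimum and maximum of the plotted values, taken
# from the first and last bin edge in the plt.hist output.
low, high = 1, 99
n_bins = int(90 / 10)  # same expression as in the notebook: 9 bins

# n_bins equal-width bins need n_bins + 1 edges.
edges = np.linspace(low, high, n_bins + 1)
widths = np.diff(edges)

print(edges)
print(widths)
```

Each bin is therefore 98/9, roughly 10.89 units wide, which is why Age has zero counts in the first bin and in the last two.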
10- Check type of Data for regression or classification.
data.dtypes
CustomerID           int64
Gender              object
Age                  int64
AnnualIncome(k$)     int64
SpendingScore        int64
dtype: object
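One rough way to read the dtypes for this decision: a numeric target such as SpendingScore points to regression, while a text target such as Gender points to classification. The helper suggest_task below is a hypothetical rule of thumb written for this illustration; it would misjudge class labels stored as integers, so the result is a hint, not a verdict.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def suggest_task(target: pd.Series) -> str:
    """Hypothetical rule of thumb: numeric target -> regression, else classification."""
    return "regression" if is_numeric_dtype(target) else "classification"


# Two columns from the first rows of the data set.
spending_score = pd.Series([39, 81, 6, 77, 40], name="SpendingScore")
gender = pd.Series(["Male", "Male", "Female", "Female", "Female"], name="Gender")

print(suggest_task(spending_score))
print(suggest_task(gender))
```

Since this project predicts SpendingScore, an int64 column, a regression model is the natural fit.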
11- Perform Train and Test split for Client data and fit into required Model.
from sklearn.model_selection import train_test_split
## Separate train dataset and test dataset
x_train, x_test, y_train, y_test = train_test_split(x[["Age", "AnnualIncome(k$)"]], x.Spend
x_train
     Age  AnnualIncome(k$)
88    34                58
58    27                46
113   19                64
149   34                78
36    42                34
..   ...               ...
151   39                78
67    68                48
25    29                28
196   45               126
175   30                88
140 rows × 2 columns
12- Create model as per requirement and perform classification/regression/clustering.
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# Note: the model is fitted on the full data set here, not only on x_train.
lr.fit(data[["Age","AnnualIncome(k$)"]],data.SpendingScore)
y_predicted = lr.predict(x_test)
y_predicted
array([40.41222091, 61.34408401, 50.75108989, 54.33957352, 47.61793098,
       40.95952258, 57.40374973, 45.67135234, 32.58447792, 45.26776466,
       62.2822877 , 62.73761242, 49.14367625, 52.28952082, 52.8701253 ,
       57.46698388, 40.85029995, 61.60276918, 50.81432404, 43.3384317 ,
       38.20002002, 52.60569158, 54.43155047, 37.97582621, 41.26994478,
       54.49478462, 44.06968726, 54.71897843, 33.79405243, 57.52446947,
       31.87046803, 33.79405243, 39.93390196, 60.957742  , 59.84014444,
       44.62848604, 47.75014784, 56.37238055, 39.18540072, 35.61991133,
       50.77408413, 38.07355171, 31.27717789, 49.55301249, 52.34125785,
       34.25512571, 59.05715184, 54.58676157, 60.81977658, 45.37123872,
       41.48383001, 58.94792922, 44.08693294, 59.74816749, 50.16929687,
       49.32307012, 42.03113167, 54.42580191, 61.67175189, 57.36350981])
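The predictions above come from a model fitted on all 200 rows, so the test rows were already seen during fitting. A variant that fits on the training split only and then scores the held-out rows is sketched below on the ten rows from the first table. The values test_size=0.3 and random_state=0 are assumptions (the original train_test_split call is cut off; 140 training rows out of 200 is consistent with a 30% test share), and with so few rows the resulting score is not meaningful in itself.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Ten rows from the data.head(10) output, with the renamed columns.
sample = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 22, 35, 23, 64, 30],
    "AnnualIncome(k$)": [15, 15, 16, 16, 17, 17, 18, 18, 19, 19],
    "SpendingScore": [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
})

X = sample[["Age", "AnnualIncome(k$)"]]
y = sample["SpendingScore"]

# Assumed split parameters; the notebook's own arguments are not visible.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit on the training rows only, then predict and score the held-out rows.
model = LinearRegression().fit(x_train, y_train)
y_pred = model.predict(x_test)
r2 = model.score(x_test, y_test)  # coefficient of determination R^2

print(y_pred)
print(r2)
```

Run on the full data, the R^2 from model.score gives a first number for comparing this model with any alternative tried later.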
