
CHURN RATE EDA

Prashant 180538

Problem Statement
Apply basic data understanding techniques to the Telco-Churn-Rate Dataset and draw some
interpretations from it.

Knowing The Dataset

The dataset contains a total of 7043 entries, each carrying the class label Churn = Yes or Churn = No.

1). customerID, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection,
TechSupport, StreamingTV, StreamingMovies, Contract, PaymentMethod: Nominal Attributes

2). gender, SeniorCitizen, Partner, Dependents, PhoneService, PaperlessBilling, Churn: Symmetric
Binary Attributes

3). tenure, MonthlyCharges, TotalCharges: Numeric Attributes

So in total there are 3 Numeric, 7 Binary and 11 Nominal Attributes in the dataset.

Data Preprocessing

Data Cleaning

Missing Values
There are 11 entries in which the TotalCharges column is empty, so we drop those entries from the
dataset. The indexes of those 11 entries are [488, 753, 936, 1082, 1340, 3331, 3826, 4380, 5218,
6670, 6754].
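
The drop step can be sketched as follows. This is only a minimal illustration using Python's standard library: the three-row sample and the helper name split_on_total_charges are hypothetical, and we assume that an empty TotalCharges cell is stored as a blank string.

```python
import csv
import io

# Hypothetical three-row sample standing in for the real CSV file.
SAMPLE = """customerID,tenure,MonthlyCharges,TotalCharges,Churn
0001-A,1,29.85,29.85,No
0002-B,0,52.55, ,No
0003-C,34,56.95,1889.5,Yes
"""


def split_on_total_charges(rows):
    """Return (kept_rows, dropped_indexes); a row is dropped when TotalCharges is blank."""
    kept, dropped = [], []
    for index, row in enumerate(rows):
        if row["TotalCharges"].strip() == "":
            # Assumption: an empty cell is a blank (whitespace-only) string.
            dropped.append(index)
        else:
            # Convert the remaining values to numbers for later analysis.
            kept.append(dict(row, TotalCharges=float(row["TotalCharges"])))
    return kept, dropped


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
clean_rows, dropped_indexes = split_on_total_charges(rows)
print(dropped_indexes)  # [1]
print(len(clean_rows))  # 2
```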

Null Values
There are no null values in the dataset

Outlier Analysis
Boxplots were plotted for the 3 numeric attributes (tenure, MonthlyCharges, TotalCharges) and an
IQR analysis was performed; no outliers were found.
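
An IQR check of this kind can be sketched as below. The 1.5 multiplier is the conventional boxplot rule and is an assumption here, since the report does not state which multiplier was used; the sample values and the function name iqr_outliers are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles of the sample; "inclusive" treats the data as the full population.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical tenure values, not taken from the dataset.
tenure_sample = [1, 5, 12, 24, 36, 48, 60, 72]
print(iqr_outliers(tenure_sample))          # []
print(iqr_outliers(tenure_sample + [400]))  # [400]
```

An empty list for each of the 3 numeric attributes corresponds to the "no outliers" result reported above.
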

EDA

Observations from the above Countplots:

1). The Churn Rate for Senior Citizens is higher than that for Non-Senior Citizens
(plot-1).

2). People having partners or dependents have a lower Churn Rate than
people with no partners or no dependents (plot-2).

3). Out of the two Internet Services, people having Fiber Optic have a higher Churn
Rate than people having DSL (plot-4).

4). Customers with longer contract terms, i.e. one-year or two-year contracts, have a lower Churn
Rate than those with month-to-month contracts.
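
The comparisons read off the countplots can also be checked numerically by computing the Churn Rate per category. The sketch below is a minimal illustration; the six sample rows and the function name churn_rate_by are hypothetical and do not reproduce the real figures.

```python
from collections import defaultdict


def churn_rate_by(rows, attribute):
    """Return {category: fraction of customers with Churn == 'Yes'} for one attribute."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[attribute]] += 1
        if row["Churn"] == "Yes":
            churned[row[attribute]] += 1
    return {value: churned[value] / totals[value] for value in totals}


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]
rates = churn_rate_by(sample, "Contract")
print(rates)  # {'Month-to-month': 0.5, 'Two year': 0.0}
```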

Observations from above boxplots:


1). Customers associated with the company for a longer duration have a lower Churn Rate. For the
non-churning customers the median tenure lies between 35 and 40, whereas for the churning
customers the median drops to between 10 and 15. (plot-1)

2). Customers having higher monthly charges have a higher Churn Rate. For the churning customers
the median of monthly charges lies close to 80 dollars, whereas for the non-churning customers
it is close to 70 dollars. (plot-2)

3). An interesting observation comes from plot-3, where we can see that the Total Charges of Non-
Churning Customers are higher than those of Churning Customers, which is consistent with their
longer tenure. The median for Non-Churning Customers is close to 2000 dollars and that of
Churning Customers is close to 1000 dollars. (plot-3)
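
The medians quoted above are read from the boxplots; they can also be computed directly per Churn group. A minimal sketch, with hypothetical sample values and a hypothetical function name median_by_churn:

```python
import statistics


def median_by_churn(rows, attribute):
    """Return {churn_label: median of the attribute} for each Churn group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["Churn"], []).append(row[attribute])
    return {label: statistics.median(values) for label, values in groups.items()}


# Hypothetical sample rows, not taken from the dataset.
sample = [
    {"tenure": 30, "Churn": "No"},
    {"tenure": 38, "Churn": "No"},
    {"tenure": 50, "Churn": "No"},
    {"tenure": 2, "Churn": "Yes"},
    {"tenure": 10, "Churn": "Yes"},
    {"tenure": 20, "Churn": "Yes"},
]
medians = median_by_churn(sample, "tenure")
print(medians)  # {'No': 38, 'Yes': 10}
```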

Attribute Generation

People using one or more of the additional services (OnlineSecurity, OnlineBackup,
DeviceProtection, TechSupport, StreamingTV, StreamingMovies) have a lower Churn Rate than
people with no subscription to the additional services.

Based on this observation we create a new attribute called “count_of_services_used”, which
counts the number of services (out of OnlineSecurity, OnlineBackup, DeviceProtection,
TechSupport, StreamingTV, StreamingMovies, InternetService) used by the customer.
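
One possible way to derive this attribute is sketched below. The counting rule is an assumption, since the report does not spell it out: a service is counted when its value is "Yes", and InternetService is counted when its value is anything other than "No". The two sample customers are hypothetical.

```python
SERVICE_COLUMNS = [
    "OnlineSecurity", "OnlineBackup", "DeviceProtection",
    "TechSupport", "StreamingTV", "StreamingMovies",
]


def count_of_services_used(row):
    """Count the services a customer uses (assumed rule, see the note above)."""
    count = sum(1 for column in SERVICE_COLUMNS if row[column] == "Yes")
    # Assumption: any InternetService value other than "No" counts as one service.
    if row["InternetService"] != "No":
        count += 1
    return count


# Hypothetical customers, not taken from the dataset.
dsl_customer = {column: "No" for column in SERVICE_COLUMNS}
dsl_customer.update(InternetService="DSL", OnlineSecurity="Yes", TechSupport="Yes")

no_internet_customer = {column: "No internet service" for column in SERVICE_COLUMNS}
no_internet_customer["InternetService"] = "No"

print(count_of_services_used(dsl_customer))          # 3
print(count_of_services_used(no_internet_customer))  # 0
```
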

Attribute Importance

The table on the left contains the chi-squared test scores and p-values for all the categorical
attributes of our dataset, arranged in decreasing order of their importance.

So Contract and count_of_services_used are the two most important nominal attributes for
predicting whether the customer will churn or not.
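
For reference, the chi-squared score of one categorical attribute against Churn can be computed by hand as sketched below. This is a minimal illustration of Pearson's chi-squared test of independence without continuity correction, not necessarily the exact procedure behind the table; the sample counts and the function name chi_squared are hypothetical, and the p-value (which needs the chi-squared distribution, e.g. from a statistics library) is left out.

```python
from collections import Counter


def chi_squared(rows, attribute, target="Churn"):
    """Return (statistic, degrees_of_freedom) for attribute vs. target."""
    observed = Counter((row[attribute], row[target]) for row in rows)
    row_totals = Counter(row[attribute] for row in rows)
    col_totals = Counter(row[target] for row in rows)
    n = len(rows)
    statistic = 0.0
    for a in row_totals:
        for t in col_totals:
            # Expected count if the attribute and the target were independent.
            expected = row_totals[a] * col_totals[t] / n
            statistic += (observed[(a, t)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts, not taken from the dataset.
sample = (
    [{"Contract": "Month-to-month", "Churn": "Yes"}] * 30
    + [{"Contract": "Month-to-month", "Churn": "No"}] * 20
    + [{"Contract": "Two year", "Churn": "Yes"}] * 10
    + [{"Contract": "Two year", "Churn": "No"}] * 40
)
statistic, dof = chi_squared(sample, "Contract")
print(round(statistic, 3), dof)  # 16.667 1
```

A larger statistic for the same degrees of freedom means a stronger association with Churn, which is the sense in which the attributes are ranked by importance.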
