Mid Semester Project Review UditSoni
ALGORITHMS ON
CUSTOMER CHURN PREDICTION
Seminar Presentation by
Udit Soni
Roll No: 22MAC2R30
(SEMESTER 4)
• Customer churn occurs when customers leave a service or cancel a subscription;
churn prediction means detecting which customers are likely to do so.
• Critical prediction for many businesses
• The telecommunications business has an annual churn rate of 15-25 percent in
this highly competitive market
• Can we reduce customer churn?
CLASSIFICATION ALGORITHMS
• Used to predict the class of a data instance based on its input features
• Aims to learn a mapping between the input features and the output class
labels, which can be binary (e.g., yes/no)
• Three important classification algorithms
• Logistic Regression
• Random Forest
• Support Vector Machine
CUSTOMER CHURN DATASET
• The dataset's categorical column values provide insights into various aspects of customer behavior and
preferences, facilitating exploratory data analysis and predictive modeling:
• Gender: Categorized as 'Female' or 'Male'
• Partner: Indicating whether the customer has a partner ('Yes' or 'No')
• Dependents: Reflecting the presence of dependents ('Yes' or 'No')
• PhoneService: Specifies if the customer has phone service ('Yes' or 'No')
• MultipleLines: Indicates whether the customer has multiple lines ('Yes', 'No', or 'No phone service')
• InternetService: Describes the type of internet service subscribed ('DSL', 'Fiber optic', or 'No')
• OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies: Representing
various additional services with values 'Yes', 'No', or 'No internet service'
• Contract: Specifies the contract type ('Month-to-month', 'One year', or 'Two year')
• PaperlessBilling: Reflects the preference for paperless billing ('Yes' or 'No')
• PaymentMethod: Specifies the payment method chosen by the customer ('Electronic check', 'Mailed check',
'Bank transfer (automatic)', or 'Credit card (automatic)')
• The customer churn dataset comprises 21 columns, including the target
variable 'Churn'; excluding the 'ID' column leaves 20 columns
• The dataset's dimensions are then (7043, 20), where '7043' represents the number of
customer records and '20' the number of columns.
• Ensuring data integrity and completeness is crucial for building reliable
predictive models.
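The categorical values listed above are text, so they need a numeric representation before a classifier can use them. The following is a minimal sketch assuming one-hot encoding, which the slides do not state was the method used. The function name `one_hot_encode` and the sample customer record are hypothetical and made up for illustration; only the category values come from the column descriptions above.

```python
# Category values copied from the column descriptions above.
CATEGORIES = {
    "Contract": ["Month-to-month", "One year", "Two year"],
    "InternetService": ["DSL", "Fiber optic", "No"],
    "PaperlessBilling": ["Yes", "No"],
}


def one_hot_encode(record, categories=CATEGORIES):
    """Turn one customer record (a dict of strings) into a flat 0/1 feature dict."""
    features = {}
    for column, values in categories.items():
        if record[column] not in values:
            raise ValueError(f"unexpected value {record[column]!r} in {column}")
        for value in values:
            # One indicator per (column, value) pair; exactly one is 1 per column.
            features[f"{column}={value}"] = 1 if record[column] == value else 0
    return features


# Hypothetical customer record, made up for illustration (not real data).
customer = {
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaperlessBilling": "Yes",
}
encoded = one_hot_encode(customer)
print(encoded)
```

Each column contributes one indicator per allowed value, so the three columns in this sketch yield eight 0/1 features. Raising an error on an unexpected value is one simple way to catch data-integrity problems early.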
PAIR PLOT
• A pair plot is a useful tool for visualizing relationships
and patterns in multivariate data
A linear equation such as y = a + bX can produce values below 0 or above 1. Since a probability must lie between 0 and 1, Linear
Regression is not suitable for predicting it directly. Thus, an algorithm whose output is a non-linear function of the inputs is required.
Let E be an event. The odds of E are equal to P(E) / P(E'), where E' is the complement of E.
With P(E) = p, we have P(E') = 1 − p, so the odds are p / (1 − p).
The range of the odds is [0, ∞). A linear model produces values in (−∞, ∞), so we convert the odds to that range by taking
their logarithm, i.e. the logit:
$\operatorname{logit}(p) = \log(\text{odds}) = \log\left(\frac{p}{1-p}\right)$
Logistic Regression models the logit as a linear function of X:
$\operatorname{logit}(p) = a + bX$
Exponentiating both sides and solving for p gives:
$\frac{p}{1-p} = e^{a+bX}$
$p = \frac{1}{1+e^{-(a+bX)}}$
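The relationship between probability, odds, and log-odds derived above can be checked numerically. The following is a minimal sketch using only Python's standard `math` module; the function names and the sample probability 0.8 are chosen here for illustration and do not come from the slides.

```python
import math


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def logit(p):
    """Log-odds of an event with probability p."""
    return math.log(odds(p))


def sigmoid(z):
    """Map a log-odds value z = a + bX back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.8                 # illustrative probability, not from the dataset
z = logit(p)            # log-odds corresponding to p
recovered = sigmoid(z)  # applying the sigmoid undoes the logit

print(f"odds={odds(p):.4f}  log-odds={z:.4f}  recovered p={recovered:.4f}")
```

A probability of 0.8 corresponds to odds of about 4 to 1, and passing the resulting log-odds through the sigmoid returns the original probability up to floating-point rounding.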
The above equation is the Sigmoid Function for Binary Logistic Regression. In general, a function of this type, which turns a
linear combination of the inputs into a non-linear output, is known as an Activation Function.
Note: The inverse of the logit function is the Sigmoid function. The range of this function is
(0, 1): its output approaches 0 and 1 but never reaches them.
Therefore, the Sigmoid function is a vital tool for solving binary classification problems. It is smooth and
has a simple derivative that is cheap to compute, which helps to ensure efficient and effective training of the
model.
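To illustrate why the simple derivative helps training, the sketch below fits a and b by plain gradient descent. It assumes the log loss (cross-entropy) as the training objective, which the slides do not specify; with that loss, the sigmoid's derivative s(1 − s) cancels and the gradient for each example reduces to (prediction − label) times the input. The toy data, learning rate, and epoch count are made up for illustration and are not taken from the churn dataset.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid: s * (1 - s), reusing the sigmoid value itself."""
    s = sigmoid(z)
    return s * (1.0 - s)


def train(xs, ys, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a + b*x) by gradient descent on the mean log loss."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(a + b * x) - y  # gradient of log loss w.r.t. a + b*x
            grad_a += error
            grad_b += error * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Toy data, made up for illustration: label 1 for the larger x values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

a, b = train(xs, ys)
p_low = sigmoid(a + b * 1.0)   # predicted probability for the smallest x
p_high = sigmoid(a + b * 6.0)  # predicted probability for the largest x
print(f"a={a:.3f}, b={b:.3f}, p(x=1)={p_low:.3f}, p(x=6)={p_high:.3f}")
```

Each update needs only one sigmoid evaluation per example, which is what keeps the training loop cheap.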
ADVANTAGES OF LOGISTIC REGRESSION