5 - Logistic Regression - Lemai Nguyen 2022


Supervised Machine Learning

Logistic Regression
Associate Professor Lemai Nguyen
Information Systems and Business Analytics

Expertise areas and research interests:

Artificial Intelligence and Business Analytics
Health Informatics and Digital Health
Socio-technical Analysis and Evaluation

External affiliations / professional activities:

Section Editor, Australasian Journal of Information Systems (2013 – Present)
Honorary Senior Research Affiliate, Epworth Healthcare (2015-2017)
Member of Association for Information Systems (AIS) and AAIS
Member of Australasian Institute of Digital Health (formerly HISA) (2009 -
Senior Certified Professional, Australian Computer Society (ACS) (2017-Present)
Track Co-Chair of Digital Healthcare Systems, Australasian Conference on
Information Systems, 2019-2022

Email: lemai.nguyen@deakin.edu.au


Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen Slide 2
Tell me about you…!

Kotu V., Deshpande B. Data Science : Concepts and Practice, chapters
1and 4. Second edition. Morgan Kaufmann Publishers; 2019.

Google Colab

https://colab.research.google.com https://jupyter.org/


Predictive Machine Learning with Logistic Regression

Regression –
Key concepts

Exercises in


Regression –
Key concepts

Supervised Machine Learning

Kotu and Deshpande, 2019, chapter 1

Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen MIS716 AI for Business Slide 7
Linear Regression – revision (1)
What it is and How it works

Simple linear regression

y’= f(𝑥) = 𝒃𝟎 + 𝒃𝟏 𝒙
𝒃𝟎 is the intercept of the line

𝒃𝟏 is the slope of the line

𝑥=independent variable/predictor
𝑦=dependent variable/label

Image source: http://www.sthda.com/english/articles/40-



Image source: https://www.miabellaai.net/regression.html
If we have more than one independent variable:

𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙𝟏 + 𝒃𝟐 𝒙𝟐 + ⋯ + 𝒃𝒏 𝒙𝒏


Linear Regression – Revision (2)
Model evaluation metrics and cost functions

y_i = actual value
y'_i = estimated value

𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙

Sales = $10,000 + $31,565 (Ad_spend)

Mean absolute error (MAE)

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - y'_i| \]

Root mean square error/deviation (RMSE/RMSD)

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - y'_i)^2} \]
Image source: http://www.sthda.com/english/articles/40-
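The two error metrics can be computed by hand in a few lines. The sketch below uses only the Python standard library; the function names and the small lists of actual and estimated values are made up for illustration and are not from the lecture data.

```python
import math

def mae(actual, estimated):
    """Mean absolute error: the average of |y_i - y'_i|."""
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, estimated)) / len(actual)

def rmse(actual, estimated):
    """Root mean square error: square root of the average squared error."""
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, estimated)) / len(actual)
    return math.sqrt(mse)

# Hypothetical actual and estimated values (illustration only)
y_actual = [3.0, 5.0, 7.0, 9.0]
y_estimated = [2.0, 5.0, 8.0, 11.0]

print("MAE:", mae(y_actual, y_estimated))    # absolute errors 1, 0, 1, 2 -> 4/4 = 1.0
print("RMSE:", rmse(y_actual, y_estimated))  # squared errors 1, 0, 1, 4 -> sqrt(6/4)
```

RMSE squares each error before averaging, so a single large error raises it more than it raises MAE.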



𝒚′ = 𝒃𝟎 + 𝒃𝟏 𝒙𝟏 + 𝒃𝟐 𝒙𝟐 + ⋯ + 𝒃𝒏 𝒙𝒏

• Continuous predictors and target

• No missing data
• No outliers
• No multi-collinearity
• Normal distribution and constant variance
of residuals
• Not good for non-linear relationships,
complex relationships

Image source: https://www.miabellaai.net/regression.html


For example
Kotu and Deshpande, 2019

Continuous target data

Discrete target data


Fitting Linear Regression to a binary target
Kotu and Deshpande, 2019

• Target is categorical
• Predictors can be continuous or

One way is to convert

into y ∈ {0,1}
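To see why a straight line is an awkward fit for a 0/1 target, the following sketch fits a simple least-squares line to a binary target and evaluates it at a few points. The data and the helper name `fit_simple_linear_regression` are hypothetical, chosen only for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of intercept b0 and slope b1 for y' = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical predictor values with a binary target coded as 0/1
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_simple_linear_regression(xs, ys)

# The fitted line is unbounded, so estimates can fall below 0 or above 1
for x in (0, 4.5, 10):
    print(x, round(b0 + b1 * x, 3))
```

With this data the estimate at x = 0 is negative and the estimate at x = 10 exceeds 1, so the raw output cannot be read as a probability.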


The Logistic function (the inverse of the logit)

A common Sigmoid function: S(x) = 1 / (1 + e^(-x))

x is continuous from -∞ to +∞

As x approaches -∞, S(x) approaches 0

As x approaches +∞, S(x) approaches 1

• Aha! S(x) can be the function

to convert
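A minimal sketch of the sigmoid function, using only the standard library. The two-branch form is an implementation choice made here to avoid overflow for large negative inputs; it is not part of the slides.

```python
import math

def sigmoid(x):
    """Logistic (sigmoid) function S(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x is negative, so exp(x) lies between 0 and 1
    return z / (1.0 + z)

# The output stays between 0 and 1 and equals 0.5 at x = 0
for x in (-10, -1, 0, 1, 10):
    print(x, round(sigmoid(x), 5))
```

Any real-valued input, however large or small, is mapped to a value between 0 and 1, which is what allows the output to be read as a probability.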



Logistic Regression
How it works
Kotu and Deshpande, 2019

p is the probability of the event y happening

then (1-p) is the probability of the event not happening

p/(1-p) -> the odds of the event happening (an odds ratio is the ratio of two such odds)

The log of the odds, log(p/(1-p)), is called the logit.

Given predictors X, the logit is modelled as a linear regression on the predictors:

\[ \mathrm{logit} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \]

The logit is continuous from -∞ to +∞

The probability of y can be computed from the logit using the logistic function (Sigmoid):

\[ p = \frac{1}{1 + e^{-\mathrm{logit}}} \]
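The chain from probability to odds to logit, and back again, can be checked numerically. This is a small illustrative sketch; the function names and the probabilities used are arbitrary choices, not from the lecture.

```python
import math

def odds(p):
    """Odds of the event: p / (1 - p)."""
    return p / (1.0 - p)

def logit(p):
    """Log of the odds."""
    return math.log(odds(p))

def inverse_logit(value):
    """Logistic function: converts a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-value))

# Columns: probability, odds, logit, probability recovered from the logit
for p in (0.1, 0.5, 0.8):
    print(p, round(odds(p), 3), round(logit(p), 3), round(inverse_logit(logit(p)), 3))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities above 0.5 give positive logits and probabilities below 0.5 give negative ones.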


Training the logistic model
Model training and loss, cost functions

Given predictors X (x1, x2, ..., xn),

Find the logit, a linear regression on the predictors X:

\[ \mathrm{logit} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \]

Compute the probability of y using the Sigmoid function:

\[ p = \frac{1}{1 + e^{-\mathrm{logit}}} \]

Training involves a search for the coefficients b_i that maximise the likelihood of the estimations
for each datapoint, using a simplified likelihood function, where:

y – original target data (training dataset)
p – estimated probability

• The cost function is the sum of the likelihood values over all datapoints.
• Gradient descent can be used to search for the coefficients that maximise the likelihood of
correct estimations (equivalently, that minimise the negative of the summed likelihood values).
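The training loop can be sketched for a single predictor as follows. This is an illustration under stated assumptions, not the method used by any particular library: the per-datapoint likelihood is taken to be the standard log-likelihood y·log(p) + (1-y)·log(1-p), which may differ in form from the simplified function on the slide, and the data, learning rate and iteration count are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum over datapoints of y*log(p) + (1-y)*log(1-p) (assumed standard form)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def train(xs, ys, learning_rate=0.1, iterations=5000):
    """Batch gradient descent on the average negative log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Errors (p - y) for the current coefficients
        errors = [sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient: lowers the cost, raises the likelihood
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

# Hypothetical data in which the classes overlap (not perfectly separable)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = train(xs, ys)
print("b0:", round(b0, 3), "b1:", round(b1, 3))
print("log-likelihood at start:", round(log_likelihood(0.0, 0.0, xs, ys), 3))
print("log-likelihood after training:", round(log_likelihood(b0, b1, xs, ys), 3))
```

The log-likelihood after training is higher than at the starting coefficients, which is the sense in which the search "maximises the likelihood".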


Are you keeping pace?

Predictive Machine Learning with Logistic Regression


Business problem framing – business needs and application
Business needs
• Long delay in returning pathology results
• Novice pathologists need training

Analytics (ML)
• Predictive analytics to classify diagnosis
• Past biopsy data and results
• To predict cancer diagnosis

Application
• Assist pathologists in interpreting test data -> reduced time and improved accuracy
• Training novice pathologists

ML Problem Framing: Classification
Predict whether a datapoint belongs to one of the predefined classes, based on learning from a labelled dataset

§ Business Context: Pathology Lab

§ Business Problem: To make cancer diagnoses in less time, with same or higher accuracy
§ Business Data: Historical datasets of biopsy data and results (diagnoses)
§ Machine Learning Problem: Classification

Data Preparation & Exploration -> Model Training -> Model Evaluation




V1, V2, V7-V9: biological variables
Diagnosis: healthy or cancerous

Source: adapted from a dataset provided by Dr Mark Griffin, Industry Fellow, University of Queensland

Sample size: 699

Number of columns: 7

Loading and Exploring the Dataset

# import the libraries used on the following slides
import pandas as pd
import seaborn as sns

# load dataset
records = pd.read_csv("/content/drive/MyDrive/VNU2022/biopsy_ln.csv")




# count the records in each class (recent seaborn versions require x= as a keyword)
sns.countplot(x='class', data=records, hue='class')

Data preparation

# convert categorical data to numerical

def coding_diagnosis(x):
    # map the class label to 1 (cancerous) or 0 (healthy)
    if x == 'cancerous': return 1
    if x == 'healthy': return 0

records['Diagnosis'] = records['class'].apply(coding_diagnosis)



# fit and plot a logistic curve of Diagnosis against each of columns 1 to 4
for i in records.iloc[:, 1:5]:
    sns.regplot(x=records[i], y=records['Diagnosis'], logistic=True, ci=None)

Data Preparation

#Selecting predictors

features =['V1', 'V2', 'V7', 'V8', 'V9']

X=records[features] #Input data
y=records['Diagnosis'] # Target variable


Data Splitting

from sklearn.model_selection import train_test_split # Import train_test_split function

# Split dataset into training set and test set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1) # 80% training and 20% testing

#inspect the split datasets


print('Training dataset size:',X_train.shape[0])

print('Test dataset size:',X_test.shape[0])

Model Training

from sklearn.linear_model import LogisticRegression

#Create an initial Logistic Regression model

logreg = LogisticRegression(max_iter=100)

# Train Logistic Regression Classifier with the training dataset

logreg = logreg.fit(X_train, y_train)


Model Testing

#Make predictions for the test dataset

y_pred = logreg.predict(X_test)

inspection=pd.DataFrame({'Actual':y_test, 'Predicted':y_pred})

Model Testing

import matplotlib.pyplot as plt

from sklearn import metrics
from sklearn.metrics import precision_recall_curve
# plot_precision_recall_curve and plot_confusion_matrix were removed in scikit-learn 1.2;
# use the PrecisionRecallDisplay and ConfusionMatrixDisplay classes instead

# Calculate metrics: Accuracy, Precision, Recall, F1

print("Accuracy: ", metrics.accuracy_score(y_test, y_pred))
print("Precision: ", metrics.precision_score(y_test, y_pred))
print("Recall: ", metrics.recall_score(y_test, y_pred))
print("F1: ", metrics.f1_score(y_test, y_pred))

Model Testing

#print confusion matrix and evaluation report

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
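The four metrics all derive from the counts in the confusion matrix. The sketch below recomputes them by hand on a small hypothetical set of actual and predicted labels (not the biopsy results). The count order (tn, fp, fn, tp) follows the layout scikit-learn uses for a binary confusion matrix with labels 0 and 1: [[TN, FP], [FN, TP]].

```python
def confusion_counts(actual, predicted):
    """Count true/false negatives and positives for labels coded 0 and 1."""
    pairs = list(zip(actual, predicted))
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    return tn, fp, fn, tp

# Hypothetical labels (illustration only): 1 = cancerous, 0 = healthy
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_counts(actual, predicted)

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted positives that are real
recall = tp / (tp + fn)                      # share of real positives that are found
f1 = 2 * precision * recall / (precision + recall)

print([[tn, fp], [fn, tp]])
print("Accuracy:", accuracy, "Precision:", precision, "Recall:", recall, "F1:", round(f1, 3))
```

In a diagnostic setting, a false negative (a cancerous sample predicted healthy) lowers recall, which is why recall is often watched more closely than accuracy.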

Plot ROC curve and Confusion Matrix

#Plot ROC (Receiver operating characteristic) curve and confusion matrix

from sklearn.metrics import RocCurveDisplay
from sklearn.metrics import ConfusionMatrixDisplay

RocCurveDisplay.from_estimator(logreg,X_test, y_test)
ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test)

Cross Validation


from sklearn.model_selection import KFold

from sklearn.model_selection import cross_val_score
kfold = KFold(n_splits=5, random_state=2022, shuffle=True)
evaluation = cross_val_score(logreg, X, y, cv=kfold)

print("Accuracy and standard deviation: %.3f%% (%.3f%%)" % (evaluation.mean()*100.0, evaluation.std()*100.0))

Accuracy and standard deviation: 92.983% (2.721%)

from sklearn.linear_model import LogisticRegressionCV

# LogisticRegressionCV uses cross-validation to choose the regularisation strength C
logreg2=LogisticRegressionCV(cv=10, random_state=2022).fit(X, y)
# note: score(X, y) is the accuracy on the same data the model was fitted to
print("Accuracy: %.3f" % logreg2.score(X,y))
Accuracy: 0.930


# visualise the logistic regression S-curve for a single predictor

sns.regplot(x=X_train['V7'], y=y_train, logistic=True, ci=None)


Logit = -8.553 + 0.304V1 + 0.194V2 + 1.185V7 + 0.170 V8 + 0.092 V9
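The fitted equation can be applied by hand to turn a datapoint into a probability. In scikit-learn, a fitted LogisticRegression model exposes these values as its intercept_ and coef_ attributes; the sketch below simply hard-codes the numbers from the equation above. The two datapoints and the helper name `predict_probability` are hypothetical.

```python
import math

# Intercept and coefficients copied from the logit equation above
INTERCEPT = -8.553
COEFFICIENTS = {'V1': 0.304, 'V2': 0.194, 'V7': 1.185, 'V8': 0.170, 'V9': 0.092}

def predict_probability(datapoint):
    """Compute the logit as a weighted sum, then convert it with the sigmoid."""
    logit = INTERCEPT + sum(COEFFICIENTS[name] * datapoint[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical datapoints (illustration only, not rows from the biopsy dataset)
low = {'V1': 1, 'V2': 1, 'V7': 1, 'V8': 1, 'V9': 1}
high = {'V1': 8, 'V2': 7, 'V7': 5, 'V8': 6, 'V9': 2}

print("P(cancerous | low):", round(predict_probability(low), 4))
print("P(cancerous | high):", round(predict_probability(high), 4))

# exp(coefficient) is the factor by which the odds change for a one-unit
# increase in that predictor, holding the other predictors fixed
print("Odds multiplier for V7:", round(math.exp(COEFFICIENTS['V7']), 3))
```

Because every coefficient is positive, larger values of any predictor push the logit, and therefore the estimated probability of a cancerous diagnosis, upwards.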

Tuning the model parameters


• Select different predictors and feature engineering

• Tuning the logistic regression model hyperparameters
• LogisticRegression(class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='warn', n_jobs=None,
penalty='l1', random_state=None, solver='warn', tol=0.0001, verbose=0,
• penalty can be "l1", "l2" or "elasticnet" (both l1 and l2); it adds a regularisation term to the loss function
• C is the inverse of the regularisation strength: values close to 0.0 give a strong penalty; larger values (1.0 is the default) give a lighter penalty
• not all solvers support all penalty types
• Change the classification threshold (predict uses 0.5 by default)
# class probabilities, one column per class
y_test2 = logreg.predict_proba(X_test)
# predict class 1 when its probability is at least 0.3
y_test2 = logreg.predict_proba(X_test)[:,1] >= 0.3
• Retrain and retest the model
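The effect of changing the threshold can be illustrated without a fitted model. The sketch below applies thresholds of 0.5 and 0.3 to a hypothetical list of predicted probabilities, standing in for the second column of predict_proba, and compares precision and recall. All values and function names are made up for illustration.

```python
def classify(probabilities, threshold=0.5):
    """Predict class 1 when the estimated probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

def precision(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fp)

def recall(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp / (tp + fn)

# Hypothetical actual labels and predicted probabilities of class 1
actual = [1, 1, 1, 0, 0, 0, 1, 0, 0]
probs = [0.9, 0.45, 0.35, 0.4, 0.1, 0.6, 0.8, 0.2, 0.33]

for threshold in (0.5, 0.3):
    predicted = classify(probs, threshold)
    print("threshold", threshold,
          "precision", round(precision(actual, predicted), 3),
          "recall", round(recall(actual, predicted), 3))
```

Lowering the threshold from 0.5 to 0.3 catches more of the actual positives (higher recall) at the cost of more false alarms (lower precision), a trade-off that may be acceptable when missing a cancerous case is the costlier error.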


• The target should be categorical

• The datapoints are independent
• No extreme outliers
• No severe collinearity among the predictors
• There exists a linear relationship between each predictor and the logit of the
target, i.e. log(p / (1-p))
• Sample size is large enough


Linear Regression and Logistic Regression

Linear Regression

• Supervised ML
• Linear Regression equations
• Estimation: target is continuous
• Best-fit line
• Loss function: Prediction Error
• Cost function: Mean squared error
• Assumes linear relationships between predictors and target

Logistic Regression

• Supervised ML
• Linear Regression equations
• Classification: target is categorical
• S-curve (Fit the regression values to the sigmoid curve)
• Loss function: maximum likelihood estimation
• Cost function: maximum likelihood estimation values
• Does not assume linear relationships between predictors and target (linearity is assumed between predictors and the logit of the target)



Pros
• Explainable: Easy to interpret
• Visual representation
• No assumption that the predictors are normally distributed (linearity is assumed only between predictors and the logit)
• Less effort for data preparation, no need for normalisation
• Works for both numerical and categorical predictors

Cons
• Target must be categorical, best with binary (dichotomous)
• Works best when the classes are close to linearly separable by the predictors
• Requires large datasets. Overfitting if datasets are small
• Complex when having multi-class targets


Are you keeping pace?

Predictive Machine Learning with Logistic Regression

Exercises in

Google Colab

https://colab.research.google.com https://jupyter.org/

Exercise 1: Cancer Diagnosis

• Biopsy dataset
Exercise 2: Cancer Survivability Prediction
• Breast Cancer dataset
Exercise 3: Diabetes Diagnosis
• Pima Indians Diabetes dataset
Exercise 4: Titanic Survivability Prediction
• Titanic dataset
Exercise 5: Churn Prediction
• Telco Churn dataset

Additional resources

• Molnar, C., (2022) Interpretable Machine Learning - A Guide for Making Black Box Models Explainable,

• Brownlee, J, Multinomial Logistic Regression With Python,


• Huyen, C., (2022) Designing Machine Learning Systems.

