5 - Logistic Regression - Lemai Nguyen 2022


Supervised Machine Learning

Logistic Regression
Associate Professor Lemai Nguyen
Information Systems and Business Analytics

Expertise areas and research interests:


Artificial Intelligence and Business Analytics
Health Informatics and Digital Health
Socio-technical Analysis and Evaluation

External affiliations / professional activities:


Section Editor, Australasian Journal of Information Systems (2013 – Present)
Honorary Senior Research Affiliate, Epworth Healthcare (2015-2017)
Member of Association for Information Systems (AIS) and AAIS
Member of Australasian Institute of Digital Health (formerly HISA) (2009 - Present)
Senior Certified Professional, Australian Computer Society (ACS) (2017-Present)
Track Co-Chair of Digital Healthcare Systems, Australasian Conference on Information Systems, 2019-2022

Email: lemai.nguyen@deakin.edu.au

UPCOMING EVENTS:
• DATA ANALYTICS IN AUSTRALIAN ORGANISATIONS
• EV DETECTION CHALLENGE

Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen Slide 2
Tell me about you…!

Kotu V., Deshpande B. Data Science: Concepts and Practice, chapters 1 and 4. Second edition. Morgan Kaufmann Publishers; 2019.

Google Colab

https://colab.research.google.com https://jupyter.org/

https://rapidminer.com

Predictive Machine Learning with Logistic Regression

• Logistic Regression – Key concepts
• Exercises in Python
• Illustrative example

Logistic Regression – Key concepts

Supervised Machine Learning

Kotu and Deshpande, 2019, chapter 1

Deakin University CRICOS Provider Code: 00113B - A/Prof Lemai Nguyen MIS716 AI for Business Slide 7
Linear Regression – revision (1)
What it is and How it works

Simple linear regression

$y' = f(x) = b_0 + b_1 x$

$b_0$ is the intercept of the line

$b_1$ is the slope of the line

$x$ = independent variable/predictor
$y$ = dependent variable/label

Image source: http://www.sthda.com/english/articles/40-regression-analysis/167-simple-linear-regression-in-r/
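The two coefficients of simple linear regression can be estimated with ordinary least squares: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal sketch, assuming NumPy and using hypothetical toy data that lie exactly on y = 2 + 3x (so the fit should recover b0 = 2 and b1 = 3); the function name fit_simple_linear_regression is illustrative, not from the original material:

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y' = b0 + b1 * x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope: covariance of x and y divided by the variance of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # intercept: the fitted line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# hypothetical toy data lying exactly on y = 2 + 3x
x = [0, 1, 2, 3, 4]
y = [2, 5, 8, 11, 14]
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)
```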

Image source: https://www.miabellaai.net/regression.html
If we have more than one independent variable (multiple linear regression):

$y' = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$
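With several predictors, the coefficients are usually found by solving a least-squares problem in matrix form. A minimal sketch using NumPy's np.linalg.lstsq on hypothetical data generated from y = 1 + 2x1 - 0.5x2; the column of ones added to the predictors provides the intercept b0:

```python
import numpy as np

# hypothetical data generated exactly from y = 1 + 2*x1 - 0.5*x2
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = 1 + 2 * X[:, 0] - 0.5 * X[:, 1]

# prepend a column of ones so that the first coefficient is the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])

# least-squares solution b = [b0, b1, b2] of X_design @ b ≈ y
b, residuals, rank, _ = np.linalg.lstsq(X_design, y, rcond=None)
print(np.round(b, 6))
```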

Linear Regression – Revision (2)
Model evaluation metrics and cost functions
$y_i$ = actual value, $\hat{y}_i$ = estimated value

$y' = b_0 + b_1 x$

$\text{Sales} = \$10{,}000 + \$31{,}565 \times \text{Ad\_spend}$

Mean absolute error (MAE)

$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$

Root mean square error/deviation (RMSE/RMSD)

$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
Image source: http://www.sthda.com/english/articles/40-
regression-analysis/167-simple-linear-regression-in-r/
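The two metrics above can be computed directly from their definitions. A minimal NumPy sketch with hypothetical actual and estimated values (errors of 1, 0, -2 and -1):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean of (y_i - y_hat_i)^2."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# hypothetical actual and estimated values
y_actual = [3.0, 5.0, 2.0, 7.0]
y_estimated = [2.0, 5.0, 4.0, 8.0]
print(mae(y_actual, y_estimated))   # 1.0
print(rmse(y_actual, y_estimated))  # about 1.2247
```

Because errors are squared before averaging, RMSE is always at least as large as MAE and penalises large errors more heavily.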

Observations

$y' = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$

• Continuous predictors and target
• No missing data
• No outliers
• No multi-collinearity
• Normal distribution and constant variance of residuals
• Not good for non-linear or complex relationships

Image source: https://www.miabellaai.net/regression.html

For example
Kotu and Deshpande, 2019

Continuous target data

Discrete target data

Fitting Linear Regression to a binary target
Kotu and Deshpande, 2019

• Target is categorical
• Predictors can be continuous or categorical

One way is to convert the categorical target into y ∈ {0,1}
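To see why a straight line is a poor model for a 0/1 target, a minimal sketch (hypothetical data, NumPy's np.polyfit) fits a linear regression to a binary target and then predicts just outside the observed range of x: the predictions fall below 0 and rise above 1, so they cannot be read as probabilities.

```python
import numpy as np

# hypothetical predictor values and a binary target (0 = no, 1 = yes)
x = np.arange(1, 11, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# fit a straight line y' = b0 + b1 * x by least squares (np.polyfit returns [b1, b0])
b1, b0 = np.polyfit(x, y, deg=1)

# predictions outside the observed range of x
for x_new in (0.0, 15.0):
    print(x_new, b0 + b1 * x_new)
```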

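The Logistic function (the inverse of the logit)

A common Sigmoid function is the logistic function: $S(x) = \frac{1}{1 + e^{-x}}$

x is continuous from −∞ to +∞

As x → −∞, S(x) → 0

As x → +∞, S(x) → 1

• Aha!! S(x) can be the function used to convert any real-valued output (such as the output of a linear regression) into a value between 0 and 1, i.e., a probability

https://en.wikipedia.org/wiki/Sigmoid_function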
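A minimal NumPy sketch of the Sigmoid, showing that its outputs stay strictly between 0 and 1, equal 0.5 at x = 0, and approach 0 and 1 for large negative and large positive x:

```python
import numpy as np

def sigmoid(x):
    """Logistic (Sigmoid) function S(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

values = sigmoid([-10, -2, 0, 2, 10])
print(np.round(values, 4))
```

For extremely large negative inputs np.exp can overflow and emit a warning; scipy.special.expit is a numerically stable alternative if SciPy is available.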

Logistic Regression
How it works
Kotu and Deshpande, 2019

p is the probability of the event y happening

then (1 − p) is the probability of the event not happening

p/(1 − p) is the odds of the event happening

The log of the odds, log(p/(1 − p)), is called the logit.

Given predictors X, the logit is modelled as a linear regression: $\text{logit} = b_0 + b_1 x_1 + \cdots + b_n x_n$

The logit is continuous from −∞ to +∞

The probability of y can be computed using the logistic (Sigmoid) function:

$p = \frac{1}{1 + e^{-\text{logit}}}$
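The chain probability → odds → logit → probability can be checked numerically. A minimal sketch using only Python's math module (the function names are illustrative); applying the logistic function to the logit recovers the original probability, which is why the two functions are inverses:

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def logit(p):
    """Log of the odds: log(p / (1 - p))."""
    return math.log(odds(p))

def logistic(z):
    """Inverse of the logit: p = 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

for p in (0.1, 0.5, 0.8):
    z = logit(p)
    print(f"p={p}: odds={odds(p):.3f}, logit={z:.3f}, back to p={logistic(z):.3f}")
```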

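Training the logistic model
Model training and loss, cost functions

Given predictors X (x1, x2, ..., xn),

Find the logit, a linear regression on the predictors X:
$\text{logit} = b_0 + b_1 x_1 + \cdots + b_n x_n$

Compute the probability of y using the Sigmoid function:
$p = \frac{1}{1 + e^{-\text{logit}}}$

Training involves a search for the coefficients b_i that maximise the likelihood of the estimations. The likelihood of each datapoint is given by a simplified likelihood function:
$L = p^{y} (1 - p)^{1 - y}$

where
y – original target data (training dataset), 0 or 1
p – estimated probability

• The cost function combines the likelihood values of all datapoints; in practice the sum of the log-likelihoods, $\sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$, is maximised.
• Gradient descent (applied to the negative log-likelihood) can be used to search for the coefficients that maximise the likelihood of correct estimations.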
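A minimal from-scratch sketch of this training procedure for a single predictor, assuming NumPy and hypothetical, overlapping (not perfectly separable) data. It runs plain batch gradient descent on the negative log-likelihood with an illustrative learning rate and iteration count; library implementations such as scikit-learn use more sophisticated solvers and add regularisation by default:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, b):
    """Sum over datapoints of y*log(p) + (1 - y)*log(1 - p)."""
    p = sigmoid(X @ b)
    eps = 1e-12  # guards against log(0)
    return np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def fit_logistic_gd(x, y, learning_rate=0.1, n_iter=5000):
    """Estimate b0, b1 by gradient descent on the negative log-likelihood."""
    X = np.column_stack([np.ones(len(x)), x])  # column of ones -> intercept b0
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ b)
        # gradient of the negative log-likelihood, averaged over datapoints
        grad = -X.T @ (y - p) / len(y)
        b -= learning_rate * grad
    return b, X

# hypothetical data: one predictor, binary target with some overlap
x = np.arange(1, 11, dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)

b, X = fit_logistic_gd(x, y)
print("b0, b1:", b)
print("log-likelihood at b = 0:", log_likelihood(X, y, np.zeros(2)))
print("log-likelihood after training:", log_likelihood(X, y, b))
```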

Are you keeping pace?

Predictive Machine Learning with Logistic Regression

Illustrative example

Business problem framing – business needs and application

Business needs
• Long delay in returning pathology results
• Novice pathologists need training

Data
• Past biopsy data and results

Analytics (ML)
• Predictive analytics to classify diagnosis
• To predict cancer diagnosis

Application
• Assist pathologists in interpreting test data -> reduced time and improved accuracy
• Training novice pathologists

ML Problem Framing: Classification
Predict whether a datapoint belongs to one of the predefined classes, based on learning from a labelled dataset

• Business Context: Pathology Lab
• Business Problem: To make cancer diagnoses in less time, with the same or higher accuracy
• Business Data: Historical datasets of biopsy data and results (diagnoses)
• Machine Learning Problem: Classification

Data Preparation & Exploration → Model Training → Model Evaluation



Dataset

Data:
V1, V2, V7-V9: biological variables
Diagnosis: healthy or cancerous

Source: adapted from a dataset provided by Dr Mark Griffin, Industry Fellow, University of Queensland

Sample size: 699
Number of columns: 7

Loading and Exploring the Dataset

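import pandas as pd

# load dataset (from Google Drive, mounted in Colab)
records = pd.read_csv("/content/drive/MyDrive/VNU2022/biopsy_ln.csv")

records.info()      # column names, types and non-null counts
records.describe()  # summary statistics of the numeric columns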

Exploration

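import seaborn as sns
import matplotlib.pyplot as plt

# number of records in each class (recent seaborn versions require x= as a keyword)
sns.countplot(x='class', data=records, hue='class')
plt.show()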

Data preparation

# convert categorical data to numerical: cancerous -> 1, healthy -> 0
def coding_diagnosis(x):
    if x == 'cancerous':
        return 1
    if x == 'healthy':
        return 0
    # any other label falls through and returns None

records['Diagnosis'] = records['class'].apply(coding_diagnosis)

records.head(10)
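An alternative to coding_diagnosis is a dictionary passed to pandas' Series.map. Any label missing from the dictionary becomes NaN (just as coding_diagnosis would return None), so a quick check guards against unexpected spellings. A minimal sketch on a hypothetical four-row stand-in for the dataset:

```python
import pandas as pd

# hypothetical stand-in for the 'class' column of the biopsy dataset
records = pd.DataFrame({'class': ['cancerous', 'healthy', 'cancerous', 'healthy']})

# dictionary-based alternative to coding_diagnosis: 1 = cancerous, 0 = healthy
records['Diagnosis'] = records['class'].map({'cancerous': 1, 'healthy': 0})

# labels not in the dictionary would become NaN, so check before modelling
assert records['Diagnosis'].notna().all(), "unexpected class label found"
print(records)
```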

Exploration

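# iterating over a DataFrame yields its column names;
# logistic=True fits a logistic S-curve (requires the statsmodels package)
for i in records.iloc[:, 1:5]:
    sns.regplot(x=records[i], y=records['Diagnosis'], logistic=True, ci=None)
    plt.title(i)
    plt.show()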

Data Preparation

#Selecting predictors

features =['V1', 'V2', 'V7', 'V8', 'V9']


X=records[features] #Input data
y=records['Diagnosis'] # Target variable

print(X.head())
print(y.head())

Data Splitting

from sklearn.model_selection import train_test_split # Import train_test_split function

# Split dataset into training set and test set


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1) # 80% training and 20% testing

#inspect the split datasets


print(X_train.head())
print(y_train.head())

print('Training dataset size:',X_train.shape[0])


print('Test dataset size:',X_test.shape[0])
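With test_size=0.2, scikit-learn rounds the test-set size up (ceil(0.2 × n)) and assigns the remaining rows to training. A minimal sketch with hypothetical placeholder arrays of 699 rows, the sample size of the biopsy dataset, shows the split sizes this should produce:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical placeholder data with the same number of rows as the biopsy dataset
X_demo = np.zeros((699, 5))
y_demo = np.zeros(699)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)
print('Training dataset size:', X_tr.shape[0])
print('Test dataset size:', X_te.shape[0])
```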

Model Training

from sklearn.linear_model import LogisticRegression

#Create an initial Logistic Regression model


logreg = LogisticRegression(max_iter=100)

# Train Logistic Regression Classifier with the training dataset


logreg = logreg.fit(X_train, y_train)

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
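After fitting, the coefficients of the logit are stored in intercept_ (b0) and coef_ (b1, ..., bn). A minimal sketch on hypothetical synthetic data from make_classification (standing in for the biopsy predictors) shows that, for a binary target, predict_proba returns the Sigmoid of that logit:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# hypothetical synthetic data standing in for the biopsy predictors
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=100).fit(X_demo, y_demo)

# the fitted logit: b0 (intercept_) + b1*x1 + ... + b5*x5 (coef_)
print("b0:", model.intercept_[0])
print("b1..b5:", model.coef_[0])

# predict_proba for class 1 equals the Sigmoid of the logit
logit = model.intercept_[0] + X_demo @ model.coef_[0]
p_manual = 1 / (1 + np.exp(-logit))
p_sklearn = model.predict_proba(X_demo)[:, 1]
print(np.allclose(p_manual, p_sklearn))  # True
```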

Model Testing

#Make predictions for the test dataset


y_pred = logreg.predict(X_test)

#inspection
inspection=pd.DataFrame({'Actual':y_test, 'Predicted':y_pred})
inspection.head(20)

Model Testing

import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.metrics import precision_recall_curve
# plot_precision_recall_curve and plot_confusion_matrix were removed in scikit-learn 1.2;
# the display classes below replace them
from sklearn.metrics import PrecisionRecallDisplay, ConfusionMatrixDisplay

# Calculate metrics: Accuracy, Precision, Recall, F1
print("Accuracy: ", metrics.accuracy_score(y_test, y_pred))
print("Precision: ", metrics.precision_score(y_test, y_pred))
print("Recall: ", metrics.recall_score(y_test, y_pred))
print("F1: ", metrics.f1_score(y_test, y_pred))

Model Testing

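# print confusion matrix and evaluation report
# (confusion matrix: rows = actual class, columns = predicted class, in the order 0 = healthy, 1 = cancerous)
from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))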
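Each metric printed above can be derived from the four counts in the confusion matrix. A minimal sketch with hypothetical labels; for 0/1 labels, scikit-learn's confusion_matrix has actual classes in rows and predicted classes in columns, so ravel() yields TN, FP, FN, TP:

```python
from sklearn.metrics import confusion_matrix

# hypothetical actual and predicted labels (1 = cancerous, 0 = healthy)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# for labels 0/1, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of those predicted cancerous, the share that really are
recall = tp / (tp + fn)     # of the truly cancerous, the share that were found
f1 = 2 * precision * recall / (precision + recall)
print(tn, fp, fn, tp)                   # 5 1 1 3
print(accuracy, precision, recall, f1)  # 0.8 0.75 0.75 0.75
```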

Plot ROC curve and Confusion Matrix

#Plot ROC (Receiver operating characteristic) curve and confusion matrix


from sklearn.metrics import RocCurveDisplay
from sklearn.metrics import ConfusionMatrixDisplay

RocCurveDisplay.from_estimator(logreg,X_test, y_test)
ConfusionMatrixDisplay.from_estimator(logreg, X_test, y_test)
plt.show()
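The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, and the area under it (AUC) summarises how well the model ranks positives above negatives. A minimal sketch on a small hypothetical example compares three ways of getting the AUC, including the fraction of positive/negative pairs ranked correctly (ties, absent here, would count as one half):

```python
from sklearn.metrics import roc_curve, auc, roc_auc_score

# hypothetical actual labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# ROC curve: true positive rate vs false positive rate over all thresholds
fpr, tpr, thresholds = roc_curve(y_true, y_score)
area = auc(fpr, tpr)

# AUC also equals the fraction of (positive, negative) pairs in which
# the positive datapoint receives the higher score
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
pairwise = sum(p > n for p in pos for n in neg) / (len(pos) * len(neg))

print(area, roc_auc_score(y_true, y_score), pairwise)  # 0.75 0.75 0.75
```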

Cross Validation

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html

from sklearn.model_selection import KFold


from sklearn.model_selection import cross_val_score
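kfold = KFold(n_splits=5, random_state=2022, shuffle=True)
evaluation = cross_val_score(logreg, X, y, cv=kfold)  # one accuracy score per fold

print("Mean accuracy (standard deviation): %.3f%% (%.3f%%)" % (evaluation.mean()*100.0, evaluation.std()*100.0))

Mean accuracy (standard deviation): 92.983% (2.721%)

from sklearn.linear_model import LogisticRegressionCV

# LogisticRegressionCV uses 10-fold CV to choose the regularisation strength C, then refits on all of X, y;
# score(X, y) is therefore accuracy on the data the model was fitted on, not a cross-validated estimate
logreg2 = LogisticRegressionCV(cv=10, random_state=2022).fit(X, y)
print("Accuracy: %.3f" % logreg2.score(X, y))
Accuracy: 0.930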
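cross_val_score with a KFold splitter is equivalent to a loop that trains a fresh copy of the model on k - 1 folds and scores it on the remaining fold. A minimal sketch on hypothetical synthetic data, reusing the KFold settings above:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# hypothetical synthetic data standing in for X, y
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
model = LogisticRegression(max_iter=100)
kfold = KFold(n_splits=5, random_state=2022, shuffle=True)

# manual version: fit a fresh copy on 4 folds, score accuracy on the held-out fold
manual_scores = []
for train_idx, test_idx in kfold.split(X_demo):
    fold_model = clone(model).fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

auto_scores = cross_val_score(model, X_demo, y_demo, cv=kfold)
print(manual_scores)
print(auto_scores)
print("mean %.3f, std %.3f" % (manual_scores.mean(), manual_scores.std()))
```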

Recap

# visualise the logistic regression S-curve for a single predictor


sns.regplot(x=X_train['V7'], y=y_train, logistic=True, ci=None)

Rapidminer

Logit = -8.553 + 0.304 V1 + 0.194 V2 + 1.185 V7 + 0.170 V8 + 0.092 V9
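The RapidMiner logit can be turned into a probability with the Sigmoid function. A minimal sketch, assuming the logit refers to the 'cancerous' class and using made-up predictor values for two hypothetical patients; exponentiating a coefficient gives the odds ratio for a one-unit increase in that predictor, holding the others fixed:

```python
import math

# coefficients from the RapidMiner logit equation
coefficients = {'V1': 0.304, 'V2': 0.194, 'V7': 1.185, 'V8': 0.170, 'V9': 0.092}
intercept = -8.553

def probability_cancerous(patient):
    """Apply the logit equation, then the Sigmoid (assumes logit is for 'cancerous')."""
    logit = intercept + sum(coefficients[v] * patient[v] for v in coefficients)
    return 1 / (1 + math.exp(-logit))

# hypothetical patients (values made up for illustration)
patient_a = {'V1': 5, 'V2': 1, 'V7': 3, 'V8': 1, 'V9': 1}
patient_b = {'V1': 10, 'V2': 10, 'V7': 10, 'V8': 10, 'V9': 10}
print(round(probability_cancerous(patient_a), 3))
print(round(probability_cancerous(patient_b), 3))

# odds ratio for V7: each one-unit increase multiplies the odds by exp(1.185)
print(round(math.exp(coefficients['V7']), 2))
```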

Tuning the model parameters

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

• Select different predictors and apply feature engineering
• Tune the logistic regression model hyperparameters, e.g.
LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=100, class_weight=None, fit_intercept=True, tol=0.0001, random_state=None)
• penalty can be 'l1', 'l2', 'elasticnet' (a mix of both) or no penalty; it adds a regularisation term to the loss function
• C is the inverse of the regularisation strength (default 1.0): smaller values (towards 0.0) mean a stronger penalty, larger values a lighter one
• Not all solvers support all penalty types (e.g., 'lbfgs' supports only 'l2' or no penalty; 'liblinear' supports 'l1' and 'l2'; 'saga' also supports 'elasticnet')
• Change the decision threshold (the default is 0.5)
y_proba = logreg.predict_proba(X_test)[:, 1]  # estimated probability of class 1 (cancerous)
y_pred2 = (y_proba >= 0.3).astype(int)        # classify as cancerous when p >= 0.3
• Retrain and retest the model
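Two of the tuning levers above can be illustrated in a few lines. A minimal sketch, assuming scikit-learn and hypothetical data: part 1 shows that a smaller C (a stronger default L2 penalty) shrinks the coefficients; part 2 shows that lowering the threshold from 0.5 to 0.3 trades precision for recall, which may suit a diagnosis setting where missed cancers are costly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

# Part 1: regularisation strength C (hypothetical synthetic data)
X_demo, y_demo = make_classification(n_samples=300, n_features=5, random_state=0)
norms = {}
for C in (0.01, 1.0, 100.0):
    # default penalty is L2; smaller C penalises large coefficients more strongly
    m = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    norms[C] = np.linalg.norm(m.coef_)
    print(f"C={C}: size of coefficient vector = {norms[C]:.3f}")

# Part 2: decision threshold (hypothetical labels and predicted probabilities of class 1)
y_true = np.array([0, 0, 0, 1, 1, 1])
y_proba = np.array([0.1, 0.35, 0.6, 0.4, 0.7, 0.9])
results = {}
for threshold in (0.5, 0.3):
    y_hat = (y_proba >= threshold).astype(int)
    results[threshold] = (precision_score(y_true, y_hat), recall_score(y_true, y_hat))
    print(f"threshold={threshold}: precision={results[threshold][0]:.2f}, "
          f"recall={results[threshold][1]:.2f}")
```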

Assumptions

• The target should be categorical


• The datapoints are independent
• No extreme outliers
• No severe collinearity among the predictors
• There is a linear relationship between each predictor and the logit of the target, i.e., log(p / (1 − p))
• Sample size is large enough
https://www.statology.org/assumptions-of-logistic-regression/
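One way to check the "no severe collinearity" assumption is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. A minimal NumPy sketch on hypothetical predictors (x2 is nearly a copy of x1, x3 is unrelated); a common rule of thumb treats VIF values above about 5 to 10 as a warning sign:

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        target = X[:, j]
        # other predictors plus a column of ones for the intercept
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r2 = 1 - residuals.var() / target.var()
        vifs.append(1 / (1 - r2))
    return np.array(vifs)

# hypothetical predictors: x2 is almost a copy of x1, x3 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```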

Linear Regression and Logistic Regression

| Linear Regression | Logistic Regression |
|---|---|
| Supervised ML | Supervised ML |
| Linear regression equation | Linear regression equation (for the logit) |
| Estimation: target is continuous | Classification: target is categorical |
| Best-fit line | S-curve (fit the regression values to the sigmoid curve) |
| Loss function: prediction error; Cost function: mean squared error | Loss function: likelihood of each estimation; Cost function: overall (log-)likelihood, maximised by maximum likelihood estimation |
| Assumes a linear relationship between predictors and target | Does not assume a linear relationship between predictors and target (only between predictors and the logit) |

Reflections

Pros
• Explainable: easy to interpret
• Visual representation
• Fewer assumptions on data distribution: no normality assumption and no linear relationship required between predictors and target (although it is still a parametric model)
• Less effort for data preparation; normalisation is usually not required (though scaling can help some solvers converge)
• Works for both numerical and categorical predictors

Cons
• Target must be categorical; works best with a binary (dichotomous) target
• Works best when the classes can be (approximately) separated by a linear boundary in the predictors
• Requires large datasets; prone to overfitting if datasets are small
• Complex when having multi-class targets

Are you keeping pace?

Predictive Machine Learning with Logistic Regression

Exercises in Python

Google Colab

https://colab.research.google.com https://jupyter.org/

Exercise 1: Cancer Diagnosis
• Biopsy dataset
Exercise 2: Cancer Survivability Prediction
• Breast Cancer dataset
Exercise 3: Diabetes Diagnosis
• Pima Indians Diabetes dataset
Exercise 4: Titanic Survivability Prediction
• Titanic dataset
Exercise 5: Churn Prediction
• Telco Churn dataset

Additional resources

• Molnar, C. (2022). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/logistic.html
• Brownlee, J. Multinomial Logistic Regression With Python. https://machinelearningmastery.com/multinomial-logistic-regression-with-python/
• Huyen, C. (2022). Designing Machine Learning Systems. https://learning.oreilly.com/library/view/designing-machine-learning/9781098107956/ch05.html
