Aml CS 4 PRV

Applied Machine Learning

Raja Vadhana P
Assistant Professor – BITS CSIS
BITS Pilani, Pilani Campus

Course Plan

M1 Introduction to Machine Learning

M2-M3 End-to-end Machine Learning Pipeline

M4 Linear Prediction Models

M5 Classification Models I

M6 Classification Models II

M7 Unsupervised Learning

M8 Neural Networks

M9 Deep Networks

M10 FAccT Machine Learning

M2: End-to-end Machine Learning Pipeline

1 Framing the ML Problem

2 Data Types

3 Pre-processing

4 Visualization and Analysis

Housing example - Book

Data Binarization

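# General form: pandas.get_dummies(dataframe["COLNAME"])
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# df is assumed to be a DataFrame with a categorical "Fuel" column
fuel_dummies = pd.get_dummies(df["Fuel"])              # one 0/1 column per category
df["Fuel"] = LabelEncoder().fit_transform(df["Fuel"])  # one integer code per category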
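A small self-contained sketch contrasting the two encoders above; the DataFrame and its Fuel values are made-up illustration data. `pd.get_dummies` produces one binary column per category, while `LabelEncoder` produces a single column of integer codes, assigned in sorted order of the category labels.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: a single categorical column
df = pd.DataFrame({"Fuel": ["Petrol", "Diesel", "CNG", "Petrol"]})

# One-hot (binary) encoding: one 0/1 indicator column per category
dummies = pd.get_dummies(df["Fuel"], dtype=int)
print(dummies)

# Label encoding: each category mapped to an integer code
# (classes_ holds the categories in sorted order; code = position in classes_)
encoder = LabelEncoder()
codes = encoder.fit_transform(df["Fuel"])
print(encoder.classes_.tolist(), codes.tolist())
```

One caution: label encoding implies an order (CNG < Diesel < Petrol) that a model may wrongly exploit, which is why one-hot encoding is usually preferred for nominal features.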

Binary Transformation
Housing example - Book

Replace each category with a learnable, low-dimensional vector called an embedding. Each category’s representation is learned during training; this is an example of representation learning.
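A minimal NumPy sketch of the lookup side of an embedding: each category gets an integer index, and that index selects a row of an embedding matrix. The category names, embedding size, and random initialization are illustrative assumptions. In a real model the matrix would be a trainable weight (for example, the one inside Keras's `Embedding` layer) updated by gradient descent; this sketch does not train it.

```python
import numpy as np

# Illustrative category values (assumed, housing-style)
categories = ["<1H OCEAN", "INLAND", "ISLAND", "NEAR BAY", "NEAR OCEAN"]
cat_to_index = {c: i for i, c in enumerate(categories)}

embedding_dim = 2  # assumed size of the low-dimensional vector
rng = np.random.default_rng(42)

# Embedding matrix: one row (vector) per category. Here it is only randomly
# initialized; during training these values would be updated like any weight.
embeddings = rng.normal(size=(len(categories), embedding_dim))

def embed(values):
    """Look up the embedding vector for each category value."""
    idx = np.array([cat_to_index[v] for v in values])
    return embeddings[idx]

batch = ["INLAND", "NEAR BAY", "INLAND"]
vectors = embed(batch)
print(vectors.shape)  # (3, 2)
```

Identical categories always map to the same row, so both "INLAND" entries share one vector; training moves that shared vector, which is what makes the representation learned rather than fixed.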

Transformation Pipelines
Housing example - Book
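A minimal sketch of a transformation pipeline in the style of the book's housing example, using scikit-learn's `Pipeline` and `ColumnTransformer`. The column names mirror housing attributes, but the values are made up, and the choice of median imputation plus standardization for numeric columns is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mini version of the housing data (one missing value)
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, np.nan, 1274.0],
    "total_bedrooms": [129.0, 1106.0, 190.0, 235.0],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND", "<1H OCEAN"],
})
num_attribs = ["total_rooms", "total_bedrooms"]
cat_attribs = ["ocean_proximity"]

# Numeric columns: fill missing values with the median, then standardize
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])

# Route each group of columns through its own transformer
full_pipeline = ColumnTransformer([
    ("num", num_pipeline, num_attribs),
    ("cat", OneHotEncoder(), cat_attribs),
], sparse_threshold=0)  # always return a dense array

housing_prepared = full_pipeline.fit_transform(housing)
print(housing_prepared.shape)  # (4, 5): 2 scaled numeric + 3 one-hot columns
```

Bundling the steps means the same fitted transformations can later be applied unchanged to the test set with `full_pipeline.transform(...)`.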

Visualization & Analysis

Data Visualization

Data visualization is the art and practice of gathering, analyzing, and graphically representing empirical information. It helps to:

• Understand data dynamics
• Gain insight
• Search for interpretations
• Perform quantitative analysis
• Provide proof for inferences

Statistical Visualization – Interpretation Criteria

1. Frequency – Repeatability Vs Consistency Vs Periodicity

2. Deviation - Rarity

3. Correlation – Similarity

Note: This is an iterative process; once you get a prototype up and running, you can analyse its output to gain more insights and come back to this exploration step.
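The three interpretation criteria can each be checked numerically before plotting. A minimal NumPy sketch on made-up data: counts show frequency, z-scores flag rare deviations (the |z| > 2 cut-off is an assumption), and Pearson's r measures linear correlation.

```python
import numpy as np

# Hypothetical sample: rooms per house and price (illustrative values only)
rooms = np.array([3, 3, 4, 2, 3, 4, 5, 3, 12, 4])
price = np.array([200, 210, 260, 150, 220, 270, 320, 205, 300, 255])

# 1. Frequency: how often each value repeats
values, counts = np.unique(rooms, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))

# 2. Deviation: flag rare values far from the mean (|z| > 2 is an assumed cut-off)
z = (rooms - rooms.mean()) / rooms.std()
print(rooms[np.abs(z) > 2])

# 3. Correlation: Pearson coefficient between the two variables
r = np.corrcoef(rooms, price)[0, 1]
print(round(r, 2))
```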

Data View

Box Plot
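A box plot summarizes a variable by its quartiles: the box spans Q1 to Q3 with a line at the median, the whiskers reach the most extreme values within 1.5 × IQR of the box, and points beyond that are drawn as outliers. The 1.5 × IQR rule is a common default (matplotlib's `boxplot` uses `whis=1.5`). A NumPy sketch of these statistics on made-up data:

```python
import numpy as np

def box_plot_stats(x):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    return {
        "whisker_low": inside.min(),   # smallest value within the fences
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside.max(),  # largest value within the fences
        "outliers": x[(x < lower_fence) | (x > upper_fence)].tolist(),
    }

stats = box_plot_stats([2, 3, 3, 3, 3, 4, 4, 4, 5, 12])
print(stats)
```

Here Q1 = 3 and Q3 = 4, so the fences are 1.5 and 5.5 and only the value 12 is reported as an outlier.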

Scatter Plot

Correlation Analysis

Note: The correlation coefficient measures only linear relationships, so it may completely miss nonlinear ones.
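To see why, compare a linear and a quadratic relationship on the same symmetric range of x: both y values are fully determined by x, yet Pearson's r is 1 for the line and essentially 0 for the parabola. A minimal NumPy sketch:

```python
import numpy as np

x = np.linspace(-1, 1, 101)

y_linear = 2 * x + 1    # perfectly linear relationship
y_quadratic = x ** 2    # fully determined by x, but not linearly

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_quadratic = np.corrcoef(x, y_quadratic)[0, 1]
print(r_linear, r_quadratic)
```

This is why a scatter plot (or scatter matrix) is worth checking alongside the correlation coefficients.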

Scatter Matrix

Correlation Analysis
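A minimal pandas sketch of correlation analysis against the target, in the style of the book's `corr()` step. The columns reuse housing-style names, but the values are made up so that the results are easy to verify: median_income is exactly proportional to the target, and total_rooms is negatively correlated with it.

```python
import pandas as pd

# Hypothetical mini dataset (illustrative values, not the real housing data)
housing = pd.DataFrame({
    "median_income":      [1.0, 2.0, 3.0, 4.0, 5.0],
    "total_rooms":        [5.0, 3.0, 4.0, 1.0, 2.0],
    "median_house_value": [2.0, 4.0, 6.0, 8.0, 10.0],
})

# Pairwise Pearson correlation between all numeric columns
corr_matrix = housing.corr()

# How strongly each attribute correlates with the target
print(corr_matrix["median_house_value"].sort_values(ascending=False))
```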

Module Learning Objectives

• Understand the importance of data quality
• Identify application-based data quality problems that create the need for pre-processing
• Identify the right pre-processing technique for the requirements
• Apply the appropriate visualization technique

M3: End-to-end Machine Learning Pipeline

1 Model Selection and Training

2 Model Evaluation

3 Machine Learning Pipeline

End-to-end Machine Learning Pipeline
Module Learning Objectives

• Get a fair idea of the components of a machine learning pipeline
• Identify and implement use-case-specific model selection
• Compare model performance using evaluation measures
• Understand the overall perspective of MLOps design

• Business Objective
• Existing Solution as Reference Performance
• Level of Supervision
• Data Dynamicity
• Computational
• Performance Measure
• Check the Design Assumptions
• Post Processing

Housing attributes: Population, M-House Value, Total Rooms, Total Bedrooms, Ocean Proximity

ML Pipeline Process (stage • output)

Data Extraction • Data Bank
EDA (Exploratory Data Analysis) • Schema, Features
Data Preparation • Engineered Data Splits
Model Training • Tuned Trained Model
Model Evaluation • Metric Set
Model Validation • Baseline Vs Predicted Performance
Model Serving • Microservices | Embedded Model | Batch System
Model Monitoring • Trigger Point Monitoring
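Several of the stages above can be sketched end to end on synthetic data. This is a minimal illustration, not a production pipeline: the data are generated, the model is ordinary least squares via NumPy (instead of scikit-learn), and the rule "promote the model only if it beats a mean-prediction baseline" is an assumed validation criterion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y depends linearly on two features plus noise
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0 + rng.normal(scale=0.5, size=200)

# Data preparation: a simple random 80/20 train/test split
idx = rng.permutation(len(X))
train, test = idx[:160], idx[160:]

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Baseline: always predict the mean of the training labels
baseline_pred = np.full(len(test), y[train].mean())

# Model training: least-squares linear regression with an intercept column
A_train = np.column_stack([X[train], np.ones(len(train))])
weights, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# Model evaluation on held-out data
A_test = np.column_stack([X[test], np.ones(len(test))])
model_pred = A_test @ weights

baseline_rmse = rmse(y[test], baseline_pred)
model_rmse = rmse(y[test], model_pred)
print(f"baseline RMSE={baseline_rmse:.2f}, model RMSE={model_rmse:.2f}")

# Model validation: compare predicted performance against the baseline
print("promote" if model_rmse < baseline_rmse else "keep baseline")
```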

Model Selection & Training

Prefer to sample based on stable features.
The ID used for sampling must be immune to change.

Attributes: M-House Value, Total Rooms, Total Bedrooms, Ocean Proximity
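A stable ID makes the train/test split reproducible: if each row's assignment depends only on a hash of its ID, the same row always lands in the same set, even after new rows are appended. A minimal sketch of this idea, similar in spirit to the book's hash-based split; the use of MD5 from Python's standard library and the integer IDs are assumptions.

```python
import hashlib

def in_test_set(identifier, test_ratio=0.2):
    """Assign a row to the test set based only on a hash of its stable ID."""
    digest = hashlib.md5(str(identifier).encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1)
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < test_ratio

ids = range(10_000)
test_ids = [i for i in ids if in_test_set(i)]
print(len(test_ids) / 10_000)  # close to 0.2

# Appending new rows does not move any existing row between sets
more_ids = range(12_000)
assert [i for i in more_ids if i < 10_000 and in_test_set(i)] == test_ids
```

A random shuffle would instead produce a different test set on every run (or every time the data grows), letting the model gradually see the whole dataset.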

Model Selection & Training
Housing Price Prediction – Book

In class, there were queries about the code snippets above and below; please refer to your book. The example above illustrates copying the labels, and in the example below the copied labels are passed as the second parameter (y) when training the linear regression with fit. Refer here for the scikit-learn library documentation:
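A self-contained sketch of those two steps, assuming a small made-up training set with book-style column names: the labels are copied out of the training data, and the copy is passed as the second argument of `LinearRegression.fit(X, y)`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical training set (illustrative values)
train_set = pd.DataFrame({
    "median_income": [1.5, 2.5, 3.5, 4.5, 5.5],
    "total_rooms": [900.0, 1200.0, 1500.0, 1100.0, 2000.0],
    "median_house_value": [100.0, 150.0, 200.0, 250.0, 300.0],
})

# Separate predictors from labels: drop() returns a new DataFrame,
# and .copy() gives the labels their own independent Series
housing = train_set.drop("median_house_value", axis=1)
housing_labels = train_set["median_house_value"].copy()

# The copied labels are the second argument (y) of fit(X, y)
lin_reg = LinearRegression()
lin_reg.fit(housing, housing_labels)
print(lin_reg.predict(housing.iloc[:2]))
```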

Model Checking
Housing Price Prediction – Book
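One common way to check a model before touching the test set is K-fold cross-validation. A minimal sketch with scikit-learn's `cross_val_score` on synthetic data; the data and the choice of 10 folds are assumptions. scikit-learn scorers follow a "greater is better" convention, so the mean squared error is returned negated.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Hypothetical synthetic regression data
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

# 10-fold cross-validation; each score is the negated MSE on one held-out fold
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)
print(rmse_scores.mean(), rmse_scores.std())
```

The spread (standard deviation) across folds indicates how stable the performance estimate is, which a single validation split cannot show.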

Model Testing
Housing Price Prediction – Book

Model Evaluation
• Cost Function
• Loss Function
• Objective Function
• Error Function

Model Evaluation

Xi = <Income, Bedrooms, Distance>    Y-Actual    Y-Predicted
<5000, 3, 5>                         200         250
<1000, 2, 2>                         150         140
<6000, 3, 10>                        200         150
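For the three regression examples above, the per-example errors can be turned into the usual error measures. Under one common convention, the loss is the error on a single example and the cost is its average over the data set, so MSE below is the cost built from a squared loss. A minimal sketch in plain Python:

```python
import math

y_actual = [200, 150, 200]
y_predicted = [250, 140, 150]

errors = [a - p for a, p in zip(y_actual, y_predicted)]  # [-50, 10, 50]

mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
mse = sum(e ** 2 for e in errors) / len(errors)  # mean squared error
rmse = math.sqrt(mse)                            # root mean squared error

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
# MAE=36.67  MSE=1700.00  RMSE=41.23
```

RMSE is larger than MAE here because squaring gives the two 50-unit errors more weight than the 10-unit one.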


Model Evaluation

Xi = <Income, Bedrooms, Distance>    Y-Actual    Y-Predicted
<5000, 3, 5>                         High        High
<1000, 2, 2>                         Medium      Medium
<6000, 3, 10>                        High        Medium
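For the classification version, a minimal plain-Python sketch of accuracy (2 of the 3 predictions match) and of the (actual, predicted) counts that make up a confusion matrix:

```python
from collections import Counter

y_actual = ["High", "Medium", "High"]
y_predicted = ["High", "Medium", "Medium"]

# Accuracy: fraction of instances whose predicted class matches the actual class
accuracy = sum(a == p for a, p in zip(y_actual, y_predicted)) / len(y_actual)

# Confusion counts keyed by (actual, predicted)
confusion = Counter(zip(y_actual, y_predicted))

print(round(accuracy, 3))  # 0.667
print(confusion)
```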


Model Evaluation
Classification – Confusion Matrix


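Cost matrix C(i|j): cost of predicting class i when the actual class is j (rows: actual class, columns: predicted class)

              Predicted +   Predicted -
Actual +          -1            100
Actual -           1              0

Model 1: Accuracy = 80%, Cost = 3910

Model 2:
              Predicted +   Predicted -
Actual +          250            45
Actual -            5           200
Accuracy = 90%, Cost = 4255

Cost depends linearly on accuracy if:
1. C(Yes|No) = C(No|Yes) = q
2. C(Yes|Yes) = C(No|No) = p

where a and d are the counts of correct predictions, b and c the counts of wrong predictions, and N = a + b + c + d:

$$\text{Accuracy} = \frac{a + d}{N}$$

$$\begin{aligned}
\text{Cost} &= p\,(a + d) + q\,(b + c) \\
&= p\,(a + d) + q\,(N - a - d) \\
&= qN - (q - p)(a + d) \\
&= N\,[\,q - (q - p) \times \text{Accuracy}\,]
\end{aligned}$$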
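A quick arithmetic check of the numbers above in plain Python: it recomputes accuracy and cost for the 90%-accuracy model from its confusion matrix and the cost matrix, then checks the linear relation between cost and accuracy for a symmetric cost matrix (the values p = 1 and q = 5 are arbitrary assumptions). It also shows why the more accurate model can be the more expensive one: each of its 45 false negatives costs 100.

```python
import math

# Cost matrix C(i|j): cost of predicting class i when the actual class is j,
# stored as cost[actual][predicted]
cost = {"+": {"+": -1, "-": 100},
        "-": {"+": 1, "-": 0}}

# Confusion matrix of the 90%-accuracy model, stored as counts[actual][predicted]
counts = {"+": {"+": 250, "-": 45},
          "-": {"+": 5, "-": 200}}

def total_cost(counts, cost):
    """Sum of count * cost over every (actual, predicted) cell."""
    return sum(counts[a][pr] * cost[a][pr] for a in counts for pr in counts[a])

n = sum(sum(row.values()) for row in counts.values())
accuracy = (counts["+"]["+"] + counts["-"]["-"]) / n
model_cost = total_cost(counts, cost)
print(accuracy, model_cost)  # 0.9 4255

# Symmetric cost matrix (hypothetical p and q): cost = N[q - (q - p) * Accuracy]
p, q = 1, 5
symmetric = {"+": {"+": p, "-": q}, "-": {"+": q, "-": p}}
print(math.isclose(total_cost(counts, symmetric), n * (q - (q - p) * accuracy)))  # True
```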

Model Evaluation

• No model consistently outperforms the other
  • M1 is better for small FPR
  • M2 is better for large FPR
• Area Under the ROC Curve (AUC)
  • Ideal: Area = 1
  • Random guess: Area = 0.5

Model Evaluation

• Use a classifier that produces a posterior probability P(+|A) for each test instance
• Sort the instances according to P(+|A) in decreasing order
• Apply a threshold at each unique value of P(+|A)
• Count the number of TP, FP, TN, FN at each threshold
• TP rate, TPR = TP/(TP + FN)
• FP rate, FPR = FP/(FP + TN)

Instance   P(+|A)   True Class
1          0.95     +
2          0.93     +
3          0.87     -
4          0.85     -
5          0.85     -
6          0.85     +
7          0.76     -
8          0.53     +
9          0.43     -
10         0.25     +

Model Evaluation
Class         +     -     +     -     -     -     +     -     +     +
Threshold  0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP            5     4     4     3     3     3     3     2     2     1     0
FP            5     5     4     4     3     2     1     1     0     0     0
TN            0     0     1     1     2     3     4     4     5     5     5
FN            0     1     1     2     2     2     2     3     3     4     5
TPR           1   0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2     0
FPR           1     1   0.8   0.8   0.6   0.4   0.2   0.2     0     0     0
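A minimal plain-Python sketch of the procedure above, using the ten instances from the table. It predicts + when P(+|A) ≥ threshold and, following the "each unique value" step, uses one threshold per unique score plus one above every score (the role of the 1.00 column). Note that the table lists the three tied 0.85 scores as separate columns; this sketch reports a single point for 0.85 (TPR 0.6, FPR 0.6), matching the first 0.85 column. The area is computed with the trapezoidal rule, which is one common choice.

```python
# Scores P(+|A) and true classes (1 = +, 0 = -) from the table above
scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = [1, 1, 0, 0, 0, 1, 0, 1, 0, 1]

def roc_points(scores, labels):
    """Return (threshold, TP, FP, TN, FN, TPR, FPR) for each threshold.

    An instance is predicted positive when its score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # One threshold per unique score, plus one above every score
    thresholds = sorted(set(scores)) + [float("inf")]
    rows = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn, tn = pos - tp, neg - fp
        rows.append((t, tp, fp, tn, fn, tp / (tp + fn), fp / (fp + tn)))
    return rows

def auc_trapezoid(rows):
    """Area under the ROC curve: integrate TPR over FPR with the trapezoidal rule."""
    pts = sorted((fpr, tpr) for *_, tpr, fpr in rows)
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

rows = roc_points(scores, labels)
for t, tp, fp, tn, fn, tpr, fpr in rows:
    print(f"t={t:<5} TP={tp} FP={fp} TN={tn} FN={fn} TPR={tpr:.1f} FPR={fpr:.1f}")

auc = auc_trapezoid(rows)
print(f"AUC = {auc:.2f}")  # AUC = 0.56
```

An AUC of 0.56 is only slightly above the 0.5 of a random guess, consistent with how mixed the classes are in the sorted list.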

Model Evaluation
Housing Price Prediction – Book

• Underfitting, possible causes:
  • The features may not provide enough information to make good predictions
  • The model may not be powerful enough
  • The model may be too constrained

Next Class Plan
• Model Validation
• Hyperparameter Optimization
• Brief overview of MLOps & Measures of
• Linear Regression
