Aml CS 4 PRV
SE/SS ZG568
Raja vadhana P
Assistant Professor – BITS CSIS
raja.vadhana@pilani.bits-pilani.ac.in
BITS Pilani, Pilani Campus
Course Plan
M5 Classification Models I
M6 Classification Models II
M7 Unsupervised Learning
M8 Neural Networks
M9 Deep Networks
2 Data Types
3 Pre-processing
pandas.get_dummies(dataframe["COLNAME"])
pd.get_dummies(df["Fuel"])
df["Fuel"] = LabelEncoder().fit_transform(df["Fuel"])
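What the two encodings above produce can be sketched without any library. This is a minimal illustration, not the pandas or scikit-learn implementation: the sample Fuel values are made up, and the sorted category order is an assumption chosen to mirror how those libraries typically order categories.

```python
# Dependency-free sketch of one-hot and label encoding.
# The sample "Fuel" values are made up for illustration.
fuel = ["Petrol", "Diesel", "CNG", "Petrol"]

# Sorted unique categories (assumed ordering, see the note above).
categories = sorted(set(fuel))

# Label encoding: each category maps to its index in the sorted list.
label_of = {cat: idx for idx, cat in enumerate(categories)}
label_encoded = [label_of[value] for value in fuel]

# One-hot encoding: one 0/1 indicator column per category.
one_hot = [[1 if value == cat else 0 for cat in categories] for value in fuel]

print(categories)
print(label_encoded)
print(one_hot)
```

Label encoding keeps a single column but implies an order among categories; one-hot encoding avoids that implied order at the price of one column per category.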
Replace each category with a learnable low-dimensional vector called an embedding. Each
category's representation is learned during training (representation learning).
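The lookup side of an embedding can be sketched as a small table with one vector per category. This is only an illustration under assumptions: the category names, the dimension of 2, and the random initialisation are hypothetical, and the training step that would update the vectors (for example by gradient descent) is left out.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

categories = ["CNG", "Diesel", "Petrol"]  # hypothetical Fuel categories
embedding_dim = 2                         # hypothetical embedding size

# One row per category, initialised with small random values.
# During training these values would be adjusted along with the model weights.
embedding_table = {
    cat: [random.uniform(-0.1, 0.1) for _ in range(embedding_dim)]
    for cat in categories
}

def embed(values):
    """Replace each category with its current embedding vector."""
    return [embedding_table[v] for v in values]

vectors = embed(["Petrol", "CNG", "Petrol"])
print(vectors)
```

Every occurrence of the same category maps to the same vector, so the model sees a dense input of fixed width regardless of how many categories exist.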
2. Deviation - Rarity
3. Correlation – Similarity
Note: This is an iterative process: once you get a prototype up and running, you can
analyse its output to gain more insights and come back to this exploration step.
• Identify the application-specific data quality problems that create the need
for pre-processing
2 Model Evaluation
[Figure: diagram labels]
• Business Objective
• Existing Solution as Reference Performance
• Level of Supervision
• Data Dynamicity
• Computational Infrastructure
• Performance Measure
• Check the Design Assumptions
• Post Processing Requirements
• Data Extraction: Data Bank
• Model Evaluation: Metric Set
[Figure: dataset attributes]
Latitude, Longitude, Population, M-Income, M-House Value, M-Age, Total Rooms, Total Bedrooms, Ocean Proximity
There were queries in class about the code snippets above and below. Please refer to your book.
The example above illustrates copying the labels. In the example below, the copied labels are
passed as the second parameter when fitting the linear regression. Refer to the scikit-learn
library documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
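The label-copying step and the role of the second parameter of fit can be sketched as follows. This is not the snippet shown in class: the toy data and the variable names are made up, and only the LinearRegression calls documented at the link above are used.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (made up): the target in column 1 follows y = 2x + 1 exactly.
data = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, 9.0]])

features = data[:, :1]      # 2-D array of predictors, shape (4, 1)
labels = data[:, 1].copy()  # copy the label column so the original stays untouched

model = LinearRegression()
model.fit(features, labels)  # second parameter: the copied labels

prediction = model.predict(np.array([[5.0]]))
print(model.coef_, model.intercept_, prediction)
```

The first argument of fit carries the predictors only; the labels travel separately as the second argument, which is why they are copied out of the data first.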
N = a + b + c + d
Accuracy = (a + d)/N
Cost = p (a + d) + q (b + c)
     = p (a + d) + q (N – a – d)
     = q N – (q – p)(a + d)
     = N [q – (q – p) Accuracy]

Model M1: Accuracy = 80%, Cost = 3910

Model M2: Accuracy = 90%, Cost = 4255

                 PREDICTED CLASS
                   +      -
ACTUAL CLASS  +   250     45
              -     5    200
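The identity Cost = N [q – (q – p) Accuracy] can be checked numerically. The sketch below uses the confusion-matrix counts shown above; the values of p (cost per correct prediction) and q (cost per error) are hypothetical, since the cost matrix behind the slide's figures of 3910 and 4255 is not given in the text, so the result will not match those numbers.

```python
# Confusion-matrix counts from the slide (rows: actual +, -; columns: predicted +, -).
a, b = 250, 45  # actual +: predicted +, predicted -
c, d = 5, 200   # actual -: predicted +, predicted -

# Hypothetical costs, not from the slide: p per correct prediction, q per error.
p, q = 0, 1

n = a + b + c + d
accuracy = (a + d) / n

# Cost computed directly from the counts.
cost_direct = p * (a + d) + q * (b + c)

# Cost computed from accuracy alone, using the last line of the derivation.
cost_from_accuracy = n * (q - (q - p) * accuracy)

print(accuracy, cost_direct, cost_from_accuracy)
```

The two costs agree up to floating-point rounding, which shows that when all correct predictions share one cost and all errors share another, cost is a linear function of accuracy.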
• Use a classifier that produces a posterior probability P(+|A) for each test instance

Instance   P(+|A)   True Class
1          0.95     +
2          0.93     +
3          0.87     -
4          0.85     -
5          0.85     -
6          0.85     +
7          0.76     -
8          0.53     +
9          0.43     -
10         0.25     +
Class    +     -     +     -     -     -     +     -     +     +
P      0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP       5     4     4     3     3     3     3     2     2     1     0
FP       5     5     4     4     3     2     1     1     0     0     0
TN       0     0     1     1     2     3     4     4     5     5     5
FN       0     1     1     2     2     2     2     3     3     4     5
TPR      1   0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2     0
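The counts in the table can be reproduced by sweeping a threshold over the posterior probabilities. The sketch below is one way to do it, assuming an instance is predicted + when its score is at least the threshold. It also computes FPR = FP / (FP + TN), which is not among the rows shown. The table steps through the three tied 0.85 scores one instance at a time, whereas this sketch uses each distinct score once, so it yields a single row at 0.85 that matches the first 0.85 column.

```python
# (score, true class) pairs from the table of posterior probabilities.
instances = [
    (0.95, "+"), (0.93, "+"), (0.87, "-"), (0.85, "-"), (0.85, "-"),
    (0.85, "+"), (0.76, "-"), (0.53, "+"), (0.43, "-"), (0.25, "+"),
]

def rates_at(threshold):
    """Return (TPR, FPR) when every score >= threshold is predicted '+'."""
    tp = sum(1 for s, y in instances if s >= threshold and y == "+")
    fp = sum(1 for s, y in instances if s >= threshold and y == "-")
    positives = sum(1 for _, y in instances if y == "+")
    negatives = len(instances) - positives
    return tp / positives, fp / negatives

# Distinct scores plus 1.00, the point where nothing is predicted '+'.
thresholds = sorted({s for s, _ in instances} | {1.00})
roc_points = [(t, *rates_at(t)) for t in thresholds]

for t, tpr, fpr in roc_points:
    print(f"{t:.2f}  TPR={tpr:.1f}  FPR={fpr:.1f}")
```

Plotting FPR on the horizontal axis against TPR on the vertical axis for these points gives the ROC curve.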
Observation:
• Underfitting
• The features may not have provided enough information to make good predictions
• The model may not have been powerful enough
• The model may be too constrained
Next Class Plan
• Model Validation
• Hyperparameter Optimization
• Brief overview of MLOps and measures of
dissimilarity
• Linear Regression