Developing a Machine Learning Model from Start to Finish


Developing a Machine Learning Model from Start to Finish

Dr. A. OBULESU
Assoc. Professor

Learn by Doing…

A little about me

 Certified by NVIDIA and Leadingindia.ai
 In charge, MALAI Club in association with Wave Lab
 Convener, MALAI Research Wing
 BoS Member of CSE & AI
... @ Anurag University

1982 – 2020 . . .

Agenda

① Key Terms of Machine Learning
② Steps for Designing a Model
③ Machine Learning Models
④ Clarifications on Models

Key Terms in Machine Learning

What is Machine Learning?

"Field of study that gives computers the ability to learn without being explicitly programmed"
-- Arthur Samuel, 1959

Machine Learning is…

 Machine learning is about predicting the future based on the past.
-- Hal Daume III

[diagram: past: Training Data → learn → model/predictor; future: Testing Data → model/predictor → predict]

AI Can Achieve

[diagram: Artificial Intelligence encompasses Robotics, Domain-Specific Computing, Rule-Based Expert Systems, and Machine Learning, which in turn contains Deep Learning (DL)]

Machine Learning Types

Human Supervision?
 Supervised: Classification, Regression
 Unsupervised: Clustering
 Reinforcement

Learn Incrementally?
 Online Learning
 Batch Processing

How They Generalize?
 Instance based
 Model based

Supervised learning

• Feed the labeled dataset to the model…

[training images labeled: apple, apple, banana, banana]

In My Words: Supervised Learning is Learning with a Teacher…

Supervised learning: given labeled examples (Training Data)

Machine Learning is… Supervised Learning

[diagram: Training Data → learn → model/predictor → predict]

Supervised learning Contd..

[diagram: Test Data → model/predictor → predicted label]

Classification
Classification: predicting a discrete value, e.g., apple vs. banana, or 0 vs. 1

Applications
• Face recognition
• Character recognition
• Spam detection
• Medical diagnosis: from symptoms to illnesses
• Biometrics: recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.

Supervised learning: Regression

Regression: the predicted label is real-valued, e.g., -4.5, 10.1, 3.2, 4.3

Supervised learning: given labeled examples

Regression Applications
• Economics/Finance: predict the value of a stock
• Epidemiology
• Car/plane navigation: angle of the steering wheel, acceleration, …
• Temporal trends: weather over time
• …

Unsupervised learning

In My Words: Unsupervised Learning is Learning without a Teacher…

• Feed the dataset without labels to the model…

Unsupervised learning: given data, i.e. examples, but NO LABELS

Unsupervised learning
• Learn clusters/groups without any labels
• Customer segmentation (i.e. grouping)
• Image compression
• Bioinformatics: learn motifs
• …

Reinforcement learning
• Given a sequence of examples/states and a reward after completing that sequence, learn to predict the action to take for an individual example/state

… WIN!
… LOSE!

In My Words: Feedback System
Let us understand with use cases

• Collect data automatically
• Self-driving cars on the road
• Accurate results in Google Search
• Speech recognition on mobile phones
• Netflix movie recommendations
• Product recommendations

!!! Many…!!!

Machine Learning Tools

[tool logos: MLlib (distributed), others labeled simple]

Steps for Designing a ML Model

1. Define the Problem Appropriately
• What is the main objective? What are we trying to predict?
• What are the target features?
• What is the input data? Is it available?
• What kind of problem are we facing? Binary classification? Clustering?
• What is the expected improvement?

1. Define the Problem: Case Study – Predicting Cardiovascular Risk
• Objective: predict the risk level
• Input data is available from Kaggle, UCI, etc.
• This problem is multi-class classification
• Predict the correct class among:
• 0 – No Risk
• 1 – Mild Risk
• 2 – Acute Risk
• 3 – Serious Risk
• 4 – Very Serious Risk

2. Collecting Data
This is the first real step towards the actual development of a machine learning model.
The more and better data we get, the better our model will be.
There are many data sources (including web scraping and public repositories), such as:
 Kaggle
 UCI Machine Learning Repository: created as an FTP archive in 1987 by David Aha and fellow graduate students at UC Irvine

2. Case study:
 Files taken from Kaggle:
 features.csv : the input features
 target.csv : the target labels (risk levels)
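A minimal sketch of loading these two files with pandas; the actual column contents depend on the Kaggle dataset and are not shown on the slide:

import pandas as pd

# Load the two files named above (assumed to be in the working directory)
features = pd.read_csv("features.csv")   # input features, one row per patient
target = pd.read_csv("target.csv")       # risk labels (0-4)

print(features.shape, target.shape)
print(features.head())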

3. Choose the Measure of Success

• "If you can't measure it, you can't improve it."
• Regression problems use evaluation metrics such as mean squared error (MSE).
• Classification problems use evaluation metrics such as precision, accuracy, and recall.
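As a quick sketch with scikit-learn's metrics module (the toy labels below are made up for illustration):

from sklearn.metrics import (accuracy_score, mean_squared_error,
                             precision_score, recall_score)

# Regression: mean squared error
y_true_reg = [3.0, -0.5, 2.0]
y_pred_reg = [2.5, 0.0, 2.1]
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))

# Classification: accuracy, precision, recall
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))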

4. Setting an Evaluation Protocol

 The reason to split data into three parts (training, validation, and test) is to avoid information leaks.
 The main drawback of this method is that if there is little data available, the validation and test sets will contain so few samples that tuning and evaluating the model will not be effective.
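A sketch of the three-way split using scikit-learn's train_test_split applied twice; the 60/20/20 proportions and toy data are arbitrary choices for illustration:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)              # toy feature matrix
y = np.random.randint(0, 5, size=100)   # toy risk labels (0-4)

# First split off 40%, then halve it into validation and test (60/20/20)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)
print(len(X_train), len(X_val), len(X_test))   # 60 20 20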


4. Setting an Evaluation Protocol: Case study – K-fold validation

[figure: K-fold cross-validation, with the data split into K folds and each fold used once for validation]
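The K-fold protocol can be sketched with scikit-learn's KFold; k = 5 and the toy data are assumptions for illustration:

import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(100, 5)
y = np.random.randint(0, 5, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # ...fit the model on the training fold, score it on the validation fold
    print(f"fold {fold}: train={len(train_idx)}, val={len(val_idx)}")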

5. Preparing the Data

 We should transform our data into a form that can be fed into a machine learning model:
 Dealing with missing data
 Handling categorical data
 Feature scaling: normalization, standardization
 Selecting meaningful features: PCA
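One way to sketch these four steps together is a scikit-learn pipeline. The column names are hypothetical, and sparse_output=False assumes scikit-learn >= 1.2:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "bp"]       # assumed numeric feature names
categorical_cols = ["smoker"]      # assumed categorical feature name

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # dealing with missing data
        ("scale", StandardScaler()),                  # feature scaling
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False),
     categorical_cols),                               # handling categorical data
])

pipe = Pipeline([
    ("prep", preprocess),
    ("pca", PCA(n_components=2)),                     # selecting meaningful features
])

df = pd.DataFrame({"age": [63, None, 45],
                   "bp": [140.0, 120.0, None],
                   "smoker": ["yes", "no", "no"]})
print(pipe.fit_transform(df))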

4.1 Overfitting and Underfitting

• A model underfits when it has not learned the training data well enough (e.g., too small a dataset or too simple a model).
• A model overfits when it has learned the training data so well that it has picked up patterns that are too specific to the training data and irrelevant to new data.

4.1 Overfitting and Underfitting

While training the model, we should consider two issues:
• Optimization is the process of adjusting a model to get the best performance possible on the training data (the learning process).
• Generalization is how well the model performs on unseen data. The goal is to obtain the best generalization ability.


4.1 Two Ways to Avoid Overfitting

 Getting more data is usually the best solution; a model trained on more data will naturally generalize better.
 Regularization:
• L1 regularization: the cost is proportional to the absolute value of the weight coefficients (the L1 norm of the weights).
• L2 regularization: the cost is proportional to the square of the value of the weight coefficients (the L2 norm of the weights).
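A minimal sketch of the two penalties using scikit-learn's Lasso (L1) and Ridge (L2) models on toy data; alpha, the penalty strength, is chosen arbitrarily:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.random((50, 10))
y = X @ rng.random(10) + 0.1 * rng.standard_normal(50)

l1 = Lasso(alpha=0.1).fit(X, y)   # L1: penalizes |w|, drives some weights to exactly 0
l2 = Ridge(alpha=0.1).fit(X, y)   # L2: penalizes w^2, shrinks weights smoothly

print("L1 zero weights:", int((l1.coef_ == 0).sum()))
print("L2 zero weights:", int((l2.coef_ == 0).sum()))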

6. Developing a Benchmark Model

 Develop a benchmark model that serves as a baseline.
 Benchmarking requires experiments to be comparable, measurable, and reproducible.
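One common baseline is sketched below with scikit-learn's DummyClassifier (an assumed choice; any trivial model would do), which simply predicts the most frequent class; a real model should beat its score:

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 5)
y = np.random.randint(0, 5, size=200)   # toy 5-class risk labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))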

7. Developing a Better Model & Tuning its Hyperparameters

 Finding a good model requires:
 A number of folds in which we will split our data.
 A scoring method (that will vary depending on the problem's nature: regression, classification…).
 Some appropriate algorithms that we want to check.
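A sketch of that recipe with scikit-learn: cross_val_score compares candidate algorithms under a fixed fold count and scoring method, and GridSearchCV then tunes the best candidate's hyperparameters. The candidates and parameter grid are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(200, 5)
y = np.random.randint(0, 5, size=200)

# Compare a few algorithms with the same folds and scoring method
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("knn", KNeighborsClassifier())]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(name, scores.mean())

# Tune hyperparameters of one candidate over a small grid
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [3, 5, 7, 9]},
                      cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)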

Machine Learning Models

Regression
• Predict future scores on Y based on measured scores on X
 Predictions are based on a correlation from a sample where both X and Y were measured.
The equation is linear:
y = bx + a
where y = predicted score on Y, x = measured score on X, b = slope, a = y-intercept
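As a worked sketch, the slope b and intercept a can be estimated by least squares with numpy.polyfit on toy data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])   # roughly y = 2x

b, a = np.polyfit(x, y, deg=1)   # slope b, y-intercept a
print(f"y = {b:.2f}x + {a:.2f}")
print("predicted score at x = 6:", b * 6 + a)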

Relationship between Variables

1. Relationship between variables in a linear function:

Yi = β0 + β1Xi + εi

where β0 is the population Y-intercept, β1 is the population slope, and εi is the random error; Y is the dependent (response) variable (e.g., COVID-19) and X is the independent (explanatory) variable (e.g., age).

Linear vs. Logistic

 When the predicted outcome for the given input is continuous or in a range – linear regression
 When the predicted outcome is discrete – logistic regression


Logistic regression
 The probability cannot be negative, so we introduce an exponential term into our normal regression model to make it logistic regression.
 Since the probability can never be greater than 1, we need to divide our outcome by something bigger than itself.
 The regression formula gives us Y using the formula Yi = β0 + β1X + εi.

Logistic regression Contd…

 We use the exponential so the outcome cannot become negative, giving P = exp(β0 + β1X + εi).
 We divide P by something bigger than itself so that it remains less than one, giving
P = e^(β0 + β1X + εi) / (e^(β0 + β1X + εi) + 1)
 After some algebra, the formula in the previous step can be rewritten as
log(p/(1-p)) = β0 + β1X + εi
 p/(1-p) is the odds: the probability of the desired outcome being true divided by the probability of it not being true. Its logarithm, log(p/(1-p)), is called the LOGIT FUNCTION.
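A quick numerical check of this derivation (the β values here are arbitrary): the exponential form maps the linear score to a probability in (0, 1), and the logit recovers the score exactly.

import numpy as np

beta0, beta1 = -1.0, 0.5
x = 4.0

score = beta0 + beta1 * x                 # linear part: β0 + β1X
p = np.exp(score) / (np.exp(score) + 1)   # P = e^(β0 + β1X) / (e^(β0 + β1X) + 1)
print("probability:", p)                  # ~0.731
print("logit:", np.log(p / (1 - p)))      # equals the linear score, 1.0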

kNN Classifier
• Data set:
• Training (labeled) data: T = {(xi, yi)}, xi ∈ Rp
• Test (unlabeled) data: x0 ∈ Rp
• Tasks:
• Classification: yi ∈ {1, . . . , J}
• Regression: yi ∈ R
• Given a new x0, predict y0
• Methods:
• Model-based
• Memory-based

Classification

[figure: credit risk assessment example (source: Alpaydin ...)]

Regression

[figure: regression example (source: O’Reilly ...)]

KNN Classifier
• 1-NN
• Predict the same value/class as the nearest instance in the training set
• k-NN
• Find the k closest training points (smallest ‖xi − x0‖ according to some metric, e.g. Euclidean, Manhattan, etc.)
• Predicted class: majority vote
• Predicted value: average weighted by inverse distance


k-NN Example

[figure: k-NN example (source: Duda, Hart ...)]

k-NN Classification

• Calculate the distances of all training vectors to the test vector
• Pick the k closest vectors
• Calculate the average/majority
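These three steps written out directly, as a from-scratch sketch with numpy (Euclidean distance and an unweighted majority vote assumed):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, k=3):
    dists = np.linalg.norm(X_train - x0, axis=1)   # 1. distances to the test vector
    nearest = np.argsort(dists)[:k]                # 2. pick the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]   # 3. majority vote

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 5.0])))   # -> 1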

k-NN

• + Easy to understand and program
• + Explicit reject option
• if there is no majority agreement
• + Easy handling of missing values
• restrict the distance calculation to a subspace
• + The asymptotic misclassification rate (as the number of data points n → ∞) is bounded above by twice the Bayes error rate (see Duda, Hart...)

k-NN

• - Affected by local structure
• - Sensitive to noise and irrelevant features
• - Computationally expensive: O(nd)
• - Large memory requirements
• - More frequent classes dominate the result (if distance is not weighted in)
• - Curse of dimensionality: with a high number of dimensions and a low number of training samples:
• the "nearest" neighbor might be very far away
• in high dimensions, "nearest" becomes meaningless

• Choice of k
• smaller k ⇒ higher variance (less stable)
• larger k ⇒ higher bias (less precise)
• The proper choice of k depends on the data:
• Adaptive methods, heuristics
• Cross-validation (see the sketch below)
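The cross-validation option can be sketched with scikit-learn: score a few candidate values of k and keep the best (the candidate list and toy data are assumptions):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(150, 4)
y = np.random.randint(0, 3, size=150)

for k in [1, 3, 5, 7, 9]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: mean accuracy {score:.3f}")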
Support Vector Machine
• SVM finds a hyperplane in an N-dimensional space (N = number of features) that distinctly classifies the data points.
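A minimal sketch with scikit-learn's SVC and a linear kernel on toy 2-D data; the learned hyperplane is w · x + b = 0:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print("w:", clf.coef_[0], "b:", clf.intercept_[0])   # hyperplane parameters
print(clf.predict([[2, 2], [5.5, 5.5]]))             # -> [0 1]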

Geometric Margin
Definition: The margin of an example 𝑥 w.r.t. a linear separator 𝑤 is the distance from 𝑥 to the plane 𝑤 ⋅ 𝑥 = 0.

Definition: The margin 𝛾𝑤 of a set of examples 𝑆 w.r.t. a linear separator 𝑤 is the smallest margin over points 𝑥 ∈ 𝑆.

Definition: The margin 𝛾 of a set of examples 𝑆 is the maximum 𝛾𝑤 over all linear separators 𝑤.

[figure: points labeled + and − separated by a hyperplane 𝑤, with margin 𝛾 on either side]
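For completeness (a standard formulation, not written out on the slide), the three definitions correspond to:

𝛾𝑤(𝑥) = |𝑤 ⋅ 𝑥| / ‖𝑤‖
𝛾𝑤(𝑆) = min over 𝑥 ∈ 𝑆 of 𝛾𝑤(𝑥)
𝛾(𝑆) = max over 𝑤 of 𝛾𝑤(𝑆)

where ‖𝑤‖ is the Euclidean norm of 𝑤.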

Margin: An Important Theme in ML

Both sample complexity and algorithmic implications.

Sample/Mistake Bound complexity:
• If the margin is large, the number of mistakes the Perceptron makes is small (independent of the dimension of the space)!
• If the margin 𝛾 is large and the algorithm produces a large-margin classifier, then the amount of data needed depends only on R/𝛾 [Bartlett & Shawe-Taylor ’99].

Algorithmic Implications:
Suggests searching for a large margin classifier… SVMs

[figure: large-margin separator 𝑤 with margin 𝛾 between + and − points]

Thank you!

obuleshcse@cvsr.ac.in
