Developing a Machine Learning Model from Start to Finish


Developing a Machine Learning Model from Start to Finish

Dr. A. OBULESU
Assoc. Professor

Learn by Doing…

A little about me

 Certified by NVIDIA and Leadingindia.ai
 In charge, MALAI Club in association with Wave Lab
 Convener, MALAI Research Wing
 BoS Member of CSE & AI
... @ Anurag University

1982 – 2020 . . .

Agenda

① Key Terms of Machine Learning
② Steps for Designing a Model
③ Machine Learning Models
④ Clarifications on Models

Key Terms in Machine Learning

What is Machine Learning?

"Field of study that gives computers the ability to learn without being explicitly programmed"
-- Arthur Samuel, 1959

Machine Learning is…

 Machine learning is about predicting the future based on the past.
-- Hal Daume III

[diagram: past: Training Data → learn → model/predictor; future: Testing Data → model/predictor → predict]

AI Can Achieve

[diagram: Artificial Intelligence encompasses Robotics, Domain-Specific Computing, Rule-Based Expert Systems, and Machine Learning, which in turn contains Deep Learning (DL)]

Machine Learning Types

Human Supervision?
 Supervised: Classification, Regression
 Unsupervised: Clustering
 Reinforcement

Learn Incrementally?
 Online Learning
 Batch Processing

How They Generalize?
 Instance based
 Model based

Supervised learning

• Feed the labeled dataset to the model…

[training images labeled: apple, apple, banana, banana]

In My Words: Supervised Learning is Learning with a Teacher…

Supervised learning: given labeled examples (Training Data)

Machine Learning is… Supervised Learning

[diagram: Training Data → learn → model/predictor → predict]

Supervised learning Contd..

[diagram: Test Data → model/predictor → predicted label]

Classification
Classification: predicting a discrete value, e.g., apple vs. banana, or 0 vs. 1

Applications
• Face recognition
• Character recognition
• Spam detection
• Medical diagnosis: from symptoms to illnesses
• Biometrics: recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.

Supervised learning: Regression

Regression: the predicted label is real-valued, e.g., -4.5, 10.1, 3.2, 4.3

Supervised learning: given labeled examples

Regression Applications
• Economics/Finance: predict the value of a stock
• Epidemiology
• Car/plane navigation: angle of the steering wheel, acceleration, …
• Temporal trends: weather over time
• …

Unsupervised learning

In My Words: Unsupervised Learning is Learning without a Teacher…

• Feed the dataset without labels to the model…

Unsupervised learning: given data, i.e. examples, but NO LABELS

Unsupervised learning
• Learn clusters/groups without any labels
• Customer segmentation (i.e. grouping)
• Image compression
• Bioinformatics: learn motifs
• …

Reinforcement learning
• Given a sequence of examples/states and a reward after completing that sequence, learn to predict the action to take for an individual example/state

… WIN!
… LOSE!

In My Words: Feedback System
Let us understand with use cases

• Collect data automatically
• Self-driving cars on the road
• Accurate results in Google Search
• Speech recognition on mobile phones
• Netflix movie recommendations
• Product recommendations

!!! Many…!!!

Machine Learning Tools

[tool logos: MLlib (distributed), others labeled simple]

Steps for Designing a ML Model

1. Define the Problem Appropriately
• What is the main objective? What are we trying to predict?
• What are the target features?
• What is the input data? Is it available?
• What kind of problem are we facing? Binary classification? Clustering?
• What is the expected improvement?

1. Define the Problem: Case Study – Predicting Cardiovascular Risk
• Objective: predict the risk level
• Input data is available from Kaggle, UCI, etc.
• This problem is multi-class classification
• Predict the correct class among:
• 0 – No Risk
• 1 – Mild Risk
• 2 – Acute Risk
• 3 – Serious Risk
• 4 – Very Serious Risk

2. Collecting Data
This is the first real step towards the actual development of a machine learning model.
The more and better data we get, the better our model will be.
There are many data sources (including web scraping and public repositories), such as:
 Kaggle
 UCI Machine Learning Repository: created as an FTP archive in 1987 by David Aha and fellow graduate students at UC Irvine

2. Case study:
 Files taken from Kaggle:
 features.csv : the input features
 target.csv : the target labels (risk levels)
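A minimal sketch of loading these two files with pandas; the actual column contents depend on the Kaggle dataset and are not shown on the slide:

import pandas as pd

# Load the two files named above (assumed to be in the working directory)
features = pd.read_csv("features.csv")   # input features, one row per patient
target = pd.read_csv("target.csv")       # risk labels (0-4)

print(features.shape, target.shape)
print(features.head())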

3. Choose the Measure of Success

• "If you can't measure it, you can't improve it."
• Regression problems use evaluation metrics such as mean squared error (MSE).
• Classification problems use evaluation metrics such as precision, accuracy, and recall.
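As a quick sketch with scikit-learn's metrics module (the toy labels below are made up for illustration):

from sklearn.metrics import (accuracy_score, mean_squared_error,
                             precision_score, recall_score)

# Regression: mean squared error
y_true_reg = [3.0, -0.5, 2.0]
y_pred_reg = [2.5, 0.0, 2.1]
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))

# Classification: accuracy, precision, recall
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))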

4. Setting an Evaluation Protocol

 The reason to split data into three parts (training, validation, and test) is to avoid information leaks.
 The main drawback of this method is that if there is little data available, the validation and test sets will contain so few samples that tuning and evaluating the model will not be effective.
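A sketch of the three-way split using scikit-learn's train_test_split applied twice; the 60/20/20 proportions and toy data are arbitrary choices for illustration:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 5)              # toy feature matrix
y = np.random.randint(0, 5, size=100)   # toy risk labels (0-4)

# First split off 40%, then halve it into validation and test (60/20/20)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)
print(len(X_train), len(X_val), len(X_test))   # 60 20 20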


4. Setting an Evaluation Protocol: Case study – K-fold validation

[figure: K-fold cross-validation, with the data split into K folds and each fold used once for validation]
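The K-fold protocol can be sketched with scikit-learn's KFold; k = 5 and the toy data are assumptions for illustration:

import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(100, 5)
y = np.random.randint(0, 5, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # ...fit the model on the training fold, score it on the validation fold
    print(f"fold {fold}: train={len(train_idx)}, val={len(val_idx)}")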

5. Preparing the Data

 We should transform our data into a form that can be fed into a machine learning model:
 Dealing with missing data
 Handling categorical data
 Feature scaling: normalization, standardization
 Selecting meaningful features: PCA
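One way to sketch these four steps together is a scikit-learn pipeline. The column names are hypothetical, and sparse_output=False assumes scikit-learn >= 1.2:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age", "bp"]       # assumed numeric feature names
categorical_cols = ["smoker"]      # assumed categorical feature name

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),   # dealing with missing data
        ("scale", StandardScaler()),                  # feature scaling
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore", sparse_output=False),
     categorical_cols),                               # handling categorical data
])

pipe = Pipeline([
    ("prep", preprocess),
    ("pca", PCA(n_components=2)),                     # selecting meaningful features
])

df = pd.DataFrame({"age": [63, None, 45],
                   "bp": [140.0, 120.0, None],
                   "smoker": ["yes", "no", "no"]})
print(pipe.fit_transform(df))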

4.1 Overfitting and Underfitting

• A model underfits when it has not learned the training data well enough (e.g., too small a dataset or too simple a model).
• A model overfits when it has learned the training data so well that it has picked up patterns that are too specific to the training data and irrelevant to new data.

4.1 Overfitting and Underfitting

While training the model, we should consider two issues:
• Optimization is the process of adjusting a model to get the best performance possible on the training data (the learning process).
• Generalization is how well the model performs on unseen data. The goal is to obtain the best generalization ability.


4.1 Two Ways to Avoid Overfitting

 Getting more data is usually the best solution; a model trained on more data will naturally generalize better.
 Regularization:
• L1 regularization: the cost is proportional to the absolute value of the weight coefficients (the L1 norm of the weights).
• L2 regularization: the cost is proportional to the square of the value of the weight coefficients (the L2 norm of the weights).
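A minimal sketch of the two penalties using scikit-learn's Lasso (L1) and Ridge (L2) models on toy data; alpha, the penalty strength, is chosen arbitrarily:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.random((50, 10))
y = X @ rng.random(10) + 0.1 * rng.standard_normal(50)

l1 = Lasso(alpha=0.1).fit(X, y)   # L1: penalizes |w|, drives some weights to exactly 0
l2 = Ridge(alpha=0.1).fit(X, y)   # L2: penalizes w^2, shrinks weights smoothly

print("L1 zero weights:", int((l1.coef_ == 0).sum()))
print("L2 zero weights:", int((l2.coef_ == 0).sum()))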

6. Developing a Benchmark Model

 Develop a benchmark model that serves as a baseline.
 Benchmarking requires experiments to be comparable, measurable, and reproducible.
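One common baseline is sketched below with scikit-learn's DummyClassifier (an assumed choice; any trivial model would do), which simply predicts the most frequent class; a real model should beat its score:

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 5)
y = np.random.randint(0, 5, size=200)   # toy 5-class risk labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("baseline accuracy:", baseline.score(X_test, y_test))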

7. Developing a Better Model & Tuning its Hyperparameters

 Finding a good model requires:
 A number of folds in which we will split our data.
 A scoring method (that will vary depending on the problem's nature: regression, classification…).
 Some appropriate algorithms that we want to check.
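A sketch of that recipe with scikit-learn: cross_val_score compares candidate algorithms under a fixed fold count and scoring method, and GridSearchCV then tunes the best candidate's hyperparameters. The candidates and parameter grid are illustrative assumptions:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(200, 5)
y = np.random.randint(0, 5, size=200)

# Compare a few algorithms with the same folds and scoring method
for name, model in [("logreg", LogisticRegression(max_iter=1000)),
                    ("knn", KNeighborsClassifier())]:
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(name, scores.mean())

# Tune hyperparameters of one candidate over a small grid
search = GridSearchCV(KNeighborsClassifier(),
                      {"n_neighbors": [3, 5, 7, 9]},
                      cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)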

Machine Learning Models

Regression
• Predict future scores on Y based on measured scores on X
 Predictions are based on a correlation from a sample where both X and Y were measured.
The equation is linear:
y = bx + a
where y = predicted score on Y, x = measured score on X, b = slope, a = y-intercept
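As a worked sketch, the slope b and intercept a can be estimated by least squares with numpy.polyfit on toy data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])   # roughly y = 2x

b, a = np.polyfit(x, y, deg=1)   # slope b, y-intercept a
print(f"y = {b:.2f}x + {a:.2f}")
print("predicted score at x = 6:", b * 6 + a)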

Relationship between Variables

1. Relationship between variables in a linear function:

Yi = β0 + β1Xi + εi

where β0 is the population Y-intercept, β1 is the population slope, and εi is the random error; Y is the dependent (response) variable (e.g., COVID-19) and X is the independent (explanatory) variable (e.g., age).

Linear vs. Logistic

 When the predicted outcome for the given input is continuous or in a range – linear regression
 When the predicted outcome is discrete – logistic regression


Logistic regression
 The probability cannot be negative, so we introduce an exponential term into our normal regression model to make it logistic regression.
 Since the probability can never be greater than 1, we need to divide our outcome by something bigger than itself.
 The regression formula gives us Y using the formula Yi = β0 + β1X + εi.

Logistic regression Contd…

 We use the exponential so the outcome cannot become negative, giving P = exp(β0 + β1X + εi).
 We divide P by something bigger than itself so that it remains less than one, giving
P = e^(β0 + β1X + εi) / (e^(β0 + β1X + εi) + 1)
 After some algebra, the formula in the previous step can be rewritten as
log(p/(1-p)) = β0 + β1X + εi
 p/(1-p) is the odds: the probability of the desired outcome being true divided by the probability of it not being true. Its logarithm, log(p/(1-p)), is called the LOGIT FUNCTION.
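A quick numerical check of this derivation (the β values here are arbitrary): the exponential form maps the linear score to a probability in (0, 1), and the logit recovers the score exactly.

import numpy as np

beta0, beta1 = -1.0, 0.5
x = 4.0

score = beta0 + beta1 * x                 # linear part: β0 + β1X
p = np.exp(score) / (np.exp(score) + 1)   # P = e^(β0 + β1X) / (e^(β0 + β1X) + 1)
print("probability:", p)                  # ~0.731
print("logit:", np.log(p / (1 - p)))      # equals the linear score, 1.0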

kNN Classifier
• Data set:
• Training (labeled) data: T = {(xi, yi)}, xi ∈ Rp
• Test (unlabeled) data: x0 ∈ Rp
• Tasks:
• Classification: yi ∈ {1, . . . , J}
• Regression: yi ∈ R
• Given a new x0, predict y0
• Methods:
• Model-based
• Memory-based

Classification

[figure: credit risk assessment example (source: Alpaydin ...)]

Regression

[figure: regression example (source: O’Reilly ...)]

KNN Classifier
• 1-NN
• Predict the same value/class as the nearest instance in the training set
• k-NN
• Find the k closest training points (smallest ‖xi − x0‖ according to some metric, e.g. Euclidean, Manhattan, etc.)
• Predicted class: majority vote
• Predicted value: average weighted by inverse distance


k-NN Example

[figure: k-NN example (source: Duda, Hart ...)]

k-NN Classification

• Calculate the distances of all training vectors to the test vector
• Pick the k closest vectors
• Calculate the average/majority
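These three steps written out directly, as a from-scratch sketch with numpy (Euclidean distance and an unweighted majority vote assumed):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, k=3):
    dists = np.linalg.norm(X_train - x0, axis=1)   # 1. distances to the test vector
    nearest = np.argsort(dists)[:k]                # 2. pick the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]   # 3. majority vote

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 5.0])))   # -> 1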

k-NN

• + Easy to understand and program
• + Explicit reject option
• if there is no majority agreement
• + Easy handling of missing values
• restrict the distance calculation to a subspace
• + The asymptotic misclassification rate (as the number of data points n → ∞) is bounded above by twice the Bayes error rate (see Duda, Hart...)

k-NN

• - Affected by local structure
• - Sensitive to noise and irrelevant features
• - Computationally expensive: O(nd)
• - Large memory requirements
• - More frequent classes dominate the result (if distance is not weighted in)
• - Curse of dimensionality: with a high number of dimensions and a low number of training samples:
• the "nearest" neighbor might be very far away
• in high dimensions, "nearest" becomes meaningless

• Choice of k
• smaller k ⇒ higher variance (less stable)
• larger k ⇒ higher bias (less precise)
• The proper choice of k depends on the data:
• Adaptive methods, heuristics
• Cross-validation (see the sketch below)
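The cross-validation option can be sketched with scikit-learn: score a few candidate values of k and keep the best (the candidate list and toy data are assumptions):

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(150, 4)
y = np.random.randint(0, 3, size=150)

for k in [1, 3, 5, 7, 9]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: mean accuracy {score:.3f}")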
Support Vector Machine
• SVM finds a hyperplane in an N-dimensional space (N = number of features) that distinctly classifies the data points.
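A minimal sketch with scikit-learn's SVC and a linear kernel on toy 2-D data; the learned hyperplane is w · x + b = 0:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print("w:", clf.coef_[0], "b:", clf.intercept_[0])   # hyperplane parameters
print(clf.predict([[2, 2], [5.5, 5.5]]))             # -> [0 1]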

Geometric Margin
Definition: The margin of an example 𝑥 w.r.t. a linear separator 𝑤 is the distance from 𝑥 to the plane 𝑤 ⋅ 𝑥 = 0.

Definition: The margin 𝛾𝑤 of a set of examples 𝑆 w.r.t. a linear separator 𝑤 is the smallest margin over points 𝑥 ∈ 𝑆.

Definition: The margin 𝛾 of a set of examples 𝑆 is the maximum 𝛾𝑤 over all linear separators 𝑤.

[figure: points labeled + and − separated by a hyperplane 𝑤, with margin 𝛾 on either side]
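For completeness (a standard formulation, not written out on the slide), the three definitions correspond to:

𝛾𝑤(𝑥) = |𝑤 ⋅ 𝑥| / ‖𝑤‖
𝛾𝑤(𝑆) = min over 𝑥 ∈ 𝑆 of 𝛾𝑤(𝑥)
𝛾(𝑆) = max over 𝑤 of 𝛾𝑤(𝑆)

where ‖𝑤‖ is the Euclidean norm of 𝑤.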

Margin: An Important Theme in ML

Both sample complexity and algorithmic implications.

Sample/Mistake Bound complexity:
• If the margin is large, the number of mistakes the Perceptron makes is small (independent of the dimension of the space)!
• If the margin 𝛾 is large and the algorithm produces a large-margin classifier, then the amount of data needed depends only on R/𝛾 [Bartlett & Shawe-Taylor ’99].

Algorithmic Implications:
Suggests searching for a large margin classifier… SVMs

[figure: large-margin separator 𝑤 with margin 𝛾 between + and − points]

Thank you!

obuleshcse@cvsr.ac.in
