
Developing a Machine Learning Model from Start to Finish

Dr. A. OBULESU
Assoc. Professor
1

Learn by Doing…
2

A little about me

 Certified by NVIDIA and Leadingindia.ai
 In charge, MALAI Club in association with Wave Lab
 Convener, MALAI Research Wing
 BoS Member of CSE & AI
... @ Anurag University

1982 – 2020 . . .
3

Agenda

① Key Terms of Machine Learning

② Steps for Designing a Model

③ Machine Learning Models

④ Clarifications on Models
4

Key Terms in Machine Learning
5

What is Machine Learning?

"Field of study that gives computers the ability to learn without being explicitly programmed"

-- Arthur Samuel, 1959


6

 Machine learning is about predicting the future based on the past.
-- Hal Daume III

[Figure: Machine learning is… past (Training Data) → learn → model/predictor; future (Testing Data) → model/predictor → predict]

7

Machine Learning Types

Human Supervision?
 Supervised: Classification, Regression
 Unsupervised: Clustering
 Reinforcement

Learn Incrementally?
 Online
 Batch Processing

How They Generalize?
 Instance based
 Model based
8

Supervised learning

• Feed the labeled dataset to the model…

[Figure: training examples with labels — apple, apple, banana, banana]

In My Words: Supervised learning is learning with a teacher…

Supervised learning: given labeled examples (Training Data)
10

[Figure: Machine learning is… training data → learn → model/predictor → predict (Supervised Learning)]
11

Supervised learning Contd..

[Figure: Test Data → model/predictor → predicted label]
12
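As a hedged illustration of the train-then-predict flow sketched above, the snippet below fits a classifier on labeled training data and predicts labels for held-out test data; the iris dataset and the decision tree are assumptions chosen for demonstration, not part of the slides.

```python
# Minimal supervised-learning sketch (assumed example, not from the slides):
# fit a classifier on labeled training data, then predict labels for held-out test data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)            # features and labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)                  # "learn" phase: training data -> model/predictor
predicted = model.predict(X_test)            # "predict" phase: test data -> predicted labels
print(predicted[:5], y_test[:5])
```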

Unsupervised learning

In My Words: Unsupervised learning is learning without a teacher…

• Feed the dataset without labels to the model…

Unsupervised learning: given data, i.e. examples, but NO LABELS


13
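A minimal clustering sketch of "data with no labels" (again an assumed example: k-means on the iris features with the labels deliberately discarded):

```python
# Unsupervised-learning sketch: cluster unlabeled data with k-means (labels are never used).
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)            # keep only the features, discard the labels

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(X)             # each sample is assigned to one of 3 groups
print("Cluster sizes:", [int((clusters == c).sum()) for c in range(3)])
```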

Reinforcement learning
• Given a sequence of examples/states and a reward after completing that sequence, learn to predict the action to take for an individual example/state

… WIN!    … LOSE!
In My Words: a feedback system

14
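The reward-driven loop above can be made concrete with a tiny tabular Q-learning sketch; the corridor environment, learning rate, and discount factor below are all invented for illustration, and Q-learning is one standard reinforcement-learning algorithm, not something prescribed by the slide.

```python
# Toy reinforcement-learning sketch (assumed example, not from the slides):
# tabular Q-learning on a 5-state corridor; stepping right from the last state
# ends the episode with reward +1, every other step gives reward 0.
import random

n_states, actions = 5, [0, 1]                 # action 0 = step left, 1 = step right
Q = [[0.0, 0.0] for _ in range(n_states)]     # action-value table
alpha, gamma = 0.5, 0.9                       # learning rate, discount factor

for episode in range(500):
    s = 0
    for _ in range(100):                      # step cap per episode
        a = random.choice(actions)            # explore with a random behavior policy
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = (a == 1 and s == n_states - 1)
        reward = 1.0 if done else 0.0
        # Feedback: move Q(s, a) toward reward + discounted best future value
        target = reward if done else reward + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
        if done:
            break

greedy = [max(actions, key=lambda a: Q[s][a]) for s in range(n_states)]
print("Learned greedy policy (1 = go right):", greedy)
```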
Let us understand with use cases

• Collect data automatically
• Self-driving car on the road
• Accurate results in Google search
• Speech recognition in mobile
• Netflix movie recommendation
• Product recommendation

!!! Many more…!!!
15

Machine Learning Tools

[Figure: logos of machine learning tools — MLlib (distributed), and others]
16

Steps for Designing a ML Model
17

1. Define the Problem Appropriately
• What is the main objective? What are we trying to predict?
• What are the target features?
• What is the input data? Is it available?
• What kind of problem are we facing? Binary classification?
Clustering?
• What is the expected improvement?
18

* Define the Problem Appropriately: Case Study – Predicting Cardiovascular Risk

• Objective: predict the risk level
• Input data is available from Kaggle, UCI, etc.
• This problem is multi-class classification
• Predict the correct class among:
  • 0 – No Risk
  • 1 – Mild Risk
  • 2 – Acute Risk
  • 3 – Serious Risk
  • 4 – Very Serious Risk
19

2. Collecting Data
This is the first real step towards the development of a machine learning model.
The more and better data we get, the better model we will design.
There are many sources, such as web scraping and public repositories like:
 Kaggle
 UCI Machine Learning Repository: created as an FTP archive in 1987 by David Aha and fellow graduate students at UC Irvine
20

2. Case study:
 Files taken from Kaggle:
 features.csv
 target.csv
21

2. Case study:
 Files taken from Kaggle:
 features.csv : feature values
 target.csv : target labels (risk classes)
22

3. Choose the Measure of Success

• "If you can't measure it, you can't improve it."

• Regression problems use evaluation metrics such as mean squared error (MSE).

• Classification problems use evaluation metrics such as precision, accuracy, and recall.
23
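As a small, hedged illustration of these metrics, the toy arrays below (invented values, not from the case study) are scored with scikit-learn:

```python
# Toy illustration of common evaluation metrics (assumed example values).
from sklearn.metrics import mean_squared_error, accuracy_score, precision_score, recall_score

# Regression: mean squared error between true and predicted values
y_true_reg = [3.0, -0.5, 2.0, 7.0]
y_pred_reg = [2.5,  0.0, 2.0, 8.0]
print("MSE:", mean_squared_error(y_true_reg, y_pred_reg))

# Classification: accuracy, precision, and recall on binary labels
y_true_cls = [0, 1, 1, 0, 1, 1]
y_pred_cls = [0, 1, 0, 0, 1, 1]
print("Accuracy :", accuracy_score(y_true_cls, y_pred_cls))
print("Precision:", precision_score(y_true_cls, y_pred_cls))
print("Recall   :", recall_score(y_true_cls, y_pred_cls))
```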

4. Setting an Evaluation Protocol



 The reason to split the data into three parts (training, validation, and test) is to avoid information leaks.
 The main inconvenience of this method is that if there is little data available, the validation and test sets will contain so few samples that the tuning and evaluation processes of the model will not be effective.
24

4. Setting an Evaluation Protocol


25

4. Setting an Evaluation Protocol: Case study – K-fold validation

26
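A minimal K-fold cross-validation sketch, assuming scikit-learn and synthetic data in place of the case-study files:

```python
# K-fold cross-validation sketch: split data into k folds, train on k-1, validate on the rest.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
print("Fold accuracies:", scores)
print("Mean accuracy  :", scores.mean())
```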

5. Preparing the Data



 We should transform our data into a form that can be fed into a Machine Learning model, as sketched below:
 Dealing with missing data
 Handling categorical data
 Feature scaling: normalization, standardization
 Selecting meaningful features: PCA


27
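A hedged preprocessing sketch covering these steps; the column names and values are invented for illustration and stand in for the real case-study features:

```python
# Data-preparation sketch: impute missing values, encode categoricals, scale, then PCA.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({                      # invented toy data, not the Kaggle files
    "age":    [52, 61, None, 45],
    "chol":   [230, 180, 210, None],
    "gender": ["M", "F", "F", "M"],
})

numeric = ["age", "chol"]
categorical = ["gender"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder())]), categorical),
], sparse_threshold=0)                   # force a dense matrix so PCA can consume it

pipeline = Pipeline([("prep", preprocess), ("pca", PCA(n_components=2))])
X_ready = pipeline.fit_transform(df)
print(X_ready.shape)                     # data now ready to feed a model
```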

4.1 Overfitting and Underfitting

• The model is underfitting: it has not learned the training data well enough (e.g., small dataset, too simple a model).

• The model is overfitting: it has learned the training data so well that it has picked up patterns that are too specific to the training data and irrelevant to new data.


28

4.1 Overfitting and Underfitting

While training the model, two issues we should consider:

• Optimization is the process of adjusting a model to get the best performance possible on training data (the learning process).

• Generalization is how well the model performs on unseen data. The goal is to obtain the best generalization ability.


29

4.1 Overfitting and Underfitting



30

4.1 Two ways to avoid overfitting

 Getting more data is usually the best solution; a model trained on more data will naturally generalize better.

 Regularization:

• L1 regularization: the added cost is proportional to the absolute value of the weight coefficients (the L1 norm of the weights).

• L2 regularization: the added cost is proportional to the square of the value of the weight coefficients (the L2 norm of the weights).
31
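A brief sketch contrasting the two penalties (an assumed example on synthetic regression data; Lasso and Ridge are scikit-learn's L1- and L2-regularized linear models):

```python
# L1 vs. L2 regularization sketch: Lasso (L1) drives some weights to zero,
# Ridge (L2) shrinks them smoothly. alpha controls the regularization strength.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: alpha * sum(|w|)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: alpha * sum(w^2)

print("L1 (Lasso) weights:", lasso.coef_.round(2))
print("L2 (Ridge) weights:", ridge.coef_.round(2))
```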

5. Developing a Benchmark Model

 Develop a benchmark model that serves as a baseline.

 Benchmarking requires experiments to be comparable, measurable, and reproducible.


32
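One common way to get such a baseline (a hedged sketch, not prescribed by the slides) is a "most frequent class" dummy model that any real model should beat:

```python
# Baseline sketch: a "most frequent class" dummy model as the benchmark to beat.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
print("Baseline accuracy:", baseline.score(X_test, y_test))
```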

6. Developing a Better Model & Tuning its Hyperparameters

 Finding a good model requires choosing:

 A number of folds in which we will split our data.

 A scoring method (that will vary depending on the problem's nature — regression, classification…).

 Some appropriate algorithms that we want to check, as sketched below.


33
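A hedged model-selection sketch combining those three choices (number of folds, scoring method, candidate algorithm); the data and parameter grid are invented for illustration:

```python
# Hyperparameter-tuning sketch: cross-validated grid search over a KNN classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]},
    cv=5,                      # number of folds
    scoring="accuracy",        # scoring method for a classification problem
)
search.fit(X, y)
print("Best params:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```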

Machine Learning Models


34

Regression
• Predict future scores on Y based on measured scores on X

 Predictions are based on a correlation from a sample where both
X and Y were measured.
Equation is linear:
y = bx + a
y = predicted score on y
x = measured score on x
b = slope
a = y-intercept
35
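A small sketch of fitting that linear equation y = bx + a by least squares (toy x and y values invented for illustration):

```python
# Fit y = b*x + a by least squares on toy data (assumed example values).
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # measured scores on X
y = np.array([2.1, 4.0, 6.2, 8.1, 9.9])             # measured scores on Y

reg = LinearRegression().fit(x, y)
b, a = reg.coef_[0], reg.intercept_                  # slope and y-intercept
print(f"y = {b:.2f} * x + {a:.2f}")
print("Predicted y at x = 6:", reg.predict([[6.0]])[0])
```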

Relationship between Variables

1. Relationship between variables in a linear function:

Yi = β0 + β1Xi + εi

where β0 is the population Y-intercept, β1 is the population slope, and εi is the random error;
Yi is the dependent (response) variable (e.g., COVID-19) and Xi is the independent (explanatory) variable (e.g., age).
36

Linear Vs. Logistic

 When the predicted outcome for the given input is continuous (in a range) – Linear Regression
 When the predicted outcome is in a discrete form – Logistic Regression
37

Linear Vs. Logistic Regression


38

Logistic regression
 The probability cannot be negative, so we introduce an exponential term into our normal regression model to make it logistic regression.

 Since the probability can never be greater than 1, we need to divide our outcome by something bigger than itself.

 The regression formula gives us Y using Yi = β0 + β1X + εi.


39

Logistic regression Contd…

 We use the exponential so that the value does not become negative, and hence we get P = exp(β0 + β1X + εi).
 We divide that P by something bigger than itself so that it remains less than one, and hence we get P = exp(β0 + β1X + εi) / (exp(β0 + β1X + εi) + 1).
 After some algebra, the formula in the previous step can be rewritten as log(p/(1-p)) = β0 + β1X + εi.
 p/(1-p) is the odds: the probability of the desired outcome being true divided by the probability of it not being true.
 log(p/(1-p)), the log-odds, is called the LOGIT FUNCTION.
40
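A compact sketch of this logit relationship (toy data invented for illustration; scikit-learn's LogisticRegression estimates β0 and β1, and the manual sigmoid below reproduces its probability):

```python
# Logistic regression sketch: fitted model outputs P = exp(b0 + b1*x) / (exp(b0 + b1*x) + 1),
# equivalently log(p / (1 - p)) = b0 + b1*x.
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([[1], [2], [3], [4], [5], [6]], dtype=float)   # e.g., a single risk factor
y = np.array([0, 0, 0, 1, 1, 1])                            # discrete outcome

clf = LogisticRegression().fit(x, y)
b0, b1 = clf.intercept_[0], clf.coef_[0][0]

p = clf.predict_proba([[3.5]])[0, 1]                        # model probability at x = 3.5
manual = np.exp(b0 + b1 * 3.5) / (np.exp(b0 + b1 * 3.5) + 1)
print(round(p, 4), round(manual, 4))                        # the two agree
```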

kNN Classifier
• Data set:
• Training (labeled) data: T = {(xi, yi)}
• xi ∈ Rp
• Test (unlabeled) data: x0 ∈ Rp
• Tasks:
• Classification: yi ∈ {1, . . . , J}
• Regression: yi ∈ R
• Given new x0 predict y0
• Methods:
• Model-based
• Memory-based
41

Classification

credit risk assessment (source: Alpaydin ...)


42

Regression

source: O’Reilly ...


43

KNN Classifier
• 1 NN
• Predict the same value/class as the nearest instance in
the training set
• k NN
• Find the k closest training points (small ‖xi − x0‖ according to some metric, e.g., Euclidean, Manhattan, etc.)
• Predicted class: majority vote
• Predicted value: average weighted by inverse distance
44

K NN
45

k NN - Example

source: Duda, Hart ...


46

k NN Classification

• Calculate distances of all training vectors to


test vector
• Pick k closest vectors
• Calculate average/majority
47
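Those three steps as a hedged scikit-learn sketch (the iris data is an assumption; KNeighborsClassifier performs the distance computation and majority vote internally):

```python
# kNN classification sketch: distances to all training vectors, pick k closest, majority vote.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)                      # memory-based: just stores the training set
print("Test accuracy:", round(knn.score(X_test, y_test), 3))
```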

k NN

• + Easy to understand and program


• + Explicit reject option
• if there is no majority agreement
• + Easy handling of missing values
• restrict distance calculation to
subspace
• + asymptotic misclassification rate (as the number of data
points n → ∞ ) is bounded above by twice the Bayes
error rate. (see Duda, Hart...)
48

k NN

• - affected by local structure


• - sensitive to noise, irrelevant features
• - computationally expensive O(nd)
• - large memory requirements
• - more frequent classes dominate result (if distance not
weighed in)
• - curse of dimensionality: high nr. of dimensions and low nr. of
training samples:
• "nearest" neighbor might be very far
• in high dimensions "nearest" becomes meaningless
49

• Choice of k

• smaller k ⇒ higher variance (less stable)

• larger k ⇒ higher bias (less precise)

• Proper choice of k depends on the data:


• Adaptive methods, heuristics
• Cross-validation
50
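A small sketch of the cross-validation route for choosing k (assumed example, again on the iris data):

```python
# Choose k by 5-fold cross-validation: try several k values, keep the best mean score.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in [1, 3, 5, 7, 9, 11]}
best_k = max(scores, key=scores.get)
print("CV accuracy per k:", {k: round(s, 3) for k, s in scores.items()})
print("Best k:", best_k)
```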

Stochastic Gradient Descent Classifier

• Gradient descent is an iterative algorithm that starts from a random point on a function and travels down its slope in steps until it reaches the lowest point of that function.
 Capable of handling large datasets
 Deals with training instances independently
 Well suited for online training
51
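A hedged sketch using scikit-learn's SGDClassifier, streaming synthetic data in mini-batches via partial_fit to mimic the online setting described above:

```python
# SGD classifier sketch: learn incrementally from mini-batches via partial_fit.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = SGDClassifier(random_state=0)                    # default hinge loss, updated by SGD

classes = np.unique(y)
for start in range(0, len(X), 100):                    # stream the data in batches of 100
    clf.partial_fit(X[start:start + 100], y[start:start + 100], classes=classes)

print("Training accuracy:", round(clf.score(X, y), 3))
```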

Support Vector Machine


• The objective of SVM is to find a hyperplane in an N-dimensional space (N = number of features) that distinctly classifies the data points.
52

Support Vector Machine Contd..


53
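A minimal linear-SVM sketch (assumed example on synthetic 2-D blobs; SVC with a linear kernel finds the separating hyperplane and reports its support vectors):

```python
# Linear SVM sketch: fit a maximum-margin hyperplane w.x + b = 0 on toy 2-D data.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)

svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

print("Hyperplane weights w:", svm.coef_[0])
print("Intercept b         :", svm.intercept_[0])
print("Number of support vectors:", len(svm.support_vectors_))
```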

Geometric Margin
Definition: The margin of example 𝑥 w.r.t. a linear separator 𝑤 is the distance from 𝑥 to the plane 𝑤 ⋅ 𝑥 = 0.

Definition: The margin 𝛾𝑤 of a set of examples 𝑆 w.r.t. a linear separator 𝑤 is the smallest margin over points 𝑥 ∈ 𝑆.

Definition: The margin 𝛾 of a set of examples 𝑆 is the maximum 𝛾𝑤 over all linear separators 𝑤.

[Figure: positive and negative points separated by the hyperplane 𝑤, with margin 𝛾 on each side]
54
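For reference, a hedged LaTeX rendering of the standard labeled-margin formulas, consistent with the distance-to-hyperplane wording above (the y ∈ {−1, +1} label convention is an assumption not stated on the slide):

```latex
% Geometric margin of a labeled example (x, y), y in {-1, +1},
% w.r.t. a linear separator w (hyperplane w . x = 0):
\gamma_w(x, y) = \frac{y \, (w \cdot x)}{\lVert w \rVert}
% Margin of a set S w.r.t. w, and the best achievable margin over all w:
\gamma_w(S) = \min_{(x, y) \in S} \gamma_w(x, y), \qquad
\gamma(S) = \max_{w} \gamma_w(S)
```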

Margin: an Important Theme in ML

Both sample complexity and algorithmic implications.

Sample/Mistake Bound complexity:
• If the margin is large, the number of mistakes the Perceptron makes is small (independent of the dimension of the space)!
• If the margin 𝛾 is large and the algorithm produces a large-margin classifier, then the amount of data needed depends only on R/𝛾 [Bartlett & Shawe-Taylor '99].

Algorithmic implications:
Suggests searching for a large-margin classifier… SVMs

[Figure: positive and negative points with a large-margin separator 𝑤 and margin 𝛾]
55

Thank you !

obuleshcse@cvsr.ac.in
7

AI can Achieve

[Figure: Artificial Intelligence encompasses Robotics, Domain-Specific Computing, Rule-Based Expert Systems, and Machine Learning, which in turn contains Deep Learning (DL)]
12

Classification
Classification: predicting a discrete value, i.e., apple or banana, or 0 or 1

Applications
• Face recognition
• Character recognition
• Spam detection
• Medical diagnosis: from symptoms to illnesses
• Biometrics: recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
14

Supervised learning: Regression

Regression: the predicted label is real-valued

[Figure: training examples with real-valued labels such as -4.5, 10.1, 3.2, 4.3]

Supervised learning: given labeled examples


13

Regression Applications
• Economics/Finance: predict the value of a stock

• Epidemiology

• Car/plane navigation: angle of the steering wheel,


acceleration, …

• Temporal trends: weather over time


•…
16

Unsupervised learning
• Learn clusters/groups without any label

• Customer segmentation (i.e., grouping)

• Image compression

• Bioinformatics: learn motifs,


• …..
