
FACULTY DEVELOPMENT PROGRAMME ON
BIOINSPIRED MACHINE LEARNING
(SPONSORED BY DBT)

PRESENTED BY

Dr. K. MEENA
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
SCHOOL OF COMPUTING
VEL TECH RANGARAJAN DR. SAGUNTHALA R&D INSTITUTE OF SCIENCE AND TECHNOLOGY
OUTLINE
• Introduction to Machine Learning
• Applications
• Machine Learning Types
• Guidelines for Designing ML Experiments
• ANN, KNN (Supervised Learning)
• Hierarchical Clustering (Unsupervised)
• Practical Demo
ARTIFICIAL INTELLIGENCE, MACHINE LEARNING & DEEP LEARNING

Deep learning is a subset of machine learning, which is in turn a subset of artificial intelligence.
WHAT IS MACHINE LEARNING?

MACHINE LEARNING IS…
the study of computer algorithms that improve their performance at some task automatically through experience (Mitchell, 1997).
ML TYPES

• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning
LEARNING SYSTEM MODEL
SUPERVISED LEARNING

 Training data include the desired outputs.
 The algorithm generates a function that maps inputs to the desired outputs.
 Example - classification: the learner is required to learn a function that maps a vector into one of several classes by looking at input-output examples of the function.
SUPERVISED LEARNING ALGORITHM

• Linear Regression
• Nearest Neighbor
• Artificial Neural Network (ANN)
• Gaussian Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest
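
A minimal sketch of this supervised workflow, using one of the algorithms listed above (Gaussian Naive Bayes); scikit-learn and its bundled Iris dataset are illustrative choices, not named in the slides:

```python
# Supervised-learning sketch: fit a classifier on labeled
# examples, then predict labels for unseen inputs.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)            # inputs and desired outputs
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)                  # learn the input -> output mapping
print("test accuracy:", model.score(X_test, y_test))
```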
UNSUPERVISED LEARNING
 Training data do not include the desired outputs.
 Clustering is an unsupervised learning task.
 There is no target value to shoot for.
 Identify groups of "similar" data points that are "dissimilar" from the others.
 Partition the data into groups (clusters) that satisfy these constraints:
   Points in the same cluster should be similar.
   Points in different clusters should be dissimilar.
UNSUPERVISED LEARNING ALGORITHM

• k-means clustering
• Association Rules
• Hierarchical Clustering
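
A small sketch of two of the listed algorithms; scikit-learn and the synthetic blob data are illustrative assumptions, not from the slides:

```python
# Unsupervised sketch: cluster unlabeled points with k-means
# and with agglomerative (hierarchical) clustering.
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)  # labels unused

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

print(kmeans_labels[:10], hier_labels[:10])
```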
DIFFERENCES:
 Supervised learning: discover patterns in the data that relate data attributes to a target (class) attribute.
 These patterns are then used to predict the values of the target attribute in future data instances.
 Unsupervised learning: the data have no target attribute.
 We want to explore the data to find some intrinsic structure in them.
SEMI-SUPERVISED LEARNING
 Training data include only a few desired outputs.
 Unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy.
 Labeling is expensive and difficult.
 Labeling can be unreliable (e.g., segmentation applications may need multiple experts).
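
A sketch of the idea, assuming scikit-learn's LabelSpreading (one of several semi-supervised methods; the slides do not name a specific one). Unlabeled points are marked with -1:

```python
# Semi-supervised sketch: a few labeled points plus many
# unlabeled ones can still train a useful classifier.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)
y_partial = y.copy()
rng = np.random.RandomState(0)
unlabeled = rng.rand(len(y)) < 0.9           # hide ~90% of the labels
y_partial[unlabeled] = -1                    # -1 marks "no desired output"

model = LabelSpreading().fit(X, y_partial)
print("accuracy on the hidden labels:",
      (model.transduction_[unlabeled] == y[unlabeled]).mean())
```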
REINFORCEMENT LEARNING

 Rewards come from a sequence of actions.
 Decision making (robot, chess machine).
 Learn actions that maximize the payoff.
 There is not much information in a payoff signal, and the payoff is often delayed.
 Learning from reinforcement or (occasional) rewards is the most general form of learning.
 We only get feedback in the form of how well we are doing, not what we should be doing.
 There is no supervised output, only a delayed reward.
REINFORCEMENT LEARNING CONTD…

 Receive rewards from sequential actions.
 Learn a policy of how to act given an observation of the world.
 Every action has some impact on the environment.
 The environment provides feedback that guides the learning algorithm.
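
A minimal sketch of one reinforcement-learning method, tabular Q-learning, on a hypothetical 5-state corridor where the reward arrives only at the goal; the environment and all constants here are illustrative, not from the slides:

```python
# Q-learning sketch: reward arrives only at the right end of a
# 5-state corridor, so the agent learns from a delayed payoff.
import random

random.seed(0)
N_STATES, ACTIONS = 5, (-1, +1)              # actions: move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1        # learning rate, discount, exploration

for _ in range(500):                         # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s_next == N_STATES - 1 else 0.0   # delayed reward at the goal
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# the learned policy moves right (+1) in every non-terminal state
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N_STATES - 1)])
```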
GUIDELINES FOR DESIGNING ML EXPERIMENTS

1. Aim of the Study
  Objective
  Expected error

2. Selection of the Response Variable
  Performance metrics
  Confusion matrix
  Accuracy
  Precision
  Recall
CONFUSION MATRIX

For a binary classifier, the standard 2x2 layout:

                      Predicted Positive     Predicted Negative
Actual Positive       True Positive (TP)     False Negative (FN)
Actual Negative       False Positive (FP)    True Negative (TN)
ACCURACY

Accuracy = (TP + TN) / (TP + TN + FP + FN), the fraction of all cases that are classified correctly.
PRECISION

Precision = TP / (TP + FP), the fraction of predicted positives that are truly positive.
RECEIVER OPERATING CHARACTERISTIC (ROC)

The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies.
RECALL

Recall = TP / (TP + FN), the fraction of actual positives that are correctly identified.
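
A sketch tying the metrics above together, assuming scikit-learn; the labels and scores here are made up for illustration:

```python
# Metrics sketch: confusion matrix, accuracy, precision, recall,
# and the points of the ROC curve, on made-up binary labels.
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, roc_curve)

y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]

print(confusion_matrix(y_true, y_pred))      # rows: actual, cols: predicted
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # ROC curve points
```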
GUIDELINES FOR DESIGNING ML EXPERIMENTS (CONTD…)

3. Choice of Factors and Levels
4. Choice of Experimental Design
5. Performing the Experiment
6. Statistical Analysis of the Data
7. Conclusions and Recommendations
APPLICATIONS OF MACHINE LEARNING
MACHINE LEARNING IN COMPUTER SCIENCE

[Figure: machine learning at the hub of related areas]
• Speech/Audio Processing
• Planning and Locomotion
• Vision/Image Processing
• Natural Language Processing
• Biomedical/Chemical Informatics
• Financial Modeling
• Human-Computer Interaction
• Analytics
SAMPLE APPLICATIONS
• Web Search
• Computational Biology
• Finance
• E-commerce
• Space Exploration
• Robotics
• Information Extraction
• Social Networks
• Debugging Software
• [Your Favorite Area]

SUCCESSFUL APPLICATIONS OF ML
 Learning to recognize spoken words - SPHINX (Lee, 1989)
 Learning to drive an autonomous vehicle - ALVINN (Pomerleau, 1989)
 Learning to classify celestial objects (Fayyad et al., 1995)
 Learning to play world-class backgammon - TD-GAMMON (Tesauro, 1992)
 Designing the morphology and control structure of electro-mechanical artifacts - GOLEM (Lipson & Pollack, 2000)
MACHINE LEARNING APPLICATIONS
 Computer vision and robotics:
  detection, recognition, and categorization of objects
  face recognition
  tracking objects (rigid and articulated) in video
  modeling visual attention
 Speech recognition
 Information retrieval, Web search, Google ads...
MACHINE LEARNING APPLICATIONS
 Biology and medicine:
  drug discovery
  computational genomics (analysis and design)
  medical imaging and diagnosis
 Financial industry:
  fraud detection
  credit approval
  price and market prediction
MACHINE LEARNING APPLICATIONS
 Automating employee access granting and revocation
  Amazon, using its large dataset of employee roles and employee access levels, trains a machine learning algorithm that predicts which employees should be granted access to which resources,
  minimizing the human involvement required to grant or revoke employee access.
MACHINE LEARNING APPLICATIONS
 Protecting Animals
  Cornell University - an algorithm to identify whales in the ocean from audio recordings so that ships can avoid hitting them.
  Oregon State University - an algorithm that determines which bird species are present in a given audio recording collected in field conditions.
MACHINE LEARNING APPLICATIONS
 Identifying Heart Failure
  A machine learning algorithm that combs through physicians' free-form text notes (in electronic health records) and synthesizes the text using Natural Language Processing (NLP), much as a cardiologist can read through another physician's notes and reach the same conclusions.
 Predicting Hospital Re-admissions - Additive Analytics
  A predictive model that identifies which patients are at high risk of readmission; it can predict emergency room admissions before they happen, improving care outcomes and reducing costs.
ALGORITHMS: K NEAREST NEIGHBORS
SIMPLE ANALOGY…
• Tell me about your friends (who your neighbors are) and I will tell you who you are.
INSTANCE-BASED LEARNING

It's very similar to a desktop!!
KNN – DIFFERENT NAMES

• K-Nearest Neighbors
• Memory-Based Reasoning
• Example-Based Reasoning
• Instance-Based Learning
• Lazy Learning
WHAT IS KNN?

• A powerful classification algorithm used in pattern recognition.
• K nearest neighbors stores all available cases and classifies new cases based on a similarity measure (e.g., a distance function).
• One of the top data mining algorithms used today.
• A non-parametric, lazy learning algorithm (an instance-based learning method).
KNN: CLASSIFICATION APPROACH

• An object (a new instance) is classified by a majority vote of its neighbors' classes.
• The object is assigned to the most common class amongst its K nearest neighbors (measured by a distance function).
DISTANCE MEASURE

[Figure: compute the distance from the test record to the training records, then choose the k "nearest" records.]
DISTANCE BETWEEN NEIGHBORS

• Calculate the distance between the new example (E) and all examples in the training set.
• Euclidean distance between two examples:
  – X = [x1, x2, x3, ..., xn]
  – Y = [y1, y2, y3, ..., yn]
  – The Euclidean distance between X and Y is defined as

    $D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
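
The same distance written as a plain Python function (a direct transcription of the formula above):

```python
# Euclidean distance: D(X, Y) = sqrt(sum over i of (x_i - y_i)^2)
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([35, 35, 3], [37, 50, 2]))   # ~15.17, cf. the example below
```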
K-NEAREST NEIGHBOR ALGORITHM

• All the instances correspond to points in an n-dimensional feature space.
• Each instance is represented by a set of numerical attributes.
• The training data consist of a set of vectors and a class label associated with each vector.
• Classification is done by comparing the feature vectors of the K nearest points.
• Select the K nearest examples to E in the training set.
• Assign E to the most common class among its K nearest neighbors.
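
A from-scratch sketch of these steps; the tiny training set and its class labels below are made up purely for illustration:

```python
# k-NN sketch: store all training vectors, rank them by Euclidean
# distance to the query E, and take a majority vote among the K nearest.
from collections import Counter
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train, query, k=3):
    """train: list of (feature_vector, class_label) pairs."""
    nearest = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]        # most common class wins

train = [([1.0, 1.1], "A"), ([1.2, 0.9], "A"),
         ([4.8, 5.1], "B"), ([5.0, 4.9], "B"), ([4.9, 5.3], "B")]
print(knn_predict(train, [1.1, 1.0], k=3))   # -> "A"
```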
3-KNN: EXAMPLE

sqrt[(35 - 37)² + (35 - 50)² + (3 - 2)²] = 15.16
sqrt[(22 - 37)² + (50 - 50)² + (2 - 2)²] = 15
sqrt[(63 - 37)² + (200 - 50)² + (1 - 2)²] = 152.23
sqrt[(59 - 37)² + (170 - 50)² + (1 - 2)²] = 122
sqrt[(25 - 37)² + (40 - 50)² + (4 - 2)²] = 15.74

Predicted class for the query: YES (majority vote of the 3 nearest neighbors, i.e., the examples at distances 15, 15.16, and 15.74).
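
The five distances above can be checked directly; the feature vectors and the query (37, 50, 2) come from the example, though what the three features represent is not stated on the slide:

```python
# Reproducing the five distances from the 3-KNN example.
import math

examples = [(35, 35, 3), (22, 50, 2), (63, 200, 1), (59, 170, 1), (25, 40, 4)]
query = (37, 50, 2)

for ex in examples:
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(ex, query)))
    print(ex, round(d, 2))   # 15.17, 15.0, 152.24, 122.0, 15.75 (slide truncates)
```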
HOW TO CHOOSE K?
• If K is too small, the classifier is sensitive to noise points.
• A larger K works well, but too large a K may include majority points from other classes.
• A rule of thumb is K < sqrt(n), where n is the number of examples.

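A common way to pick K in practice, sketched with scikit-learn's cross-validation (an illustrative approach and dataset, not prescribed by the slide):

```python
# Choose K by cross-validating several candidates, staying below
# the rule-of-thumb bound K < sqrt(n).
import math
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k_max = int(math.sqrt(len(X)))               # rule of thumb: K < sqrt(n)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X, y, cv=5).mean()
          for k in range(1, k_max + 1, 2)}   # odd K avoids ties in binary voting
print(max(scores, key=scores.get), scores)
```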
[Figure: a query point x with (a) its 1-nearest neighbor, (b) 2-nearest neighbors, (c) 3-nearest neighbors.]

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
STRENGTHS OF KNN
• Very simple and intuitive.
• Can be applied to data from any distribution.
• Gives good classification if the number of samples is large enough.

WEAKNESSES OF KNN
• Takes more time to classify a new example: the distance from the new example to every stored example must be calculated and compared.
• Choosing k may be tricky.
• Needs a large number of samples for accuracy.
