ML 2021-22 Session 1 & 2
20CS3020AA – 3-0-4-4
III-I CSE-H
Machine Learning
What is Machine Learning?
[Figure: following instructions vs. learning from experience; Experience == Data]
Machine Learning
Machine learning is the systematic study of
algorithms and systems that improve their
knowledge or performance with experience.
Need for Machine Learning
Examples
Application of Machine Learning
Applications of Machine Learning
CO-1: Syllabus & Course Outcome
CO1: Introduction: Learning, Types of Machine Learning,
Supervised Learning: The Machine Learning Process,
Performance Measures, The Bias-Variance Tradeoff,
Learning with Trees: Using Decision Trees, Constructing
Decision Trees, Classification and Regression Trees (CART),
Turning Data into Probabilities: The Naïve Bayes’ Classifier,
Bayesian Networks. The EM Algorithm: Estimate Means of K
Gaussians, General Statement of EM Algorithm.
In Semester Formative Evaluation (Total Weightage = 30%)
• Global Challenges: Weightage 7.5, Max Marks 100, Duration 90 hours; CO1-CO4: 1.875 weightage (25 marks) each
• Continuous Evaluation (Lab Exercise): Weightage 7.5, Max Marks 100, Duration 120 hours
• MOOCs Review: Weightage 7.5, Max Marks 100, Duration 90 hours; CO1-CO4: 1.875 weightage (25 marks) each
• Skilling (Continuous Evaluation): Weightage 7.5, Max Marks 100, Duration 120 hours
In Semester Summative Evaluation
Text Books
• Stephen Marsland, “Machine Learning: An Algorithmic Perspective”, CRC Press, (2009).
Reference Books:
• 1. Peter Harrington, “Machine Learning in Action”, Manning Publications.
• 2. Ethem Alpaydin, “Introduction to Machine Learning”, The MIT Press, (2010).
• 3. Mark Lutz, “Programming Python”, O'Reilly.
• 4. Wesley J. Chun, “Core Python Programming”, 2nd Edition, Pearson, 2007 Reprint.
Web Links:
• 1. Data Science and Machine Learning: https://www.edx.org/course/data-science-machinelearning
• 2. Machine Learning: https://www.ocw.mit.edu/courses/6-867-machine-learning-fall-2006/
TYPES OF MACHINE LEARNING TECHNIQUES
Supervised learning
• In supervised learning, a training set of examples with the correct responses (targets) is provided and, based on this training set, the algorithm generalises to respond correctly to all possible inputs. This is also called learning from exemplars.
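As a minimal sketch of this idea in Python (a toy 1-nearest-neighbour learner; the exam-score inputs and pass/fail targets are made up for illustration, not taken from the textbook), the algorithm is given exemplars with their correct targets and generalises to a new input by copying the target of the most similar exemplar:

# Training set of exemplars: (input, correct target) pairs (made-up data)
train = [(35, "fail"), (42, "fail"), (65, "pass"), (80, "pass")]

def predict(x):
    # Generalise from the exemplars: return the target of the nearest training input
    nearest_input, nearest_target = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_target

print(predict(50))   # responds to an input never seen during training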
SUPERVISED LEARNING
Unsupervised learning
• In unsupervised learning, correct responses are not provided.
• The algorithm tries to identify similarities
between the inputs so that inputs that have
something in common are categorised
together.
• The statistical approach to unsupervised
learning is known as density estimation.
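A small illustrative sketch in Python (a hand-rolled two-centre k-means on made-up 1-D data; none of this comes from the slides): the inputs carry no correct responses, yet similar inputs end up categorised together:

# Unlabelled 1-D inputs (made-up); no correct responses are provided
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
centres = [data[0], data[3]]                 # crude initial guesses for two cluster centres

for _ in range(10):                          # repeat: assign inputs, then re-estimate centres
    clusters = [[], []]
    for x in data:
        k = 0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
        clusters[k].append(x)                # put each input with its most similar centre
    centres = [sum(c) / len(c) for c in clusters]

print(centres)                               # two groups of similar inputs emerge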
UNSUPERVISED LEARNING
Reinforcement learning
This is somewhere between supervised and
unsupervised learning.
The algorithm gets told when the answer is wrong,
but does not get told how to correct it. It has to
explore and try out different possibilities until it
works out how to get the answer right.
Reinforcement learning is sometimes called learning with a critic because of this monitor that scores the answer but does not suggest improvements.
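A toy sketch in Python (a made-up four-action problem; the critic function is purely hypothetical): the critic only scores each attempted answer, so the learner has to explore different possibilities to work out which action is right:

import random

def critic(action):
    # The critic scores the answer (1 = right, 0 = wrong) but never says how to improve it
    return 1 if action == 2 else 0

counts = [0, 0, 0, 0]    # how often each of the four possible actions was tried
scores = [0, 0, 0, 0]    # how often each action was scored as right

for _ in range(200):
    action = random.randrange(4)             # explore: try out different possibilities
    counts[action] += 1
    scores[action] += critic(action)

best = max(range(4), key=lambda a: scores[a] / max(counts[a], 1))
print("learned action:", best)               # the learner has worked out the right answer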
REINFORCEMENT LEARNING
Evolutionary learning
Biological evolution can be seen as a learning
process: biological organisms adapt to improve
their survival rates and chance of having
offspring in their environment. We can model
this in a computer, using an idea of fitness,
which corresponds to a score for how good the
current solution is.
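An illustrative sketch in Python (a tiny evolutionary loop with a made-up fitness function; the target value of 10 is arbitrary): fitter solutions survive and produce mutated offspring, so the fitness score improves over the generations:

import random

def fitness(x):
    # Score for how good the current solution is (made-up: the best value is 10)
    return -abs(x - 10)

population = [random.uniform(0, 20) for _ in range(20)]

for generation in range(50):
    population.sort(key=fitness, reverse=True)                   # rank by fitness
    survivors = population[:10]                                  # the fitter half survives
    offspring = [x + random.gauss(0, 0.5) for x in survivors]    # mutated offspring
    population = survivors + offspring

print(round(max(population, key=fitness), 2))                    # close to the optimum value of 10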
Evolutionary learning
SUPERVISED LEARNING
• There is a set of data (the training data) that consists of input data with the target data, the answer that the algorithm should produce, attached.
• This is usually written as a set of pairs (xi, ti), where the inputs are xi, the targets are ti, and the index i indicates that we have many pieces of data, indexed by i running from 1 to some upper limit N.
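For instance, with N = 3 and made-up numbers, the training set of pairs (xi, ti) could be stored as:

# N = 3 training pairs (x_i, t_i): each input x_i has its target t_i attached
training_data = [(0.10, 0.31), (0.35, 0.82), (0.60, 1.14)]
for i, (x_i, t_i) in enumerate(training_data, start=1):
    print(f"i={i}: input x={x_i}, target t={t_i}")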
Regression
• A regression problem in statistics: fit a mathematical function describing a curve, so that the curve passes as close as possible to all of the datapoints. It is generally a problem of function approximation or interpolation, working out the value between values that we know. The problem is how to work out the output for an input that was not in the training data.
Regression
• Suppose that we have to tell the value of the
output (which we will call y since it is not a
target datapoint) when x = 0.44.
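A sketch of this in Python with numpy (the datapoints below are made up; the textbook's actual data is not reproduced here): fit a curve to the (x, t) pairs and read off the interpolated output y at x = 0.44:

import numpy as np

# Made-up training datapoints (x_i, t_i)
x = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
t = np.array([0.05, 0.92, 0.61, -0.55, -0.97, 0.02])

# Fit a cubic polynomial so the curve passes as close as possible to the datapoints
coeffs = np.polyfit(x, t, deg=3)

# Interpolation: work out the value of the output between the values we know
y = np.polyval(coeffs, 0.44)
print(round(float(y), 3))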
Regression
Example: Coin Classifier
MACHINE LEARNING STEPS
Semi-Supervised Learning
• The area of semi-supervised learning attempts to deal with the need for large amounts of labelled data; it is a combination of both supervised and unsupervised learning.
• It is used when the available data contains small amounts of labelled data and large quantities of unlabelled data.
• The procedure starts by clustering the similar data; then, depending on the labelled data available in each cluster, it provides labels for the unlabelled data (a small sketch of this follows below).
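A minimal sketch of that procedure in Python (made-up 1-D data with only two labelled points; the nearest-centre clustering is a simplification): cluster all the inputs, then spread each cluster's known label onto its unlabelled members:

# Mostly unlabelled 1-D inputs; only two of them carry labels (made-up data)
points = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
labels = {0: "cat", 3: "dog"}          # index -> known label for the few labelled points

# Step 1: cluster similar data around two centres (here simply the labelled points)
centres = [points[0], points[3]]
assign = [0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1 for p in points]

# Step 2: each cluster provides its label to the unlabelled data it contains
cluster_label = {assign[i]: lab for i, lab in labels.items()}
print([cluster_label[c] for c in assign])   # labels propagated to all points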
THE MACHINE LEARNING PROCESS
• Data Collection and Preparation
• Feature Selection
• Algorithm Choice
• Parameter and Model Selection
• Training
• Evaluation
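As a compact illustration (a sketch using scikit-learn and its built-in iris dataset; the dataset, the chosen features, and the decision-tree model are assumptions made only for this example), each step above maps to roughly one line of code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Data collection and preparation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Feature selection: keep only the two petal measurements (columns 2 and 3)
X_train, X_test = X_train[:, 2:], X_test[:, 2:]

# Algorithm choice, parameter and model selection
model = DecisionTreeClassifier(max_depth=3)

# Training
model.fit(X_train, y_train)

# Evaluation
print(accuracy_score(y_test, model.predict(X_test)))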
TERMINOLOGY
• Inputs: An input vector is the data given as one input to
the algorithm. Written as x, with elements xi , where i
runs from 1 to the number of input dimensions, m.
Performance Measures: Confusion Matrix
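The confusion matrix itself appears as a figure on this slide; as a small made-up two-class example in Python, rows index the actual class and columns the predicted class:

# Made-up actual targets and model predictions for a two-class problem
actual    = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
predicted = ["pos", "neg", "pos", "neg", "neg", "pos", "neg", "pos"]

classes = ["pos", "neg"]
confusion = [[0, 0], [0, 0]]          # confusion[i][j]: actual class i predicted as class j
for a, p in zip(actual, predicted):
    confusion[classes.index(a)][classes.index(p)] += 1

print(confusion)                      # [[TP, FN], [FP, TN]] with "pos" as the positive class
print((confusion[0][0] + confusion[1][1]) / len(actual))   # accuracy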
Bias - Variance
• Bias and variance are used in supervised machine learning, in which an algorithm learns from training data or a sample data set of known quantities.
• The correct balance of bias and variance is vital for building machine-learning models that produce accurate results.
• Bias is the amount by which a model's prediction differs from the target value on the training data.
• Variance describes how much a random variable differs from its expected value.
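The tradeoff is usually made precise by the standard decomposition of the expected squared error of a learned predictor \hat{f}; this is a well-known result quoted here for reference, not something derived on the slide:

$$
\mathbb{E}\big[(t - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - \mathbb{E}[t \mid x]\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
+ \sigma^2,
$$

where \sigma^2 is the irreducible noise in the data. Simple models tend to have high bias and low variance; very flexible models tend to have low bias and high variance.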
Underfitting/Overfitting
• Any ML model is said to be underfitting if it cannot capture the underlying trend of the data, i.e., the model performs poorly even on the training data.
• Reasons for underfitting:
• High bias and low variance.
• The size of the training dataset used is not enough.
• The model is too simple.
• Training data is not cleaned and contains noise.
Overfitting
• If we train for too long, we will overfit the data: when a model is trained on a large collection of data for too long, it starts learning from the noise and the inaccurate data entries in the data set.
• Testing with test data then results in high variance. The model that we learn will be much too complicated, and won't be able to generalise.
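An illustrative sketch in Python (numpy, with made-up noisy data; not an example from the slides): a degree-0 polynomial is too simple and underfits, while a degree-9 polynomial fitted to only ten points chases the noise and overfits, which typically shows up as a very low training error but a high test error:

import numpy as np

rng = np.random.default_rng(0)

def trend(x):
    # The underlying trend of the data (made up for the example)
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 10)
x_test = np.linspace(0.05, 0.95, 10)
t_train = trend(x_train) + rng.normal(0, 0.2, 10)    # noisy training targets
t_test = trend(x_test) + rng.normal(0, 0.2, 10)

for degree in (0, 3, 9):                             # too simple / reasonable / too complex
    coeffs = np.polyfit(x_train, t_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - t_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - t_test) ** 2)
    print(degree, round(float(train_err), 3), round(float(test_err), 3))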
Underfitting/Overfitting
[Figure: an overfitted model compared with an underfitted model]
THE BIAS-VARIANCE TRADEOFF
[Figure: the good balance between underfitting and overfitting]