
FUNDAMENTALS OF MACHINE LEARNING (ML)
HOW TO BUILD AND EVALUATE A MACHINE LEARNING MODEL: COMMON STEPS

Dr. Ahmed Elngar
Faculty of Computers and Artificial Intelligence,
Beni-Suef University, Egypt
elngar_7@yahoo.co.uk
OUTLINES OF THE COURSE
1. Introduction to Machine Learning
2. Concepts of Learning and its Process
3. Types of Learning and Machine Learning Methods
4. Model Building
5. Evaluation
6. Applications & Current Trends in Machine Learning
REFERENCES
 TEXT BOOKS:
1) Ethem Alpaydin, “Introduction to Machine Learning”, 3rd Edition, MIT Press, 2014.
2) Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, “Foundations of Machine Learning”, MIT Press, 2012.
3) Stephen Marsland, “Machine Learning: An Algorithmic Perspective”, 2nd Edition, CRC Press, 2015.
4) Tom Mitchell, “Machine Learning”, McGraw Hill, 1997.

 REFERENCE BOOKS:
1) Charu C. Aggarwal, “Data Classification: Algorithms and Applications”, CRC Press, 2014.
2) Charu C. Aggarwal, “Data Clustering: Algorithms and Applications”, CRC Press, 2014.
3) Kevin P. Murphy, “Machine Learning: A Probabilistic Perspective”, The MIT Press, 2012.
4) Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, 3rd Edition, Morgan Kaufmann, 2012.
OBJECTIVES OF LEARNING ML?
 The main objectives of learning machine learning are to know and understand:
 What are the different types of machine learning?
 What are the different algorithms available for developing machine learning models?
 What tools are available for developing these models?
 What are the programming language choices?
 What platforms support development and deployment of Machine Learning applications?
 What IDEs (Integrated Development Environments) are available?
 How to quickly upgrade your skills in this important area?
1. INTRODUCTION TO MACHINE LEARNING
AI VS. MACHINE LEARNING – “LEARN”
 Machine learning (ML) is a subset of artificial intelligence (AI) that is all about getting an AI to accomplish tasks without being given specific instructions. In essence, it’s about teaching machines how to learn!
AI VS. MACHINE LEARNING – “LEARNING”
 AI is simulated human cognition, and that simulation is supposed to happen via learning!
 What is learning?
 Were we born with PhD-level intelligence?
 Of course not!
At the beginning of our lives, we have little understanding of the world around us, but over time we grow to learn a lot. We use our senses to take in data, and learn via a combination of interacting with the world around us, being explicitly taught certain things by others, finding patterns over time, and, of course, lots of trial and error.
 “Learning is any process by which a system improves performance from experience.”
– Herbert Simon
AI VS. MACHINE LEARNING – “LEARNING”
 AI learns in a similar way. When it’s first created, an
AI knows nothing; ML gives AI the ability to learn
about its world.
 AI is all about allowing a system to learn from
examples rather than instructions. ML is what
makes that possible.
AI VS. MACHINE LEARNING – “LEARNING”
 AIs are taught, not explicitly programmed. In other words,
instead of spelling out specific rules to solve a problem, we
give them examples of what they will encounter in the real
world and let them find the patterns themselves. Allowing
machines to find patterns is beneficial over spelling out the
instructions when the instructions are hard or unknown or
when the data has many different variables, for example,
treating cancer or predicting the stock market.
WHAT IS MACHINE LEARNING?
 Arthur Samuel (at IBM) first coined the term “Machine Learning” in 1959.
 He defined machine learning as:
“the field of study that gives computers the ability to learn without being explicitly programmed.”
 There is no universally accepted definition of ML.
 Different authors define the term differently.
DEFINITION OF ML
 Machine learning is a branch of artificial intelligence (AI) and
computer science which focuses on the use of data and algorithms to
imitate the way that humans learn, gradually improving its accuracy.
 Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
 Machine learning (ML) is defined as a discipline of artificial intelligence
(AI) that provides machines the ability to automatically learn from data
and past experiences to identify patterns and make predictions with
minimal human intervention.
 ML creates a model defined up to some parameters, and learning is
the execution of a computer program to optimize the parameters of the
model using the training data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain
knowledge from data, or both.
 A model:
 is a compressed version of a database;
 extracts knowledge from it;
 does not have perfect performance but is a useful approximation to the data.
DEFINITION OF ML?
 Definition by Tom Mitchell (1997):
Machine Learning is the study of algorithms that
 improve their performance P
 at some task T
 with experience E.
A well-defined learning task is given by <P, T, E>.
 A computer program that learns from experience is called a machine learning program or simply a learning program. Such a program is sometimes also referred to as a learner.
WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
1. “Big data” - models are based on huge amounts of data which is
being produced and stored continuously
 science: genomics, astronomy, materials science, particle accelerators. . .
 sensor networks: weather measurements, traffic. . .
 people: social networks, blogs, mobile phones, purchases, bank
transactions. . . etc
2. Data is not random; it contains structure that can be used to
predict outcomes, or gain knowledge in some way.
 Ex: patterns of Amazon purchases can be used to recommend items.
3. It is more difficult to design algorithms for such tasks
(compared to, say, sorting an array or calculating a payroll). Such
algorithms need data.
 Ex: construct a spam filter, using a collection of email messages labelled as
spam/not spam.
4. Learning isn’t always useful:
 There is no need to “learn” to calculate payroll
5. Data mining – extracting useful knowledge/insights from data
 Ex: Data mining is designed to extract rules from large databases via the application of ML methods.
WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
 Example #1:
 A classic example of a task that requires machine learning: it is very hard to say explicitly what makes a handwritten digit a “2”.
WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
 Example #2: House price prediction

 After plotting various data points on the XY plot (size on the X-axis, price on the Y-axis), we draw a best-fit line to make predictions for any other house given its size. You feed the known data to the machine and ask it to find the best-fit line. Once the best-fit line is found by the machine, you test its suitability by feeding in a known house size, i.e. an X-value on the above curve. The machine will then return the estimated Y-value, i.e. the expected price of the house (a minimal code sketch follows).
WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?

 Example #3:
Some more examples of tasks that are best solved by
using a learning algorithm
 Recognizing patterns:
 Facial identities or facial expressions
 Handwritten or spoken words
 Medical images
 Generating patterns:
 Generating images or motion sequences
 Recognizing anomalies:
 Unusual credit card transactions
 Unusual patterns of sensor readings in a nuclear power plant
 Prediction:
 Future stock prices or currency exchange rates
WHAT IS MACHINE LEARNING?- THE MACHINE LEARNING PROCESS

WHAT IS MACHINE LEARNING?- BASIC MACHINE LEARNING STEPS

WHAT IS MACHINE LEARNING?- TRADITIONAL PROGRAMMING VS. MACHINE LEARNING
WHAT IS MACHINE LEARNING?- WHEN DO WE USE MACHINE LEARNING?
 ML is used when:
 Human expertise does not exist (navigating on Mars)
 Humans can’t explain their expertise (speech recognition)
 Models must be customized (personalized medicine)
 Solution needs to be adapted to particular cases (user biometrics)
 Models are based on huge amounts of data (genomics)
 Solution changes in time (routing on a computer network)
STATE OF THE ART APPLICATIONS OF MACHINE LEARNING
(figure slides)
 Autonomous cars
 Autonomous car sensors
 Autonomous car technologies
 Deep learning emergence
 Deep Belief Net on Face Images
 Learning of Object Parts
 Training on Multiple Objects
 Automatic Speech recognition systems
 Speech technologies
HISTORY OF ML
 1950s:
– Samuel’s checker player
– Selfridge’s Pandemonium
 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
HISTORY OF ML…
 1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
 1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
HISTORY OF ML…
 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics,
Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
– ???
APPLICATION OF MACHINE LEARNING
 The following is a list of some of the typical applications of machine learning.
1. In retail business, machine learning is used to study consumer behaviour.
2. In finance, banks analyze their past data to build models to use in credit
applications, fraud detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and
maximizing the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be
analyzed fast enough by computers. The World Wide Web is huge; it is constantly growing, and searching for relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to
changes so that the system designer need not foresee and provide solutions for all
possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and
robotics.
9. Machine learning methods are applied in the design of computer-controlled vehicles to steer correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programs for playing games such as chess, backgammon and Go.
CHAPTER SUMMARY
 Learning can be viewed as using direct or indirect experience to approximate a chosen target function.
 Learning builds general models from data of particular examples.
 Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
 Example in retail: customer transactions to consumer behavior:
People who bought “Da Vinci Code” also bought “The Five People You Meet in Heaven” (www.amazon.com)
 Machine Learning builds a model that is a good and useful approximation to the data.
2. LEARNING CONCEPTS AND PROCESSES
LEARNING
 Definition
 A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
LEARNING …
 Examples: defining a learning task
I. Handwriting recognition learning problem
 T: Recognizing and classifying handwritten words within images
 P: Percent of words correctly classified
 E: A dataset of handwritten words with given classifications
II. A robot driving learning problem
 T: Driving on highways using vision sensors
 P: Average distance traveled before an error
 E: A sequence of images and steering commands recorded
while observing a human driver
III. A chess learning problem
 T: Playing chess
 P: Percent of games won against opponents
 E: Playing practice games against itself.
IV. Spam filtering
 T: Categorize email messages as spam or legitimate.
 P: Percentage of email messages correctly classified.
 E: Database of emails, some with human-given labels
COMPONENTS OF LEARNING
 Basic components of the learning process
 The learning process, whether by a human or a machine, can be divided into four components, namely: data storage, abstraction, generalization and evaluation.
Fig. Components of the learning process
COMPONENTS OF LEARNING PROCESS
 Data storage (1)
 Facilities for storing and retrieving huge amounts of data are an important component of the learning process. Humans and computers alike utilize data storage as a foundation for advanced reasoning.
 In a human being, the data is stored in the brain and retrieved using electrochemical signals.
 Computers use hard disk drives, flash memory, random access memory and similar devices to store data, and use cables and other technology to retrieve it.
COMPONENTS OF LEARNING PROCESS …
 Abstraction (2)
 The second component of the learning process is known as abstraction.
 Abstraction is the process of extracting knowledge about stored data. This involves creating general concepts about the data as a whole. The creation of knowledge involves application of known models and creation of new models.
 The process of fitting a model to a dataset is known as training. When the model has been trained, the data is transformed into an abstract form that summarizes the original information.
COMPONENTS OF LEARNING PROCESS …
 Generalization (3)
 The third component of the learning process is known as generalization.
 The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action.
 These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before.
 In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.
COMPONENTS OF LEARNING PROCESS …
 Evaluation (4)
 Evaluation is the last component of the learning process. It is the process of giving feedback to the user to measure the utility of the learned knowledge.
 This feedback is then utilized to effect improvements in the whole learning process.
LEARNING MODELS
 Machine learning is concerned with using the right features to build the right models that achieve the right tasks.
 For a given problem, the collection of all possible outcomes represents the sample space or instance space.
 The basic ideas behind learning models fall into three categories:
 Using a logical expression (Logical models)
 Using the geometry of the instance space (Geometric models)
 Using probability to classify the instance space (Probabilistic models)
 Grouping and grading (an orthogonal categorization to geometric-probabilistic-logical-compositional)
LEARNING MODELS : LOGICAL MODELS
 Logical models use a logical expression to divide the instance space into segments and hence construct grouping models.
 A logical expression is an expression that returns a Boolean value, i.e., a True or False outcome.
 Once the data is grouped using a logical expression, the data is divided into homogeneous groupings for the problem we are trying to solve.
 For example, for a classification problem, all the instances in the group belong to one class.
LEARNING MODELS : LOGICAL MODELS …
 There are mainly two kinds of logical models: tree models and rule models.
 Rule models consist of a collection of implications or IF-THEN rules.
 For tree-based models, the ‘if-part’ defines a segment and the ‘then-part’ defines the behaviour of the model for this segment. Rule models follow the same reasoning.
 In logical models, such as decision trees, a logical expression is used to partition the instance space. Two instances are similar when they end up in the same logical segment.
LEARNING MODELS : LOGICAL MODELS …
 Example:
 “Enjoy Sport” is defined by a set of data from some example days. Each example is described by six attributes. The task is to learn to predict the value of Enjoy Sport for an arbitrary day based on the values of its attributes. The problem can be represented by a series of hypotheses. Each hypothesis is described by a conjunction of constraints on the attributes. The training data represents a set of positive and negative examples of the target function. In this example, each hypothesis is a vector of six constraints, specifying the values of the six attributes – Sky, AirTemp, Humidity, Wind, Water, and Forecast. The training phase involves learning the set of days (as a conjunction of attributes) for which Enjoy Sport = yes.
 Thus, the problem can be formulated as:
 Given instances X which represent the set of all possible days, each described by the attributes:
 o Sky – (values: Sunny, Cloudy, Rainy)
 o AirTemp – (values: Warm, Cold)
 o Humidity – (values: Normal, High)
 o Wind – (values: Strong, Weak)
 o Water – (values: Warm, Cold)
 o Forecast – (values: Same, Change)
 Q. Try to identify a function that can predict the target variable Enjoy Sport as yes/no, i.e., 1 or 0.
LEARNING MODELS : GEOMETRIC MODELS …
 In geometric models, features can be described as points in two dimensions (x- and y-axis) or in a three-dimensional space (x, y, and z).
 For example, temperature as a function of time can be modelled in two axes.
 In geometric models, there are two ways we could impose similarity:
 We could use geometric concepts like lines or planes to segment (classify) the instance space. These are called Linear models.
 Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar values for features and thus can be classed as similar. We call such models Distance-based models.
LEARNING MODELS : GEOMETRIC MODELS
 Linear models
 Linear models are relatively simple. In this case, the function is represented as a linear combination of its inputs.
 In the simplest case, where f(x) represents a straight line, we have an equation of the form f(x) = mx + c, where c represents the intercept and m represents the slope.
 Linear models are parametric, which means that they have a fixed form with a small number of numeric parameters that need to be learned from data. For example, in f(x) = mx + c, m and c are the parameters that we are trying to learn from the data. This technique is different from tree or rule models, where the structure of the model (e.g., which features to use in the tree, and where) is not fixed in advance.
LEARNING MODELS : GEOMETRIC MODELS
 Distance-based models
 As the name implies, distance-based models work on the concept of distance. In the context of machine learning, the concept of distance is not based merely on the physical distance between two points.
 The distance metrics commonly used are Euclidean and Manhattan distance.
LEARNING MODELS : GEOMETRIC MODELS
 Distance-based models
 Distance is applied through the concepts of neighbors and exemplars.
 Neighbors are points in proximity with respect to the distance measure expressed through exemplars.
 Exemplars are either centroids, which find a center of mass according to a chosen distance metric, or medoids, which find the most centrally located data point.
 The most commonly used centroid is the arithmetic mean, which minimizes squared Euclidean distance to all other points.
 Notes:
 The centroid represents the geometric center of a plane figure, i.e., the arithmetic mean position of all the points in the figure. This definition extends to any object in n-dimensional space: its centroid is the mean position of all the points.
 Medoids are similar in concept to means or centroids. Medoids are most commonly used on data when a mean or centroid cannot be defined. They are used in contexts where the centroid is not representative of the dataset, such as in image data.
 Examples of distance-based models include the nearest-neighbour models, which use the training data as exemplars – for example, in classification. The K-means clustering algorithm also uses exemplars to create clusters of similar data points (see the sketch below).
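A minimal Python sketch of distance-based classification, using the two metrics named above and a 1-nearest-neighbour rule; the exemplar points and labels are made up for illustration:

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

# Training exemplars: (point, label) -- illustrative values only
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B")]

def predict_1nn(query, metric=euclidean):
    # Return the label of the single closest exemplar
    return min(train, key=lambda ex: metric(query, ex[0]))[1]

print(predict_1nn((1.1, 1.0)))                    # -> "A"
print(predict_1nn((4.5, 5.2), metric=manhattan))  # -> "B"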
LEARNING MODELS : PROBABILISTIC MODELS
 Probabilistic models use the idea of probability to classify new entities.
 Probabilistic models see features and target variables as random variables. The process of modelling represents and manipulates the level of uncertainty with respect to these variables.
 There are two types of probabilistic models: predictive and generative.
 Predictive probability models use the idea of a conditional probability distribution P(Y|X) from which Y can be predicted from X.
 Generative models estimate the joint distribution P(Y, X). Once we know the joint distribution, we can derive any conditional or marginal distribution involving the same variables. Thus, a generative model is capable of creating new data points and their labels, knowing the joint probability distribution. The joint distribution looks for a relationship between two variables. Once this relationship is inferred, it is possible to infer new data points.
LEARNING MODELS : PROBABILISTIC MODELS
 Naïve Bayes
 Naïve Bayes is an example of a probabilistic classifier. We can do this using the Bayes rule, defined as:
P(Y|X) = P(X|Y) P(Y) / P(X)
 The Naïve Bayes algorithm is based on the idea of conditional probability. Conditional probability is based on finding the probability that something will happen, given that something else has already happened. The task of the algorithm then is to look at the evidence, determine the likelihood of a specific class, and assign a label accordingly to each entity (a small sketch follows below).
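A minimal Python sketch of a Naïve Bayes classification step under the usual conditional-independence assumption; the tiny weather-style dataset is made up for illustration, and Laplace smoothing is omitted:

from collections import Counter, defaultdict

# (Outlook, Windy) -> Play?  Made-up examples.
data = [
    (("Sunny", "No"),  "Yes"),
    (("Sunny", "Yes"), "No"),
    (("Rainy", "Yes"), "No"),
    (("Sunny", "No"),  "Yes"),
]

class_counts = Counter(y for _, y in data)
value_counts = defaultdict(Counter)   # (feature index, class) -> value counts
for x, y in data:
    for i, v in enumerate(x):
        value_counts[(i, y)][v] += 1

def score(x, label):
    # P(label) * product of P(x_i | label); the common factor P(X) is omitted.
    # No smoothing here, so unseen feature values give a score of zero.
    p = class_counts[label] / len(data)
    for i, v in enumerate(x):
        p *= value_counts[(i, label)][v] / class_counts[label]
    return p

query = ("Sunny", "No")
print(max(class_counts, key=lambda y: score(query, y)))   # -> "Yes"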
SUMMARY OF LEARNING MODELS
 Logical models use a logical expression to partition the instance space.
 Geometric models (such as distance-based models) use the idea of distance (e.g., Euclidean distance) to classify entities.
 Probabilistic models use the idea of probability to classify new entities.

Learning models:
 Geometric models: K-nearest neighbors, linear regression, support vector machine, logistic regression, …
 Probabilistic models: Naïve Bayes, Gaussian process regression, conditional random field, …
 Logical models: Decision tree, random forest, …
DESIGNING A LEARNING SYSTEM
 For any learning system, we must know the three elements — T (Task), P (Performance Measure), and E (Training Experience).
 At a high level, the learning process looks as below.
DESIGNING A LEARNING SYSTEM
 The learning process starts with task T, performance measure P and training experience E, and the objective is to find an unknown target function.
 The target function is the exact knowledge to be learned from the training experience, and it is unknown.
 For example, in the case of credit approval, the learning system will have customer application records as experience, and the task would be to classify whether a given customer application is eligible for a loan.
 So in this case, the training examples can be represented as (x1, y1), (x2, y2), ..., (xn, yn), where X represents customer application details and y represents the status of credit approval.
 With these details, what is the exact knowledge to be learned from the training experience?
 The target function to be learned in the credit approval learning system is a mapping function f: X → y. This function represents the exact knowledge defining the relationship between input variable X and output variable y.
DESIGNING A LEARNING SYSTEM
 We have just looked into the learning process and understood the goal of learning. When we want to design a learning system that follows the learning process, we need to consider a few design choices. The design choices will be to decide the following key components:
1. Choose the training experience
2. Choose exactly what is to be learned (the target function)
3. Choose how to represent the target function
4. Choose a learning algorithm to infer the target function from the experience
5. The final design
DESIGNING A LEARNING SYSTEM
 Example:
 We will look into the checkers learning problem and apply the above design choices.
 For a checkers learning problem, the three elements will be:
1. Task T: playing checkers
2. Performance measure P: percent of games won in the tournament
3. Training experience E: a set of games played against itself
3. TYPES OF MACHINE LEARNING METHODS
SUPERVISED LEARNING: OVERVIEW
 Labels are provided.
 SL is also called learning from exemplars.
 Supervised learning is a type of machine learning that uses labeled data to train machine learning models. In labeled data, the output is already known. The model just needs to map the inputs to the respective outputs.
 A supervised machine learning algorithm works by using and analyzing the labeled training data and produces/builds a function/model, which can be used for mapping new examples (the class labels for unseen instances) to their target outputs.
 SL has this form:
Given (x1, y1), (x2, y2), ..., (xn, yn)
The algorithm learns a function f(x) to predict y given x.
SUPERVISED LEARNING: OVERVIEW
 Example #1:
 Suppose the data consists of the gender and age of the patients, and each patient is labeled as “healthy” or “sick”.
 Q. What will be the role of the supervised machine learning algorithm in the above example?
 A. The purpose of a supervised machine learning algorithm here is to learn/train on the above data and build a function/model that identifies any new/unseen patient as “sick” or “healthy” based on his age and gender parameters.
SUPERVISED LEARNING: OVERVIEW
 Example #2:
 An example of supervised learning is to train a system that identifies the image of an animal.
SUPERVISED LEARNING: WHY “SUPERVISED LEARNING”?
 Supervised Learning methods need external supervision to train machine learning models. They need guidance and additional information to return the desired result.
 It can be thought of as a teacher supervising the learning process. We know the correct answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
SUPERVISED LEARNING: TYPES OF SL PROBLEMS
 Classification and regression problems are the most common types of supervised learning problems.
SUPERVISED LEARNING: CLASSIFICATION
 Classification: the labels to be predicted are categorical.
 Works by pattern recognition.
 Face recognition
 Optical character recognition: different styles, slant, …
 Credit scoring: classify customers into high- and low-risk, based on their income and savings, using data about past loans (whether they were paid or not).

Model: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk (rendered as code below)
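The rule model above, written as Python; the thresholds θ1 and θ2 would be learned from the past-loan data, and the numbers here are placeholders for illustration:

theta1 = 30_000   # income threshold (assumed units: $/year)
theta2 = 5_000    # savings threshold

def credit_risk(income: float, savings: float) -> str:
    # IF income > theta1 AND savings > theta2 THEN low-risk ELSE high-risk
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"

print(credit_risk(45_000, 8_000))   # -> low-risk
print(credit_risk(45_000, 2_000))   # -> high-risk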
SUPERVISED LEARNING: CLASSIFICATION …
 Given (x1, y1), (x2, y2), ..., (xn, yn)
Learn a function f(x) to predict y given x
– y is categorical == classification
SUPERVISED LEARNING: REGRESSION …
 Regression: the labels to be predicted are continuous.
Given (x1, y1), (x2, y2), ..., (xn, yn)
Learn a function f(x) to predict y given x
– y is real-valued == regression
SUPERVISED LEARNING: REGRESSION …
 Example:
 Predict the price of a car from its mileage (a continuous output).
SUPERVISED LEARNING: ALGORITHMS
 A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.
 Some of the most popularly used supervised learning algorithms are:
 Linear Regression
 Logistic Regression
 Support Vector Machine
 K Nearest Neighbor
 Decision Tree
 Random Forest
 Naive Bayes
SUPERVISED LEARNING: APPLICATIONS
 Supervised learning algorithms are generally used for solving classification and regression problems.
 A few of the top supervised learning applications are weather prediction, sales forecasting, and stock price analysis.
UNSUPERVISED LEARNING
 Unsupervised learning is a type of machine learning that uses unlabeled data to train machines; it works by finding patterns and understanding the trends in the data to discover the output. So, the model tries to label the data based on the features of the input data.
 In unsupervised learning algorithms, a classification or categorization (labels/classes) is not included in the observations. Instead, the algorithm tries to identify similarities between the inputs so that inputs that have something in common are categorized together.
 The training process used in “unsupervised learning” techniques does not need any supervision to build models. They learn on their own and predict the output.
UNSUPERVISED LEARNING …
 Example #1
 Here, we have taken unlabeled input data, which means it is not categorized and corresponding outputs are also not given. Now, this unlabeled input data is fed to the machine learning model in order to train it. Firstly, it will interpret the raw data to find the hidden patterns, and then a suitable algorithm such as k-means clustering is applied. Once the suitable algorithm is applied, it divides the data objects into groups according to the similarities and differences between the objects.
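A minimal sketch of this grouping step, assuming scikit-learn is available and using made-up 2-D points:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural group
              [8.0, 8.2], [7.9, 8.1], [8.3, 7.7]])  # another group

# k-means partitions the unlabeled points into k groups by distance to centroids
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(X)   # cluster index for each point
print(groups)                    # e.g. [0 0 0 1 1 1]
print(kmeans.cluster_centers_)   # the learned centroids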
UNSUPERVISED LEARNING …
 Example #2:
 Depicted below is an example of an unsupervised learning technique that uses images of vehicles to classify whether each is a bus or a truck. The model learns by identifying the parts of a vehicle, such as the length and width of the vehicle, the front and rear end covers, roof hoods, the types of wheels used, etc. Based on these features, the model classifies whether the vehicle is a bus or a truck.
UNSUPERVISED LEARNING …
 Example #3:
 Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients.
 Q. Based on this data, can we infer anything regarding the patients entering the clinic?

UNSUPERVISED LEARNING
 No labels provided, only input data.
UNSUPERVISED LEARNING: APPLICATIONS
 Unsupervised learning is used for solving clustering and association problems.
 Learning associations:
 Basket analysis: let p(Y|X) = “probability that a customer who buys product X also buys product Y”, estimated from past purchases. If p(Y|X) is large (say 0.7), associate “X → Y”. When someone buys X, recommend them Y.
 Clustering: group similar data points/instances.
 Density estimation: where are data points likely to lie?
 Dimensionality reduction: data lies in a low-dimensional manifold.
 Feature selection: keep only useful features.
 Outlier/novelty detection
 Customer segmentation: based on customer behavior, likes, dislikes, and interests, you can segment and cluster similar customers into a group.
 Image compression: color quantization
UNSUPERVISED LEARNING: APPLICATIONS
 Genomics application: group individuals by genetic similarity
UNSUPERVISED LEARNING: ALGORITHMS
 Selecting the right algorithm depends on the type of problem you are trying to solve. Some of the common examples of unsupervised learning algorithms are:
 K Means Clustering
 Hierarchical Clustering
 DBSCAN
 Principal Component Analysis (PCA)
SEMI-SUPERVISED LEARNING
 Labels are provided for some points only.
 It is a branch of machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training.
 Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data).
SEMI-SUPERVISED LEARNING: HOW SEMI-SUPERVISED LEARNING WORKS
 Semi-supervised machine learning is a combination of supervised and unsupervised learning. It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both unsupervised and supervised learning while avoiding the challenge of finding a large amount of labeled data. That means you can train a model to label data without having to use as much labeled training data.
 Here’s how it works (a code sketch follows the list):
1. Train the model with the small amount of labeled training data, just like you would in supervised learning, until it gives you good results.
2. Then use it with the unlabeled training dataset to predict the outputs, which are pseudo labels since they may not be quite accurate.
3. Link the labels from the labeled training data with the pseudo labels created in the previous step.
4. Link the data inputs in the labeled training data with the inputs in the unlabeled data.
5. Then, train the model the same way as you did with the labeled set in the beginning, in order to decrease the error and improve the model’s accuracy.
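A minimal sketch of this pseudo-labeling loop, assuming scikit-learn and a synthetic dataset; the step numbers in the comments match the list above:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)
X_lab, y_lab = X[:20], y[:20]      # small labeled set
X_unlab = X[20:]                   # large unlabeled set (labels hidden)

# 1. Train on the small labeled set
model = LogisticRegression().fit(X_lab, y_lab)

# 2. Predict pseudo labels for the unlabeled data
pseudo = model.predict(X_unlab)

# 3-4. Combine the labeled data with the pseudo-labeled data
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])

# 5. Retrain on the combined set
model = LogisticRegression().fit(X_all, y_all)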
SEMI-SUPERVISED LEARNING: APPLICATIONS
 Text document classification: this is the type of situation where semi-supervised learning is ideal, because it would be nearly impossible to find a large amount of labeled text documents.
 Classification of content on the Internet: the internet is a vast trove of web pages, and it cannot be expected that every page will be labeled and have all the data for the field that you desire. However, at the same time, it is true that over the years, some minority of web pages will have been labeled for one dimension or the other.
SEMI-SUPERVISED LEARNING: ASSUMPTIONS
 Semi-supervised methods must make some assumption about the data in order to justify using a small set of labeled data to make conclusions about the unlabeled data points. These can be grouped into three categories:
1. The first is the continuity assumption. This assumes that data points that are “close” to each other are more likely to have a common label.
2. The second is the cluster assumption. This assumes that the data naturally forms discrete clusters, and that points in the same cluster are more likely to share a label.
3. The third is the manifold assumption. This assumes that the data roughly lies in a lower-dimensional space (or manifold) than the input space. This scenario is relevant when an unobservable or difficult-to-observe system with a small number of parameters produces high-dimensional observable output.
REINFORCEMENT LEARNING
 This is somewhere between supervised and unsupervised learning. The algorithm gets told when the answer is wrong, but does not get told how to correct it. It has to explore and try out different possibilities until it works out how to get the answer right.
 Reinforcement learning is sometimes called learning with a critic because of this monitor that scores the answer, but does not suggest improvements.
 No supervised output, but delayed reward.
 Given a sequence of states and actions with rewards, find a sequence of actions (a policy) that reaches a goal (output a policy).
 A policy is a mapping from states → actions that tells you what to do in a given state.
 Policies: what actions should an agent take in a particular situation.
 Utility estimation: how good is a state (used by the policy). A minimal code sketch follows below.
REINFORCEMENT LEARNING: HOW IT WORKS
 Reinforcement learning follows trial-and-error methods to get the desired result. After accomplishing a task, the agent receives a reward. An example could be training a dog to catch a ball: if the dog learns to catch the ball, you give it a reward, such as a biscuit.
 Reinforcement Learning methods do not need any external supervision to train models.
 Reinforcement learning problems are reward-based. For every task or for every step completed, there will be a reward received by the agent. If the task is not achieved correctly, there will be some penalty added.
REINFORCEMENT LEARNING: THE AGENT-ENVIRONMENT INTERFACE
 Reinforcement Learning trains a machine to take suitable actions and maximize its rewards in a particular situation. It uses an agent and an environment to produce actions and rewards. The agent has a start and an end state. But there might be different paths for reaching the end state, like in a maze. In this learning technique, there is no predefined target variable.
REINFORCEMENT LEARNING: EXAMPLE
 Example #1:
 An example of reinforcement learning is to train a machine to identify the shape of an object, given a list of different objects. In the example shown, the model tries to predict the shape of the object, which is a square in this case.
 Example #2:
 Consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish it if it does the right/wrong thing. It has to find out what it did that made it get the reward/punishment. We can use a similar method to train computers to do many tasks, such as playing backgammon or chess.
REINFORCEMENT LEARNING: APPLICATIONS
 Reinforcement learning algorithms are widely used in the gaming industry to build games. They are also used to train robots to do human tasks.
 Playing chess or a computer game
 Credit assignment problem
 Game playing
 Robot in a maze
TYPES OF LEARNING: SUMMARY
 Supervised (inductive) learning
– Given: training data + desired outputs (labels)
 Unsupervised learning
– Given: training data (without desired outputs)
 Semi-supervised learning
– Given: training data + a few desired outputs
 Reinforcement learning
– Rewards from a sequence of actions
4. MODEL BUILDING
MACHINE LEARNING MODELS
 Machine learning models are computer programs that are used to recognize patterns in data or make predictions.
 Machine learning models are created from machine learning algorithms, which are trained using either labeled, unlabeled, or mixed data.
 Different machine learning algorithms are selected because they suit different goals, such as classification, regression, clustering, etc.
HOW TO BUILD A MACHINE LEARNING MODEL: COMMON STEPS
 Machine learning models are created by training algorithms with either labeled or unlabeled data, or a mix of both, using different machine learning methods.
 Building a machine learning model project commonly involves the following 10 steps:
 Step 1: Understand the business problem (and define success)
 Step 2: Understand and identify data
 Step 3: Collect data
 Step 4: Prepare data
 Step 5: Choose an algorithm
 Step 6: Train the model
 Step 7: Evaluate the model
 Step 8: Tune parameters
 Step 9: Make predictions
 Step 10: Deploy the machine learning model
STEP 1: UNDERSTAND THE BUSINESS PROBLEM (AND DEFINE SUCCESS)
 The first phase of any machine learning project is developing an understanding
of the business requirements. You need to know what problem you're trying to
solve before attempting to solve it.
 To start, work with the owner of the project and make sure you understand the
project's objectives and requirements.
 Key questions to answer include the following:
 What's the business objective that requires a cognitive solution?
 What parts of the solution are cognitive, and what aren't?
 Have all the necessary technical, business and deployment issues been addressed?
 What are the defined "success" criteria for the project?
 How can the project be staged in iterative sprints?
 Are there any special requirements for transparency, explainability or bias reduction?
 What are the ethical considerations?
 What are the acceptable parameters for accuracy, precision and confusion matrix values?
 What are the expected inputs to the model and the expected outputs?
 What are the characteristics of the problem being solved? Is this a classification,
regression or clustering problem?
 What is the "heuristic" -- the quick-and-dirty approach to solving the problem that
doesn't require machine learning? How much better than the heuristic does the model
need to be?
 How will the benefits of the model be measured?
STEP 2: UNDERSTAND AND IDENTIFY DATA
 A machine learning model is built by learning and generalizing from training data, then
applying that acquired knowledge to new data it has never seen before to make
predictions and fulfill its purpose. Lack of data will prevent you from building the
model, and access to data isn't enough. Useful data needs to be clean and in a good
shape.
 Identify your data needs and determine whether the data is in proper shape for the
machine learning project. The focus should be on data identification, initial collection,
requirements, quality identification, insights and potentially interesting aspects that are
worth further investigation.
 Here are some key questions to consider:
 Where are the sources of the data that's needed for training the model?
 What quantity of data is needed for the machine learning project?
 What is the current quantity and quality of training data?
 How are the test set data and training set data being split?
 For supervised learning tasks, is there a way to label that data?
 Can pre-trained models be used?
 Where is the operational and training data located?
 Are there special needs for accessing real-time data on edge devices or in more difficult-to-
reach places?
 Answering these important questions helps you get a handle on the quantity and quality
of data as well as understand the type of data that's needed to make the model work.
STEP 3: COLLECTING DATA
 This step requires a reliable data source and quality data.
 It is of the utmost importance to collect reliable data so that your machine learning model can find the correct patterns. The quality of the data that you feed to the machine will determine how accurate your model is. If you have incorrect or outdated data, you will have wrong outcomes or predictions which are not relevant.
 Make sure you use data from a reliable source, as it will directly affect the outcome of your model. Good data is relevant, contains very few missing and repeated values, and has a good representation of the various subcategories/classes present.
STEP 4: PREPARING THE DATA
 After you have your data, you have to prepare it. You can do this by:
 Putting together all the data you have and randomizing it. This helps
make sure that data is evenly distributed, and the ordering does not affect
the learning process.
 Cleaning the data to remove unwanted data, missing values, rows, and
columns, duplicate values, data type conversion, etc. You might even
have to restructure the dataset and change the rows and columns or index
of rows and columns.
 Visualize the data to understand how it is structured and understand the
relationship between various variables and classes present.
 Splitting the cleaned data into two sets - a training set and a testing set.
The training set is the set your model learns from. A testing set is used to
check the accuracy of your model after training.
 Data preparation and cleansing tasks can take a substantial amount of time.
STEP 4: PREPARING THE DATA: SPECIFIC ACTIVITIES
 Procedures during the data preparation, collection and cleansing process
include the following:
 Collect data from the various sources.
 Standardize formats across different data sources.
 Replace incorrect data.
 Enhance and augment data.
 Add more dimensions with pre-calculated amounts and aggregate information
as needed.
 Enhance data with third-party data.
 "Multiply" image-based data sets if they aren't sufficient for training.
 Remove extraneous information and deduplicate.
 Remove irrelevant data from training to improve results.
 Reduce noise and remove ambiguity.
 Consider anonymizing data.
 Normalize or standardize data to get it into formatted ranges.
 Sample data from large data sets.
 Select features that identify the most important dimensions and, if necessary, reduce dimensions using a variety of techniques.
 Split data into training, test and validation sets.
STEP 5: CHOOSING AN ALGORITHM
 A machine learning model determines the output you get after running a machine learning algorithm on the collected data.
 It is important to choose an algorithm which is relevant to the task at hand.
 Over the years, scientists and engineers have developed various algorithms suited for different tasks like speech recognition, image recognition, prediction, etc.
 Apart from this, you also have to see if your algorithm is suited for numerical or categorical data, and choose accordingly.
STEP 6: TRAINING THE MODEL
 Training is the most important step in machine learning.
 In training, you pass the prepared data to your machine learning model to find patterns and make predictions. It results in the model learning from the data so that it can accomplish the task set.
 Over time, with training, the model gets better at predicting.
STEP 7: EVALUATING THE MODEL
 After training your model, you have to check how it is performing. This is done by testing the performance of the model on previously unseen data. The unseen data used is the testing set that you split your data into earlier.
 If testing were done on the same data used for training, you would not get an accurate measure, as the model is already used to the data and finds the same patterns in it as it previously did. This would give you disproportionately high accuracy.
 When used on testing data, you get an accurate measure of how your model will perform, and of its speed.
STEP 7: EVALUATING THE MODEL …
 During the model evaluation process, you should do the following:
 Evaluate the models using a validation data set.
 Determine confusion matrix values for classification problems.
 Identify methods for k-fold cross-validation if that approach is used.
 Further tune hyperparameters for optimal performance.
 Compare the machine learning model to the baseline model or heuristic.
STEP 8: PARAMETER TUNING
 Once you have created and evaluated your model, see if its accuracy can be improved in any way. This is done by tuning the parameters present in your model.
 Parameters (often called hyperparameters) are the variables in the model that the programmer generally decides.
 At a particular value of your parameter, the accuracy will be the maximum. Parameter tuning refers to finding these values, as in the sketch below.
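A minimal parameter-tuning sketch using grid search with cross-validation, assuming scikit-learn and a synthetic dataset; the candidate values for k are arbitrary:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Candidate values for the parameter being tuned (k, the number of neighbors)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}

# GridSearchCV tries each value with cross-validation and keeps the best
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # e.g. {'n_neighbors': 5}
print(search.best_score_)    # mean cross-validated accuracy at that value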
STEP 9: MAKING PREDICTIONS
 In the end, you can use your model on unseen data to make predictions accurately.
STEP 10: DEPLOY THE MACHINE LEARNING MODEL
 The last step in building a machine learning model is the deployment of the model.
 Machine learning models are generally developed and tested in a local or offline environment using training and testing datasets.
 Deployment is when the model is moved into a live environment, dealing with new and unseen data.
 This is the point at which the model starts to bring a return on investment to the organization, as it is performing the task it was trained to do with live data.
5. MODEL EVALUATION
MODEL EVALUATION: OVERVIEW
 Key question
Q. How well does the model perform on unseen data?
 While training a model is a key step, how the model generalizes on unseen data is an equally important aspect that should be considered in every machine learning pipeline.
 We need to know whether it actually works and, consequently, if we can trust its predictions.
MODEL EVALUATION: DEFINITION
 Model evaluation aims to estimate the generalization accuracy of a model on future (unseen/out-of-sample) data.
 The purpose of model evaluation is to help us know which algorithm best suits the given dataset for solving a particular problem.
 To select the “best fit” algorithm
 It evaluates the performance of different Machine Learning models, based on the same input dataset.
MODEL EVALUATION TECHNIQUES
 There are two methods that are used to evaluate model performance:
1. Holdout
2. Cross-validation
 Both methods use a test set (i.e. data not seen by the model) to evaluate model performance.
 It is not recommended to use the data we used to build the model to evaluate it. This is because our model will simply remember the whole training set, and will therefore always predict the correct label for any point in the training set. This is known as overfitting.
MODEL EVALUATION TECHNIQUES: HOLDOUT METHOD
 The holdout method is used to evaluate model performance and uses two types of data: training and testing.
 The training data is used to train the system.
 The test data is used to calculate the performance of the model after it has been trained on the training data set.
 This method is used to check how well the machine learning model, developed using different algorithm techniques, performs on unseen samples of data.
 The approach is simple, flexible and fast.
 E.g. an 80/20% train-test data split, as in the sketch below.
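A minimal holdout sketch, assuming scikit-learn and a synthetic dataset, with an 80/20 train-test split:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Hold out 20% of the data as an unseen test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))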
CROSS-VALIDATION
 k-fold cross-validation is the most common cross-validation technique, and it works in the following way:
 The original dataset is partitioned into k equal-size subsamples, called folds.
 k is a user-specified number, usually with 5 or 10 as its preferred value.
 This is repeated k times, such that each time, one of the k subsets is used as the test set/validation set and the other k-1 subsets are put together to form a training set.
 The error estimation is averaged over all k trials to get the total effectiveness of the model.
 Example:
 When performing five-fold cross-validation, the data is first partitioned into 5 parts of (approximately) equal size. A sequence of models is trained. The first model is trained using the first fold as the test set, and the remaining folds as the training set. This is repeated for each of the 5 splits of the data, and the estimation of accuracy is averaged over all 5 trials to get the total effectiveness of the model (see the sketch below).
 Cross-validation is usually the preferred method because it gives your model the opportunity to train on multiple train-test splits. This gives you a better indication of how well your model will perform on unseen data. Holdout, on the other hand, is dependent on just one train-test split.
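A minimal five-fold cross-validation sketch, assuming scikit-learn and a synthetic dataset:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Trains 5 models, each tested on a different fold, and returns 5 scores
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print("per-fold accuracy:", scores)
print("averaged estimate:", scores.mean())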
MODEL EVALUATION METRICS
 Model evaluation metrics are required to quantify model performance.
 The choice of evaluation metrics depends on the given machine learning task (such as classification, regression, ranking, clustering, topic modeling, among others).
 All tasks may not require all evaluation metrics.
 Some metrics, such as precision-recall, are useful for multiple tasks.
 Common types of evaluation metrics depend on the type of machine learning task:
 Classification model
 Clustering model
 Forecast model
 Outlier model
MODEL EVALUATION: CLASSIFICATION METRICS
 The different types of classification metrics are:
 Classification Accuracy
 Confusion Matrix
 F-Measure
 Logarithmic Loss
 Area under Curve (AUC)
MODEL EVALUATION: CLASSIFICATION METRICS
 Classification Accuracy
 Classification accuracy is similar to the term Accuracy. It is the ratio of the correct predictions to the total number of predictions made by the model on the given data, as written out below.
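The accuracy formula itself did not survive extraction; the standard definition, in LaTeX notation, is:

\mathrm{Accuracy} = \frac{\text{correct predictions}}{\text{total predictions}}
                  = \frac{TP + TN}{TP + TN + FP + FN}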
MODEL EVALUATION: CLASSIFICATION METRICS
 Confusion Matrix
 It is an N×N matrix structure used for evaluating the performance of a classification model, where N is the number of classes that are predicted.
 It is computed on a test dataset in which the true values are known.
 The matrix tells us the number of correct and incorrect predictions made by a classifier and is used to find the correctness of the model.
 It consists of values like True Positive, False Positive, True Negative, and False Negative, which help in measuring Accuracy, Precision, Recall, Specificity, Sensitivity, and the AUC curve.
MODEL EVALUATION: CLASSIFICATION METRICS
 Confusion matrix:
 There are 4 important terms in a confusion matrix:
 True Positives (TP): The cases in which our prediction was TRUE, and the actual output was also TRUE.
 True Negatives (TN): The cases in which our prediction was FALSE, and the actual output was also FALSE.
 False Positives (FP): The cases in which our prediction was TRUE, but the actual output was FALSE.
 False Negatives (FN): The cases in which our prediction was FALSE, but the actual output was TRUE.
 The confusion matrix helps to calculate accuracy, precision, recall and F-measure (computed in the sketch below):
 Accuracy is calculated from the True Positive and True Negative values out of the total sample values. It tells us the proportion of predictions made by the model that were correct.
 Precision is the ratio of the number of True Positives in the sample to the total positive samples predicted by the classifier. It tells us about the positive samples that were correctly identified by the model.
 Recall is the ratio of the number of True Positives in the sample to the sum of True Positive and False Negative samples in the data.
 F1 Score
 It is also called the F-Measure. It is a good single measure of the test accuracy of the developed model. The F1 Score is the harmonic mean of Recall and Precision: the higher the F1 Score, the better the performance of the model. It summarizes Precision and Recall in one precise and robust number.
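A minimal sketch computing these metrics from raw confusion-matrix counts; the TP/TN/FP/FN values are made up for illustration:

tp, tn, fp, fn = 40, 45, 5, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")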
REGRESSION METRICS
 Regression metrics evaluate models that predict a continuous outcome with the help of correlated independent variables.
 These metrics are designed to indicate whether the model underfits or overfits the data, for the better usage of the model.
 They are:
 Mean Absolute Error (MAE)
 Mean Squared Error (MSE)
 Root Mean Squared Error (RMSE)
 Mean Absolute Error is the average of the absolute differences between the original values and the predicted values. It gives us an idea of how far the predictions are from the actual output, but it doesn’t give clarity on whether the data is underfitted or overfitted.
 The Mean Squared Error is similar to the Mean Absolute Error. It is computed by taking the average of the square of the difference between original and predicted values. Because of the squaring, large errors are amplified, so the model is penalized more for large mistakes.
 The Root Mean Squared Error is the square root of the mean of the squared differences between the predicted and actual values of the given data. It is the most popular metric evaluation technique used in regression problems. It assumes that errors are unbiased and follow a normal distribution. The three metrics are written out below.
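The formulas referenced above did not survive extraction; the standard definitions, in LaTeX notation with $y_i$ the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of examples, are:

\mathrm{MAE}  = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert
\qquad
\mathrm{MSE}  = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}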
6. APPLICATIONS & TRENDS IN MACHINE LEARNING
READING ASSIGNMENT
END