MACHINE LEARNING (ML)
HOW TO BUILD AND EVALUATE A MACHINE LEARNING MODEL: COMMON STEPS
Dr. Ahmed Elngar
Faculty of Computers and Artificial Intelligence,
Beni-Suef University, Egypt
elngar_7@yahoo.co.uk
OUTLINES OF THE COURSE
1. Introduction to Machine Learning
2. Concepts of Learning and its process
3. Types of Learning and Machine learning methods
4. Model Building
5. Evaluation
6. Applications & Current trends in machine learning
REFERENCES
TEXT BOOKS:
1) Ethem Alpaydin, "Introduction to Machine Learning", MIT Press / Prentice Hall of India, 3rd Edition, 2014.
2) Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, "Foundations of Machine Learning", MIT Press, 2012.
3) Stephen Marsland, "Machine Learning: An Algorithmic Perspective", 2nd Edition, CRC Press, 2015.
4) Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
REFERENCE BOOKS:
1) Charu C. Aggarwal, "Data Classification: Algorithms and Applications", CRC Press, 2014.
2) Charu C. Aggarwal, "Data Clustering: Algorithms and Applications", CRC Press, 2014.
3) Kevin P. Murphy, "Machine Learning: A Probabilistic Perspective", The MIT Press, 2012.
4) Jiawei Han, Micheline Kamber and Jian Pei, "Data Mining: Concepts and Techniques", 3rd Edition, Morgan Kaufmann, 2012.
OBJECTIVES OF LEARNING ML
The main objective of learning machine learning is to know and understand:
What are the different types of machine learning?
What algorithms are available for developing machine learning models?
What tools are available for developing these models?
What are the programming language choices?
What platforms support development and deployment of Machine Learning applications?
What IDEs (Integrated Development Environments) are available?
How can you quickly upgrade your skills in this important area?
AI VS. MACHINE LEARNING – “LEARN”
Machine learning (ML) is a subset of artificial intelligence (AI) that is all about getting an AI to accomplish tasks without being given specific instructions. In essence, it's about teaching machines how to learn!
AI VS. MACHINE LEARNING – “LEARNING”
AI is simulated human cognition, which it is supposed to achieve via learning!
What is learning?
Were we born with PhD-level intelligence?
Of course not!
At the beginning of our lives, we have little understanding of the world around us, but over time we grow to learn a lot. We use our senses to take in data, and we learn via a combination of interacting with the world around us, being explicitly taught certain things by others, finding patterns over time, and, of course, lots of trial and error.
"Learning is any process by which a system improves performance from experience."
- Herbert Simon
AI VS. MACHINE LEARNING – “LEARNING”
AI learns in a similar way. When it’s first created, an
AI knows nothing; ML gives AI the ability to learn
about its world.
AI is all about allowing a system to learn from
examples rather than instructions. ML is what
makes that possible.
AI VS. MACHINE LEARNING – “LEARNING”
AIs are taught, not explicitly programmed. In other words, instead of spelling out specific rules to solve a problem, we give them examples of what they will encounter in the real world and let them find the patterns themselves. Allowing machines to find patterns is beneficial over spelling out the instructions when the instructions are hard to specify or unknown, or when the data has many different variables, for example, treating cancer or predicting the stock market.
WHAT IS MACHINE LEARNING?
Arthur Samuel (at IBM) first coined the term "Machine Learning" in 1959.
He defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.
DEFINITION OF ML
Machine learning is a branch of artificial intelligence (AI) and
computer science which focuses on the use of data and algorithms to
imitate the way that humans learn, gradually improving its accuracy.
Machine learning is programming computers to optimize a
performance criterion using example data or past experience.
Machine learning (ML) is defined as a discipline of artificial intelligence
(AI) that provides machines the ability to automatically learn from data
and past experiences to identify patterns and make predictions with
minimal human intervention.
ML creates a model defined up to some parameters, and learning is
the execution of a computer program to optimize the parameters of the
model using the training data or past experience. The model may be
predictive to make predictions in the future, or descriptive to gain
knowledge from data, or both.
A model:
is a compressed version of a database;
extracts knowledge from it;
does not have perfect performance but is a useful approximation to the data.
DEFINITION OF ML?
Example #1:
A classic example of a task that requires machine learning: it is very hard to write explicit rules that say what makes a handwritten digit a "2".
WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
Example #2: House price prediction
After plotting various data points on the XY plot, we draw a best-fit line to do our predictions for any other house given its size. You feed the known data to the machine and ask it to find the best-fit line. Once the best-fit line is found by the machine, you test its suitability by feeding in a known house size, i.e. the X-value in the above curve. The machine will then return the estimated Y-value, i.e. the expected price of the house.
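The best-fit line described above can be sketched with ordinary least squares; the house sizes and prices below are made-up illustrative values, not real data:

```python
# Fit a best-fit line (ordinary least squares) to house size vs. price.
# Sizes and prices are hypothetical illustrative values.

def fit_line(xs, ys):
    """Return slope m and intercept c minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    m = num / den
    c = mean_y - m * mean_x
    return m, c

sizes = [50, 70, 80, 100, 120]      # house sizes (X)
prices = [110, 150, 165, 210, 250]  # prices in thousands (Y)

m, c = fit_line(sizes, prices)
predicted_price = m * 90 + c  # estimate the price of a 90-unit house
```

Once the line is fitted, any new X (size) can be fed in to get an estimated Y (price).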
WHAT IS MACHINE LEARNING?- WHY “MACHINE LEARNING”?
Example #3:
Some more examples of tasks that are best solved by
using a learning algorithm
Recognizing patterns:
Facial identities or facial expressions
Handwritten or spoken words
Medical images
Generating patterns:
Generating images or motion sequences
Recognizing anomalies:
Unusual credit card transactions
Unusual patterns of sensor readings in a nuclear power plant
Prediction:
Future stock prices or currency exchange rates
WHAT IS MACHINE LEARNING?- THE MACHINE LEARNING PROCESS
WHAT IS MACHINE LEARNING?- BASIC MACHINE LEARNING STEPS
WHAT IS MACHINE LEARNING?- TRADITIONAL
PROGRAMMING VS. MACHINE LEARNING
WHAT IS MACHINE LEARNING?- WHEN DO WE USE
MACHINE LEARNING?
ML is used when:
Human expertise does not exist (navigating on Mars)
Humans can’t explain their expertise (speech recognition)
Models must be customized (personalized medicine)
Solution needs to be adapted to particular cases (user biometrics)
Models are based on huge amounts of data (genomics)
Solution changes in time (routing on a computer network)
STATE OF THE ART APPLICATIONS OF MACHINE LEARNING
Autonomous cars
Autonomous car sensors
Autonomous car technologies
Deep learning emergence
Deep Belief Net on Face Images
Learning of Object Parts
Training on Multiple Objects
Automatic speech recognition systems
Speech technologies
HISTORY OF ML
1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan's ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
HISTORY OF ML…
1980s:
– Advanced decision tree and rule learning
– Explanation-based Learning (EBL)
– Learning and planning and problem solving
– Utility problem
– Analogy
– Cognitive architectures
– Resurgence of neural networks (connectionism, backpropagation)
– Valiant’s PAC Learning Theory
– Focus on experimental methodology
1990s
– Data mining
– Adaptive software agents and web applications
– Text learning
– Reinforcement learning (RL)
– Inductive Logic Programming (ILP)
– Ensembles: Bagging, Boosting, and Stacking
– Bayes Net learning
HISTORY OF ML…
2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics,
Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
APPLICATION OF MACHINE LEARNING
The following is a list of some of the typical applications of machine learning.
1. In retail business, machine learning is used to study consumer behaviour.
2. In finance, banks analyze their past data to build models to use in credit
applications, fraud detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
4. In medicine, learning programs are used for medical diagnosis.
5. In telecommunications, call patterns are analyzed for network optimization and
maximizing the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast enough by computers. The World Wide Web is huge; it is constantly growing, and searching for relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to
changes so that the system designer need not foresee and provide solutions for all
possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and
robotics.
9. Machine learning methods are applied in the design of computer-controlled vehicles to steer correctly when driving on a variety of roads.
10. Machine learning methods have been used to develop programmes for playing games such as chess, backgammon and Go.
CHAPTER SUMMARY
Learning can be viewed as using direct or indirect
experience to approximate a chosen target function.
Learning general models from data of particular examples
Data is cheap and abundant (data warehouses, data
marts); knowledge is expensive and scarce.
Example in retail: Customer transactions to consumer
behavior:
People who bought “Da Vinci Code” also bought “The Five
People You Meet in Heaven” (www.amazon.com)
Machine Learning builds a model that is a good and
useful approximation to the data.
LEARNING
Definition
A computer program is said to learn from experience E
with respect to some class of tasks T and performance
measure P, if its performance at tasks T, as measured by
P, improves with experience E.
LEARNING …
Examples: defining a learning task
I. Handwriting recognition learning problem
T: Recognizing and classifying handwritten words within images
P: Percent of words correctly classified
E: A dataset of handwritten words with given classifications
II. A robot driving learning problem
T: Driving on highways using vision sensors
P: Average distance traveled before an error
E: A sequence of images and steering commands recorded
while observing a human driver
III. A chess learning problem
T: Playing chess
P: Percent of games won against opponents
E: Playing practice games against itself.
IV. Spam filtering
T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels
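The (T, P, E) formulation above can be captured in a small sketch; the `LearningTask` structure is an illustrative device, and the values below come from example IV (spam filtering):

```python
from dataclasses import dataclass

@dataclass
class LearningTask:
    """Mitchell's (T, P, E) formulation of a learning problem."""
    task: str         # T: the class of tasks
    performance: str  # P: how performance is measured
    experience: str   # E: the training experience

spam_filter = LearningTask(
    task="Categorize email messages as spam or legitimate",
    performance="Percentage of email messages correctly classified",
    experience="Database of emails, some with human-given labels",
)
```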
COMPONENTS OF LEARNING
Basic components of learning process
The learning process, whether by a human or a machine, can
be divided into four components, namely, data storage,
abstraction, generalization and evaluation
Data storage (1)
The first component of the learning process is data storage. Computers use hard disk drives, flash memory, random access memory and similar devices to store data and use cables and other technology to retrieve data.
COMPONENTS OF LEARNING PROCESS …
Abstraction (2)
The second component of the learning process is known as
abstraction.
Abstraction is the process of extracting knowledge about stored
data. This involves creating general concepts about the data as a
whole. The creation of knowledge involves application of known
models and creation of new models.
The process of fitting a model to a dataset is known as training.
When the model has been trained, the data is transformed into an abstract form that summarizes the original information.
COMPONENTS OF LEARNING PROCESS …
Generalization (3)
The third component of the learning process is known as
generalization.
The term generalization describes the process of turning the
knowledge about stored data into a form that can be utilized for
future action.
These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before.
In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.
COMPONENTS OF LEARNING PROCESS …
Evaluation (4)
Evaluation is the last component of the learning process. It is the
process of giving feedback to the user to measure the utility of the
learned knowledge.
This feedback is then utilized to effect improvements in the whole learning process.
LEARNING MODELS
Machine learning is concerned with using the right features to
build the right models that achieve the right tasks.
For a given problem, the collection of all possible outcomes
represents the sample space or instance space.
Learning models can be divided into three main categories:
Using a Logical expression. (Logical models)
Using the Geometry of the instance space. (Geometric models)
Using Probability to classify the instance space. (Probabilistic
models)
Grouping and Grading (an orthogonal categorization to geometric-probabilistic-logical-compositional)
LEARNING MODELS : LOGICAL MODELS
Logical models use a logical expression to divide the instance
space into segments and hence construct grouping models.
A logical expression is an expression that returns a Boolean
value, i.e., a True or False outcome.
Once the data is grouped using a logical expression, the data
is divided into homogeneous groupings for the problem we
are trying to solve.
For example, for a classification problem, all the instances in
the group belong to one class.
LEARNING MODELS : LOGICAL MODELS …
There are mainly two kinds of logical models: Tree models
and Rule models.
Rule models consist of a collection of implications or IF-THEN
rules.
For tree-based models, the ‘if-part’ defines a segment and the
‘then-part’ defines the behaviour of the model for this segment.
Rule models follow the same reasoning.
In logical models, such as decision trees, a logical expression is used to partition the instance space. Two instances are similar when they end up in the same logical segment.
LEARNING MODELS : LOGICAL MODELS …
Example:
The "Enjoy Sport" task is defined by a set of data from some example days. Each example is described by six attributes. The task is to learn to predict the value of Enjoy Sport for an arbitrary day based on the values of its attributes. The problem can be represented by a series of hypotheses. Each hypothesis is described by a conjunction of constraints on the attributes. The training data represents a set of positive and negative examples of the target function. Each hypothesis is a vector of six constraints, specifying the values of the six attributes – Sky, AirTemp, Humidity, Wind, Water, and Forecast. The training phase involves learning the set of days (as a conjunction of attributes) for which Enjoy Sport = yes.
Thus, the problem can be formulated as:
Given instances X which represent a set of all possible days, each described by the attributes:
o Sky – (values: Sunny, Cloudy, Rainy),
o AirTemp – (values: Warm, Cold),
o Humidity – (values: Normal, High),
o Wind – (values: Strong, Weak),
o Water – (values: Warm, Cold),
o Forecast – (values: Same, Change).
Q. Try to identify a function that can predict the target variable Enjoy Sport as yes/no, i.e., 1 or 0.
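One way to sketch such a hypothesis is as a conjunction of per-attribute constraints, where "?" accepts any value; the hypothesis and sample days below are hypothetical:

```python
# A hypothesis for Enjoy Sport as a conjunction of attribute constraints.
# "?" means "any value is acceptable" for that attribute.
ATTRIBUTES = ["Sky", "AirTemp", "Humidity", "Wind", "Water", "Forecast"]

def matches(hypothesis, day):
    """Return True if the day satisfies every constraint in the hypothesis."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, day))

# Hypothesis: enjoy sport on any sunny, warm day, regardless of other attributes.
h = ["Sunny", "Warm", "?", "?", "?", "?"]

day1 = ["Sunny", "Warm", "Normal", "Strong", "Warm", "Same"]
day2 = ["Rainy", "Cold", "High", "Strong", "Warm", "Change"]

print(matches(h, day1))  # True
print(matches(h, day2))  # False
```

Learning then amounts to searching for the hypothesis vector consistent with all positive and negative training examples.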
LEARNING MODELS : GEOMETRIC MODELS …
In Geometric models, features could be described as points in
two dimensions (x- and y-axis) or a three-dimensional space
(x, y, and z).
For example, temperature as a function of time can be modelled in two axes.
In geometric models, there are two ways we could impose
similarity.
We could use geometric concepts like lines or planes to segment
(classify) the instance space. These are called Linear models.
Alternatively, we can use the geometric notion of distance to represent similarity. In this case, if two points are close together, they have similar values for features and thus can be classed as similar. We call such models Distance-based models.
LEARNING MODELS : GEOMETRIC MODELS
Linear models
Linear models are relatively simple. In this case, the function is
represented as a linear combination of its inputs.
In the simplest case where f(x) represents a straight line, we have
an equation of the form f (x) = mx + c where c represents the
intercept and m represents the slope.
LEARNING MODELS : GEOMETRIC MODELS
Distance-based models
Distance is applied through the concept of neighbors and exemplars.
Neighbors are points in proximity with respect to the distance measure
expressed through exemplars.
Exemplars are either centroids that find a center of mass according to a chosen
distance metric or medoids that find the most centrally located data point.
The most commonly used centroid is the arithmetic mean, which
minimizes squared Euclidean distance to all other points.
Notes:
The centroid represents the geometric center of a plane figure, i.e., the arithmetic mean position of all the points in the figure. This definition extends to any object in n-dimensional space: its centroid is the mean position of all the points.
Medoids are similar in concept to means or centroids. Medoids are most
commonly used on data when a mean or centroid cannot be defined. They are
used in contexts where the centroid is not representative of the dataset, such as
in image data.
Examples of distance-based models include nearest-neighbour models, which use the training data as exemplars – for example, in classification. The K-means clustering algorithm also uses exemplars to create clusters of similar data points.
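The centroid/medoid distinction above can be sketched on a handful of made-up 2-D points:

```python
# Centroid vs. medoid for a small set of 2-D points (illustrative values).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]

def centroid(pts):
    """Arithmetic mean position: minimizes total squared Euclidean distance."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def medoid(pts):
    """The actual data point with the smallest total distance to all others."""
    return min(pts, key=lambda p: sum(sq_dist(p, q) for q in pts))

c = centroid(points)  # need not coincide with any data point
m = medoid(points)    # always one of the original points
```

Note how the centroid (1.5, 1.5) is not a member of the dataset, while the medoid always is; this is why medoids are preferred when a mean is undefined or unrepresentative.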
LEARNING MODELS : PROBABILISTIC MODELS
Probabilistic models use the idea of probability to classify new
entities.
Probabilistic models see features and target variables as random
variables. The process of modelling represents and manipulates the
level of uncertainty with respect to these variables.
There are two types of probabilistic models: Predictive and
Generative.
Predictive probability models use the idea of a conditional probability
distribution P (Y |X) from which Y can be predicted from X.
Generative models estimate the joint distribution P (Y, X). Once we know
the joint distribution for the generative models, we can derive any
conditional or marginal distribution involving the same variables. Thus,
the generative model is capable of creating new data points and their
labels, knowing the joint probability distribution. The joint distribution looks for a relationship between two variables. Once this relationship is inferred, it is possible to infer new data points.
LEARNING MODELS : PROBABILISTIC MODELS
Naïve Bayes
Naïve Bayes is an example of a probabilistic classifier. It classifies using the Bayes rule, defined as P(Y|X) = P(X|Y) P(Y) / P(X), combined with the "naïve" assumption that the features in X are conditionally independent given the class Y.
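A minimal sketch of the Bayes rule applied to classification over a single categorical feature; the (feature, label) pairs below are hypothetical counts:

```python
# Minimal Bayes-rule classifier over one categorical feature (hypothetical data).
# Bayes rule: P(Y|X) is proportional to P(X|Y) * P(Y), which equals P(X, Y);
# with counts, we simply compare the joint counts for each class.
from collections import Counter

# Training data: (feature, label) pairs, e.g. Sky attribute -> Enjoy Sport.
data = [("Sunny", "yes"), ("Sunny", "yes"), ("Rainy", "no"),
        ("Sunny", "yes"), ("Rainy", "no"), ("Sunny", "no")]

labels = Counter(y for _, y in data)      # counts for P(Y)
joint = Counter((x, y) for x, y in data)  # counts for P(X, Y)

def predict(x):
    """Pick the label y maximizing P(x|y) * P(y), i.e. the joint count of (x, y)."""
    return max(labels, key=lambda y: joint[(x, y)])

print(predict("Sunny"))  # 'yes' (3 sunny-yes vs 1 sunny-no)
print(predict("Rainy"))  # 'no'
```

With several features, Naïve Bayes multiplies one such per-feature likelihood term per attribute, which is exactly where the conditional-independence assumption enters.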
DESIGNING A LEARNING SYSTEM
The learning process starts with task T, performance measure P and training experience E, and the objective is to find an unknown target function.
The target function is the exact knowledge to be learned from the training experience, and it is unknown.
For example, in a case of credit approval, the learning system will
have customer application records as experience and task would be
to classify whether the given customer application is eligible for a
loan.
So in this case, the training examples can be represented as (x1, y1), (x2, y2), ..., (xn, yn), where X represents customer application details and y represents the status of credit approval.
With these details, what is that exact knowledge to be learned
from the training experience?
So the target function to be learned in the credit approval learning system is a mapping function f: X → y. This function represents the exact knowledge defining the relationship between input variable X and output variable y.
DESIGNING A LEARNING SYSTEM
We have now looked into the learning process and understood the goal of learning. When we want to design a learning system that follows the learning process, we need to consider a few design choices. The design choices will be to decide the following key components:
1. Choose the training experience
2. Choose exactly what is to be learned, i.e. the target function
3. Choose how to represent the target function
4. Choose a learning algorithm to infer the target function from the experience
5. The final design
DESIGNING A LEARNING SYSTEM
Example:
We will look into the game - checkers learning problem
and apply the above design choices.
For a checkers learning problem, the three elements will
be,
1. Task T: To play checkers
2. Performance measure P: Total percent of games won in the tournament
3. Training experience E: A set of games played against itself
SUPERVISED LEARNING: OVERVIEW
Labels are provided
SL is also called learning from exemplars.
Supervised learning is a type of machine learning that uses
labeled data to train machine learning models. In labeled
data, the output is already known. The model just needs to
map the inputs to the respective outputs.
A supervised machine learning algorithm works by analyzing the labeled training data and builds a function/model, which can be used to map new (unseen) examples to their target outputs, i.e., to predict class labels for unseen instances.
SL has this form:
Given (x1, y1), (x2, y2), ..., (xn, yn)
The algorithm learns a function f(x) to predict y given x.
SUPERVISED LEARNING: OVERVIEW
Example#1:
Suppose the data consisting of the gender and age of the
patients and each patient is labeled as “healthy” or “sick”.
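The patient example above can be sketched in the "given (xi, yi), learn f" form, using a one-nearest-neighbour rule as the learned f; the (age, label) pairs are made up:

```python
# Supervised learning in its simplest form: given labeled pairs (x_i, y_i),
# learn f(x) to predict y. Here f is a 1-nearest-neighbour rule over one feature.
train = [(25, "healthy"), (30, "healthy"), (65, "sick"), (70, "sick")]

def f(x):
    """Predict the label of the training point whose age is closest to the query."""
    nearest_age, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label

print(f(28))  # 'healthy' (closest training point is age 30)
print(f(68))  # 'sick'    (closest training point is age 70)
```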
SUPERVISED LEARNING: WHY “SUPERVISED LEARNING”?
SUPERVISED LEARNING: TYPES SL PROBLEMS
SUPERVISED LEARNING: CLASSIFICATION
Classification: the labels to be predicted are categorical:
Works by pattern recognition
Face recognition:
SUPERVISED LEARNING: REGRESSION …
Regression: the labels to be predicted are continuous
Given (x1, y1), (x2, y2), ..., (xn, yn)
Learn a function f(x) to predict y given x
– y is real-valued == regression
SUPERVISED LEARNING: EXAMPLES
Example:
Credit scoring (a classification task): classify customers into high- and low-risk, based on their income and savings, using data about past loans (whether they were paid or not).
Car price prediction (a regression task): predict the price of a car from its mileage.
SUPERVISED LEARNING: ALGORITHMS
A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.
Some of the most popularly used supervised learning
algorithms are:
Linear Regression
Logistic Regression
Support Vector Machine
K Nearest Neighbor
Decision Tree
Random Forest
Naive Bayes
SUPERVISED LEARNING: APPLICATIONS
Supervised learning algorithms are generally used for solving classification and regression problems. A few of the top supervised learning applications are weather prediction, sales forecasting, and stock price analysis.
UNSUPERVISED LEARNING
Unsupervised learning is a type of machine learning that uses unlabeled data to train machines; it works by finding patterns and understanding the trends in the data to discover the output. So, the model tries to label the data based on the features of the input data.
Here, we take unlabeled input data, meaning it is not categorized and corresponding outputs are not given. This unlabeled input data is fed to the machine learning model in order to train it. First, the model interprets the raw data to find hidden patterns, and then a suitable algorithm such as k-means clustering is applied. The algorithm divides the data objects into groups according to the similarities and differences between the objects.
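The k-means grouping step described above can be sketched on made-up one-dimensional points:

```python
# One-dimensional k-means (illustrative): group unlabeled points into k clusters.
def kmeans_1d(points, k, iters=10):
    # Initialize centroids with the first k points (a simple, naive choice).
    centroids = points[:k]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids, clusters = kmeans_1d(points, k=2)
# The two groups separate the low values from the high values.
```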
UNSUPERVISED LEARNING …
Example #2:
An example of an unsupervised learning technique uses images of vehicles to classify whether each vehicle is a bus or a truck. The model learns by identifying the parts of a vehicle, such as the length and width of the vehicle, the front and rear end covers, roof hoods, the types of wheels used, etc. Based on these features, the model classifies whether the vehicle is a bus or a truck.
UNSUPERVISED LEARNING …
Example#3:
Consider the following data regarding patients entering a
clinic. The data consists of the gender and age of the
patients.
UNSUPERVISED LEARNING
No labels provided, only input data.
UNSUPERVISED LEARNING: APPLICATIONS
UNSUPERVISED LEARNING: ALGORITHMS
Selecting the right algorithm depends on the type of
problem you are trying to solve. Some of the common
examples of unsupervised learning are:
K Means Clustering
Hierarchical Clustering
DBSCAN
Principal Component Analysis (PCA)
SEMI-SUPERVISED LEARNING
Labels provided for some points only.
It is a branch of machine learning that combines a small
amount of labeled data with a large amount of unlabeled
data during training.
Semi-supervised learning falls between unsupervised
learning (with no labeled training data) and supervised
learning (with only labeled training data).
SEMI-SUPERVISED LEARNING: HOW SEMI-SUPERVISED LEARNING WORKS
Semi-supervised machine learning is a combination
of supervised and unsupervised learning. It uses a small amount of
labeled data and a large amount of unlabeled data, which provides
the benefits of both unsupervised and supervised learning while
avoiding the challenges of finding a large amount of labeled data.
That means you can train a model to label data without having to
use as much labeled training data.
Here’s how it works:
1. Train the model with the small amount of labeled training data just like
you would in supervised learning, until it gives you good results.
2. Then use it with the unlabeled training dataset to predict the outputs,
which are pseudo labels since they may not be quite accurate.
3. Link the labels from the labeled training data with the pseudo labels
created in the previous step.
4. Link the data inputs in the labeled training data with the inputs in the
unlabeled data.
5. Then, train the model the same way as you did with the labeled set in the beginning in order to decrease the error and improve the model's accuracy.
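The five steps above can be sketched as a generic pseudo-labeling loop; `train` and `predict` below are toy stand-ins for any real supervised learner, and the data is made up:

```python
# Generic pseudo-labeling sketch for semi-supervised learning.
# `train`/`predict` are placeholders standing in for a real supervised learner;
# here they implement a trivial majority-class "model" for illustration.
from collections import Counter

def train(xs, ys):
    """Toy learner: remember the majority label."""
    return Counter(ys).most_common(1)[0][0]

def predict(model, xs):
    return [model for _ in xs]

# Step 1: train on the small labeled set.
labeled_x, labeled_y = [1, 2, 3], ["a", "a", "b"]
model = train(labeled_x, labeled_y)

# Step 2: predict pseudo labels for the unlabeled set.
unlabeled_x = [4, 5]
pseudo_y = predict(model, unlabeled_x)

# Steps 3-4: link the labeled data with the pseudo-labeled data.
all_x = labeled_x + unlabeled_x
all_y = labeled_y + pseudo_y

# Step 5: retrain on the combined set to refine the model.
model = train(all_x, all_y)
```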
SEMI-SUPERVISED LEARNING: APPLICATIONS
SEMI-SUPERVISED LEARNING: ASSUMPTIONS
REINFORCEMENT LEARNING: EXAMPLE
Example#1:
An example of reinforcement learning is to train a machine that can identify
the shape of an object, given a list of different objects. In the example shown,
the model tries to predict the shape of the object, which is a square in this
case.
Example #2:
Consider teaching a dog a new trick: we cannot tell it what to do, but we can
reward/punish it if it does the right/wrong thing. It has to find out what it did
that made it get the reward/punishment. We can use a similar method to train computers to do many tasks, such as playing backgammon or chess.
REINFORCEMENT LEARNING: APPLICATIONS
REINFORCEMENT LEARNING : SUMMARY
Supervised (inductive) learning
– Given: training data + desired outputs (labels)
Unsupervised learning
– Given: training data (without desired outputs)
Semi-supervised learning
– Given: training data + a few desired outputs
Reinforcement learning
– Rewards from sequence of actions
MODEL BUILDING
MACHINE LEARNING MODELS
Machine learning models are computer programs that are
used to recognize patterns in data or make predictions.
A machine learning model is built by learning and generalizing from training data, then applying that acquired knowledge to new data it has never seen before to make predictions and fulfill its purpose. Lack of data will prevent you from building the model, and access to data alone isn't enough: useful data needs to be clean and in good shape.
Identify your data needs and determine whether the data is in proper shape for the
machine learning project. The focus should be on data identification, initial collection,
requirements, quality identification, insights and potentially interesting aspects that are
worth further investigation.
Here are some key questions to consider:
Where are the sources of the data that's needed for training the model?
What quantity of data is needed for the machine learning project?
What is the current quantity and quality of training data?
How are the test set data and training set data being split?
For supervised learning tasks, is there a way to label that data?
Can pre-trained models be used?
Where is the operational and training data located?
Are there special needs for accessing real-time data on edge devices or in more difficult-to-reach places?
Answering these important questions helps you get a handle on the quantity and quality
of data as well as understand the type of data that's needed to make the model work.
STEP 3: COLLECTING DATA
This step requires a reliable data source and quality data.
Make sure you use data from a reliable source, as it will directly
affect the outcome of your model. Good data is relevant, contains
very few missing and repeated values, and has a good
representation of the various subcategories/classes present.
STEP 4: PREPARING THE DATA
After you have your data, you have to prepare it. You can do this
by :
Putting together all the data you have and randomizing it. This helps
make sure that data is evenly distributed, and the ordering does not affect
the learning process.
Cleaning the data to remove unwanted data, missing values, rows, and
columns, duplicate values, data type conversion, etc. You might even
have to restructure the dataset and change the rows and columns or index
of rows and columns.
Visualizing the data to understand how it is structured and to understand the relationship between the various variables and classes present.
Splitting the cleaned data into two sets - a training set and a testing set.
The training set is the set your model learns from. A testing set is used to
check the accuracy of your model after training.
Data preparation and cleansing tasks can take a substantial amount of time.
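The randomize-then-split step described above can be sketched as follows; the 80/20 ratio is a common but not mandatory choice, and the data is a stand-in:

```python
# Shuffle the dataset, then split it into training and testing sets.
# The 80/20 ratio is a common convention, not a fixed rule.
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    shuffled = data[:]                     # copy so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # randomize to remove ordering effects
    split = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:split], shuffled[split:]

data = list(range(10))  # stand-in for 10 labeled examples
train_set, test_set = train_test_split(data)

print(len(train_set), len(test_set))  # 8 2
```

Fixing the seed makes the split reproducible, which matters when comparing models on the same held-out data.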
STEP 4: PREPARING THE DATA: SPECIFIC ACTIVITIES
Procedures during the data preparation, collection and cleansing process
include the following:
Collect data from the various sources.
Standardize formats across different data sources.
Replace incorrect data.
Enhance and augment data.
Add more dimensions with pre-calculated amounts and aggregate information
as needed.
Enhance data with third-party data.
"Multiply" image-based data sets if they aren't sufficient for training.
Remove extraneous information and deduplicate.
Remove irrelevant data from training to improve results.
Reduce noise and remove ambiguity.
Consider anonymizing data.
Normalize or standardize data to get it into formatted ranges.
Sample data from large data sets.
Select features that identify the most important dimensions and, if necessary,
reduce dimensions using a variety of techniques.
Split data into training, test and validation sets.
STEP 5: CHOOSING AN ALGORITHM
Apart from this, you also have to see if your algorithm is suited for
numerical or categorical data and choose accordingly.
STEP 6: TRAINING THE MODEL
STEP 7: EVALUATING THE MODEL
After training your model, you have to check how it is
performing. This is done by testing the performance of the
model on previously unseen data. The unseen data used is the
testing set that you split your data into earlier.
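As a minimal sketch of evaluating on held-out data (the classifier here is a deliberately trivial "predict the most common training label" baseline, an illustrative assumption rather than anything from the slides):

```python
from collections import Counter

def train_majority_classifier(labels):
    """'Train' by memorizing the most common label in the training set."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda _x: majority

def accuracy(model, examples):
    """Fraction of previously unseen examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

# Toy data: the model learns from `train` and is scored only on `test`.
train = [(i, "spam") for i in range(7)] + [(i, "ham") for i in range(3)]
test = [(0, "spam"), (1, "spam"), (2, "ham"), (3, "spam")]
model = train_majority_classifier([y for _, y in train])
print(accuracy(model, test))   # 0.75
```

Baselines like this give a floor that any real model should beat on the same test set.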
STEP 8: PARAMETER TUNING
Once you have created and evaluated your model, see if its
accuracy can be improved in any way. This is done by tuning
the parameters present in your model.
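Parameter tuning can be as simple as a grid search: try each candidate setting and keep the one that scores best on held-out data. The threshold rule, the candidate values, and the toy validation set below are illustrative assumptions:

```python
def evaluate(threshold, examples):
    """Accuracy of a simple rule: predict 1 when the score exceeds threshold."""
    return sum((score > threshold) == label
               for score, label in examples) / len(examples)

# (score, true label) pairs held out from training for tuning.
validation = [(0.1, 0), (0.3, 0), (0.45, 1), (0.6, 1), (0.9, 1)]

# Try each candidate threshold and keep the best-scoring one.
best = max([0.2, 0.4, 0.5, 0.7], key=lambda t: evaluate(t, validation))
print(best, evaluate(best, validation))   # 0.4 1.0
```

Tuning should use a validation set separate from the final test set, or the tuned model's test accuracy becomes optimistically biased.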
STEP 9: MAKING PREDICTIONS
STEP 10: DEPLOY THE MACHINE LEARNING MODEL
MODEL EVALUATION
MODEL EVALUATION: OVERVIEW
Key question
Q. How well does the model perform on unseen data?
MODEL EVALUATION TECHNIQUES
Both methods use a test set (i.e., data not seen by the model) to
evaluate model performance.
It is not recommended to use the data we used to build the model
to evaluate it. This is because our model will simply remember the
whole training set, and will therefore always predict the correct
label for any point in the training set. This is known as overfitting.
MODEL EVALUATION TECHNIQUES: HOLDOUT METHOD
CROSS-VALIDATION
k-fold cross-validation is the most common cross-validation technique,
and it works as follows:
The original dataset is partitioned into k equal-size subsamples, called folds.
k is a user-specified number, usually 5 or 10.
Training and evaluation are repeated k times, such that each time one of the
k folds is used as the test/validation set and the other k-1 folds are put
together to form the training set.
The error estimation is averaged over all k trials to get the total effectiveness
of our model.
Example:
When performing five-fold cross-validation, the data is first partitioned into 5
parts of (approximately) equal size, and a sequence of models is trained. The
first model is trained with the first fold held out as the test set and the
remaining folds used as the training set. This is repeated for each of the 5
splits of the data, and the accuracy estimates are averaged over all 5 trials
to get the total effectiveness of our model.
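The fold construction described above can be sketched in a few lines of plain Python; the helper name k_fold_indices and the round-robin fold assignment are illustrative assumptions:

```python
def k_fold_indices(n, k):
    """Yield k (train, test) index splits: each fold serves once as the
    test set while the remaining k-1 folds form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]   # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

splits = list(k_fold_indices(10, 5))
all_test = sorted(i for _, test in splits for i in test)
print(len(splits), all_test == list(range(10)))   # 5 True: every point is tested exactly once
```

In practice the data should be shuffled before assigning folds so each fold is representative of the whole dataset.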
MODEL EVALUATION: CLASSIFICATION METRICS
Classification Accuracy
Classification accuracy is what is usually meant by the term accuracy.
It is the ratio of the number of correct predictions to the total number
of predictions made by the model on the given data.
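As a one-function sketch of the definition above (the function name and toy labels are illustrative assumptions):

```python
def classification_accuracy(y_true, y_pred):
    """Ratio of correct predictions to the total number of predictions."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 4 of 5 predictions match the true labels.
print(classification_accuracy([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))  # 0.8
```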
MODEL EVALUATION: CLASSIFICATION METRICS
Confusion Matrix
It is an N×N matrix used for evaluating the
performance of a classification model, where
N is the number of classes being predicted.
It is computed on a test dataset in which the
true values are known.
The matrix tells us the number of correct and
incorrect predictions made by a classifier and
is used to assess the correctness of the model.
It consists of the values True Positive, False
Positive, True Negative, and False Negative,
which help in measuring Accuracy, Precision,
Recall, Specificity, Sensitivity, and the area
under the ROC curve (AUC).
MODEL EVALUATION: CLASSIFICATION METRICS
Confusion matrix:
There are 4 important terms in confusion matrix:
True Positives (TP): the cases in which we predicted positive, and the actual output was also
positive.
True Negatives (TN): the cases in which we predicted negative, and the actual output was
also negative.
False Positives (FP): the cases in which we predicted positive, but the actual output was
negative.
False Negatives (FN): the cases in which we predicted negative, but the actual output was
positive.
Helps to calculate accuracy, precision, recall and F-measure
The accuracy can be calculated as the ratio of the number of correct predictions (True Positives
plus True Negatives) to the total number of samples. It tells us the fraction of the predictions
made by the model that were correct.
Precision is the ratio of Number of True Positives in the sample to the total Positive
samples predicted by the classifier. It tells us about the positive samples that were correctly
identified by the model.
Recall is the ratio of Number of True Positives in the sample to the sum of True Positive and False
Negative samples in the data.
F1 Score
It is also called the F-Measure. It is a good single measure of the test accuracy of the developed
model. The F1 Score is the harmonic mean of Recall and Precision: the higher the F1 Score, the
better the performance of the model. Because it combines Precision and Recall into one number,
it lets us judge model performance without inspecting the two metrics separately, and it is more
robust than plain accuracy when the classes are imbalanced.
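The confusion-matrix terms and the metrics derived from them can be sketched in plain Python; the function names confusion_counts and metrics, and the toy labels, are illustrative assumptions:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary classifier."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall and F1 from the four confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(tp, tn, fp, fn)            # 3 3 1 1
print(metrics(tp, tn, fp, fn))   # (0.75, 0.75, 0.75, 0.75)
```

Note that metrics divides by tp + fp and tp + fn, so a classifier that never predicts the positive class would need extra handling to avoid division by zero.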
REGRESSION METRICS
Regression predicts a continuous outcome with the help of correlated
independent variables.
Regression metrics measure how far the predicted values are from the actual
values, and help diagnose whether the model is underfitted or overfitted.
They are:
Mean Absolute Error (MAE)
Mean Squared Error (MSE)
Root Mean Squared Error (RMSE)
Mean Absolute Error is the average of the absolute differences between the
original values and the predicted values. It gives us an idea of how far the
predictions are from the actual output, but it does not indicate whether the
model is underfitted or overfitted. It is calculated as follows:
MAE = (1/n) * Σ |y_i - ŷ_i|
The Mean Squared Error is similar to the Mean Absolute Error. It is computed
by taking the average of the squares of the differences between the original
and predicted values. Because the differences are squared, large errors are
penalized much more heavily than small ones. It is computed as follows:
MSE = (1/n) * Σ (y_i - ŷ_i)²
The Root Mean Squared Error is the square root of the mean of the squared
differences between the predicted and actual values of the given data. It is
the most popular evaluation metric used in regression problems. It is based
on the assumption that the errors are unbiased and follow a normal
distribution. It is computed using the formula below:
RMSE = sqrt( (1/n) * Σ (y_i - ŷ_i)² )
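The three regression metrics above translate directly into plain Python; the function names and the toy values are illustrative assumptions:

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean Absolute Error: average of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return sqrt(mse(y_true, y_pred))

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mae(y_true, y_pred))    # 0.875
print(mse(y_true, y_pred))    # 1.3125
print(rmse(y_true, y_pred))   # ≈ 1.1456
```

On this toy data the single error of 2.0 dominates the MSE far more than the MAE, which is exactly the heavier penalty on large errors described above.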
6
APPLICATIONS & TRENDS IN
MACHINE LEARNING
READING ASSIGNMENT
END