Machine Learning
Unit-1
Contents
Machine Learning Fundamentals - Types of Machine Learning: Supervised, Unsupervised, Reinforcement - The Machine Learning process.

Terminologies in ML - Testing ML algorithms: Overfitting, Training, Testing and Validation Sets - Confusion matrix - Accuracy metrics - ROC Curve.

Basic Statistics: Averages, Variance and Covariance, The Gaussian - The Bias-Variance trade-off - Applications of Machine Learning.
MACHINE LEARNING - Machine learning (ML) is a type of artificial intelligence (AI) that
allows software applications to become more accurate at predicting outcomes without being
explicitly programmed to do so.
• Machine learning algorithms use historical data as input to predict new output values.
TYPES OF MACHINE LEARNING
1. Unsupervised learning - Unsupervised learning is a type of machine learning in
which models are trained on an unlabeled dataset and are allowed to act on that
data without any supervision.
The goal of unsupervised learning is to find the underlying structure of the dataset,
group the data according to similarities, and represent the dataset in a
compressed format.
Example - Suppose the unsupervised learning algorithm is given an input dataset containing
images of different types of cats and dogs. The algorithm is never trained on the given
dataset, which means it has no prior idea about the features of the dataset. The task of
the unsupervised learning algorithm is to identify the image features on its own. It
performs this task by clustering the image dataset into groups according to the
similarities between images.
Working of Unsupervised Learning
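As a rough illustration of this working (not part of the original notes), the sketch below clusters unlabeled two-dimensional points with k-means; it assumes scikit-learn and NumPy are installed, and the toy data and the choice of two clusters are made-up assumptions:

# Minimal unsupervised-learning sketch: group unlabeled points by similarity.
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two loose groups of 2-D points, with no class labels given.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # one group of points near (0, 0)
               rng.normal(5, 1, (50, 2))])   # another group near (5, 5)

# The algorithm finds the groups on its own, without any supervision.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5], kmeans.labels_[-5:])   # cluster ids assigned to the points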
2. Supervised learning - Supervised learning is the type of machine learning in which
machines are trained using well "labelled" training data, and on the basis of that data,
machines predict the output. Labelled data means that some input data is already
tagged with the correct output.
The aim of a supervised learning algorithm is to find a mapping function that maps the input
variable (x) to the output variable (y).
In the real world, supervised learning can be used for risk assessment, image
classification, fraud detection, spam filtering, etc.
How Supervised Learning Works
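A minimal sketch of this process, assuming scikit-learn is available; the tiny two-feature dataset and the use of logistic regression are illustrative assumptions, not part of the original notes:

# Supervised-learning sketch: learn a mapping f(x) -> y from labelled examples.
from sklearn.linear_model import LogisticRegression

# Labelled training data: each input is already tagged with the correct output.
X_train = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y_train = [0, 0, 1, 1]

# Fit the mapping function from inputs (x) to outputs (y).
model = LogisticRegression().fit(X_train, y_train)

# Predict the output for new, unseen inputs; should print [0 1].
print(model.predict([[0.15, 0.15], [0.85, 0.85]]))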
3. Reinforcement Learning - Reinforcement learning is an area of machine learning
concerned with how intelligent agents ought to take actions in an environment in
order to maximize the notion of cumulative reward.
RL can be used in large environments in the following situations:
1. A model of the environment is known, but an analytic solution is not available;
2. Only a simulation model of the environment is given;
3. The only way to collect information about the environment is to interact with it.
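As a toy sketch of the third situation (learning purely by interacting with the environment), the snippet below runs tabular Q-learning on a made-up 5-state corridor in which the agent is rewarded only for reaching the rightmost state; the environment, rewards, and hyperparameters are illustrative assumptions:

import random

# Tiny corridor environment: states 0..4, reward 1.0 for reaching state 4.
n_states, actions = 5, [-1, +1]            # actions: move left or move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration rate

def greedy(s):
    # pick the best-known action in state s, breaking ties randomly
    best = max(Q[(s, b)] for b in actions)
    return random.choice([b for b in actions if Q[(s, b)] == best])

for episode in range(200):
    s = 0                                   # start at the left end of the corridor
    while s != n_states - 1:                # an episode ends at the goal state
        a = random.choice(actions) if random.random() < epsilon else greedy(s)
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, b)] for b in actions) - Q[(s, a)])
        s = s_next

print({s: greedy(s) for s in range(n_states - 1)})   # learned policy: +1 (move right) everywhere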
Terminologies in ML
Overfitting is a concept in data science which occurs when a statistical model fits
exactly against its training data. When this happens, the algorithm cannot perform
accurately on unseen data, defeating its purpose.

Underfitting means that the model is too simple to capture the underlying pattern in
the data, so its predictions are inaccurate. In this case, the training error is large and
the test error is large too.
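The small experiment below makes this concrete; it assumes scikit-learn and NumPy, and the noisy sine data and the three polynomial degrees are made-up choices for illustration:

# Fit models of increasing flexibility and compare training error vs test error.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 40))[:, None]
y = np.sin(2 * np.pi * X.ravel()) + rng.normal(0, 0.2, 40)

# Keep part of the data unseen: a training set to fit and a test set to evaluate.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):                  # too simple, reasonable, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(X_tr))
    test_err = mean_squared_error(y_te, model.predict(X_te))
    print(degree, round(train_err, 3), round(test_err, 3))

# Typically: degree 1 has large train and test error (underfitting), while
# degree 15 has a tiny train error but a larger test error (overfitting).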
Confusion Matrix in Machine Learning
The confusion matrix is a matrix used to determine the performance of the classification models for a given
set of test data. It can only be determined if the true values for test data are known. The matrix itself can be
easily understood, but the related terminologies may be confusing. 
Some features of the Confusion matrix are given below:
• For 2 prediction classes the matrix is a 2*2 table, for 3 classes it is a 3*3 table, and so on.
• The matrix is divided into two dimensions, predicted values and actual values, along with the
total number of predictions.
• Predicted values are the values predicted by the model, and actual values are the true
values for the given observations.
• For two classes the matrix looks like the table below:

                   Actual: Yes            Actual: No
Predicted: Yes     True Positive (TP)     False Positive (FP)
Predicted: No      False Negative (FN)    True Negative (TN)

The table has the following cases:
• True Positive (TP): The model predicted Yes, and the actual value was also Yes.
• True Negative (TN): The model predicted No, and the actual value was also No.
• False Positive (FP): The model predicted Yes, but the actual value was No. It is also called a Type-I error.
• False Negative (FN): The model predicted No, but the actual value was Yes. It is also called a Type-II error.
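A short sketch of building such a matrix, assuming scikit-learn; the two small label lists are made up for illustration (1 = Yes, 0 = No):

from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # true values of the test data
y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # values predicted by the model

# With labels ordered [0, 1], rows are actual values and columns are predicted:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_actual, y_predicted))   # here: [[4, 1], [1, 4]]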
Calculations using Confusion Matrix
Classification Accuracy: It is one of the important parameters for evaluating classification
problems. It defines how often the model predicts the correct output. It is calculated as the ratio of the
number of correct predictions made by the classifier to the total number of predictions made by the
classifier:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Error rate: It defines how often the model gives wrong predictions. It is calculated as the ratio of the
number of incorrect predictions to the total number of predictions made by the classifier:

Error Rate = (FP + FN) / (TP + TN + FP + FN)
Precision: Out of all the observations the model predicted as positive, how many were actually
positive. It is calculated as:

Precision = TP / (TP + FP)
Recall: Out of all the actually positive observations, how many the model predicted correctly. The recall
should be as high as possible. It is calculated as:

Recall = TP / (TP + FN)
F-measure: If two models have low precision and high recall, or vice versa, it is difficult to compare
them. For this purpose, we can use the F-score, which evaluates recall and precision at
the same time. The F-score is maximum when recall equals precision. It is calculated as:

F-measure = (2 * Precision * Recall) / (Precision + Recall)
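A worked sketch of these formulas, using the same made-up counts as the confusion-matrix example above (TP = 4, TN = 4, FP = 1, FN = 1):

TP, TN, FP, FN = 4, 4, 1, 1

accuracy   = (TP + TN) / (TP + TN + FP + FN)                  # 8 / 10 = 0.8
error_rate = (FP + FN) / (TP + TN + FP + FN)                  # 2 / 10 = 0.2
precision  = TP / (TP + FP)                                   # 4 / 5  = 0.8
recall     = TP / (TP + FN)                                   # 4 / 5  = 0.8
f_measure  = 2 * precision * recall / (precision + recall)    # 0.8

print(accuracy, error_rate, precision, recall, f_measure)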
Bias Variance Tradeoff
1. If our model is too simple and has very few parameters, it may have high bias
and low variance. On the other hand, if our model has a large number of parameters,
it is going to have high variance and low bias. So we need to find the right balance,
without overfitting or underfitting the data.
2. This tradeoff in complexity is why there is a tradeoff between bias and variance:
an algorithm cannot be more complex and less complex at the same time.
3. An optimal balance of bias and variance would never overfit or underfit the model.
4. Therefore, understanding bias and variance is critical for understanding the
behavior of prediction models.
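The rough experiment below illustrates the tradeoff by refitting a simple and a flexible model on many freshly drawn training sets and measuring bias and variance at one test point; it assumes scikit-learn and NumPy, and the data-generating function, noise level, and the two polynomial degrees are illustrative assumptions:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
x_test = np.array([[0.25]])
true_value = np.sin(2 * np.pi * 0.25)      # the noise-free target at the test point

for degree in (1, 10):                     # very few parameters vs many parameters
    preds = []
    for _ in range(200):                   # many independent training sets
        X = rng.uniform(0, 1, (30, 1))
        y = np.sin(2 * np.pi * X.ravel()) + rng.normal(0, 0.3, 30)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
        preds.append(model.predict(x_test)[0])
    preds = np.array(preds)
    bias2 = (preds.mean() - true_value) ** 2    # squared bias of the average prediction
    variance = preds.var()                      # spread of predictions across datasets
    print(f"degree={degree}: bias^2={bias2:.3f}, variance={variance:.3f}")

# The simple model tends to show higher bias and lower variance; the flexible
# model tends to show lower bias but higher variance.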
Applications of Machine learning
Machine learning is a buzzword in today's technology, and it is growing very rapidly day by day. We use
machine learning in our daily lives, often without even knowing it, in tools such as Google Maps, Google Assistant, Alexa, etc.
Below are some of the most trending real-world applications of Machine Learning:
