MLT Unit-1
Machine Learning-
Machine = A Computer/Program/Algorithm/Device
Learning = Getting knowledge/Understanding
Machine Learning = Computer Learning/Algorithm Learning
Definitions:
• “Machine Learning is the process by which a computer program acquires the knowledge to perform a task
by itself, without external programming.”
• “Machine learning is a subset of Artificial Intelligence (AI) in which computer software applications become
more accurate without being explicitly programmed.”
• “Machine learning enables a machine to learn automatically from data, improve its performance with
experience, and make predictions without being explicitly programmed.” —Arthur Samuel
• A computer program is said to “learn” from experience (E) with respect to some class of tasks (T) and
performance measure (P), if its performance at tasks in (T), as measured by (P), improves with experience (E).
—Tom Mitchell
• In the field of computer science, machine learning is defined as a technique by which computer programs
automatically improve their performance through past experience.
In simple words, when we feed training data to a machine learning algorithm, the algorithm is improved
(tuned) to fit that data. When test data (similar to the training data) is then applied, the improved algorithm
produces better results. In other words, a machine learning algorithm learns from training data and then applies
the learned knowledge to test data; thus the machine becomes more intelligent. The improved algorithm is
called a learned model. These learned models are used in real-life problems such as business, e-commerce
websites, social media platforms, stock market prediction, robotic car driving, etc.
Relation between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL)
Difference Between Machine Learning and Deep Learning :-
NOTE-
“Training data set is used to build a machine learning model.
Test data set is used to evaluate (test) the ML model.”
Difference between Artificial Intelligence (AI) and Machine Learning (ML):-
3. Trigonometry:
tanh (used as an activation function in ANNs).
4. Probability and Statistics: Bayes' theorem, basic probability definitions, conditional probability,
mean, median, mode, standard deviation, outliers, decision trees, histograms.
5. Calculus (optional, for advanced topics): concept of a derivative, maxima and minima of a
function, gradient or slope, partial derivatives, chain rule (used in the backpropagation rule in
ANNs).
6. Boolean Algebra:
AND logic (Conjunction)
OR logic (Disjunction)
NOT logic (Negation)
9. Information Theory (Entropy, Information Gain): Entropy and information gain are explained in
detail in chapter five.
1. Data quality :
a. It is essential to have good quality data to produce quality ML algorithms and models.
b. To get high-quality data, we must implement data evaluation, integration, exploration, and governance
techniques prior to developing ML models.
c. The accuracy of ML is driven by the quality of the data.
2. Transparency :
a. It is difficult to make definitive statements about how well a model will generalize to new environments.
3. Manpower :
a. Manpower means having data and being able to use it, while taking care not to introduce bias into the model.
b. There should be enough skill sets in the organization for software development and data collection.
4. Other :
a. The most common issue with ML is people using it where it does not belong.
b. Every time there is some new innovation in ML, we see overzealous engineers trying to use it where it’s not
really necessary.
c. This used to happen a lot with deep learning and neural networks.
d. Traceability and reproducibility of results are two other main issues.
1. Image recognition :
a. Image recognition is the process of identifying and detecting an object or a feature in a digital image or video.
b. This is used in many applications like systems for factory automation, toll booth monitoring, and security
surveillance.
2. Speech recognition :
a. Speech Recognition (SR) is the translation of spoken words into text.
b. It is also known as Automatic Speech Recognition (ASR), computer speech recognition, or Speech To Text
(STT).
c. In speech recognition, a software application recognizes spoken words.
3. Medical diagnosis :
a. ML provides methods, techniques, and tools that can help in solving diagnostic and prognostic problems in a
variety of medical domains.
b. It is being used for the analysis of the importance of clinical parameters and their combinations for prognosis.
4. Statistical arbitrage :
a. In finance, statistical arbitrage refers to automated trading strategies that are typically short-term and
involve a large number of securities.
b. In such strategies, the user tries to implement a trading algorithm for a set of securities on the basis of
quantities such as historical correlations and general economic variables.
5. Learning associations :
Learning associations is the process of discovering relations between variables in large databases.
6. Extraction :
a. Information Extraction (IE) is another application of machine learning.
b. It is the process of extracting structured information from unstructured data.
1. Supervised Learning:-
In supervised learning, the models are trained using labelled data. The model needs to find the mapping
function (f) between input data (x) and output data (y), i.e., y = f(x) …(1.1) Supervised learning needs
supervision to train the model. It is similar to a student learning in the presence of a teacher in the classroom.
Example: Let us consider a basket which contains some fruits, i.e., apple, banana, mango, grapes, etc. The
task of the ML model is to identify the fruits and classify them accordingly. To identify the type of each fruit in
supervised learning, we provide the shape, size, colour and taste of each fruit to the model. This is called
"model training" or "training of the model." Once training is completed, we test the model by giving it a new
set of fruits to identify. The model will then identify the fruits.
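The fruit example above can be sketched as a tiny supervised classifier. This is a minimal illustration only: the two numeric features (size in cm and a colour score) and all their values are hypothetical, and the 1-nearest-neighbour rule used here is just one simple way a model can map inputs to labels.

```python
import math

# Hypothetical labelled training data: (size_cm, colour_score) -> fruit label.
training_data = [
    ((8.0, 0.9), "Apple"),
    ((18.0, 0.2), "Banana"),
    ((10.0, 0.6), "Mango"),
    ((2.0, 0.4), "Grapes"),
]

def classify(features):
    """Predict the label of the nearest labelled training example (1-NN)."""
    _, label = min(training_data,
                   key=lambda item: math.dist(item[0], features))
    return label

# A new, unseen fruit close to the apple example is labelled "Apple".
print(classify((7.5, 0.85)))
```

After "training" (here, simply storing the labelled examples), the model answers for new inputs it has never seen, which is exactly the train-then-test flow described above.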
OPERATION OF SUPERVISED LEARNING
In supervised learning, models are trained using a labelled dataset. The model learns
about each type of data. Once the training step is completed, the model is tested to check
whether it predicts the correct output.
Step 1. Training Step
Let us consider a dataset of different object shapes, including triangles,
squares, hexagons, etc. Our first step is to train the model on each object shape:
If the shape has three equal sides, it is labelled a "Triangle".
If the shape has four equal sides, it is labelled a "Square".
If the shape has six equal sides, it is labelled a "Hexagon".
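The three labelling rules above can be written directly as a lookup from the number of equal sides to a label. This is only a sketch of the training labels themselves, not of a learning algorithm:

```python
# Labels the training step assigns, keyed by number of equal sides.
SHAPE_LABELS = {3: "Triangle", 4: "Square", 6: "Hexagon"}

def label_shape(num_sides):
    """Return the label for a shape, or 'Unknown' for unseen side counts."""
    return SHAPE_LABELS.get(num_sides, "Unknown")

print(label_shape(4))  # Square
```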
Supervised learning can be divided into the following two types: classification and
regression.
CLASSIFICATION (DISCRETE SENSE)
In classification problems, the output data is in the form of categories (categorical data).
Classification algorithms are therefore used when the output falls into classes, e.g.,
Yes-No, Male-Female, True-False, etc.
The task of a classification algorithm is to map the input (X) to a discrete output (Y)
variable.
2. Unsupervised Learning :- In supervised learning, both input data (x) and output data (y) are provided to
train the model, but in unsupervised learning only input data is given to the model. In unsupervised learning,
the models are trained using unlabelled data: only the input data (x) is provided, and the goal is to find the
hidden patterns in it. Unsupervised learning methods are suitable when the output variables (i.e., the labels)
are not available.
Examples include clustering tasks such as customer segmentation, product recommendation, and friend
suggestions on social media platforms.
TYPES OF CLUSTERING
1. Centroid-based (partitioning) clustering
2. Density-based clustering
3. Hierarchical (connectivity-based) clustering
4. K-means clustering algorithm (a centroid-based method)
Clustering is the task of partitioning the data set into groups, called clusters.
The goal is to split the data in such a way that points within a single cluster are very
similar to each other.
3.5.1 Centroid Based Clustering (Partitioning)
In this type of clustering, the data is divided into non-hierarchical groups. It is also
called partitioning clustering. The most common example of centroid-based clustering is
the K-means clustering algorithm.
In K-means clustering, the data set is divided into a set of K groups. Centroid-based
algorithms are efficient but sensitive to initial conditions and outliers.
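The K-means idea can be sketched on one-dimensional data; this is a minimal illustration with K = 2 and made-up points, not a production implementation (in practice one would use a library such as scikit-learn):

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-means on 1-D data: repeatedly assign each point to its
    nearest centroid, then move each centroid to the mean of its cluster."""
    for _ in range(iterations):
        # Assignment step: group points by nearest centroid.
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        # Update step: each centroid moves to its cluster's mean.
        centroids = [sum(members) / len(members) if members else c
                     for c, members in clusters.items()]
    return sorted(centroids)

# Two obvious groups around 2 and 10; the initial centroids are a rough guess,
# illustrating the sensitivity to initial conditions mentioned above.
print(kmeans_1d([1.0, 2.0, 3.0, 9.0, 10.0, 11.0], centroids=[0.0, 5.0]))
```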
The confusion matrix is a matrix used to determine the performance of the classification models for a given
set of test data. It can only be determined if the true values for test data are known. The matrix itself can be
easily understood, but the related terminology may be confusing. Since it shows the errors in the model
performance in the form of a matrix, it is also known as an error matrix. Some features of the confusion
matrix are given below:
o For 2 prediction classes, the matrix is a 2×2 table; for 3 classes, a 3×3 table;
and so on.
o The matrix is divided into two dimensions, that are predicted values and actual values along with
the total number of predictions.
o Predicted values are those values, which are predicted by the model, and actual values are the true
values for the given observations.
o It looks like the following table:

                     Actual: Yes        Actual: No
  Predicted: Yes     True Positive      False Positive
  Predicted: No      False Negative     True Negative
o True Negative: The model predicted No, and the actual value was also No.
o True Positive: The model predicted Yes, and the actual value was also Yes.
o False Negative: The model predicted No, but the actual value was Yes. It is also called
a Type-II error.
o False Positive: The model predicted Yes, but the actual value was No. It is also called a Type-
I error.
Suppose we are trying to create a model that predicts whether or not a person has a particular disease. The
confusion matrix records how many patients the model classified correctly and incorrectly into the
"disease" and "no disease" classes.
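With hypothetical actual and predicted labels for the disease example (1 = has the disease, 0 = healthy), the four cells of the confusion matrix can be counted directly:

```python
# Hypothetical test-set labels for the disease example.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]  # ground truth (1 = disease)
predicted = [1, 0, 1, 0, 1, 0, 0, 1]  # model output

# Count each of the four confusion-matrix cells.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```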
o Classification Accuracy: It is one of the important parameters to determine the accuracy of
classification problems. It defines how often the model predicts the correct output, and is
calculated as the ratio of the number of correct predictions made by the classifier to the total
number of predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
o Misclassification rate: Also termed the error rate, it defines how often the model gives
wrong predictions. It is calculated as the ratio of the number of incorrect predictions to the
total number of predictions made by the classifier:
Error rate = (FP + FN) / (TP + TN + FP + FN)
o Precision: Out of all the cases the model predicted as positive, how many were actually
positive:
Precision = TP / (TP + FP)
o Recall: Out of all the actual positive cases, how many the model predicted correctly. The
recall should be as high as possible:
Recall = TP / (TP + FN)
o F-measure: If one model has low precision and high recall, or vice versa, it is difficult to
compare the models. For this purpose, we can use the F-score, which evaluates recall and
precision at the same time. The F-score is maximum when recall equals precision:
F-measure = (2 × Precision × Recall) / (Precision + Recall)
o Null Error rate: It defines how often our model would be incorrect if it always predicted the
majority class. As per the accuracy paradox, it is said that "the best classifier has a higher error rate
than the null error rate."
o ROC Curve: The ROC is a graph displaying a classifier's performance for all possible thresholds.
The graph is plotted between the true positive rate (on the Y-axis) and the false Positive rate (on
the x-axis).
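The metrics above can be computed directly from the four confusion-matrix counts; the counts used here are made-up example values:

```python
# Made-up confusion-matrix counts for illustration.
tp, tn, fp, fn = 3, 3, 1, 1
total = tp + tn + fp + fn

accuracy   = (tp + tn) / total          # how often the model is right
error_rate = (fp + fn) / total          # how often the model is wrong
precision  = tp / (tp + fp)             # predicted positives that are real
recall     = tp / (tp + fn)             # real positives that are found
f_measure  = 2 * precision * recall / (precision + recall)

print(accuracy, error_rate, precision, recall, f_measure)
```

Note that accuracy + error rate always equals 1, and here precision equals recall, so the F-measure equals both of them.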
o 1834: In 1834, Charles Babbage, the father of the computer, conceived a device that could be
programmed with punch cards. Although the machine was never built, all modern computers
rely on its logical structure.
o 1936: In 1936, Alan Turing gave a theory of how a machine can determine and execute a set of
instructions.
o 1940s: ENIAC, the first electronic general-purpose computer, was built (completed in 1945).
After that, stored-program computers such as EDSAC in 1949 and EDVAC in 1951 were
invented.
o 1943: In 1943, a neural network was modelled with an electrical circuit (the McCulloch–Pitts
neuron model). Around 1950, scientists started applying the idea and analysing how human
neurons might work.
o 1950: In 1950, Alan Turing published a seminal paper, "Computing Machinery and Intelligence,"
on the topic of artificial intelligence. In this paper, he asked, "Can machines think?"
o 1952: Arthur Samuel, a pioneer of machine learning, created a program that helped an
IBM computer play checkers. The more it played, the better it performed.
o 1959: In 1959, the term "Machine Learning" was first coined by Arthur Samuel.
o The period from 1974 to 1980 was a tough time for AI and ML researchers; this period is
called the AI winter.
o During this period, machine translation failed, people lost interest in AI, and government
funding for research was reduced.
o 1959: In 1959, the first neural network was applied to a real-world problem to remove echoes over
phone lines using an adaptive filter.
o 1985: In 1985, Terry Sejnowski and Charles Rosenberg invented a neural network NETtalk, which
was able to teach itself how to correctly pronounce 20,000 words in one week.
o 1997: IBM's Deep Blue won a chess match against the world chess champion Garry Kasparov,
becoming the first computer to beat a reigning human world chess champion.
o 2006: In 2006, computer scientist Geoffrey Hinton gave neural-network research the new name
"deep learning," and it has since become one of the most trending technologies.
o 2012: In 2012, Google created a deep neural network which learned to recognize the image of
humans and cats in YouTube videos.
o 2014: In 2014, the chatbot "Eugene Goostman" passed a Turing Test event. It was the first
chatbot to convince 33% of the human judges that it was not a machine.
o 2014: DeepFace was a deep neural network created by Facebook, and they claimed that it could
recognize a person with the same precision as a human can do.
o 2016: AlphaGo beat the world's second-ranked Go player, Lee Sedol. In 2017, it beat the
top-ranked player of the game, Ke Jie.
o 2017: In 2017, Alphabet's Jigsaw team built an intelligent system that was able to learn to
detect online trolling. It read millions of comments from different websites in order to learn
to stop online trolling.
Machine learning research has now advanced greatly, and ML is present everywhere around us,
such as in self-driving cars, Amazon Alexa, chatbots, recommender systems, and many more. It
includes supervised, unsupervised, and reinforcement learning, with clustering, classification, decision
tree, SVM algorithms, etc.
Modern machine learning models can be used for making various predictions, including weather
prediction, disease prediction, stock market analysis, etc.