
Machine Learning Notes

Definition:
Machine learning (ML) is a branch of artificial intelligence (AI) and computer science that focuses on using data
and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy.
Example/Application
Image Recognition: It is used to identify objects, persons, places, digital images, etc. Facebook, for example,
provides an automatic friend-tagging suggestion. Whenever we upload a photo with our Facebook friends, we
automatically get a tagging suggestion with their names. The technology behind this is machine learning's face
detection and recognition algorithm.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:

 Supervised learning
 Unsupervised learning
 Reinforcement learning
1. Supervised learning:
 Supervised learning is a type of machine learning in which machines are trained using well "labelled"
training data and, on the basis of that data, predict the output.
 Supervised learning is a process of providing input data as well as correct output data to the machine
learning model. The aim of a supervised learning algorithm is to find a mapping function to map the input
variable(x) with the output variable(y).
Types of supervised Machine learning Algorithms:
Supervised learning can be further divided into two types of problems:

1. Regression
Regression algorithms are used when the output variable is continuous and depends on the input variables. They
predict continuous values, for example in weather forecasting or market trend analysis. Below are some
popular Regression algorithms which come under supervised learning:
 Linear Regression
 Regression Trees
 Non-Linear Regression
 Bayesian Linear Regression
 Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means it takes one of two or more
classes, such as Yes-No, Male-Female, True-False, etc. Popular classification algorithms include:
 Random Forest, Decision Trees, Logistic Regression, Support Vector Machines
Unsupervised Learning
Unsupervised learning is a type of machine learning in which models are trained using an unlabelled dataset and are
allowed to act on that data without any supervision.
Note:
Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike
supervised learning, we have the input data but no corresponding output data. The goal of unsupervised
learning is to find the underlying structure of the dataset, group the data according to similarities, and represent
the dataset in a compressed format.

Unsupervised Learning algorithms:


o K-means clustering
Note: KNN (k-nearest neighbors) is a supervised algorithm, not an unsupervised one, because it needs labelled
training data.
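The grouping goal described above can be sketched with K-means. This is a minimal illustration, not a production implementation: the function name `kmeans`, the toy points, and the choice to start from the first k points are all assumptions of this sketch (real libraries usually pick starting centroids at random or with a smarter scheme).

```python
def kmeans(points, k, iterations=20):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    # Simplification for this sketch: use the first k points as starting centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# Made-up, unlabelled data: two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
```

No labels are supplied anywhere; the two groups emerge only from the distances between points.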
What is Reinforcement Learning?
o Reinforcement learning is a type of machine learning method where an intelligent agent (computer
program) interacts with the environment and learns to act within it.
o RL solves a specific type of problem where decision making is sequential and the goal is long-term, such
as game-playing, robotics, etc.
o Since there is no labelled data, the agent has to learn from its own experience, i.e. from the rewards and
penalties it receives for its actions.
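As a rough illustration of learning from experience, the sketch below uses tabular Q-learning, one common RL method that these notes do not name. Everything in it is an assumption made for the example: a five-cell corridor as the environment, a reward of 1 for reaching the last cell, and the parameter values.

```python
import random

N_STATES = 5            # corridor cells 0..4; reaching cell 4 ends an episode
MOVES = (-1, +1)        # action 0 = step left, action 1 = step right

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values q[state][action] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    goal = N_STATES - 1
    for _episode in range(episodes):
        state = 0
        for _step in range(200):  # cap on episode length
            explore = rng.random() < epsilon
            if explore or q[state][0] == q[state][1]:
                action = rng.randrange(2)      # explore, or break a tie at random
            else:
                action = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state = min(max(state + MOVES[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Long-term goal: value the best action available from the next state.
            future = 0.0 if next_state == goal else gamma * max(q[next_state])
            q[state][action] += alpha * (reward + future - q[state][action])
            state = next_state
            if state == goal:
                break
    return q

q_table = train()
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
```

The agent is never told that moving right is correct. It only sees a reward at the end of the corridor, and the preference for "right" spreads backwards through the table over many episodes.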
AI VS ML
AI is the broader concept of creating intelligent machines that can simulate human thinking capability and behavior,
whereas machine learning is an application or subset of AI that allows machines to learn from data without being
programmed explicitly.

AI VS ML VS DL:
AI simulates human intelligence to perform tasks and make decisions.
ML is a subset of AI that uses algorithms to learn patterns from data.
DL is a subset of ML that employs artificial neural networks for complex tasks.

1. Regression
 Regression analysis is a statistical method to model the relationship between a dependent
(target) variable and one or more independent (predictor) variables.
 It predicts continuous/real values such as temperature, age, salary, price, etc.

 Dependent Variable: The main factor in Regression analysis which we want to predict or
understand is called the dependent variable. It is also called target variable.
 Independent Variable: The factors which affect the dependent variable, or which are used to
predict its values, are called independent variables. They are also called
predictors.
 Outliers: An outlier is an observation with a very low or very high value in
comparison to the other observed values. Outliers can distort the fitted model, so they should be
identified and handled (for example checked, corrected, or removed) before training.
 Underfitting and Overfitting: If our algorithm works well with the training dataset but not
with the test dataset, the problem is called overfitting. If our algorithm does not perform
well even with the training dataset, the problem is called underfitting.

Types of Regression
 Linear Regression
 Logistic Regression (despite its name, it is used for classification problems)
Linear Regression in Machine Learning
It is a statistical method that is used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name linear regression.

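Equation: $y = a_0 + a_1 x + \varepsilon$, also written as $y = \phi_0 + \phi_1 x_1$
Here $a_0$ is the intercept, $a_1$ is the slope (the coefficient of x), and $\varepsilon$ is the random error.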
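To make the equation concrete, a0 and a1 can be estimated from data by ordinary least squares. The sketch below uses the standard closed-form formulas; the function name `fit_line` and the data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a0, a1) minimising the squared error of y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]   # made-up values, roughly y = 1 + 2x
a0, a1 = fit_line(xs, ys)
prediction = a0 + a1 * 6.0        # predicted y for a new x = 6
```

The difference between each observed y and the fitted value a0 + a1*x plays the role of the error term ε.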


Types of Linear Regression
Linear regression can be further divided into two types:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical dependent variable, then such
a Linear Regression algorithm is called Simple Linear Regression.
o Multiple Linear regression:
If more than one independent variable is used to predict the value of a numerical dependent variable,
then such a Linear Regression algorithm is called Multiple Linear Regression.
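Linear Regression Line
A regression line can show a positive linear relationship (y increases as x increases) or a negative linear
relationship (y decreases as x increases).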
Multiple Linear Regression
Multiple Linear Regression is one of the important regression algorithms which models the linear
relationship between a single dependent continuous variable and more than one independent variable.
Example:
Prediction of CO2 emission based on engine size and number of cylinders in a car.
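A sketch of how such a model could be fitted, assuming the least-squares normal equations solved by Gaussian elimination. The car data is invented: the CO2 values are generated from a known rule (50 + 30*engine + 10*cylinders) so that the recovered coefficients can be checked. The helper names `solve` and `fit_multiple` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple(rows, ys):
    """Least-squares coefficients [a0, a1, ..., ap] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]           # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)

# Invented cars: (engine size in litres, number of cylinders).
cars = [(1.0, 3), (1.6, 4), (2.0, 4), (2.5, 6), (3.0, 6), (4.0, 8)]
co2 = [50 + 30 * engine + 10 * cyl for engine, cyl in cars]  # generated, not measured
a0, a1, a2 = fit_multiple(cars, co2)
```

Because the data follows the rule exactly, the fit recovers a0 ≈ 50, a1 ≈ 30 and a2 ≈ 10. Real measurements would include noise and give approximate coefficients.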

Evaluating a Regression model:


 Mean/Median of prediction
 Standard Deviation of prediction
 Mean Squared Error (MSE)
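These quantities are simple to compute for a set of numeric predictions. A minimal sketch using Python's standard `statistics` module; the actual and predicted values are made up, and the population standard deviation (`pstdev`) is one possible choice here.

```python
import statistics

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 10.0]          # made-up model outputs

mse = mean_squared_error(actual, predicted)
pred_mean = statistics.mean(predicted)
pred_median = statistics.median(predicted)
pred_std = statistics.pstdev(predicted)    # population standard deviation
```

A lower MSE means the predictions are closer to the actual values. The mean, median, and standard deviation describe the predictions themselves and are usually compared against the same statistics of the actual values.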
Classification Algorithm in Machine Learning
As we know, supervised machine learning algorithms can be broadly classified into Regression and Classification
algorithms. Regression algorithms predict continuous output values, but to predict
categorical values, we need Classification algorithms.
Unlike regression, the output variable of Classification is a category, not a number, such as "Green or Blue", "fruit or
animal", etc.
A well-known example of an ML classification algorithm is an email spam detector.

Types:
o Binary Classifier: If the classification problem has only two possible outcomes, then it is called a Binary
Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a Multi-class
Classifier.
Example: Classification of types of crops, classification of types of music.
Learners in Classification Problems:
In the classification problems, there are two types of learners:
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset.
Classification is then done on the basis of the most closely related data stored in the training dataset.
It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a training dataset before receiving
a test dataset. In contrast to lazy learners, an eager learner takes more time in learning and less time in
prediction. Example: Decision Trees, Naïve Bayes, ANN.
Types of ML Classification Algorithms:
Linear Models
o Logistic Regression
Non-linear Models
o K-Nearest Neighbours
Evaluating a Classification model:
Confusion Matrix, AUC-ROC curve
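A confusion matrix for a binary classifier counts the four possible combinations of actual and predicted class. The sketch below computes it by hand, along with accuracy, precision, and recall; the labels are made up and the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification result."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted positive, actually positive
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        elif a == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

actual    = [1, 1, 1, 0, 0, 0, 1, 0]   # made-up true labels (1 = spam)
predicted = [1, 0, 1, 0, 1, 0, 1, 0]   # made-up classifier output

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found
```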
Logistic Regression in Machine Learning
Logistic regression predicts the output of a categorical dependent variable, so the outcome must be
a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the
exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.

o Logistic Regression is very similar to Linear Regression except in how the two are used. Linear
Regression is used for solving regression problems, whereas Logistic Regression is used for solving
classification problems.
Logistic Function (Sigmoid Function):
o The output of logistic regression always lies between 0 and 1 and cannot go beyond these limits, so it forms
an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.
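A small sketch of the sigmoid function and of how its output can be turned into a class. The coefficients a0 and a1 and the 0.5 threshold are hypothetical values chosen for illustration, not fitted from data.

```python
import math

def sigmoid(z):
    """Logistic function; the two branches avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_class(x, a0, a1, threshold=0.5):
    """Return (probability, class) for one input, given assumed coefficients."""
    p = sigmoid(a0 + a1 * x)
    return p, int(p >= threshold)

# Hypothetical coefficients: the linear part is z = -4 + 2 * 3 = 2.
prob, label = predict_class(x=3.0, a0=-4.0, a1=2.0)
```

Whatever the value of z, the result stays between 0 and 1, and z = 0 maps exactly to 0.5, the midpoint of the S-curve.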
Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variables should not have multicollinearity, i.e. they should not be highly correlated with
each other.
Logistic Regression Equation:
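A sketch of the standard form, written with the same coefficient names a0, a1, ... used for linear regression earlier in these notes. The symbols p (predicted probability of the positive class) and z (the linear part) are notation assumed here.

```latex
\begin{align}
z &= a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n \\
p &= \frac{1}{1 + e^{-z}} \\
\log\!\left(\frac{p}{1 - p}\right) &= a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n
\end{align}
```

The second line is the sigmoid function applied to the linear part. The third line follows by solving the second for z, and shows that the log-odds of the positive class are a linear function of the inputs.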

3. K-Nearest Neighbor (KNN) Algorithm for Machine Learning

o K-Nearest Neighbour is one of the simplest machine learning algorithms, based on the supervised learning
technique.
o The K-NN algorithm stores all the available data and classifies a new data point based on its similarity to
the stored data. This means that when new data appears, it can be easily classified into a well-suited category
by using the K-NN algorithm.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead
it stores the dataset and, at the time of classification, performs an action on the dataset.
o In the training phase the KNN algorithm just stores the dataset; when it gets new data, it classifies that
data into the category that is most similar to the new data.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these
categories does this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of
K-NN, we can easily identify the category or class of a particular data point.

How does K-NN work?


The working of K-NN can be explained with the following algorithm:
o Step-1: Select the number K of neighbors.
o Step-2: Calculate the Euclidean distance from the new data point to each point in the training data.
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these K neighbors, count the number of data points in each category.
o Step-5: Assign the new data point to the category with the largest number of neighbors.
o Step-6: Our model is ready.
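The steps above can be sketched in a few lines of Python. This is a minimal illustration on hypothetical toy data, with the function name `knn_predict` chosen for the example; it uses `math.dist` (Python 3.8+) for the Euclidean distance.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k closest points.
    nearest = sorted(distances)[:k]
    # Steps 4-5: count the labels among the neighbors and pick the most common.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated categories.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

label_near_a = knn_predict(train_points, train_labels, (2, 2), k=3)
label_near_b = knn_predict(train_points, train_labels, (6, 5), k=3)
```

Note that there is no separate training function: the "model" is just the stored data, which is why K-NN is called a lazy learner.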

How to select the value of K in the K-NN Algorithm?


Below are some points to remember while selecting the value of K in the K-NN algorithm:
o There is no particular way to determine the best value for "K", so we need to try some values to find the
best out of them. A commonly used starting value for K is 5.
o A very low value for K, such as K=1 or K=2, can be noisy and make the model sensitive to outliers.
o Larger values of K reduce the effect of noise, but they cost more computation and, if K is too large, can blur
the boundaries between categories.
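One way to "try some values" is to score each candidate K by leave-one-out accuracy: predict every point from all the others and count how often the prediction is right. The sketch below assumes this scoring scheme and a made-up dataset that contains one deliberately mislabelled point inside category A.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted((math.dist(p, query), lab) for p, lab in zip(points, labels))[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(points, labels, k):
    """Predict each point from all the others; return the fraction correct."""
    correct = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, point, k) == label:
            correct += 1
    return correct / len(points)

# Made-up data: four A points, four B points, and one noisy "B" inside the A group.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (6, 6), (6, 7), (7, 6), (7, 7),
          (1.5, 1.5)]
labels = ["A"] * 4 + ["B"] * 4 + ["B"]

scores = {k: leave_one_out_accuracy(points, labels, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)   # first K with the highest score
```

On this toy data K=1 is misled by the noisy point and K=7 is too large for such a small dataset (both score 4/9), while K=3 and K=5 score 8/9. The exact numbers depend entirely on the invented data.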
Advantages of KNN Algorithm:
o It is simple to implement.
o It is robust to noisy training data, provided K is not too small.
o It can be more effective if the training data is large.
