ML Notes
Definition:
Machine learning (ML) is a branch of artificial intelligence (AI) and computer science that focuses on using data
and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy.
Example/Application
Image Recognition: It is used to identify objects, persons, places, etc. in digital images. Facebook provides a feature
that suggests friend tags automatically. Whenever we upload a photo with our Facebook friends, we automatically
get a tagging suggestion with their names; the technology behind this is machine learning's face detection and
recognition algorithm.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
Supervised learning
Unsupervised learning
Reinforcement learning
1. Supervised learning:
Supervised learning is the type of machine learning in which machines are trained using well-labelled
training data, and on the basis of that data, machines predict the output.
Supervised learning is a process of providing input data as well as correct output data to the machine
learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input
variable (x) to the output variable (y).
Types of supervised Machine learning Algorithms:
Supervised learning can be further divided into two types of problems:
1. Regression
Regression algorithms are used when there is a relationship between the input variable and the output variable and
the output is a continuous value, as in weather forecasting, market trends, etc. Below are some
popular regression algorithms which come under supervised learning:
Linear Regression
Regression Trees
Non-Linear Regression
Bayesian Linear Regression
Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means it takes one of a fixed set of
classes, such as Yes-No, Male-Female, True-False, etc. Popular classification algorithms include:
Random Forest, Decision Trees, Logistic Regression, Support Vector Machines
Unsupervised Learning
Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are
allowed to act on that data without any supervision.
Note:
Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike
supervised learning, we have the input data but no corresponding output data. The goal of unsupervised
learning is to find the underlying structure of the dataset, group the data according to similarities, and represent
the dataset in a compressed format.
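The idea of grouping unlabeled data by similarity can be illustrated with a minimal k-means sketch on one-dimensional points. The function name kmeans_1d, the sample points, and the starting centroids are assumptions made for illustration; k-means is only one of several clustering methods and is not named in these notes.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Group 1-D points around the given starting centroids (hypothetical helper)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, clusters


# No labels are supplied: the two groups are found from the values alone.
points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 9.5]
print(clusters)   # [[1.0, 1.5, 2.0], [9.0, 9.5, 10.0]]
```

The two centroids also act as a compressed summary of the six points, which matches the "compressed format" goal mentioned in the note.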
AI vs ML vs DL:
AI simulates human intelligence to perform tasks and make decisions.
ML is a subset of AI that uses algorithms to learn patterns from data.
DL is a subset of ML that employs artificial neural networks for complex tasks.
1. Regression
Regression analysis is a statistical method to model the relationship between a dependent
(target) variable and one or more independent (predictor) variables.
It predicts continuous/real values such as temperature, age, salary, price, etc.
Dependent Variable: The main factor in regression analysis which we want to predict or
understand is called the dependent variable. It is also called the target variable.
Independent Variable: The factors which affect the dependent variable, or which are used to
predict its values, are called independent variables, also called
predictors.
Outliers: An outlier is an observation which contains either a very low or a very high value in
comparison to other observed values. An outlier may distort the result, so it should be detected and handled before fitting.
Underfitting and Overfitting: If our algorithm works well with the training dataset but not well
with the test dataset, the problem is called overfitting. If our algorithm does not perform
well even with the training dataset, the problem is called underfitting.
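Of the terms above, the outlier definition is easy to make concrete. The sketch below flags values that fall far outside the middle of the data using the common 1.5 x IQR rule; that rule, the function name find_outliers, and the sample values are assumptions for illustration and are not taken from these notes.

```python
import statistics


def find_outliers(values):
    """Return values lying beyond 1.5 * IQR from the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical observations: one value is far higher than the rest.
observations = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(observations))  # [95]
```

Whether a flagged value should be removed, corrected, or kept depends on the data; the check only points at candidates.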
Types of Regression
Linear Regression
Logistic Regression
Linear Regression in Machine Learning
It is a statistical method that is used for predictive analysis. Linear regression makes predictions for
continuous/real or numeric variables such as sales, salary, age, product price, etc.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more
independent variables (x), hence the name linear regression.
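As a minimal sketch of the linear relationship described above, the following fits a straight line y = slope * x + intercept to a few points with ordinary least squares. The function name fit_line and the sample data are assumptions for illustration, and only the single-variable case is shown.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
prediction = slope * 5 + intercept  # predict y for an unseen x = 5
print(slope, intercept, prediction)  # 2.0 1.0 11.0
```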
Types of Classifiers:
o Binary Classifier: If the classification problem has only two possible outcomes, it is called a Binary
Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
o Multi-class Classifier: If a classification problem has more than two outcomes, it is called a Multi-class
Classifier.
Examples: Classification of types of crops, classification of types of music.
Learners in Classification Problems:
In the classification problems, there are two types of learners:
1. Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset. In
the lazy learner case, classification is done on the basis of the most closely related data stored in the training dataset.
It takes less time in training but more time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners: Eager learners develop a classification model based on a training dataset before receiving
a test dataset. In contrast to lazy learners, an eager learner takes more time in learning and less time in
prediction. Example: Decision Trees, Naïve Bayes, ANN.
Types of ML Classification Algorithms:
Linear Models
o Logistic Regression
Non-linear Models
o K-Nearest Neighbours
Evaluating a Classification model:
Confusion Matrix, AUC-ROC curve
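A confusion matrix for a binary classifier simply counts how the predicted labels line up with the actual ones. The sketch below is a plain-Python illustration; the function name confusion_matrix, the dictionary keys, and the label lists are assumptions made for this example, and the AUC-ROC curve is not covered.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


# Made-up labels: 1 is the positive class, 0 the negative class.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)
accuracy = (matrix["tp"] + matrix["tn"]) / len(actual)
print(matrix)    # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(accuracy)  # 0.75
```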
Logistic Regression in Machine Learning
Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be
a categorical or discrete value. It can be either Yes or No, 0 or 1, True or False, etc., but instead of giving the
exact value as 0 or 1, it gives probabilistic values which lie between 0 and 1.
o Logistic Regression is very similar to Linear Regression except in how the two are used. Linear
Regression is used for solving regression problems, whereas Logistic Regression is used for solving
classification problems.
Logistic Function (Sigmoid Function):
o The output of logistic regression must lie between 0 and 1 and cannot go beyond these limits, so it forms
an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.
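The squashing behaviour of the sigmoid function can be checked with a few lines of Python. This is a minimal sketch; the function name sigmoid and the branch on the sign of z (used only to avoid overflow for large negative inputs) are choices made for this example.

```python
import math


def sigmoid(z):
    """Map any real number into the range 0..1 along an S-shaped curve."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite the formula so exp() never overflows.
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 4))
```

Large negative inputs land near 0, large positive inputs near 1, and an input of 0 gives exactly 0.5, which is why 0.5 is a common threshold for turning the probability into a class.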
Assumptions for Logistic Regression:
o The dependent variable must be categorical in nature.
o The independent variables should not exhibit multicollinearity, i.e. they should not be strongly correlated with one another.
Logistic Regression Equation:
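In its standard textbook form, logistic regression passes a linear combination of the inputs through the sigmoid function. The symbols are notation assumed here rather than taken from these notes: y is the predicted probability of the positive class, x_1 ... x_n are the independent variables, and b_0 ... b_n are the learned coefficients.

```latex
y = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n)}}
\quad\Longleftrightarrow\quad
\log\!\left(\frac{y}{1 - y}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n
```

The right-hand form shows that the log-odds of the outcome are linear in the inputs, which is the link to linear regression.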
K-Nearest Neighbour (K-NN) Algorithm
o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on the Supervised Learning
technique.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This
means that when new data appears, it can be easily classified into a well-suited category by using the K-NN
algorithm.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead
it stores the dataset and, at the time of classification, performs an action on the dataset.
o At the training phase the K-NN algorithm just stores the dataset, and when it gets new data, it classifies that
data into the category that is most similar to the new data.
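The store-then-compare behaviour described above can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and a majority vote among the k nearest neighbours; the function name knn_classify, the sample points, and the choice k=3 are assumptions for this example.

```python
import math
from collections import Counter


def knn_classify(training_data, new_point, k=3):
    """Classify new_point by majority vote among its k nearest stored points."""
    # "Training" is just keeping training_data around; the work happens here.
    by_distance = sorted(
        training_data,
        key=lambda item: math.dist(item[0], new_point),  # Euclidean distance
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up labelled points: (features, category).
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

x1 = (2.0, 2.0)
print(knn_classify(training_data, x1, k=3))  # A
```

Because every prediction sorts the whole stored dataset, prediction cost grows with the amount of training data, which is the trade-off lazy learners make.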
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these
categories does this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of
K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below: