
The linear regression algorithm models a linear relationship between a
dependent variable (y) and one or more independent variables (x), hence
the name linear regression. Because it models a linear relationship,
linear regression finds how the value of the dependent variable changes
according to the value of the independent variable.

The linear regression model provides a sloped straight line
representing the relationship between the variables.

Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε
Here,

y = Dependent Variable (Target Variable)

x = Independent Variable (Predictor Variable)

a0 = Intercept of the line (gives an additional degree of freedom)

a1 = Linear regression coefficient (scale factor applied to each input value)

ε = Random error
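
As a quick illustration, the sketch below fits this equation with scikit-learn (an assumed library choice; the data values are made up) and recovers a0 and a1:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: one independent variable x and a noisy dependent variable y.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression().fit(X, y)
print("intercept a0:", model.intercept_)    # intercept of the line
print("coefficient a1:", model.coef_[0])    # scale factor on x
print("prediction at x = 6:", model.predict([[6.0]])[0])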

Types of Linear Regression
Linear regression can be further divided into two types of algorithm:

Simple Linear Regression:
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm
is called Simple Linear Regression.

Multiple Linear Regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm
is called Multiple Linear Regression.
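
As a hedged sketch, the same scikit-learn API handles multiple independent variables; the two-feature data below is made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: two independent variables per sample.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.0, 4.1, 11.2, 10.0, 15.1])

model = LinearRegression().fit(X, y)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)  # one coefficient per independent variable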

Linear Regression Line
A straight line showing the relationship between the dependent and
independent variables is called a regression line. A regression line can
show two types of relationship: a positive linear relationship, where the
dependent variable increases as the independent variable increases, and a
negative linear relationship, where the dependent variable decreases as
the independent variable increases.

Explain the K-Nearest Neighbour algorithm?


>>K-Nearest Neighbour is one of the simplest Machine Learning algorithms
based on the Supervised Learning technique.
>>The K-NN algorithm assumes similarity between the new case/data and the
available cases and puts the new case into the category that is most similar to
the available categories.
>>The K-NN algorithm stores all the available data and classifies a new data point
based on similarity. This means that when new data appears, it can be
easily classified into a well-suited category by using the K-NN algorithm.
>>The K-NN algorithm can be used for Regression as well as for Classification, but
it is mostly used for Classification problems.
>>K-NN is a non-parametric algorithm, which means it does not make any
assumption about the underlying data.
>>It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead, it stores the dataset and, at the time of
classification, performs an action on the dataset.
>>At the training phase, the KNN algorithm just stores the dataset, and when it
gets new data, it classifies that data into the category that is most similar to
the new data.
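
A minimal KNN classification sketch, assuming scikit-learn and its bundled iris dataset (both are illustrative choices, not part of the notes):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# As a lazy learner, fit() essentially stores the training data;
# distances to neighbours are computed at prediction time.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))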

How does Random Forest algorithm work?


Random Forest works in two phases: the first is to create the random
forest by combining N decision trees, and the second is to make
predictions using each tree created in the first phase.

The working process can be explained in the steps below, followed by a
short code sketch:

Step-1: Select K random data points from the training set.

Step-2: Build the decision trees associated with the selected data
points (subsets).

Step-3: Choose the number N of decision trees that you want to build.

Step-4: Repeat Steps 1 & 2.

Step-5: For new data points, find the predictions of each decision tree,
and assign the new data points to the category that wins the majority of
votes.
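
A hedged sketch of these steps with scikit-learn's RandomForestClassifier (the dataset and hyperparameter values are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is the number N of decision trees; each tree is built on a
# random bootstrap subset of the training data (Steps 1-4 above).
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Step 5: each tree predicts, and the majority vote decides the category.
print("predicted categories:", forest.predict(X_test[:5]))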

Advantages of Random Forest


Random Forest is capable of performing both Classification and
Regression tasks.

It is capable of handling large datasets with high dimensionality.

It enhances the accuracy of the model and prevents the overfitting
issue.

Disadvantages of Random Forest

Although random forest can be used for both classification and
regression tasks, it is less suitable for regression tasks.

Linear Discriminant Analysis (LDA) in Machine Learning
Linear Discriminant Analysis (LDA) is one of the commonly used
dimensionality reduction techniques in machine learning, used to solve
classification problems with more than two classes. It is also known as
Normal Discriminant Analysis (NDA) or Discriminant Function Analysis (DFA).

It can be used to project the features of a higher-dimensional space
into a lower-dimensional space in order to reduce resource and
dimensional costs. In this topic, "Linear Discriminant Analysis (LDA) in
machine learning", we will discuss the LDA algorithm for classification
predictive modeling problems, the limitations of logistic regression,
the representation of the linear discriminant analysis model, how to make a
prediction using LDA, how to prepare data for LDA, extensions to LDA,
and much more. So, let's start with a quick introduction to Linear
Discriminant Analysis (LDA) in machine learning.
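
As a brief sketch, scikit-learn's LinearDiscriminantAnalysis can project the 4-feature iris data down to 2 discriminant components (the dataset choice is an assumption for illustration):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# For a 3-class problem, LDA yields at most (classes - 1) = 2 components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)
print("original shape:", X.shape)         # (150, 4)
print("reduced shape:", X_reduced.shape)  # (150, 2)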

What is Dimensionality Reduction?


The number of input features, variables, or columns present in a given
dataset is known as its dimensionality, and the process of reducing these
features is called dimensionality reduction.

In many cases a dataset contains a huge number of input features, which
makes the predictive modeling task more complicated. Because it is very
difficult to visualize or make predictions for a training dataset with a
high number of features, dimensionality reduction techniques are
required in such cases.

Measuring the distance between two clusters

As we have seen, the closest distance between two clusters is crucial
for hierarchical clustering. There are various ways to calculate the
distance between two clusters, and these ways decide the rule for
clustering. These measures are called linkage methods. Some of the
popular linkage methods are given below, with a code sketch after the
list:

Single Linkage: The shortest distance between the closest points of the
two clusters.

Complete Linkage: The farthest distance between two points of two
different clusters. It is one of the popular linkage methods, as it
forms tighter clusters than single linkage.

Average Linkage: The linkage method in which the distance between each
pair of points, one from each cluster, is added up and then divided by
the total number of pairs to calculate the average distance between two
clusters. It is also one of the most popular linkage methods.

Centroid Linkage: The linkage method in which the distance between the
centroids of the two clusters is calculated.
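
The sketch below compares the four linkage methods with SciPy's hierarchical clustering (the library choice and toy points are assumptions):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy 2-D points forming two tight groups and one outlier.
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0]])

for method in ["single", "complete", "average", "centroid"]:
    Z = linkage(X, method=method)                    # merge history under this rule
    labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
    print(method, "->", labels)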

Bagging vs. Boosting

Bagging: Various training data subsets are randomly drawn with replacement from the whole training dataset.
Boosting: Each new subset contains the components that were misclassified by previous models.

Bagging: Attempts to tackle the overfitting issue.
Boosting: Tries to reduce bias.

Bagging: Every model receives an equal weight.
Boosting: Models are weighted by their performance.

Bagging: The objective is to decrease variance, not bias.
Boosting: The objective is to decrease bias, not variance.

Bagging: The easiest way of combining predictions that belong to the same type.
Boosting: A way of combining predictions that belong to different types.

Bagging: Every model is constructed independently.
Boosting: New models are affected by the performance of the previously developed models.
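
A side-by-side sketch of the two approaches with scikit-learn (the estimator choices and settings are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagging: independent trees on bootstrap samples, equal-weight vote.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0).fit(X_train, y_train)

# Boosting: models built sequentially, each focusing on the previous
# model's misclassified samples, and weighted by performance.
boosting = AdaBoostClassifier(n_estimators=50,
                              random_state=0).fit(X_train, y_train)

print("bagging accuracy:", bagging.score(X_test, y_test))
print("boosting accuracy:", boosting.score(X_test, y_test))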
