
ECO-598 - Open Elective

Artificial Intelligence and Machine Learning
Prof. Neelapala Anil Kumar,
Department of ECE,
ACED, Alliance University.
Syllabus

Module-3 Types of machine learning Algorithms


Supervised Learning:
Regression, Linear Regression, Polynomial Regression, Naïve Bayes,
Decision Trees, Logistic Regression, Support Vector Machines,
Random Forest, Ridge regression, Lasso Regression, KNN.
Unsupervised Learning
Clustering-K-Means, Association Rule Learning, Dimensionality
Reduction-PCA, SVD.
Reinforcement Learning:
Markov Decision, Monte Carlo Prediction.
Introduction
Supervised learning algorithms
Unsupervised learning algorithms
Reinforcement learning algorithms

To understand different types of:
o Supervised learning algorithms
o Unsupervised learning algorithms
o Reinforcement learning algorithms
Introduction

Online learning
Data becomes available in sequential order, and the best predictor for the future data is updated at each step, instead of learning on the entire training dataset at once to get the best predictor.
Working of Supervised Learning. (https://www.javatpoint.com/machine-learning)
Types of supervised Machine learning Algorithms
Regression
Regression algorithms are used when there is a relationship between the input variable and the output variable, for the prediction of continuous variables. Popular Regression algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
Classification
Classification algorithms are used when the output variable is categorical,
which means there are two classes such as Yes-No, Male-Female, True-
false, etc.
Example: Spam Filtering.
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines
Classifications of Supervised Learning. (https://www.javatpoint.com/machine-learning)


Dependent Variable: The main factor in Regression analysis which we want to predict or understand; also called the target variable.
Independent Variable: The factors which affect the dependent variable; also called predictors.
Outliers: An observation which contains either a very low value or a very high value in comparison to other observed values.
Y= aX+b, Where,
Y = dependent variables (target variables)
X= Independent variables (predictor
variables)
a and b are the linear coefficients
Positive Linear Regression Negative Linear regression
Residuals
Cost function

The average of the squared error between the predicted values and the actual values:
MSE = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - (a_1 x_i + a_0)\right)^2
Where,
N = Total number of observations
Yi = Actual value
(a1xi+a0) = Predicted value.
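A minimal sketch of the cost function, using the symbols defined above (N, Yi, a1, a0). The data points and the function name `mean_squared_error` are hypothetical and chosen only for illustration.

```python
def mean_squared_error(xs, ys, a1, a0):
    """Average of the squared differences between actual Yi and predicted a1*xi + a0."""
    n = len(xs)
    return sum((y - (a1 * x + a0)) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

perfect_fit = mean_squared_error(xs, ys, a1=2, a0=1)  # every residual is 0
shifted_fit = mean_squared_error(xs, ys, a1=2, a0=0)  # every residual is 1
print(perfect_fit, shifted_fit)
```

The line with the smaller cost is the better fit, so fitting a linear regression amounts to searching for the a1 and a0 that minimise this value.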
o Analysing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs (Estimated time of arrival) in traffic.
Need
Steps for Polynomial Regression
The main steps involved in Polynomial Regression are given below:
o Data Pre-processing
o Build a Linear Regression model and fit it to the dataset
o Build a Polynomial Regression model and fit it to the dataset
o Visualize the result for Linear Regression and Polynomial Regression model.
o Predicting the output.
Example graph for Polynomial Regression. (https://www.javatpoint.com/machine-learning)
Understanding
• Bayes Theorem
• Bayes’ Theorem is a simple mathematical
formula used for calculating conditional
probabilities.
• Conditional probability is a measure of the
probability of an event occurring given that
another event has (by assumption,
presumption, assertion, or evidence)
occurred.

P(A|B) = P(B|A) P(A) / P(B)
P(A|B) is Posterior probability: Probability of hypothesis A given the observed event B.
P(B|A) is Likelihood probability: Probability of the evidence B given that hypothesis A is true.
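A short sketch that plugs numbers into Bayes' theorem. The spam-filter probabilities are hypothetical, and the evidence P(B) is expanded with the law of total probability, which the slide does not show.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical numbers: A = "mail is spam", B = "mail contains the word 'offer'"
p_spam = 0.2               # prior P(A)
p_word_given_spam = 0.5    # likelihood P(B|A)
p_word_given_ham = 0.05    # P(B|not A)

# Evidence P(B) by the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_word_given_spam, p_spam, p_word)
print(round(p_spam_given_word, 3))
```

Seeing the word raises the probability of spam from the prior 0.2 to roughly 0.71, which is the kind of update a Naïve Bayes classifier performs for every feature.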
Applications
o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is an eager learner.
o It is used in Text classification such as Spam filtering and Sentiment analysis.
Implementation
o Data Pre-processing step
o Fitting Naive Bayes to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.
General structure
Decision Tree Terminologies
• Root Node: Root node is from where the decision tree
starts. It represents the entire dataset, which further gets
divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the
tree cannot be segregated further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given
conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the
unwanted branches from the tree.
• Parent/Child node: A node that is split into sub-nodes is called the parent node, and its sub-nodes are called the child nodes. The root node is the parent of all other nodes.

(https://www.javatpoint.com/machine-learning)
Implementation of Decision Tree
o Data Pre-processing step
o Fitting a Decision-Tree algorithm to the Training set
o Predicting the test result
o Test accuracy of the result
o Visualizing the test set result.
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain possible values for the best attribute.
Step-4: Generate the decision tree node, which contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.
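Step-2 can be sketched as follows. The slide does not say which Attribute Selection Measure is used, so this example assumes information gain (entropy reduction), one common choice. The tiny dataset and the names `entropy`, `information_gain`, and `best_attribute` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy of the full set minus the weighted entropy after splitting on attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Step-2: pick the attribute with the highest information gain."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


# Hypothetical dataset: "outlook" separates the classes perfectly, "windy" does not
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

chosen = best_attribute(rows, labels)
print(chosen)
```

Step-3 would then split S on the chosen attribute, and Step-5 repeats the same selection inside each subset.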
Contd.,
The decision tree contains lots of layers, which makes it complex.
It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
For more class labels, the computational complexity of the decision tree may increase.
• Equation of Straight line
• Y ranges between 0 and 1
• Limiting Range.
• Final Equation.
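The equations for these four bullets are not preserved here. The sketch below assumes the standard logistic-regression form: a straight line z = b0 + b1*x is passed through the sigmoid so that y stays between 0 and 1, and the final equation is log(y / (1 - y)) = b0 + b1*x. The coefficients b0 and b1 are hypothetical values.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(y):
    """Inverse of the sigmoid: log(y / (1 - y))."""
    return math.log(y / (1.0 - y))


# Hypothetical coefficients of the straight line z = b0 + b1 * x
b0, b1 = -4.0, 2.0
x = 3.0

z = b0 + b1 * x    # equation of the straight line, unbounded
y = sigmoid(z)     # limited to the range 0..1, read as a probability
print(z, round(y, 4), round(log_odds(y), 4))
```

Taking the log-odds of y recovers the straight line, which is why the model is still linear in its coefficients.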
Example of SVM
Linear
Example
Working of Random Forest Algorithm
o Step-1: Select random K data points from
the training set.
o Step-2: Build the decision trees associated
with the selected data points (Subsets).
o Step-3: Choose the number N for decision
trees that you want to build.
o Step-4: Repeat Step 1 & 2.
o Step-5: For new data points, find the
predictions of each decision tree, and
assign the new data points to the category
that wins the majority votes.
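Step-5 can be sketched as below. Only the majority vote is shown; the three "trees" are hypothetical stand-in threshold rules rather than trees trained on random subsets, and the field names `debt` and `income` are invented for illustration.

```python
from collections import Counter


def majority_vote(trees, point):
    """Step-5: collect one prediction per tree and return the most common one."""
    votes = [tree(point) for tree in trees]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for N = 3 trained decision trees
trees = [
    lambda p: "high risk" if p["debt"] > 50 else "low risk",
    lambda p: "high risk" if p["income"] < 30 else "low risk",
    lambda p: "high risk" if p["debt"] > 80 else "low risk",
]

new_point = {"debt": 60, "income": 40}
prediction = majority_vote(trees, new_point)
print(prediction)
```

Here one tree votes "high risk" and two vote "low risk", so the forest assigns the new point to "low risk".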
Applications

1. Banking: Banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm,
disease trends and risks of the disease can
be identified.
3. Land Use: We can identify the areas of
similar land use by this algorithm.
4. Marketing: Marketing trends can be
identified using this algorithm.
• A general linear or polynomial regression gives unstable coefficient estimates if there is high collinearity between the independent variables, so to solve such problems, Ridge regression can be used.
• Ridge regression is a regularization technique, which is used to reduce the complexity of the model. It is also called L2 regularization.
• It helps to solve the problems if we have more parameters than samples.
• The penalty is a fine-tuned constant, denoted by lambda, that controls how strongly the coefficients are shrunk; the selection of lambda is critical.
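The effect of lambda can be seen in a one-feature sketch. It assumes a model y = a*x with no intercept and the cost sum((y - a*x)^2) + lambda * a^2, whose minimiser is a = sum(x*y) / (sum(x^2) + lambda). The data and the function name `ridge_slope` are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y = a * x (no intercept, one feature)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)  # a larger lam gives a larger denominator, so a smaller slope


# Hypothetical data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))
```

With lambda = 0 the ordinary least-squares slope 2.0 is recovered; increasing lambda shrinks the slope towards zero but never makes it exactly zero.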
• Lasso regression is a regularization algorithm that assists in the elimination of irrelevant parameters, thus helping with feature selection while regularizing the model.
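How lasso eliminates a parameter can be sketched for a single feature. The example assumes a model y = a*x with no intercept and the cost (1/2) * sum((y - a*x)^2) + lambda * |a|, whose minimiser is the soft-thresholded value shown in the code. The data and the function name `lasso_slope` are hypothetical.

```python
def lasso_slope(xs, ys, lam):
    """Soft-thresholded lasso estimate for y = a * x (no intercept, one feature)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam, 0.0)  # pull the correlation term towards zero
    return (shrunk if sxy >= 0 else -shrunk) / sxx


# Hypothetical data lying exactly on y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 30.0):
    print(lam, lasso_slope(xs, ys, lam))
```

Once lambda exceeds |sum(x*y)| the coefficient becomes exactly zero, which is how lasso removes irrelevant parameters instead of only shrinking them.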
Merits Demerits
o The value of K always needs to be determined, which may be complex at times.
o The computation cost is high because the distance between the new data point and all the training samples has to be calculated.
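Both demerits are visible in a minimal KNN sketch: K has to be supplied by the caller, and every prediction computes the distance to all training samples. Euclidean distance and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Classify query by majority vote among its k nearest training points."""
    # One distance per training sample: this is the costly part of KNN
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-class training set
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["A", "A", "A", "B", "B", "B"]

label = knn_predict(train_points, train_labels, query=(2, 2), k=3)
print(label)
```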
• Step-1: Select the number K to decide the number of clusters.
• Step-2: Select random K points or centroids. (They can be points other than those in the input dataset.)
• Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
• Step-4: Calculate the variance and place a new centroid for each cluster.
• Step-5: Repeat Step-3, which means reassign each data point to the new closest centroid of each cluster.
• Step-6: If any reassignment occurs, then go to Step-4, else go to FINISH.
• Step-7: The model is ready.
Choosing the value of "K number of clusters" in K-means Clustering
If the plot has a sharp point of bend, so that it looks like an arm, then that point is considered the best value of K.
To find:
• Cluster K1=?
• Cluster K2=?
• With means m1 and m2 at each iteration
Finding clusters from the data set K
• K1={2,3,4}
• K2={10,11,12,20,25,30}
• Now find the average of K1 and K2 to get the means m1 and m2
• m1 = (2+3+4)/3 = 3
• m2 = (10+11+12+20+25+30)/6 = 108/6 = 18
• New means m1 = 3, m2 = 18
• Rough work:
• Initially no centroids are given, so assume them from the data points.
• Centroids are 3 and 18.
• So K1 and K2 can be decided by comparing each data point's distance to these centroids.
• 10 is a data point whose cluster has to be decided: 10-3 = 7 (K1) and 18-10 = 8 (K2); 7 is less, so it has to be placed in cluster K1.
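Contd..,
K={2,3,4,10,11,12,20,25,30}
Reassigning with m1 = 3 and m2 = 18 moves only 10 to K1 (11 and 12 are still closer to 18):
K1={2,3,4,10}; K2={11,12,20,25,30}
m1 = (2+3+4+10)/4 = 19/4 = 4.75
m2 = (11+12+20+25+30)/5 = 98/5 = 19.6
Reassigning with m1 = 4.75 and m2 = 19.6 moves 11 and 12 to K1:
K1={2,3,4,10,11,12}; K2={20,25,30}
Perform average to get new mean values
m1 = (2+3+4+10+11+12)/6 = 42/6 = 7
m2 = (20+25+30)/3 = 75/3 = 25
Hence m1 = 7, m2 = 25.
Note: Reassigning with m1 = 7 and m2 = 25 leaves K1 and K2 unchanged, so the mean values stay the same and we can stop the iteration.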
Result:

K1={2,3,4,10,11,12}
K2={20,25,30}
Final Clusters
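The worked example can be checked with a short one-dimensional K-means sketch. It starts from the centroids 3 and 18 used in the rough work and repeats the assign and update steps until the centroids stop changing. The function name `kmeans_1d` is hypothetical, and the arithmetic mean is used as the new centroid.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Repeat 'assign to nearest centroid' and 'recompute means' until stable."""
    clusters = []
    for _ in range(max_iter):
        # Assign each point to the index of its closest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no change, so no reassignment can occur
            break
        centroids = new_centroids
    return clusters, centroids


data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
clusters, means = kmeans_1d(data, centroids=[3, 18])
print(clusters, means)
```

The run ends with the same final clusters as the Result above, with means 7 and 25.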
Homework to solve
• Getting the dataset
• Representing data into a structure
Represent the data as a two-dimensional matrix of the independent variable X. Here each row corresponds to a data item, and each column corresponds to a feature.
• Standardizing the data
Without standardization, features with high variance are treated as more important compared to the features with lower variance, so each column is standardized. The resulting matrix is named Z.
• Calculating the Covariance of Z
Take the matrix Z and transpose it. After the transpose, multiply it by Z. The output matrix will be the covariance matrix of Z.
Contd..,
• Calculating the Eigen Values and Eigen Vectors
Calculate the eigenvalues and eigenvectors of the resultant covariance matrix of Z. The eigenvectors of the covariance matrix are the directions of the axes with high information.
• Sorting the Eigen Vectors
Sort the eigenvalues in decreasing order, which means from largest to smallest, and simultaneously sort the eigenvectors accordingly in the matrix P. The resultant matrix will be named P*.
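The PCA steps can be sketched for the special case of two features, where the eigenvalues of the 2x2 covariance matrix have a closed form. Two assumptions go beyond the slides: the product of Z transposed and Z is divided by n - 1, and the two features are assumed to be correlated (non-zero off-diagonal term). The data and the function name `pca_two_features` are hypothetical.

```python
import math


def pca_two_features(rows):
    """Standardize, build the covariance of Z, and return sorted eigenpairs."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    # Standardized matrix Z: zero mean and unit variance per column
    z = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

    # Covariance of Z: (Z transposed times Z) / (n - 1), a 2x2 symmetric matrix
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(2)] for i in range(2)]
    a, b, d = cov[0][0], cov[0][1], cov[1][1]

    # Closed-form eigenvalues of [[a, b], [b, d]], already largest first
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    eigenvalues = [centre + radius, centre - radius]

    # Matching unit eigenvectors; (b, lam - a) is valid because b != 0 is assumed
    eigenvectors = []
    for lam in eigenvalues:
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        eigenvectors.append((vx / norm, vy / norm))
    return eigenvalues, eigenvectors


# Hypothetical data: the second feature is roughly twice the first
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
eigenvalues, eigenvectors = pca_two_features(rows)
print(eigenvalues, eigenvectors)
```

The first eigenvector, paired with the largest eigenvalue, is the axis carrying most of the information; the eigenvectors in this sorted order form the columns of P*.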
https://www.youtube.com/watch?v=nbBvuuNVfco&list=PLMrJAkhIeNNSVjnsviglFoY2nXildDCcv&index=2
• This can be generalized as
Reinforcement Learning
https://www.youtube.com/watch?v=LzaWrmKL1Z4
• No prior knowledge of the environment is needed (model-free).
• Works along the lines of the policy iteration method.
1) Policy Evaluation: estimates the action-value function.
2) Policy Improvement.
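The policy evaluation step can be sketched with first-visit Monte Carlo prediction. For simplicity the sketch estimates state values; keying on (state, action) pairs instead of states would give the action-value version. The episode format (a list of (state, reward) pairs), the discount factor gamma, and the sample episodes are assumptions made for illustration.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate each state's value as the average return after its first visit."""
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk backwards so g is the discounted return from each step onwards;
        # the last overwrite for a state belongs to its earliest visit.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_visit_return[state] = g
        for state, g_first in first_visit_return.items():
            returns[state].append(g_first)
    return {state: sum(gs) / len(gs) for state, gs in returns.items()}


# Two hypothetical episodes sampled by following some fixed policy
episodes = [
    [("A", 1), ("B", 2)],
    [("A", 0), ("B", 4)],
]
values = first_visit_mc_prediction(episodes)
print(values)
```

No model of the environment is used: the estimates come only from sampled episodes, and policy improvement would then act greedily with respect to them.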
