Module 3: Introduction to
Supervised Learning
Algorithms
Course Name: Fundamentals of Artificial Intelligence
[23CSE505]
Total Hours : 05
Regression
• Relation between variables where changes in some variables may “explain” or possibly “cause”
changes in other variables.
• Explanatory variables are termed the independent variables and the variables to be explained are
termed the dependent variables.
y = wx + w₀
• Example: Price of a used car
• x : car attributes
y : price
y = g(x | θ)
where g(·) is the model and θ are its parameters
Regression model
• Regression model estimates the nature of the relationship between the independent and
dependent variables.
• Change in the dependent variable that results from changes in the independent variables, i.e., the
size of the relationship.
• Strength of the relationship.
• Statistical significance of the relationship.
• Function: a mathematical relationship enabling us to predict what values of one variable
(Y) correspond to given values of another variable (X).
• Y: is referred to as the dependent variable, the response variable or the predicted
variable.
• X: is referred to as the independent variable, the explanatory variable or the predictor
variable.
Linear Regression Algorithm
• Intro
Establishes a relationship between the Independent & Dependent Variables.
Examples of Independent & Dependent Variables:-
• x is Rainfall and y is Crop Yield
• x is Advertising Expense and y is Sales
• x is sales of goods and y is GDP
Here x is the Independent Variable & y is the Dependent Variable
• How it Works
• Regression analysis is used to understand which of the Independent Variables are related to the Dependent Variable.
• It attempts to model the relationship between the variables by fitting a line called the Linear Regression Line.
• The case of a single independent variable is called Simple Linear Regression, whereas the case of multiple independent
variables is called Multiple Linear Regression.
Single Linear Regression Vs Multiple Linear Regression
The Linear Regression line is fitted using the Ordinary Least Squares (OLS) method.
[Diagram: multiple predictors X1, X2, X3, X4 feeding into a single response Y]
Examples : Single Linear Regression Vs Multiple Linear Regression
Bivariate or simple regression model: Education (x) → Income (y)
Multiple regression model: Education (x1), Sex (x2), Experience (x3), Age (x4) → Income (y)
Linear Regression Equation
y = mx + c, where m is the slope/gradient and c is the y-intercept.
Sum of Squared Error
What is error, and why is it important? The error (residual) of each data point is the vertical distance between the observed y and the value predicted by the line; the regression line is chosen so as to minimize the sum of the squared errors.
• Where, for the fitted line y = a + bx,
x and y are two variables on the regression line.
b = slope of the line.
a = y-intercept of the line.
x = values of the independent variable.
y = values of the dependent variable.
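The least-squares fit described above can be sketched in plain Python. The data below are made up purely for illustration (not from the slides):

```python
# Simple linear regression fitted by ordinary least squares (OLS).

def fit_line(xs, ys):
    """Return (b, a) for the regression line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    b = num / den
    a = mean_y - b * mean_x   # the OLS line passes through (mean_x, mean_y)
    return b, a

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]          # perfectly linear data: y = 2x
b, a = fit_line(xs, ys)
print(b, a)                    # slope 2.0, intercept 0.0
```

With perfectly linear data the residuals are all zero; with noisy data the same formulas return the line minimizing the sum of squared errors.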
Solved Example 1
Logistic Regression
• Logistic Regression is used when the dependent variable is categorical.
• It is used to describe data and to explain the relationship between one dependent binary
variable and one or more nominal, ordinal, interval or ratio- level independent variables.
Logistic regression
Table 2 Age and signs of coronary heart disease (CD)
• Linear regression?
Linear Regression - Dot-plot: Data from Table 2
[Plot: signs of coronary disease (Yes / No) on the y-axis against age in years (20–100) on the x-axis]
Logistic regression
Table 3 Prevalence (%) of signs of CD according to age group

Age group   Total   Diseased   % Diseased
20–29       5       0          0
30–39       6       1          17
40–49       7       2          29
50–59       7       4          57
60–69       5       4          80
70–79       2       2          100
80–89       1       1          100
Dot-plot: Data from Table 3
[Plot: % diseased (0–100) on the y-axis against age group on the x-axis, rising steadily with age]
Logistic function
P(y | x) = eˣ / (1 + eˣ)

[Plot: the S-shaped logistic curve, with the probability of disease rising from 0 to 1 as x increases]
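The logistic function can be evaluated directly to see why it suits probabilities — a minimal sketch:

```python
import math

def logistic(x):
    """P(y | x) = e^x / (1 + e^x), the S-shaped logistic curve."""
    return math.exp(x) / (1 + math.exp(x))

# Outputs always lie in (0, 1), unlike a straight regression line:
print(logistic(-5))   # close to 0
print(logistic(0))    # exactly 0.5
print(logistic(5))    # close to 1
```

This bounded output is what makes the logistic curve appropriate for modeling the probability of a binary outcome such as disease / no disease.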
What Are the Types of Logistic Regression?
• Binary logistic regression
• Binary logistic regression was mentioned earlier in the case of classifying an object as an animal or
not an animal—it’s an either/or solution. There are just two possible outcome answers. This concept
is typically represented as a 0 or a 1 in coding. Examples include:
• Whether or not to lend to a bank customer (outcomes are yes or no).
• Assessing cancer risk (outcomes are high or low).
• Will a team win tomorrow’s game (outcomes are yes or no).
• Multinomial logistic regression
• Multinomial logistic regression is a model where there are multiple classes that an item can be
classified as. There is a set of three or more predefined classes set up prior to running the model.
Examples include:
• Classifying texts into what language they come from.
• Predicting whether a student will go to college, trade school or into the workforce.
• Does your cat prefer wet food, dry food or human food?
• Ordinal logistic regression
• Ordinal logistic regression is also a model where there are multiple classes that an item can be
classified as; however, in this case an ordering of classes is required. Classes do not need to be
proportionate. The distance between each class can vary. Examples include:
• Ranking restaurants on a scale of 0 to 5 stars.
• Predicting the podium results of an Olympic event.
• Assessing a choice of candidates, specifically in places that institute ranked-choice voting.
Differences Between Linear and Logistic Regression
Linear Regression | Logistic Regression
Used to predict a continuous dependent variable from a given set of independent variables. | Used to predict a categorical dependent variable from a given set of independent variables.
Used for solving regression problems. | Used for solving classification problems.
Predicts the value of a continuous variable. | Predicts the value of a categorical variable.
Fits a best-fit straight line. | Fits an S-curve.
Parameters are estimated by the least-squares method. | Parameters are estimated by the maximum-likelihood method.
The output is a continuous value, such as price or age. | The output is a categorical value, such as 0 or 1, Yes or No.
Requires a linear relationship between the dependent and independent variables. | Does not require a linear relationship.
Collinearity may exist between the independent variables. | There should be no collinearity between the independent variables.
Naïve Bayes
• Naïve bayes algorithm is a supervised learning algorithm, which is based on Bayes theorem and
used for solving classification problems.
• It is mainly used in text classification that includes a high dimensional training dataset.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps
in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular examples of Naïve Bayes Algorithm are spam filtration, Sentimental analysis, and
classifying articles.
Why is it called Naïve Bayes?
• The Naïve Bayes algorithm is composed of two words, Naïve and Bayes, which can be described
as:
• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent
of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape,
and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature
individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
• Training Phase
– We have four variables; for each of them, we calculate a conditional probability table. The class
priors are P(Play=Yes) = 9/14 and P(Play=No) = 5/14.
• Test Phase
– Given a new instance of variable values,
x’=(Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Given the calculated lookup tables:
P(Outlook=Sunny|Play=Yes) = 2/9      P(Outlook=Sunny|Play=No) = 3/5
P(Temperature=Cool|Play=Yes) = 3/9   P(Temperature=Cool|Play=No) = 1/5
P(Humidity=High|Play=Yes) = 3/9      P(Humidity=High|Play=No) = 4/5
P(Wind=Strong|Play=Yes) = 3/9        P(Wind=Strong|Play=No) = 3/5
P(Play=Yes) = 9/14                   P(Play=No) = 5/14
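The test-phase prediction multiplies the class prior by the conditional probabilities of the observed feature values, for each class, and picks the larger product. Using the tables above for x' = (Sunny, Cool, High, Strong):

```python
# Naive Bayes test-phase computation for x' = (Sunny, Cool, High, Strong),
# using the conditional probability tables from the training phase.

p_yes = 9/14 * 2/9 * 3/9 * 3/9 * 3/9     # P(Yes) * product of P(feature | Yes)
p_no  = 5/14 * 3/5 * 1/5 * 4/5 * 3/5     # P(No)  * product of P(feature | No)

print(round(p_yes, 4))                   # 0.0053
print(round(p_no, 4))                    # 0.0206
print("Play = No" if p_no > p_yes else "Play = Yes")   # Play = No
```

Since 0.0206 > 0.0053, the classifier predicts Play = No for this instance.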
• The Naive Bayes algorithm suffers from the 'zero-frequency problem': if a categorical value appears
in the test data but never occurs with a given class in the training dataset, it is assigned zero
probability, which wipes out the entire product. Applying a smoothing technique (such as Laplace
smoothing) overcomes this problem.
• It assumes that all the attributes are independent, which rarely happens in real life. This limits
the applicability of the algorithm in real-world situations.
• Its probability estimates can be poorly calibrated, so the raw probability outputs should not be taken
too seriously.
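Laplace (add-one) smoothing replaces each raw count ratio with a slightly padded one, so unseen value/class combinations never get probability zero. A minimal sketch, with hypothetical counts for illustration:

```python
# Laplace (add-alpha) smoothing for a categorical conditional probability.

def smoothed_prob(count, class_total, n_values, alpha=1):
    """P(value | class) with add-alpha smoothing over n_values categories."""
    return (count + alpha) / (class_total + alpha * n_values)

# Suppose 'Overcast' never occurred with Play=No (0 of 5 cases) and
# Outlook has 3 possible values: the smoothed estimate is 1/8, not 0.
print(smoothed_prob(0, 5, 3))   # 0.125
```

With alpha=0 the function reduces to the raw frequency ratio; larger alpha pulls estimates further toward the uniform distribution.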
Applications that use Naive Bayes
• Spam filtering
• Sentiment analysis
• Classifying articles and other text-classification tasks with high-dimensional data
Decision Tree
• Decision tree induction is a type of supervised algorithm.
• Decision tree induction is the learning of decision trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an
attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class
label.
• The topmost node in a tree is the root node.
• A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and
each leaf node represents a decision.
Decision Tree Classification Task
[Diagram: a decision tree model is induced from a labeled training set (e.g. Tid 6: No, Medium, 60K → class No) and then applied to a test set whose class labels are unknown (e.g. Tid 11: No, Small, 55K, ?; Tid 15: No, Large, 67K, ?). Columns: Tid, Attrib1, Attrib2, Attrib3, Class.]
An Example of Decision Tree
To sanction the loan or not?
Whether a person will cheat (default) can be predicted based on the account data.
• Attribute selection measures: Information Gain and Gini Index
Information Gain:
• Information gain is the measurement of changes in entropy after the segmentation of a dataset
based on an attribute.
• According to the value of information gain, we split the node and build the decision tree.
• A decision tree algorithm always tries to maximize the value of information gain; the
node/attribute having the highest information gain is split first. It can be calculated using the
formula:

Gain(S, A) = Entropy(S) − Σᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ)

• Entropy is a metric to measure the impurity in a given attribute; it specifies the randomness in the
data. For a binary classification it can be calculated as:

Entropy(S) = − P(yes) · log₂ P(yes) − P(no) · log₂ P(no)

Where,
S = the set of training samples
P(yes) = proportion of positive (yes) samples in S
P(no) = proportion of negative (no) samples in S
Example
Instance Classification a1 a2
1 + T T
2 + T T
3 - T F
4 + F F
5 - F T
6 - F T
1) What is the entropy of training examples with respect to the target function?
2) What is the information gain of a1 and a2 relative to the training examples?
3) Draw a decision tree for the given dataset
Solution
The training set contains 3 positive and 3 negative examples: [3+, 3−]
Entropy(S) = −(3/6) log₂(3/6) − (3/6) log₂(3/6) = 1
Solution
For attribute a1:
Entropy(a1 = T) = −(2/3) log₂(2/3) − (1/3) log₂(1/3) = 0.9183
Entropy(a1 = F) = −(1/3) log₂(1/3) − (2/3) log₂(2/3) = 0.9183

Gain(S, a1) = E(S) − (3/6)·E(a1=T) − (3/6)·E(a1=F)
            = 1 − (3/6)(0.9183) − (3/6)(0.9183)
            = 0.0817

For attribute a2:
Entropy(a2 = T) = −(2/4) log₂(2/4) − (2/4) log₂(2/4) = 1
Entropy(a2 = F) = −(1/2) log₂(1/2) − (1/2) log₂(1/2) = 1

Gain(S, a2) = E(S) − (4/6)·E(a2=T) − (2/6)·E(a2=F)
            = 1 − (4/6)(1) − (2/6)(1)
            = 0

Gain(S, a1) = 0.0817
Gain(S, a2) = 0
Since a1 has the higher information gain, it becomes the root split; each branch then splits on a2:
• a1 = T, a2 = T → +
• a1 = T, a2 = F → −
• a1 = F, a2 = F → +
• a1 = F, a2 = T → −
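The entropy and information-gain arithmetic of this worked example can be checked with a short Python sketch:

```python
import math

def entropy(pos, neg):
    """Entropy of a set with pos positive and neg negative examples."""
    total = pos + neg
    e = 0.0
    for k in (pos, neg):
        if k:                        # 0 * log(0) contributes nothing
            p = k / total
            e -= p * math.log2(p)
    return e

def gain(parent, splits):
    """Information gain: E(S) - sum over splits of |Sv|/|S| * E(Sv)."""
    n = sum(p + q for p, q in splits)
    return entropy(*parent) - sum((p + q) / n * entropy(p, q)
                                  for p, q in splits)

# a1 splits S=[3+,3-] into T:[2+,1-] and F:[1+,2-];
# a2 splits S into T:[2+,2-] and F:[1+,1-].
print(round(gain((3, 3), [(2, 1), (1, 2)]), 4))   # 0.0817
print(round(gain((3, 3), [(2, 2), (1, 1)]), 4))   # 0.0
```

The computed gains match the hand calculation, confirming that a1 is the better root split.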
Applications of Supervised learning
• Supervised learning can be used to solve a wide variety of problems, including:
• Spam filtering: Supervised learning algorithms can be trained to identify and classify spam emails
based on their content, helping users avoid unwanted messages.
• Image classification: Supervised learning can automatically classify images into different
categories, such as animals, objects, or scenes, facilitating tasks like image search, content
moderation, and image-based product recommendations.
• Medical diagnosis: Supervised learning can assist in medical diagnosis by analyzing patient data,
such as medical images, test results, and patient history, to identify patterns that suggest specific
diseases or conditions.
• Fraud detection: Supervised learning models can analyze financial transactions and identify
patterns that indicate fraudulent activity, helping financial institutions prevent fraud and protect
their customers.
• Natural language processing (NLP): Supervised learning plays a crucial role in NLP tasks,
including sentiment analysis, machine translation, and text summarization, enabling machines to
understand and process human language effectively.
Advantages of Supervised learning
• Supervised learning allows collecting data and produces data output from previous
experiences.
• Helps to optimize performance criteria with the help of experience.
• Supervised machine learning helps to solve various types of real-world
computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new sample.
• We have complete control over choosing the number of classes we want in the
training data.
Disadvantages of Supervised learning
• Classifying big data can be challenging.
• Training a supervised model requires a lot of computation time.
• Supervised learning cannot handle all complex tasks in machine learning.
• It requires a labelled dataset.
• It requires a training process.
Assessments
QUESTION 1:
A _________ is a decision support tool that uses a tree-like graph or model of
decisions and their possible consequences, including chance event outcomes,
resource costs, and utility.
a) Decision tree
b) Graphs
c) Trees
d) Neural Networks
Assessments
QUESTION 2:
Which of the following statements is not true about the Naïve
Bayes classifier algorithm?
a) It cannot be used for Binary as well as multi-class
classifications
b) It is the most popular choice for text classification
problems
c) It performs well in Multi-class prediction as compared to
other algorithms
d) It is one of the fast and easy machine learning algorithms
to predict a class of test datasets
Did You Know?