
ARTIFICIAL INTELLIGENCE

ASSIGNMENT ON ML

NABIN GYAWALI (16)


KATHMANDU UNIVERSITY
Naïve Bayes Algorithm
Naïve Bayes is a probabilistic machine learning algorithm based on Bayes’ Theorem, used in
a wide variety of classification tasks under the assumption of independence among predictors.
Bayes’ Theorem is a simple mathematical formula for calculating conditional probabilities.
Conditional probability is a measure of the probability of an event occurring given that another
event has (by assumption, presumption, assertion, or evidence) occurred.
The Naïve Bayes model is easy to build and particularly useful for very large data sets. Despite
its simplicity, Naïve Bayes can perform comparably to, and sometimes better than, far more
sophisticated classification methods.

Figure 1: Bayes’ theorem, P(Y|X) = P(X|Y) ⋅ P(Y) / P(X)

How the Naïve Bayes algorithm works, with an example:


Say you have 1,000 fruits, each of which is a ‘banana’, an ‘orange’, or ‘other’. These are the 3
possible classes of the Y variable. We have data for the following X variables, all of which are
binary (1 or 0):
• Long
• Sweet
• Yellow
The table is given as:
Type      Long   Not Long   Sweet   Not Sweet   Yellow   Not Yellow   Total
Banana     400        100     350         150      450           50     500
Orange       0        300     150         150      300            0     300
Other      100        100     150          50       50          150     200
Total      500        500     650         350      800          200    1000

So the objective of the classifier is to predict if a given fruit is a ‘Banana’ or ‘Orange’ or ‘Other’
when only the 3 features (long, sweet and yellow) are known.
Step 1: Compute the prior probability for each class of fruit.
o P(Y=Banana) = 500 / 1000 = 0.50
o P(Y=Orange) = 300 / 1000 = 0.30
o P(Y=Other) = 200 / 1000 = 0.20
Step 2: Compute the probability of evidence, which goes in the denominator.
This is simply the product of the individual feature probabilities P(x). This step is
optional, because the denominator is the same for all classes and so does not affect
which class scores highest. Both steps are illustrated in the short sketch after the list below.
• P(x1=Long) = 500 / 1000 = 0.50
• P(x2=Sweet) = 650 / 1000 = 0.65
• P(x3=Yellow) = 800 / 1000 = 0.80
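
As a quick illustration, Steps 1 and 2 can be computed directly in Python. This is a
minimal sketch that simply hard-codes the counts from the table above:

    # Step 1: prior probability of each class, P(Y = k)
    class_counts = {"Banana": 500, "Orange": 300, "Other": 200}
    n_total = 1000
    priors = {k: c / n_total for k, c in class_counts.items()}
    print(priors)  # {'Banana': 0.5, 'Orange': 0.3, 'Other': 0.2}

    # Step 2: probability of evidence, P(x) for each feature
    # (optional: it is the same denominator for every class)
    feature_counts = {"Long": 500, "Sweet": 650, "Yellow": 800}
    evidence = {f: c / n_total for f, c in feature_counts.items()}
    print(evidence)  # {'Long': 0.5, 'Sweet': 0.65, 'Yellow': 0.8}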
Step 3: Compute the likelihood of the evidence, which goes in the numerator.
It is the product of the conditional probabilities of the 3 features given the class. Referring
back to the formula, this is P(X1 | Y=k). Here X1 is ‘Long’ and k is ‘Banana’, i.e., the
probability that the fruit is ‘Long’ given that it is a Banana. In the table above there are
500 Bananas, of which 400 are long, so P(Long | Banana) = 400/500 = 0.80.
For Banana alone, the likelihoods are:
• P(x1=Long | Y=Banana) = 400 / 500 = 0.80
• P(x2=Sweet | Y=Banana) = 350 / 500 = 0.70
• P(x3=Yellow | Y=Banana) = 450 / 500 = 0.90
So, the overall likelihood of the evidence for Banana = 0.8 * 0.7 * 0.9 = 0.504.
Step 4: Substitute all three quantities into the Naïve Bayes formula to get the probability
that the fruit is a banana.

P(Banana | Long, Sweet, Yellow)
= [P(Long | Banana) * P(Sweet | Banana) * P(Yellow | Banana) * P(Banana)] / [P(Long) * P(Sweet) * P(Yellow)]
= (0.8 * 0.7 * 0.9 * 0.5) / P(Evidence)
= 0.252 / P(Evidence)

Similarly, for the other two classes:

P(Orange | Long, Sweet, Yellow) = 0, because P(Long | Orange) = 0/300 = 0 (no orange in the data is long).

P(Other | Long, Sweet, Yellow) = (0.5 * 0.75 * 0.25 * 0.2) / P(Evidence) = 0.01875 / P(Evidence)
Clearly, Banana gets the highest probability, so that will be our predicted class.
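
The full calculation can be reproduced with a short Python sketch. It hard-codes the
counts from the table, multiplies each class’s likelihood by its prior (the numerator of
the formula), drops the shared P(Evidence) denominator, and picks the highest-scoring class:

    # Per-class counts from the table: (total, long, sweet, yellow)
    counts = {
        "Banana": (500, 400, 350, 450),
        "Orange": (300, 0, 150, 300),
        "Other": (200, 100, 150, 50),
    }
    n_total = 1000

    scores = {}
    for label, (total, long_, sweet, yellow) in counts.items():
        prior = total / n_total  # Step 1: P(Y = k)
        # Step 3: likelihood of the evidence given the class
        likelihood = (long_ / total) * (sweet / total) * (yellow / total)
        # Step 4 numerator; P(Evidence) is dropped since it does not change the winner
        scores[label] = likelihood * prior

    # Banana: 0.252, Orange: 0.0, Other: 0.01875 (up to float rounding)
    print(scores)
    print(max(scores, key=scores.get))  # Banana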

Linear Regression:
Linear Regression is a machine learning algorithm based on supervised learning. It performs a
regression task. Regression models a target prediction value based on independent variables. It is
mostly used for finding out the relationship between variables and forecasting.
There are mainly two types of linear regression models:
• Simple linear regression
• Multivariable regression
The representation of linear regression is a linear equation that combines a specific set of input
values (x) to produce the predicted output (y) for that set of inputs. As such, both the input
values (x) and the output value (y) are numeric.
Simple linear regression uses the traditional slope-intercept form, where m and b are the values
our algorithm will try to “learn” in order to produce the most accurate predictions; x represents
our input data and y represents our prediction.

y = mx + b
Multivariable regression uses a more complex, multi-variable linear equation, which might look
like this, where the w terms represent the coefficients, or weights, our model will try to learn:

f(x, y, z) = w1x + w2y + w3z
The variables x, y, z represent the attributes, or distinct pieces of information, we have about each
observation. For sales predictions, these attributes might include a company’s advertising spend
on radio, TV, and newspapers.
Sales = w1⋅Radio + w2⋅TV + w3⋅News
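
As a minimal sketch, this weighted sum is a one-line function in Python; the weight
values and the spend figures passed in below are made up purely for illustration:

    # Hypothetical weights, for illustration only
    w1, w2, w3 = 0.05, 0.04, 0.01

    def predict_sales(radio, tv, news):
        # Sales as a weighted sum of the three advertising channels
        return w1 * radio + w2 * tv + w3 * news

    print(predict_sales(radio=37.8, tv=100.0, news=50.0))  # approximately 6.39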
When working with linear regression, our main goal is to find the best-fit line, i.e., the line for
which the error between the predicted values and the actual values is minimized. The best-fit
line has the least error.
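
That error is commonly measured with the mean squared error (MSE), the average of the
squared differences between actual and predicted values. A minimal sketch, with made-up
predictions:

    def mse(actual, predicted):
        # Mean of squared differences between actual and predicted values
        n = len(actual)
        return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

    # Hypothetical actual vs. predicted sales, for illustration only
    print(mse([22.1, 10.4, 18.3, 18.5], [20.0, 12.0, 17.5, 19.0]))  # approximately 1.97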

Example: Simple Linear Regression:


Let’s say we are given a dataset with the following columns (features): how much a company
spends on Radio advertising each year, and its annual Sales in terms of units sold. We are trying
to develop an equation that will let us predict units sold based on how much a company spends
on radio advertising. The rows (observations) represent companies.
Company    Radio ($)   Sales (units)
Amazon          37.8            22.1
Google          39.3            10.4
Facebook        45.9            18.3
Apple           41.3            18.5

Our prediction function outputs an estimate of sales given a company’s radio advertising spend
and our current values for Weight and Bias.
Sales = Weight ⋅ Radio + Bias
• Weight: the coefficient for the Radio independent variable. In machine learning we call
coefficients weights.
• Radio: the independent variable. In machine learning we call these variables features.
• Bias: the intercept, where our line crosses the y-axis. In machine learning the intercept is
called the bias. Bias offsets all of the predictions we make.
Our algorithm will try to learn the correct values for Weight and Bias. By the end of our training,
our equation will approximate the line of best fit.
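
A minimal sketch of that training process, using gradient descent on the MSE over the four
companies above. The learning rate and number of epochs are arbitrary choices, and the
Radio feature is mean-centered first so that plain gradient descent stays numerically
well behaved:

    radio = [37.8, 39.3, 45.9, 41.3]  # Radio ($)
    sales = [22.1, 10.4, 18.3, 18.5]  # Sales (units)
    n = len(radio)

    # Center the feature so Weight and Bias can be learned independently
    mean_radio = sum(radio) / n
    x = [r - mean_radio for r in radio]

    weight, bias = 0.0, 0.0
    lr, epochs = 0.01, 5000
    for _ in range(epochs):
        preds = [weight * xi + bias for xi in x]
        # Gradients of MSE with respect to weight and bias
        dw = (2 / n) * sum((p - y) * xi for p, y, xi in zip(preds, sales, x))
        db = (2 / n) * sum(p - y for p, y in zip(preds, sales))
        weight -= lr * dw
        bias -= lr * db

    # Undo the centering: Sales = weight * Radio + (bias - weight * mean_radio)
    print(weight, bias - weight * mean_radio)  # approx. 0.044 and 15.5 for this data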
Figure 2: Regression Plot
