
Bayesian learning – Introduction, Bayes theorem, Bayes theorem and concept learning,

Maximum Likelihood and least squared error hypotheses, maximum likelihood


hypotheses for predicting probabilities, minimum description length principle, Bayes
optimal classifier, Gibbs algorithm, Naïve Bayes classifier, an example: learning to
classify text, Bayesian belief networks, the EM algorithm.
Computational learning theory – Introduction, probably learning an approximately
correct hypothesis, sample complexity for finite hypothesis space, sample complexity
for infinite hypothesis spaces, the mistake bound model of learning.
Instance-Based Learning- Introduction, k-nearest neighbour algorithm, locally
weighted regression, radial basis functions, case-based reasoning, remarks on lazy and
eager learning.
Bayes' theorem gives the probability of an event based on prior knowledge of conditions.

P(A|B) – Posterior
P(B|A) – Likelihood
P(B) – Marginal (evidence)
P(A) – Prior
P(A|B)·P(B) = P(A∩B) – Eq 1
P(B|A)·P(A) = P(B∩A) – Eq 2
Since P(A∩B) = P(B∩A), equating Eq 1 and Eq 2 gives
P(A|B)·P(B) = P(B|A)·P(A), i.e. P(A|B) = P(B|A)·P(A) / P(B)
Bayes Theorem

A – Hypothesis
B – Given data (training examples)
P(A|B) = probability of the hypothesis given the training data (posterior)
P(B|A) = probability of the given data when the hypothesis is true (likelihood)
P(A) = probability of the hypothesis before observing the given data (prior)
P(B) = probability of the given data (evidence)
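As a small worked example (the numbers below are hypothetical and chosen only for illustration), the posterior P(A|B) can be computed directly from the prior, the likelihood, and the marginal:

# Hypothetical numbers for illustration only.
p_a = 0.01             # P(A): prior probability of the hypothesis
p_b_given_a = 0.9      # P(B|A): likelihood of the data when A is true
p_b_given_not_a = 0.1  # P(B|~A): likelihood of the data when A is false

# P(B) by total probability: P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b)     # ~0.083, the posterior probability of A given B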
Bayes' theorem is used to calculate the probability of each possible hypothesis and to output the most probable one.
h_MAP (maximum a posteriori hypothesis):
  h_MAP = argmax_{h ∈ H} P(h|D) = argmax_{h ∈ H} P(D|h)·P(h)
h_ML (maximum likelihood hypothesis, obtained when every hypothesis is equally probable a priori):
  h_ML = argmax_{h ∈ H} P(D|h)
Maximum likelihood and least-squared error hypotheses
Assume each training value is d_i = f(x_i) + e_i, where the noise e_i is drawn from a Normal distribution with mean µ = 0 and standard deviation σ. Then
  h_ML = argmax_{h ∈ H} ∏_{i=1..m} p(d_i | h)
       = argmax_{h ∈ H} ∏_{i=1..m} (1/√(2πσ²)) · exp(−(d_i − h(x_i))² / (2σ²))
∏ – product over the m training examples
p – probability density function
µ – mean of the noise
σ – standard deviation of the noise
x_i – variable or input
Taking logarithms and discarding constant terms, the negative symbol changes argmax to argmin:
  h_ML = argmin_{h ∈ H} ∑_{i=1..m} (d_i − h(x_i))²
i.e. the maximum likelihood hypothesis is the one that minimizes the least-squared error.
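A minimal sketch (assuming a simple linear hypothesis h(x) = w0 + w1·x and the hypothetical data below) showing that minimizing the sum of squared errors gives the maximum likelihood hypothesis under zero-mean Gaussian noise:

import numpy as np

# Hypothetical training data: d_i = f(x_i) + Gaussian noise (illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0.1, 1.9, 4.2, 5.8, 8.1])

# For a linear hypothesis h(x) = w0 + w1*x, minimizing sum_i (d_i - h(x_i))^2
# yields the maximum likelihood hypothesis under zero-mean Gaussian noise.
w1, w0 = np.polyfit(x, d, deg=1)

h = w0 + w1 * x              # calculated outputs h(x_i)
sse = np.sum((d - h) ** 2)   # the least-squared error that h_ML minimizes
print(w0, w1, sse)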


Minimum Description Length principle
Using log(ab) = log a + log b, the MAP hypothesis can be rewritten:
  h_MAP = argmax_{h ∈ H} P(D|h)·P(h) = argmax_{h ∈ H} [ log₂ P(D|h) + log₂ P(h) ]
• A minimum-length / short hypothesis is required
• To convert max to min, a negative symbol is added:
  h_MAP = argmin_{h ∈ H} [ −log₂ P(D|h) − log₂ P(h) ]
Let us consider the problem of designing a code C to transmit messages drawn at random
from a set D, where the probability of drawing the i-th message is p_i.
While transmitting, we want a code that minimizes the expected number of bits.
To do this we should assign shorter codes to the more probable messages. We represent the length
of message i with respect to code C as L_C(i); the optimal code assigns L_C(i) = −log₂ p_i bits.
i – stands for the message (index)
Interpreting −log₂ P(h) as the description length of the hypothesis and −log₂ P(D|h) as the
description length of the data given the hypothesis,
  h_MDL = argmin_{h ∈ H} [ L_C1(h) + L_C2(D|h) ]
Therefore h_MDL = h_MAP (when C1 and C2 are the optimal codes for the hypotheses and for the data given the hypothesis).
MDL – Minimum Description Length
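A small sketch (with hypothetical priors and likelihoods, for illustration only) of computing description lengths as −log₂ p and picking the hypothesis with the minimum combined length:

import math

# Hypothetical priors P(h) and likelihoods P(D|h) for three candidate hypotheses.
hypotheses = {
    "h1": {"prior": 0.5, "likelihood": 0.01},
    "h2": {"prior": 0.3, "likelihood": 0.10},
    "h3": {"prior": 0.2, "likelihood": 0.05},
}

def description_length(p):
    # Optimal (Shannon) code length in bits for an event with probability p.
    return -math.log2(p)

# h_MDL minimizes  -log2 P(h) - log2 P(D|h), which is the same hypothesis as h_MAP.
h_mdl = min(hypotheses,
            key=lambda h: description_length(hypotheses[h]["prior"])
                        + description_length(hypotheses[h]["likelihood"]))
print(h_mdl)   # "h2" for these hypothetical numbers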
The expected misclassification error for the Gibbs algorithm is at most twice the expected error of
the Bayes optimal classifier
One highly practical Bayesian learning method is the Naive Bayes learner, often called the
Naive Bayes classifier.
In some domains its performance has been shown to be comparable to that of neural
network and decision tree learning.
The Naive Bayes classifier applies to learning tasks where each instance x is described by a
conjunction of attribute values and where the target function f (x) can take on any value from
some finite set V.
A set of training examples of the target function is provided, and a new instance is presented,
described by the tuple of attribute values (a1, a2, ..., an). The learner is asked to predict the target
value, or classification, for this new instance.
The Bayesian approach to classifying the new instance is to assign the most probable target
value, v_MAP, given the attribute values (a1, a2, ..., an) that describe the instance:
  v_MAP = argmax_{vj ∈ V} P(vj | a1, a2, ..., an)
Under the naive assumption that the attribute values are conditionally independent given the target value, this becomes
  v_NB = argmax_{vj ∈ V} P(vj) · ∏_i P(ai | vj)
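A minimal sketch of a Naive Bayes learner (the tiny training set below is hypothetical, not the text-classification example from the source): P(vj) and P(ai | vj) are estimated from counts, and a new instance is assigned the value that maximizes their product:

from collections import Counter, defaultdict

# Hypothetical training examples: (attribute tuple, target value)
train = [
    (("sunny", "hot"),  "no"),
    (("sunny", "mild"), "yes"),
    (("rain",  "mild"), "yes"),
    (("rain",  "hot"),  "no"),
    (("sunny", "mild"), "yes"),
]

class_counts = Counter(v for _, v in train)
# attr_counts[class][attribute position][attribute value] = count
attr_counts = defaultdict(lambda: defaultdict(Counter))
for attrs, v in train:
    for i, a in enumerate(attrs):
        attr_counts[v][i][a] += 1

def classify(attrs):
    best_v, best_p = None, -1.0
    for v, n_v in class_counts.items():
        # v_NB = argmax_v P(v) * prod_i P(a_i | v), estimated from counts
        p = n_v / len(train)
        for i, a in enumerate(attrs):
            p *= attr_counts[v][i][a] / n_v
        if p > best_p:
            best_v, best_p = v, p
    return best_v

print(classify(("sunny", "mild")))  # expected "yes" for this toy data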
EM – Expectation-Maximization
Used to estimate latent variables (not directly observed); they are inferred from the other, observed variables.
It is the basis of many unsupervised clustering algorithms.
Steps in the EM Algorithm (a small sketch follows the list):
1. Initially, a set of initial values is chosen and a set of incomplete data is given to the system.
2. Next is the Expectation step. Here we use the observed data to estimate or guess the values of the missing/incomplete data.
3. Next is the Maximization step. Here we use the complete data generated in the preceding E-step to update the parameter values.
4. We check whether the values are converging or not.
5. If they are converging, stop. Otherwise repeat steps 2 and 3 until convergence occurs.
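A minimal sketch of these steps for a simple case (a mixture of two 1-D Gaussians with fixed, equal variances; the data and initial means are hypothetical):

import math

# Hypothetical 1-D data drawn from two clusters (illustration only).
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

def gaussian(x, mu, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# Step 1: initial guesses for the latent cluster means (we never observe
# which cluster each point came from, so the data are "incomplete").
mu1, mu2 = 0.0, 1.0

for _ in range(100):
    # E-step: use the observed data and current means to estimate the missing
    # information, i.e. the probability that each point belongs to cluster 1.
    r1 = [gaussian(x, mu1) / (gaussian(x, mu1) + gaussian(x, mu2)) for x in data]

    # M-step: use these "completed" data to update (maximize) the means.
    new_mu1 = sum(r * x for r, x in zip(r1, data)) / sum(r1)
    new_mu2 = sum((1 - r) * x for r, x in zip(r1, data)) / sum(1 - r for r in r1)

    # Convergence check: stop when the means no longer change noticeably.
    if abs(new_mu1 - mu1) < 1e-6 and abs(new_mu2 - mu2) < 1e-6:
        mu1, mu2 = new_mu1, new_mu2
        break
    mu1, mu2 = new_mu1, new_mu2

print(mu1, mu2)  # should converge near the two cluster means (~1.0 and ~5.0)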
Instance-based Learning
• Lazy learners – k-Nearest Neighbour (KNN)
• Radial Basis Functions (RBF)
• Case-Based Reasoning (CBR)
KNN is an example of lazy learning.
Example:
Given the data below, the query x = (Maths=6, CS=8), and K=3
Classification: Pass/Fail
S.No Maths CS Pass/Fail
1. 4 3 Fail
2. 6 7 Pass
3. 7 8 Pass
4. 5 5 Fail
5. 8 8 Pass

Euclidean distance: d = √((xo1 − xa1)² + (xo2 − xa2)²)
where (x2, y2) = (xo1, xo2) is the observed (query) point and (x1, y1) = (xa1, xa2) is the actual (training) point.
o – observed value
a – actual value
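A small sketch of the worked example above, computing the Euclidean distance from the query x = (6, 8) to every training example and taking a majority vote over the K = 3 nearest neighbours:

import math
from collections import Counter

# Training data from the table above: (Maths, CS, Pass/Fail)
train = [(4, 3, "Fail"), (6, 7, "Pass"), (7, 8, "Pass"), (5, 5, "Fail"), (8, 8, "Pass")]
query = (6, 8)
k = 3

# Euclidean distance from the query to every training example
dists = [(math.dist(query, (m, c)), label) for m, c, label in train]
# distances: 5.39, 1.00, 1.00, 3.16, 2.00

# Majority vote among the k nearest neighbours
nearest = sorted(dists)[:k]
print(Counter(label for _, label in nearest).most_common(1)[0][0])  # "Pass"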
The data points near the query are fit to the given curve, a linear function of the attributes:
  f̂(x) = w0 + w1·a1(x) + … + wn·an(x)
To modify the weights:
a1, a2, …, an – attributes of instance x
w1, w2, …, wn – coefficients of the attributes / weights of the function
Initially the weights are assigned random values and are then modified to fit the training examples.
td – target output (known and fixed)
od – calculated output (changes as the weights are updated)
The gradient descent rule for modifying each weight wj is
  Δwj = η · ∑_d (td − od) · aj(d)
η – learning rate
(td − od) – difference between the actual (target) value and the calculated value
aj(d) – value of attribute j with respect to training example (instance) d
For locally weighted regression, each example's error is weighted by a kernel K of its distance from the query xq:
  Δwj = η · ∑_x K(d(xq, x)) · (td − od) · aj(x)
K – a decreasing function of the distance, so examples at minimum distance from the query are given more importance.
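A minimal sketch of this distance-weighted gradient descent update (linear hypothesis, Gaussian kernel; the data, learning rate, and kernel width are hypothetical):

import numpy as np

# Hypothetical training data: attributes a(x) and targets t (illustration only)
X = np.array([[1.0], [2.0], [3.0], [4.0]])   # one attribute a1 per instance
t = np.array([2.1, 3.9, 6.2, 7.8])           # target outputs t_d (known, fixed)
x_q = np.array([2.5])                        # query point

eta = 0.01                                   # learning rate (hypothetical)
w0, w = 0.0, np.zeros(1)                     # weights, initialised here to zero

def kernel(dist, width=1.0):
    # K: a decreasing function of distance, so nearby examples matter more.
    return np.exp(-dist ** 2 / (2 * width ** 2))

for _ in range(500):
    o = w0 + X @ w                           # calculated outputs o_d
    k_weights = kernel(np.linalg.norm(X - x_q, axis=1))
    err = k_weights * (t - o)                # kernel-weighted error (t_d - o_d)
    # Gradient descent rule: delta_wj = eta * sum_x K(d(x_q, x)) (t_d - o_d) a_j(x)
    w0 += eta * err.sum()
    w += eta * (err @ X)

print(w0 + x_q @ w)   # local prediction f^(x_q) near the query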
One approach to function approximation that is closely related to distance-weighted regression and also to artificial neural networks is learning with radial basis functions. In this approach, the learned hypothesis is a function of the form
  f̂(x) = w0 + ∑_{u=1..k} wu · Ku(d(xu, x))
where each Ku(d(xu, x)) is a kernel function (commonly a Gaussian centred at xu) that decreases as the distance d(xu, x) increases.
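A minimal sketch of such a radial basis function hypothesis (hypothetical 1-D data, Gaussian kernels centred on the training points, and the output-layer weights fit by linear least squares):

import numpy as np

# Hypothetical 1-D training data (illustration only)
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.sin(X)

centers = X.copy()        # one Gaussian kernel K_u centred at each training point
width = 1.0               # hypothetical kernel width

def phi(x):
    # Kernel activations K_u(d(x_u, x)) plus a constant 1 for the bias w0
    k = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))
    return np.hstack([np.ones((len(x), 1)), k])

# Fit w0, w1..wk by linear least squares on the kernel activations
w, *_ = np.linalg.lstsq(phi(X), y, rcond=None)

x_new = np.array([1.5, 2.5])
print(phi(x_new) @ w)     # f^(x) = w0 + sum_u w_u K_u(d(x_u, x))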
