Lec 6
Fall 2010
Classification
[Figure: scatter plot of example points]
Basic Vocabulary
- We'll call each thing that we are classifying a point.
- Each point is described by a set of numbers that we call features.
- It is your job to decide on the features. You need to pick features that help you do your job.
- For instance, if you are looking for edges, you'll probably want to look at the response of different derivative filters.
- In the graph below, every point is described by two features.
- The point at index i in the dataset will be described as x_i.
[Figure: points plotted by their two features]
Separating Points Linearly
[Figure: two classes of points separated by a line]
Expressing Linear Separation Mathematically
sign(ax + by + c)
[Figure: points labeled with the value of ax + by + c; positive on one side of the line, negative on the other]
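As a concrete illustration (not from the slides), classifying with sign(ax + by + c) just asks which side of the line a point falls on; the line parameters below are made up:

```python
# Hypothetical line parameters (a, b, c); any values work for illustration.
a, b, c = 1.0, -0.5, 2.0

def classify(x, y):
    """Label a point by which side of the line ax + by + c = 0 it falls on."""
    return 1 if a * x + b * y + c > 0 else -1

print(classify(0.0, 0.0))   # sign(2.0) -> +1
print(classify(-4.0, 1.0))  # sign(-4.0 - 0.5 + 2.0) -> -1
```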
How can we find the separating line?
ax + by + c = 0
[Figure: candidate separating line ax + by + c = 0]
How do we get the parameters of this line?
[Figure: points labeled with the value of ax + by + c]
Looking at the confidence in classification
- We can translate this response into a probability.
- We will use the logistic function to transform the response into a probability of the correct classification.
- We will assume that each point will be labeled with a label l. l can take values +1 or −1.
P[l = +1 | x] = 1 / (1 + exp(−(ax + by + c)))
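A minimal sketch of this mapping (illustrative, not from the slides): the logistic function squashes the signed response ax + by + c into a probability in (0, 1):

```python
import math

def p_positive(x, y, a, b, c):
    """P[l = +1 | x] under the logistic model: 1 / (1 + exp(-(ax + by + c)))."""
    return 1.0 / (1.0 + math.exp(-(a * x + b * y + c)))

# A point exactly on the line gets probability 0.5; far from the line,
# the probability saturates toward 0 or 1.
print(p_positive(0.0, 0.0, 1.0, 1.0, 0.0))  # on the line: 0.5
```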
Finding the line parameters
- Now, for any set of line constants, we can find out the probability assigned to the correct label of each item in the training set.

∏_{i=1}^{N} P[l_i | x_i] = ∏_{i=1}^{N} 1 / (1 + exp(−l_i(a x_i + b y_i + c)))
log ( ∏_{i=1}^{N} P[l_i | x_i] ) = −∑_{i=1}^{N} log(1 + exp(−l_i(a x_i + b y_i + c)))
- Compare

L = ∑_{i=1}^{N} log(1 + exp(−l_i(a x_i + b y_i + c)))

- With

L = ∑_{i=1}^{N} log(1 + exp(−l_i(w^T x_i)))    (2)
- What's missing?
- In the upper equation c is a constant that is added on its own, without multiplying a feature.
- This is called the bias term.
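To see that the two losses agree, one can append a constant 1 to each feature vector so that the last component of w plays the role of c. A sketch with made-up data:

```python
import numpy as np

def loss_abc(a, b, c, X, labels):
    """L = sum_i log(1 + exp(-l_i (a x_i + b y_i + c)))."""
    r = a * X[:, 0] + b * X[:, 1] + c
    return np.sum(np.log1p(np.exp(-labels * r)))

def loss_w(w, X_aug, labels):
    """L = sum_i log(1 + exp(-l_i w^T x_i)) with each x_i augmented by a 1."""
    return np.sum(np.log1p(np.exp(-labels * (X_aug @ w))))

X = np.array([[1.0, 2.0], [-3.0, 0.5], [2.0, -1.0]])  # made-up points
labels = np.array([1.0, -1.0, 1.0])
a, b, c = 0.5, -1.0, 0.2

X_aug = np.hstack([X, np.ones((3, 1))])  # append the constant feature
w = np.array([a, b, c])                  # bias c becomes the last weight
print(np.isclose(loss_abc(a, b, c, X, labels), loss_w(w, X_aug, labels)))  # True
```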
Function of the bias term
[Figure: points labeled with the value of ax + by + c]

Without a bias term, the line ax + by = 0 must pass through the origin; the constant c lets the separating line sit anywhere in the plane.
How do we implement it?
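The slides do not show the implementation at this point; one standard approach (a sketch, using gradient descent on made-up, linearly separable data) minimizes the loss L above by following its gradient:

```python
import numpy as np

def fit_logistic(X, labels, steps=2000, lr=0.1):
    """Minimize sum_i log(1 + exp(-l_i w^T x_i)) by gradient descent.

    X is augmented with a constant-1 column, so w[-1] is the bias term c.
    """
    X_aug = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(X_aug.shape[1])
    for _ in range(steps):
        margins = labels * (X_aug @ w)
        # dL/dw = sum_i -l_i x_i / (1 + exp(l_i w^T x_i))
        grad = -(X_aug.T @ (labels / (1.0 + np.exp(margins))))
        w -= lr * grad
    return w

# Made-up data: class +1 on the right, class -1 on the left.
X = np.array([[2.0, 1.0], [3.0, -1.0], [-2.0, 0.5], [-3.0, -0.5]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
w = fit_logistic(X, labels)
preds = np.sign(np.hstack([X, np.ones((4, 1))]) @ w)
print(preds)  # all four training points classified correctly
```

The separable data here makes plain gradient descent converge quickly; on real data one would also add a step-size schedule or regularization.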