Download as pdf or txt
Download as pdf or txt
You are on page 1of 3

Department of Electrical and Computer Engineering

University of Delaware
FSAN/ELEG815 Analytics I: Statistical Learning
Homework #3, Fall 2021

Name:

1. Handwritten Digit Recognition. The goal is to recognize the digit in each


image of the dataset given in “DigitsTraining” which contains some digits
from the US Postal Service Zip Code. We are going to decompose the
big task of separating ten digits into smaller tasks of separating two of
the digits (binary classification). Use two digits: the final number in your
UD ID and conveniently choose any other number to replicate the results
from the slides chapter “The Learning Problem” (take into account the
features that you are going to use for classification to choose the second
number).
(a) Extract 2 features from the images: average intensity and symmetry.
Using this two features, implement the Perceptron Learning Algo-
rithm. Use an error metric for binary classification. To compute
Eout , use the testing set given to you in “DigitsTesting”. Show only
200 iterations.
(b) Repeat item (a), for the pocket algorithm. Show the same two plots
that are in Slide 32 and 33 of the chapter “The Learning Problem”.
Compare your results and draw conclusions.
(c) Use linear regression for classification. Even though, linear regression
learns a real-valued function, binary-valued functions are also real-
valued ±1 ∈ R. Thus, you can use linear regression to compute w
and approximate your binary classification wT xn ≈ yn = ±1. Use
your result for w to compute sign(wT xn ) and report the value for
Ein and Eout .
(d) Repeat item (b), using your result for w in item (c), as the initial
weights in the pocket algorithm. Compare your results.
Dataset description: The first column in DigitsTraining and DigitsTest-
ing corresponds to the digit number, following columns correspond to 256
pixels of the 16 × 16 pixel image of the digit. Thus, we have 7291 inputs
in DigitsTraining and 2007 inputs in DigitsTesting. From these datasets,
work only with those inputs that correspond to the digits you chose. Re-
member, one of the digits corresponds to the final number in your UD
ID.

1
2. Given a data set with 4 samples (N = 4) in a one-dimensional input space.
Determine the maximum number of dichotomies for this data set, i.e.
mH (4) and sketch all the possible dichotomies for the following hypothesis
sets:

(a) Negative ray: H contains the functions which are −1 on [a, ∞) (for
some a) i.e. h(x) = −sign(x − a).
Negative Ray

(b) Both Positive or negative ray: H contains functions of the form:

h(x) = bsign(x − a)

where b = {−1, 1} and a ∈ R.


Example:
• For b = 1, thus h(x) = sign(x − a):

Positive Ray
• For b = −1, thus h(x) = −sign(x − a):

Negative Ray
(c) Negative interval: This is a hypothesis set H that contains the
functions which are −1 on an interval [a, b] and +1 elsewhere.

2
(d) Both Positive or negative interval: This is a hypothesis set H
that contains the functions which are +1 on an interval [a, b] and −1
elsewhere together with functions which are −1 on an interval [a, b]
and +1 elsewhere. H contains functions of the form:

c
 if x < a
h(x) = −c if a ≤ x ≤ b

c if x > b

where c = {−1, 1}, a, b ∈ R, and a ≤ b.


Example:
• For c = 1:

Negative Interval
• For c = −1:

Positive Interval
(e) One Positive and one negative interval: This is a hypothesis
set H that contains the functions which are +1 on (−∞, a), −1 on
an interval [a, b], +1 on an interval (b, c], and −1 on (c, ∞).


 1 if x<a

−1 if a≤x≤b
h(x) =


 1 if b<x≤c
−1 if x>c

where a, b, c ∈ R, and a ≤ b ≤ c.

You might also like