Question 1 (Linear Regression)



Question 1 (Linear Regression): In this question, you will implement
linear basis function regression with polynomial and Gaussian basis functions.
Start by downloading the code and dataset from the website:
http://vda.univie.ac.at/Teaching/ML/15s/assignments/asgn02-data.zip.
The dataset is the Housing dataset from the UCI repository. The task is to predict the median
house value from features describing a town.
Functions are provided for loading the data (see footnote 1) and for normalizing the features
and targets to have zero mean and unit variance:
[t, X] = loadData();
X_n = normalizeData(X, X);
t = normalizeData(t, t);

1
Note that loadData reorders the data points using a fixed permutation. Use this fixed permutation for
the questions in this assignment. If you are interested in what happens in "reality", try using a random
permutation afterwards. Results will not always be as clean as those you get with the fixed permutation.

The provided functions that you can use are:

• [t,X] = loadData(): loads data from the 'housing.data' data file. t is the target
output and X is the input features.
• X_n = normalizeData(X, ref): normalizes the data in X using the mean
and variance of the data in ref. If ref = X, then X_n is a linear transformation of
X with zero mean and unit variance.
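
As a rough illustration of what a normalization of this kind does, here is a minimal sketch. It is not the provided normalizeData: the name normalizeSketch and the toy matrix are assumptions made for illustration, and the provided source should be consulted for the actual behaviour (for example, which variance estimator it uses).

```matlab
% Hypothetical stand-in for the provided normalizeData (illustration only).
% Subtracts the column means of ref and divides by the column standard
% deviations of ref, so the same shift and scale can be reused on other data.
normalizeSketch = @(X, ref) (X - repmat(mean(ref, 1), size(X, 1), 1)) ...
                            ./ repmat(std(ref, 0, 1), size(X, 1), 1);

X = [1 10; 2 20; 3 30; 4 40];        % toy data: 4 points, 2 features
X_n = normalizeSketch(X, X);         % ref = X gives zero mean, unit variance

disp(mean(X_n, 1));                  % approximately [0 0]
disp(std(X_n, 0, 1));                % approximately [1 1]
```

Passing a separate ref argument is what allows test data to be scaled with the statistics of the training data, if that is what an experiment calls for.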

For the following, use these normalized features X_n and targets t for learning the model.
Have a look at the source code of all provided files. You may be able to use their
structure as a hint.

Polynomial Basis Function


Implement linear basis function regression with polynomial basis functions. Use
only monomials of a single variable ($x_1, x_1^2, x_1^3, \dots, x_2, x_2^2, \dots$), and no cross-terms
($x_1 x_2$).
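
One possible way to build such a design matrix is sketched below on toy data. The variable names are assumptions for illustration. The column order here groups by power (all features to the first power, then all to the second, and so on), which is only one of several equivalent layouts.

```matlab
% Sketch (illustration only): polynomial design matrix without cross-terms.
% Columns are [1, X, X.^2, ..., X.^degree], with each power applied
% element-wise to every feature column.
X = [1 2; 3 4; 5 6];                 % toy data: 3 points, 2 features
degree = 3;

Phi = ones(size(X, 1), 1);           % bias column
for p = 1:degree
    Phi = [Phi, X .^ p];             % append x_1^p, x_2^p for this power
end

disp(size(Phi));                     % 3 rows, 1 + 2*3 = 7 columns
```

With D features and degree M this layout has 1 + D*M columns, since no products between different features are included.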
(a): [+CODE] Create a MATLAB script polynomial_regression.m for the following:
Using the first 100 points as training data, and the remainder as testing data,
fit a polynomial basis function regression for polynomials of degree 1 to degree 7.
Do not use any regularization. Plot training error and test error (in RMS error)
versus polynomial degree.
Put this plot, along with a brief comment on what you see, in your
report.
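
The fit-and-evaluate step can be sketched as follows for a single degree on toy, noise-free data. The data, the split sizes, and the helper names are assumptions for illustration. RMS error is taken here to mean the square root of the mean squared residual.

```matlab
% Toy, noise-free data (an assumption for illustration; not the housing data).
x = linspace(-1, 1, 20)';
t = 2 * x + 1;

trainIdx = 1:10;                      % first points for training
testIdx  = 11:20;                     % remainder for testing

% Degree-1 design matrix: bias column followed by the feature.
Phi = @(x) [ones(size(x, 1), 1), x];

% Unregularized least squares; backslash solves an overdetermined
% system in the least-squares sense.
w = Phi(x(trainIdx)) \ t(trainIdx);

% RMS error: square root of the mean squared residual.
rmsErr = @(y, t) sqrt(mean((y - t) .^ 2));
trainErr = rmsErr(Phi(x(trainIdx)) * w, t(trainIdx));
testErr  = rmsErr(Phi(x(testIdx))  * w, t(testIdx));
```

In this layout w(1) is the bias weight, so the weight on the k-th feature of a degree-1 model is w(k+1). Repeating the fit inside a loop over degrees and storing both errors gives the two curves to plot.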
(b): Run your polynomial regression using a degree 1 polynomial. Examine the
learned weights. What value is chosen for $w_5$, the weight on the 5th feature (average
number of rooms per dwelling)? What value is chosen for the weight on the
7th feature (weighted distance to five Boston employment centers)? (Don't forget
the bias weight and the normalizations.) Do these 2 weights seem reasonable?
Put the values of all weights and your comments on the weights for the 5th
and 7th features in your report. You do not need to submit code for
this part.
(c): [+CODE] Create a MATLAB script polynomial_regression_1d_vis.m for
the following:
It is difficult to visualize the results of high-dimensional regression. Instead, use
only one of the features (use X_n(:,2)) and again perform polynomial regression.
Produce plots of the training data points, learned polynomial, and test data
points. The code visualize_1d.m may be useful as a template. Do not forget
the normalization.
Put 3 of these plots, for interesting (low-order, high-order polynomials)
results, in your report. Include brief comments.
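
A minimal sketch of such a plot on toy 1-D data is shown below. It does not use the provided visualize_1d.m, whose contents are not shown here, and the data and variable names are assumptions for illustration. The idea is to evaluate the learned polynomial on a dense grid so that the curve appears smooth between data points.

```matlab
% Toy 1-D data (an assumption for illustration; not X_n(:,2) of the housing set).
xTrain = [-1.0; -0.5; 0.0; 0.5; 1.0];
tTrain = xTrain .^ 2;
xTest  = [-0.75; 0.25; 0.8];
tTest  = xTest .^ 2;

degree = 2;
% Design matrix for one feature: columns x.^0, x.^1, ..., x.^degree.
design = @(x) bsxfun(@power, x, 0:degree);
w = design(xTrain) \ tTrain;

% Evaluate the learned polynomial on a dense grid.
xGrid = linspace(min(xTrain), max(xTrain), 200)';
yGrid = design(xGrid) * w;

figure;
plot(xGrid, yGrid, '-'); hold on;
plot(xTrain, tTrain, 'o');
plot(xTest, tTest, 'x');
legend('learned polynomial', 'training data', 'test data');
xlabel('normalized feature'); ylabel('normalized target');
```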

Put this plot in your report, and note which regularizing constant λ you would choose from
the cross-validation.

2
The unregularized result (λ = 0) will not appear on this scale. You can either add it as a separate
horizontal line as a baseline, or report this number separately.
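
The regularizer that λ multiplies is not defined in the text that survives here. Assuming a quadratic (L2) penalty on the weights, which is an assumption and not stated above, the regularized fit has a closed form. A minimal sketch on toy data:

```matlab
% Assumption: L2 (ridge) penalty, minimizing ||Phi*w - t||^2 + lambda*||w||^2.
% Toy data for illustration only.
Phi = [1 0; 1 1; 1 2; 1 3];
t   = [0; 1; 2; 3];

lambdas = [0, 0.1, 10];
W = zeros(size(Phi, 2), numel(lambdas));
for k = 1:numel(lambdas)
    A = lambdas(k) * eye(size(Phi, 2)) + Phi' * Phi;
    W(:, k) = A \ (Phi' * t);        % solve the regularized normal equations
end

% Larger lambda shrinks the weight vector towards zero.
norms = sqrt(sum(W .^ 2, 1));
disp(norms);
```

This sketch penalizes the bias weight along with the others, which is a simplification. In a cross-validation loop, each candidate λ would be fit on the training folds and scored on the held-out fold, and the λ with the lowest average validation error would be the one to report.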
 
a ∼ X|C1 ∼ N µ1 , σ12
 
b ∼ X|C2 ∼ N µ2 , σ22
(c)
:
We define random variable c = a − b. Each sample of c is generated by a random
(b W
): ha
ti
sample from C2 , subtracted from a random sample from C1 .
(a
):
W
ha st
he
W t is pro
ha t he b
ti
st pro abili
he ba ty
dis bil tha
tri i ty tc
bu <
tio that 0
no c= ?
fc 0±
?
?

$$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt \tag{1}$$
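
The error function in (1) is related to the Gaussian cumulative distribution by the standard identity $P(Y < y) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{y - m}{s\sqrt{2}}\right)\right]$ for $Y \sim \mathcal{N}(m, s^2)$. The sketch below checks this numerically for $c = a - b$, assuming the two samples are drawn independently (in which case $c$ is itself Gaussian). The class parameters are hypothetical values chosen only for illustration.

```matlab
% Gaussian CDF written with erf: P(Y < y) for Y ~ N(m, s^2).
gaussCdf = @(y, m, s) 0.5 * (1 + erf((y - m) ./ (s * sqrt(2))));

% Hypothetical class parameters, chosen only for illustration.
mu1 = 1; sigma1 = 1;
mu2 = 0; sigma2 = 2;

% Draw independent samples a ~ N(mu1, sigma1^2) and b ~ N(mu2, sigma2^2).
N = 200000;
a = mu1 + sigma1 * randn(N, 1);
b = mu2 + sigma2 * randn(N, 1);
c = a - b;

% Empirical estimate of P(c < 0) versus the erf-based value computed from
% the sample mean and sample standard deviation of c.
pEmpirical = mean(c < 0);
pErf = gaussCdf(0, mean(c), std(c));
```

The two estimates should agree up to sampling noise. Comparing mean(c) and std(c) against the class parameters is one way to sanity-check an analytic answer for the distribution of $c$.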
(d): Write down the solutions of the last two parts in terms of the Fisher
criterion. Explain why this relation between the Fisher criterion and the distribution of
the random variable $c = a - b$ makes sense.
