Machine Learning Lecture 1 (UW)
Traditional Algorithms
[Figure: hand-written rules sort tweets into birds, planes, and other ("else: other.append(tweet); return birds, planes, other")]
Machine Learning algorithms
[Figure: tweets plotted by feature 1 and feature 2, with clusters labeled airplane, bird, and other]
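Below is a minimal sketch of what the hand-written, rule-based tweet sorter from the Traditional Algorithms slide might look like; the keyword rules and the function name are illustrative assumptions, not code from the slide.

```python
def sort_tweets(tweets):
    """Traditional approach: hand-written rules, no learning (illustrative sketch)."""
    birds, planes, other = [], [], []
    for tweet in tweets:
        text = tweet.lower()
        if "bird" in text:       # assumed keyword rule
            birds.append(tweet)
        elif "plane" in text:    # assumed keyword rule
            planes.append(tweet)
        else:
            other.append(tweet)
    return birds, planes, other
```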
Regression
■ Predict continuous value, ex: stock market, credit score, temperature, Netflix rating
Classification
■ Predict categorical value: loan or not? spam or not? what disease is this?
Unsupervised Learning
■ Predict structure: tree of life from DNA, find similar images, community detection
Prerequisites
■ Formally:
■ MATH 308, CSE 312, STAT 390 or equivalent
■ Familiarity with:
■ Linear algebra
■ linear dependence, rank, linear equations, SVD
■ Multivariate calculus
■ Probability and statistics
■ Distributions, marginalization, moments, conditional expectation
■ Algorithms
■ Basic data structures, complexity
■ “Can I learn these topics concurrently?”
■ Use HW0 to judge skills
■ See website for review materials!
Grading
■ 5 homeworks (100% = 10% + 20% + 20% + 20% + 30%)
Each contains both theoretical and programming questions.
Collaboration is okay, but you must list who you collaborated with. You must write, submit, and understand your own answers and code (which run on the autograder).
Do not Google for answers.
■ NO exams
■ We will assign random subgroups as PODs (when the dust clears)
Homework & Grading
CSE 446:
Just do the A problems.
Doing B problems will not raise your grade.
Grade is on a 4.0 scale (relative to students in 446).
CSE 546:
If you only do the A problems, your grade can be at most 3.8.
The B problems account for the remaining 0.2.
Final grade = A grade + B grade (relative to students in 546).
Communication Channels
Textbooks
■ Required Textbook:
Machine Learning: A Probabilistic Perspective; Kevin Murphy
Add codes
Maximum Likelihood Estimation
Jamie Morgenstern
Billionaire: Why?
■ Likelihood of $n$ flips with $k$ heads: $P(D \mid \theta) = \theta^{k}(1-\theta)^{n-k}$
■ $\hat{\theta}_{MLE} = \frac{k}{n}$
■ You: flip the coin 5 times. Billionaire: I got 3 heads.
■ $\hat{\theta}_{MLE} = \frac{k}{n} = \frac{3}{5}$
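A quick numerical check of the coin-flip example, assuming flips are encoded as 1 = heads, 0 = tails (the encoding and variable names are mine):

```python
flips = [1, 0, 1, 1, 0]      # 5 flips, 3 heads, as in the example above
k, n = sum(flips), len(flips)
theta_mle = k / n            # closed-form MLE for the heads probability
print(theta_mle)             # 0.6 = 3/5
```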
Simple bound
(based on Hoeffding’s inequality)
■ For $n$ flips and $k$ heads, the MLE is unbiased for the true $\theta^{*}$:
  $\hat{\theta}_{MLE} = \frac{k}{n}, \qquad E[\hat{\theta}_{MLE}] = \theta^{*}$
■ Hoeffding's inequality:
  $P\big(|\hat{\theta}_{MLE} - \theta^{*}| \ge \epsilon\big) \le 2e^{-2n\epsilon^{2}}$
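One common way to use this bound (the rearrangement below is mine, not from the slides): fix an allowed failure probability $\delta$, set the right-hand side equal to $\delta$, and solve for $n$ to get a PAC-style sample-complexity guarantee.

```latex
% Require the Hoeffding tail to be at most \delta and solve for n:
2 e^{-2 n \epsilon^{2}} \le \delta
\;\Longleftrightarrow\;
n \ge \frac{\ln(2/\delta)}{2 \epsilon^{2}}
% e.g. \epsilon = 0.1, \delta = 0.05  gives  n \ge \ln(40)/0.02 \approx 185 flips.
```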
PAC Learning
■ Sum of Gaussians
$X \sim N(\mu_X, \sigma_X^2)$, $Y \sim N(\mu_Y, \sigma_Y^2)$, independent
$Z = X + Y \;\Rightarrow\; Z \sim N(\mu_X + \mu_Y,\; \sigma_X^2 + \sigma_Y^2)$
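A quick Monte Carlo sanity check of this closure property; the particular means, variances, and sample count below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=1_000_000)    # X ~ N(1, 4)
y = rng.normal(-3.0, 1.5, size=1_000_000)   # Y ~ N(-3, 2.25), drawn independently of X
z = x + y
print(z.mean(), z.var())                    # approx -2.0 and 6.25 = 4 + 2.25
```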
■ Likelihood of data: $P(D \mid \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\Big(-\frac{(x_i - \mu)^2}{2\sigma^2}\Big)$
■ Log-likelihood of data: $\log P(D \mid \mu, \sigma) = -n \log(\sigma\sqrt{2\pi}) - \sum_{i=1}^{n} \frac{(x_i - \mu)^2}{2\sigma^2}$
■ MLE: $\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu}_{MLE})^2$
■ MLE for the variance of a Gaussian is biased: $E[\hat{\sigma}^2_{MLE}] \ne \sigma^2$
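A small simulation illustrating both estimators and the bias of the variance MLE; the true parameters, sample size, and repetition count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, var_true, n = 5.0, 4.0, 10

mu_hats, var_hats = [], []
for _ in range(100_000):                         # average the estimators over many datasets
    x = rng.normal(mu_true, np.sqrt(var_true), size=n)
    mu_hat = x.mean()                            # MLE of the mean
    mu_hats.append(mu_hat)
    var_hats.append(((x - mu_hat) ** 2).mean())  # MLE of the variance (divides by n)

print(np.mean(mu_hats))   # ~5.0: the mean MLE is unbiased
print(np.mean(var_hats))  # ~3.6 = (n-1)/n * 4.0: the variance MLE is biased low
```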
■ Learning is…
Collect some data
■ E.g., coin flips
Choose a hypothesis class or model
■ E.g., binomial
Choose a loss function
■ E.g., data likelihood
Choose an optimization procedure
■ E.g., set derivative to zero to obtain MLE
Justify the accuracy of the estimate
■ E.g., Hoeffding’s inequality
Maximum Likelihood Estimation, cont.
Machine Learning – CSE446
Jamie Morgenstern
University of Washington
Linear model: $y_i = w x_i + b + \epsilon_i$
Noise model: $\epsilon_i \sim N(0, \sigma^2)$
[Figure: house price vs. # square feet, with a fitted line]
Loss function: $-\sum_{i=1}^{n} \log p(y_i \mid x_i, w, b)$
$\arg\min_{w,b} \; -\sum_{i=1}^{n} \log p(y_i \mid x_i, w, b) \;\equiv\; \arg\min_{w,b} \; \sum_{i=1}^{n} \big(y_i - (w x_i + b)\big)^2$
Loss function: $\sum_{i=1}^{n} \big(y_i - (w x_i + b)\big)^2$
©2018 Kevin Jamieson
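A minimal sketch of this equivalence on synthetic data: fitting w and b by least squares, which is the MLE under the Gaussian noise model above. The data-generating slope, intercept, and noise level are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3000, size=200)                 # x_i: # square feet
price = 0.3 * sqft + 50 + rng.normal(0, 20, size=200)   # y_i = w x_i + b + eps_i

# Least-squares fit of (w, b); equivalent to maximizing the Gaussian likelihood
A = np.column_stack([sqft, np.ones_like(sqft)])
(w_hat, b_hat), *_ = np.linalg.lstsq(A, price, rcond=None)
print(w_hat, b_hat)   # close to the true 0.3 and 50
```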