Lec1-Lecture Advance Statistics
Gaofeng Da
General idea
Statistical learning
Why do we estimate f ?
How do we estimate f ?
Generally,
• Statistical learning, and this course, are all about how to
estimate f.
• The term statistical learning refers to using the data to “learn” f.
We will assume we have observed a set of training data
{(x1, y1), (x2, y2), . . . , (xn, yn)}.
Parametric methods
Recalling Figure 2.3, we can see that even if the standard deviation is
low, we will still get a bad answer if we use the wrong model.
Non-parametric methods
A poor estimate
MSE = (1/n) Σ_{i=1}^{n} (yi − ŷi)²,

where ŷi = f̂(xi) is the prediction our method gives for the ith
observation in our training data.
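As a small illustration of this formula (the simulated data and the least-squares line fit are my own assumptions, not part of the lecture), we can compute the training MSE of a simple fitted model:

```python
import numpy as np

# Illustrative only: simulate training data from a known line plus noise,
# fit a line by least squares, and evaluate its training MSE.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=50)

# Least-squares fit of y = b0 + b1 * x
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x                  # predictions f_hat(x_i) on the training data

mse = np.mean((y - y_hat) ** 2)      # (1/n) * sum of (y_i - y_hat_i)^2
print(mse)
```

Because the noise here has standard deviation 1, a well-specified model should attain a training MSE near 1; a badly mis-specified model would not.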
Training error rate = (1/n) Σ_{i=1}^{n} I(yi ≠ ŷi),

where I(yi ≠ ŷi) is an indicator that equals 1 if yi ≠ ŷi and 0
otherwise. The corresponding test error rate is

Ave( I(y0 ≠ ŷ0) ),

where ŷ0 is the predicted class label that results from applying
the classifier to a test observation with predictor x0. A good
classifier is one for which the test error rate is smallest.
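The error-rate formula is just the fraction of misclassified observations. A minimal sketch (the labels below are made up for illustration):

```python
import numpy as np

# Hypothetical true labels and classifier predictions.
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# (1/n) * sum of I(y_i != y_hat_i): the mean of the mismatch indicators.
error_rate = np.mean(y_true != y_pred)
print(error_rate)  # 2 mismatches out of 6 -> about 0.333
```

The same one-liner computes either the training or the test error rate, depending on which labels and predictions are passed in.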
The Bayes classifier assigns each test observation with predictor
x0 to the class j for which

P(Y = j|X = x0)

is largest.
• For a two-class problem (Class 1 and Class 2), the Bayes
Classifier predicts Class 1 if P(Y = 1|X = x0 ) > 0.5, Class 2
otherwise.
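The two-class rule above amounts to thresholding the conditional probability at 0.5. A minimal sketch, assuming the conditional probability P(Y = 1|X = x0) is handed to us (in practice it is unknown):

```python
# Two-class Bayes rule: predict Class 1 when P(Y = 1 | X = x0) > 0.5,
# Class 2 otherwise. The probability is assumed known for this sketch.
def bayes_predict(p_class1):
    return 1 if p_class1 > 0.5 else 2

print(bayes_predict(0.7))  # -> 1 (Class 1)
print(bayes_predict(0.3))  # -> 2 (Class 2)
```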
• The Bayes classifier produces the lowest possible test error rate,
called the Bayes error rate. The overall Bayes error rate is given by
1 − E( max_j P(Y = j|X) ).
• In practice the conditional distribution of Y given X is unknown.
The K-nearest neighbors (KNN) classifier estimates it as

P(Y = j|X = x0) ≈ (1/K) Σ_{i∈N0} I(yi = j),

where N0 denotes the K training observations closest to x0.
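The KNN estimate P(Y = j|X = x0) ≈ (1/K) Σ_{i∈N0} I(yi = j) can be sketched directly (the tiny data set below is made up for illustration):

```python
import numpy as np

def knn_class_probs(X_train, y_train, x0, K):
    """Estimate P(Y = j | X = x0) as the fraction of the K nearest
    training points whose label equals j."""
    dists = np.linalg.norm(X_train - x0, axis=1)   # distances to x0
    nearest = np.argsort(dists)[:K]                # indices in N0
    labels = np.unique(y_train)
    return {j: np.mean(y_train[nearest] == j) for j in labels}

# Tiny illustrative data set: three points of class 1, two of class 2.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y = np.array([1, 1, 1, 2, 2])
probs = knn_class_probs(X, y, np.array([0.05]), K=3)
print(probs)  # all 3 nearest neighbours of 0.05 are class 1
```

KNN then classifies x0 to the class with the largest estimated probability, mimicking the Bayes classifier with these estimates in place of the true conditional probabilities.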
Figure: KNN results with K = 1 and K = 100.
A Fundamental Picture
For both regression and classification, choosing the correct level of
flexibility is critical to the success of any statistical learning method.
Figure: We must always keep this picture in mind when choosing a learning
method. More flexible/complicated is not always better!
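As a rough illustration of this picture (the simulated data and the use of polynomial regression as the "flexibility" knob are my own assumptions, not from the lecture), training error falls as flexibility grows, while test error need not:

```python
import numpy as np

# Simulate a nonlinear truth plus noise; vary flexibility via polynomial degree.
rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-3, 3, size=n)
    return x, np.sin(x) + rng.normal(0, 0.3, size=n)

x_tr, y_tr = make_data(40)
x_te, y_te = make_data(200)

train_mse, test_mse = [], []
for degree in range(1, 9):
    coefs = np.polyfit(x_tr, y_tr, degree)   # least-squares polynomial fit
    train_mse.append(np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2))
    test_mse.append(np.mean((y_te - np.polyval(coefs, x_te)) ** 2))

print(train_mse)  # shrinks as flexibility grows
print(test_mse)   # typically U-shaped: falls, then rises again
```

The training curve is guaranteed to be non-increasing here (higher-degree polynomials nest lower-degree ones), which is exactly why training error alone cannot guide the choice of flexibility.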
R Language
R is a programming language and free software environment for
statistical computing and graphics.