2.1 Intro Statistical Learning 1
Data Mining
BIF 524 - CSC 498
Data is the sword of the 21st century, those who wield it well,
the Samurai. – Jonathan Rosenberg
9/1/2021
Before we start
Textbook
https://www.statlearning.com/
Course Description
This course covers the fundamental techniques and applications
for mining data. Topics include:
• Machine learning
• Statistics
• Techniques and algorithms for parametric and non-parametric
classification, clustering, and classifier assessment
• Supervised vs. unsupervised learning
• Expert systems
• Graphical models
Teaching/Learning methods
• Plenty of applications
Additional Remarks
Introduction
Introduction (2)
Notation
• We use n to denote the number of distinct data points, or
observations, in our sample, and p to denote the number of variables.
• x_ij represents the value of the jth variable for the ith observation,
where i = 1, 2, ..., n and j = 1, 2, ..., p.
• X denotes the n × p matrix whose (i, j)th element is x_ij.
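As a minimal sketch of this notation, the snippet below stores a small data set as an n × p NumPy array. The numbers and the helper `x(i, j)` are hypothetical and exist only for illustration; note that NumPy indexes from 0, while the notation above indexes from 1.

```python
import numpy as np

# Hypothetical toy data: n = 4 observations (rows), p = 3 variables (columns).
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])

n, p = X.shape  # number of observations, number of variables


def x(i, j):
    """Return x_ij using the 1-based indices of the notation (NumPy is 0-based)."""
    return X[i - 1, j - 1]


print(n, p)     # 4 3
print(x(3, 2))  # 2.9: value of the 2nd variable for the 3rd observation
```

Keeping observations in rows and variables in columns matches the n × p convention, so `X[i - 1, :]` is the ith observation and `X[:, j - 1]` is the jth variable.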
$Y_i = f(X_i) + \varepsilon_i$
Simple Example
The function f that connects the input variable to the output variable is
in general unknown. In this situation one must estimate f based on the
observed points.
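One way to picture "estimating f from the observed points" is the sketch below. It assumes a made-up true function (`true_f`), a noise level of 0.01, and a degree-5 polynomial least-squares fit; none of these choices come from the slides, and many other estimators would serve equally well.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def true_f(x):
    # Hypothetical stand-in for the unknown f; in practice it is never observed.
    return 0.1 * np.sin(2 * np.pi * x)


# Simulate n observed points: y = f(x) + noise.
n = 50
x = rng.uniform(0.0, 1.0, size=n)
y = true_f(x) + rng.normal(0.0, 0.01, size=n)

# Estimate f using only the observed (x, y) pairs.
# Assumption: a degree-5 polynomial fitted by least squares.
coeffs = np.polyfit(x, y, deg=5)


def f_hat(x_new):
    """Evaluate the fitted polynomial, our estimate of f, at new inputs."""
    return np.polyval(coeffs, x_new)


# Because this is a simulation, the estimate can be compared with the truth.
grid = np.linspace(0.05, 0.95, 19)
max_gap = np.max(np.abs(f_hat(grid) - true_f(grid)))
print(f"largest gap between f_hat and f on the grid: {max_gap:.4f}")
```

With real data the comparison in the last step is impossible, because f is unknown; that is why the quality of an estimate has to be judged indirectly.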
The difficulty of estimating f will depend on the standard deviation of
the ε’s.
[Figure: panels plotting y against x (x from 0.0 to 1.0) at different noise levels; surviving panel titles: sd=0.01, sd=0.03.]
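The effect of the noise level on how well f can be estimated can be checked with a small simulation. This sketch assumes a hypothetical true function and a degree-5 polynomial fit (neither is taken from the slides) and uses the two noise levels that survive in the figure, sd=0.01 and sd=0.03.

```python
import numpy as np


def true_f(x):
    # Hypothetical stand-in for the unknown f, used only to generate data.
    return 0.1 * np.sin(2 * np.pi * x)


def mean_fit_error(sd, n=50, deg=5, reps=200, seed=0):
    """Average RMSE between the fitted curve and true_f over repeated samples."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.05, 0.95, 19)
    errors = []
    for _ in range(reps):
        x = rng.uniform(0.0, 1.0, size=n)
        y = true_f(x) + rng.normal(0.0, sd, size=n)  # noise with standard deviation sd
        coeffs = np.polyfit(x, y, deg=deg)           # assumed estimator: polynomial least squares
        gap = np.polyval(coeffs, grid) - true_f(grid)
        errors.append(np.sqrt(np.mean(gap ** 2)))
    return float(np.mean(errors))


results = {sd: mean_fit_error(sd) for sd in (0.01, 0.03)}
for sd, err in results.items():
    print(f"sd={sd}: average RMSE of the estimate = {err:.4f}")
```

Averaging over many simulated samples keeps the comparison from depending on a single lucky or unlucky draw; the larger noise level should give the larger average error.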