Chapter 5 Statistical Unsupervised Methods
CHAPTER 5: Statistical Unsupervised Learning
Overview
Unsupervised Learning
Unsupervised vs. supervised learning:
Dataset: an n-by-p matrix A, with n rows representing n objects; each object is described by p numeric values.
Goal: understand the structure of the data, e.g., the underlying process generating the data.

Example 1 - Customer-product matrices: n customers, p products; Aij = quantity of the j-th product purchased by the i-th customer.
Aim: find a subset of the products that characterizes customer behavior.
Unsupervised Learning
Example 2 - Social-network matrices: p groups (e.g., BU group, opera, etc.).

Example 3 - Document-term matrices: n documents, p terms; Aij = frequency of the j-th term in the i-th document.
Aim: find a subset of the terms that accurately clusters the documents.
Unsupervised Learning
Example 4 - Recommendation systems: n customers, p products.
Aim: find a subset of the products that accurately describes the behavior of the customers.
Overview
Singular Value Decomposition (SVD)
Data matrix A: n rows (one for each object) and p columns (one for each feature). The rows are vectors in a Euclidean space.
[Figure: two objects d and x plotted in a two-feature space, with the 1st and 2nd (right) singular vectors drawn through the data; the illustrated matrix is X = [[1, .5], [.5, 1]].]
The first singular value measures how much of the data variance is explained by the first singular vector.
A = UΣVT
[Figure: block view of the decomposition (objects x features); the top singular values/vectors capture the significant structure, the remaining ones capture noise.]
SVD
Rank-k approximations (Ak):
Ak = Uk Σk VkT, where A is n×p, Uk is n×k, Σk is k×k, and VkT is k×p.
Uk (Vk): orthogonal matrix containing the top k left (right) singular vectors of A.
Σk: diagonal matrix containing the top k singular values of A.
Ak is the best rank-k approximation of A.
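The rank-k construction above can be sketched in NumPy; the 2x3 matrix below is illustrative, and the truncation with k = 1 shows the best rank-1 approximation:

```python
import numpy as np

# A small example matrix (illustrative choice).
A = np.array([[2.0, 0.0, 2.0],
              [1.0, 2.0, -1.0]])

# Full (thin) SVD: A = U @ np.diag(s) @ VT.
U, s, VT = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation Ak = Uk Sigma_k VkT, here with k = 1.
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

# For a rank-1 truncation of a rank-2 matrix, the Frobenius error
# equals the discarded (second) singular value.
err = np.linalg.norm(A - A_k)
```

By the Eckart-Young theorem, this truncation is optimal among all rank-k matrices in both the Frobenius and spectral norms.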
SVD
Example computation:
Find the SVD of A, UΣVT: solve the eigenvalue problem for ATA.
For λ = 6: solve for the corresponding eigenvector.
For λ = 0: solve for the corresponding eigenvector.
(all not normalized)
SVD
Example computation:
After normalization:
SVD
Example computation:
Finally, the SVD of A is UΣVT.
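The by-hand procedure can be checked numerically: eigen-decompose ATA, take square roots of the eigenvalues to get the singular values, and recover each left singular vector as ui = A vi / σi. The 3x2 matrix below is a hypothetical stand-in, chosen only so that ATA has eigenvalues 6 and 0, matching the λ values above:

```python
import numpy as np

# Hypothetical matrix: A^T A = [[3, 3], [3, 3]] has eigenvalues 6 and 0.
A = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [1.0, 1.0]])

# Step 1: eigen-decompose A^T A (eigh returns ascending order).
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]            # sort descending: 6, 0

# Step 2: singular values are square roots of the eigenvalues.
sigma = np.sqrt(np.clip(lam, 0.0, None))  # [sqrt(6), 0]

# Step 3: left singular vector u_i = A v_i / sigma_i (nonzero sigma only).
u1 = A @ V[:, 0] / sigma[0]

# Since sigma_2 = 0, the rank-1 part alone reconstructs A.
A_rebuilt = sigma[0] * np.outer(u1, V[:, 0])
```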
SVD
Python Example
import numpy as np
from scipy.linalg import svd

A = np.array([[2, 0, 2],
              [1, 2, -1]])
U, s, VT = svd(A)
Principal Components Analysis (PCA)
• PCA can be used for dimensionality reduction and also as a tool for data visualization.
PCA
PCA and SVD
• PCA is SVD done on centered data
• PCA looks for the direction such that the data projected onto it has maximal variance
• The variables are assumed to be normalized (centered to have mean zero).
• The second principal component scores z12, z22, ..., zn2 take the form
  zi2 = φ12 xi1 + φ22 xi2 + ... + φp2 xip,
  where φ2 is the second principal component loading vector, with elements φ12, φ22, ..., φp2.
PCA
Geometry
The population size (pop) and ad spending (ad) for 100 different cities are
shown as purple circles. The green solid line indicates the first principal
component direction, and the blue dashed line indicates the second principal
component direction.
PCA
Computation of PCs
• Given an n × p dataset X, form M by centering each column of X (subtracting its column mean).
• If the SVD of M is UDV', the principal component loading vectors are the right singular vectors of M.
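A minimal sketch of this recipe in NumPy, using a small synthetic dataset (the data and seed are illustrative):

```python
import numpy as np

# Tiny synthetic dataset: any n x p matrix works here.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Center the columns: PCA is SVD done on centered data.
M = X - X.mean(axis=0)

# SVD of the centered matrix: M = U D V'.
U, D, VT = np.linalg.svd(M, full_matrices=False)

# Loading vectors = right singular vectors (columns of V);
# scores = projections of the centered data onto the loadings.
loadings = VT.T            # columns phi_1, phi_2, ...
scores = M @ loadings      # row i holds z_i1, z_i2, ...

# Variance explained by each component: D_j^2 / (n - 1).
var_explained = D**2 / (X.shape[0] - 1)
```

Since M = UDV', the score matrix M V equals U D, which is a quick consistency check on the computation.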
PCA
Further PCs
• Subsequent principal components (the 2nd, 3rd, etc.) are found analogously: each maximizes the variance of the projected data subject to being uncorrelated with all earlier components.
PCA
Illustration
• USA arrests data: For each of the 50 states in the United States, the data set
contains the number of arrests per 100,000 residents for each of three crimes:
Assault, Murder, and Rape. UrbanPop (% of the population in each state
living in urban areas) is also recorded.
• n = 50, p = 4.
• The blue state names represent the scores for the first two principal
components.
• The orange arrows indicate the first two principal component loading
vectors (with axes on the top and right).
• A second illustration (linked below) projects 64-dimensional handwritten-digit data down to 2 dimensions for visualization:
https://jakevdp.github.io/PythonDataScienceHandbook/05.09-principal-component-analysis.html
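Following the linked handbook example, the 64-to-2 projection can be sketched with scikit-learn (assuming scikit-learn is installed):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 8x8 handwritten digits: 1797 samples, 64 features each.
digits = load_digits()

# Project from 64 dimensions down to 2 for visualization.
pca = PCA(n_components=2)
projected = pca.fit_transform(digits.data)

print(digits.data.shape)   # (1797, 64)
print(projected.shape)     # (1797, 2)
```

Plotting the two projected coordinates, colored by digit label, reproduces the visualization in the handbook.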