06-Dim Red
http://cs246.stanford.edu
Consider a matrix in which every row is a multiple of [1 1 1 0 0] or of [0 0 0 1 1]. Such a matrix is really 2-dimensional: all rows can be reconstructed by scaling those two vectors.
1/23/2013 Jure Leskovec, Stanford C246: Mining Massive Datasets 2
Assumption: data lies on or near a low d-dimensional subspace. The axes of this subspace are an effective representation of the data.
Why reduce dimensions? Remove redundant and noisy features (not all words are useful), aid interpretation and visualization, and make storage and processing of the data easier.
A = U Σ VT
A: m x n matrix (e.g., m documents, n terms)
U: m x r matrix (m documents, r concepts)
Σ: r x r diagonal matrix (strength of each concept)
VT: r x n matrix (n terms, r concepts)
(r: rank of the matrix A)
A = σ1 u1 v1T + σ2 u2 v2T + ...
(σi: scalar, ui: vector, vi: vector)
It is always possible to decompose a real matrix A into A = U Σ VT, where:
U, Σ, V: unique
U, V: column orthonormal: UT U = I; VT V = I (I: identity matrix; columns are orthogonal unit vectors)
Σ: diagonal; entries (singular values) are positive and sorted in decreasing order (σ1 ≥ σ2 ≥ ... ≥ 0)
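These properties can be checked numerically; a minimal sketch with numpy on a random matrix (not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# "Economy" SVD: U is 5x3, s holds the singular values, VT is 3x3.
U, s, VT = np.linalg.svd(A, full_matrices=False)

# U, V are column orthonormal: UT U = I and VT V = I.
assert np.allclose(U.T @ U, np.eye(3))
assert np.allclose(VT @ VT.T, np.eye(3))

# Singular values are non-negative and sorted in decreasing order.
assert np.all(s >= 0) and np.all(np.diff(s) <= 0)

# The factors reconstruct A: A = U diag(s) VT.
assert np.allclose(U @ np.diag(s) @ VT, A)
```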
A = U Σ VT - example: users to movies (columns: Matrix, Alien, Serenity, Casablanca, Amelie; the first group of users are SciFi fans, the second Romance fans):

      Matrix  Alien  Serenity  Casablanca  Amelie
        1       1       1          0          0
        3       3       3          0          0
        4       4       4          0          0
        5       5       5          0          0
        0       2       0          4          4
        0       0       0          5          5
        0       1       0          2          2
The decomposition A = U Σ VT of this ratings matrix yields the movie-to-concept factor:

VT = | 0.56   0.59  0.56   0.09   0.09 |
     | 0.12  -0.02  0.12  -0.69  -0.69 |
     | 0.40  -0.80  0.40   0.09   0.09 |
A = U Σ VT - example: the first row of VT is the SciFi-concept (large weights on Matrix, Alien, Serenity), the second row is the Romance-concept (large weights on Casablanca, Amelie).
A = U Σ VT - example: the first diagonal entry of Σ is the strength of the SciFi-concept.
A = U Σ VT - example: the first column of U gives each user's strength in the SciFi-concept.
U: user-to-concept similarity matrix. V: movie-to-concept similarity matrix. Σ: its diagonal elements give the strength of each concept.
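As a sketch of this interpretation with numpy (the 7x5 ratings matrix is the running example from these slides; note the signs of singular vectors are arbitrary):

```python
import numpy as np

# Users-to-movies ratings; columns: Matrix, Alien, Serenity, Casablanca, Amelie.
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)

# Concept strengths: roughly 12.4 (SciFi) and 9.5 (Romance) dominate.
print(np.round(s[:3], 1))

# First movie-to-concept vector (up to sign): large weights on the
# three SciFi movies, small weights on the two Romance movies.
print(np.round(np.abs(VT[0]), 2))
```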
[Figure: users plotted as points by (Movie 1 rating, Movie 2 rating); v1 is the first right singular vector.]
A = U Σ VT - example: v1 aligns with the direction of largest variance of the data points.
A = U Σ VT - example: σ1 measures the variance (spread) of the data points along the v1 axis.
More details. Q: How exactly is dimensionality reduction done? A: Set the smallest singular values to zero.
Setting the smallest singular value to zero leaves the rank-2 approximation:

Σ ≈ | 12.4   0  |
    |  0    9.5 |

VT ≈ | 0.56   0.59  0.56   0.09   0.09 |
     | 0.12  -0.02  0.12  -0.69  -0.69 |
The resulting B = U S VT is close to A in the Frobenius norm:

‖M‖F = √(Σij Mij²)

‖A − B‖F = √(Σij (Aij − Bij)²) is small.
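A numerical sketch of this (numpy; the same example ratings matrix with k = 2) also shows the error equals the energy of the discarded singular values:

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=False)

k = 2
s_trunc = s.copy()
s_trunc[k:] = 0.0                 # set smallest singular values to zero
B = U @ np.diag(s_trunc) @ VT     # B = U S VT, the rank-k approximation

# ||A - B||_F = sqrt(sigma_{k+1}^2 + ... + sigma_r^2)
err = np.linalg.norm(A - B, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
print(round(float(err), 2))
```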
B = U S VT is the best approximation of A among rank-k matrices.
Theorem: Let A = U Σ VT (σ1 ≥ σ2 ≥ ..., rank(A) = r) and let S be a diagonal n x n matrix with si = σi (i = 1...k) and si = 0 otherwise. Then B = U S VT is a solution to minB ‖A − B‖F subject to rank(B) = k.
The proof uses the fact that ‖M‖F² = Σi σi(M)², where M = P Q RT is the SVD of M.
Proof sketch: since M = P Q RT is the SVD of M, multiplying by the orthonormal factors does not change the Frobenius norm. Hence

‖A − B‖F = ‖U Σ VT − U S VT‖F = ‖U (Σ − S) VT‖F = ‖Σ − S‖F = √(Σi=1..r (σi − si)²)

We want to choose the si to minimize this. The solution is to set si = σi (i = 1...k) and the other si = 0:

min over si of  Σi=1..k (σi − si)² + Σi=k+1..r σi²  =  Σi=k+1..r σi²
A ≈ σ1 u1 v1T + σ2 u2 v2T
A = Σi σi ui viT: keep only the first k terms σ1 u1 v1T + σ2 u2 v2T + ... (each ui viT is an outer product of an n x 1 and a 1 x m vector).
Assume σ1 ≥ σ2 ≥ σ3 ≥ ... ≥ 0. Why is setting small σi to 0 the right thing to do? Vectors ui and vi are unit length, so σi scales them; zeroing the small σi therefore introduces the least error.
Dimensionality reduction: keep only the few largest singular values (retaining 80-90% of the 'energy' Σi σi²). SVD picks up linear correlations.
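The 80-90% energy rule can be sketched as follows (numpy; `rank_for_energy` is a hypothetical helper, not from the slides):

```python
import numpy as np

def rank_for_energy(s, energy=0.9):
    # Smallest k such that the top-k singular values retain the given
    # fraction of the total "energy" sum_i sigma_i^2.
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy)) + 1

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

_, s, _ = np.linalg.svd(A, full_matrices=False)

# For the example ratings matrix, two concepts hold well over 90% of the energy.
print(rank_for_energy(s, 0.9))
```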
Relation to eigendecomposition: A = U Σ VT. What is A AT?
A AT = (U Σ VT)(U Σ VT)T = U Σ VT V Σ UT = U Σ² UT
This is an eigendecomposition X Λ XT of A AT (and similarly AT A = V Σ² VT). So, λi = σi².
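A quick numerical check of λi = σi² (a numpy sketch on a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Singular values of A vs. eigenvalues of the symmetric matrix AT A.
_, s, _ = np.linalg.svd(A, full_matrices=False)
lam = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigvalsh returns ascending order

assert np.allclose(lam, s ** 2)
```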
Q: Find users that like 'Matrix'
A: Map the query into the 'concept space' - how?
Example: a new user q = [5 0 0 0 0] who rated only 'Matrix' (movie order: Matrix, Alien, Serenity, Casablanca, Amelie). Project q into the concept space: take the inner product of q with each concept vector vi.
q = [5 0 0 0 0] maps to qconcept = q V = [2.8, 0.6]: a strong SciFi-concept component (q·v1 = 2.8) and a weak Romance-concept component (q·v2 = 0.6).
How would the user d that rated ('Alien', 'Serenity') be handled? The same way: dconcept = d V. E.g., d = [0 4 5 0 0] maps to dconcept = [5.2, 0.4]: strong SciFi-concept, weak Romance-concept.
Observation: user d that rated ('Alien', 'Serenity') will be similar to user q that rated ('Matrix'), although d and q have zero ratings in common!
d = [0 4 5 0 0] → dconcept = [5.2, 0.4]
q = [5 0 0 0 0] → qconcept = [2.8, 0.6]
In the concept space their similarity is non-zero: both point strongly in the SciFi direction.
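A sketch of this query pipeline (numpy; the two concept vectors are the rounded values printed on these slides, so d's projection comes out as [5.16, 0.52] rather than the slides' unrounded [5.2, 0.4]):

```python
import numpy as np

# Top-2 movie-to-concept vectors (rows of VT), rounded as on the slides.
VT2 = np.array([
    [0.56,  0.59, 0.56,  0.09,  0.09],   # SciFi-concept
    [0.12, -0.02, 0.12, -0.69, -0.69],   # Romance-concept
])
V2 = VT2.T

q = np.array([5.0, 0, 0, 0, 0])   # rated only 'Matrix'
d = np.array([0, 4.0, 5, 0, 0])   # rated only 'Alien' and 'Serenity'

q_c = q @ V2   # inner products with each concept vector -> [2.8, 0.6]
d_c = d @ V2

# Despite zero ratings in common, the concept-space similarity is high.
cos = q_c @ d_c / (np.linalg.norm(q_c) * np.linalg.norm(d_c))
print(np.round(q_c, 2), np.round(d_c, 2), round(float(cos), 2))
```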
Drawback of SVD - lack of sparsity: a singular vector specifies a linear combination of all input columns or rows, so even when A = U Σ VT is sparse, the singular vectors in U and V are dense!
Announcements:
HW2 has been posted. The LSH Gradiance quiz has been posted, due 2013-01-30 23:59. Date for an alternate final: Tue 3/19, 6-9 PM.
CUR decomposition. Frobenius norm: ‖X‖F = √(Σij Xij²). Goal: express A as a product of three matrices C, U, R so that ‖A − C U R‖F is small. Constraints: C is a set of actual columns of A and R is a set of actual rows of A.
Let W be the intersection of the chosen columns C and rows R, and let the SVD of W be W = X Z YT. Then take U = W+, the pseudoinverse of W.
Why does the pseudoinverse work? If W = X Z YT, then W−1 = (YT)−1 Z−1 X−1. Due to orthonormality, X−1 = XT and Y−1 = YT; since Z is diagonal, Z−1 = diag(1/Zii). Thus, if W is nonsingular, the pseudoinverse is the true inverse.
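A sketch of this construction (numpy; `pinv_via_svd` is an illustrative helper, not from the slides):

```python
import numpy as np

def pinv_via_svd(W, tol=1e-10):
    # W = X Z YT  =>  W+ = Y Z+ XT, where Z+ inverts the nonzero
    # diagonal entries of Z (Z+_ii = 1/Z_ii) and leaves zeros as zeros.
    X, z, YT = np.linalg.svd(W, full_matrices=False)
    z_plus = np.array([1.0 / zi if zi > tol else 0.0 for zi in z])
    return YT.T @ np.diag(z_plus) @ X.T

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 3))   # generically nonsingular

# Matches numpy's pseudoinverse, and is the true inverse when W is nonsingular.
assert np.allclose(pinv_via_svd(W), np.linalg.pinv(W))
assert np.allclose(pinv_via_svd(W) @ W, np.eye(3))
```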
[Figure: sample a set of columns C and rows R from A, then construct a small U from their intersection.]
SVD: A = U Σ VT. A is huge but sparse; U and V are big and dense.
CUR: A = C U R. A is huge but sparse; C and R are big but very sparse (they are actual columns and rows of A), and U is small and dense.
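A minimal CUR sketch (numpy). Sampling columns and rows with probability proportional to their squared norm follows the spirit of the Drineas et al. algorithm referenced below, but the rescaling and with-replacement details of the real method are omitted, so this is an illustration only:

```python
import numpy as np

def cur_sketch(A, c, r, rng):
    # Sample c columns and r rows with probability proportional to
    # their squared Euclidean norms, then set U = W+, the pseudoinverse
    # of the intersection of the chosen rows and columns.
    col_p = (A ** 2).sum(axis=0) / (A ** 2).sum()
    row_p = (A ** 2).sum(axis=1) / (A ** 2).sum()
    cols = rng.choice(A.shape[1], size=c, replace=False, p=col_p)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=row_p)
    C, R = A[:, cols], A[rows, :]
    W = A[np.ix_(rows, cols)]          # intersection of chosen rows/columns
    return C, np.linalg.pinv(W), R

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))  # rank 3

C, U, R = cur_sketch(A, c=3, r=3, rng=rng)
# When rank(W) = rank(A), C U R reconstructs A (almost surely here).
rel_err = np.linalg.norm(A - C @ U @ R, 'fro') / np.linalg.norm(A, 'fro')
print(round(float(rel_err), 6))
```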
Accuracy: 1 − relative sum of squared errors. Space: #output matrix entries / #input matrix entries. [Sun, Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM '07]
SVD is a lower-dimensional linear projection that preserves Euclidean distances. But data may lie on a nonlinear low-dimensional curve, a.k.a. a manifold; in that case, use the distance as measured along the manifold.
References:
Drineas et al.: Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, SIAM Journal on Computing, 2006.
J. Sun, Y. Xie, H. Zhang, C. Faloutsos: Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM 2007.
P. Paschou, M. W. Mahoney, A. Javed, J. R. Kidd, A. J. Pakstis, S. Gu, K. K. Kidd, P. Drineas: Intra- and Interpopulation Genotype Reconstruction from Tagging SNPs, Genome Research, 17(1), 96-107, 2007.
M. W. Mahoney, M. Maggioni, P. Drineas: Tensor-CUR Decompositions for Tensor-Based Data, Proc. 12th Annual SIGKDD, 327-336, 2006.