Mining Ratio Rules via Principal Sparse Non-negative Matrix Factorization
Chenyong Hu, Benyu Zhang, Shuicheng Yan, Qiang Yang, Jun Yan, Zheng Chen, Wei-Ying Ma
[Figure 1: three scatter plots of "$spend on bread" (x-axis) versus "$spend on butter" (y-axis): (a) a data matrix with 2 dimensions; (b) Ratio Rules identified by PCA; (c) Ratio Rules identified by PSNMF.]
Fig 1. A data matrix and latent associations discovered by PCA and PSNMF
$$w_{kl} \leftarrow w_{kl} \sum_j \frac{v_{kj}\,h_{lj}}{\sum_{l'} w_{kl'}\,h_{l'j}} \qquad (7)$$

To make the solution unique, we further require that the 1-norm of every column vector of the matrix W be one; the matrix H is adjusted accordingly:

$$w_{kl} \leftarrow \frac{w_{kl}}{\sum_k w_{kl}} \qquad (8)$$

$$h_{lj} \leftarrow h_{lj} \sum_k w_{kl} \qquad (9)$$

[Figure 2: a 3-D scatter plot over axes X, Y, Z showing the synthetic data.]
Fig 2. Dataset with two clusters
It is proved that the objective function is non-increasing under the above iterative updating rules, and the convergence of the iteration is guaranteed (see the Appendix).

3.3 Principal SNMF

When the dataset V is decomposed into W and H, each column of H gives the projection of the corresponding data point onto the basis space W. Taken as a whole, the sum of each row vector of H measures the importance of the corresponding basis. We therefore define a support measurement after normalizing every column of H:

$$h_{ij} \leftarrow \frac{h_{ij}}{\sum_i h_{ij}} \qquad (10)$$

Definition. For every rule (column vector) $w_i$ of W, we define a support measurement:

$$\mathrm{support}(w_i) = \frac{\sum_j h_{ij}}{\sum_{i,j} h_{ij}} \qquad (11)$$

Consequently, we can measure the importance of each rule for the entire dataset by its support value: the larger the support, the more important the rule is for the whole dataset.

To select the principal k rules as Ratio Rules, we first rank all the rules in descending order of support value, and then retain the first k rules as Ratio Rules, since they are more important than the others. For the choice of k, a simple criterion is:

$$k = \min\left\{ k \;:\; \frac{\sum_{i=1}^{k}\mathrm{support}(w_i)}{\sum_{i=1}^{M}\mathrm{support}(w_i)} > \text{threshold} \right\} \qquad (12)$$

where M is the total number of rules. By (12), the Ratio Rules are obtained as the smallest set of k rules whose support values cover a given threshold (e.g. 90%) of the grand total support.

4. Experiments

Synthetic dataset:

We applied both PSNMF and PCA to a dataset that consists of two clusters (Fig 2), which contains 25…

Table 1. Ratio Rules based on PSNMF and PCA

(a) Based on PSNMF

                 RR1      RR2      RR3
(X)              0.000    0.696    0.020
(Y)              0.493    0.304    0.980
(Z)              0.507    0.000    0.000
Sum(w_i)         49.88    21.64    3.488
Support(w_i)     0.665    0.289    0.046

(b) Based on PCA

                 RR1      RR2
(X)             -0.52     0.72
(Y)             -0.77    -0.15
(Z)             -0.38    -0.68

Table 1(a) lists all the rules and their corresponding support values. After ranking these rules, the Ratio Rules are obtained, since support(w_1) + support(w_2) = 0.9535 > 90%:

rule1 :: X : Y : Z => 0 : 0.493 : 0.507 (0.6650)
rule2 :: X : Y : Z => 0.696 : 0.304 : 0 (0.2885)

For example, 2/3 of the transactions (the cluster distributed on the y-z plane) depend mostly on rule1 and the others on rule2, so the corresponding support value (0.665) of rule1 agrees with intuition. By contrast, Table 1(b) lists the Ratio Rules found by PCA, whose negative associations are clearly difficult to interpret.

Real dataset: NBA (459 × 11)

This dataset comes from basketball statistics of the 97-98 season, including Minutes Played, Points per Game, Assists per Game, etc. We selected it because it gives an intuitive meaning to the latent associations. Table 2 presents the first three Ratio Rules (RR1, RR2, RR3) found by PSNMF.

Table 2. Ratio Rules by PSNMF from NBA

field               RR1      RR2      RR3
Games                                 0.450
Minute                                0.013
Points Per Game                       0.010
Rebound Per Game             0.117
Assists per Game    0.206
Steals              0.220
Fouls                        0.263
3 Points

Based on general knowledge of basketball, we conjecture that RR1 represents the agility of a player: it gives the ratio of Assists per Game to Steals as 0.206 : 0.220 ≈ 1 : 1, meaning that an average player who makes one assist per game will also steal the ball about once; RR2 reads the same way (0.117 : 0.263 ≈ 1 : 2.25). Traditional methods cannot reveal such information behind the dataset.

5. Conclusion

In this work, we proposed Principal Sparse Non-negative Matrix Factorization (PSNMF), which aims to learn latent components called Ratio Rules. Experimental results illustrate that our Ratio Rules are better suited to representing associations between items than those produced by PCA.

Appendix

To prove the convergence of the learning algorithm (6)-(7), an auxiliary function $G(Z, Z')$ is given for the objective function $L(Z)$, with the properties that $G(Z, Z') \ge L(Z)$ and $G(Z, Z) = L(Z)$. We will show that the multiplicative update rule corresponds to setting, at each iteration, the new state vector to the values that minimize the auxiliary function:

$$Z^{(t+1)} = \arg\min_Z G(Z, Z^{(t)}) \qquad (13)$$

Then the objective function $L(Z)$ is non-increasing when Z is updated using (13), because

$$L(Z^{(t+1)}) \le G(Z^{(t+1)}, Z^{(t)}) \le G(Z^{(t)}, Z^{(t)}) = L(Z^{(t)}).$$

Updating H: with W fixed, H is updated by minimizing $L(H) = D(V \,\|\, WH) + \lambda \sum_{i,j} h_{ij}$. An auxiliary function is constructed for $L(H)$ as:

$$G(H, H') = \sum_{i,j}\big(v_{ij}\log v_{ij} - v_{ij}\big) + \sum_{i,j,k} w_{ik}h_{kj} + \lambda\sum_{i,j} h_{ij} - \sum_{i,j,k} v_{ij}\,\mu_{ijk}\Big(\log\big(w_{ik}h_{kj}\big) - \log \mu_{ijk}\Big), \quad \mu_{ijk} = \frac{w_{ik}h'_{kj}}{\sum_l w_{il}h'_{lj}}.$$

It is easy to verify that $G(H, H) = L(H)$. The following proves $G(H, H') \ge L(H)$: because $-\log\big(\sum_k w_{ik}h_{kj}\big)$ is a convex function, Jensen's inequality holds for all $i, j$ and any $\mu_{ijk} \ge 0$ with $\sum_k \mu_{ijk} = 1$:

$$-\log\Big(\sum_k w_{ik}h_{kj}\Big) \le -\sum_k \mu_{ijk}\,\log\frac{w_{ik}h_{kj}}{\mu_{ijk}},$$

which, multiplied by $v_{ij}$ and summed over $i, j$, gives $G(H, H') \ge L(H)$.

It is easy to prove $G(W, W) = L(W)$ and $G(W, W') \ge L(W)$ likewise, and minimizing the auxiliary function yields:

$$w_{kl} = w'_{kl}\,\sum_j \frac{v_{kj}\,h_{lj}}{\sum_{l'} w'_{kl'}\,h_{l'j}}$$

This completes the proof.
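The support ranking of eqs. (10)-(12) is mechanical enough to state as code. The sketch below assumes only that the factor matrix H from the decomposition is available; the function name and the 90% default threshold are illustrative.

```python
import numpy as np

def select_ratio_rules(H, threshold=0.9):
    """Rank rules by eqs. (10)-(11); keep the smallest k satisfying eq. (12)."""
    # Eq. (10): normalize every column of H to unit sum.
    Hn = H / H.sum(axis=0, keepdims=True)
    # Eq. (11): support of rule i = mass of row i over the total mass of H.
    support = Hn.sum(axis=1) / Hn.sum()
    # Eq. (12): smallest k whose cumulative ranked support reaches the threshold.
    order = np.argsort(support)[::-1]
    k = int(np.searchsorted(np.cumsum(support[order]), threshold) + 1)
    return order[:k], support

# With supports shaped like the synthetic example (0.665, 0.289, 0.046),
# the first two rules cover 0.9535 > 90%, so k = 2.
H = np.array([[0.665, 0.665],
              [0.289, 0.289],
              [0.046, 0.046]])
rules, support = select_ratio_rules(H)
```

On supports matching Table 1(a), this selects rule1 and rule2, reproducing the paper's 0.9535 > 90% cut.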