

A Survey on Incremental Feature Extraction
D2 Makoto Miwa
Chikayama & Taura Lab.
Table of Contents
 Introduction
 Dimension Reduction
 Feature Extraction
 Feature Selection
 Incremental Feature Extraction
 Discussion
 Summary
Introduction
 Large and high-dimensional data
 Web documents, etc.
 A large amount of resources is needed for
 Information Retrieval
 Classification tasks
 Data preservation, etc.
Dimension Reduction
[Figure: scatter plot of Weight (kg) vs. Height (cm); samples are labeled overweight or underweight]

Dimension Reduction
 preserves information on the classification of overweight and underweight as much as possible
 makes classification easier
 reduces data size (2 features → 1 feature)
Dimension Reduction
 Feature Extraction (FE)
 Generates a new feature
 ex. preserves weight / height
[Figure: Weight (kg) vs. Height (cm) samples projected onto the single generated feature weight / height]
 Feature Selection (FS)
 Selects an existing feature
 ex. preserves weight
[Figure: Weight (kg) vs. Height (cm) samples reduced to the single selected feature, weight]
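To make the contrast concrete, here is a minimal Python sketch of the two options on the weight/height example; the numbers are invented for illustration.

import numpy as np

# Each row is one person: [height_cm, weight_kg] (made-up values)
X = np.array([[140.0, 60.0],
              [150.0, 50.0],
              [145.0, 62.0],
              [152.0, 48.0]])

# Feature Extraction (FE): generate one new feature, weight / height
fe = X[:, 1] / X[:, 0]

# Feature Selection (FS): keep one existing feature, weight
fs = X[:, 1]

print(fe.shape, fs.shape)   # both (4,): 2 features reduced to 1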
Problem Setting
 Each of the $n$ data samples is represented by $m$ dimensions (features)
 Data belong to $c$ different classes in supervised learning
 Dimension reduction is to generate or select $k$ features preserving the original information as much as possible under some criterion
 In most cases $k \ll m$
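In code, dimension reduction amounts to mapping an $n \times m$ data matrix to an $n \times k$ one. A shape-only sketch using the notation above (the projection matrix here is random, purely to show the shapes):

import numpy as np

n, m, k = 1000, 50, 5
X = np.random.randn(n, m)    # n samples, m features
W = np.random.randn(m, k)    # a projection matrix (random placeholder)
Y = X @ W                    # reduced data: n samples, k features
print(Y.shape)               # (1000, 5)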
Feature Extraction
 Extracts features by projecting data to a lower-dimensional space
 Unsupervised Method
 Principal Component Analysis (PCA)
 Independent Component Analysis (ICA)
 Supervised Method
 Linear Discriminant Analysis (LDA)
 Maximum Margin Criterion (MMC)
 Orthogonal Centroid algorithm (OC)
 Finds an optimal projection matrix $W$ ($m \times k$)
Principal Component Analysis
 Unsupervised Method
 PCA tries to maximize $\mathrm{tr}(W^T \Sigma W)$ subject to $W^T W = I$
 $\Sigma$ : covariance matrix
 PCA needs a Singular Value Decomposition (SVD) calculation
 time complexity: $O(nm^2 + m^3)$
 space complexity: $O(m^2)$
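A minimal PCA sketch in Python, assuming the $n \times m$ data-matrix convention above. It takes the principal directions from the SVD of the centered data, which yields the same eigenvectors as decomposing the covariance matrix $\Sigma$:

import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                    # center the data
    # Right singular vectors of Xc = eigenvectors of the covariance matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                               # m x k projection matrix
    return Xc @ W, W

X = np.random.randn(200, 10)
Y, W = pca(X, k=2)
print(Y.shape)   # (200, 2)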
Linear Discriminant Analysis
 Supervised Method
 LDA tries to maximize $\mathrm{tr}\big((W^T S_w W)^{-1}(W^T S_b W)\big)$
 $S_b$ : Interclass scatter matrix
 $S_w$ : Intraclass scatter matrix
 LDA needs to calculate $S_w^{-1} S_b$
 at least $m$ samples are required (otherwise $S_w$ is singular)
 LDA needs SVD
 time complexity: $O(nm^2 + m^3)$
 space complexity: $O(m^2)$
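A minimal LDA sketch under the same conventions: build $S_b$ and $S_w$ explicitly, then solve the generalized eigenproblem $S_b w = \lambda S_w w$. It assumes $S_w$ is nonsingular, i.e. enough samples, as noted above:

import numpy as np

def lda(X, y, k):
    """Return an m x k projection maximizing interclass over intraclass scatter."""
    mean = X.mean(axis=0)
    m = X.shape[1]
    Sb = np.zeros((m, m))                      # interclass scatter
    Sw = np.zeros((m, m))                      # intraclass scatter
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    # Generalized eigenproblem Sb w = lambda Sw w, solved via Sw^{-1} Sb
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:k]]

X = np.vstack([np.random.randn(50, 4), np.random.randn(50, 4) + 2.0])
y = np.array([0] * 50 + [1] * 50)
print((X @ lda(X, y, k=1)).shape)   # (100, 1)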
Orthogonal Centroid
[Park et al. 2003]
 Supervised Method
 OC tries to maximize $\mathrm{tr}(W^T S_b W)$ subject to $W^T W = I$
 OC solves the problem by QR decomposition of the $m \times c$ centroid matrix
 time complexity: $O(nm + mc^2)$
 space complexity: $O(mc)$
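A minimal Orthogonal Centroid sketch: stack the class centroids into an $m \times c$ matrix and take its thin QR decomposition; the orthonormal factor serves as the projection matrix.

import numpy as np

def orthogonal_centroid(X, y):
    """Return an orthonormal m x c projection built from the class centroids."""
    classes = np.unique(y)
    C = np.stack([X[y == c].mean(axis=0) for c in classes], axis=1)  # m x c
    Q, _ = np.linalg.qr(C)        # thin QR: Q has orthonormal columns
    return Q

X = np.vstack([np.random.randn(50, 10), np.random.randn(50, 10) + 2.0])
y = np.array([0] * 50 + [1] * 50)
print((X @ orthogonal_centroid(X, y)).shape)   # (100, 2)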
Feature Extraction (Accuracy)
[Figure: classification accuracy on a face database (AR)]
 original intraclass scatter is large
Feature Extraction (Accuracy)
[Figure: classification accuracy on a document database (Reuters-21578)]
 original intraclass scatter is small
Feature Extraction (Time)
[Figure: computation time on 4 datasets (2 face sets & 2 document sets)]
Feature Selection
 Selects features by some criterion
 Information Gain (IG)
 Chi-square value (CHI)
 Orthogonal Centroid Feature Selection (OCFS)
Feature Selection by CHI
 Chi-square value
 represents the strength of the correlation between classes and a feature

$$\chi^2(t, c) = \frac{N\,(AD - CB)^2}{(A+C)(B+D)(A+B)(C+D)}, \qquad \chi^2(t) = \sum_{c} P(c)\,\chi^2(t, c)$$

 $A$ : # of samples in which $t$ and $c$ co-occur
 $B$ : # of samples in which $t$ occurs without $c$
 $C$ : # of samples in which $c$ occurs without $t$
 $D$ : # of samples in which neither $t$ nor $c$ occurs
 $P(c)$ : prior probability of $c$ ; $N$ : total # of samples
Feature Selection by CHI
 Feature selection by CHI
 Select the top $k$ features by chi-square value (see the sketch below)
 The main computational time is spent on the calculation of the counts $A, B, C, D$ for every feature–class pair
 time complexity: $O(nm + mc)$
 space complexity: $O(mc)$
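A minimal sketch of CHI-based selection for binary term presence, following the contingency counts $A, B, C, D$ defined above; weighting the per-class scores by the class prior $P(c)$ is one common way to combine them.

import numpy as np

def chi_square_select(X, y, k):
    """Select the top-k features by class-averaged chi-square value.

    X : (n, m) binary matrix; X[i, j] = 1 if feature j occurs in sample i
    y : (n,) class labels
    """
    n, m = X.shape
    scores = np.zeros(m)
    for c in np.unique(y):
        in_c = (y == c)
        A = X[in_c].sum(axis=0)        # t and c co-occur
        B = X[~in_c].sum(axis=0)       # t occurs without c
        C = in_c.sum() - A             # c occurs without t
        D = (~in_c).sum() - B          # neither occurs
        num = n * (A * D - C * B) ** 2
        den = (A + C) * (B + D) * (A + B) * (C + D)
        scores += in_c.mean() * num / np.maximum(den, 1)   # P(c)-weighted
    return np.argsort(scores)[::-1][:k]

X = (np.random.rand(100, 30) < 0.3).astype(float)
y = np.random.randint(0, 2, size=100)
print(chi_square_select(X, y, k=5))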
Feature Selection (Accuracy)
[Figure: classification accuracy of feature selection methods on 789,670 documents]
Feature Selection (Time)
[Figure: computation time of feature selection methods]
FE vs FS
       time complexity    space complexity
PCA    $O(nm^2 + m^3)$    $O(m^2)$
LDA    $O(nm^2 + m^3)$    $O(m^2)$
OC     $O(nm + mc^2)$     $O(mc)$
CHI    $O(nm + mc)$       $O(mc)$
In most cases $c \ll m$, so FS is much cheaper than FE
FE vs FS
 FE needs matrix computation
 High computational and spatial cost
 FE finds an optimal solution
 FS doesn't need matrix computation
 Fast
 FS can treat very high-dimensional data
 FS finds a nearly optimal solution
Table of Contents
 Introduction
 Dimension Reduction
 Feature Extraction
 Feature Selection
 Incremental Feature Extraction
 Discussion
 Summary
Incremental Feature Extraction
 Feature extraction can't treat large data at once
 Data aren't always presented at once
 Some data may arrive later
 Incremental feature extraction
 splits large data into several small chunks and treats them incrementally, without keeping all the data (see the running-statistics sketch below)
 can treat newly presented data without learning from the beginning
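The common trick behind incremental FE methods is to maintain sufficient statistics chunk by chunk instead of the raw data. A minimal sketch using the standard merge formulas for the mean and scatter matrix (generic bookkeeping, not any specific paper's equations):

import numpy as np

class RunningStats:
    """Running mean and scatter matrix, updated one chunk at a time."""
    def __init__(self, m):
        self.n = 0
        self.mean = np.zeros(m)
        self.scatter = np.zeros((m, m))   # sum of (x - mean)(x - mean)^T

    def update(self, chunk):
        """Merge a new (n_c x m) chunk without keeping old data."""
        n_c = len(chunk)
        mean_c = chunk.mean(axis=0)
        Xc = chunk - mean_c
        delta = mean_c - self.mean
        total = self.n + n_c
        # Standard merge formula for pooled scatter matrices
        self.scatter += Xc.T @ Xc + (self.n * n_c / total) * np.outer(delta, delta)
        self.mean += (n_c / total) * delta
        self.n = total

stats = RunningStats(m=5)
for _ in range(10):                       # data arrive as a stream of chunks
    stats.update(np.random.randn(100, 5))
cov = stats.scatter / (stats.n - 1)       # covariance, usable for PCA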
Incremental Feature Extraction
 Based on FE algorithms
 PCA → Incremental PCA (IPCA)
 LDA → Incremental LDA (ILDA)
 MMC → Incremental MMC (IMMC)
 OC → Incremental OC (IOC)
Incremental Linear Discriminant Analysis (1) [Pang et al. 2005]
 Data are presented as a stream of chunks
 ILDA updates $S_b$ and $S_w$ using only the newly presented chunk $Y$ and its statistics
 $\bar{y}$ : mean of $Y$
 $\bar{y}_j$ : mean of class $j$ in $Y$
Incremental Linear Discriminant Analysis (2) [Pang et al. 2005]
 ILDA doesn't directly update the projection matrix $W$
 ILDA needs SVD like LDA, but the matrix it decomposes is smaller than the one in batch LDA
 time complexity (an update): depends on the chunk size $L$, not on the total number of samples $n$
 space complexity: independent of $n$
 batch LDA for comparison:
 time complexity: $O(nm^2 + m^3)$
 space complexity: $O(m^2)$
 In most cases $L \ll n$
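A hedged sketch of the ILDA flavor: keep per-class counts, means, and scatters, merge each new chunk into them, and rebuild $S_w$ and $S_b$ from the stored statistics. These are standard scatter-merging identities, not necessarily the exact update equations of Pang et al. 2005.

import numpy as np

def merge_class(n, mean, scatter, chunk):
    """Merge a chunk of one class into its (count, mean, scatter) statistics."""
    n_c = len(chunk)
    mean_c = chunk.mean(axis=0)
    Xc = chunk - mean_c
    delta = mean_c - mean
    total = n + n_c
    scatter = scatter + Xc.T @ Xc + (n * n_c / total) * np.outer(delta, delta)
    mean = mean + (n_c / total) * delta
    return total, mean, scatter

def between_scatter(counts, means):
    """Rebuild S_b from the stored per-class counts and means."""
    N = sum(counts)
    g = sum(n * mu for n, mu in zip(counts, means)) / N   # global mean
    return sum(n * np.outer(mu - g, mu - g) for n, mu in zip(counts, means))

# S_w is simply the sum of the per-class scatters kept by merge_class.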
Incremental LDA (Time)
[Figure: update time on a face dataset: 100 persons × 5 faces (56×46 pixels)]
Incremental LDA (Memory)
[Figure: memory usage on the same face dataset: 100 persons × 5 faces (56×46 pixels)]
Incremental Orthogonal Centroid (1) [Yan et al. 2006]
 Data are presented one by one
 IOC estimates the leading eigenvectors of $S_b$ by a stochastic update equation
Incremental Orthogonal Centroid (2) [Yan et al. 2006]
 $\lambda_i$ : Lagrange multipliers
 $w_i$ : $i$-th estimated eigenvector (column of the projection matrix $W$)
Incremental Orthogonal Centroid (3) [Yan et al. 2006]
 IOC directly updates the eigenvectors (= the projection matrix $W$) incrementally
 Fast, although the estimates are inexact in the early stages
 time complexity (an update): $O(mck)$ for $k$ extracted features
 space complexity: $O(m(c + k))$
 batch OC for comparison:
 time complexity: $O(nm + mc^2)$
 space complexity: $O(mc)$
 In most cases $c, k \ll m \ll n$
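To convey the mechanism, here is a generic Oja-style stochastic eigenvector update with Gram–Schmidt deflation and a decaying step size; it is a stand-in sketch, not the exact update rule of Yan et al. 2006.

import numpy as np

def stochastic_eig_step(W, x, t, lr=0.1):
    """One stochastic update of the m x k eigenvector estimates W."""
    for i in range(W.shape[1]):
        w = W[:, i].copy()
        w += (lr / (t + 1)) * (x @ w) * x        # Oja-style gradient step
        for j in range(i):                       # deflate against earlier vecs
            w -= (W[:, j] @ w) * W[:, j]
        W[:, i] = w / np.linalg.norm(w)
    return W

m, k = 10, 3
W = np.linalg.qr(np.random.randn(m, k))[0]       # orthonormal initialization
for t in range(1000):
    x = np.random.randn(m)    # for IOC this would be a class-centroid term
    W = stochastic_eig_step(W, x, t)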
Incremental OC
                  IPCA        IMMC        IOC        CHI      OCFS
F1                0.4806      0.8063      0.8169     0.4475   0.4475
Time (s)          22,960.17   48,813.51   26,884.6   68.83    35.96
Convergence step  1,362       4,657       1,776      -        -

 Reuters Corpus Volume 1 (500,000 dimensions → 3 dimensions)
 IOC is precise
 IOC is the fastest of the supervised incremental FE methods
 FS methods are much faster than incremental FE methods
Discussion
 Thanks to incremental FE,
 FE can treat large-scale data
 FE can update fast when new data arrive
 Like FE, incremental FE still can't treat extremely high-dimensional data
 FS can't find an optimal solution
 A method that generates features like FE at a low spatial cost like FS is needed
 FS + FE
 Feature generation + FS
Summary
 Dimension Reduction methods
 Feature Extraction
 precise
 Feature Selection
 fast
 Incremental Feature Extraction
 enables FE to treat large-scale data
 enables FE to treat newly presented data
