

A Survey on Incremental Feature Extraction
D2 Makoto Miwa
Chikayama & Taura Lab.
Table of Contents
 Introduction
 Dimension Reduction
 Feature Extraction
 Feature Selection
 Incremental Feature Extraction
 Discussion
 Summary
Introduction
 Large and high-dimensional data
 Web documents, etc.
 A large amount of resources is needed for
 Information Retrieval
 Classification tasks
 Data preservation, etc.
Dimension Reduction
[Figure: scatter plot of Weight (kg) vs. Height (cm); samples are labeled overweight or underweight]

Dimension Reduction
 preserves information on the classification of overweight and underweight as much as possible
 makes classification easier
 reduces data size (2 features → 1 feature)
Dimension Reduction
 Feature Extraction (FE)
 Generates a new feature
 ex. preserves weight / height
[Figure: Weight (kg) vs. Height (cm) samples projected onto the single generated feature weight / height]
 Feature Selection (FS)
 Selects an existing feature
 ex. preserves weight
[Figure: Weight (kg) vs. Height (cm) samples reduced to the single selected feature, weight]
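To make the contrast concrete, here is a minimal Python sketch of the two options on the weight/height example; the numbers are invented for illustration.

import numpy as np

# Each row is one person: [height_cm, weight_kg] (made-up values)
X = np.array([[140.0, 60.0],
              [150.0, 50.0],
              [145.0, 62.0],
              [152.0, 48.0]])

# Feature Extraction (FE): generate one new feature, weight / height
fe = X[:, 1] / X[:, 0]

# Feature Selection (FS): keep one existing feature, weight
fs = X[:, 1]

print(fe.shape, fs.shape)   # both (4,): 2 features reduced to 1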
Problem Setting
 Each of the $n$ data samples is represented by $m$ dimensions (features)
 Data belong to $c$ different classes in supervised learning
 Dimension reduction is to generate or select $k$ features preserving the original information as much as possible under some criterion
 In most cases $k \ll m$
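In code, dimension reduction amounts to mapping an $n \times m$ data matrix to an $n \times k$ one. A shape-only sketch using the notation above (the projection matrix here is random, purely to show the shapes):

import numpy as np

n, m, k = 1000, 50, 5
X = np.random.randn(n, m)    # n samples, m features
W = np.random.randn(m, k)    # a projection matrix (random placeholder)
Y = X @ W                    # reduced data: n samples, k features
print(Y.shape)               # (1000, 5)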
Feature Extraction
 Extracts features by projecting data to a lower-dimensional space
 Unsupervised Method
 Principal Component Analysis (PCA)
 Independent Component Analysis (ICA)
 Supervised Method
 Linear Discriminant Analysis (LDA)
 Maximum Margin Criterion (MMC)
 Orthogonal Centroid algorithm (OC)
 Finds an optimal projection matrix $W$ ($m \times k$)
Principal Component Analysis
 Unsupervised Method
 PCA tries to maximize $\mathrm{tr}(W^T \Sigma W)$ subject to $W^T W = I$
 $\Sigma$ : covariance matrix
 PCA needs a Singular Value Decomposition (SVD) calculation
 time complexity: $O(nm^2 + m^3)$
 space complexity: $O(m^2)$
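A minimal PCA sketch in Python, assuming the $n \times m$ data-matrix convention above. It takes the principal directions from the SVD of the centered data, which yields the same eigenvectors as decomposing the covariance matrix $\Sigma$:

import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                    # center the data
    # Right singular vectors of Xc = eigenvectors of the covariance matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                               # m x k projection matrix
    return Xc @ W, W

X = np.random.randn(200, 10)
Y, W = pca(X, k=2)
print(Y.shape)   # (200, 2)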
Linear Discriminant Analysis
 Supervised Method
 LDA tries to maximize $\mathrm{tr}\big((W^T S_w W)^{-1}(W^T S_b W)\big)$
 $S_b$ : Interclass scatter matrix
 $S_w$ : Intraclass scatter matrix
 LDA needs to calculate $S_w^{-1} S_b$
 at least $m$ samples are required (otherwise $S_w$ is singular)
 LDA needs SVD
 time complexity: $O(nm^2 + m^3)$
 space complexity: $O(m^2)$
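A minimal LDA sketch under the same conventions: build $S_b$ and $S_w$ explicitly, then solve the generalized eigenproblem $S_b w = \lambda S_w w$. It assumes $S_w$ is nonsingular, i.e. enough samples, as noted above:

import numpy as np

def lda(X, y, k):
    """Return an m x k projection maximizing interclass over intraclass scatter."""
    mean = X.mean(axis=0)
    m = X.shape[1]
    Sb = np.zeros((m, m))                      # interclass scatter
    Sw = np.zeros((m, m))                      # intraclass scatter
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mean)[:, None]
        Sb += len(Xc) * (d @ d.T)
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    # Generalized eigenproblem Sb w = lambda Sw w, solved via Sw^{-1} Sb
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:k]]

X = np.vstack([np.random.randn(50, 4), np.random.randn(50, 4) + 2.0])
y = np.array([0] * 50 + [1] * 50)
print((X @ lda(X, y, k=1)).shape)   # (100, 1)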
Orthogonal Centroid
[Park et al. 2003]
 Supervised Method
 OC tries to maximize $\mathrm{tr}(W^T S_b W)$ subject to $W^T W = I$
 OC solves the problem by QR decomposition of the $m \times c$ centroid matrix
 time complexity: $O(nm + mc^2)$
 space complexity: $O(mc)$
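A minimal Orthogonal Centroid sketch: stack the class centroids into an $m \times c$ matrix and take its thin QR decomposition; the orthonormal factor serves as the projection matrix.

import numpy as np

def orthogonal_centroid(X, y):
    """Return an orthonormal m x c projection built from the class centroids."""
    classes = np.unique(y)
    C = np.stack([X[y == c].mean(axis=0) for c in classes], axis=1)  # m x c
    Q, _ = np.linalg.qr(C)        # thin QR: Q has orthonormal columns
    return Q

X = np.vstack([np.random.randn(50, 10), np.random.randn(50, 10) + 2.0])
y = np.array([0] * 50 + [1] * 50)
print((X @ orthogonal_centroid(X, y)).shape)   # (100, 2)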
Feature Extraction (Accuracy)
[Figure: classification accuracy on a face database (AR)]
 original intraclass scatter is large
Feature Extraction (Accuracy)
[Figure: classification accuracy on a document database (Reuters-21578)]
 original intraclass scatter is small
Feature Extraction (Time)
[Figure: computation time on 4 datasets (2 face sets & 2 document sets)]
Feature Selection
 Selects features by some criterion
 Information Gain (IG)
 Chi-square value (CHI)
 Orthogonal Centroid Feature Selection (OCFS)
Feature Selection by CHI
 Chi-square value
 represents the strength of the correlation between classes and a feature

$$\chi^2(t, c) = \frac{N\,(AD - CB)^2}{(A+C)(B+D)(A+B)(C+D)}, \qquad \chi^2(t) = \sum_{c} P(c)\,\chi^2(t, c)$$

 $A$ : # of samples in which $t$ and $c$ co-occur
 $B$ : # of samples in which $t$ occurs without $c$
 $C$ : # of samples in which $c$ occurs without $t$
 $D$ : # of samples in which neither $t$ nor $c$ occurs
 $P(c)$ : prior probability of $c$ ; $N$ : total # of samples
Feature Selection by CHI
 Feature selection by CHI
 Select the top $k$ features by chi-square value (see the sketch below)
 The main computational time is spent on the calculation of the counts $A, B, C, D$ for every feature–class pair
 time complexity: $O(nm + mc)$
 space complexity: $O(mc)$
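A minimal sketch of CHI-based selection for binary term presence, following the contingency counts $A, B, C, D$ defined above; weighting the per-class scores by the class prior $P(c)$ is one common way to combine them.

import numpy as np

def chi_square_select(X, y, k):
    """Select the top-k features by class-averaged chi-square value.

    X : (n, m) binary matrix; X[i, j] = 1 if feature j occurs in sample i
    y : (n,) class labels
    """
    n, m = X.shape
    scores = np.zeros(m)
    for c in np.unique(y):
        in_c = (y == c)
        A = X[in_c].sum(axis=0)        # t and c co-occur
        B = X[~in_c].sum(axis=0)       # t occurs without c
        C = in_c.sum() - A             # c occurs without t
        D = (~in_c).sum() - B          # neither occurs
        num = n * (A * D - C * B) ** 2
        den = (A + C) * (B + D) * (A + B) * (C + D)
        scores += in_c.mean() * num / np.maximum(den, 1)   # P(c)-weighted
    return np.argsort(scores)[::-1][:k]

X = (np.random.rand(100, 30) < 0.3).astype(float)
y = np.random.randint(0, 2, size=100)
print(chi_square_select(X, y, k=5))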
Feature Selection (Accuracy)
[Figure: classification accuracy of feature selection methods on 789,670 documents]
Feature Selection (Time)
[Figure: computation time of feature selection methods]
FE vs FS
       time complexity    space complexity
PCA    $O(nm^2 + m^3)$    $O(m^2)$
LDA    $O(nm^2 + m^3)$    $O(m^2)$
OC     $O(nm + mc^2)$     $O(mc)$
CHI    $O(nm + mc)$       $O(mc)$
In most cases $c \ll m$, so FS is much cheaper than FE
FE vs FS
 FE needs matrix computation
 High computational and spatial cost
 FE finds an optimal solution
 FS doesn't need matrix computation
 Fast
 FS can treat very high-dimensional data
 FS finds a nearly optimal solution
Table of Contents
 Introduction
 Dimension Reduction
 Feature Extraction
 Feature Selection
 Incremental Feature Extraction
 Discussion
 Summary
Incremental Feature Extraction
 Feature extraction can't treat large data at once
 Data aren't always presented at once
 Some data may arrive later
 Incremental feature extraction
 splits large data into several small chunks and treats them incrementally, without keeping all the data (see the running-statistics sketch below)
 can treat newly presented data without learning from the beginning
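The common trick behind incremental FE methods is to maintain sufficient statistics chunk by chunk instead of the raw data. A minimal sketch using the standard merge formulas for the mean and scatter matrix (generic bookkeeping, not any specific paper's equations):

import numpy as np

class RunningStats:
    """Running mean and scatter matrix, updated one chunk at a time."""
    def __init__(self, m):
        self.n = 0
        self.mean = np.zeros(m)
        self.scatter = np.zeros((m, m))   # sum of (x - mean)(x - mean)^T

    def update(self, chunk):
        """Merge a new (n_c x m) chunk without keeping old data."""
        n_c = len(chunk)
        mean_c = chunk.mean(axis=0)
        Xc = chunk - mean_c
        delta = mean_c - self.mean
        total = self.n + n_c
        # Standard merge formula for pooled scatter matrices
        self.scatter += Xc.T @ Xc + (self.n * n_c / total) * np.outer(delta, delta)
        self.mean += (n_c / total) * delta
        self.n = total

stats = RunningStats(m=5)
for _ in range(10):                       # data arrive as a stream of chunks
    stats.update(np.random.randn(100, 5))
cov = stats.scatter / (stats.n - 1)       # covariance, usable for PCA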
Incremental Feature Extraction
 Based on FE algorithms
 PCA → Incremental PCA (IPCA)
 LDA → Incremental LDA (ILDA)
 MMC → Incremental MMC (IMMC)
 OC → Incremental OC (IOC)
Incremental Linear Discriminant Analysis (1) [Pang et al. 2005]
 Data are presented as a stream of chunks
 ILDA updates $S_b$ and $S_w$ using only the newly presented chunk $Y$ and its statistics
 $\bar{y}$ : mean of $Y$
 $\bar{y}_j$ : mean of class $j$ in $Y$
Incremental Linear Discriminant Analysis (2) [Pang et al. 2005]
 ILDA doesn't directly update the projection matrix $W$
 ILDA needs SVD like LDA, but the matrix it decomposes is smaller than the one in batch LDA
 time complexity (an update): depends on the chunk size $L$, not on the total number of samples $n$
 space complexity: independent of $n$
 batch LDA for comparison:
 time complexity: $O(nm^2 + m^3)$
 space complexity: $O(m^2)$
 In most cases $L \ll n$
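A hedged sketch of the ILDA flavor: keep per-class counts, means, and scatters, merge each new chunk into them, and rebuild $S_w$ and $S_b$ from the stored statistics. These are standard scatter-merging identities, not necessarily the exact update equations of Pang et al. 2005.

import numpy as np

def merge_class(n, mean, scatter, chunk):
    """Merge a chunk of one class into its (count, mean, scatter) statistics."""
    n_c = len(chunk)
    mean_c = chunk.mean(axis=0)
    Xc = chunk - mean_c
    delta = mean_c - mean
    total = n + n_c
    scatter = scatter + Xc.T @ Xc + (n * n_c / total) * np.outer(delta, delta)
    mean = mean + (n_c / total) * delta
    return total, mean, scatter

def between_scatter(counts, means):
    """Rebuild S_b from the stored per-class counts and means."""
    N = sum(counts)
    g = sum(n * mu for n, mu in zip(counts, means)) / N   # global mean
    return sum(n * np.outer(mu - g, mu - g) for n, mu in zip(counts, means))

# S_w is simply the sum of the per-class scatters kept by merge_class.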
Incremental LDA (Time)
[Figure: update time on a face dataset: 100 persons × 5 faces (56×46 pixels)]
Incremental LDA (Memory)
[Figure: memory usage on the same face dataset: 100 persons × 5 faces (56×46 pixels)]
Incremental Orthogonal Centroid (1) [Yan et al. 2006]
 Data are presented one by one
 IOC estimates the leading eigenvectors of $S_b$ by a stochastic update equation
Incremental Orthogonal Centroid (2) [Yan et al. 2006]
 $\lambda_i$ : Lagrange multipliers
 $w_i$ : $i$-th estimated eigenvector (column of the projection matrix $W$)
Incremental Orthogonal Centroid (3) [Yan et al. 2006]
 IOC directly updates the eigenvectors (= the projection matrix $W$) incrementally
 Fast, although the estimates are inexact in the early stages
 time complexity (an update): $O(mck)$ for $k$ extracted features
 space complexity: $O(m(c + k))$
 batch OC for comparison:
 time complexity: $O(nm + mc^2)$
 space complexity: $O(mc)$
 In most cases $c, k \ll m \ll n$
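To convey the mechanism, here is a generic Oja-style stochastic eigenvector update with Gram–Schmidt deflation and a decaying step size; it is a stand-in sketch, not the exact update rule of Yan et al. 2006.

import numpy as np

def stochastic_eig_step(W, x, t, lr=0.1):
    """One stochastic update of the m x k eigenvector estimates W."""
    for i in range(W.shape[1]):
        w = W[:, i].copy()
        w += (lr / (t + 1)) * (x @ w) * x        # Oja-style gradient step
        for j in range(i):                       # deflate against earlier vecs
            w -= (W[:, j] @ w) * W[:, j]
        W[:, i] = w / np.linalg.norm(w)
    return W

m, k = 10, 3
W = np.linalg.qr(np.random.randn(m, k))[0]       # orthonormal initialization
for t in range(1000):
    x = np.random.randn(m)    # for IOC this would be a class-centroid term
    W = stochastic_eig_step(W, x, t)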
Incremental OC
                  IPCA        IMMC        IOC        CHI      OCFS
F1                0.4806      0.8063      0.8169     0.4475   0.4475
Time (s)          22,960.17   48,813.51   26,884.6   68.83    35.96
Convergence step  1,362       4,657       1,776      -        -

 Reuters Corpus Volume 1 (500,000 dimensions → 3 dimensions)
 IOC is precise
 IOC is the fastest of the supervised incremental FE methods
 FS methods are much faster than incremental FE methods
Discussion
 Thanks to incremental FE,
 FE can treat large-scale data
 FE can update fast when new data arrive
 Like FE, incremental FE still can't treat extremely high-dimensional data
 FS can't find an optimal solution
 A method that generates features like FE at a low spatial cost like FS is needed
 FS + FE
 Feature generation + FS
Summary
 Dimension Reduction methods
 Feature Extraction
 precise
 Feature Selection
 fast
 Incremental Feature Extraction
 enables FE to treat large-scale data
 enables FE to treat newly presented data
