Feature Selection For A Real-World Learning Task
Abstract. Machine learning algorithms used in early fault detection for centrifugal pumps make it possible to better exploit the information content of measured signals, making machine monitoring more economical and application-oriented. The number of sensors is reduced by exploiting the information derived from each sensor far beyond the scope of traditional engineering, through the application of various features and high-dimensional decision-making. Feature selection plays a crucial role in modelling an early fault detection system. Due to the presence of noisy features with outliers and correlations between features, a correctly determined subset of features will distinctly improve the classification rate. In addition, the hardware requirements for monitoring the pump decrease, and with them its price. Wrappers and filters, the two major approaches for feature selection described in the literature [4], will be investigated and compared using real-world data.
This implicit monitoring system fulfils the requirement of sensor minimisation and does not even require any media-wetted sensors, which could conflict with certain aggressive products. Therefore the human expert is the ideal model for the early fault detection system.
[Fig. 2 residue: a table comparing the classification rates of C4.5 and NEFClass on the Breast, Heart, Monks1, Monks2, Monks3 and Pima databases, including the average.]
Fig. 2. Comparison of C4.5 and NEFClass on various machine learning problems [6].
Neither of the algorithms has a consistently superior classification rate; depending on the type of learning problem, either the neural network or the decision tree is better. On average the decision tree even outperforms the neural network with regard to the classification rate. As shown by Quinlan in [9], neural networks are advantageous when most features are to be exploited in parallel for making a decision, whereas the decision tree can handle features which are only locally relevant. The major advantage of decision trees is their training time, which is 2 to 3 orders of magnitude shorter than that of neural networks. A further asset of See5 is its cost matrices, which allow the user to model a real-world problem: the individual importance of the correct classification of different classes, as well as the ratio of false alarms to undetected faults, can be influenced. Centrifugal pumps cease to operate when the medium contains too much gas. The gas content of the liquid is a continuous variable, and any threshold for defining classes is arbitrary.
[Fig. 3: cost matrix over predicted class vs. true class for the gas classes GA, GB, GC and the blockage fault.]
As shown in figure 3, costs can be employed to model the physical relationship among the different gas classes (GA, GB and GC): 0 refers to a correct classification, 1 describes a misclassification into a neighbouring class and 3 refers to a total misclassification. Blockage (pressure-sided valve closed) is a purely binary fault, which is translated into misclassification costs of 3 for any different class. During optimisation of a pump monitoring system, single cost values can be adjusted to influence the classification ability of the decision tree. Although the overall error rate often increases, the errors can be shifted to harmless misclassifications, e.g. the confusion of neighbouring classes. In the following study the cost matrix remains unchanged in order to ensure the comparability of the results. The average costs, which in the following will be used to evaluate different classifiers, are obtained by dividing the cumulated costs for the unseen test data sets (3 cross validations) by their population.

The process of modelling an early fault detection system is described in figure 4. The pump is equipped with different sensors and operated under normal operating conditions at various flow rates and under every fault to be detected. In a preliminary study, only the frequencies carrying information are preselected by a human expert in order to reduce the amount of data. Up to 40 features are generated from one sensor. A typical database consists of 30,000 data sets with 30 features each.
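To make the evaluation procedure concrete, here is a minimal sketch of the average-cost computation described above. The class labels GA, GB, GC and BL (blockage) and the exact matrix entries are hypothetical, chosen to follow the scheme of figure 3; the paper's actual matrix is not reproduced here.

```python
import numpy as np

# Hypothetical cost matrix in the spirit of Fig. 3:
# rows = true class, columns = predicted class.
# 0 = correct, 1 = neighbouring gas class, 3 = total misclassification;
# blockage (BL) costs 3 against any different class.
CLASSES = ["GA", "GB", "GC", "BL"]
COSTS = np.array([
    [0, 1, 3, 3],   # true GA
    [1, 0, 1, 3],   # true GB
    [3, 1, 0, 3],   # true GC
    [3, 3, 3, 0],   # true BL
])

def average_costs(y_true, y_pred):
    """Cumulated misclassification costs divided by the population,
    as used in the paper to compare classifiers on unseen test data."""
    return COSTS[y_true, y_pred].mean()

# Example: four test samples, one neighbouring-class confusion (cost 1)
y_true = np.array([0, 1, 2, 3])
y_pred = np.array([0, 0, 2, 3])
print(average_costs(y_true, y_pred))  # 0.25
```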
[Fig. 4 residue: spectra (0 to 200 Hz) for normal operating, cavitation and dry run, the preselection step, and the resulting decision tree.]
Fig. 4. Defining features from spectra and using them for growing a decision tree
An exhaustive search over n features would have to evaluate

$$Z = \sum_{i=1}^{n} \frac{n!}{i!\,(n-i)!} = 2^n - 1$$

different feature subsets, e.g. $2^{25}-1 \approx 3.4 \times 10^7$ subsets for the 25 features of the pump data.

Fig. 5. Evolution of the number of different feature subsets for an exhaustive search (n = number of features)
Wrappers [4, 8] are auxiliary algorithms wrapped around the basic machine learning algorithm to be employed for classification. They create several different subsets of features and compute a classifier for each of them. According to a criterion, usually the error rate on unseen data, the optimal subset is selected.
[Fig. 6 residue: D features (speed, 1st harmonic, ...) are reduced to candidate subsets of d1 ... dm < D features, each classified with See5 to yield Model1 ... Modelm, from which Modelopt is chosen.]
Fig. 6. Wrappers determine the optimal feature subset by running the machine learning algorithm with a sequence of subsets
Best Features (BF) [2]. Given a data set with N features, N classifiers are trained, each using one feature only. The classifiers are ranked according to the criterion and the best M < N are selected. Best Features takes into account neither dependencies between features (feature A is only effective in the presence of feature B) nor redundancy among the features, and is therefore a fast but very suboptimal approach.

Sequential Forward Selection (SFS) [2]. In its basic form the feature selection starts with an empty subset. In each round all possible feature subsets are constructed which contain the best feature combination found in the round before plus one extra feature. A classifier is trained and evaluated on each subset. This approach is capable of avoiding the selection of redundant features.
[Fig. 7: 2-stage sequential Forward-Algorithm with 5 features (A, B, C, D, E); the best subsets BE, BDE and ABDE are expanded round by round.]
In order to determine dependencies between features, several features have to be introduced simultaneously; the basic scheme is depicted in figure 7. With 25 features the first round requires the training of 25 classifiers in a single-stage SFS, 325 classifiers in a double-stage SFS and as many as 2625 classifiers in a triple-stage SFS.

Sequential Backward Selection (SBS) [2]. The SBS is initialised with the complete feature set. In each round all possible feature subsets are constructed from the current subset by omitting one feature. After each round the feature subset with the best criterion is retained. As in the case of the SFS, the single-stage version is capable of detecting redundant features, but higher stage orders have to be considered for dealing with dependent features.
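The following sketch illustrates the (multi-stage) SFS loop; SBS is the mirror image, starting from the full set and removing features. A scikit-learn decision tree is used as a stand-in for See5, and plain cross-validated accuracy replaces the paper's cost-weighted criterion; both substitutions are assumptions for illustration only.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y, subset):
    """Criterion for a feature subset: mean accuracy of a decision tree
    (stand-in for See5) over a 3-fold cross validation."""
    clf = DecisionTreeClassifier(min_samples_leaf=10)
    return cross_val_score(clf, X[:, list(subset)], y, cv=3).mean()

def sfs(X, y, stages=1):
    """Sequential forward selection. In each round the best subset found
    so far is extended by up to `stages` extra features (stages=2 gives
    the double-stage variant: 25 + 300 = 325 candidates in round 1 for
    25 features); the expansion with the best criterion is retained."""
    n, selected, history = X.shape[1], frozenset(), []
    while len(selected) < n:
        candidates = [selected | set(extra)
                      for k in range(1, stages + 1)
                      for extra in combinations(set(range(n)) - selected, k)]
        scores = [evaluate(X, y, c) for c in candidates]
        selected = frozenset(candidates[int(np.argmax(scores))])
        history.append((sorted(selected), max(scores)))
    return history  # the user picks the subset at the criterion optimum
```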
[Fig. 8 residue: rounds 0 to 2 of the backward selection starting from ABCDE; best feature set ACDE, feature removed: B.]
Fig. 8. 2-stage sequential Backward-Algorithm with 5 features (A, B, C, D, E)
Branch and Bound (BB) [2]. In contrast to the selection algorithms described so far, the BB yields an optimal subset (as an exhaustive search would), provided the following assumption is fulfilled: adding an extra feature y to a feature subset does not lead to a deterioration of the criterion J (monotonicity). For nested feature sets

$$\chi_0 \subset \chi_1 \subset \chi_2 \subset \dots \subset \chi_D \qquad (\text{Eq. 1})$$

the criterion must satisfy

$$J(\chi_0) \le J(\chi_1) \le J(\chi_2) \le \dots \le J(\chi_D). \qquad (\text{Eq. 2})$$
[Fig. 9 residue: search tree over the feature set {1,2,3,4,5,6} with intermediate nodes such as {1,3,5,6} and leaves {5,6}, {3,4}, {1,6}, {1,2}.]
Fig. 9. Branch and Bound-Tree for determining the optimal subset containing 2 out of 6 features
The tree is traversed from the right to the left side. The criterion c = J({1,2}) of the first leaf {1,2} is determined. Next the tree is searched for the first unprocessed node. Departing from this node, the rightmost unprocessed path is chosen. For each node the criterion is determined. If the criterion J({1,3,5,6}) < c = J({1,2}), the current path is abandoned. BB is ideal for any classifier which allows for recursive training when adding or removing a feature, e.g. linear discriminant analysis. In general the monotonicity assumption only holds when the error rate on training data is chosen as criterion, but it fails for any criterion based on unseen test data. In that case BB is no longer optimal.
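A generic recursive formulation of BB (rather than the right-to-left traversal described above) might look as follows. The criterion J is assumed to be monotone in the sense of Eq. 2, and the merit-sum criterion at the end is a purely illustrative toy.

```python
def branch_and_bound(features, d, J):
    """Find the subset of `d` features maximising a monotone criterion J
    (J(subset) <= J(superset), Eq. 2). Each tree level removes one more
    feature from its parent's set; leaves have exactly d features."""
    best = {"subset": None, "score": float("-inf")}

    def search(subset, start):
        if J(subset) <= best["score"]:
            return  # bound: by monotonicity no leaf below can beat the best
        if len(subset) == d:
            best["subset"], best["score"] = subset, J(subset)
            return
        # branch: remove one feature; `start` avoids duplicate subtrees
        for i in range(start, len(subset)):
            search(subset[:i] + subset[i + 1:], i)

    search(tuple(features), 0)
    return best["subset"]

# Toy criterion: sum of per-feature "merits" (monotone by construction)
merits = {"f1": 3.0, "f2": 1.0, "f3": 2.5, "f4": 0.2}
J = lambda s: sum(merits[f] for f in s)
print(branch_and_bound(list(merits), 2, J))  # ('f1', 'f3')
```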
Comparison of Wrappers
As BF and BB were considered unsuitable for the pump data, only SBS and SFS have been examined. The results depicted in figure 10 show a strong influence of feature selection on the quality of the resulting classifier. Whereas using all 25 features (i.e. directly employing the data without feature selection) yields average misclassification costs of 0.62, the optimum for a single-stage SFS occurs at 17 features with costs of 0.50, but a combination of only 6 features leads to costs of 0.51. A double-stage SFS further reduces the misclassification costs to 0.38. Both SBS trials proved worse than their SFS counterparts, although SBS theoretically should retain dependent features due to its deselection strategy.
[Fig. 10 residue: average costs on test data (0.3 to 1.5) over the number of features (up to 25) for 1- and 2-stage sequential forward and backward selection.]
Fig. 10. Comparison of SBS and SFS with pump data (30,000 data sets; 25 features; 3-fold cross-validation averaged in each round) using See5 with a cost matrix (weighted misclassification rate)
A ranking of the features (figure 11) showed the varying effectiveness of the features depending on the selection method. The feature c1 was selected in the first round of the double-stage SFS, but only in the 21st round of the single-stage SFS. What is more, the second feature selected alongside c1 in the double-stage selection was n, which was selected in the 2nd round of the single-stage SFS. The chart shows a strong interdependency between the features and the non-linear behaviour of the inducer.
[Fig. 11 residue: relevance ranking (1 to 25) of the features vR, c1, c4, s8, v8a, v1a, c2, s2, s0, v2a, s1, v4p, v4a, c8, veff, s4, v8p.]
Fig. 11. Ranking of the 25 features in the 4 selections shown in figure 10 (1 = most relevant)
A further obstacle is the influence of the two tuning parameters implemented in See5 on the actual feature selection. Not only is the value of the criterion changed, but the location of the optimum and the best feature subset are altered as well.
[Fig. 12 residue: average costs on test data (0.3 to 1.5) over the number of features for the parameter settings Cases = 10, Pruning = 0.1; Cases = 10, Pruning = 0.01; Cases = 2, Pruning = 1.]
Fig. 12. Influence of the parameters in See5 on a double-stage SFS (30,000 data sets; 25 features; 3-fold cross-validation averaged in each round). The curve for Pruning = 0.01 was started with a triple-stage SFS in the 1st round and then continued as a double-stage SFS
Genetic Wrapper
The comparison of wrappers showed a distinct advantage of multistage wrappers due to their ability to select dependent features. Even if the selection is stopped after reaching the optimum, the computing effort remains high because of the large number of combinations at the beginning. A further disadvantage of the sequential selection is its rigid strategy, which does not allow for the removal of features which, in the course of the selection process, have become obsolete. An alternative approach consists of a genetic selection which explores the feature space from a determined or from a random starting point. Genetic algorithms use two parallel strategies for optimising:
- evolution (small and steady development of the population and selection of the best)
- mutation (spontaneous creation of new specimens)
The flow charts of the genetic selection are shown in figures 13 & 14. The algorithm uses a result stack for the subsets already evaluated and a working stack for new subsets. The subsets in the working stack are sorted with respect to their prognosis, which is the evaluation criterion reached by their parent. In each round the first A subsets are evaluated.
Fig. 13 (flow chart of the main loop):

    randomly create N subsets with M features and store them in the working stack
    repeat until stop condition or working stack is empty:
        for the A subsets having the best prognosis:
            remove from working stack
            train classifier and evaluate criterion
            store subset and criterion in result stack
            expand subsets
        Mopt = number of features of the best subset
        per B subsets in the working stack:
            create 1 mutation with Mopt features and store it on top of the working stack
The evolution is modelled by single-stage SFS and SBS. After training a classifier from a subset with M features, all possible sequential combinations with M+1 and M-1 features are composed. All subsets which have not been evaluated yet are stored in the working stack.
Fig. 14 (flow chart of the expansion):

    for each subset to be expanded:
        perform a single-stage sequential forward selection step
        perform a single-stage sequential backward selection step
        set prognosis = criterion(parent subset) and add the new subsets to a temporary list
    for each subset in the temporary list:
        if the subset is in the result stack: discard it
        else if the subset is in the working stack: prognosis = MAX(prognosis(working stack), prognosis(temporary list))
        else: store the subset in the working stack
The algorithm will fully explore the feature space if no stopping criterion is introduced. If memory is scarce, the size of the working stack can be limited by deleting the subsets having the worst prognosis. When the selection is attracted by a local minimum, the area will be fully explored before the search is continued outside, unless a mutation leads to a better subset. The double-stage SFS, applied to the pump data, determined a subset of 10 features with average costs on test data of 0.38. The least computing effort (stopping immediately after a minimum) would have been the training of 1386 decision trees.
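A compact sketch of the stack-based genetic selection of figures 13 and 14 is given below. `evaluate` is a user-supplied wrapper that trains a classifier on a subset and returns its criterion (taken here as higher-is-better), and the default parameters mirror those of figure 15. The mutation handling (prognosis set to infinity so the mutation lands on top of the working stack) is one possible reading of the flow chart.

```python
import random

def genetic_selection(n_features, evaluate, N=10, M=10, A=100, B=0.02,
                      n_rounds=20):
    """Stack-based genetic wrapper (Figs. 13 & 14). `evaluate(subset)`
    trains a classifier on the subset and returns its criterion.
    Returns the result stack mapping subset -> criterion."""
    all_feats = set(range(n_features))
    # working stack maps subset -> prognosis (criterion of its parent)
    working = {frozenset(random.sample(sorted(all_feats), M)): 0.0
               for _ in range(N)}
    results = {}
    for _ in range(n_rounds):
        if not working:
            break
        # evaluate the A subsets with the best prognosis
        batch = sorted(working, key=working.get, reverse=True)[:A]
        for s in batch:
            del working[s]
            results[s] = evaluate(s)
            # evolution: expand by single-stage SFS and SBS steps
            children = [s | {f} for f in all_feats - s] + \
                       [s - {f} for f in s if len(s) > 1]
            for c in children:
                if c not in results:
                    working[c] = max(working.get(c, float("-inf")),
                                     results[s])
        # mutation: one random subset per B share of the working stack,
        # sized like the best subset found so far
        m_opt = len(max(results, key=results.get))
        for _ in range(int(B * len(working))):
            mut = frozenset(random.sample(sorted(all_feats), m_opt))
            if mut not in results:
                working[mut] = float("inf")  # "top of the working stack"
    return results

# Usage: results = genetic_selection(25, my_evaluate)
#        best = max(results, key=results.get)
```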
[Fig. 15 residue: average costs on test data over the selection progress.]
Fig. 15. Genetic feature selection with N = 10 subsets / M = 10 features / A = 100 subsets / B = 2%
After training 610 decision trees, the genetic selection (figure 15) comes up with a subset of 9 features at costs of 0.38. Within the 1110 subsets evaluated (round 12) there are 5 subsets with 9 or 10 features having costs of 0.38 each. The user can choose between different subsets and can therefore minimise other criteria imposed on the features, such as the number or costs of the sensors required.
[Fig. 16 residue: average costs (0.3 to 1.0) over rounds 1 to 18 for three initial subsets.]
Fig. 16. Genetic feature selection with N = 10 subsets / M = 4 features / A = 100 subsets / B = 2%
Due to the evolutionary strategy, the randomly selected initial features play a minor role for the convergence and the quality of the results. The computing time rises if the number of features in the initial selection is far off the optimum, as shown in figure 16. For an unknown dataset it may be worthwhile to start with a single-stage SFS and to set the initial value M to the number of features at the minimum of the criterion.
[Fig. 17 residue: two points 1 and 2 in the (feature x, feature y) plane.]
Fig. 17. Metrics for the distance between 2 points 1 and 2 in the feature space
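Written out for two points $x^{(1)}$ and $x^{(2)}$ in a D-dimensional feature space, the three metrics of figure 17 are:

$$d_{\mathrm{Euclid}} = \sqrt{\sum_{i=1}^{D} \bigl(x_i^{(1)} - x_i^{(2)}\bigr)^2}, \qquad d_{\mathrm{City\text{-}Block}} = \sum_{i=1}^{D} \bigl|x_i^{(1)} - x_i^{(2)}\bigr|, \qquad d_{\mathrm{Tchebycheff}} = \max_{i} \bigl|x_i^{(1)} - x_i^{(2)}\bigr|$$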
In general the Euclidean metric is preferred, but City-Block (sum of components) and Tchebycheff (largest component) deliver different kinds of information about the data at modest computing effort. The estimation of a probabilistic distance requires knowledge of the class density function. In general a normal distribution is assumed, but particularly in the case of multi-modal class densities this assumption is ill-conceived.

Analysis of decision trees

Occasionally statistics based on decision trees are used as a filter. In [1] the features are ranked according to their frequency of occurrence in the decision tree.
[Fig. 18 residue: two scatter plots in the (feature 1, feature 2) plane, one with orthogonal and one with oblique borders between Class 1 and Class 2, and the corresponding decision trees with tests such as MM1 > x1 and MM2 > y1.]
Fig. 18. Orthogonal and oblique class borders and corresponding decision trees
As shown in figure 18, this approach favours features which require oblique borders in the feature space for class separation. Some weighting rules have been considered, but none performed satisfyingly on the pump data:
- weighting of the feature frequency with the number of data sets of the node
- weighting of the feature frequency with the number of correctly classified data sets of the node
- weighting of the feature frequency with the node level in the tree

Comparison of filter algorithms

The filtering of features for classification with See5 using a cost matrix is challenging because, due to the cost matrix in See5, the optimisation goal differs between filter and classifier.
Several filter criteria, applied in a single-stage SFS, are plotted in figure 19. All criteria evolve monotonically; the determination of the most suitable feature subset is therefore arbitrary and has to be based on the curvature or a similar criterion.
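One common criterion of this family is the scatter-matrix ratio trace(Sw^-1 Sb); the assumption that this corresponds to the J3 curve of figure 19 follows the usual Tooldiag [13] conventions and is not confirmed by the paper. A minimal sketch:

```python
import numpy as np

def j3_criterion(X, y):
    """Class-separability criterion trace(Sw^-1 Sb): ratio of
    between-class to within-class scatter; larger = better separated.
    (Assumed to correspond to the J3 curve in Fig. 19.)"""
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    Sw += 1e-9 * np.eye(d)  # ridge against singular scatter matrices
    return np.trace(np.linalg.solve(Sw, Sb))
```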
[Fig. 19 residue: relative criterion (20% to 100%) for the Tschebychev, City-Block, Euklid and J3 criteria over an increasing number of features.]
Fig. 19. Criteria of different filters as in [13] for an increasing number of features (SBS), related to 100% for 20 features of the pump problem. The dot indicates the feature subset actually chosen for testing.
In addition, a Bayes classifier and a 1-nearest-neighbour classifier have been combined with a single-stage SFS. The best feature subset determined with each filter was subjected to a parameter study in See5 (figure 20). Most filters lead to poorly performing feature subsets. Only the Bhattacharyya distance achieves results similar to those of the single-stage SFS wrapper with See5. A general suitability of probabilistic filters is refuted by the Mahalanobis distance.
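For reference, the Bhattacharyya distance between two classes has a standard closed form under the Gaussian assumption discussed above; the sketch below uses that textbook formula, since the paper does not spell out its exact implementation.

```python
import numpy as np

def bhattacharyya(X1, X2):
    """Bhattacharyya distance between two classes, assuming normal
    class densities (the probabilistic-distance filter above)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S = (S1 + S2) / 2          # pooled covariance
    dm = m1 - m2
    term1 = dm @ np.linalg.solve(S, dm) / 8
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2
```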
[Fig. 20 residue: average error costs for test data (0.40 to 0.90) for the See5 wrapper (blackbox) and the filters Eubafes 0.5 / 0.8 / 0.9 / 1.0, Bayes, nearest neighbour, Bhattacharyya, Mahalanobis, Euklid, City-Block, Tschebychev, J2 and J3, grouped into other classifiers, probabilistic distance and class distance; each bar is annotated with the number of features and the best parameter values m and c.]
Fig. 20. Average costs for test data (mean value of 3 cross validations) obtained with the best See5 decision tree after a parameter optimisation with 4 values for Cases (m) and 7 values for Pruning (c).
Conclusion
Several feature selection algorithms have been compared on a real-world machine learning problem. Due to its flexibility for modelling a complex relationship among classes with cost matrices, See5 has been chosen as the induction algorithm. Sequential forward and backward selection has been studied in depth. Using multistage approaches, very effective feature subsets have been identified. A new genetic wrapper algorithm reduces the computing time while increasing the number of nearly optimal subsets determined. As expected, filter algorithms perform worse than wrappers because of the particular optimisation goal introduced through the cost matrix. The application of filters only seems advisable with slowly converging classifiers such as neural networks, when computing time limits the usage of wrappers.
References
1. Cardie, C.: Using decision trees to improve case-based learning; 10th Conf. on Machine Learning; Wien; 1993
2. Devijver, P.; Kittler, J.: Pattern Recognition: A Statistical Approach; Prentice Hall; 1982
3. Duda, R.; Hart, P.: Pattern Classification and Scene Analysis; John Wiley & Sons; 1973
4. Kohavi, R.: Wrappers for Performance Enhancement and Oblivious Decision Graphs; Dissertation; Stanford University; 1995
5. Lim, T.-S.; Loh, W.-Y.; Shih, Y.-S.: A comparison of prediction accuracy, complexity, and training time of 33 old and new classification algorithms; Machine Learning; preprint www.recursivepartitioning.com/mach1317.pdf; 1999
6. Martens et al.: An initial comparison of a fuzzy neural classifier and a decision tree based classifier; Expert Systems with Applications (Pergamon) 15; 1998
7. Michie, D.; Spiegelhalter, D.; Taylor, C.: Machine Learning, Neural and Statistical Classification; Ellis Horwood; 1994
8. Perner, P.; Apte, C.: Empirical Evaluation of Feature Subset Selection Based on a Real-World Data Set; in: D. A. Zighed, J. Komorowski, J. Zytkow (eds.), Principles of Data Mining and Knowledge Discovery; Springer; 2000
9. Quinlan, J. R.: Comparing connectionist and symbolic learning methods; in: Computational Learning Theory and Natural Learning Systems: Constraints and Prospects, ed. R. Rivest; MIT Press; 1994
10. Scherf, M.; Brauer, W.: Feature Selection by Means of a Feature Weighting Approach; Technical Report No. FKI-221-97; Forschungsberichte Künstliche Intelligenz, Institut für Informatik, TU München; 1997
11. See5 / C5; Release 1.10; Quinlan, J. R.; www.rulequest.com; 1999
12. Spath, D.; Hellmann, D. H.: Automatisches Lernen zur Störungsfrüherkennung bei ausfallkritischen Anlageelementen; Abschlussbericht zum DFG-Forschungsprojekt Sp 448/7-3 und He 2585/1-3; 1999
13. Rauber, T.: Tooldiag - Pattern Recognition Toolbox; Version 2.1; www.inf.ufes.br/~thomas/home/tooldiag; 1994