Engineering Applications of Artificial Intelligence
Article history:
Received 2 January 2014
Received in revised form 16 June 2014
Accepted 16 June 2014
Available online 8 July 2014

Keywords:
Decision tree construction
Pruning
Research graph
Attribute selection
Data sampling

Abstract

This paper presents a new approach that avoids the over-fitting and complexity problems suffered in the construction of decision trees. Decision trees are an efficient means of building classification models, especially in industrial engineering. In their construction phase, the two main problems are choosing suitable attributes and database components. In the present work, a combination of attribute selection and data sampling is used to overcome these problems. To validate the proposed approach, several experiments are performed on 10 benchmark datasets, and the results are compared with those from classical approaches. Finally, we present an efficient application of the proposed approach in the construction of non-complex decision rules for fault diagnosis problems in rotating machines.

© 2014 Elsevier Ltd. All rights reserved.
1. Introduction

In the industrial field, the risks of failure and disruption increase with the complexity of installed equipment. This phenomenon affects product quality, causes the immediate shutdown of a machine, and undermines the proper functioning of an entire production system. Rotating machines are a major class of mechanical equipment, and need the utmost care and continuous monitoring to ensure optimal operation. Traditionally, vibration analyses and many signal processing techniques have been used to extract useful information for monitoring the operating condition. Khelf et al. (2013) analysed the frequency domain to extract information and diagnose faults. Cepstral analysis has been used to construct a robust gear fault indicator (Badaoui et al., 2004), and a short-time Fourier transform representation has been derived (Mosher et al., 2003). Other techniques have also been employed, such as the Wigner–Ville distribution (Baydar and Ball, 2001), continuous wavelet analysis (Kankar et al., 2011), and discrete wavelet analysis (Djebala et al., 2008).

Classification algorithms can be used in the construction of condition-monitoring diagnostic systems. For example, neural networks (Chen and Chen, 2011), support vector machines (Deng et al., 2011), and Bayesian classifiers (Yang et al., 2005) have all been applied. However, decision tree techniques are still preferred in engineering applications, because they allow users to understand the behaviour of the built models more easily than the above-mentioned classifiers. Their use in such applications has been reported in numerous research papers, e.g. Sugumaran and Ramachandran (2007), Zhao and Zhang (2008), Sakthivel et al. (2010), and Sugumaran et al. (2007).

The construction of a decision tree (DT) includes growing and pruning stages. In the growing phase, the training data (samples) are repeatedly split into two or more descendant subsets, according to certain split rules, until all instances of each subset share the same class (pure subsets) or some stopping criterion is reached. Generally, this growing phase outputs a large DT that fits the learning examples and captures many uncertainties in the data (in particular, noise and residual variation). Pruning approaches based on heuristics prevent the over-fitting problem by removing all sections of the DT that may be based on noisy and/or erroneous data. This reduces the complexity and size of the DT. However, the pruning phase can under-prune or over-prune the grown DT. Moreover, choosing among the many existing heuristics is very challenging (Breiman et al., 1984; Niblett and Bratko, 1987; Quinlan, 1987), and, unfortunately, no single method outperforms the others (Mingers, 1989; Esposito et al., 1997).

In terms of growing phase problems, there are two possible solutions: the first reduces DT complexity by reducing the number of learning data, simplifying the decision rules (Piramuthu, 2008).
The second solution uses attribute selection to overcome over-fitting problems (Yildiz and Alpaydin, 2005; Kohavi and John, 1997). To overcome both the DT size and over-fitting risks, we propose to combine attribute selection and data reduction to construct an Improved Unpruned Decision Tree (IUDT). The optimal DT construction (DTC) problem is thus converted into the exploration of a combinatorial research space graph. The key feature of this proposition is to encode each attribute subset A_i and sample subset X_j as a couple (A_i, X_j). All possible (A_i, X_j) couples form the research space graph. The results show that the proposed scheme largely improves tree performance compared with standard pruned DTs, as well as with trees based solely on attribute selection or data reduction.

The rest of the paper is organized as follows: In Section 2, some previous studies on DTC are briefly discussed. Section 3 introduces the main notions used in this work. In Section 4, we describe our approach, based on attribute selection and database sampling, which outperforms conventional DTC. Section 5 reports the experimental results using 10 benchmark datasets. In Section 6, IUDT is applied to the problem of fault diagnosis in rotating machines. Finally, Section 7 concludes the study.

2. Related work

This section describes post-pruning approaches that have been proposed to improve DTC. Their common aim is to decrease (1) the tree complexity and (2) the error rate on an independent test dataset. Pruning methods differ in several respects, which can be summarized as follows:

1. the necessity of a test dataset;
2. the generation of a series of pruned sub-trees versus the processing of a single tree;
3. the pruning determination criteria.

Breiman et al. (1984) developed error-complexity pruning, which uses the cost-complexity risk. The pruning measure uses an error rate penalty based on the sub-tree size: both the errors and the number of leaves of the tree (its complexity) are considered. The cost-complexity risk of each possible sub-tree t in an initial DT T_0 is calculated as the training error R(t) plus the product of a factor α and the number of leaves |t| of the sub-tree, i.e. RC_α(t) = R(t) + α·|t|. A series of sub-decision trees with the smallest values of α are selected to be pruned. Finally, the correctly pruned sub-tree is selected from the α-sequence of sub-trees using an independent test dataset. The final selection is based on the error rate or the standard error (assuming a binomial distribution).

Reduced-error pruning, proposed by Quinlan (1987), produces a series of pruned DTs using the test dataset. A complete DT T_0 is first grown using the training dataset. A test dataset is then used, and for each node in T_0, the number of classification errors made on the pruning set when the sub-tree t is kept is compared with the number of classification errors made when t is turned into a leaf. Next, the positive difference between the two errors is assigned to the sub-tree root node. The node with the largest difference is then pruned. This process is repeated until further pruning would increase the misclassification rate. Finally, the smallest version of the most accurate tree with respect to the test dataset is generated.

In contrast to reduced-error pruning, the necessity of a separate test dataset can be avoided using pessimistic error pruning (PEP, Quinlan, 1987). This uses the binomial continuity correction rate to obtain a more realistic estimate of the misclassification rate. The misclassification correction depends on the number of leaves and misclassifications.

Error-based pruning (EBP, Quinlan, 1993) is an improved version of PEP that traverses the tree according to a bottom-up post-order strategy. No pruning dataset is required, and the binomial continuity correction rate of PEP is used. The difference is that, in each iteration, EBP considers the possibility of grafting a branch t_y in place of the parent of y itself. The estimation errors of t_x and t_y are calculated to determine whether it is convenient to prune node x (the tree rooted at x is replaced by a leaf), replace it with t_y (the largest sub-tree), or keep the original t_x.

Recently, Luo et al. (2013) developed a new pruning method based on the structural risk of the leaf nodes. This method was developed under the hypothesis that leaves with high accuracies mean that the tree can classify the training data very well, and a large volume of such leaves generally implies good performance. Using this hypothesis, the structural risk measures the product of the accuracy and the volume of the leaf nodes. As in common pruning methods, a series of sub-trees is generated. The process visits each node x in the DT T_0 (t_x is the sub-tree whose root is x). For each sub-tree t_x, feasible pruning nodes are found (their two children are leaves), and the structural risks are measured. Finally, the sub-tree that maximizes the structural risk is selected for pruning.

Additional post-pruning methods have been proposed, such as critical value pruning (Mingers, 1987), minimum error pruning (Niblett and Bratko, 1987), and DI pruning (which balances both the Depth and the Impurity of nodes) (Fournier and Crémilleux, 2002). The choice of DT has also been validated (Karabadji et al., 2012), with genetic algorithms used to pull out the best tree from a set of different models (e.g. BFTree, J48, LMT; Hall et al., 2009). To select the most robust DT, all models were generated and their performances measured on distinct training and validation sets. In this work, the main objective is to construct DTs without under-pruning or over-fitting the training dataset, and without choosing between different pruning methods. Two prior works have shown that unpruned DTs give similar results to pruned trees when a Laplace correction is used to calculate the class probabilities (Bradford et al., 1998; Provost and Domingos, 2003).

The identification of smaller sets of highly predictive attributes has been considered by many learning schemes. Attribute selection shares the same objective as pruning methods, namely the elimination of irrelevant, redundant, and noisy attributes in the building phase to produce good DT performance. Many studies have investigated and improved classification models (Bermejo et al., 2012; Macaš et al., 2012). In these works, wrapper techniques have been applied to attribute selection. A target learning algorithm is used to estimate the value of attribute subsets. The process is driven by the binary relation ⊆ (subset inclusion) between attribute subsets. The search process can be conducted on a depth-first or breadth-first basis, or a combination of both (e.g. the A* algorithm). Wrappers are generally better than filters, but the improved performance comes at a computational cost: in the worst case, 2^m subsets of attributes must be tested, where m is the number of attributes (Kohavi and John, 1997).

Similar to attribute selection, DTs can be improved by reducing the data complexity, as well as by reducing the effects of unwanted data characteristics. Data reduction essentially involves dimensionality reduction and/or example reduction (Piramuthu, 2008). Generally, reduction methods use sampling (e.g. random, stratified) to select examples for consideration in the learning phase (Ishibuchi et al., 2001; Liu, 2010).

In conclusion, different pruning techniques have been studied, but none is adequate for all varieties of problem. There has been a recent focus on attribute selection and data sampling to improve DTC. To realize a better DT for a specific application, we propose the IUDT algorithm, which combines a novel scheme of random database sampling with an attribute selection wrapper process. The main objective of our study is to reduce the effective number of examples and training data attributes, and thus minimize the size of the DT.

3. Preliminaries

Before describing our approach, we give some basic results on the ordering of sets, classifier over-fitting problems, attribute relevance and redundancy problems, and finally DTC. The definitions presented in this section use the same semantics as in Davey (2002).

An operator P_s is said to be direct (immediate) if it associates φ only with the most general of the patterns that are more specific than φ, i.e. P_s(φ) = min(P_s(φ)). Likewise, an operator P_g is said to be direct (immediate) if it associates φ only with the most specific of the patterns that are more general than φ, i.e. P_g(φ) = max(P_g(φ)).

3.1.3. Minimum and maximum patterns
Let ≼ be a specialization relation in L, and let Φ ⊆ L be a set of patterns. min(Φ) is the set of the most general patterns of Φ, and max(Φ) is the set of the most specific patterns of Φ:

min(Φ) = {φ ∈ Φ | ∄ θ ∈ Φ s.t. θ ≺ φ}
max(Φ) = {φ ∈ Φ | ∄ θ ∈ Φ s.t. φ ≺ θ}
Definition 4 (Incremental usefulness). "Given data D, a learning algorithm T, and a subset of attributes X, the attribute e is incrementally useful to T with respect to X if the accuracy of the hypothesis that T produces using the group of attributes {e} ∪ X is better than the accuracy achieved using just the subset of attributes X" (Caruana and Freitag, 1994).

This definition is useful in the present work for obtaining a predictive subset of attributes.

3.3.3. Redundancy
Attribute redundancy in machine learning is widely understood in terms of correlation between attributes. Two attributes are redundant to each other if their values are completely correlated. The correlation between two variables can be checked based on the entropy, i.e. the uncertainty of a random variable (Xing et al., 2001; Liu and Yu, 2005).

3.4. Decision trees
DTs are built recursively, as illustrated in Fig. 4, following a top-down approach. They are composed of a root, several nodes, branches, and leaves. DTs grow according to the use of an attribute sequence to divide the training examples into n classes. Tree building can be described as follows. First, the indicator that ensures the best split of the training examples is chosen, and the population subsets are distributed to new nodes. The same operation is repeated for each node (subset population) until no further split is possible.

4. Proposed approach

This section describes the idea of replacing the post-pruning phase by attribute selection and dataset reduction. The main objective is to illustrate that the performance of an unpruned DT can be improved using these two steps. Wrapper attribute selection can eliminate irrelevant and redundant attributes, and data reduction reduces the size complexity problem. We must therefore determine both the best attribute subset and the subset of training examples used to build the best unpruned decision tree t*. The key features of the proposed method are as follows: (i) data preprocessing, (ii) definition of attribute combinations as a level-wise research space (specialization graph), and (iii) application of an oriented breadth-first exploration of the research space to find the best unpruned decision tree t*. Fig. 5 presents a schematic of the proposed method.

4.1. Data preprocessing

Experimental analyses of DT pruning methods have used the following splits of training and test data: 25 instances of 70% training and 30% test data (Esposito et al., 1997), 9 instances of 60% training and 40% test data (Mingers, 1989), and 10 instances of 25% training and 75% test data (Bradford et al., 1998). Multiple instances were used to avoid the assumption that "a single random split of data may give unrepresentative results". To obtain representative results in the experimentation stage, and to reduce the size of the target robust unpruned DT, the dataset was randomly split into a 50% training set and a 50% test set five times. Each training set was further split at random into three sub-training sets containing 50% of its examples.
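This resampling step can be sketched with the WEKA API that the paper's implementation already uses (Section 5). The fragment below is a minimal illustration, not the authors' code; the dataset file name and random seeds are assumptions.

import java.util.Random;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PreprocessingSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("dataset.arff").getDataSet(); // illustrative file name
        data.setClassIndex(data.numAttributes() - 1);

        for (int i = 0; i < 5; i++) {                  // five independent 50/50 splits
            Instances copy = new Instances(data);
            copy.randomize(new Random(i));             // seed i keeps each split reproducible
            int half = copy.numInstances() / 2;
            Instances train = new Instances(copy, 0, half);
            Instances test  = new Instances(copy, half, copy.numInstances() - half);

            // each training set is further split into three random
            // sub-training sets holding 50% of its examples
            for (int j = 0; j < 3; j++) {
                Instances shuffled = new Instances(train);
                shuffled.randomize(new Random(10 * i + j));
                Instances subTrain = new Instances(shuffled, 0, shuffled.numInstances() / 2);
                // subTrain is one sample subset X_j used to form couples (A_i, X_j)
            }
        }
    }
}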
Example 2. For a dataset that contains two attributes {a, b}, the set of couples is represented as follows:

R = {(a, 1), …, (a, 15), (b, 1), …, (b, 15), ({a, b}, 1), …, ({a, b}, 15)}

Fig. 7 illustrates the research graph P⟨R, ⊆⟩.
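Since the implementation relies on the Guava library (Bourrillion et al., 2010), the couples of Example 2 can be enumerated with Sets.powerSet. The 15 sample indices presumably correspond to the five splits times three sub-training sets of Section 4.1; this is an illustrative fragment under that assumption, not the authors' code.

import java.util.Set;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Sets;

public class CouplesSketch {
    public static void main(String[] args) {
        Set<String> attributes = ImmutableSet.of("a", "b");
        // R = every non-empty attribute subset paired with each of the
        // 15 sample subsets (identified here by an index 1..15)
        for (Set<String> subset : Sets.powerSet(attributes)) {
            if (subset.isEmpty()) continue;        // couples use non-empty subsets only
            for (int j = 1; j <= 15; j++) {
                System.out.println("(" + subset + ", " + j + ")");
            }
        }
    }
}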
… REPTree, SimpleCart), which is subject to the constraint of avoiding the pruning phase. Every constructed DT is evaluated according to formula (3). Algorithm 1 gives the pseudo-code of the proposed traversal method.

Algorithm 1. The best unpruned DT search function.

Input: A dataset D, a language L_T, predicates IRP and PRP.
Output: The optimal DT t*(X, i).
1:  L_1 = {(e, i) | e ∈ E, i ∈ I}
2:  C_1 = {(e, i) | (e, i) ∈ L_1, IRP(t(e, i)) and PRP(t(e, i))}
3:  t* = Best_of(C_1)
4:  i = 1
5:  while C_i is not empty do
6:      L_{i+1} = {(Y, i) | ∀(X, i) ∈ C_i, ∀e ∈ E, Y = X ∪ {e}}
7:      C_{i+1} = {(X, i) | (X, i) ∈ L_{i+1}, IRP(t(X, i)) and PRP(t(X, i))}
8:      t' = Best_of(C_{i+1})
9:      if Accuracy(t*) < Accuracy(t') then
10:         t* = t'
11:     end if
12:     i = i + 1
13: end while
14: return t*

Until recently, every wrapper algorithm that considered more than 30 attributes was computationally expensive. There are many methods of speeding up the traversal process. The principal concepts are based on minimizing the number of evaluated subsets (Hall and Holmes, 2003; Bermejo et al., 2008; Ruiz et al., 2006), or on using a randomized search (Stracuzzi and Utgoff, 2004; Huerta et al., 2006). To tackle this problem, the incremental relevance property and a margin of 5% of the best subset's accuracy are adopted. The accuracy of the DT constructed using a couple (X, i) is calculated as follows:

Accuracy(t(X, i)) = average( Σ_{a ∈ I, a ≠ i} Accuracy(t(X, i), a) + Accuracy(t(X, i), w(i)) ).   (3)
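For concreteness, the level-wise loop of Algorithm 1 can be rendered in Java. The sketch below is schematic: Couple, Evaluator, and the predicate wiring are assumptions introduced for illustration, not the authors' implementation; accuracy(), irp(), and prp() stand in for Eq. (3), IRP, and PRP.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BestUnprunedTreeSearch {
    record Couple(Set<String> attributes, int sample) {}

    interface Evaluator {                        // wraps t(X, i) and Eq. (3)
        double accuracy(Couple c);
        boolean irp(Couple c);                   // incremental relevance predicate
        boolean prp(Couple c);                   // 5%-of-best accuracy predicate
    }

    static Couple search(List<String> attributes, int samples, Evaluator ev) {
        // level 1: single-attribute couples that satisfy both predicates
        List<Couple> candidates = new ArrayList<>();
        for (String e : attributes)
            for (int i = 1; i <= samples; i++) {
                Couple c = new Couple(Set.of(e), i);
                if (ev.irp(c) && ev.prp(c)) candidates.add(c);
            }
        Couple best = bestOf(candidates, ev);
        while (!candidates.isEmpty()) {
            List<Couple> next = new ArrayList<>();
            for (Couple c : candidates)          // extend each kept couple by one attribute
                for (String e : attributes)
                    if (!c.attributes().contains(e)) {
                        Set<String> y = new HashSet<>(c.attributes());
                        y.add(e);
                        Couple ext = new Couple(y, c.sample());
                        if (ev.irp(ext) && ev.prp(ext)) next.add(ext);
                    }
            Couple levelBest = bestOf(next, ev);
            if (levelBest != null && ev.accuracy(levelBest) > ev.accuracy(best))
                best = levelBest;                // keep the best tree seen so far
            candidates = next;
        }
        return best;
    }

    static Couple bestOf(List<Couple> cs, Evaluator ev) {
        Couple best = null;
        for (Couple c : cs)
            if (best == null || ev.accuracy(c) > ev.accuracy(best)) best = c;
        return best;
    }
}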
Fig. 8 shows an instance of IRP and PRP. IRP eliminates the candidates of L_{k+1} generated by adding the gray and green attributes (i.e. IRP is not satisfied), and PRP keeps the attribute subsets whose accuracy is greater than that of the best of the intermediate candidates (L_{k+1}) minus 5% (e.g. on the right of Fig. 8, the black candidate is the best, the white candidates are eliminated, and only the gray ones are kept).

5. Validation of IUDT

The proposed method is implemented and tested on 10 standard machine learning datasets, which were extracted from the UCI collection (Blake and Merz, 1998). The code is implemented in Java using the WEKA framework (Hall et al., 2009) and the Guava Google library (Bourrillion et al., 2010). Experiments were conducted and compared with the results from the original WEKA pruned DTs, i.e. DTP (Decision Tree construction with use of the Pruning phase); an "attribute selection" algorithm implementation that behaves like a wrapper algorithm, i.e. IUDTAS (Improved Unpruned Decision Tree construction using only Attribute Selection); and a second algorithm that considers only the sampling step, i.e. IUDTSD (Improved Unpruned Decision Tree construction using only Sampling Data).

The datasets are described in Table 1. The "% Base error" column refers to the percentage error obtained if the most frequent class is always predicted. In addition, the selected datasets represent various applications. Three DTs, namely J48, SimpleCart, and REPTree, were chosen to apply different pruning methods within their standard WEKA implementation. The main DT characteristics considered in the experiments are reported in Table 2. The experiments are based on the DTs reported in Table 2, implemented with their standard settings but without the pruning phase (unpruned DTs).

Tables 3, 4 and 5 list the classification accuracy for J48, REPTree, and SimpleCart, respectively, when the IUDT algorithm is applied to the 10 experimental datasets. The tables show that the number of attributes used to construct the DT is less than the number of data attributes, and only about one-sixth are used in the case of large attribute sets. The accuracy results show that the DTs outperform the base accuracy obtained when only the most frequent class is continually predicted.

Table 6 shows WEKA's DTs built using the pruning phase (DTP). The design of these trees is mostly based on the construction (growing and pruning) phase, where 50% of the examples (instances) are used as the training set. As in the proposed approach, each dataset was randomly split into training and test sets five times (50% split). The results reported in Table 6 represent the mean prediction accuracy over the five i-test datasets, which gives a comparison of accuracy and size between the proposed approach and the standard implementation DTP, i.e. using the pruning phase.

Table 6 shows that the results differ from one model to another. Generally, the J48 trees are larger than the REPTree and SimpleCart DTs. This phenomenon is due to the pruning technique applied. In contrast to the reduced-error pruning (REPTree) and error-complexity pruning (SimpleCart) techniques, the EBP technique employed by J48 (discussed in Section 2) grafts one of the sub-trees of a sub-tree X that was selected to be pruned. Furthermore, the accuracy of the J48 DTs is the best for six datasets. Clearly, these results provide a picture of the over-pruning and under-pruning encountered in some databases. Over-pruning can be observed in the case of the Zoo and Breast-cancer datasets using REPTree and SimpleCart.
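Disabling the pruning phase uses each learner's standard WEKA options. A minimal sketch follows; the file name, seed, and evaluation flow are illustrative, not the paper's experimental harness, and SimpleCart (a WEKA package classifier) is handled analogously.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class UnprunedTrees {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("dataset.arff").getDataSet(); // illustrative name
        data.setClassIndex(data.numAttributes() - 1);
        data.randomize(new Random(1));
        int half = data.numInstances() / 2;                // 50/50 split as in Section 4.1
        Instances train = new Instances(data, 0, half);
        Instances test  = new Instances(data, half, data.numInstances() - half);

        J48 j48 = new J48();
        j48.setUnpruned(true);                             // disable EBP pruning
        REPTree rep = new REPTree();
        rep.setNoPruning(true);                            // disable reduced-error pruning

        j48.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(j48, test);
        System.out.println("Unpruned J48 accuracy: " + eval.pctCorrect());
    }
}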
Table 1
Main characteristics of the databases used for the experiments.

Table 2
Main characteristics of the DTs used for the experiments.

Table 3
IUDT approach applied to J48 results.

Table 4
IUDT approach applied to REPTree results.

Table 6
DTP results.

Table 7
IUDTSD results.

Table 8
Application of IUDTAS to J48 results.

Table 9
Application of IUDTAS to REPTree results.

Table 12
Comparison of IUDTSD and IUDT results.

Table 13
Comparison of IUDTAS and IUDT results.
6. Application to fault diagnosis in rotating machines

… condition (class). The approaches investigated in this paper are then used to seek the non-complex construction of effective decision rules, following the schematic shown in Fig. 9.
6.1. Experimental study

The test rig shown in Fig. 10 is composed of three shafts, two gears (one with 60 teeth and the other with 48 teeth), six bearing housings, a coupling, and a toothed belt. The system is driven by a DC variable-speed electric motor with a rotational speed ranging from 0 to 1500 rpm.

Vibration signatures were used to monitor the condition of the test rig. The vibration signals were acquired using an accelerometer fixed on the bearing housing, connected to a data acquisition system equipped with OROS software. Vibration signatures were recorded under three different rotational speeds (300, 900, and 1500 rpm), under a normal operating condition and with three different faults: mass imbalance, gear fault, and faulty belt.

6.2. Signal processing

The wavelet transform has been widely studied over the past two decades, and its use in vibration analysis has grown significantly. The formulation of its discrete variant (DWT), which requires less computation time than the continuous form, is given by the following equation:

DWT(j, k) = (1/√(2^j)) ∫_{−∞}^{+∞} s(t) ψ*((t − 2^j k)/2^j) dt.   (4)
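In practice, Eq. (4) is usually evaluated with the fast filter-bank scheme of Mallat (1989) rather than by direct integration. The following one-level Haar sketch is an illustration (the wavelet choice is an assumption, not necessarily the one used here); applying it repeatedly yields the approximation signals A1, A2 and detail signals D1, D2 referenced in Table 15.

public class HaarDwt {
    // one DWT level: returns {approximation, detail}, each half the input length
    static double[][] step(double[] s) {
        int n = s.length / 2;
        double[] a = new double[n];
        double[] d = new double[n];
        for (int k = 0; k < n; k++) {
            a[k] = (s[2 * k] + s[2 * k + 1]) / Math.sqrt(2); // low-pass: scaled averages
            d[k] = (s[2 * k] - s[2 * k + 1]) / Math.sqrt(2); // high-pass: scaled differences
        }
        return new double[][] { a, d };
    }

    public static void main(String[] args) {
        double[] signal = { 4, 2, 5, 7, 1, 0, 3, 3 };
        double[][] level1 = step(signal);     // A1 and D1
        double[][] level2 = step(level1[0]);  // A2 and D2, computed from A1
        System.out.println("D1[0] = " + level1[1][0]);
        System.out.println("A2 length = " + level2[0].length);
    }
}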
Fig. 10. Test rig (URASM-CSC Laboratory).
Fig. 11. DWT decomposition.
Fig. 12. Original signal, approximations, and detail spectra extracted from the test rig under the condition of mass imbalance.
Table 14
Fault diagnosis results for the rotating machine application.

            Accuracy   Size   Accuracy   Size   Accuracy   Size   Accuracy   Size
J48         89.52      21     91.66      33     92.38      19     96.66      29
REPTree     98.41      21     85.31      27     92.14      19     90.47      29
SimpleCart  95.37      23     92.30      35     90.47      19     96.66      25

Table 15
Indicators used in the construction of the best tree.

#   Indicator                                                       Source signal
1   Skewness                                                        Original signal
2   Root mean square of the spectra                                 Original signal
3   Signal root mean square                                         First approximation signal
4   Crest factor                                                    First detail signal
5   Mean interval between the frequencies of the four maximum       Original signal
    amplitudes
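The indicators in Table 15 are standard signal statistics. The sketch below gives textbook formulations (the authors' exact variants, e.g. the spectrum-based RMS, may differ):

public class Indicators {
    // root mean square of a signal
    static double rms(double[] x) {
        double s = 0;
        for (double v : x) s += v * v;
        return Math.sqrt(s / x.length);
    }

    // crest factor: peak amplitude divided by RMS
    static double crestFactor(double[] x) {
        double peak = 0;
        for (double v : x) peak = Math.max(peak, Math.abs(v));
        return peak / rms(x);
    }

    // skewness: third standardized central moment
    static double skewness(double[] x) {
        double mean = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        double m2 = 0, m3 = 0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= x.length;
        m3 /= x.length;
        return m3 / Math.pow(m2, 1.5);
    }
}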
7. Conclusion

Experimental results on 10 benchmark datasets showed that the proposed approach improves the accuracy of DTs and effectively reduces their size, avoiding the problems of over- and under-pruning.

Finally, the IUDT approach was applied to the practical problem of fault diagnosis in rotating equipment. This demonstrated its effectiveness in terms of classification accuracy and tree complexity. Moreover, extracting only the indicators selected by the proposed approach allows a significant gain in storage memory and computation time.

We conclude that the proposed method is feasible, especially in the case of small datasets, and is an effective approach when exhaustive knowledge of the data characteristics is missing.

Acknowledgments

In memoriam, we would like to gratefully acknowledge Prof. Lhouari Nourine and Prof. Engelbert Mephu Nguifo for their kind advice on this research.
Appendix A

In the listings below, each leaf shows the predicted class label followed, in WEKA's usual notation, by (instances reaching the leaf / misclassified instances) and, in brackets, the corresponding counts on the hold-out fold.

REPTree DTP
============

CF-Sig < 0.13
|   CF-A2 < 0.07
|   |   fMXPx-Sig < 53.5 : 2.000000 (42/4) [25/7]
|   |   fMXPx-Sig >= 53.5 : 1.000000 (7/0) [3/0]
|   CF-A2 >= 0.07
|   |   MeanFFT-Sig < 0
|   |   |   fMXPx-Sig < 52.5
|   |   |   |   fMXPx-Sig < 10.5 : 1.000000 (18/0) [6/0]
|   |   |   |   fMXPx-Sig >= 10.5
|   |   |   |   |   fMXPx-Sig < 50.5 : 2.000000 (30/2) [12/0]
|   |   |   |   |   fMXPx-Sig >= 50.5 : 1.000000 (18/2) [7/2]
|   |   |   fMXPx-Sig >= 52.5
|   |   |   |   CF-A2 < 0.08 : 1.000000 (13/0) [6/2]
|   |   |   |   CF-A2 >= 0.08
|   |   |   |   |   RMS-Sig < 3.03 : 1.000000 (2/0) [2/0]
|   |   |   |   |   RMS-Sig >= 3.03 : 7.000000 (25/0) [16/3]
|   |   MeanFFT-Sig >= 0
|   |   |   fMXPx-A1 < 48 : 7.000000 (36/0) [17/0]
|   |   |   fMXPx-A1 >= 48
|   |   |   |   fMXPx-Sig < 53 : 1.000000 (7/0) [4/0]
|   |   |   |   fMXPx-Sig >= 53 : 7.000000 (7/0) [3/0]
CF-Sig >= 0.13
|   RMS-Sig < 4.98
|   |   RMS-Sig < 2.81 : 2.000000 (2/0) [1/0]
|   |   RMS-Sig >= 2.81 : 1.000000 (3/2) [3/1]
|   RMS-Sig >= 4.98 : 3.000000 (70/0) [35/0]
REPTree IUDTSD
================

CF-Sig < 0.13
|   fMXPx-Sig < 51
|   |   RMS-Sig < 7
|   |   |   RMS-Sig < 4.57 : 2.000000 (7/0) [0/0]
|   |   |   RMS-Sig >= 4.57 : 1.000000 (7/0) [0/0]
|   |   RMS-Sig >= 7
|   |   |   CF-A2 < 0.07 : 2.000000 (15/0) [0/0]
|   |   |   CF-A2 >= 0.07
|   |   |   |   MeanFFT-Sig < 0 : 2.000000 (6/0) [0/0]
|   |   |   |   MeanFFT-Sig >= 0 : 7.000000 (10/0) [0/0]
|   fMXPx-Sig >= 51
|   |   fMXPx-A2 < 53.5 : 1.000000 (10/1) [0/0]
|   |   fMXPx-A2 >= 53.5
|   |   |   MeanFFT-Sig < 0 : 1.000000 (6/0) [0/0]
|   |   |   MeanFFT-Sig >= 0 : 7.000000 (16/0) [0/0]
CF-Sig >= 0.13
|   RMS-Sig < 5.11 : 2.000000 (2/1) [0/0]
|   RMS-Sig >= 5.11 : 3.000000 (26/0) [0/0]
REPTree IUDTAS
==============

MeanFFT-A2 < 0
|   RMS-D2 < 1.28 : 2.000000 (15/0) [0/0]
|   RMS-D2 >= 1.28
|   |   RMS-D2 < 3.94
|   |   |   MeanFFT-A2 < 0
|   |   |   |   RMS-D2 < 1.41
|   |   |   |   |   MeanFFT-A2 < 0
|   |   |   |   |   |   MeanFFT-A2 < 0 : 2.000000 (5/1) [0/0]
|   |   |   |   |   |   MeanFFT-A2 >= 0 : 1.000000 (4/0) [0/0]
|   |   |   |   |   MeanFFT-A2 >= 0 : 7.000000 (8/0) [0/0]
|   |   |   |   RMS-D2 >= 1.41
|   |   |   |   |   MeanFFT-A2 < 0 : 1.000000 (21/0) [0/0]
|   |   |   |   |   MeanFFT-A2 >= 0
|   |   |   |   |   |   Mean-Freqdsit-A1 < 10.83 : 7.000000 (2/0) [0/0]
|   |   |   |   |   |   Mean-Freqdsit-A1 >= 10.83 : 1.000000 (8/0) [0/0]
|   |   |   MeanFFT-A2 >= 0 : 7.000000 (24/1) [0/0]
|   |   RMS-D2 >= 3.94
|   |   |   Mean-Freqdsit-A1 < 10.17 : 1.000000 (19/0) [0/0]
|   |   |   Mean-Freqdsit-A1 >= 10.17 : 2.000000 (14/1) [0/0]
MeanFFT-A2 >= 0
|   MeanFFT-A2 < 0.01
|   |   CF-D1 < 0.14
|   |   |   RMS-D2 < 15.12 : 7.000000 (17/0) [0/0]
|   |   |   RMS-D2 >= 15.12 : 3.000000 (2/0) [0/0]
|   |   CF-D1 >= 0.14
|   |   |   CF-D1 < 0.17 : 3.000000 (10/1) [0/0]
|   |   |   CF-D1 >= 0.17 : 3.000000 (39/0) [0/0]
|   MeanFFT-A2 >= 0.01 : 2.000000 (22/0) [0/0]
REPTree IUDT
============

Skew-Sig < 0.23
|   MeanFFT-Sig < 0.01
|   |   MeanFFT-Sig < 0
|   |   |   RMS-A1 < 2.18 : 2.000000 (9/0) [0/0]
|   |   |   RMS-A1 >= 2.18
|   |   |   |   RMS-A1 < 8.46
|   |   |   |   |   MeanFFT-Sig < 0
|   |   |   |   |   |   MeanFFT-Sig < 0 : 1.000000 (9/0) [0/0]
|   |   |   |   |   |   MeanFFT-Sig >= 0
|   |   |   |   |   |   |   Mean-Freqdsit-Sig < 11.83 : 7.000000 (7/0) [0/0]
|   |   |   |   |   |   |   Mean-Freqdsit-Sig >= 11.83 : 1.000000 (8/0) [0/0]
|   |   |   |   |   MeanFFT-Sig >= 0 : 7.000000 (10/0) [0/0]
|   |   |   |   RMS-A1 >= 8.46
|   |   |   |   |   Mean-Freqdsit-Sig < 9.67 : 1.000000 (8/0) [0/0]
|   |   |   |   |   Mean-Freqdsit-Sig >= 9.67 : 2.000000 (6/0) [0/0]
|   |   MeanFFT-Sig >= 0
|   |   |   CF-D1 < 0.15 : 7.000000 (11/0) [0/0]
|   |   |   CF-D1 >= 0.15 : 3.000000 (5/1) [0/0]
|   MeanFFT-Sig >= 0.01 : 2.000000 (13/0) [0/0]
Skew-Sig >= 0.23 : 3.000000 (19/0) [0/0]
References

Agrawal, R., Srikant, R., et al., 1994. Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pp. 487–499.

Badaoui, M.E., Guillet, F., Daniere, J., 2004. New applications of the real cepstrum to gear signals, including definition of a robust fault indicator. Mech. Syst. Signal Process. 18, 1031–1046.

Baydar, N., Ball, A., 2001. A comparative study of acoustic and vibration signals in detection of gear failures using Wigner–Ville distribution. Mech. Syst. Signal Process. 15, 1091–1107.

Bermejo, P., Gámez, J., Puerta, J., 2008. On incremental wrapper-based attribute selection: experimental analysis of the relevance criteria. In: Proceedings of IPMU'08, pp. 638–645.

Bermejo, P., de la Ossa, L., Gámez, J.A., Puerta, J.M., 2012. Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowl.-Based Syst. 25, 35–44.

Blake, C.L., Merz, C.J., 1998. UCI Repository of Machine Learning Databases 〈http://www.ics.uci.edu/~mlearn/mlrepository.html〉. University of California, Department of Information and Computer Science, Irvine, CA, p. 460.

Bourrillion, K., et al., 2010. Guava: Google Core Libraries for Java 1.5+.

Bradford, J.P., Kunz, C., Kohavi, R., Brunk, C., Brodley, C.E., 1998. Pruning decision trees with misclassification costs. In: Machine Learning, ECML-98. Springer, pp. 131–136.

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth & Brooks, Monterey, CA.

Caruana, R., Freitag, D., 1994. How useful is relevance? FOCUS 14, 2.

Chen, C.S., Chen, J.S., 2011. Rotor fault diagnosis system based on SGA-based individual neural networks. Expert Syst. Appl. 38, 10822–10830.

Davey, B.A., 2002. Introduction to Lattices and Order. Cambridge University Press.

Deng, S., Lin, S.Y., Chang, W.L., 2011. Application of multiclass support vector machines for fault diagnosis of field air defense gun. Expert Syst. Appl. 38, 6007–6013.

Djebala, A., Ouelaa, N., Hamzaoui, N., 2008. Detection of rolling bearing defects using discrete wavelet analysis. Meccanica 43, 339–348.

Esposito, F., Malerba, D., Semeraro, G., Kay, J., 1997. A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19, 476–491.

Fournier, D., Crémilleux, B., 2002. A quality index for decision tree pruning. Knowl.-Based Syst. 15, 37–43.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H., 2009. The WEKA data mining software: an update. ACM SIGKDD Explor. Newslett. 11, 10–18.

Hall, M.A., Holmes, G., 2003. Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans. Knowl. Data Eng. 15, 1437–1447.

Huerta, E.B., Duval, B., Hao, J.K., 2006. A hybrid GA/SVM approach for gene selection and classification of microarray data. In: Applications of Evolutionary Computing. Springer, pp. 34–44.

Ishibuchi, H., Nakashima, T., Nii, M., 2001. Genetic-algorithm-based instance and feature selection. In: Instance Selection and Construction for Data Mining. Springer, pp. 95–112.

Kankar, P., Sharma, S.C., Harsha, S., 2011. Fault diagnosis of ball bearings using continuous wavelet transform. Appl. Soft Comput. 11, 2300–2312.

Karabadji, N.E.I., Seridi, H., Khelf, I., Laouar, L., 2012. Decision tree selection in an industrial machine fault diagnostics. In: Model and Data Engineering. Springer, pp. 129–140.

Khelf, I., Laouar, L., Bouchelaghem, A.M., Rémond, D., Saad, S., 2013. Adaptive fault diagnosis in rotating machines using indicators selection. Mech. Syst. Signal Process. 40, 452–468.

Kohavi, R., John, G.H., 1997. Wrappers for feature subset selection. Artif. Intell. 97, 273–324.

Liu, H., 2010. Instance Selection and Construction for Data Mining. Springer-Verlag.

Liu, H., Yu, L., 2005. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans. Knowl. Data Eng. 17, 491–502.

Luo, L., Zhang, X., Peng, H., Lv, W., Zhang, Y., 2013. A new pruning method for decision tree based on structural risk of leaf node. Neural Comput. Appl., 1–10.

Macaš, M., Lhotská, L., Bakstein, E., Novák, D., Wild, J., Sieger, T., Vostatek, P., Jech, R., 2012. Wrapper feature selection for small sample size data driven by complete error estimates. Comput. Methods Prog. Biomed. 108, 138–150.

Mallat, S.G., 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11, 674–693.

Mingers, J., 1987. Expert systems-rule induction with statistical data. J. Oper. Res. Soc., 39–47.

Mingers, J., 1989. An empirical comparison of pruning methods for decision tree induction. Mach. Learn. 4, 227–243.

Mosher, M., Pryor, A.H., Lewicki, D.G., 2003. Detailed vibration analysis of pinion gear with time–frequency methods. National Aeronautics and Space Administration, Ames Research Center.

Niblett, T., Bratko, I., 1987. Learning decision rules in noisy domains. In: Proceedings of Expert Systems '86, The Sixth Annual Technical Conference on Research and Development in Expert Systems III. Cambridge University Press, pp. 25–34.

Oates, T., 1997. The effects of training set size on decision tree complexity. In: Proceedings of the 14th International Conference on Machine Learning. Morgan Kaufmann, pp. 254–262.

Piramuthu, S., 2008. Input data for decision trees. Expert Syst. Appl. 34, 1220–1226.

Provost, F., Domingos, P., 2003. Tree induction for probability-based ranking. Mach. Learn. 52, 199–215.

Quinlan, J.R., 1987. Simplifying decision trees. Int. J. Man–Mach. Stud. 27, 221–234.

Quinlan, J.R., 1993. C4.5: Programs for Machine Learning, vol. 1. Morgan Kaufmann.

Ruiz, R., Riquelme, J.C., Aguilar-Ruiz, J.S., 2006. Incremental wrapper-based gene selection from microarray data for cancer classification. Pattern Recognit. 39, 2383–2392.

Sakthivel, N., Sugumaran, V., Babudevasenapati, S., 2010. Vibration based fault diagnosis of monoblock centrifugal pump using decision tree. Expert Syst. Appl. 37, 4040–4049.

Stracuzzi, D.J., Utgoff, P.E., 2004. Randomized variable elimination. J. Mach. Learn. Res. 5, 1331–1362.

Sugumaran, V., Muralidharan, V., Ramachandran, K., 2007. Feature selection using decision tree and classification through proximal support vector machine for fault diagnostics of roller bearing. Mech. Syst. Signal Process. 21, 930–942.

Sugumaran, V., Ramachandran, K., 2007. Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing. Mech. Syst. Signal Process. 21, 2237–2247.

Xing, E.P., Jordan, M.I., Karp, R.M., et al., 2001. Feature selection for high-dimensional genomic microarray data. In: ICML, pp. 601–608.

Yang, B.S., Lim, D.S., Tan, A.C.C., 2005. VIBEX: an expert system for vibration fault diagnosis of rotating machinery using decision tree and decision table. Expert Syst. Appl. 28, 735–742.

Yildiz, O.T., Alpaydin, E., 2005. Linear discriminant trees. Int. J. Pattern Recognit. Artif. Intell. 19, 323–353.

Zhao, Y., Zhang, Y., 2008. Comparison of decision tree methods for finding active objects. Adv. Space Res. 41, 1955–1959.