
Contribution Title⋆

First Author1[0000−1111−2222−3333], Second Author2,3[1111−2222−3333−4444], and
Third Author3[2222−3333−4444−5555]

1 Princeton University, Princeton NJ 08544, USA
2 Springer Heidelberg, Tiergartenstr. 17, 69121 Heidelberg, Germany
lncs@springer.com
http://www.springer.com/gp/computer-science/lncs
3 ABC Institute, Rupert-Karls-University Heidelberg, Heidelberg, Germany
{abc,lncs}@uni-heidelberg.de

Abstract. Feature selection is an important preprocessing phase for data classification. Filter feature selection is gaining popularity due to its efficiency and simplicity. The main task of filter feature selection is to return the best-ranked feature subset based on the feature significance. Feature selection based on information measures is one of the most effective filter feature selection approaches to estimate the feature significance. Unfortunately, most of the existing methods do not suggest a threshold to return the best feature subset, but instead depend on a fixed or user-defined threshold. Therefore, determining the best threshold is still an open research problem for the feature selection process. This paper proposes a new threshold based on fuzzy information measures (FIM). The proposed threshold utilizes the joint discriminative ability to return the most discriminative subset of features. Based on 15 benchmark datasets and four classifiers, the effectiveness of the proposed threshold is evaluated with six feature selection methods and compared to four well-known thresholds.

Keywords: Threshold selection · Fuzzy information measures · Feature selection · Classification.

1 Introduction
Nowadays, data classification can be found in different domains such as engineering, industry, and medicine [3, 14]. However, the presence of undesirable features increases the size of the data and reduces its quality [13]. Consequently, the performance of data classification can be reduced while the computational cost can be increased [13]. Feature selection is an important and effective preprocessing phase for data classification, as it improves the data quality by removing undesirable features [30].
Feature selection consists of three types: filter [15], embedded [9], and wrapper [11]. According to the evaluation approach, the filter type is called classifier-independent, while embedded and wrapper are called classifier-dependent.

⋆ Supported by organization x.

Filter methods outperform the other types due to their simplicity of use, efficiency in handling high-dimensional data, and ability to work well with different classifiers [21]. These are the main reasons to consider filter feature selection in this study.
The main task of filter feature selection is to return the best-ranked feature subset based on the feature significance [15]. Therefore, filter feature selection uses well-known measures to estimate the feature significance, such as correlation [18] and information measures [26]. Information measures are recommended over correlation because they capture both linear and non-linear relations among features. Moreover, information measures are suitable for any data type, such as numerical and categorical data [16].
Motivated by the main benefits of information measures, many feature selection methods have been developed in the literature [1, 7, 20, 25, 28]. Unfortunately, most of the existing methods do not suggest a threshold to return the best feature subset, but instead depend on a fixed or user-defined threshold. Therefore, determining the best threshold is still an open research problem for the feature selection process. Neither a fixed threshold nor a user-defined threshold is a preferred solution: a fixed threshold does not consider all the data characteristics, while a user-defined threshold requires extra expert effort. In this paper, we propose a new threshold method, called threshold based on fuzzy information measures (FIM). The proposed threshold starts with the best-ranked feature and stops automatically when adding new candidate features does not increase the discriminative ability of the pre-selected features.
The rest of the paper is structured as follows: Sect. 2 presents the related work. Then, the proposed threshold method is introduced in Sect. 3. The experiment setup is presented in Sect. 4, followed by the analysis of the experimental results in Sect. 5. Finally, the conclusion of the paper is presented in Sect. 6.

2 Related work

Due to the advantages of information measures, many feature selection methods based on information measures have been developed to improve the classification performance. In the following, we present some of the related methods.
Peng et al. proposed a feature selection method, namely MRMR [20]. The main objective of MRMR is to maximize the feature relevancy and minimize the feature redundancy in the ranked features. However, MRMR has two limitations. Firstly, it estimates the redundancy relation without considering the class information, which can lead to a bias between the relevancy and redundancy relations. Secondly, it may suffer from the redundancy overscaling problem [17]. Although JMI in [28] utilizes the class information and overcomes the second limitation, it may fail on another problem, the undervalued estimation of the redundancy relation [17]. To overcome these limitations, three methods have been introduced in this direction, called CMIM [7], JMIM [1], and NJMIM [1]. However, estimating their objective functions is not an easy task, as they consider three relations: relevancy, complementarity, and redundancy [4].
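For orientation, the greedy selection criteria of these methods are commonly written as below, where C denotes the class, S the set of already selected features, f_k a candidate feature, and I(·) the (conditional or joint) mutual information. This is a summary in the standard notation of the literature, not necessarily the exact formulation used in the cited papers:

\begin{align*}
J_{\mathrm{MRMR}}(f_k) &= I(f_k; C) - \frac{1}{|S|}\sum_{f_s \in S} I(f_k; f_s), &
J_{\mathrm{JMI}}(f_k)  &= \sum_{f_s \in S} I(f_k, f_s; C), \\
J_{\mathrm{CMIM}}(f_k) &= \min_{f_s \in S} I(f_k; C \mid f_s), &
J_{\mathrm{JMIM}}(f_k) &= \min_{f_s \in S} I(f_k, f_s; C).
\end{align*}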
Although feature selection methods based on information measures are widely used in the literature, determining the best threshold has been left to users or experts. For this reason, different thresholds have been introduced in [8, 10, 22, 27] to avoid this extra effort by users or experts, as follows.
Half selection threshold (HS): The HS threshold returns approximately 50% of the full ranked feature set. The candidate feature f ∈ F is selected if the following condition is satisfied:

p_f \le \frac{|F|}{2}   (1)

where p_f is the position of the ranked feature.
Mean selection threshold (MS): MS returns all the features whose scores are greater than or equal to the mean of the scores of the ranked features. The candidate feature f ∈ F is selected if the following condition is satisfied:

s_f \ge \frac{1}{|F|}\sum_{f \in F} s_f   (2)

where s_f is the score of the ranked feature.


Selection by threshold (ST): ST tries to return the features with scores that are approximately higher than their median. The candidate feature f ∈ F is selected if the following condition is satisfied:

s_f \ge \min_s + \frac{\max_s - \min_s}{3}   (3)

where min_s is the minimum feature score while max_s is the maximum feature score.
Log threshold (Log): The Log threshold returns the top log_2 |F| features. The candidate feature f ∈ F is selected if the following condition is satisfied:

p_f \le \log_2 |F|   (4)

where p_f is the position of the ranked feature.
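To make the four rules concrete, the following Python sketch (an illustration, not code from the cited papers; the helper name baseline_thresholds is ours) computes how many top-ranked features each rule keeps, assuming the scores are already sorted from best to worst:

import numpy as np

def baseline_thresholds(scores):
    """How many of the top-ranked features each baseline rule keeps.
    `scores` must already be sorted from best to worst, so the position
    of feature f is p_f = index + 1."""
    s = np.asarray(scores, dtype=float)
    n = len(s)
    positions = np.arange(1, n + 1)
    return {
        "HS":  int(np.sum(positions <= n / 2)),                      # Eq. (1)
        "MS":  int(np.sum(s >= s.mean())),                           # Eq. (2)
        "ST":  int(np.sum(s >= s.min() + (s.max() - s.min()) / 3)),  # Eq. (3)
        "Log": int(np.sum(positions <= np.log2(n))),                 # Eq. (4)
    }

For instance, with |F| = 40 ranked features, HS keeps the top 20 and Log keeps the top 5 (since log_2 40 ≈ 5.3), while MS and ST depend on the actual score values.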

3 Proposed method: FIM


Fuzzy information measures are extensions of information measures, where the former outperform the latter in dealing with continuous features without information loss [29]. The main idea is to map each feature f_i into a fuzzy relation matrix FRM(f_i) [23]. However, feature mapping requires extra cost in terms of size and time. To avoid this limitation, we propose a new instance selection method based on multiple uncertainty regions of the data, called ISMUR. ISMUR is an improved version of ISUR, which was proposed in [24]. ISUR depends on the overlapping between the minor class and the other classes of the data. However, it may not cover the uncertainty regions well, especially if there is no intersection between the minor class and any of the other classes. To extract the uncertainty regions well, ISMUR defines the uncertainty regions based on the closest classes instead of the minor class with each of the other classes, as follows (a code sketch is given after the steps):
Step 1: divide the input data D into h subsets based on the classes C = {c_1, c_2, ..., c_h} as follows: D = {D(c_1), D(c_2), ..., D(c_h)}.
Step 2: calculate the mean instance of each data subset as follows: {L(c_1), L(c_2), ..., L(c_h)}.
Step 3: calculate all possible distances between any two mean instances as follows: dis(L(c_i), L(c_j)), where i ≠ j.
Step 4: for each data subset
Step 4.1: select the closest subset based on the minimum distance between the mean instances min(dis(L(c_i), L(c_j))).
Step 4.2: add the k instances of D(c_i) that are closest to D(c_j) into the new dataset newdataset, where k is the number of instances in the minor class.
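A minimal Python sketch of these steps is given below. It is an illustration under stated assumptions rather than the paper's implementation: Euclidean distance between class mean instances, k defaulting to the size of the smallest class, and "closest to D(c_j)" read as the k instances of D(c_i) nearest to the mean instance of D(c_j).

import numpy as np

def ismur(X, y, k=None):
    """Sketch of instance selection based on multi-uncertainty regions (ISMUR)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)                                   # Step 1: split by class
    subsets = {c: X[y == c] for c in classes}
    means = {c: subsets[c].mean(axis=0) for c in classes}    # Step 2: mean instances
    if k is None:
        k = min(len(s) for s in subsets.values())            # size of the minority class
    Xs, ys = [], []
    for ci in classes:                                       # Steps 3-4
        # Step 4.1: closest class by the distance between mean instances
        cj = min((c for c in classes if c != ci),
                 key=lambda c: np.linalg.norm(means[ci] - means[c]))
        # Step 4.2: keep the k instances of D(c_i) closest to D(c_j)
        d = np.linalg.norm(subsets[ci] - means[cj], axis=1)
        keep = np.argsort(d)[:k]
        Xs.append(subsets[ci][keep])
        ys.append(np.full(len(keep), ci))
    return np.vstack(Xs), np.concatenate(ys)                 # the reduced "newdataset"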
After extracting the uncertainty regions, the size of the input data is reduced. Then, we can calculate the joint discriminative ability (I) of the best threshold as follows (illustrated in code below):
Step 1: map the first-ranked feature and the class label into fuzzy relation matrices FRM(f_1) and FRM(C), respectively.
Step 2: let the feature representative matrix M = FRM(f_1), threshold = 1, the discriminative ability score s(f_1) = I(FRM(f_1), FRM(C)), and the best discriminative ability score best = s(f_1).
Step 3: for each remaining candidate feature f_i, where i = {2, ..., n}
Step 3.1: calculate the discriminative ability of the candidate feature s(f_i) = I(FRM(f_i), M; FRM(C)).
Step 3.2: if s(f_i) ≥ best, then threshold = i, best = s(f_i), and M = min(M, FRM(f_i)); otherwise, exit the loop.
Step 4: return threshold.
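The sketch below illustrates this procedure in Python. It assumes a common construction from the fuzzy-rough literature: the relation r_ij = 1 − |x_i − x_j| on min-max normalised features, a crisp equivalence relation for the class label, and the fuzzy entropy formulation of [29]. The paper's exact definitions may differ, so treat this as an approximation of the method rather than its reference implementation.

import numpy as np

def fuzzy_relation_matrix(x):
    """Fuzzy similarity relation of one (min-max normalised) feature:
    r_ij = 1 - |x_i - x_j| (a common choice; the paper's relation may differ)."""
    x = np.asarray(x, dtype=float)
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    return 1.0 - np.abs(x[:, None] - x[None, :])

def class_relation_matrix(y):
    """Crisp equivalence relation of the class label: 1 if same class, else 0."""
    y = np.asarray(y)
    return (y[:, None] == y[None, :]).astype(float)

def fuzzy_entropy(R):
    """Fuzzy entropy H(R) = -(1/n) sum_i log2(|[x_i]_R| / n), following [29]."""
    n = R.shape[0]
    return -np.mean(np.log2(R.sum(axis=1) / n))

def fuzzy_joint_mi(A, B, C):
    """I(A, B; C): mutual information between the joint relation min(A, B) and C."""
    AB = np.minimum(A, B)
    return fuzzy_entropy(AB) + fuzzy_entropy(C) - fuzzy_entropy(np.minimum(AB, C))

def fim_threshold(X_ranked, y):
    """FIM threshold: X_ranked holds the features as columns, already ordered
    from best- to worst-ranked; returns how many top features to keep."""
    C = class_relation_matrix(y)                      # FRM(C)
    M = fuzzy_relation_matrix(X_ranked[:, 0])         # Step 2: M = FRM(f_1)
    best = fuzzy_entropy(M) + fuzzy_entropy(C) - fuzzy_entropy(np.minimum(M, C))
    threshold = 1
    for i in range(1, X_ranked.shape[1]):             # Step 3
        Fi = fuzzy_relation_matrix(X_ranked[:, i])
        s = fuzzy_joint_mi(Fi, M, C)                  # Step 3.1
        if s >= best:                                 # Step 3.2: discriminative ability grows
            threshold, best = i + 1, s
            M = np.minimum(M, Fi)
        else:
            break                                     # stop: adding f_i does not help
    return threshold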

4 Experiment setup
An experimental framework of the proposed threshold FIM is shown in Fig. 1.
The main phases of the experiment are feature selection, threshold selection, and
evaluation.
Datasets: In the experiment, fifteen benchmark datasets from different domains with various characteristics have been used, as shown in Table 1. All the datasets are available from the following resources:
4 https://archive.ics.uci.edu/ml/datasets.php
5 https://jundongl.github.io/scikit-feature/datasets.html
6 https://github.com/klainfo/NASADefectDataset

[Fig. 1 shows the three phases of the experiment: feature selection (JMI3, JMIM, NJMIM, CMIM, JMI, MRMR produce the full ranked features), threshold selection (Log, HS, ST, MS, FIM return the selected feature subset), and evaluation (NB, SVM, KNN, DT classifiers scored by accuracy, F-measure, AUC, plus the average percentage of selected features).]

Fig. 1. An experimental framework of the proposed threshold FIM.

Table 1. Description of the used datasets.

Num. Dataset Instances Features Classes


1 CM1 327 37 2
2 GLIOMA 50 4434 4
3 JM1 7720 21 2
4 KC1 1162 21 2
5 MC1 1952 38 2
6 MW1 250 37 2
7 QSAR biodegradation 1055 41 2
8 Artificial Characters 10218 7 10
9 Blood Transfusion Service Center 748 4 2
10 Colon 62 2000 2
11 Hayes-Roth 160 4 3
12 Leaf 340 15 30
13 Leukemia 102 5966 2
14 Lung discrete 73 325 7
15 Statlog Vehicle Silhouettes 946 18 4

Feature selection: To prove the effectiveness of the proposed threshold, FIM was tested with six well-known and widely used feature selection methods, as shown in Fig. 1. The feature selection methods are JMI3 [25], JMIM [1], NJMIM [1], CMIM [7], JMI [28], and MRMR [20]. All the used methods depend on a user-defined threshold.
Threshold selection: In the conducted experiment, the proposed threshold was compared to four well-known thresholds. The comparative methods are Log [8], HS [10], ST [22], and MS [10]. Both Log and HS depend only on the dataset, while the remaining thresholds depend on the feature selection methods.
Evaluation: The main goal of threshold methods is to return the feature subset that gives the best classification performance. To show this, the comparative thresholds have been evaluated by three measures (accuracy, F-measure, and AUC) using four well-known classifiers: NB [5], SVM [6], KNN (k=3) [19], and DT [2]. The average classification results were computed using the 10-fold cross-validation approach [12].
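As a rough illustration of this protocol (not the authors' exact pipeline), the following scikit-learn sketch computes the mean 10-fold cross-validated accuracy of the four classifiers on an already selected feature subset; loading the data, ranking the features, and applying a threshold are assumed to happen elsewhere.

from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def evaluate_subset(X_selected, y):
    """Mean 10-fold cross-validated accuracy of the four classifiers used in
    the experiment (NB, SVM, KNN with k=3, DT) on a selected feature subset."""
    classifiers = {
        "NB": GaussianNB(),
        "SVM": SVC(),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "DT": DecisionTreeClassifier(),
    }
    return {name: cross_val_score(clf, X_selected, y, cv=10).mean()
            for name, clf in classifiers.items()}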

5 Results and analysis

Table 2 shows the average classification performance (accuracy, F-measure, AUC) of the compared thresholds on the six feature selection methods using the four classifiers. According to JMI3, the FIM threshold achieved the best classification accuracy using all classifiers. FIM outperformed the other thresholds in the ranges 0.5%-4% using NB, 2.8%-4.9% using SVM, 1.4%-3.1% using KNN, and 3%-6.2% using DT. In F-measure, the FIM threshold kept the outperformance using SVM and DT in the ranges 1.2%-3.1% and 4.1%-5.5%, respectively. For AUC, FIM achieved the best result using all classifiers except KNN.
Regarding JMIM, the FIM threshold outperformed the other thresholds in terms of accuracy. The proposed threshold improved the accuracy results of NB, SVM, KNN, and DT in the ranges 2.1%-5.1%, 2.4%-6.1%, 1.2%-3.5%, and 5.6%-7.1%, respectively. For F-measure, FIM achieved the best results with SVM and DT, at 82.1% and 81.1%, respectively. The Log threshold achieved the best F-measure with NB, while ST achieved the best result with KNN. The AUC results were improved by the FIM threshold in the range 0.8%-2.3% using NB, 2.1%-2.8% using SVM, and 4.9%-6.1% using DT.
For NJMIM, the FIM threshold increased the classification accuracy by 1%-5.2%, 2.4%-7.5%, 1%-5%, and 3.3%-6.9% using NB, SVM, KNN, and DT, respectively. Based on F-measure and AUC, the proposed threshold outperformed the other thresholds with three of the four classifiers. The Log threshold returned the best F-measure using NB, while HS returned the best AUC using KNN.
In CMIM, FIM achieved the best results with all classifiers in terms of accuracy and F-measure. For AUC, the proposed threshold outperformed the other thresholds with all classifiers except KNN.
Using JMI, the FIM threshold returned the best accuracy with all classifiers. The best F-measure result with NB was achieved by the Log threshold, while the proposed threshold achieved the best F-measure with the other three classifiers. Similar to accuracy, FIM outperformed the other thresholds in AUC with all classifiers.
According to MRMR, FIM outperformed the other thresholds in accuracy and F-measure with all classifiers. For AUC, the proposed method returned the best result with all classifiers except KNN, where both Log and ST returned the best result.

Table 2. The average classification performance (Accuracy, F-measure, AUC) of the compared thresholds on the six feature selection methods using the four classifiers. FIM achieved the best results in most cases.

                              Accuracy                       F-measure                         AUC
Method Classifier  Log   HS   ST   MS   FIM       Log   HS   ST   MS   FIM       Log   HS   ST   MS   FIM
NB 69.8 66.3 66.6 66.8 70.3 76.3 73.7 73.9 74.1 74.4 78.7 77.5 78.3 78.5 79.9
JMI3 SVM 72.8 74.9 74.5 74.5 77.8 78.4 80.3 80.0 79.9 81.5 69.3 69.5 69.1 69.0 71.5
KNN 73.8 75.5 74.7 75.4 76.9 78.1 79.9 78.7 79.3 79.6 78.4 79.7 77.7 78.9 78.9
DT 74.4 72.3 71.3 71.6 77.4 76.8 77.1 75.7 76.0 81.2 70.6 70.1 69.1 69.3 74.6
NB 68.2 66.0 67.4 65.9 71.0 75.9 72.7 74.3 73.5 75.0 78.7 78.0 79.5 78.1 80.3
JMIM SVM 71.8 75.5 75.4 74.6 77.9 78.7 79.7 79.7 79.5 82.1 68.7 69.1 69.4 68.7 71.5
KNN 73.9 75.8 76.2 75.0 77.4 77.9 78.7 80.3 78.8 80.1 77.6 79.1 79.7 77.9 79.2
DT 72.6 72.1 71.9 71.1 78.2 74.3 75.3 75.3 75.5 81.1 68.5 69.6 69.3 68.9 74.5
NB 68.6 66.4 70.6 67.0 71.6 76.2 74.0 74.0 74.3 76.0 79.6 77.9 78.5 78.7 79.6
NJMIM SVM 72.5 76.1 71.0 73.9 78.5 78.7 81.3 80.3 79.8 82.4 68.3 69.6 67.9 69.1 71.5
KNN 75.1 77.6 73.6 76.0 78.6 79.4 81.0 75.5 79.1 81.3 79.6 79.9 77.9 79.1 79.7
DT 74.0 72.6 70.4 72.2 77.3 76.9 76.5 72.9 74.7 80.5 70.1 69.7 70.7 70.4 73.7
NB 68.8 68.1 71.6 68.4 71.9 73.9 73.4 75.8 74.0 76.2 79.3 77.6 80.0 79.1 80.8
CMIM SVM 72.4 76.8 76.1 75.4 78.6 77.3 79.9 78.7 78.5 81.8 68.9 69.0 69.1 68.6 71.3
KNN 75.5 77.4 78.7 77.6 78.9 78.5 79.4 80.5 80.8 81.2 79.3 79.9 79.5 80.2 79.5
DT 73.9 72.4 74.9 73.2 77.0 76.8 75.5 77.2 75.3 79.3 69.6 68.8 71.5 69.5 72.4
NB 68.9 66.1 67.0 66.0 70.7 76.0 71.9 72.2 71.8 75.1 79.4 78.2 77.1 77.2 80.3
JMI SVM 73.5 75.1 76.9 75.4 78.3 79.4 79.1 80.6 79.6 82.3 69.2 69.2 70.3 69.3 71.8
KNN 75.8 75.7 77.1 75.4 77.1 79.3 77.8 80.1 77.8 80.7 79.6 79.1 79.3 79.1 79.6
DT 74.3 72.2 74.6 72.0 77.5 78.3 75.1 78.1 75.2 81.8 70.2 69.3 72.8 69.4 74.7
NB 72.3 68.2 69.1 68.1 74.9 77.0 73.6 75.2 74.0 77.9 79.5 77.7 79.3 78.0 80.1
MRMR SVM 74.5 75.8 75.9 75.7 77.4 78.8 79.5 79.3 79.5 80.7 68.3 68.9 68.9 68.7 70.5
KNN 75.7 77.5 76.2 75.7 77.5 79.6 80.9 80.0 79.6 80.9 78.6 77.5 78.6 78.6 77.7
DT 74.6 73.4 72.4 72.3 77.7 78.5 77.5 76.5 76.8 81.3 71.5 69.3 69.7 69.2 74.2

Figures 2-7 show the average classification performance of the five thresholds for the six feature selection methods. The box plots describe the median (circle), mean (triangle), and the lower and upper quartiles. The blue color refers to the highest values of mean and median.
In JMI3, FIM has the highest mean and median for the three classification measures. In F-measure, the Log threshold shared the same result with FIM. Similarly, for JMIM, the proposed threshold achieved the highest mean and median for the different classification measures. Regarding NJMIM and CMIM, FIM has the best mean and median values in terms of accuracy and F-measure. The ST threshold shared the highest median of F-measure with FIM in the case of CMIM. In AUC, the proposed threshold achieved the highest mean, while both the ST and MS thresholds achieved the highest median. According to JMI, the proposed method achieved the highest mean for all classification measures. The highest median values were achieved by the Log, FIM, and ST thresholds in terms of accuracy, F-measure, and AUC, respectively. In MRMR, the proposed method
kept the best mean values for all measures and the best median in F-measure. For accuracy and AUC, the best median values were achieved by the Log and MS thresholds, respectively.

[Box plots of Accuracy, F-measure, and AUC for the thresholds Log, HS, ST, MS, and FIM.]

Fig. 2. The box-plot of average classification performance for each threshold using JMI3.

[Box plots of Accuracy, F-measure, and AUC for the thresholds Log, HS, ST, MS, and FIM.]

Fig. 3. The box-plot of average classification performance for each threshold using JMIM.

The average percentages of selected features of the six feature selection methods using the different thresholds are shown in Fig. 8. The Log threshold achieved the minimum average percentage of selected features in all feature selection methods except NJMIM. The proposed threshold achieved the second-best result in JMI and MRMR, and the fourth-best result in JMIM, NJMIM, and CMIM.

[Box plots of Accuracy, F-measure, and AUC for the thresholds Log, HS, ST, MS, and FIM.]

Fig. 4. The box-plot of average classification performance for each threshold using NJMIM.

[Box plots of Accuracy, F-measure, and AUC for the thresholds Log, HS, ST, MS, and FIM.]

Fig. 5. The box-plot of average classification performance for each threshold using CMIM.

[Box plots of Accuracy, F-measure, and AUC for the thresholds Log, HS, ST, MS, and FIM.]

Fig. 6. The box-plot of average classification performance for each threshold using JMI.
[Box plots of Accuracy, F-measure, and AUC for the thresholds Log, HS, ST, MS, and FIM.]

Fig. 7. The box-plot of average classification performance for each threshold using MRMR.

For JMI3, the FIM threshold method required the maximum average percentage of selected features.

[Bar chart of the average number of selected features (0-80) for each threshold (Log, HS, ST, MS, FIM), grouped by feature selection method (MRMR, CMIM, JMIM, JMI, NJMIM, JMI3).]

Fig. 8. Average percentages of selected features of the six feature selection methods using the different thresholds.

Overall, the experimental results of the classification performance confirm the outperformance of our proposed method in most cases. This is expected, since the proposed method considers the joint discriminative ability of the ranked features. On the other hand, the other methods suffer from some limitations. Firstly, both the Log and HS thresholds depend only on the characteristics of the data without considering the significant information of the features. Although the ST and MS methods overcome this limitation, both depend on information measures, which may suffer from the problem of information loss.

6 Conclusion
Information measures are powerful tools to estimate the feature significance in order to develop effective feature selection methods. However, most of the existing methods rank the features according to their significance without determining the best threshold; this task is left to experts. To avoid the extra effort of experts, we propose a new threshold based on fuzzy information measures, called FIM. The proposed threshold selection method considers the joint discriminative ability of features to return the best feature subset. The experiment has been conducted on fifteen benchmark datasets, six feature selection methods, and four classifiers. Compared to widely used threshold methods, the results confirm the outperformance of our proposed threshold method according to three measures of classification performance. For the average percentage of selected features, the proposed method may return more features compared to other methods. In future work, we plan to extend our study to cover the multi-label classification problem and ensemble feature selection.

Acknowledgements This research has been supported by the National Natural


Science Foundation of China (62172309).

References
1. Bennasar, M., Hicks, Y., Setchi, R.: Feature selection using joint mutual information maximisation. Expert Systems with Applications 42(22), 8520–8532 (2015)
2. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and regression trees. CRC Press (1984)
3. Cateni, S., Colla, V., Vannucci, M.: A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing 135, 32–41 (2014)
4. Che, J., Yang, Y., Li, L., Bai, X., Zhang, S., Deng, C.: Maximum relevance minimum common redundancy feature selection for nonlinear data. Information Sciences 409, 68–86 (2017)
5. Cheng, J., Greiner, R.: Comparing Bayesian network classifiers. arXiv preprint arXiv:1301.6684 (2013)
6. Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press (2000)
7. Fleuret, F.: Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research 5(Nov), 1531–1555 (2004)
8. Gao, K., Khoshgoftaar, T.M., Wang, H., Seliya, N.: Choosing software metrics for defect prediction: an investigation on feature selection techniques. Software: Practice and Experience 41(5), 579–606 (2011)
9. Imani, M.B., Keyvanpour, M.R., Azmi, R.: A novel embedded feature selection method: a comparative study in the application of text categorization. Applied Artificial Intelligence 27(5), 408–427 (2013)
10. Jaganathan, P., Kuppuchamy, R.: A threshold fuzzy entropy based feature selection for medical database classification. Computers in Biology and Medicine 43(12), 2222–2229 (2013)

11. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97(1), 273–324 (1997)
12. Kohavi, R., et al.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, vol. 14, pp. 1137–1145 (1995)
13. Kotsiantis, S.B., Zaharakis, I.D., Pintelas, P.E.: Machine learning: a review of classification and combining techniques. Artificial Intelligence Review 26(3), 159–190 (2006)
14. Laradji, I.H., Alshayeb, M., Ghouti, L.: Software defect prediction using ensemble learning on selected features. Information and Software Technology 58, 388–402 (2015)
15. Lazar, C., Taminau, J., Meganck, S., Steenhoff, D., Coletta, A., Molter, C., de Schaetzen, V., Duque, R., Bersini, H., Nowe, A.: A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 9(4), 1106–1119 (2012)
16. Lee, S., Park, Y.T., d'Auriol, B.J., et al.: A novel feature selection method based on normalized mutual information. Applied Intelligence 37(1), 100–120 (2012)
17. Macedo, F., Oliveira, M.R., Pacheco, A., Valadas, R.: Theoretical foundations of forward feature selection methods based on mutual information. Neurocomputing 325, 67–89 (2019)
18. Mo, D., Huang, S.H.: Feature selection based on inference correlation. Intelligent Data Analysis 15(3), 375–398 (2011)
19. Patrick, E.A., Fischer, F.P.: A generalized k-nearest neighbor rule. Information and Control 16(2), 128–152 (1970)
20. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1226–1238 (2005)
21. Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
22. Salamo, M., Lopez-Sanchez, M.: Rough set based approaches to feature selection for case-based reasoning classifiers. Pattern Recognition Letters 32(2), 280–292 (2011)
23. Salem, O.A., Liu, F., Chen, Y.P.P., Chen, X.: Feature selection and threshold method based on fuzzy joint mutual information. International Journal of Approximate Reasoning (2021)
24. Salem, O.A., Liu, F., Chen, Y.P.P., Hamed, A., Chen, X.: Effective fuzzy joint mutual information feature selection based on uncertainty region for classification problem. Knowledge-Based Systems 257, 109885 (2022)
25. Sechidis, K., Azzimonti, L., Pocock, A., Corani, G., Weatherall, J., Brown, G.: Efficient feature selection using shrinkage estimators. Machine Learning 108(8-9), 1261–1286 (2019)
26. Steuer, R., Kurths, J., Daub, C.O., Weise, J., Selbig, J.: The mutual information: detecting and evaluating dependencies between variables. Bioinformatics 18(suppl_2), S231–S240 (2002)
27. Xu, Z., Xuan, J., Liu, J., Cui, X.: MICHAC: Defect prediction via feature selection based on maximal information coefficient with hierarchical agglomerative clustering. In: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER), vol. 1, pp. 370–381. IEEE (2016)
28. Yang, H., Moody, J.: Feature selection based on joint mutual information. In: Proceedings of International ICSC Symposium on Advances in Intelligent Data Analysis, pp. 22–25 (1999)

29. Yu, D., An, S., Hu, Q.: Fuzzy mutual information based min-redundancy and max-relevance heterogeneous feature selection. International Journal of Computational Intelligence Systems 4(4), 619–633 (2011)
30. Yu, L., Liu, H.: Efficient feature selection via analysis of relevance and redundancy. The Journal of Machine Learning Research 5, 1205–1224 (2004)
