Samplepaper
1 Introduction
Nowadays, data classification is applied in many domains, such as engineering, industry, and medicine [3, 14]. However, the presence of undesirable features increases the size of the data and reduces its quality [13]. Consequently, classification performance can degrade while the computational cost increases [13]. Feature selection is an important and effective preprocessing step for data classification because it improves data quality by removing undesirable features [30].
Feature selection methods fall into three types: filter [15], embedded [9], and wrapper [11]. According to the evaluation approach, filter methods are classifier-independent, while embedded and wrapper methods are classifier-dependent. Filter
⋆ Supported by organization x.
2 F. Author et al.
2 Related work
task where they consider three relations, which are relevancy, complementarity,
and redundancy [4].
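In information-theoretic feature selection, the relevancy of a feature f to the class C is commonly measured by the mutual information I(f; C), and ranking features by such scores yields the ordered list that a selection threshold is then applied to. The following is a minimal sketch under stated assumptions: it uses a crisp (non-fuzzy) plug-in estimator for discrete data, not the fuzzy information measures used by FIM, and the function names `mutual_information` and `rank_by_relevance` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Plug-in estimate of I(f; C) in bits for discrete feature values and class labels."""
    n = len(feature)
    joint_counts = Counter(zip(feature, labels))  # n(x, y)
    feature_counts = Counter(feature)             # n(x)
    class_counts = Counter(labels)                # n(y)
    mi = 0.0
    for (x, y), n_xy in joint_counts.items():
        # p(x, y) * log2(p(x, y) / (p(x) p(y))), rewritten with raw counts
        mi += (n_xy / n) * math.log2(n_xy * n / (feature_counts[x] * class_counts[y]))
    return mi


def rank_by_relevance(features, labels):
    """Sort features by decreasing relevancy score I(f; C)."""
    scores = {name: mutual_information(values, labels) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    features = {
        "f1": [0, 0, 1, 1],  # copies the class label
        "f2": [0, 1, 0, 1],  # independent of the class label
        "f3": [0, 0, 0, 1],  # partially informative
    }
    for name, score in rank_by_relevance(features, labels):
        print(name, round(score, 3))
```

On this toy data, f1 scores 1 bit, f3 about 0.311 bits, and f2 0 bits, so the ranking is f1, f3, f2. Relevancy alone ignores the complementarity and redundancy relations mentioned above; criteria such as JMI or MRMR add terms for those.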
Although feature selection methods based on information measures are widely used in the literature, choosing the best threshold has been left to users or experts. For this reason, several thresholds have been introduced in [8, 10, 22, 27] to avoid this extra effort, as follows.
Half selection threshold (HS): The HS threshold returns approximately 50% of the full ranked feature set. The candidate feature f ∈ F is selected if the following condition is satisfied:
$$ p_f \le \frac{|F|}{2} \tag{1} $$
where $p_f$ is the position of feature $f$ in the ranked list.
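The HS rule in Eq. (1) can be sketched as follows. This is an illustration, not the authors' implementation: it assumes the features are already ranked from most to least significant and that positions $p_f$ are 1-based (the top feature has $p_f = 1$), which the text does not state explicitly. The function name `half_selection` is hypothetical.

```python
def half_selection(ranked_features):
    """Half selection (HS) threshold, Eq. (1): keep f when p_f <= |F| / 2.

    Assumes `ranked_features` is ordered from most to least significant
    and that positions p_f are 1-based (the top feature has p_f = 1).
    """
    size = len(ranked_features)  # |F|
    return [f for p_f, f in enumerate(ranked_features, start=1) if p_f <= size / 2]


if __name__ == "__main__":
    ranked = ["f3", "f1", "f5", "f2", "f4"]
    print(half_selection(ranked))  # |F| = 5, so p_f <= 2.5 keeps ['f3', 'f1']
```

With an odd number of features the rule keeps the floor of |F|/2, which is why the text describes HS as returning approximately, rather than exactly, 50% of the features.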
Mean selection threshold (MS): MS returns all features whose scores are greater than or equal to the mean score of the ranked features. The candidate feature f ∈ F is selected if the following condition is satisfied:
$$ s_f \ge \frac{\sum_{f' \in F} s_{f'}}{|F|} \tag{2} $$
where $s_f$ is the score of feature $f$.
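A minimal sketch of the MS rule in Eq. (2) follows, assuming each feature already has a numeric significance score $s_f$ (for example, a mutual-information-based score). It is illustrative only; the function name `mean_selection` and the toy scores are hypothetical.

```python
def mean_selection(scores):
    """Mean selection (MS) threshold, Eq. (2): keep f when s_f >= mean score over F.

    `scores` maps each feature name to its significance score s_f and is
    assumed to be non-empty. Selected features are returned in decreasing
    order of score.
    """
    mean_score = sum(scores.values()) / len(scores)
    selected = [f for f, s_f in scores.items() if s_f >= mean_score]
    return sorted(selected, key=lambda f: scores[f], reverse=True)


if __name__ == "__main__":
    scores = {"f1": 0.9, "f2": 0.1, "f3": 0.5, "f4": 0.3}
    print(mean_selection(scores))  # mean = 0.45, so ['f1', 'f3'] are kept
```

Unlike HS, the number of features MS returns depends on the score distribution: a few dominant scores raise the mean and shrink the subset, while nearly uniform scores keep most features.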
4 Experiment setup
An experimental framework of the proposed threshold FIM is shown in Fig. 1.
The main phases of the experiment are feature selection, threshold selection, and
evaluation.
Datasets: Fifteen benchmark datasets from different domains with various characteristics were used in the experiment, as shown in Table 1. All the datasets are available from the following resources⁴,⁵,⁶.
⁴ https://archive.ics.uci.edu/ml/datasets.php
⁵ https://jundongl.github.io/scikit-feature/datasets.html
⁶ https://github.com/klainfo/NASADefectDataset
Contribution Title 5
kept the best mean values in all measures and the best median in F-measure.
For accuracy and AUC, the best median values were achieved by Log and MS
thresholds, respectively.
[Figure: three box-plot panels (Accuracy, F-measure, AUC) over the thresholds Log, HS, ST, MS, and FIM; method: JMI3.]
Fig. 2. The box-plot of average classification performance for each threshold using JMI3.
[Figure: three box-plot panels (Accuracy, F-measure, AUC) over the thresholds Log, HS, ST, MS, and FIM; method: JMIM.]
Fig. 3. The box-plot of average classification performance for each threshold using JMIM.
Fig. 8 shows the average percentages of features selected by the six feature selection methods under the different thresholds. The Log threshold selected the lowest average percentage of features for all feature selection methods except NJMIM. The proposed threshold ranked second-best for JMI and MRMR, and fourth-best for JMIM, NJMIM,
[Figure: three box-plot panels (Accuracy, F-measure, AUC) over the thresholds Log, HS, ST, MS, and FIM; method: NJMIM.]
[Figure: three box-plot panels (Accuracy, F-measure, AUC) over the thresholds Log, HS, ST, MS, and FIM; method: CMIM.]
Fig. 5. The box-plot of average classification performance for each threshold using CMIM.
[Figure: three box-plot panels (Accuracy, F-measure, AUC) over the thresholds Log, HS, ST, MS, and FIM; method: JMI.]
Fig. 6. The box-plot of average classification performance for each threshold using JMI.
[Figure: three box-plot panels (Accuracy, F-measure, AUC) over the thresholds Log, HS, ST, MS, and FIM; method: MRMR.]
and CMIM. For JMI3, the FIM threshold selected the highest average percentage of features.
[Figure: chart of selected features (y-axis label "Num of features") for each threshold (Log, HS, ST, MS, FIM), grouped by method: MRMR, CMIM, JMIM, JMI, NJMIM, and JMI3.]
Fig. 8. Average percentages of selected features of the six feature selection methods using the different thresholds.
6 Conclusion
Information measures are powerful tools for estimating feature significance and developing effective feature selection methods. However, most existing methods rank features by their significance without determining the best threshold, leaving this task to experts. To avoid this extra effort, we propose a new threshold based on fuzzy information measures, called FIM. The proposed threshold selection method considers the joint discriminative ability of features to return the best feature subset. The experiment was conducted on fifteen benchmark datasets with six feature selection methods and four classifiers. Compared with widely used threshold methods, the results confirm that the proposed threshold achieves better classification performance according to three measures (accuracy, F-measure, and AUC). However, the proposed method may select a larger percentage of features than the other thresholds. In future work, we plan to extend our study to the multi-label classification problem and to ensemble feature selection.
References
1. Bennasar, M., Hicks, Y., Setchi, R.: Feature selection using joint mutual information maximisation. Expert Systems with Applications 42(22), 8520–8532 (2015)
2. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and regression trees. CRC Press (1984)
3. Cateni, S., Colla, V., Vannucci, M.: A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing 135, 32–41 (2014)
4. Che, J., Yang, Y., Li, L., Bai, X., Zhang, S., Deng, C.: Maximum relevance minimum common redundancy feature selection for nonlinear data. Information Sciences 409, 68–86 (2017)
5. Cheng, J., Greiner, R.: Comparing Bayesian network classifiers. arXiv preprint arXiv:1301.6684 (2013)
6. Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press (2000)
7. Fleuret, F.: Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research 5(Nov), 1531–1555 (2004)
8. Gao, K., Khoshgoftaar, T.M., Wang, H., Seliya, N.: Choosing software metrics for defect prediction: an investigation on feature selection techniques. Software: Practice and Experience 41(5), 579–606 (2011)
9. Imani, M.B., Keyvanpour, M.R., Azmi, R.: A novel embedded feature selection method: a comparative study in the application of text categorization. Applied Artificial Intelligence 27(5), 408–427 (2013)
10. Jaganathan, P., Kuppuchamy, R.: A threshold fuzzy entropy based feature selection for medical database classification. Computers in Biology and Medicine 43(12), 2222–2229 (2013)
11. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97(1), 273–324 (1997)
12. Kohavi, R., et al.: A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI. vol. 14, pp. 1137–1145 (1995)
13. Kotsiantis, S.B., Zaharakis, I.D., Pintelas, P.E.: Machine learning: a review of classification and combining techniques. Artificial Intelligence Review 26(3), 159–190 (2006)
14. Laradji, I.H., Alshayeb, M., Ghouti, L.: Software defect prediction using ensemble learning on selected features. Information and Software Technology 58, 388–402 (2015)
15. Lazar, C., Taminau, J., Meganck, S., Steenhoff, D., Coletta, A., Molter, C., de Schaetzen, V., Duque, R., Bersini, H., Nowe, A.: A survey on filter techniques for feature selection in gene expression microarray analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 9(4), 1106–1119 (2012)
16. Lee, S., Park, Y.T., d'Auriol, B.J., et al.: A novel feature selection method based on normalized mutual information. Applied Intelligence 37(1), 100–120 (2012)
17. Macedo, F., Oliveira, M.R., Pacheco, A., Valadas, R.: Theoretical foundations of forward feature selection methods based on mutual information. Neurocomputing 325, 67–89 (2019)
18. Mo, D., Huang, S.H.: Feature selection based on inference correlation. Intelligent Data Analysis 15(3), 375–398 (2011)
19. Patrick, E.A., Fischer, F.P.: A generalized k-nearest neighbor rule. Information and Control 16(2), 128–152 (1970)
20. Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1226–1238 (2005)
21. Saeys, Y., Inza, I., Larrañaga, P.: A review of feature selection techniques in bioinformatics. Bioinformatics 23(19), 2507–2517 (2007)
22. Salamo, M., Lopez-Sanchez, M.: Rough set based approaches to feature selection for case-based reasoning classifiers. Pattern Recognition Letters 32(2), 280–292 (2011)
23. Salem, O.A., Liu, F., Chen, Y.P.P., Chen, X.: Feature selection and threshold method based on fuzzy joint mutual information. International Journal of Approximate Reasoning (2021)
24. Salem, O.A., Liu, F., Chen, Y.P.P., Hamed, A., Chen, X.: Effective fuzzy joint mutual information feature selection based on uncertainty region for classification problem. Knowledge-Based Systems 257, 109885 (2022)
25. Sechidis, K., Azzimonti, L., Pocock, A., Corani, G., Weatherall, J., Brown, G.: Efficient feature selection using shrinkage estimators. Machine Learning 108(8-9), 1261–1286 (2019)
26. Steuer, R., Kurths, J., Daub, C.O., Weise, J., Selbig, J.: The mutual information: detecting and evaluating dependencies between variables. Bioinformatics 18(suppl_2), S231–S240 (2002)
27. Xu, Z., Xuan, J., Liu, J., Cui, X.: MICHAC: Defect prediction via feature selection based on maximal information coefficient with hierarchical agglomerative clustering. In: 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER). vol. 1, pp. 370–381. IEEE (2016)
28. Yang, H., Moody, J.: Feature selection based on joint mutual information. In: Proceedings of International ICSC Symposium on Advances in Intelligent Data Analysis. pp. 22–25 (1999)
29. Yu, D., An, S., Hu, Q.: Fuzzy mutual information based min-redundancy and max-relevance heterogeneous feature selection. International Journal of Computational Intelligence Systems 4(4), 619–633 (2011)
30. Yu, L., Liu, H.: Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research 5, 1205–1224 (2004)