Feature Selection CHI


Algorithm 1: Particle Swarm Optimization

1: Initialize the parameters of PSO
2: Randomly initialize the particles
3: WHILE stopping criterion not met DO
4:   Calculate each particle's fitness value
5:   FOR i = 1 to population size DO
6:     Update the Gbest of Pi
7:     Update the Pbest of Pi
8:   END FOR
9:   FOR i = 1 to population size DO
10:    FOR j = 1 to dimension of particle DO
11:      Update the velocity of Pi according to Equation (1)
12:      Update the position of Pi according to Equation (2)
13:    END FOR
14:  END FOR
15: END WHILE
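As a concrete illustration, the loop structure of Algorithm 1 can be sketched in Python. This is a minimal sketch for minimization; the parameter values (inertia weight w, acceleration coefficients c1 and c2, the iteration budget, and the search bounds) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def pso(fitness, dim, pop_size=30, iters=100, w=0.7, c1=1.5, c2=1.5,
        lb=-5.0, ub=5.0, seed=0):
    """Minimal PSO for minimization, following the structure of Algorithm 1.

    All parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Steps 1-2: initialize parameters and particles
    x = rng.uniform(lb, ub, (pop_size, dim))      # positions
    v = np.zeros((pop_size, dim))                 # velocities
    pbest = x.copy()                              # personal best positions
    pbest_fit = np.array([fitness(p) for p in x])
    gbest = pbest[pbest_fit.argmin()].copy()      # global best position
    gbest_fit = pbest_fit.min()
    # Step 3: loop until the stopping criterion (here, an iteration budget) is met
    for _ in range(iters):
        # Steps 4-8: evaluate fitness, then update Pbest and Gbest
        fit = np.array([fitness(p) for p in x])
        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        if fit.min() < gbest_fit:
            gbest_fit, gbest = fit.min(), x[fit.argmin()].copy()
        # Steps 9-14: velocity update (Equation 1) and position update (Equation 2)
        r1 = rng.random((pop_size, dim))
        r2 = rng.random((pop_size, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lb, ub)
    return gbest, gbest_fit
```

For example, minimizing the sphere function with `pso(lambda p: float((p ** 2).sum()), dim=5)` returns a point near the origin together with its fitness value.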

Algorithm 2: Proposed Algorithm

1: Load the feature space set F of unique features f
2: Set the desired number of features d
3: Initialize the parameters of PSO
4: Randomly initialize particles of dimension d over the set F
5: WHILE stopping criterion not met DO
6:   Calculate each particle's fitness value as the accuracy of the underlying classifier
7:   FOR i = 1 to population size DO
8:     Update the Gbest of Pi
9:     Update the Pbest of Pi
10:  END FOR
11:  FOR i = 1 to population size DO
12:    FOR j = 1 to d DO
13:      Update the velocity of Pi according to Equation (1)
14:      Update the position of Pi according to Equation (2)
15:    END FOR
16:  END FOR
17: END WHILE
18: Return the best feature subset of d features found by the PSO
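The wrapper structure of Algorithm 2 can be sketched similarly. Since the paper's exact particle encoding is not shown, the sketch below makes an assumption: each particle carries one continuous weight per feature, and the d features with the largest weights are decoded as the selected subset. A simple nearest-centroid classifier stands in for the underlying classifier whose accuracy serves as the fitness in step 6; in the paper this would be NB, SVM, or KNN.

```python
import numpy as np

def subset_accuracy(X_tr, y_tr, X_te, y_te, idx):
    """Fitness: test accuracy of a nearest-centroid classifier on feature subset idx.

    The nearest-centroid classifier is a stand-in for the paper's underlying
    classifier (NB, SVM, or KNN).
    """
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c][:, idx].mean(axis=0) for c in classes])
    dist = ((X_te[:, None, idx] - centroids[None]) ** 2).sum(axis=2)
    pred = classes[dist.argmin(axis=1)]
    return float((pred == y_te).mean())

def pso_feature_selection(X_tr, y_tr, X_te, y_te, d, pop_size=20, iters=40,
                          w=0.7, c1=1.5, c2=1.5, seed=0):
    """Sketch of Algorithm 2: PSO search for a subset of d features.

    Particle encoding (per-feature weights, decoded as the top-d weights)
    and all parameter defaults are illustrative assumptions.
    """
    n_feat = X_tr.shape[1]
    rng = np.random.default_rng(seed)
    # Steps 1-4: one continuous weight per feature; subset = indices of top-d weights
    x = rng.random((pop_size, n_feat))
    v = np.zeros_like(x)

    def decode(p):
        return np.argsort(p)[-d:]

    pbest = x.copy()
    pbest_fit = np.array(
        [subset_accuracy(X_tr, y_tr, X_te, y_te, decode(p)) for p in x])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = pbest_fit.max()
    # Step 5: main loop
    for _ in range(iters):
        # Steps 6-10: fitness = classifier accuracy; update Pbest and Gbest
        fit = np.array(
            [subset_accuracy(X_tr, y_tr, X_te, y_te, decode(p)) for p in x])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        if fit.max() > gbest_fit:
            gbest_fit, gbest = fit.max(), x[fit.argmax()].copy()
        # Steps 11-16: velocity and position updates (Equations 1 and 2)
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
    # Step 18: return the best subset found and its accuracy
    return np.sort(decode(gbest)), gbest_fit
```

On synthetic data where only a couple of features carry class information, the search should concentrate on those features, since the Gbest tracks the best-scoring subset encountered so far.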

Classification accuracy for each feature-selection method and classifier at the desired number of features D:

Feature Selection    Classifier   D=600   D=700   D=800   D=900   D=1000   D=1100   D=1200
CHI                  NB           0.780   0.792   0.813   0.825   0.838    0.837    0.844
                     SVM          0.803   0.813   0.835   0.843   0.865    0.874    0.878
                     KNN          0.771   0.783   0.795   0.802   0.813    0.824    0.835
MI                   NB           0.827   0.843   0.841   0.846   0.845    0.850    0.854
                     SVM          0.835   0.856   0.867   0.869   0.870    0.882    0.893
                     KNN          0.817   0.834   0.844   0.846   0.853    0.863    0.865
TF-IDF               NB           0.762   0.774   0.783   0.786   0.802    0.814    0.824
                     SVM          0.782   0.785   0.789   0.806   0.825    0.834    0.843
                     KNN          0.827   0.843   0.841   0.846   0.845    0.850    0.854
Proposed Algorithm   NB           0.862   0.883   0.886   0.891   0.894    0.905    0.915
                     SVM          0.886   0.876   0.884   0.895   0.901    0.913    0.924
                     KNN          0.857   0.863   0.874   0.885   0.865    0.870    0.893

The main motivation of this work is to push the boundary of surrogate-assisted metaheuristics for solving high-dimensional, computationally expensive problems. In the following, we examine the performance of SA-COSO on the 100-dimensional test problems. As we can see, SA-COSO shows search dynamics similar to those observed on the 50-dimensional test problems. The RBF network is able to help the SA-COSO algorithm locate the region where the optimum lies and to keep improving the solution, indicating that the cooperative search of SL-PSO and PSO is very effective in finding an optimal solution. The experimental results on the 100-dimensional problems confirm the competitive performance of SA-COSO on high-dimensional problems.
To further evaluate our method on higher-dimensional problems, we have also conducted comparisons on 200-dimensional problems; the results are provided in the supplementary materials. These results confirm the good performance of the proposed method for solving high-dimensional problems.

D. Empirical Analysis of the Computational Complexity


The computational complexity of the proposed SA-COSO is composed of three main parts, namely, the computation time for fitness evaluations, for training the RBF network, and for calculating the distances when updating the archive DB. In this section, we empirically compare the computation time needed by the compared algorithms for solving the 50-D and 100-D optimization problems. All algorithms are implemented on a computer with a 2.50 GHz processor and 8 GB of RAM.


Table V presents the computation time of the algorithms under comparison, averaged over 20 independent runs, when a maximum of 1000 fitness evaluations using the real fitness function is allowed. From Table V, we find that PSO, whose computation time is mainly dedicated to fitness evaluations, requires the least computation time. For convenience, we use the time for PSO to search for an optimum on a fixed budget of 1000 fitness evaluations as the baseline for comparison. From Table V, we can see that the RBF-assisted SL-PSO requires the longest time, meaning that training the surrogate model takes up more time than calculating the distances when updating the archive DB. We can also find that the average time required by the proposed SA-COSO for solving the 50-D and 100-D test problems is approximately 0.6 and 1.4 seconds per fitness evaluation using the expensive fitness function, respectively. Compared to the most time-consuming fitness evaluations in real-world applications, where each evaluation may take tens of minutes to hours, this increase in computation time for surrogate training and distance calculation can still be considered negligible.
