
CHAPTER 4

ENSEMBLE BASED FEATURE SELECTION FOR INTRUSION DETECTION

4.1 INTRODUCTION
Real-time intrusion data for various attacks were collected in a cloud environment using
the Honeynet designed as discussed in Chapter 3. These datasets were subjected to further
analysis for intrusion detection. The efficiency of an intrusion detection system depends,
among other factors, on the selection of relevant features. In this research work, we study
feature selection in the detection of network attacks for two purposes. Firstly, we study the
main purpose of feature selection, which is reducing the number of features used in building the
attack detection predictive models. Our analysis provides a guideline for selecting a proper
feature selection method for an intrusion detection task. Secondly, we investigate the use of
feature selection to discover the important features in the detection of a specific attack. Such
features provide more insight into the attack behaviour and how it can be distinguished from
normal traffic. This information can be used in the process of feature engineering for similar
attacks [32].

In this research work, we used an ensemble of different filter feature selection methods
to find the important features for the detection of attacks in intrusion datasets. We proposed
a univariate ensemble feature selection technique for intrusion detection; this approach is used
to select a valuable reduced feature set from the given intrusion datasets, exploiting the
simplicity and speed of five univariate filter methods to select the features that contribute
towards intrusion detection.

Then, we applied four different classification algorithms along with the univariate filter
feature selection methods to obtain a good classification performance, which guided the
selection of a valuable reduced feature set from the given intrusion datasets. These selected
features are subjected to the classification algorithms discussed in the next chapter. A
pair-wise T-test was performed, and the results obtained showed that the proposed feature
selection technique is statistically significantly different from the existing approaches.
4.2 PROBLEM DESCRIPTION
Elimination of redundant features in high-dimensional intrusion datasets has become a
major challenge for network intrusion detection. A feature selection method that focuses on
one specific region of the feature space may not provide better performance. Using efficient
feature selection methods in intrusion detection leads to a better solution for handling
high-dimensional intrusion datasets. However, different feature selection algorithms would
choose different feature subsets.

In order to solve this issue, ensemble learning has become an important part of the IDS
field; this technique is used to combine the independent feature subsets obtained by a
function in order to get a robust feature subset. This study uses a univariate ensemble-based
filter feature selection technique for intrusion detection; this approach is used to select a
valuable reduced feature set from the given intrusion datasets. The outputs of the univariate
filter feature selection techniques, namely Information Gain, Gain Ratio, Chi-squared,
Symmetric Uncertainty and Relief, are combined to produce the final outcome.

In this study, two novel algorithms, namely Combined Feature Scoring (CFS) and
Minimum Threshold Value Selection (MTVS), have been proposed. These two algorithms
address the following issues: they reduce the problems of ranking without adopting any
learning algorithms, the computational overheads and the statistical biases of existing
methods, and they use the minimum threshold value selection to identify useful features. In
this investigation, the effect of ensemble feature selection on model accuracy was examined
by looking deeply into the ensemble feature selection and performing a comparative study
against existing approaches.

4.3 DATA MINING AND DIMENSIONALITY REDUCTION


Data mining is the process of searching for valuable information in the large volumes of data
stored in databases, data warehouses or other information repositories [Hai, 2011].
Data mining is a highly interdisciplinary area spanning a range of disciplines such as
statistics, machine learning (ML), databases, pattern recognition and others. Data mining
techniques process datasets that may consist of thousands of features. Mining a high-volume
dataset often ends up with low classification accuracy due to the presence of noisy,
irrelevant and redundant features. Data mining consists of three key steps: pre-processing,
mining, and post-processing.
One of the most important tasks in machine learning applications is data pre-processing
[5]. The data collected for training in machine learning tasks are not initially appropriate for
training purposes; making the data useful for such applications requires processing, including
methods for handling missing data [6] and methods for detecting and handling noise [7].
Data pre-processing is a task used to prepare the data for input into machine learning
and mining processes. This involves reconditioning the data to increase its quality and hence
the performance of the machine learning algorithms, such as predictive accuracy, and to reduce
the learning time. At the end of the data pre-processing stage, the final training set is
obtained. Feature selection is one of the tasks of data pre-processing, which focuses on
removing redundant and irrelevant features.

Dimensionality reduction can be done by either feature selection or feature extraction
(Guyon and Elisseeff (2003), van der Maaten et al. (2008)). Feature extraction reduces data
dimensionality by projecting the data into a lower-dimensional space formed by combinations of
features. It is a successful technique for reducing dimensionality and improving learning
performance. However, the new feature space is not physically linked to the original features.
Hence, there is a problem of interpretability and a loss of the familiar meaning of features. The
task of feature selection improves the classification accuracy and the understanding of the
learning process. In the learning process, a large number of features requires huge memory
space and consumes longer processing time. In this scenario, feature selection reduces the cost
of data acquisition and the cost of computation. Reducing the features by selecting only the
highly relevant ones leads to reliable conclusions. However, feature selection should not
sacrifice the highly informative features.

Feature selection essentially comprises search and evaluation processes. The
search process is applied to locate the features that are relevant to the target concept.
The search method is further categorised as exhaustive/complete search, heuristic search and
random search. One of these search techniques is used for searching the features from the
feature space. Every search method has its own merits and demerits in the feature selection
process. The search process comprises three phases, namely the starting phase, the generation
of subsets and the stopping criterion of the search. The evaluation process has two phases,
namely individual feature evaluation and feature subset evaluation; it is otherwise said to be a
deterministic or non-deterministic evaluation of features. Figure (4.1) shows the four steps of
the feature selection process.
Figure 4.1: Feature Selection Process.

4.4 CATEGORIES OF FEATURE SELECTIONS


Feature selection is the process of detecting the significant features in a space of feature
subsets and eliminating the irrelevant features; it requires the specification of a starting point,
an assessment function, a stopping condition and a strategy to traverse the space of
subsets [17]. In this research work, we used the filter feature selection method. The following
subsections briefly introduce the filter method.

4.4.1 Filter Method


This method chooses a subset of the relevant features that retains, to the extent possible, the
significant information found in the complete set of features. The methods that use the filter
approach are independent of any particular algorithm, as the function that they use for
evaluation relies completely on the properties of the data [24]. The relevance of the features is
calculated by considering the essential properties of the data. This involves the calculation of a
feature significance score, and the features whose scores are low are removed. Only the
remaining subset of features is used as input to the algorithm. Filter methods traditionally do
not take into consideration the relations between features but calculate the relevance of each
feature in isolation. However, filter methods now take many criteria into consideration, e.g.
selecting features with minimum redundancy. The process of the filter model is
explained in Figure (4.2).
Figure 4.2: The Filter Model.

A filter algorithm first ranks features based on some quality criteria. Features with the
highest weights or ranks are then selected to induce classification. Filter-based feature ranking
methods are further split into two sub-categories, univariate and multivariate.
Univariate filter methods evaluate the relevance of each feature independently of the others,
while multivariate methods take feature dependencies into account while evaluating them.
Univariate methods can easily be scaled to very high-dimensional datasets; these methods are
known for their speed, computational simplicity and high performance compared with the
other approaches, while multivariate approaches are more complex, especially for datasets
with a large number of features, as they consider all feature dependencies. In that
sense, univariate methods are advantageous. The most widespread univariate filter methods are
described as follows.

a) Information Gain (IG)


Information Gain (IG) is one of the most widely used feature selection methods based on
mutual information (Quinlan (1993)). IG is simple and computationally efficient. It quantifies
the mutual information between the jth feature f_j and the class labels C, i.e. the amount of
information in bits obtained about the class prediction from the presence of that feature and the
corresponding class distribution. Given S, the set of training examples, the IG of a feature is
calculated as follows:

IG(S, f_j) = H(S) - Σ_{v ∈ values(f_j)} (|S_{f_j = v}| / |S|) · H(S_{f_j = v})    (4.1)

where |S_{f_j = v}| / |S| is the fraction of examples in which f_j takes the value v, and H(S) is the entropy
given by:
H(S) = - Σ_{k=1}^{K} p(c_k) · log2(p(c_k))    (4.2)
where p(c_k) is the probability of observing class c_k in the training set S and K is the number of
classes. H(S_{f_j = v}) is calculated in the same way using only the subset of instances for which f_j
takes the value v. A feature is relevant if it has a high IG. The univariate approach is used to select
the features. Info Gain is the most popular filter-based feature selection technique in the filter
method; it is an information-theoretic measure, also called a symmetric measure, for feature
selection. Info Gain is defined by the following equation [26]:

Info Gain(A) = Info(D) - Info_A(D)    (4.3)

where Info Gain(A) is the IG of a feature A, Info(D) is the entropy of the complete dataset, and
Info_A(D) is the expected entropy of the dataset after partitioning on attribute A.
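
As an illustration, the following minimal Python sketch computes Eq. (4.1) and Eq. (4.2) for a single discrete feature; the function names and the toy arrays are hypothetical and assume nominal feature values.

import numpy as np

def entropy(labels):
    # H(S) = -sum_k p(c_k) * log2(p(c_k)), Eq. (4.2)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    # IG(S, f_j) = H(S) - sum_v (|S_v| / |S|) * H(S_v), Eq. (4.1)
    weighted = 0.0
    for v in np.unique(feature):
        subset = labels[feature == v]
        weighted += (len(subset) / len(labels)) * entropy(subset)
    return entropy(labels) - weighted

# toy example: one discrete feature and binary class labels
f = np.array([0, 0, 1, 1, 2, 2])
y = np.array([0, 0, 1, 1, 1, 0])
print(information_gain(f, y))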

b) Gain-ratio
The Gain ratio filter feature selection method provides a normalised score that corrects the bias
of the Info Gain measure towards features with many values. The split information value is used
to normalise the ratio; the formula for split information is as follows [26]:
Split Info_A(D) = - Σ_{j=1}^{v} (|D_j| / |D|) · log2(|D_j| / |D|)    (4.4)

where D is split into v partitions D_1, ..., D_v on the values of attribute A. The gain ratio formula
is as follows [26]:
GainRatio( A) = InfoGain( A) / SplitInfo( A) (4.5)
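
A short sketch of Eqs. (4.4) and (4.5) is given below; split_info and gain_ratio are illustrative names, the information-gain function from the previous sketch is passed in as an argument, and the guard against a zero split information is an added assumption.

import numpy as np

def split_info(feature):
    # SplitInfo_A(D) = -sum_j (|D_j| / |D|) * log2(|D_j| / |D|), Eq. (4.4)
    _, counts = np.unique(feature, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels, info_gain_fn):
    # GainRatio(A) = InfoGain(A) / SplitInfo(A), Eq. (4.5)
    si = split_info(feature)
    return info_gain_fn(feature, labels) / si if si > 0 else 0.0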
c) Chi-squared
Chi-squared FS is the most widespread statistical feature selection technique; it measures the
relationship between two variables and assesses the independence of a feature from its class.
The formula for Chi-squared FS is defined as follows [26]:

χ² = Σ_{ij} (O_ij - E_ij)² / E_ij    (4.6)

where i and j index the values of the two variables, O_ij represents the observed frequency, E_ij
represents the expected frequency, and χ² represents the value of Chi-squared.
CHI(A, C_i) = N · (F_1·F_4 - F_2·F_3)² / ((F_1 + F_3)·(F_2 + F_4)·(F_1 + F_2)·(F_3 + F_4))    (4.7)

CHI_max(A) = max_i(CHI(A, C_i))    (4.8)

where F_1, F_2, F_3 and F_4 represent the co-occurrence frequencies of feature A and class C_i,
and N denotes the total number of records.
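
The following sketch computes the statistic of Eq. (4.6) from the contingency table of one discrete feature and the class labels; it is a generic chi-squared score rather than the exact implementation used in this work.

import numpy as np

def chi_squared(feature, labels):
    # chi^2 = sum_ij (O_ij - E_ij)^2 / E_ij over the feature/class table, Eq. (4.6)
    f_vals, f_idx = np.unique(feature, return_inverse=True)
    c_vals, c_idx = np.unique(labels, return_inverse=True)
    observed = np.zeros((len(f_vals), len(c_vals)))
    for i, j in zip(f_idx, c_idx):
        observed[i, j] += 1
    row = observed.sum(axis=1, keepdims=True)     # per-feature-value totals
    col = observed.sum(axis=0, keepdims=True)     # per-class totals
    expected = row @ col / observed.sum()         # expected counts under independence
    return np.sum((observed - expected) ** 2 / expected)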
d) Symmetric Uncertainty
This method is an information-theoretic measure, also called a symmetric measure, for feature
selection; it is used to evaluate the ranking of the produced solutions.
This method can be defined by the following formula [27]:
SU(A, B) = 2 · IG(A | B) / (H(A) + H(B))    (4.9)
where IG(A | B) is the information gain of feature A with respect to feature B, and H(A) and H(B)
denote the entropies of features A and B.
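
A minimal sketch of Eq. (4.9) for discrete features is shown below, where IG(A | B) is computed as H(A) - H(A | B) from the joint distribution; the helper names are illustrative.

import numpy as np

def entropy(x):
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetric_uncertainty(a, b):
    # SU(A, B) = 2 * IG(A | B) / (H(A) + H(B)), Eq. (4.9)
    h_a, h_b = entropy(a), entropy(b)
    h_a_given_b = 0.0                       # H(A | B): entropy of A within each value of B
    for v in np.unique(b):
        mask = (b == v)
        h_a_given_b += mask.mean() * entropy(a[mask])
    ig = h_a - h_a_given_b
    return 2.0 * ig / (h_a + h_b) if (h_a + h_b) > 0 else 0.0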

e) Relief

Relief is an efficient filter-based feature selection method. Relief uses heuristic techniques to
generate candidate feature subsets and a distance measure to evaluate a candidate subset. This
algorithm is a feature-weight based algorithm and uses statistical methods to choose the relevant
features. It correlates the features with a feature weight value and can correctly estimate the
quality of features in problems with strong dependencies between features. The key idea of the
original RELIEF algorithm is to estimate the quality of features according to how well their
values distinguish between instances that are near to each other. It also uses the concepts of
NearHit and NearMiss. For that purpose, given a randomly selected instance R_i, Relief searches
for its two nearest neighbours, one from the same class, called the NearHit (H), and the other
from a different class, called the NearMiss (M). The algorithm uses a function diff() to find the
difference between the values of the same feature in two different records.

W[ A]:= W[A]- diff ( A, Ri , H) / m + diff ( A, Ri , M) / m (4.10)

where W[A] is the quality estimate of attribute A, updated according to its values for R_i, H and
M, and m is the number of sampled instances.
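
The weight update of Eq. (4.10) can be sketched as follows for numeric features, with diff() taken as the range-normalised absolute difference and Manhattan distance used to find the nearest hit and miss; the sampling size m and the random seed are assumptions of this sketch.

import numpy as np

def relief_weights(X, y, m=None, rng=None):
    # W[A] := W[A] - diff(A, R_i, H)/m + diff(A, R_i, M)/m, Eq. (4.10)
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    m = m or n
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                        # avoid division by zero for constant features
    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)                      # randomly selected instance R_i
        r, dist = X[i], np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                         # exclude the instance itself
        same, other = (y == y[i]), (y != y[i])
        hit = X[np.where(same)[0][np.argmin(dist[same])]]     # NearHit H
        miss = X[np.where(other)[0][np.argmin(dist[other])]]  # NearMiss M
        w += (np.abs(r - miss) - np.abs(r - hit)) / (span * m)
    return w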
4.5 UNIVARIATE ENSEMBLE BASED FILTER FEATURE SELECTION (UEFFS)

Eliminating redundant features in intrusion datasets is an ever-lasting challenge
in network intrusion detection. During the pre-processing phase, missing attribute values of the
features were filled in and irrelevant attributes were filtered out. In this research work, we
propose a univariate ensemble-based filter feature selection (UEFFS) technique for intrusion
detection. The following five univariate filter feature selection techniques have been used in
the proposed ensemble approach:

• Information Gain (IG)
• Gain Ratio (GR)
• Chi-Squared (CS)
• Symmetric Uncertainty (SU)
• Relief (R)

The outputs of the univariate filter feature selection techniques, namely Info Gain, Gain Ratio,
Chi-squared, Symmetric Uncertainty and Relief, were combined to produce the final outcome.
The proposed feature ranking methodology addresses the following issues:

• Using an efficient feature ranking algorithm to reduce the problems of ranking without
  adopting any learning algorithms, the statistical bias of existing methods and the
  computational overheads.

• Using a least threshold value selection to retain the important selected features.

For this purpose, we proposed two novel algorithms:

• Combined Feature Scoring algorithm (CFS)

• Minimum Threshold Value Selection algorithm (MTVS)


4.5.1 COMBINED FEATURE SCORING (CFS)

In the proposed CFS algorithm, which is based on ensemble methods, instead of selecting one
specific feature selection method and taking its outcome as the final subset, different models
are combined using ensemble feature selection approaches. This method not only improves the
classification performance but also helps the classifiers to produce precise results during attack
detection. As discussed above, various filter approaches have been proposed for selecting the
most significant features to improve the predictive performance. The proposed method uses the
ranking approach, which is considered desirable because it is very fast, simple, and measures
the relevance of a feature subset without being time consuming. Hence, this method is very
appropriate for choosing the significant features in intrusion datasets.

The proposed approach evaluates the significance of the features by their association with the
class and ranks the independent features according to their degrees of weight. Features with
the highest weights or ranks are then selected for inducing classification. In order to equalise
the impact of dissimilar scales, the proposed approach transforms the values to an identical
scale (i.e. the range 0 to 1). The feature with the highest weight or rank is assigned rank 1,
whereas the existing approaches assign rank 0 to the feature with the highest weight or rank.
Following this, the scaled ranks are ordered in ascending order and combined. Finally, the
proposed algorithm calculates a mean to find the rank and significance of each feature.

Algorithm 1 first takes the intrusion datasets as input to the proposed CFS scheme and
calculates the weights of the attributes; the following key steps clearly describe the process of
the algorithm. This algorithm uses univariate filter-based measures. The univariate scheme
evaluates each feature independently of the others, while the multivariate scheme evaluates
features in batches. The proposed CFS algorithm is presented as follows:
Algorithm 1: CFS - Combined Feature Scoring
Input: Input datasets (Honeypot, NSL-KDD, Kyoto)
Output: FR - Feature Ranks
Step 1: Compute the number of features
        totFeatures ← countFeatures(data)
Step 2: Let n be the number of filter feature measures
        (FtrEv1, FtrEv2, FtrEv3, ..., FtrEvn)
Step 3: Compute the feature ranks using the filter feature measures
        FRC1[] ← featureRanksCompute(data, FtrEv1)   // FRC denotes computed ranks
        FRC2[] ← featureRanksCompute(data, FtrEv2)
        FRC3[] ← featureRanksCompute(data, FtrEv3)
        FRCn[] ← featureRanksCompute(data, FtrEvn)
Step 4: Feature Scaling Ranks (FSR) - invoke Algorithm 2 to scale the computed ranks
        scaledRanks1[] ← scaleRanks(FRC1)
        scaledRanks2[] ← scaleRanks(FRC2)
        scaledRanks3[] ← scaleRanks(FRC3)
        scaledRanksn[] ← scaleRanks(FRCn)
Step 5: Combine the sum of all computed ranks
Step 6: SumOfCombinedRanks ← 0
Step 7: combinedRanks ← []
Step 8: for ∀ feature i ∈ totFeatures of D do
Step 9:     add all computed scaled ranks to obtain the rank of each feature
Step 10:    combinedRanksi ← Σ_{j=1}^{n} scaledRanksji
Step 11:    SumOfCombinedRanks ← SumOfCombinedRanks + combinedRanksi
Step 12: end
Step 13: Sort the rank list in ascending order
Step 14: sortedRanks[] ← sort(combinedRanks)
Step 15: Compute the score, weight and priority of each feature
Step 16: for ∀ feature i ∈ totFeatures of D do
Step 17:    ftrScoresi ← combinedRanksi / SumOfCombinedRanks
Step 18:    ftrWeightsi ← combinedRanksi / SumOfCombinedRanks
Step 19:    ftrPrioritiesi ← ftrScoresi * ftrWeightsi
Step 20:    Assign the rank ID of each feature based on its priority
Step 21:    FR[] ← assignRank(ftrPrioritiesi)
Step 22: end
Step 23: Return FR: Feature Ranks
In this research work, the first step of the CFS algorithm is to compute the features of the
following three intrusion datasets, namely Honeypot, NSL-KDD and Kyoto. In step two, the
five univariate filter-based measures are used to rank each feature in the intrusion datasets.
This process is represented in steps 4 to 7 of Algorithm 1. Then, all the computed ranks are
scaled, starting with the first filter measure; this process is executed by Algorithm 2, and the
key step is repeated for the remaining (n - 1) measures, as represented in steps 9 to 12. After
that, the rank aggregations are executed in step 14. Later, the feature score and weight are
computed, as presented in steps 17 and 18 of Algorithm 1. Finally, the priority value of each
feature is computed based on the distinct measure score and weight.
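
A compact Python sketch of this flow is given below; it assumes that the per-filter score vectors have already been computed (with higher scores meaning more relevant features) and mirrors steps 4 to 21 of Algorithm 1, the toy input vectors being purely hypothetical.

import numpy as np

def combined_feature_scoring(rank_lists):
    # rank_lists: one score/rank vector per filter measure (e.g. IG, GR, CS, SU, Relief)
    scaled = []
    for r in rank_lists:                               # step 4: scale to [0, 1] (Algorithm 2)
        r = np.asarray(r, dtype=float)
        lo, hi = r.min(), r.max()
        scaled.append((r - lo) / (hi - lo) if hi > lo else np.zeros_like(r))
    combined = np.sum(scaled, axis=0)                  # steps 8-12: sum the scaled ranks
    total = combined.sum()
    scores = combined / total                          # step 17
    weights = combined / total                         # step 18 (same ratio, as in the algorithm)
    priorities = scores * weights                      # step 19
    order = np.argsort(-priorities)                    # steps 20-21: rank 1 = highest priority
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(order) + 1)
    return ranks, priorities

# hypothetical score vectors of three filters for five features
ig = [0.42, 0.10, 0.33, 0.05, 0.21]
gr = [0.38, 0.12, 0.30, 0.02, 0.25]
chi = [95.0, 12.0, 80.0, 3.0, 40.0]
print(combined_feature_scoring([ig, gr, chi]))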

Algorithm 2: Feature Ranks Scaling
Input: FRC - Input Feature Ranks
Output: FRS - Feature Ranks Scaled
Step 1: lesser ← ranks0
Step 2: higher ← ranks0
Step 3: for ∀ ranki ∈ FRC do
Step 4:     if ranki > higher then
Step 5:         higher ← ranki
Step 6:     else
Step 7:         if ranki < lesser then
Step 8:             lesser ← ranki
Step 9:         end
Step 10:    end
Step 11: end
Step 12: min ← lesser
Step 13: max ← higher
Step 14: FRS[] ← (ranks - min) / (max - min)
Step 15: Return FRS: feature scaled ranks
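
In Python, the whole of Algorithm 2 reduces to a min-max normalisation; the sketch below assumes the ranks arrive as a numeric vector and adds a guard against a constant vector, which the pseudocode does not address.

import numpy as np

def scale_ranks(ranks):
    # FRS = (ranks - min) / (max - min), steps 12-14 of Algorithm 2
    ranks = np.asarray(ranks, dtype=float)
    lo, hi = ranks.min(), ranks.max()
    return (ranks - lo) / (hi - lo) if hi > lo else np.zeros_like(ranks)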
4.5.2 MINIMUM THRESHOLD VALUE SELECTION (MTVS)

The development of the minimum threshold value selection is described in Algorithm 3. The
first step of this algorithm is the consideration of the three intrusion datasets (Honeypot, NSL-
KDD, Kyoto) and four classifiers (naïve Bayes, logistic regression, decision tree, SVM) as
input for the evaluations. Then, each of those datasets is fed to the info-gain filter measure to
compute the attribute ranks, and the attributes are arranged in ascending order based on their
ranks; this is shown in steps 3 and 4 of Algorithm 3. After that, all the datasets are segregated
into distinct chunks: the top-ranked 80% of the features of each dataset are retained and the
lowest-ranked 20% are discarded. Once the filtered datasets are generated, the next step is to
feed the filtered dataset to four classifiers of various types and different characteristics, as
shown in steps 6 to 11 of Algorithm 3. In step 12, the predictive accuracies of those classifiers
are computed using the 10-fold cross-validation approach. Finally, the minimum cut-off value
is identified from the average predictive accuracies against each chunk of the dataset, as shown
in steps 16 and 20. Figure 4.3 illustrates the process of the minimum threshold value selection.

Figure 4.3. Minimum Threshold Value Selection


The minimum threshold value selection (MTVS) algorithm is presented as follows:

Algorithm 3: MTVS (D, C)


Input: D - Datasets
C - Classifiers
Output: V –cutoff value
Step 1: initialisation
Step 2: for each di ∈ D do
Step 3:     di ← featureRankCompute(di)
Step 4:     di ← featureRankSortByASC(di)
Step 5: end
Step 6: PA ← 100
Step 7: for each di ∈ D do
Step 8:     while PA > 5 do
Step 9:         k ← sizeOf(di) * (PA / 100)
Step 10:        Acc ← newSet()
Step 11:        for each Ci ∈ C do
Step 12:            PAcc ← predictiveAccuracy(Ci, topKFeatures(di, k))
Step 13:            Acc.add(PAcc)
Step 14:        end
Step 15:        AVGPAcc ← computeAVG(Acc)   // compute average accuracy
Step 16:        GR ← plot(AVGPAcc, k)
Step 17:        PA ← PA - 5
Step 18:    end
Step 19: end
Step 20: V ← getCutOffValue(GR)
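
A hedged Python sketch of this procedure is shown below; it uses scikit-learn's mutual_info_classif as a stand-in for the info-gain ranking and GaussianNB, LogisticRegression, DecisionTreeClassifier and SVC as the four classifiers, so the exact estimators and parameters are assumptions rather than the configuration used in this work.

import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

def mtvs_curve(X, y, cv=10):
    # Average 10-fold CV accuracy of four classifiers while the retained
    # fraction of top-ranked features shrinks from 100% to 5% in 5% steps.
    classifiers = [GaussianNB(), LogisticRegression(max_iter=1000),
                   DecisionTreeClassifier(), SVC()]
    order = np.argsort(-mutual_info_classif(X, y))     # info-gain style ranking
    curve, pct = [], 100
    while pct >= 5:
        k = max(1, int(X.shape[1] * pct / 100))
        Xk = X[:, order[:k]]
        accs = [cross_val_score(c, Xk, y, cv=cv).mean() for c in classifiers]
        curve.append((k, float(np.mean(accs))))
        pct -= 5
    return curve    # plot the (k, accuracy) pairs and pick the cut-off value V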
4.6 PERFORMANCE METRICS

Feature selection results are interpretable in many applications, and selection methods leading
to meaningful results should be preferred to those that do not. However, in some cases the
interpretability of the selected features requires a deep knowledge of the application field that
a computer scientist may not have, making this evaluation difficult. For this reason, the
following four performance metrics were used: accuracy, precision, recall and F-measure. In
this research work, the focus is on evaluating the applicability of the methods rather than the
interpretability of the results. However, the classification accuracy evaluation relies on a
predictive algorithm. Hence, predictive models were used, and several metrics for assessing
classification performance are described below.

4.6.1 CLASSIFICATION ALGORITHMS

Data classification is categorised under supervised learning, where the objective is the
prediction of a class membership value, also called the class label, of unknown observations or
samples using training data for which the class labels are known. Each observation in the
training or test data is represented by an associated feature vector. The process of data
classification generally involves two steps: training of a classifier and testing of the trained
classifier. There are many different classifiers that can be applied in different applications. The
details of those classification algorithms were discussed in section 1.6.2.

4.6.2 CLASSIFICATION PERFORMANCE

Classification performance is an important evaluation criterion of feature selection. Several
classification performance metrics can be found in the review of Costa et al. (2007). Let us
assume that the set of possible class labels consists of positive, P, and negative, N, labels.
The total number of positives is P and the total number of negatives is N. There are four
possible outcomes of a classification algorithm for this problem:
• True positive (TP): the labelled sample is positive and predicted as positive.
• False negative (FN): the labelled sample is positive and predicted as negative.
• False positive (FP): the labelled sample is negative and predicted as positive.
• True negative (TN): the labelled sample is negative and predicted as negative.

The results of any classification can be summarised with the help of a confusion matrix. A
confusion matrix shows the details of the actual and predicted results given by the classifier.
This representation is made by comparing true and predicted labels; the comparison is done
using a confusion matrix whose rows and columns are respectively the true and predicted
classes. TP and TN represent correct decisions made by the classifier, while FN and FP are
classification errors. Given a set of known labels for the positive and negative classes, any
input data may be classified into the positive or negative class, and according to the results the
classifications are grouped as True Positive (TP), True Negative (TN), False Positive (FP) and
False Negative (FN).

Accuracy: The most used performance criterion is the correct classification rate, known as
accuracy. This measure ranges from 0, with perfect misclassification, to 1 when the classifier
perfectly classifies the testing data. It is the most common and simplest measure to evaluate a
classifier. It specifies the overall effectiveness of the classifier. It is calculated as follows:
Accuracy = (TP + TN) / (P + N)    (4.16)
Recall: Recall is also called the true positive rate; it is the proportion of positive cases that were
correctly identified. It is calculated as follows:
Recall = TP / P    (4.17)

Precision: Precision is the ratio of correctly predicted positive samples to the total number of samples predicted as positive.

it is calculated as follows:
Precision = TP / (TP + FP)    (4.18)
F-measure: The F-measure is used to check test accuracy using the precision p and recall r of the
test. It is defined as the harmonic mean of the precision and recall. The maximum value of the F
score is 1 and the minimum value is 0. It is calculated as:
F-measure = 2 / (1/Precision + 1/Recall)    (4.19)
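
These four metrics can be computed directly from the confusion-matrix counts, as in the following sketch; the example counts are hypothetical.

def classification_metrics(tp, fp, tn, fn):
    # Eqs. (4.16)-(4.19) for a binary confusion matrix
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)                 # TP / P
    precision = tp / (tp + fp)
    f_measure = 2 / (1 / precision + 1 / recall)
    return accuracy, precision, recall, f_measure

# example: 95 TP, 5 FP, 90 TN, 10 FN
print(classification_metrics(tp=95, fp=5, tn=90, fn=10))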
4.7 EVALUATION OF THE UEFFS METHOD

The proposed UEFFS method was evaluated on three intrusion datasets, namely NSL-
KDD, Kyoto and Honeypot, and also compared with existing methods in order to prove the
effectiveness of the predicted informative feature subsets. This study used the following four
statistical measures: predictive accuracy, precision, recall and F-measure. In order to obtain a
better understanding, the investigation was conducted as two studies, on text and non-text
based datasets, to check the effectiveness of the proposed methodology.

4.7.1 EXPERIMENTAL SETUP

The proposed method was implemented on a MacBook Pro running macOS High
Sierra version 10.13.6 with an Intel Core i5 processor (3.10 GHz) and 8 GB RAM. In this
work, three datasets of varying complexity were chosen, namely NSL-KDD, Kyoto and
Honeypot. The proposed method performed a pair-wise T-test with a 5% significance level to
investigate the statistical significance of the classification accuracy results. The p-values were
calculated (i.e. p < 0.05) for each of the three intrusion datasets and compared with the five
univariate FS techniques. The results obtained showed that the proposed feature selection
technique is statistically significantly different from the existing approaches.
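
As a sketch of this test, SciPy's ttest_rel can be applied to paired accuracy values; the pairing below uses the Information Gain and UEFFS accuracies of Table 4.5 purely for illustration and is not necessarily the exact pairing used in the reported analysis.

from scipy.stats import ttest_rel

baseline = [94.24, 93.78, 90.13]   # e.g. Information Gain accuracy on the three datasets
proposed = [95.39, 94.12, 92.42]   # UEFFS accuracy on the same datasets
t_stat, p_value = ttest_rel(proposed, baseline)
print(p_value < 0.05)              # significant at the 5% level if True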

The descriptions of all the final labelled feature datasets, identified at both packet and flow
levels, are given in Table 4.1 and Table 4.2.
Table 4.1: Features identified for packet level data
Table 4.2: Features identified for flow level data

4.7.2 RESULTS AND DISCUSSION

The experiment empirically evaluated the effectiveness of the proposed univariate
ensemble-based feature ranking methodology for selecting a valuable reduced feature set on
three intrusion datasets (Honeypot, NSL-KDD, Kyoto) and four classifiers (naïve Bayes,
logistic regression, decision tree, SVM). The proposed methodology was compared with five
univariate filter measures in terms of four performance metrics. The proposed approach offers
viable results compared to the earlier FS measures. As discussed above, this method involved
two studies to ensure the effectiveness of the proposed methodology.

The results achieved after applying the three intrusion datasets (Honeypot, NSL-KDD, Kyoto)
and the four classifiers (naïve Bayes, logistic regression, decision tree, SVM) are summarised
in Table 4.3, Table 4.4 and Table 4.5. The proposed methodology shows significantly better
results compared to the existing FS measures. In the first study, the comparison of precision
with the existing FS measures is recorded in Table 4.3.
Table 4.3 Comparison of precision with FS measures.

Datasets     IG      GR      CS      SU      R       UEFFS (Proposed Method)
Honeypot     0.948   0.922   0.943   0.948   0.952   0.953
NSL-KDD      0.940   0.943   0.947   0.945   0.942   0.948
Kyoto        0.942   0.941   0.940   0.942   0.940   0.942

IG: Information Gain, GR: Gain ratio, CS: Chi-squared, SU: Symmetrical Uncertainty, R: Relief

Similarly, Figure 4.4 depicts the comparison of the average classifier precision with the FS
measures, so that the results obtained can be better visualised and compared with the existing
approaches.

Figure 4.4 Comparison of Precision with FS measures


The results obtained show that the proposed method performs better than the existing
approaches in terms of precision. Table 4.4 summarises the performance of the classifiers in
terms of the recall measure as compared with the existing FS measures.

Table 4.4: Comparison of recall with existing FS measures.

Datasets     IG      GR      CS      SU      R       UEFFS (Proposed Method)
Honeypot     0.93    0.91    0.92    0.94    0.93    0.94
NSL-KDD      0.92    0.89    0.91    0.92    0.91    0.93
Kyoto        0.91    0.88    0.90    0.91    0.90    0.92

IG: Information Gain, GR: Gain ratio, CS: Chi-squared, SU: Symmetrical Uncertainty, R: Relief

Similarly, Figure 4.5 depicts the comparison of recall with the FS measures.


Figure 4.5 Comparison of Recall with FS measures


The results show that the proposed methodology achieves good outcomes compared to the
existing FS measures.

Table 4.5 provides a summary of the predictive accuracy comparison of the proposed method
on the three intrusion datasets (Honeypot, NSL-KDD, Kyoto) with the four classifiers and the
five feature selection measures. The results in Table 4.5 show that the proposed methodology
achieves better results than the existing FS measures. Figure 4.6 depicts a comparison of the
predictive accuracy.

Table 4.5 Comparison of predictive accuracy with FS Measures

Dataset      IG      GR      CS      SU      R       UEFFS (Proposed Method)
Honeypot     94.24   92.13   92.28   93.18   94.11   95.39
NSL-KDD      93.78   90.23   93.21   93.12   94.10   94.12
Kyoto        90.13   91.19   92.04   92.18   91.23   92.42

IG: Information Gain, GR: Gain ratio, CS: Chi-squared, SU: Symmetrical Uncertainty, R: Relief

Figure 4.6 Comparison of predictive accuracy
The results obtained after applying the proposed method on the three intrusion datasets
(Honeypot, NSL-KDD, Kyoto) with the four classifiers and the five feature selection measures,
in terms of accuracy, TP rate and FP rate, are recorded in Table 4.6, which shows that the
proposed method evaluated on the Honeypot dataset achieves a better outcome than on the
other two datasets (NSL-KDD and Kyoto).

Table 4.6: Classification performance based on the three intrusion datasets


             Honeypot                  NSL-KDD                   Kyoto 2006+
FS Methods   TP      FP     Accuracy   TP      FP     Accuracy   TP      FP     Accuracy
IG           0.948   0.23   94.24      0.940   0.23   93.78      0.942   0.34   90.13
GR           0.922   0.21   92.13      0.943   0.21   90.23      0.941   0.27   91.19
CS           0.943   0.41   92.28      0.947   0.41   93.21      0.940   0.25   92.04
SU           0.948   0.22   93.18      0.945   0.22   93.12      0.942   0.19   92.10
R            0.952   0.18   94.11      0.942   0.18   94.10      0.940   0.21   91.23
UEFFS        0.953   0.12   95.39      0.948   0.12   94.12      0.942   0.13   92.12

IG: Information Gain, GR: Gain ratio, CS: Chi-squared, SU: Symmetrical Uncertainty, R: Relief

Figure 4.7 depicts the performance of the classification based on the three intrusion datasets.

Figure 4.7 Classification performance based on the three intrusion datasets


In order to check whether the proposed feature selection method is significantly different,
we performed a pair-wise T-test with a 5% significance level to investigate the statistical
significance of the classification accuracy results. The p-values were calculated (i.e. p < 0.05)
for each of the three intrusion datasets, and a comparison with the five univariate FS techniques
is shown in Table 4.7.

Table 4.7: Predictive accuracy comparison of the proposed method with existing FS
measures.

Datasets     IG      GR      CS      SU      R       UEFFS (Proposed Method)   Paired samples T-test p (two-tailed)
Honeypot     94.24   92.13   92.28   93.18   94.11   95.39                     0.028
NSL-KDD      93.78   90.23   93.21   93.12   94.10   94.12                     0.024
Kyoto        90.13   91.19   92.04   92.18   91.23   92.42                     0.032

IG: Information Gain, GR: Gain ratio, CS: Chi-squared, SU: Symmetrical Uncertainty, R: Relief

In the second study, the results were achieved following evaluation on text datasets,
namely Spam Assassin, MiniNewsGroups and Course-Cotrain. The proposed method was
compared with the existing feature measures in terms of classification predictive accuracy as
well as F-measure; Figure 4.8 shows that the proposed method achieves better performance
than the existing methods.
[Bar chart of F-measure on the Spam Assassin, MiniNewsGroups and Course-Cotrain datasets for IG, Relief, DRB-FS, GR and the proposed method.]

Figure 4.8 F-measure vs state-of-the-art methods

4.8 CONCLUSION

This research work proposed an ensemble-based univariate filter feature selection methodology
for choosing the informative features from given intrusion datasets. Two innovative algorithms,
namely the combined feature scoring algorithm and the minimum threshold value selection
algorithm, have been proposed. The results obtained from these algorithms show that the
UEFFS technique achieves better classification accuracy, F-measure and effectiveness than any
single feature selection method. The method was evaluated on three intrusion datasets, four
classifiers and five univariate feature selection measures, and was also compared with the
existing methods. A pair-wise T-test was performed, and the results obtained showed that the
proposed feature selection technique is statistically significantly different from the existing
approaches. The experiment and its results provided the informative features with higher
accuracy by removing the insignificant and redundant features. Hence, the conclusion is that
the proposed method has the potential to improve the accuracy and robustness of various
classification tasks, contributing to FS not only as a key step in intrusion detection systems but
also in many other applications.
