IJIACS, ISSN 2347-8616, Volume 4, Issue 3, March 2015
1, 2, 3, 4 Computer Engineering, Sanjivani College of Engineering, Kopargaon (Pune University)
Redundant features provide no more information than the currently selected features, and irrelevant features provide no useful information in any context. Feature selection techniques are a subset of the more general field of feature extraction. Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features. Feature selection techniques are often used in domains where there are many features and comparatively few samples (or data points). The archetypal case is the use of feature selection in analysing DNA microarrays, where there are many thousands of features and a few tens to hundreds of samples. Feature selection techniques provide three main benefits when constructing predictive models:

- Improved model interpretability,
- Shorter training times,
- Enhanced generalization by reducing overfitting.

Feature selection is also useful as part of the data analysis process, as it shows which features are important for prediction and how these features are related. With the aim of choosing a subset of good features with respect to the target concepts, feature subset selection is an effective way of reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility [1]. Irrelevant features, along with redundant features, severely affect the accuracy of the learning machines. Thus, feature subset selection should be able to identify and remove as much of the irrelevant and redundant information as possible. Moreover, “good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other.”

Many feature subset selection methods have been proposed and studied for machine learning applications. They can be divided into four broad categories: the Embedded, Wrapper, Filter, and Hybrid approaches [2].

1.2 Wrapper and Filter Methods:

Wrapper methods are widely recognized as a superior alternative in supervised learning problems, since by employing the inductive algorithm to evaluate candidate subsets they take into account the particular biases of the algorithm. However, even for algorithms of moderate complexity, the number of executions that the search process requires results in a high computational cost, especially as we shift to more exhaustive search strategies. Wrapper methods use the predictive accuracy of a predetermined learning algorithm to determine the goodness of the selected subsets; the accuracy of the learning algorithms is usually high, but the generality of the selected features is limited and the computational complexity is large. Filter methods, by contrast, are independent of learning algorithms and generalize well. Their computational complexity is low, but the accuracy of the learning algorithms is not guaranteed [2], [3], [4].
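To make the wrapper/filter contrast concrete, the sketch below scores a candidate feature subset both ways: a filter score computed from the data alone, and a wrapper score obtained by cross-validating a predetermined learner on the subset. This illustrates the two evaluation styles only and is not code from the paper; the helper names, the use of scikit-learn, and the choice of mutual information and k-NN are our own assumptions.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def filter_score(X, y, subset):
    # Filter style: score the subset from the data alone (mean mutual
    # information with the class); no learning algorithm is run.
    return mutual_info_classif(X[:, subset], y).mean()

def wrapper_score(X, y, subset):
    # Wrapper style: score the subset by the cross-validated accuracy of
    # a predetermined learner, so the learner's biases are taken into
    # account, at the cost of one training run per fold per candidate.
    model = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(model, X[:, subset], y, cv=5).mean()

# Toy usage: only features 0 and 1 actually carry the class signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
for subset in ([0, 1], [2, 3]):
    print(subset, filter_score(X, y, subset), wrapper_score(X, y, subset))
```

The wrapper score is the more faithful predictor of downstream accuracy, but each candidate subset costs several model fits, which is exactly the computational burden described above.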
1.3 Hybrid Approach:

Hybrid methods combine filter and wrapper methods, using a filter method to reduce the search space that the subsequent wrapper will consider. They mainly aim to achieve the best possible performance with a particular learning algorithm while keeping a time complexity similar to that of the filter methods.
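As an illustration of this filter-then-wrapper pipeline (the paper does not prescribe a specific one), the sketch below reuses the filter and wrapper helpers and the toy data from the previous sketch: a filter stage keeps only the top-ranked features, and a greedy wrapper search then runs inside that reduced pool. The function name and the greedy forward search are our own assumptions.

```python
def hybrid_select(X, y, keep=4, target_size=2):
    # Filter stage: rank all features and keep only the `keep` best,
    # shrinking the search space the wrapper has to explore.
    pool = np.argsort(mutual_info_classif(X, y))[::-1][:keep]
    # Wrapper stage: greedy forward selection inside the reduced pool,
    # guided by the learner's cross-validated accuracy.
    chosen = []
    while len(chosen) < target_size:
        best = max((f for f in pool if f not in chosen),
                   key=lambda f: wrapper_score(X, y, chosen + [f]))
        chosen.append(best)
    return chosen

print(hybrid_select(X, y))   # expected to recover features 0 and 1
```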
In cluster analysis, graph-theoretic methods have been well studied and used in many applications, and their results sometimes show the best agreement with human performance. The general graph-theoretic clustering is simple: compute a neighborhood graph of instances, then delete any edge in the graph that is much longer or shorter (according to some criterion) than its neighbors. The result is a forest, and each tree in the forest represents a cluster. In our study, we apply graph-theoretic clustering methods to features. In particular, we adopt minimum spanning tree (MST)-based clustering algorithms, because they do not assume that data points are grouped around centers or separated by a regular geometric curve, and they have been widely used in practice.
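A minimal sketch of such MST-based clustering is given below. SciPy's minimum_spanning_tree and connected_components are real library calls; the inconsistency rule used here (delete edges heavier than a fixed multiple of the mean MST edge weight) is one simple choice of the "some criterion" mentioned above, assumed by us for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_clusters(points, factor=1.5):
    # Build the complete weighted graph on the instances, then its MST.
    dist = squareform(pdist(points))
    mst = minimum_spanning_tree(dist).toarray()
    # Delete any edge much longer than the rest: here, longer than
    # factor times the mean MST edge weight (an illustrative criterion).
    weights = mst[mst > 0]
    mst[mst > factor * weights.mean()] = 0
    # Each tree of the resulting forest is one cluster.
    _, labels = connected_components(mst, directed=False)
    return labels

# Two well-separated point clouds come out as two clusters.
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(size=(20, 2)), rng.normal(size=(20, 2)) + 8])
print(mst_clusters(pts))
```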
Based on the MST method, we propose a Fast clustering-based feature Selection algorithm (FAST). The FAST algorithm works in two steps. In the first step, features are divided into clusters by using graph-theoretic clustering methods. In the second step, the most representative feature, that is, the one most strongly related to the target classes, is selected from each cluster to form the final subset of features.
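The two steps can be summarized in the schematic Python below. This is our reading of the description above, not the authors' implementation: the feature_clusters argument stands for any graph-theoretic clustering of the features (for instance, the MST procedure sketched earlier applied to a feature-distance matrix), and mutual information is our assumed stand-in for the measure of how strongly a feature relates to the target classes.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def fast_select(X, y, feature_clusters):
    # Step 2 of FAST as described: from each cluster of features, keep
    # the single most representative feature, i.e., the one most
    # strongly related to the target classes.
    relevance = mutual_info_classif(X, y)
    selected = [max(cluster, key=lambda f: relevance[f])
                for cluster in feature_clusters]
    return sorted(selected)

# Usage with a hand-made clustering of 4 features into 2 clusters.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = (X[:, 0] > 0).astype(int)
print(fast_select(X, y, [[0, 1], [2, 3]]))   # picks one feature per cluster
```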
Features in different clusters are relatively independent, so the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. The proposed feature subset selection algorithm FAST was tested on various numerical data sets. The experimental results show that, compared with five other types of feature subset selection algorithms, the proposed algorithm not only reduces the number of features but also improves the classification accuracy.
II. LITERATURE SURVEY

2.1 Existing System:

Feature subset selection can be viewed as the process of identifying and removing as many irrelevant and redundant features as possible. This is because 1) irrelevant features do not contribute to the predictive accuracy, and 2) redundant features do not help in obtaining a better predictor, since they provide mostly information that is already present in other feature(s). Of the many feature subset selection algorithms, some can effectively eliminate irrelevant features but fail to handle redundant ones [4], [5], while others can eliminate the irrelevant features while also taking care of the redundant ones [5]. Our proposed FAST algorithm falls into the second group. Traditionally, feature subset selection research has focused on searching for relevant features. A well-known example is Relief [7], which weighs each feature according to its ability to discriminate instances under different targets based on a distance-based criterion function. However, Relief is ineffective at removing redundant features, as two predictive but highly correlated features are likely both to be highly weighted. Relief-F extends Relief, enabling the method to work with noisy and incomplete data sets and to deal with multiclass problems, but it still cannot identify redundant features.
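A compressed sketch of Relief's weighting rule follows. The hit/miss update is the standard formulation of Relief [7]; the Manhattan distance, the number of sampled instances, and the helper name are our own simplifications.

```python
import numpy as np

def relief_weights(X, y, n_iter=50, seed=0):
    # For each sampled instance, find its nearest hit (same class) and
    # nearest miss (other class). A feature gains weight when it
    # separates the instance from its miss, and loses weight when it
    # differs from its hit.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                      # never match the instance itself
        same, other = y == y[i], y != y[i]
        hit = np.where(same)[0][np.argmin(dist[same])]
        miss = np.where(other)[0][np.argmin(dist[other])]
        w += (np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])) / n_iter
    return w

X = np.random.default_rng(2).normal(size=(60, 3))
y = (X[:, 0] > 0).astype(int)
print(relief_weights(X, y))   # the weight of feature 0 should dominate
```

Note that a duplicated copy of feature 0 would receive nearly the same high weight, which is precisely the redundancy blindness described above.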
2.2 Drawbacks:

1) Some works eliminate irrelevant features alone.
2) Some existing systems remove redundant features alone.
3) This reduces the speed and accuracy of learning algorithms.

III. FEATURE SUBSET SELECTION ALGORITHM

3.1 Proposed System:

Quite different from these hierarchical clustering-based algorithms, our proposed FAST algorithm uses a minimum spanning tree-based method to cluster features. Meanwhile, it does not assume that data points are grouped around centers or separated by a regular geometric curve. Moreover, our proposed FAST is not limited to specific types of data.
3.2 System Architecture and Definitions:

Irrelevant features, along with redundant features, severely affect the accuracy of the learning machines [6]. Thus, feature subset selection should be able to identify and remove as much of the irrelevant and redundant information as possible. Moreover, “good feature subsets contain features highly correlated with (predictive of) the class, yet uncorrelated with (not predictive of) each other.” Keeping these points in mind, we develop a novel algorithm which can efficiently and effectively deal with both irrelevant and redundant features.
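The quoted criterion suggests a simple diagnostic one can compute for any candidate subset: relevance as the average correlation of the chosen features with the class, and redundancy as the average correlation among the chosen features themselves. Pearson correlation is our illustrative stand-in here for whatever correlation measure an implementation adopts.

```python
import numpy as np

def subset_quality(X, y, subset):
    # "Highly correlated with the class, yet uncorrelated with each
    # other": report both quantities for the candidate subset.
    S = X[:, subset]
    relevance = np.mean([abs(np.corrcoef(S[:, j], y)[0, 1])
                         for j in range(S.shape[1])])
    if S.shape[1] < 2:
        return relevance, 0.0
    c = np.abs(np.corrcoef(S, rowvar=False))
    redundancy = (c.sum() - np.trace(c)) / (len(subset) ** 2 - len(subset))
    return relevance, redundancy

# Features 0 and 1 are near-duplicates; feature 2 is noise.
rng = np.random.default_rng(3)
x0 = rng.normal(size=200)
X = np.column_stack([x0, x0 + 0.01 * rng.normal(size=200),
                     rng.normal(size=200)])
y = (x0 > 0).astype(float)
print(subset_quality(X, y, [0, 1]))   # relevant but highly redundant
print(subset_quality(X, y, [0, 2]))   # slightly less relevant, far less redundant
```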
ACKNOWLEDGMENT

… and Prof. Vinay Kumar A. for their kind support and valuable suggestions.

REFERENCES

[…] Proc. Fourth Pacific Asia Conf. Knowledge Discovery and Data Mining, pp. 98-109, 2000.