International Journal of Mathematics and Computer Applications Research (IJMCAR), ISSN 2249-6955, Vol. 3, Issue 3, Aug 2013, 57-64. TJPRC Pvt. Ltd.

SOFT SET BASED TECHNIQUES FOR MINING UNCERTAIN DATA


BVST SAI Research Scholar, Department of Mathematics, Andhra University, Visakhapatnam, Andhra Pradesh, India

ABSTRACT
Soft set theory was first introduced by Molodtsov in 1999 for dealing with uncertain data in analysis and decision making. Soft sets are a special kind of information system, also known as Boolean-valued information systems. This paper provides insights into soft sets, the usage of soft set theory, its comparison with rough sets, and the synergetic advantages of using soft sets along with other soft computing techniques. It also focuses on the usage of soft sets in real-time applications, the challenges encountered in the process, and the possible solutions.

KEYWORDS: Soft Set, KDD, CSS and Fuzzy Set Theory

INTRODUCTION
Owing to advances in computer science and the availability of computing equipment at affordable prices, people from all walks of life have started using computers in one way or another. Large volumes of data are being added to the databases of organizations, and processing such voluminous data needs good techniques or methodologies. Traditional approaches to summarizing data and extracting knowledge from it are not adequate. The data mining domain [1] has provided various techniques that can be used to extract trends or patterns from the data, and the discovered patterns can help in making useful decisions. The data present in databases is of many types. It could be categorical or continuous, and data mining has provided algorithms for dealing with both.

Knowledge Discovery in Databases (KDD) [2] has become an essential activity in most organizations for making well-informed decisions. Databases that store huge amounts of data contain various kinds of data such as textual, pictorial, symbolic, numeric and aural, and the data might have errors or inconsistencies. KDD helps to discover trends in such data effectively. It is required by organizations that need to make decisions after analyzing huge amounts of historical data. The general KDD process is presented in figure 1.

Figure 1: Block Diagram for KDD

As can be seen in figure 1, KDD takes raw data as input and generates useful knowledge after preprocessing and processing. Preprocessing makes the data ready for processing. Operations such as data wrapping, dimensionality reduction, data condensation and data cleansing are performed in the preprocessing phase. The preprocessed data is then subjected to various data mining techniques, which bring out trends or patterns from the data. These trends or patterns provide the business intelligence or knowledge that helps in making decisions.

According to Chau et al. [3], the data mining techniques that have been around for many years focused on certain data, and relatively less research has dealt with uncertain data. Applications such as location-based services and sensor monitoring produce uncertain data, for instance data about moving objects: to know the location of an object, the uncertain data must be processed. Many techniques have come into existence to process uncertain data. The taxonomy of data mining techniques for uncertain data is presented in figure 2.

Figure 2: Taxonomy of Data Mining on Uncertain Data

As can be seen in figure 2, association rule mining, data classification, data clustering and soft set based methods are available for mining uncertain data. This paper focuses on soft computing techniques and their usage in the real world. The usage of soft sets comes under soft computing, which is a collection of many methodologies such as fuzzy logic, evolutionary algorithms, and neural networks [4]. In other words, soft set theory is another technique in the soft computing paradigm, as shown in figure 3.

Figure 3: Constituents of Soft Computing Including Soft Sets (Excerpt from [20])

As can be seen in figure 3, the soft set is part of the soft computing approach, which also includes other techniques such as evolutionary algorithms, neural networks and fuzzy theory that provide robust data mining solutions and help in making intelligent decisions. In this paper we give insights into soft sets, their usage, and their comparison with other techniques, besides presenting a review of the literature on the same. We also observe the results and usefulness of soft set theory in various domains such as education, health care and entertainment. The rest of the paper is structured as follows. Section II provides a review of literature relevant to soft sets and their state of the art. Section III discusses the findings, while section IV concludes the paper.

RELATED WORK
Soft set theory was first introduced by Molodtsov in 1999 as a new paradigm for mining uncertain data. Soft sets overcome the inadequacy of other techniques such as interval mathematics [5], fuzzy set theories [6] and probability. Pei and Miao [7] explored the relationship between information systems and soft sets. Their results reveal that information systems and partition-type soft sets share a common formal structure. For instance, fuzzy information systems and fuzzy soft sets are equivalent.

Razak and Mohamad [8] proposed a group decision making method with criteria weights based on a soft set data mining approach. The weight of each criterion is computed using a method known as the Analytic Hierarchy Process (AHP). The group decision making problem is solved using the soft max-min decision making method. Chetia and Das [9] extended Biswas's method for the evaluation of students' answer scripts. They assumed five satisfaction levels in order to evaluate the performance of students: unsatisfactory, satisfactory, good, very good and excellent. They developed an algorithm that takes student statistics as input and builds a soft set matrix before evaluating the performance of students.

Herawan et al. [10] presented an approach to reduce the dimensionality of soft sets. The existing solutions on soft sets are Boolean based, but the underlying data may also have non-Boolean values. For multi-valued information systems, they presented an alternative approach for reducing attributes. They introduced the idea of multi soft sets that are constructed from multi-valued information systems, and they used OR and AND operators on soft sets. Their experiments showed that the set of attributes (the reduct) obtained in soft set theory is the same as that obtained in rough set theory. The reduct approach was first introduced by Maji et al. [11], who used it for decision making in soft set mining applications.
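To make the tabular view used in this literature concrete: a soft set over a universe U is a pair (F, E) in which F maps each parameter in E to a subset of U, which is why it can be written as a Boolean-valued table. The following is a minimal sketch under stated assumptions, not code from any cited work. The objects, the parameters, and the helper names `to_boolean_table` and `choice_values` are hypothetical, and the selection step only illustrates the choice-value idea usually associated with soft set decision making such as [11] (prefer the object that satisfies the most parameters).

```python
# Hypothetical universe and parameters, chosen only for illustration.
universe = ["h1", "h2", "h3", "h4"]
soft_set = {
    "cheap": {"h1", "h2"},
    "modern": {"h2", "h3"},
    "good_location": {"h2", "h4"},
}

def to_boolean_table(soft_set, universe):
    """Return {object: {parameter: 0 or 1}}, the Boolean-valued table of the soft set."""
    return {
        obj: {param: int(obj in members) for param, members in soft_set.items()}
        for obj in universe
    }

def choice_values(table):
    """Choice value of an object = number of parameters it satisfies."""
    return {obj: sum(row.values()) for obj, row in table.items()}

table = to_boolean_table(soft_set, universe)
scores = choice_values(table)
best = max(scores, key=scores.get)  # object with the largest choice value
print(scores)
print(best)
```

Keeping the soft set as a parameter-to-subset mapping and deriving the table on demand keeps the two views in sync; a reduct, in this picture, would be a smaller set of parameters that leaves the decision unchanged.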
Parameterization reduction is also possible in soft sets and related applications, as presented by Chen et al. [12]. They further argued that the approach followed by Maji was incorrect and claimed that the reduct is not the same for soft set theory and rough set theory. Their idea for the reduction of attributes in soft sets was based on the optimal choice concept, which addresses the problem of sub-optimal solutions. This problem was also analyzed by Kong et al. [13], who defined a parameter reduction that can overcome the problems of sub-optimality. For decision making using soft sets where there is data deficiency, Zou [14] proposed a novel technique based on the computation of a weighted average as per the distribution of objects. In [15] it is proved that every rough set can be utilized as a soft set for mining purposes. For this reason an alternative approach was devised in [16] to achieve a reduct, whose results again proved to be the same as those of rough reduction.

Xun Ge and Songlin Yang [17] investigated operations on soft sets. They explored the operations defined in prior works. The results of their work help others to choose the right operators and operational rules while working with soft sets. Rose et al. [18] proposed two techniques to compare incomplete datasets. The techniques are based on aggregate and calculated support values and on parity bits of the supported set. When a dataset is downloaded or taken from a source, it might be incomplete due to virus attacks or software or hardware problems. As the processing of incomplete datasets will yield inconsistent results, it is essential to know whether the datasets are complete or incomplete prior to using them in data mining algorithms. The results of the comparison help in finding missing attributes and taking the necessary steps to rectify the problems before actually processing the data.

Rajpoot et al. [19] proposed association rule mining based on a soft set approach using constraints with respect to initial support. The constraint is meant for filtering rarely occurring items and false frequent items. As the pruning reduces the search space, the dataset is improved and fewer resources are consumed to mine association rules. Afterwards, the dataset is converted to a Boolean-valued information system. The resultant dataset is known as a soft set. They compared the constrained soft set (CSS) approach with the previous soft set based approach. The results revealed that the constraint based soft set approach exhibited higher performance. Table 1 shows the performance comparison of both.

Table 1: Performance of Soft Set Vs. Constrained Soft Set (Excerpt from [19])

Min Sup   Min Conf   Soft Set Approach (Execution Time in Sec)   CSS Approach (Execution Time in Sec)
2         .6         .07794                                      .06225
3         .6         .06438                                      .03994
4         .6         .06323                                      .02759
5         .6         .05507                                      .0189
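The support-constrained mining described above can be sketched as follows. This is an illustrative sketch only and not the algorithm of [19]: the transactions, the function names, and the restriction to rules between single items are assumptions made for brevity. Items below the initial support are pruned first, the remaining data is turned into a soft set (item mapped to the set of transactions containing it), and support and confidence are then read off set intersections.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, not taken from [19].
transactions = {
    "t1": {"bread", "milk"},
    "t2": {"bread", "butter", "milk"},
    "t3": {"bread", "butter"},
    "t4": {"milk", "caviar"},
}

def prune(transactions, min_sup):
    """Remove items whose support count is below the initial support constraint."""
    counts = Counter(item for items in transactions.values() for item in items)
    return {
        tid: {item for item in items if counts[item] >= min_sup}
        for tid, items in transactions.items()
    }

def to_soft_set(transactions):
    """Map each item (parameter) to the set of transactions that contain it."""
    soft = {}
    for tid, items in transactions.items():
        for item in items:
            soft.setdefault(item, set()).add(tid)
    return soft

def support(soft, itemset):
    """Support count of an itemset = size of the intersection of its item sets."""
    return len(set.intersection(*(soft[item] for item in itemset)))

def rules(soft, min_sup, min_conf):
    """Yield (antecedent, consequent, support, confidence) for pairs of single items."""
    for a, b in combinations(sorted(soft), 2):
        for x, y in ((a, b), (b, a)):
            sup = support(soft, (x, y))
            conf = sup / support(soft, (x,))
            if sup >= min_sup and conf >= min_conf:
                yield x, y, sup, conf

soft = to_soft_set(prune(transactions, min_sup=2))
found = list(rules(soft, min_sup=2, min_conf=0.6))
for rule in found:
    print(rule)
```

Pruning before building the soft set means the rare item never becomes a parameter, which is the search-space reduction the constraint is meant to provide.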

As can be seen in table 1, for the same support and confidence, the CSS approach outperforms the soft set approach in terms of execution time.

Satya Ranjan Dash and Satchidananda Dehuri [20] applied soft data mining to bioinformatics and explored the challenges encountered. Of late there has been considerable research on soft sets and their application to bioinformatics, where soft set theory is used for data analysis and decision making. In soft set theory, membership is decided using adequate parameters, whereas rough set theory uses the concept of equivalence classes and fuzzy set theory uses the grade of membership. Though these three are unique in their functionality, they all deal with vagueness, which led to the idea of combining them. The relationship among them was explored by Gorzalczany [21]. All these techniques are widely used in applications, but they have their own limitations, as described by Molodtsov [22]. Soft data mining in bioinformatics is presented in figure 4.

Figure 4: Application of Soft Data Mining in Bioinformatics (Excerpt from [20])

As can be seen in figure 4, the synergy between data mining and soft computing techniques can be called soft data mining. The soft set concept is also used in the data mining of bioinformatics. As specified in [20], applying data mining to bioinformatics revealed difficulties such as vagueness, local optimality and intractability. The solution to overcome these drawbacks is to combine soft data mining techniques and apply them to bioinformatics in order to achieve the best results.

Lashari et al. [23] applied soft set theory to classify the sounds of musical instruments. Their experiments established the viability of soft set theory for this purpose, and their results revealed that it can be successfully used for the classification of musical instruments. Thus soft set theory has gained importance in decision making applications. The impact of audio length and frame size on their classification system is visualized in figure 5.

Figure 5: Impact of Audio Length and Frame Size (Excerpts from [23])

As can be seen in figure 5, soft set theory can be applied to classify musical instruments. The results show data distribution against percentage of classification performance for both audio length and frame size. This means that the audio length and frame size of the audio files have an impact on the classification system using soft sets.

Jothi and Inbarani [24] proposed an unsupervised feature selection algorithm using soft sets, in which the reduction of attributes is achieved using soft set theory. Various datasets were used to test the efficiency of the algorithm in terms of its speed and performance. The experiments were made on datasets collected from the UCI machine learning repository [26]. Soft set theory has also been applied to medical data. Kumar et al. [25] proposed a new approach for generating classification rules using a variant of the soft set known as the bijective soft set. The algorithm takes a dataset as input and generates a set of rules that help in decision making.
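The notion of a bijective soft set can be illustrated with a short sketch. It assumes the usual definition, in which the sets attached to the parameters are pairwise disjoint and together cover the universe, so that every object falls under exactly one parameter. The patient data and the function names are hypothetical, and the rule step is a deliberately simplified illustration (a condition value yields a rule when all of its objects share one decision value), not the algorithm of [25].

```python
def is_bijective(soft_set, universe):
    """True if the parameter sets are pairwise disjoint and cover the universe."""
    covered = set().union(*soft_set.values())
    total = sum(len(members) for members in soft_set.values())
    # Disjoint sets have a union whose size equals the sum of their sizes.
    return covered == set(universe) and total == len(covered)

def certain_rules(condition, decision):
    """Pair each condition value with a decision value that contains all its objects."""
    return [
        (c, d)
        for c, c_members in condition.items()
        for d, d_members in decision.items()
        if c_members and c_members <= d_members
    ]

# Hypothetical medical-style data, chosen only for illustration.
universe = {"p1", "p2", "p3", "p4"}
temperature = {"normal": {"p1", "p3"}, "high": {"p2", "p4"}}
diagnosis = {"healthy": {"p1", "p3"}, "flu": {"p2", "p4"}}
symptoms = {"cough": {"p1", "p2"}, "headache": {"p2", "p3"}}

print(is_bijective(temperature, universe))
print(is_bijective(symptoms, universe))
print(certain_rules(temperature, diagnosis))
```

Here `symptoms` fails the check because one patient has both symptoms and another has neither, while `temperature` and `diagnosis` each partition the patients and can therefore be compared value by value.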

DISCUSSIONS
This section discusses the real-time usage of soft set theory. Soft set theory is now a part of soft computing. It is a new approach in data mining which deals with uncertain data. Experiments reveal that in some aspects soft sets and rough sets are similar. Soft sets are mainly used to analyze huge amounts of data and help in taking well-informed decisions. Therefore soft sets can be used in real-world decision support systems. Such systems can help the management of an organization to take decisions that lead to the expected results or profits. Soft set theory is one of the ingredients of soft data mining. Experiments reported in the literature indicate that soft set theory can be used in tandem with other soft computing techniques in order to produce highly effective results. Soft sets can also be used to obtain reducts that help in better performance. Soft sets are being used in various applications such as classification of medical data, classification of musical instruments, evaluation of student marks, and better grouping of data. In many applications soft sets can be used to generate decision rules that help to take well-thought-out decisions, as soft sets are capable of providing the required business intelligence.

CONCLUSIONS
In this paper we present the importance of soft set theory and how it has been used in some real-time applications. The paper focuses on soft set theory and its comparison with other data mining and soft computing techniques. It also throws light on the effect of the synergetic usage of multiple soft data mining techniques in order to achieve the best results. Soft sets are used especially to deal with uncertain data. However, in bioinformatics applications soft set theory alone proved inadequate, though it can produce the best results when used along with other soft computing techniques. This paper also observed the usage of soft sets in the classification of musical instruments and of medical data for decision making.

REFERENCES
1. J.G. Shanahan, Soft Computing for Knowledge Discovery: Introducing Cartesian Granule Feature, Kluwer Academic, Boston, MA, 2000.
2. S.K. Pal, A. Pal (Eds.), Pattern Recognition: From Classical to Modern Approaches, World Scientific, Singapore, 2002.
3. Michael Chau, Reynold Cheng, and Ben Kao, Uncertain Data Mining: A New Research Direction, in Proceedings of the Workshop on the Sciences of the Artificial, Hualien, Taiwan, December 7-8, 2005.
4. Zadeh, L A (1965). Fuzzy sets. Information and Control, 8, 338-353.
5. Yang, X B, et al. (2009). Combination of interval-valued fuzzy set and soft set. Computers and Mathematics with Applications, 58, 521-527.
6. Zadeh, L A (1965). Fuzzy sets. Information and Control, 8, 338-353.
7. Daowu Pei and Duoqian Miao, From Soft Sets to Information Systems.
8. Samsiah Abdul Razak and Daud Mohamad, A Soft Set based Group Decision Making Method with Criteria Weight, World Academy of Science, Engineering and Technology 58, 2011.
9. B. Chetia and P. K. Das, Application of Vague Soft Sets in students evaluation. Advances in Applied Science Research, 2011, 2 (6): 418-423.
10. Tutut Herawan, Rozaida Ghazali, Mustafa Mat Deris, Soft Set Theoretic Approach for Dimensionality Reduction, International Journal of Database Theory and Application, Vol. 3, No. 2, June 2010.
11. Maji, P.K., Roy, A.R., and Biswas, R. An application of soft sets in a decision making problem. Computers and Mathematics with Applications, 44, 2002.
12. Chen, D., Tsang, E.C.C., Yeung, D.S., and Wang, X. Some notes on the parameterization reduction of soft sets. Proceeding of International Conference on Machine Learning and Cybernetics, 3, 2003, IEEE Press, 1442-1445.
13. Kong, Z., Gao, L., Wang, L., and Li, S. The normal parameter reduction of soft sets and its algorithm. Computers and Mathematics with Applications, 56, 2008, 3029-3037.
14. Zou, Y. and Xiao, Z. Data analysis approaches of soft sets under incomplete information. Knowledge Based Systems, 21, 2008, 941-945.
15. Herawan, T. and Mustafa M.D. A direct proof of every rough set is a soft set. Proceeding of International Conference AMS 2009, IEEE Press, 119-124.
16. Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers, 1991.
17. Xun Ge and Songlin Yang, Investigations on some operations of soft sets, World Academy of Science, Engineering and Technology 75, 2011.
18. Ahmad Nazari Mohd. Rose, Mohd Isa Awang, Hasni Hassan, Mustafa Mat Deris, Comparison of Techniques in Solving Incomplete Datasets in Softset, International Journal of Database Theory and Application, Vol. 4, No. 3, September 2011.
19. Vikram Rajpoot, Prof. Shailendra ku. Shrivastava, Prof. Abhishek Mathur, An Efficient Constraint Based Soft Set Approach for Association Rule Mining, International Journal of Engineering Research and Applications (IJERA).
20. Satya Ranjan Dash and Satchidananda Dehuri, A Conspectus of Soft Data Mining in Bioinformatics, CSI Communications, September 2012.
21. Gorzalczany, M B (1987). A method of inference in approximate reasoning based on interval-valued fuzzy sets. Fuzzy Sets Syst., 21, 1-17.
22. Molodtsov, D (2004). The Theory of Soft Sets (in Russian), URSS Publishers, Moscow.
23. Saima Anwar Lashari, Rosziati Ibrahim and Norhalina Senan, Soft Set Theory for Automatic Classification of Traditional Pakistani Musical Instruments Sounds, 2012 International Conference on Computer & Information Science (ICCIS).
24. Jothi. G and Hannah Inbarani. H, Soft Set Based Quick Reduct Approach for Unsupervised Feature Selection, 2012 IEEE International Conference on Advanced Communication Control and Computing Technologies (ICACCCT).
25. S. Udhaya kumar, H. Hannah Inbarani and S. Senthil kumar, Bijective Soft set based Classification of Medical Data, Proceedings of the 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering (PRIME), February 21-22.
26. C. L. Blake and C. J. Merz, UCI Repository of machine learning databases. [Online]. Available: http://www.ics.uci.edu/~mlearn/
