This document proposes a new clustering method based on the Inclined Planes system Optimization algorithm. It begins with an introduction to data clustering and heuristic algorithms inspired by natural phenomena. It then describes the Inclined Planes system Optimization algorithm, which simulates the motion of balls sliding down an inclined plane. Next, it provides a brief explanation of supervised data clustering. Finally, it discusses applying the new method to standard benchmark datasets and comparing its performance to similar clustering algorithms.
Mohammad Hamed Mozaffari, Hamed Abdi, Hamid Zahiri
M. H. Mozaffari, H. Abdi and H. Zahiri are with the Department of Electrical Engineering, University of Birjand, Birjand, Iran (e-mails: hamed.mozaffari2@yahoo.com, hamed.abdy@gmail.com and shzahiri@yahoo.com).

Abstract: Data mining is a branch of science that aims to extract a set of features and meaningful information from a huge database in reasonable time and at reasonable cost. Clustering is one of the popular methods in this field. The purpose of clustering is to group together the items of a database that have similar characteristics. Its applications in many fields of science and engineering, such as pattern recognition, data retrieval, bioinformatics, machine learning and the Internet, have caused it to develop significantly in the last decades. The rapid growth in the volume of information in databases has revealed the weakness of traditional methods like K-means in dealing with huge data. In this paper, a new clustering method based on the Inclined Planes system Optimization algorithm is proposed and evaluated on a series of standard datasets. A comparison study shows that it outperforms other similar clustering algorithms on these datasets.

Index Terms: Heuristic Algorithms, Inclined Planes system Optimization, Clustering, Data mining.

I. INTRODUCTION

Solving real-life problems with natural computing methods, which are inspired by natural and physical phenomena, is an approach that has gained recognition in the last decades.

Data clustering is a data-mining method used in many fields of science to find similar items in datasets, for purposes such as finding a word in an internet database, diagnosing the type of a cancer, or extracting genes in bioinformatics.

The purpose of clustering methods is to divide the items of a database into a number of groups (clusters) such that the items within a group have similar characteristics (similarity) and have the lowest similarity with the items of other groups (dissimilarity). Data features (mostly the distance between items) are used to measure similarity and dissimilarity through a criterion function. Clustering methods are divided into two categories, supervised and unsupervised clustering. In supervised clustering methods like Kmeans, the number of clusters is known and is selected by the user, whereas unsupervised clustering methods find the best number of clusters automatically.

The purpose of heuristic algorithms is to reach the optimum solution of an optimization problem by taking inspiration from natural and physical phenomena. Heuristic algorithms have been introduced in the last decades in various forms. New clustering methods aim to combine earlier methods with heuristic algorithms.

The most popular of these algorithms are Genetic Algorithm (GA) [1], Simulated Annealing (SA) [2], Harmony Search (HS) [3], Artificial Immune System (AIS) [4], Ant Colony Optimization (ACO) [5], and Particle Swarm Optimization (PSO) [6].

Genetic Algorithm was formed by observing the laws of natural selection and genetics, based on Darwin's theory of evolution. Simulated Annealing was designed after the process of annealing in metallurgy. The Harmony Search algorithm mimics the behavior of musicians in the process of improvisation. Artificial Immune System was inspired by the biological immune system. Ant Colony Optimization simulates the foraging behavior of real ants searching for food, and Particle Swarm Optimization was developed by simulating the social behavior of a flock of birds when migrating. Stochastic behavior and randomization are a usual strategy for these algorithms to imitate the natural phenomena they are modeled on, while some other algorithms, like Central Force Optimization (CFO), use no randomization. CFO is a deterministic heuristic algorithm based on the metaphor of gravitational kinematics [7].

In this study, a new heuristic algorithm, Inclined Planes system Optimization (IPO), is used to solve the supervised clustering problem [8].

The performance of the proposed method is tested on a number of standard benchmarks. A comparison study with similar methods shows the reliability and strength of this method for the clustering problem.

The rest of this paper is organized as follows. The IPO algorithm is described in Section 2. Section 3 gives a brief explanation of supervised data clustering, and experimental results are provided in Section 4. Finally, Section 5 concludes the paper.

II. INCLINED PLANES SYSTEM OPTIMIZATION (IPO)

IPO is designed on the basis of the dynamics of sliding motion along a frictionless inclined surface. In this algorithm each agent, named a "tiny ball" (similar to the particles in PSO or the ants in ACO), searches the problem space to find the optimum solution.

In nature, on the surface of the Earth, an elevated object (here, a tiny ball) tends to lose its potential energy and moves toward lower elevations. In physics, this is attributed to the gravitational force that the Earth exerts on every object.

In the IPO algorithm, each ball has three specifications: position ($x$), height ($h$) and angles ($\phi$) in relation to the other balls. The positions of the balls are feasible solutions of the problem, and the fitness function is used to calculate the height of each ball. For a clear explanation, assume a system of $N$ balls as below:

$$x_i^{\min} \le x_i \le x_i^{\max}, \quad 1 \le i \le N_d \tag{1}$$

where $x_i$ is a decision variable, $i$ is the dimension and $N_d$ is the number of dimensions.

Fig. 1. An example of search space with 3 balls

An example of a search space in the IPO algorithm with 3 balls is shown in Fig. 1. The purpose of IPO is to find the minimum of the fitness function $f(x_1, x_2, \dots, x_{N_d})$, which is defined on the problem space. The angle between the i-th ball and the j-th ball at time $t$ is calculated by equation (2), where $f_i(t)$ is the value of the fitness function (height) of the i-th ball at time $t$.

$$\phi_{ij}^{d}(t) = \tan^{-1}\left(\frac{f_j(t) - f_i(t)}{x_i^{d}(t) - x_j^{d}(t)}\right), \quad d = 1, \dots, n \ \text{ and } \ i, j = 1, 2, \dots, N, \ i \ne j \tag{2}$$

The amount and direction of the acceleration are calculated using the equation below:

$$a_i^{d}(t) = \sum_{j=1}^{N} U\big(f_j(t) - f_i(t)\big) \cdot \sin\big(\phi_{ij}^{d}(t)\big) \tag{3}$$

where $U(\cdot)$ is the unit step function:

$$U(w) = \begin{cases} 1, & w > 0 \\ 0, & w \le 0 \end{cases} \tag{4}$$

In each iteration of the IPO algorithm, the position of each ball is updated using equation (5), in which $k_1$ and $k_2$ are two control coefficients, $rand_1$ and $rand_2$ are two random variables in the range [0,1], and $v_i^{d}(t)$ is the velocity of the i-th ball in dimension $d$ at time $t$.

$$x_i^{d}(t+1) = k_1 \cdot rand_1 \cdot a_i^{d}(t) \cdot \Delta t^{2} + k_2 \cdot rand_2 \cdot v_i^{d}(t) \cdot \Delta t + x_i^{d}(t) \tag{5}$$

$$v_i^{d}(t) = \frac{x_{best}^{d}(t) - x_i^{d}(t)}{\Delta t} \tag{6}$$

In equation (6), $x_{best}$ is the best ball position found up to the current iteration [8].

III. PRINCIPLE OF CLUSTERING

Clustering is defined as a problem-solving method that divides a number of items in a feature space into a number of groups, such that the items in each group have the most similar characteristics to each other while items in two different groups have the least similarity to each other [9]. A dataset with $n$ data vectors, each of dimension $d$, is defined as $O = \{O_1, O_2, \dots, O_n\}^T$, where $O_i = (o_i^1, o_i^2, \dots, o_i^d)$ is the i-th data vector and $o_i^j$ is the j-th feature of the i-th data vector. Thus $O_{n \times d}$ is the data matrix with $n$ rows and $d$ columns. Clustering methods group the dataset items into $k$ clusters $G = \{C_1, C_2, \dots, C_k\}$, where the clusters satisfy the following conditions:

$$C_i \ne \emptyset, \quad i = 1, 2, \dots, k \tag{7}$$

$$C_i \cap C_j = \emptyset, \quad i, j = 1, 2, \dots, k, \ i \ne j \tag{8}$$

$$\bigcup_{i=1}^{k} C_i = O \tag{9}$$

One of the most important components of a clustering algorithm is the criterion used to find the best cluster centers with a proper fitness. For this reason, various validity functions and criteria have been designed. In this paper the mean-square quantization error (MSE) is used as the fitness function for the IPO algorithm. It is defined in equation (10), where $\|O_i - Z_l\|$ is the distance between the data vector $O_i$ and the cluster center $Z_l$. In this study, the Euclidean distance of equation (11) is used as the distance measure.

$$J(O, G) = \sum_{i=1}^{n} \min_{l = 1, 2, \dots, k} \|O_i - Z_l\|^{2} \tag{10}$$

$$L(O_i, O_j) = \sqrt{\sum_{m=1}^{d} \big(O_i^{m} - O_j^{m}\big)^{2}} \tag{11}$$

In the proposed method each ball contains $k$ cluster centroids of dimension $d$. In every iteration the fitness of the balls is evaluated by the MSE, and balls move toward the better ones. An example of a ball in the IPO clustering method is illustrated below:

$$\big(\underbrace{z_1^1, z_1^2, \dots, z_1^d}_{Z_1}, \ \dots, \ \underbrace{z_k^1, z_k^2, \dots, z_k^d}_{Z_k}\big)$$

where $Z_k$ is the k-th cluster center, which is a vector of dimension $d$ [8, 9].
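The ball encoding and the fitness of equation (10), together with the update rules of equations (2) to (6), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`decode_ball`, `mse_fitness`, `acceleration`, `ipo_step`) are hypothetical, the time step is taken as 1, a fresh random number is drawn for every ball and dimension, and a zero denominator in equation (2) is mapped to an angle of pi/2. The equations are followed literally as written, including the squared distance in equation (10).

```python
import math
import random


def decode_ball(ball, k, d):
    """Split a flat ball of length k*d into the k centroids Z_1..Z_k."""
    return [ball[l * d:(l + 1) * d] for l in range(k)]


def mse_fitness(ball, data, k):
    """Equation (10): sum, over all data vectors, of the squared
    Euclidean distance to the nearest of the k centroids."""
    centers = decode_ball(ball, k, len(data[0]))
    total = 0.0
    for o in data:
        total += min(sum((om - zm) ** 2 for om, zm in zip(o, z))
                     for z in centers)
    return total


def acceleration(x, f):
    """Equations (2)-(4): a_i^d = sum_j U(f_j - f_i) * sin(phi_ij^d)."""
    n, dim = len(x), len(x[0])
    acc = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if f[j] - f[i] <= 0:  # unit step U(.) is zero here
                continue
            for d in range(dim):
                dx = x[i][d] - x[j][d]
                # Assumption: equal coordinates give an angle of pi/2.
                phi = math.atan((f[j] - f[i]) / dx) if dx != 0 else math.pi / 2
                acc[i][d] += math.sin(phi)
    return acc


def ipo_step(x, f, x_best, k1, k2, rng, dt=1.0):
    """Equations (5)-(6): return the new positions of all balls."""
    acc = acceleration(x, f)
    new_x = []
    for i, xi in enumerate(x):
        row = []
        for d, xid in enumerate(xi):
            vel = (x_best[d] - xid) / dt  # equation (6)
            row.append(k1 * rng.random() * acc[i][d] * dt ** 2
                       + k2 * rng.random() * vel * dt
                       + xid)             # equation (5)
        new_x.append(row)
    return new_x
```

In a clustering run, each row of `x` would be one ball of length k*d, `f` would hold `mse_fitness` of every ball, and `x_best` would be the ball with the lowest fitness seen so far; `rng` can be a `random.Random` instance so that runs are repeatable.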
IV. EXPERIMENTAL RESULTS

To evaluate the performance of the proposed method, we tested it on 4 well-known standard benchmarks. To compare the clustering results, the results of four other algorithms were used: Particle Swarm Optimization (PSO), Gravitational Search Algorithm (GSA) [10], Genetic Algorithm (GA) and Kmeans [11]. The results, shown in Table 1, are taken over 20 runs of each algorithm and contain the best fitness, the average fitness and the worst fitness. The standard datasets used are Iris, Wine, CMC and Cancer [8].

1- Iris: this is perhaps the most famous dataset in the literature and in the field of clustering and was collected by Anderson (1935). The Iris dataset consists of 150 instances with four numeric features and contains three classes of 50 instances each, where each class refers to a type of iris plant.

2- Wine: there are 178 instances in the Wine dataset, characterized by 13 numeric features. The features come from the chemical analysis of three types of wine. There are three categories of data: 59 objects in class 1, 71 objects in class 2, and 48 objects in class 3.

3- Wisconsin Breast Cancer: there are 683 instances with 9 numeric features in this dataset. It consists of 444 objects in class 1 (benign) and 239 objects in class 2 (malignant).

4- Contraceptive Method Choice (CMC): this dataset consists of 1473 samples in 3 classes. The samples are characterized by 9 features. There are 629 instances in class 1, 334 instances in class 2 and 510 instances in class 3.

The parameters of IPO-clustering are set to $c_1 = c_2 = 1$, $shift_1 = shift_2 = 100$, $scale_1 = scale_2 = 0.02$, and the number of iterations of the IPO algorithm is 500. $k_1$ and $k_2$ vary with time and are defined as below:

$$k_1(t) = \frac{c_1}{1 + \exp\big((t - shift_1) \cdot scale_1\big)} \tag{12}$$

$$k_2(t) = \frac{c_2}{1 + \exp\big(-(t - shift_2) \cdot scale_2\big)} \tag{13}$$

TABLE 1. RESULTS OF THE IPO-CLUSTERING ALGORITHM AND COMPARISON WITH OTHER SIMILAR METHODS

Dataset  Result    Kmeans     GA         PSO        GSA        IPO
Iris     Best      97.33      113.98     96.89      96.69      96.65
Iris     Average   106.05     125.19     97.23      96.72      96.65
Iris     Worst     120.45     139.77     97.89      96.76      96.67
Wine     Best      16555.6    16530.5    16345.9    16315.3    16302.4
Wine     Average   18061.0    16530.5    16417.4    16376.6    16301.2
Wine     Worst     18563.12   16530.5    16562.3    16425.5    16304.3
CMC      Best      5842.2     5705.6     5700.9     5698.1     5693.7
CMC      Average   5893.6     5756.5     5820.9     5699.8     5693.7
CMC      Worst     5934.4     5812.6     5923.2     5702.0     5693.8
Cancer   Best      2999.1     2999.3     2973.5     2967.9     2964.4
Cancer   Average   3251.2     3249.4     3050.0     2973.5     2964.3
Cancer   Worst     3521.5     3427.4     3318.8     2990.8     2965

In Table 1, the IPO clustering method shows better results than the other methods on every dataset, without exception. This gives a strong indication of its better speed compared with the other methods.

V. CONCLUSION

This study investigated the power of the Inclined Planes system Optimization algorithm on the supervised data clustering problem. The results on four standard datasets show the strength and reliability of the IPO-clustering algorithm. The simulation results show that the proposed method can find the solution in reasonable time and with higher quality than other similar techniques.

VI. REFERENCES

[1] K.S. Tang, K.F. Man, S. Kwong and Q. He, "Genetic algorithms and their applications", IEEE Signal Processing Magazine 13 (6), 1996.
[2] S. Kirkpatrick, C.D. Gelatt and M.P. Vecchi, "Optimization by simulated annealing", Science 220 (4598), 671–680, 1983.
[3] Z. Geem, J. Kim and G. Loganathan, "A New Heuristic Optimization Algorithm: Harmony Search", Simulation, 60-68, 2001.
[4] J.D. Farmer, N.H. Packard and A.S. Perelson, "The immune system, adaptation, and machine learning", Physica D 22, 187–204, 1986.
[5] M. Dorigo, V. Maniezzo and A. Colorni, "The Ant System: optimization by a colony of cooperating agents", IEEE Transactions on Systems, Man, and Cybernetics-Part B, vol. 26, no. 1, 1996, pp. 1-13.
[6] J. Kennedy and R.C. Eberhart, "Particle swarm optimization", Proceedings of IEEE International Conference on Neural Networks, vol. 4, 1995.
[7] R.A. Formato, "Central force optimization: a new nature inspired computational framework for multidimensional search and optimization", Studies in Computational Intelligence 129 (2008) 221–238.
[8] H. Mozaffari, H. Abdi and S.-H. Zahiri, "An inclined planes system optimization algorithm", Information Science Elsevier Journal (submitted).
[9] S.H. Zahiri, "Introduction of a novel unsupervised clustering method based on Artificial Immune System", Iranian journals of computer and electrical engineering, year 6, vol. 2, summer 2008.
[10] A. Hatamlou, "Application of Gravitational Search Algorithm on Data Clustering", Springer-Verlag Berlin Heidelberg, 2011.