
Social Spider Optimization, Colliding Bodies Optimization and their Applications in Data Clustering

Dr. Satyasai Jagannath Nanda, Senior Member IEEE
Dept. of Electronics & Communication Engineering
Malaviya National Institute of Technology Jaipur
Rajasthan-302017, India
Email: sjnanda.ece@mnit.ac.in, +91-9549654237
Supervised Learning

In supervised learning, you train the machine using data that is well "labeled," i.e. some data is already tagged with the correct answer. It can be compared to learning that takes place in the presence of a supervisor or teacher.

Examples: system identification, communication channel equalization, time series prediction, multi-class classification of a dataset.
Type of Supervised Learning

Classification: Classification means grouping the output into a class. If the algorithm tries to label the input into two distinct classes, it is called binary classification; selecting between more than two classes is referred to as multiclass classification. Example: determining whether or not someone will default on a loan.

Regression: Regression predicts a single output value using training data. Example: predicting a house price from training data, where the input variables are locality, size of the house, etc.
Unsupervised Learning
Unsupervised learning is a machine learning technique where you do not need to supervise the model; the model is allowed to work on its own to discover information.
Importance:
• Unsupervised machine learning finds all kinds of unknown patterns in data.
• Unsupervised methods help you find features that can be useful for categorization.
• It takes place in real time, so all the input data can be analyzed and labeled in the presence of learners.
• It is easier to get unlabeled data from a computer than labeled data, which needs manual intervention.
Clustering :
Supervised / Unsupervised Learning
Parameters | Supervised machine learning | Unsupervised machine learning
Process | Input and output variables are given. | Only input data is given.
Input Data | Algorithms are trained using labeled data. | Algorithms are used against data which is not labeled.
Algorithms Used | Support vector machine, neural network, linear and logistic regression, random forest, classification trees. | Cluster algorithms: K-means, hierarchical clustering, etc.
Computational Complexity | Supervised learning is a simpler method. | Unsupervised learning is computationally complex.
Use of Data | Uses training data to learn a link between the inputs and the outputs. | Does not use output data.
Accuracy of Results | Highly accurate and trustworthy method. | Less accurate and trustworthy method.
Real Time Learning | Learning takes place offline. | Learning takes place in real time.
Data Clustering
Data Clustering refers to the unsupervised classification of a dataset into
groups, such that elements within a group are similar to each other.
Example: the IRIS Flower Data Set
• 150 observations (50 observations for each flower species)
• 4 attributes (sepal length, sepal width, petal length, petal width)

[Figure: two 3-D scatter plots of the IRIS dataset (axes: Sepal Length, Sepal Width, Petal Length) — the unlabeled data (left) and the result after cluster analysis (right), with the Setosa, Versicolor and Virginica groups marked.]


Data Clustering

Hierarchical Clustering: The dataset is decomposed into several groups and the output is expressed in the form of a dendrogram plot. Effective for low-dimensional datasets. Challenges: extensive computation to create the dendrogram; ineffective for overlapping datasets. Popular algorithms: agglomerative algorithms, divisive algorithms.

Fuzzy Clustering: Each element of the dataset belongs to all the groups with a fuzzy membership grade. Effective for overlapping datasets. Challenge: the de-fuzzification process at times introduces more complexity. Popular algorithms: Fuzzy C-Means (FCM), Fuzzy C-Shells (FCS).

Partitional Clustering: The dataset is divided into a number of partitions based upon a certain criterion (fitness measure). Effective for high-dimensional datasets. Challenges: explained in detail as this presentation progresses. Popular algorithms: K-Means, K-Medoids.
Partitional Clustering
• Given a dataset X = {x_1, x_2, ..., x_N}, it represents N patterns, each having D attributes (also called features), i.e. x_i ∈ R^D.

• Objective: partition the dataset into M groups (clusters) C_1, ..., C_M such that
  • every cluster contains at least one pattern: C_m ≠ ∅;
  • there is no common pattern between any two clusters: C_m ∩ C_l = ∅ for m ≠ l;
  • the union of all the clusters is the dataset: C_1 ∪ C_2 ∪ ... ∪ C_M = X.

Partitional Clustering
• K-means based partitional clustering (Lloyd-1963, Bell Labs)
  • Uses minimization of the Euclidean distance as the similarity measure
  • Effective for circular/spherical-shaped clusters
• Density-based partitional clustering (Ester et al., 1996)
  • Formulates smaller clusters based on the density of data in a region
  • Merges the small clusters to form large clusters
  • Effective for arbitrary-shaped clusters

[Figure: a spherical-shaped cluster problem vs. an arbitrary-shaped cluster problem.]


Partitional Clustering
Nature Inspired Algorithms (next slide)

K-means based algorithms:
Algorithm | Year | Researcher
K-Means | 1963 | Lloyd
K-nearest neighbor (K-NN) | 1967 | Cover and Hart
K-medoids | 1987 | Kaufman
Bisecting K-means | 2000 | Steinbach
X-means | 2000 | Pelleg
K-harmonic means | 2001 | B. Zhang
K-modes | 2001 | Chaturvedi
Sort-means | 2002 | Phillips
Fast K-medoids | 2009 | Park and Jun
Parallel K-means | 2009 | Zhao et al.
Distributed K-means | 2011 | Forero et al.

Density based algorithms:
Algorithm | Year | Researcher
DBSCAN | 1996 | Ester et al.
Incremental DBSCAN | 1998 | Ester et al.
Parallel DBSCAN | 1999 | Xu et al.
BRIDGE | 2001 | Dash et al.
DBCluC | 2002 | Zaiane and Lee
ADBC | 2004 | Ma and Zhan
ST-DBSCAN | 2007 | Birant
VDBSCAN | 2007 | P. Liu et al.
DVBSCAN | 2010 | Ram et al.
Corr_DBSCAN | 2013 | Nanda and Panda
K-Means Clustering

Step 0: Input the data as D-dimensional vectors X_i.

Step 1: K vectors are randomly assigned to be the centroids of the K clusters.

Step 2: Each input vector is assigned to its nearest cluster/centroid, using e.g. the Manhattan distance d(x, c) = Σ_d |x_d − c_d| or the Euclidean distance d(x, c) = sqrt(Σ_d (x_d − c_d)^2).

Step 3: Update the centroid of each cluster as the mean of its assigned vectors.

Step 4: Check whether the iteration should stop, based on the total distortion (sum of squared distances of all vectors to their centroids).

Step 5: Output Y_i as the nearest centroid of the cluster containing X_i.
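The steps above can be sketched as a short NumPy routine (a minimal illustration, not a production implementation; the random initialization and the distortion-based stopping rule follow the slide, and all function and variable names are my own):

```python
import numpy as np

def kmeans(X, K, iters=100, seed=0):
    """Minimal K-means (Lloyd's algorithm) sketch: Steps 0-5 of the slide."""
    rng = np.random.default_rng(seed)
    # Step 1: pick K input vectors at random as the initial centroids
    centroids = X[rng.choice(len(X), K, replace=False)].astype(float)
    for _ in range(iters):
        # Step 2: assign each vector to its nearest centroid (Euclidean distance)
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: update each centroid as the mean of its cluster
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centroids[k] for k in range(K)])
        # Step 4: stop when the centroids (and hence the distortion) no longer change
        if np.allclose(new, centroids):
            break
        centroids = new
    distortion = (d.min(axis=1) ** 2).sum()   # total distortion of the final assignment
    return labels, centroids, distortion

# two tight, well-separated blobs as a toy dataset
X = np.vstack([np.random.default_rng(1).normal(0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5, 0.1, (20, 2))])
labels, C, J = kmeans(X, K=2)
```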
Data Clustering as an Optimization Problem
Nature Inspired Algorithms for Clustering

Type | Algorithm | Inventor | Single-Obj. Clustering | MO Algorithm | MO Clustering
Evolutionary Algorithms | Genetic Algorithm (GA) | Holland-1975 | Bezdek-1994 | NSGA-II | Bandyopadhyay-2007
| Evolutionary Strategy (ES) | Rechenberg-1989 | Babu & Murty-1994 | MO-ES | Costa-2002
| Genetic Programming (GP) | Koza-1992 | Falco-2006 | MO-GP | Coelho-2010
| Differential Evolution (DE) | Storn-1997 | Paterlini-2004 | MO-DE | Suresh-2009
Physical Algorithms | Simulated Annealing (SA) | Kirkpatrick-1983 | Selim-1991 | MO-SA | Bandyopadhyay-2008
| Memetic Algorithm (MA) | Moscato-1989 | P. Merz-2003 | MO-MA | Knowles-2000
| Harmony Search (HS) | Geem-2001 | Mahdavi-2008 | MO-HS | Ricart-2011
| Shuffled Frog-Leaping Algorithm (SFL) | Eusuff-2006 | Amiri-2009 | MO-SFL | J. Liu-2012
Swarm Intelligence | Ant Colony Opt. (ACO) | Dorigo-1992 | Shelokar-2004 | MO-ACO | Santos-2009
| Particle Swarm Opt. (PSO) | Kennedy-1995 | Omran-2002 | MO-PSO | S. Das-2008
Nature Inspired Algorithms for Clustering
Type | Algorithm | Inventor | Single-Obj. Clustering | MO Algorithm | MO Clustering
Swarm Intelligence | Artificial Bee Colony (ABC) | Basturk-2006 | Zhang-2010 | MO-ABC | Akbari-2012
| Fish Swarm Algorithm (FSA) | Li et al.-2002 | Cheng-2009 | MO-FSA | Jiang-2011
Bio-inspired Algorithms | Artificial Immune System (AIS) | de Castro-2002 | Nasraoui-2003 | MO-AIS | Gong-2007
| Bacterial Foraging Opt. (BFO) | Passino-2002 | Wan-2011 | MO-BFO | Niu-2012
| Krill Herd Algorithm | Gandomi-2012 | | |
Other Nature Inspired Algorithms | Cat Swarm Opt. (CSO) | S. C. Chu-2006 | Santosa-2009 | MO-CSO | Pradhan-2012
| Cuckoo Search Algo. (CSA) | X.S. Yang-2009 | Goel-2011 | MO-CSA | X.S. Yang-2012
| Firefly Algorithm (FA) | X.S. Yang-2009 | Senthilnath-2011 | MO-FA | X.S. Yang-2012
| Invasive Weed Optimization Algo. (IWO) | Mehrabian-2006 | Chowdhury-2011 | MO-IWO | R. Liu-2012
| Gravitational Search (GS) | Rashedi-2007 | Hatamlou-2011 | MO-GS | Hassanzadeh-2010
| Bat-inspired Algorithm | X.S. Yang-2010 | | MO-BAT | X.S. Yang-2009
Steps of Benchmark Nature Inspired Algorithms
Fitness Evaluation/Cluster Validity Index
• Cluster validity indices are statistical functions used for the quantitative evaluation of the clusters derived from a dataset (they can also be used as fitness functions).

Statistical Function | Nature for Clustering | Used in Research Papers
Medoid Distance | Minimization | Lucasius-1993, Sheng & Liu-2004
Centroid Distance | Minimization | Murty-1996, Maulik-2000, Zhang-2011
Distortion Distance | Minimization | Krishna & Murty-1999, Kivijarvi-2003, Lu-2004
Variance Ratio Criterion | Maximization | Cowgill-1999, Casillas-2003
Intra and Inter Cluster Distance | Min/Max depending on ratio | Tseng & Yang-2001, Das-2008, Nanda-2012, Handl and Knowles-2004, 2007, 2010
Dunn's index | Maximization | Dunn-1974, Zhang and Cao-2011
Davies-Bouldin index | Minimization | Davies & Bouldin-1979, Bandyopadhyay-2002
CS Measure | Minimization | Chou-2004, Das-2008
Silhouette Index | Maximization | Kaufman-2008, Hruschka-2009
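As a concrete instance of a validity index usable as a fitness function, here is a sketch of the Davies-Bouldin index from the table (lower is better; a compact NumPy version written for illustration, not the authors' code):

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: mean over clusters of the worst (S_i + S_j)/d(c_i, c_j)."""
    ids = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ids])
    # S_k: average distance of cluster-k points to their own centroid (scatter)
    S = np.array([np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
                  for i, k in enumerate(ids)])
    db = 0.0
    for i in range(len(ids)):
        ratios = [(S[i] + S[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ids)) if j != i]
        db += max(ratios)
    return db / len(ids)

X = np.array([[0, 0], [0, 1], [10, 0], [10, 1]], dtype=float)
good = davies_bouldin(X, np.array([0, 0, 1, 1]))   # compact, well-separated clusters
bad = davies_bouldin(X, np.array([0, 1, 0, 1]))    # mixed-up labels
```

Because the index decreases for compact, well-separated clusters, a nature-inspired optimizer can simply minimize it as its fitness.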
Social Spider Algorithm

• Swarm-based approach
• Search space: the communal web
• Two different agents: male and female spiders
• Female-biased population
• Solution: position of a spider
• Fitness: weight of a spider
• Communication: the web
Social Spider Algorithm

[Figure: a swarm of spiders on the communal web — dominant male, non-dominant males, female spiders and baby spiders.]
Social Spider Algorithm

1. 60-90% of the total spider population is composed of females.
2. The swarm size is constant.
3. The fitness of each spider is expressed in terms of its weight.
4. Communication is designed with the help of vibrations on the web.
5. The Percentage Factor (PF) determines the female socialization behavior and is a constant with a value of 0.7.
6. Mating is carried out when a dominant male is able to locate a group of females within his mating radius in the search space.
Social Spider Algorithm
1. Population Initialization
• P = total number of spiders
• NF = floor((0.9 − rand × 0.25) × P), the number of females
• NM = P − NF, the number of males
• T is the dimensionality of the search space

2. Fitness Evaluation
Minimization of the Euclidean distance between each data point and its cluster center, where the spider's position represents the cluster centers.
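The initialization rule above can be sketched as follows (a toy illustration; the function name, bounds and seed are mine, and the female/male split follows the slide's floor((0.9 − rand × 0.25) × P) formula):

```python
import numpy as np

def init_population(P, T, lo, hi, seed=0):
    """Split P spiders into females/males per the slide's rule and scatter them."""
    rng = np.random.default_rng(seed)
    # NF lies between 65% and 90% of P, so the population is female-biased
    NF = int(np.floor((0.9 - rng.random() * 0.25) * P))
    NM = P - NF
    # every spider starts at a random T-dimensional position on the communal web
    pos = lo + rng.random((P, T)) * (hi - lo)
    return pos, NF, NM

pos, NF, NM = init_population(P=40, T=3, lo=-5.0, hi=5.0)
```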
Social Spider Algorithm
3. Assigning Weight to the Spiders
The spider with the maximum weight is the best spider in the colony.

4. Male Population Fragmentation
The male population is split into dominant and non-dominant males.

Communication Among Spiders
• Vibc_i: vibrations received by the i-th spider from the nearest (or local) spider.
• Vibb_i: vibrations received by the i-th spider from the spider having the highest weight among all (the global spider).
Communication Among Spiders (contd.)
• Vibf_i: vibrations received by the i-th dominant male spider from the nearest female spider.
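A sketch of how such vibrations can be computed, assuming the weight-times-Gaussian-decay form Vib = w_j · exp(−d_ij²) used in Cuevas et al.'s social spider optimization (the helper names and the "nearest heavier spider" choice for the local signal are my own illustration):

```python
import numpy as np

def vibration(w_j, d_ij):
    """Vibration perceived from spider j: its weight damped by squared distance."""
    return w_j * np.exp(-d_ij ** 2)

def vibrations_for(i, pos, w):
    """Local and global vibrations sensed by spider i on the web."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    heavier = np.flatnonzero(w > w[i])                  # candidates for the local signal
    c = heavier[d[heavier].argmin()] if len(heavier) else i
    b = int(np.argmax(w))                               # globally best (heaviest) spider
    return vibration(w[c], d[c]), vibration(w[b], d[b])

pos = np.array([[0.0], [1.0], [2.0]])
w = np.array([0.1, 0.9, 0.5])
vc, vb = vibrations_for(0, pos, w)
```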
5. Female Movements (Exploration)

6. Dominant Male Movements (Mating)

7. Non-Dominant Male Movements (Exploitation)
Step 8: Mating
[Figure: the mating radius of a dominant male defines the sets T1 and T2 of nearby spiders.]

Mating for the T1 set:

Spider | Position | Weight | P(si)
S1 (M1) | 0.2 | 0.4 | 0.26
S2 (F1) | 0.45 | 0.3 | 0.19
S3 (F2) | 0.1 | 0.5 | 0.36
S4 (F3) | 0.02 | 0.29 | 0.19
Sum of weights: 1.490
Snew | 0.1 | 0.5 |

[Pie chart: roulette-wheel selection probabilities — M1 26%, F1 19%, F2 36%, F3 19%.]
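The roulette-wheel selection behind this mating example can be sketched as follows, using the example's weights (0.4, 0.3, 0.5, 0.29); each element of the offspring Snew is copied from a parent drawn with probability P(si) = w_i / Σw (the variable names and seed are mine):

```python
import numpy as np

# weights of the mating group from the slide's example (M1, F1, F2, F3)
w = np.array([0.4, 0.3, 0.5, 0.29])
p = w / w.sum()                       # roulette probabilities; w.sum() = 1.49

rng = np.random.default_rng(0)
# each element of the offspring is copied from a parent drawn by roulette wheel
parents = np.array([[0.2], [0.45], [0.1], [0.02]])   # parent positions (1-D here)
idx = rng.choice(len(w), size=parents.shape[1], p=p)
s_new = parents[idx, np.arange(parents.shape[1])]
```

F2 carries the largest probability (≈36%), so the offspring most often inherits its elements, which matches the slide's Snew taking F2's values.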
Original Social Spider Algorithm
[Flowchart: START → population and parameter initialization (with database/temporary storage) → fitness analysis (comparing the current and stored fitness values) → vibration design → female update → dominant-male update → non-dominant-male update → mating → new population; from generation 1 onward the loop repeats until the stopping criterion is reached, after which the final population is reported and the algorithm stops.]
Parallel Social Spider Algorithm
[Flowchart: identical to the original algorithm, except that the female, dominant-male and non-dominant-male updates are executed in parallel before mating and formation of the new population.]
Benchmark Datasets Used for Analysis
Small Dimensional:
Sr. No | Data Set | Observations | Attributes | Classes | Train | Test
1 | Iris | 150 | 4 | 3 | 112 | 38
2 | Balance | 625 | 4 | 3 | 469 | 156
3 | Wine | 178 | 13 | 3 | 133 | 45
4 | Breast Cancer | 699 | 10 | 2 | 524 | 175
5 | Thyroid | 215 | 5 | 3 | 162 | 53
6 | Credit | 690 | 14 | 2 | 518 | 172
7 | Heart | 303 | 14 | 5 | 227 | 76
8 | Glass | 214 | 10 | 7 | 161 | 53
9 | Horse | 364 | 28 | 3 | 273 | 91

Labeled High Dimensional:
Sr. No | Data Set | Observations | Attributes | Classes
1 | DIM1 | 1024 | 1024 | 16
2 | DIM2 | 1024 | 32 | 16
3 | DIM3 | 1024 | 512 | 16
4 | KDD CUP04 BIO | 145741 | 74 | 10
Results Analysis
Number of iterations: 1000
Independent runs: 10
Evaluation metric: CEP (Classification Error Percentage), the rate of misclassification.

Sr. No | Data Set | SSO | ABC [1] | PSO [2] | RGA | K-means
1 | Iris | 0 | 0 | 2.63 | 0 | 0
2 | Balance | 30.12 | 15.38 | 25.47 | 28.20 | 32.69
3 | Wine | 0 | 0 | 2.22 | 22.72 | 15.90
4 | Breast Cancer | 1.32 | 0 | 2.87 | 25.24 | 41.37
5 | Thyroid | 3.77 | 3.77 | 5.55 | 18.86 | 5.66
6 | Credit | 21.25 | 13.37 | 22.96 | 20.26 | 30.81
7 | Heart | 15.66 | 14.47 | 17.46 | 55.21 | 56.00
8 | Glass | 32.07 | 41.50 | 39.05 | 52.63 | 50.94
9 | Horse | 36 | 38.26 | 40.98 | 30.12 | 29.33

[1] D. Karaboga, C. Ozturk, "A novel clustering approach: Artificial Bee Colony (ABC) algorithm," Applied Soft Computing, vol. 11, no. 1, pp. 652-657, 2011.
[2] I. De Falco, A. Della Cioppa, E. Tarantino, "Facing classification problems with particle swarm optimization," Applied Soft Computing, vol. 7, no. 3, pp. 652-658, 2007.
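The CEP metric used in these tables reduces to a one-line computation — the percentage of test patterns whose predicted cluster label differs from the true class (a minimal sketch; the function name is mine):

```python
import numpy as np

def cep(y_true, y_pred):
    """Classification Error Percentage: share of misclassified test patterns."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.count_nonzero(y_true != y_pred) / len(y_true)

print(cep([0, 1, 2, 1], [0, 1, 2, 2]))  # 25.0 (one of four patterns misclassified)
```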
KDD CUP04 BIO Dataset with 10 Clusters
Computational Time Analysis
Sr. No | Parameter Setting | Value
1 | Population size | 50
2 | Number of independent runs | 10
3 | Mating probability | 75%

Sr. No | Dataset | P-SSO Time (s) | P-SSO CEP | O-SSO Time (s) | O-SSO CEP
1 | Statlog | 25.0623 | 7.3055 | 44.4226 | 8.455
2 | Handwritten Digit | 25.0435 | 2.7988 | 35.3781 | 3.5313
3 | Dermatology | 1.4142 | 13.3880 | 1.626 | 14.235
4 | Mice | 5.0680 | 6.3889 | 7.6621 | 13.250
Example: Analysis of the Flood-Affected Area of Chennai

[Image captured on Oct. 21, 2015 — water retention before the flood.]
[Image captured on Dec. 08, 2015 — water retention after the flood.]

Publication & References

1) U. P. Shukla and S. J. Nanda, "Parallel social spider clustering algorithm for high dimensional datasets," Engineering Applications of Artificial Intelligence, Elsevier, vol. 56, pp. 75-90, 2016.
2) U. P. Shukla and S. J. Nanda, "A Binary Social Spider Optimization Algorithm for Unsupervised Band Selection in Compressed Hyperspectral Images," Expert Systems with Applications, Elsevier, vol. 97, pp. 336-356, 2018.
3) U. P. Shukla and S. J. Nanda, "Dynamic clustering with binary social spider algorithm for streaming dataset," Soft Computing, Springer, vol. 23, no. 21, pp. 10717-10737, 2019.
4) U. P. Shukla and S. J. Nanda, "Designing of a Risk Assessment Model for Issuing Credit Card Using Parallel Social Spider Algorithm," Applied Artificial Intelligence, Taylor & Francis, vol. 33, no. 3, pp. 191-207, 2019.
5) R. Gupta, S. J. Nanda, and U. P. Shukla, "Cloud detection in satellite images using multi-objective social spider optimization," Applied Soft Computing, Elsevier, vol. 79, pp. 203-226, 2019.
6) S. Aggarwal, P. Chatterjee, R. P. Bhagat, K. K. Purbey, S. J. Nanda, "A Social Spider Optimization Algorithm with Chaotic Initialization for Robust Clustering," Procedia Computer Science, Elsevier, 2019.
Colliding Bodies Optimization

A. Kaveh and V. Mahdavi, "Colliding bodies optimization: a novel meta-heuristic method," Computers & Structures, Elsevier, vol. 139, pp. 18-27, 2014.

• Inspired by the physical phenomenon of collision between bodies, which moves them to a lower energy level in the search space.
• Parameter-free algorithm (no parameter tuning is required).
• Consists of simple equations governing the velocity and position updates, so the computational complexity involved is very low.

BOOK: A. Kaveh, V. R. Mahdavi, Colliding Bodies Optimization: Extensions and Applications, Springer Verlag, Switzerland, 2015.
Colliding Bodies Optimization
[Figure: a moving body collides with a stationary body trapped near a local optimum; the collision, governed by the coefficient of restitution ε, drives the pair toward the global optimum at the minimum energy level.]
Colliding Bodies Optimization (CBO)

Conservation of momentum: m_1 v_1 + m_2 v_2 = m_1 v_1' + m_2 v_2'

Conservation of kinetic energy: (1/2) m_1 v_1^2 + (1/2) m_2 v_2^2 = (1/2) m_1 v_1'^2 + (1/2) m_2 v_2'^2 + Q

New velocities after the collision:

v_1' = [(m_1 − ε m_2) v_1 + (m_2 + ε m_2) v_2] / (m_1 + m_2)
v_2' = [(m_2 − ε m_1) v_2 + (m_1 + ε m_1) v_1] / (m_1 + m_2)

where the coefficient of restitution (COR) is ε = |v_2' − v_1'| / |v_1 − v_2|.
Colliding Bodies Optimization (CBO)

• Step 1: Initialize the positions of the colliding bodies.
In CBO each solution x_i is considered a colliding body. The initial positions of the K colliding bodies are determined randomly as

x_i = x_min + rand (x_max − x_min), i = 1, 2, ..., K

• Step 2: Evaluate the fitness of the colliding bodies.
Calculate the fitness f(x_i) of each body and sort the bodies by best fitness value.

• Step 3: Calculate the mass of the colliding bodies.
For a minimization problem:

m_i = (1 / fit_i) / Σ_{j=1}^{K} (1 / fit_j), i = 1, 2, ..., K

For a maximization problem, 1/fit_i is replaced by fit_i.
Colliding Bodies Optimization (CBO)

Step 4: Initial velocities of the colliding bodies.
The fitter half of the bodies are taken as stationary; the others are moving bodies.

Stationary CBs: v_i = 0, i = 1, 2, ..., K/2
Moving CBs: v_i = x_{i−K/2} − x_i, i = K/2 + 1, K/2 + 2, ..., K

Step 5: Velocities of the bodies after collision.

Stationary CBs: v_i' = (m_{i+K/2} + ε m_{i+K/2}) v_{i+K/2} / (m_i + m_{i+K/2}), i = 1, 2, ..., K/2
Moving CBs: v_i' = (m_i − ε m_{i−K/2}) v_i / (m_i + m_{i−K/2}), i = K/2 + 1, K/2 + 2, ..., K
Colliding Bodies Optimization (CBO)
• Step 6: Update the positions of the colliding bodies.

Stationary CBs: x_i^new = x_i + rand · v_i', i = 1, 2, ..., K/2
Moving CBs: x_i^new = x_{i−K/2} + rand · v_i', i = K/2 + 1, K/2 + 2, ..., K

where rand is a random number uniformly distributed in the range (−1, 1).

• Step 7: Calculation of the coefficient of restitution.
The COR is considered to decrease linearly to zero: ε = 1 − iter / iter_max

• Step 8: Termination criterion.
Steps 2-7 are repeated until a termination criterion, such as a maximum number of iterations or a sufficiently good fitness value, is reached.
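Steps 1-8 can be condensed into a short sketch (a minimal NumPy illustration under my own naming and the assumptions that K is even and fitness values are positive; it also tracks the best body seen, which the slide's steps leave implicit):

```python
import numpy as np

def cbo(f, lo, hi, K=20, iters=200, seed=0):
    """Colliding Bodies Optimization sketch, Steps 1-8 (minimization).
    Assumes K is even and f returns positive values."""
    rng = np.random.default_rng(seed)
    x = lo + rng.random((K, len(lo))) * (hi - lo)    # Step 1: random bodies
    best_x, best_f = None, np.inf
    h = K // 2
    for it in range(iters):
        fit = np.array([f(xi) for xi in x])          # Step 2: fitness, then sort
        order = np.argsort(fit)
        x, fit = x[order], fit[order]                # fittest bodies first
        if fit[0] < best_f:
            best_x, best_f = x[0].copy(), fit[0]
        m = (1.0 / fit) / np.sum(1.0 / fit)          # Step 3: masses
        eps = 1.0 - it / iters                       # Step 7: COR decays to zero
        v = x[:h] - x[h:]                            # Step 4: moving-body velocities
        ms, mm = m[:h, None], m[h:, None]
        v1 = (mm + eps * mm) * v / (ms + mm)         # Step 5: stationary, post-collision
        v2 = (mm - eps * ms) * v / (ms + mm)         #          moving, post-collision
        r = rng.uniform(-1.0, 1.0, x.shape)
        x = np.vstack([x[:h] + r[:h] * v1,           # Step 6: position updates
                       x[:h] + r[h:] * v2])
        x = np.clip(x, lo, hi)                       # keep bodies in the search space
    return best_x, best_f

best, val = cbo(lambda z: float(np.sum(z ** 2)) + 1.0,
                lo=np.array([-5.0, -5.0]), hi=np.array([5.0, 5.0]))
```

On this shifted 2-D sphere function (minimum value 1.0 at the origin) the bodies contract toward the optimum as the coefficient of restitution decays.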
Selected Journal Publications
1. R. Gupta, S. J. Nanda, "Improved framework of many-objective evolutionary algorithm to handle cloud detection problem in satellite imagery," IET Image Processing, vol. 14, pp. 1495-4807, 2020 (IF 1.995).
3. R. K. Vijay, S. J. Nanda, "Shared Nearest Neighborhood Intensity Based Declustering Model for Analysis of Spatio-Temporal Seismicity," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 12, no. 5, pp. 1619-1627, 2019 (IF 3.392).
4. R. Gupta, S. J. Nanda, U. P. Shukla, "Cloud detection in satellite images using multi-objective social spider optimization," Applied Soft Computing, Elsevier, vol. 79, pp. 203-226, 2019 (IF 4.873).
5. U. P. Shukla, S. J. Nanda, "A Binary Social Spider Optimization Algorithm for Unsupervised Band Selection in Compressed Hyper. Images," Expert Systems with Applications, Elsevier, vol. 97, pp. 336-356, 2018 (IF 4.577).
Selected International Conference Proceedings:
1. R. Gupta, S. J. Nanda, "A Binary NSGA-III for Unsupervised Band Selection in Hyper-spectral Satellite Images," IEEE Congress on Evolutionary Computation (IEEE-CEC-2019), Wellington, New Zealand, 2019.
2. P. Khare, S. J. Nanda, R. K. Vijay, "Clustering Networks based on Physarum Optimization for Seismic Catalogs Analysis," IEEE Congress on Evolutionary Computation (IEEE-CEC-2019), Wellington, New Zealand, 2019.
3. D. K. Kotary, S. J. Nanda, "Point Symmetry Distance based K-Means Algorithm for Distributed Clustering in Peer to Peer Networks," IEEE Int. Conf. on Systems, Man and Cybernetics (IEEE-SMC-2019), Bari, Italy, 2019.
4. R. K. Vijay, S. J. Nanda, "De-clustering of an earthquake catalog based on ergodicity using parallel Grey Wolf Optimization," IEEE Congress on Evolutionary Computation (IEEE-CEC-2017), San Sebastian, Spain, 2017.
5. I. Kumawat, S. J. Nanda, "Multi-objective whale optimization," IEEE Region 10 Conference (IEEE-TENCON-2017), Penang, Malaysia, 2017.
