
2013 NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013)

Hardware-based Parallel Firefly Algorithm for Embedded Applications

Daniel M. Muñoz∗, Carlos H. Llanos†, Leandro dos Santos Coelho‡ and Mauricio Ayala-Rincón§

∗ Electronics Engineering Graduate Program, Faculty of Gama, University of Brasilia, Brasilia, DF, Brazil
Email: damuz@unb.br

† Department of Mechanical Engineering, Automation and Control Group/GRACO, University of Brasilia, Brasilia, DF, Brazil
Email: llanos@unb.br

‡ Industrial and Systems Engineering Graduate Program, LAS/PPGEPS, Pontifical Catholic University of Parana, Curitiba, Parana, Brazil
Email: leandro.coelho@pucpr.br

§ Departments of Computer Science and Mathematics, University of Brasilia, Brasilia, DF, Brazil
Email: ayala@unb.br

Abstract—The firefly algorithm (FA) is a new population-based metaheuristic inspired by the flashing behavior of fireflies. As a population-based algorithm, the FA suffers from long execution times, particularly in embedded optimization problems with limited computational resources. To reduce execution times, we propose a parallel hardware architecture of the FA that facilitates its implementation in Field Programmable Gate Arrays (FPGAs). In addition, this work proposes the application of the opposition-based learning (OBL) approach to the FA. The respective hardware implementation (HPOFA) was mapped onto a Virtex5 FPGA device. Numerical experiments using four well-known benchmark problems demonstrate that the opposition-based approach improves the FA, preserving swarm diversity and avoiding premature convergence. Synthesis results show that the HPOFA architecture is effectively mapped in hardware and is suitable for embedded applications.

Keywords—Optimization engines; swarm intelligence; firefly algorithm; FPGAs; floating-point arithmetic

978-1-4673-6383-9/13/$31.00 ©2013 IEEE 39

I. INTRODUCTION

Bioinspired algorithms have been widely and successfully used in many fields to solve optimization problems. The firefly algorithm (FA) is a new population-based stochastic technique inspired by the flashing behavior of fireflies. As a population-based technique, the FA has an intrinsic parallelism that can be exploited to reduce the execution time, which is the main drawback of swarm intelligence techniques. Long execution times are most evident in embedded applications with limited computational capabilities.

The FA has been used to solve combinatorial and unimodal/multimodal numerical optimization problems in which computation time is not a restriction, so the optimization process can be executed offline. However, some applications must solve optimization problems under strict time constraints, or even in real time. Examples include online training of neural networks, adaptive filters, adaptive control, parameter estimation and mobile robotics, among others.

Field Programmable Gate Arrays (FPGAs) are a suitable platform for exploiting the parallelism of the FA in embedded optimization problems. In this context, implementing the FA on FPGA devices can provide high-performance hardware solutions that operate at low clock frequencies. This allows such implementations to consume less energy than common desktop solutions.

Several adaptive techniques have been applied to population-based algorithms to improve solution quality, in terms of the approximation to the optimal solution. One of these techniques is artificial diversity. It aims to maintain the global search capabilities of the algorithms and to avoid premature convergence, which leaves particles clustered at sub-optimal points when solving multimodal problems. However, these techniques add to the execution time of the algorithms.

This paper proposes an FPGA implementation of a novel firefly algorithm. The proposed algorithm makes use of the opposition-based learning (OBL) approach. OBL can be used in two ways: (1) to improve the efficiency of algorithms by also searching in the direction opposite to the current search, and (2) to introduce artificial diversity that avoids premature convergence.

A distinguishing feature of the proposed circuits is the use of floating-point arithmetic, which allows a large dynamic range and high-precision computations. All the arithmetic and trigonometric operators were developed using the error as a design criterion, in addition to area cost, performance and power consumption [1], [2]. In this work, a 27-bit representation (8 and 18 bits for the exponent and mantissa words, respectively) was used. It allows the operators to be
implemented using 50% fewer DSP blocks while maintaining a dynamic range similar to that of the single-precision format (32 bits) [3].

Synthesis results demonstrate that the proposed architecture is effectively mapped onto commercial FPGA devices. The architecture, called Hardware Parallel Opposition-based Firefly Algorithm (HPOFA), was validated on four well-known benchmarks using different swarm sizes and problem dimensionalities. Numerical experiments and a statistical significance analysis demonstrated the correct behavior of the proposed solution. Execution time results show that, in the worst case, the hardware implementation of the FA achieves a speed-up factor of 1156 compared with an implementation on a MicroBlaze embedded soft processor. Finally, the execution times of the HPOFA operating at 100 MHz were similar to those of a typical desktop solution operating at 1.6 GHz.

The remainder of this paper is organized as follows. An overview of the FA and the OBL approach is provided in Section II. The proposed hardware implementation is described in Section III, and synthesis and validation results are reported in Section IV. Finally, Section V concludes the work.

II. BACKGROUND

A. Firefly Algorithm

The FA is a stochastic optimization technique inspired by the social behavior of Lampyridae insects, such as lightning bugs or fireflies. These luminous insects produce rhythmic bioluminescent patterns that are used to attract mates or prey. Recently, the FA has been applied to numerical and combinatorial optimization problems [4], [5], [6], [7].

In the original FA, introduced by Yang [8], [9], the flashing patterns are used to simulate the communication process between fireflies. Three main assumptions were made:
(a) fireflies are unisex, so one firefly can be attracted to any other firefly;
(b) attractiveness is proportional to brightness, so less bright fireflies are attracted to brighter ones;
(c) brightness is associated with the fitness function to be optimized, i.e., $I(x) \propto f(x)$.

The attractiveness $\beta$ is proportional to the light intensity $I$. However, since light is absorbed by the medium, the light intensity decreases with the distance from the source. Thus, assuming a medium with a constant absorption coefficient $\gamma$, the attractiveness between two adjacent fireflies can be computed using a monotonically decreasing function, as shown in equation (1):

$$\beta = \beta_0 \, e^{-\gamma \, r_{ik}^{m}}, \qquad m \ge 1 \tag{1}$$

where $\beta_0$ is the attraction at $r_{ik} = 0$ and $r_{ik}$ is the Cartesian distance between fireflies $i$ and $k$. The parameter $\gamma$ characterizes the variation of the attractiveness and directly affects the convergence speed of the optimization algorithm.

The movement of firefly $i$ attracted to another adjacent firefly $k$ with a brighter light is determined by equation (2):

$$x_i^{(t+1)} = x_i^{(t)} + \beta \bigl(x_k^{(t)} - x_i^{(t)}\bigr) + \alpha \bigl(U_i - \tfrac{1}{2}\bigr) \tag{2}$$

where $U_i$ is a uniformly distributed random number in the range $[0, 1]$ and $\alpha$ is a randomization parameter. The equation of movement therefore has three components: (a) an individual component; (b) an attraction component; and (c) a randomization component. In practice, $\beta_0 = 1$. The parameter $\alpha \in [0, 1]$ decreases over the iterations according to equation (3), where $\delta \le 1$ is close to 1, so that the solutions are refined during the last iterations.

$$\alpha^{(t+1)} = \alpha^{(t)} \cdot \delta \tag{3}$$

The pseudocode of the original FA is presented in Algorithm 1 [8]. A single firefly with lower intensity is attracted to all other fireflies with higher intensity (see the nested for-loops in lines 6 and 7). Notice that when fireflies move to solutions without fitness improvement, they do not return to their previous position (the one with the best fitness). This lack of local memory allows the fireflies to explore sub-optimal solutions and, as a consequence, degrades the performance of the optimization algorithm.

Algorithm 1 Pseudocode for the FA algorithm
1: Set swarm size $S$, dimensionality $N$, search space domain $[x_{min}, x_{max}]$, $\alpha^{(1)}$, $\delta$ and maximum number of iterations maxIter
2: Generate the initial population of $S$ random fireflies
3: Compute the intensity values $I_i$ by evaluating the objective function $f(x_i)$
4: Define the absorption coefficient $\gamma$
5: repeat
6:   for $i = 1$ to $S$ do
7:     for $k = 1$ to $S$ do
8:       if $I_k < I_i$ (minimization problem) then
9:         Compute the distance $r_{ik}$ and the attraction $\beta$ using equation (1)
10:        Create a new solution $x_i$ using equation (2)
11:        Evaluate the new solution $f(x_i)$ and update the light intensity $I_i$
12:      end if
13:    end for
14:  end for
15:  Rank the fireflies according to their fitness values and update the best intensity value $I_{best}$
16:  Update the best solution $x_g$
17:  Compute $\alpha$ using equation (3)
18:  iter = iter + 1
19: until iter = maxIter

B. Global Firefly Algorithm

Several simplifications were made to avoid the problems mentioned above and to facilitate a parallel hardware implementation of the FA. This paper proposes a modified FA in which the fireflies are attracted exclusively toward the firefly holding the global best solution. This approach, called Global Firefly Algorithm (GFA), is presented in Algorithm 2.

C. Opposition-based Learning

The OBL approach, first introduced by Tizhoosh [10], is a simple technique. It allows population-based algorithms to search for an optimal point in the direction opposite to the current search. The basic idea is that whenever an (unsatisfactory) solution is being exploited in one direction, it is beneficial to consider the opposite direction as well [11]. The OBL approach is based on the definition of the opposite number, given by equation (4):

$$\breve{x} = a + b - x \tag{4}$$
where $x$ is a real number defined in the range $[a, b]$ and $\breve{x}$ is the opposite number of $x$. This definition is also valid for an $N$-dimensional point whose components $x_i$ are defined in the ranges $[a_i, b_i]$, $i = 1, \ldots, N$.

Algorithm 2 Pseudocode for the GFA algorithm
1: Set swarm size $S$, dimensionality $N$, search space domain $[x_{min}, x_{max}]$, $\alpha^{(1)}$, $\delta$ and maximum number of iterations maxIter
2: Generate the initial population of $S$ random fireflies
3: Compute the intensity values $I_i$ by evaluating the objective function $f(x_i)$
4: Define the absorption coefficient $\gamma$
5: repeat
6:   for $i = 1$ to $S$ do
7:     if $I_{best} < I_i$ (minimization problem) then
8:       Compute the distance $r_{ik}$ and the attraction $\beta$ using equation (1)
9:       Create a new solution $x_i$ using equation (2)
10:      Evaluate the new solution $f(x_i)$ and update the light intensity $I_i$
11:    end if
12:  end for
13:  Rank the fireflies according to their fitness values and update the best intensity value $I_{best}$
14:  Update the best solution $x_g$
15:  Compute $\alpha$ using equation (3)
16:  iter = iter + 1
17: until iter = maxIter

The OBL technique was initially applied to genetic algorithms, in which anti-chromosomes allow the search process to be accelerated. OBL has also been applied to neural computing, in which the concepts of opposite weights and opposite networks can be used to improve the results [10]. Rahnamayan et al. [12] formally demonstrated that, for an unknown function in an $N$-dimensional space, $\breve{x}_i$ has a higher chance of being closer to the solution than $x_i$. They also performed an empirical verification of these mathematical proofs, demonstrating the feasibility of the OBL approach. Recently, the OBL approach has been applied to Differential Evolution (DE), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) algorithms [11], [13], [14].

Note that OBL can improve the solutions of the optimization problem in two ways: by searching in the direction opposite to the current search, and by introducing artificial diversity into the swarm. Another important aspect is that OBL has low computational requirements compared with other diversity-guided optimization techniques [15], [16]. This allows OBL-based algorithms to be easily implemented in hardware using the NOT operator (for symmetric search spaces) and a few adders (for non-symmetric search spaces).

D. Global Opposition-based Firefly Algorithm

In this work, the OBL approach is applied to the GFA. The pseudocode of the proposed approach, called Global Opposition-based Firefly Algorithm (GOFA), is presented in Algorithm 3. Lines 8 to 15 show that OBL is applied randomly to some dimensions once firefly $i$ reaches a maximum number of iterations without fitness improvement (maxFNC). In addition, a uniformly distributed random number in the range $[-0.5, 0.5]$ is used to apply a small random modification to the new solution (see line 12).

Algorithm 3 Pseudocode for the GOFA algorithm
1: Set swarm size $S$, dimensionality $N$, search space domain $[x_{min}, x_{max}]$, $\alpha^{(1)}$, $\delta$, maximum number of iterations maxIter and maximum number of iterations without fitness improvement maxFNC
2: Generate the initial population of $S$ random fireflies
3: Compute the intensity values $I_i$ by evaluating the objective function $f(x_i)$
4: Define the absorption coefficient $\gamma$
5: repeat
6:   for $i = 1$ to $S$ do
7:     if $I_{best} < I_i$ (minimization problem) then
8:       if $trial_i$ = maxFNC then
9:         $trial_i = 0$
10:        for $j = 1$ to $N$ do
11:          if rand > 0.5 then
12:            $x_{ij} = -x_{ij} + U[-1, 1]/2$   // apply OBL
13:          end if
14:        end for
15:      else
16:        Compute the distance $r_{ik}$ and the attraction $\beta$ using equation (1)
17:        Create a new solution $x_i$ using equation (2)
18:      end if
19:      Evaluate the new solution $f(x_i)$ and update the light intensity $I_i$
20:      if $I_i \ge f_{ind,i}$ then
21:        $trial_i = trial_i + 1$
22:      else
23:        $trial_i = 0$
24:        $f_{ind,i} = I_i$
25:      end if
26:    end if
27:  end for
28:  Rank the fireflies according to their fitness values and update the best intensity value $I_{best}$
29:  Update the best solution $x_g$
30:  Compute $\alpha$ using equation (3)
31:  iter = iter + 1
32: until iter = maxIter

III. FPGA IMPLEMENTATION

The proposed GOFA algorithm was described in the VHDL hardware description language and mapped onto a Virtex5 FPGA (chip xc5vlx110t). All the hardware implementations are based on floating-point operators previously developed in [1], [2]. A tradeoff analysis presented in [3] shows two things about a 27-bit representation. It allows the arithmetic and trigonometric operators to be implemented using 50% fewer embedded multipliers than a single-precision representation (32 bits), and it performs computations with a small associated error. Therefore, the hardware implementation of the optimization engine is based on a 27-bit representation.

The stochastic behavior of the FA comes from several agents moving randomly in a search space. Equation (2) requires a uniformly distributed random number generator (RNG) in the range $[-0.5, 0.5]$. In this work, an RNG based on a 20-bit linear feedback shift register (LFSR) was used (see Fig. 1). Since the LFSR operates in fixed point, a fixed-to-float converter module was implemented to represent the numbers in floating-point format. In addition, the most significant bit of the random sequence is routed to the sign bit of the floating-point representation, providing both positive and negative random numbers.

Figure 2 shows the hardware architecture of the position update process for firefly $i$ attracted to the adjacent firefly $k$, as stated by equation (2). This architecture is controlled by a finite state machine (FSM). The FSM synchronizes one RNG unit, one addition/subtraction unit (FPadd) and one multiplier unit (FPmul).

Fig. 1. Hardware architecture for the floating-point pseudo-random number generator, where Ew = 8 bits and Mw = 18 bits. [Figure: a 20-bit LFSR feeds a fixed-to-float converter that outputs the sign (S), exponent (Ew) and mantissa (Mw) fields.]

Fig. 3. Hardware architecture for the attraction computation. The operators are shared during the states and all the operations in each state are performed in parallel. A hybrid CORDIC-Taylor architecture is used for computing the exponential function as explained in [1]. [Figure: datapath computing $e^{-\gamma r_{ik}^2} = e^{-\gamma[(x_{i1}-x_{k1})^2 + (x_{i2}-x_{k2})^2 + \cdots + (x_{iN}-x_{kN})^2]}$. Subtractors and multipliers feed an accumulator, one dimension per step ($d = d + 1$); the result is multiplied by $-\gamma$ and passed to a CORDIC-Taylor exp() unit.]
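The RNG datapath of Fig. 1 can be modelled in a few lines of Python, which may help when checking the hardware sequence. This is a minimal sketch under assumptions the paper does not state:
- the feedback taps (bits 20 and 17, a known maximal-length choice for 20 bits);
- the mapping of the 19 low bits to a magnitude in [0, 0.5);
- the seed value;
- Python's double-precision float standing in for the 27-bit format produced by the fixed-to-float converter.

The class Lfsr20 and the function to_signed_uniform are hypothetical names.

```python
class Lfsr20:
    """Software model of a 20-bit Fibonacci LFSR (taps 20 and 17 are an assumption)."""

    WIDTH = 20
    MASK = (1 << WIDTH) - 1

    def __init__(self, seed: int) -> None:
        seed &= self.MASK
        if seed == 0:
            raise ValueError("an all-zero seed locks the LFSR")
        self.state = seed

    def step(self) -> int:
        # Feedback bit = bit 19 XOR bit 16, shifted in at the least significant end.
        # The resulting recurrence has characteristic polynomial x^20 + x^3 + 1,
        # which is primitive, so any nonzero seed cycles through 2^20 - 1 states.
        bit = ((self.state >> 19) ^ (self.state >> 16)) & 1
        self.state = ((self.state << 1) | bit) & self.MASK
        return self.state


def to_signed_uniform(word: int) -> float:
    """Map a 20-bit LFSR word to a float in the open interval (-0.5, 0.5).

    The most significant bit selects the sign, mirroring the routing of the
    MSB to the floating-point sign bit; the remaining 19 bits are read as an
    unsigned fixed-point fraction in [0, 0.5).
    """
    sign = -1.0 if (word >> 19) & 1 else 1.0
    magnitude = (word & 0x7FFFF) / float(1 << 20)
    return sign * magnitude


if __name__ == "__main__":
    rng = Lfsr20(seed=0xACE1)  # arbitrary seed (the paper sets it with DIP switches)
    print([round(to_signed_uniform(rng.step()), 6) for _ in range(5)])
```

A Fibonacci register was chosen here only because it is the easiest to read. A Galois-style LFSR with the same polynomial has the same maximal period.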
d=d+1 t 1
xi
t  t   t
= xi ⋅ x k − xi ⋅U i −1/2
opposite number
xi xmax GNA
...

1.0 - × xi xi, j(t)


b + ...

xk d=d+1
× + ≤
...

not
RNG xi, j(t+1)
d=d+1 RNG
[-0.5,0.5]
[-0.5,0.5]
×
xmax
a
selection attraction mutation add comparator Fig. 5. Hardware architecture for computing the opposition number for
symmetric search spaces.
Fig. 2. Hardware architecture for update the firefly position. The operators
are shared during five states and all the operations in each state are performed
in a parallel way. each firefly’s position is evaluated using S hardware parallel
descriptions of the fitness function. At the fourth state, the
best solution is identified computing the minimum value (in
As explained in Section II, the original FA algorithm case of a minimization problem) among the S fitness values.
is based on the luminous attraction between fireflies. The Then, the global best position xg is updated. At the fifth state,
attractiveness can be computed as a monotonic decreasing the new fireflies positions are computed in parallel. To do
function. In order to simplify the hardware implementation of that, the attraction between the firefly i and the best firefly in
the attraction architecture, a second order decreasing function the swarm is computed using the attraction architecture (see
(m = 2) has been used in equation (1). This choice allows Figure 3). Afterwards, the fireflies update the position using
the square root computation during the distance calculation to the movement architecture described in Figure 2. Finally, the
be avoided as well as to expand the attraction region between algorithm returns to the third state and a new iteration starts.
fireflies.
Figure 3 depicts the hardware architecture used for the Each firefly has a trial counter which is incremented
computation of the attractiveness. It is important to point when the new position does not improve the fitness value.
out that the same hardware resources used for updating the After a predefined number of iterations without fitness change
firefly position were used for computing the attraction between (maxF N C), the opposite signal indicates to the movement
two fireflies. This choice is justified taking into account the architecture that the opposite number of the current position
sequential behavior of these two processes (see lines 8 and 9 must be computed in order to perform the search in the oppo-
on Algorithm 2 and lines 16 and 17 on Algorithm 3). site direction. Figure 5 presents the hardware implementation
of the opposite number calculation. This architecture makes
use of an RNG unit which indicates if the not operator is
A. Hardware Parallel Opposition-based Firefly Algorithm applied to the most significant bit of the current position
The general hardware architecture for the GOFA algorithm x(t). In addition, several bits of the mantissa word of the
is presented in Figure 4. This architecture, called Hardware random number are addressed to the mantissa of the new firefly
Parallel Opposition-based Firefly Algorithm (HPOFA) makes position, accomplishing a small random variation over the new
use of S parallel fireflies updating their positions in a parallel position. It is important to take into account that this process
way and evaluating their solutions using S parallel fitness is repeated for each dimension in the N dimensional search
functions implementations. space.
This architecture is based on an FSM with five states, The proposed architecture uses LookUp Tables (LUTs)
namely, waiting, initialization, fitness, best detection and fire- for storing the N dimensional fireflies positions xi instead
fly movement. At the first state the architecture waits for of embedded RAM blocks. This fact allows the proposed
a start signal indicating that the optimization process can architectures to be mapped on different FPGA families and
be initialized. At the second state the initial N dimensional technologies. The RS-232 block is used to communicate with
position of each firefly is randomly computed. At the third state an external environment, developed in Matlab, sending the
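The state flow described above can be mirrored by a short software model. It covers initialization, fitness, best detection and firefly movement, with the trial counter triggering the opposition step after maxFNC iterations without improvement. Such a model could, for instance, produce reference results before synthesizing a new configuration.

The sketch follows Algorithm 3 under stated assumptions:
- double-precision floats instead of the 27-bit format;
- Python's random module instead of the LFSR;
- a symmetric search space [-bound, bound], so that the opposite of a coordinate is its negation;
- clamping of positions to that box. Fig. 2 suggests a comparison against x_max, but the exact rule used in hardware is an assumption here.

The name gofa and the defaults for delta and max_iter are illustrative. The other defaults come from the paper:
- gamma = 0.8 and max_fnc = 40 follow Table I;
- bound = 8.0 follows the search space used in Section IV;
- beta0 = 1 follows the value quoted in Section II.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def gofa(
    f: Callable[[Sequence[float]], float],
    n_dim: int,
    n_fireflies: int = 8,
    max_iter: int = 1000,
    max_fnc: int = 40,
    gamma: float = 0.8,
    beta0: float = 1.0,
    alpha0: float = 1.0,
    delta: float = 0.99,
    bound: float = 8.0,
    seed: int = 1,
) -> Tuple[List[float], float]:
    """Software sketch of the GOFA loop (Algorithm 3) for minimization."""
    rng = random.Random(seed)

    def clamp(v: float) -> float:
        # Assumption: positions are kept inside the box [-bound, bound].
        return max(-bound, min(bound, v))

    # Initialization state: random positions in the symmetric search space.
    xs = [[rng.uniform(-bound, bound) for _ in range(n_dim)] for _ in range(n_fireflies)]
    fit = [f(x) for x in xs]        # fitness state
    f_ind = fit[:]                  # best fitness seen by each firefly (f_ind,i)
    trial = [0] * n_fireflies       # iterations without improvement (trial_i)
    alpha = alpha0

    for _ in range(max_iter):
        # Best-detection state: the global best among the S fitness values.
        g = min(range(n_fireflies), key=fit.__getitem__)
        xg, fg = xs[g][:], fit[g]
        # Firefly-movement state; every firefly uses the same snapshot of xg.
        for i in range(n_fireflies):
            if fit[i] <= fg:
                continue  # line 7: only fireflies worse than the best move
            if trial[i] == max_fnc:
                # Lines 8-14: opposition step on a random subset of dimensions.
                # In a symmetric space [-b, b] the opposite number a + b - x is -x.
                trial[i] = 0
                for j in range(n_dim):
                    if rng.random() > 0.5:
                        xs[i][j] = clamp(-xs[i][j] + rng.uniform(-1.0, 1.0) / 2)
            else:
                # Lines 16-17: attraction with m = 2 (no square root), equations (1)-(2).
                r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xg))
                beta = beta0 * math.exp(-gamma * r2)
                xs[i] = [clamp(a + beta * (b - a) + alpha * (rng.random() - 0.5))
                         for a, b in zip(xs[i], xg)]
            # Lines 19-25: evaluate and update the trial counter.
            fit[i] = f(xs[i])
            if fit[i] >= f_ind[i]:
                trial[i] += 1
            else:
                trial[i] = 0
                f_ind[i] = fit[i]
        alpha *= delta  # equation (3)

    g = min(range(n_fireflies), key=fit.__getitem__)
    return xs[g], fit[g]


if __name__ == "__main__":
    def sphere(x: Sequence[float]) -> float:
        return sum(v * v for v in x)

    best_x, best_f = gofa(sphere, n_dim=4, max_iter=2000)
    print(best_f)
```

Line numbers in the comments refer to Algorithm 3. Because the best firefly never moves, the returned best fitness can only stay equal or improve as max_iter grows. This gives a simple sanity check for the model.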

The global best position $x_g$ (the solution of the optimization problem) is sent to the Matlab environment over the RS-232 link when the maximum number of iterations is reached.

Fig. 4. Hardware architecture for the GOFA algorithm (HPOFA). The $opp_i$ signal indicates that the new firefly position will be computed as the opposite number of the current position. [Figure: $S$ parallel lanes, each containing:
- a random number generator (labelled GNA) with an initial seed;
- RAM blocks for $x_i$ and $x_g$;
- a fitness unit $f(x_i)$;
- a trial counter $tr_i$ compared against maxFNC to raise $opp_i$;
- an exp() attraction unit and a movement unit.
A shared $\min(f_1, f_2, \ldots, f_S)$ block selects the global best, and an RS-232 interface and a start signal drive the FSM through the states 2. initialization, 3. fitness, 4. best detection and 5. firefly movement.]

IV. RESULTS

To analyze the scalability of the proposed hardware architectures, several experiments with different numbers of particles and problem dimensions were conducted. The architectures were implemented for 8, 10 and 12 parallel fireflies and for 4, 6 and 10 dimensions. To do so, an automatic VHDL code generator tool was developed in Matlab. This flexible tool allows the user to set several parameters of the optimization engine, such as the bit-width representation, the number of parallel fireflies, the dimensionality of the optimization problem, the number of iterations and the search space bounds.

The goal is to find, by minimization, the global minimum of the Sphere (5), Quadric (6), Rosenbrock (7) and Rastrigin (8) benchmarks. All the fitness functions have a global minimum value of zero ($f(x) = 0$), where $i = 1, \ldots, N$:
- at $x_i = 0$ for the Sphere, Quadric and Rastrigin problems;
- at $x_i = 1$ for the Rosenbrock problem.

$$f_1(\vec{x}) = \sum_{i=1}^{N} x_i^2 \tag{5}$$

$$f_2(\vec{x}) = \sum_{i=1}^{N} \Bigl(\sum_{j=1}^{i} x_j\Bigr)^2 \tag{6}$$

$$f_3(\vec{x}) = \sum_{i=1}^{N/2} \Bigl[100\bigl(x_{2i} - x_{2i-1}^2\bigr)^2 + \bigl(1 - x_{2i-1}\bigr)^2\Bigr] \tag{7}$$

$$f_4(\vec{x}) = \sum_{i=1}^{N} \bigl[x_i^2 - 10\cos(2\pi x_i) + 10\bigr] \tag{8}$$

The experimental conditions are listed in Table I.

TABLE I. EXPERIMENTAL CONDITIONS

Parameter                                   Value
Number of fireflies                         8, 10, 12
Dimensionality                              4, 6, 10
Max. number of iterations                   10000
Max. iterations for OBL (maxFNC)            40
Absorption coefficient $\gamma$             0.8
Initial attraction $\beta_0$                0.8
Minimum attraction $\beta_{min}$            0.2
Parameter $m$                               2.0
Parameter $\alpha$ (linearly decreasing)    [1.0, 0.001]

A. Synthesis Results

The proposed HPOFA architecture was synthesized with the Xilinx ISE 10.1 development tool for a Virtex5 FPGA device (chip xc5vlx110t). Table II presents the synthesis results. The logic area cost is reported in terms of flip-flops (FF), LUTs and dedicated DSP blocks, and the performance (maximum clock frequency) is reported in MHz.

TABLE II. SYNTHESIS RESULTS FOR THE HPOFA ARCHITECTURE (CHIP XC5VLX110T), 27 BITS

Dim.,         Fitness    FF       LUTs     DSP48E   Freq.
fireflies     function                              (MHz)
(available)              69120    69120    64
N=4, S=8      f1         6720     20957    17       130.100
              f2         6760     21098    17       130.100
              f3         6952     21091    17       130.100
              f4         9656     28314    17       120.902
N=6, S=8      f1         7152     22239    17       130.100
              f2         7192     22673    17       130.100
              f3         7065     21334    17       130.100
              f4         10120    30380    17       126.211
N=10, S=8     f1         8024     24866    17       130.100
              f2         8064     25388    17       130.100
              f3         8256     25015    17       130.100
              f4         11016    33121    17       127.857
N=6, S=10     f1         8876     27956    21       129.914
              f2         8926     28353    21       130.100
              f3         9166     28196    21       129.914
              f4         12576    37629    21       120.744
N=6, S=12     f1         10598    33855    25       129.729
              f2         10658    34395    25       130.100
              f3         10946    33956    25       129.729
              f4         15062    45306    25       125.688

As expected, the hardware resource utilization depends on the complexity of the optimization problems.
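Since resource usage tracks the cost of each fitness function, it helps to see how much work each benchmark actually performs. Below is a minimal Python sketch of the four benchmark functions defined earlier in this section. The function names are illustrative, and rosenbrock assumes an even dimensionality, as implied by the N/2 upper limit of its sum. Only the Rastrigin function needs a transcendental operator (the cosine), which is consistent with its higher flip-flop and LUT counts in Table II.

```python
import math
from typing import Sequence


def sphere(x: Sequence[float]) -> float:
    """f1: sum of squares."""
    return sum(v * v for v in x)


def quadric(x: Sequence[float]) -> float:
    """f2: sum over i of the squared prefix sum x_1 + ... + x_i."""
    total, prefix = 0.0, 0.0
    for v in x:
        prefix += v
        total += prefix * prefix
    return total


def rosenbrock(x: Sequence[float]) -> float:
    """f3: Rosenbrock over the pairs (x_{2i-1}, x_{2i}); len(x) must be even."""
    if len(x) % 2:
        raise ValueError("f3 expects an even dimensionality")
    total = 0.0
    for i in range(0, len(x), 2):
        odd, even = x[i], x[i + 1]  # x_{2i-1}, x_{2i} in 1-based notation
        total += 100.0 * (even - odd * odd) ** 2 + (1.0 - odd) ** 2
    return total


def rastrigin(x: Sequence[float]) -> float:
    """f4: x_i^2 - 10 cos(2 pi x_i) + 10, summed over all coordinates."""
    return sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) + 10.0 for v in x)


if __name__ == "__main__":
    # Evaluate each function at the global minimum stated in the text.
    cases = ((sphere, [0.0] * 4), (quadric, [0.0] * 4),
             (rosenbrock, [1.0] * 6), (rastrigin, [0.0] * 10))
    for fn, point in cases:
        print(fn.__name__, fn(point))
```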

The Rastrigin function, which includes a cosine term, requires more flip-flops and LUTs than the other benchmark problems. In the worst case (twelve parallel fireflies and a six-dimensional problem), the logic area cost is around 22% of the flip-flops, 66% of the LUTs and 39% of the DSP blocks.

In general, the operating frequency of the proposed circuits is around 130 MHz. As expected, the frequency decreases as the complexity of the fitness functions increases.

In addition, varying the number of parallel particles has a larger impact on hardware resource consumption than varying the number of dimensions. The reason is that each parallel firefly requires its own fitness function unit and additional arithmetic and trigonometric operators. An additional dimension, by contrast, can reuse the same hardware resources and only requires a few registers to store the firefly positions.

B. Convergence Results

A validation environment tool developed in Matlab was used to send commands through the RS-232 link to configure internal parameters of the algorithm. This tool is also used to decode the final results of the optimization process. In addition, a digital oscilloscope was used to measure the execution time of the algorithms.

The proposed hardware architectures were validated on the minimization of the four benchmark problems. For each benchmark problem, 16 runs were executed, each with different initial positions of the firefly swarm. This was done by changing DIP switches connected directly to the initial seed register of the RNG unit, which creates the initial position of each firefly. The global best position $x_g$ and the best fitness obtained in each experiment were used to compute the mean, median, minimum and standard deviation values. The number of goals, i.e., the number of experiments that reach the fitness threshold, was also computed. The search space was limited to the range $[-8.0, 8.0]$. The thresholds were set to 0.01 for the unimodal and Rastrigin benchmarks and to 1.0 for the Rosenbrock function. In addition, a performance comparison with a hardware-based implementation of the well-known Particle Swarm Optimization algorithm was performed to validate the results.

The following methodology was applied to validate the search capabilities of the proposed architectures with statistical significance. First, a Kolmogorov-Smirnov test was applied to check whether the results were drawn from a normal distribution. Once the normality assumption was rejected, the non-parametric Wilcoxon rank-sum test was applied to compare the medians of the algorithms, as discussed in [17] and [18]. The statistical tests were performed using the statistics toolbox provided by Matlab, with a confidence level of 95%.

A hardware architecture of the GFA algorithm without the OBL operator (Algorithm 2), called HPFA, was also implemented in order to compare the solutions achieved by the HPFA and HPOFA architectures. Although not reported here, the convergence results of the HPFA for the unimodal benchmark problems are similar to those obtained by the HPOFA architecture. Table III presents the convergence results for the multimodal Rosenbrock ($f_3$) and Rastrigin ($f_4$) benchmarks. It allows the HPFA to be compared with the HPOFA (which uses the OBL approach). In the table, tests with positive results are marked with '+' in the last column, indicating that a statistically significant difference was found between the two architectures. In addition, the best result for each test is highlighted in gray.

TABLE III. CONVERGENCE RESULTS COMPARISON FOR THE HPFA AND HPOFA ARCHITECTURES. THE HPFA DOES NOT USE THE OBL OPERATOR

Problem   HW      Mean      Median    Min.      Std.      Goals   Sig.
f3 4D     HPFA    9.5905    10.9928   1.7528    4.4379    0/16
          HPOFA   0.6599    0.5996    0.1532    0.5132    13/16   +
f4 4D     HPFA    5.1614    4.9748    1.09E-4   3.7794    1/16
          HPOFA   6.73E-5   6.49E-5   2.05E-5   2.95E-5   16/16   +
f3 6D     HPFA    9.4744    10.0992   2.7108    4.2008    0/16
          HPOFA   0.9220    0.9275    0.0381    0.5217    8/16    +
f4 6D     HPFA    21.1429   20.3967   3.9799    11.7640   0/16
          HPOFA   8.72E-5   8.89E-5   3.89E-5   1.98E-5   16/16   +
f3 10D    HPFA    11.0203   9.1553    5.0206    5.1038    0/16
          HPOFA   3.6639    3.6879    0.0344    2.2179    2/16    +
f4 10D    HPFA    35.9430   35.3212   20.8942   10.1523   0/16
          HPOFA   1.37E-4   1.25E-4   6.53E-5   5.11E-5   16/16   +

The HPFA architecture performs poorly on the multimodal problems. On the other hand, the HPOFA architecture, which uses the OBL approach, shows a significant improvement on the Rastrigin problem, reaching the fitness threshold in all the experiments. For the Rosenbrock function, the HPOFA architecture also achieves better results than the HPFA architecture, particularly for low-dimensional problems. Notice that the differences between the HPOFA and HPFA architectures are statistically significant in all the experiments.

Table IV presents the convergence results of the HPOFA architecture for different dimensionalities and numbers of parallel fireflies. The HPOFA architecture reaches the optimal point of the Sphere ($f_1$) and Quadric ($f_2$) problems with a 100% success rate for all problem sizes. In these cases, increasing the swarm size does not affect the final solution.

For the six-dimensional Rosenbrock problem, using more parallel fireflies improves the approximation of the final solution to the optimal point. In this case, the success rates are approximately 50%, 50% and 75% for 8, 10 and 12 parallel individuals, respectively. Finally, for the Rastrigin problem, the HPOFA architecture reaches the optimal point in all the experiments. As expected, the problem size directly affects the final result.

To validate the suitability of the proposed circuits, Table V presents a convergence comparison between the HPOFA architecture and a hardware-based implementation of the PSO algorithm with the OBL approach. This latter architecture, called HPOPSO (Hardware Parallel Opposition-based PSO), was implemented using the same floating-point arithmetic blocks and was mapped and validated on the same FPGA device.
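The '+' marks in Table III come from the Wilcoxon rank-sum test described earlier in this section, which the authors ran in Matlab. For readers without that toolbox, below is a self-contained Python stand-in; it is not the authors' code. It uses average ranks for ties and a two-sided p-value from the normal approximation with tie correction and no continuity correction. As a result, p-values may differ slightly from Matlab's exact or corrected values for samples of 16 runs. It does not reproduce the preceding Kolmogorov-Smirnov normality check. The names average_ranks and rank_sum_test are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (W, p), where W is the rank sum of sample x in the pooled data.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n1])
    n = n1 + n2
    mean_w = n1 * (n + 1) / 2.0
    # Tie correction: subtract sum(t^3 - t) over groups of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:
        return w, 1.0  # all values tied: no evidence of a difference
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return w, p
```

Applied to the 16 final fitness values of two architectures on one benchmark, a p-value below 0.05 corresponds to the 95% confidence level used in the paper.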

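The statistical methodology described at the beginning of this section (a normality check followed by a Wilcoxon rank-sum comparison of medians) can be sketched in Python as follows. This is a minimal sketch, not the Matlab toolbox routine used by the authors: it uses the large-sample normal approximation without tie correction, so its p-values may differ slightly from Matlab's, particularly for small samples such as the 16 runs per configuration reported in Table IV. The Kolmogorov-Smirnov normality step is omitted, and the sample data in the usage note are synthetic.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of values, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (W, z, p), where W is the rank sum of sample x in the pooled data.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12  # no tie correction
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p
```

For instance, `rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])` gives W = 15 and z ≈ -2.61, i.e. p ≈ 0.009. Since p < 0.05, the hypothesis of equal medians would be rejected at the 95% confidence level used in this work.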

TABLE IV. CONVERGENCE RESULTS FOR THE HPOFA ARCHITECTURE

Dim., Part.  Problem  Mean      Median    Min.      Std. Dev.  Goals
N=4, S=8     f1       6.50E-9   5.38E-9   2.32E-9   3.30E-9    16/16
             f2       6.85E-9   5.48E-9   6.5E-10   7.61E-9    16/16
             f3       0.6599    0.5996    0.1532    0.5132     13/16
             f4       6.73E-5   6.49E-5   2.05E-5   2.95E-5    16/16
N=6, S=8     f1       1.13E-8   1.18E-8   4.97E-9   3.93E-9    16/16
             f2       2.14E-8   2.09E-8   7.95E-9   9.85E-9    16/16
             f3       0.9220    0.9275    0.0381    0.5217     8/16
             f4       8.72E-5   8.89E-5   3.89E-5   1.98E-5    16/16
N=10, S=8    f1       2.71E-8   2.55E-8   1.17E-8   1.10E-8    16/16
             f2       6.90E-8   6.08E-8   3.46E-8   3.51E-8    16/16
             f3       3.6639    3.6879    0.0344    2.2179     2/16
             f4       1.37E-4   1.25E-4   6.53E-5   5.11E-5    16/16
N=6, S=10    f1       1.04E-8   1.05E-8   1.55E-9   4.30E-9    16/16
             f2       1.54E-8   1.21E-8   5.04E-9   1.06E-8    16/16
             f3       0.9006    1.0162    0.0137    0.4870     8/16
             f4       7.95E-5   7.50E-5   3.29E-5   3.54E-5    16/16
N=6, S=12    f1       1.34E-8   1.51E-8   3.79E-9   5.25E-9    16/16
             f2       1.91E-8   1.61E-8   3.57E-9   1.49E-8    16/16
             f3       0.7446    0.6916    0.2538    0.3898     12/16
             f4       8.45E-5   8.59E-5   2.58E-5   3.87E-5    16/16

TABLE V. MEDIAN COMPARISON BETWEEN THE HPOFA AND HPOPSO ARCHITECTURES

Problem   HPOPSO     HPOFA
f1 4D     1.7E-38    5.38E-9
f1 6D     2.4E-38    1.18E-8
f1 10D    4.1E-38    2.55E-8
f2 4D     2.1E-38    5.48E-9
f2 6D     3.7E-38    2.09E-8
f2 10D    3.1E-18    6.08E-8
f3 4D     2.18E-5    0.5996
f3 6D     1.2557     0.9275
f3 10D    7.4097     3.6880
f4 4D     4.85E-5    6.49E-4
f4 6D     7.76E-5    8.89E-5
f4 10D    1.43E-4    1.25E-4

It can be observed that both bioinspired hardware solutions achieve the global minimum for the unimodal optimization problems. However, the HPOPSO achieves more refined solutions because PSO-based algorithms use an inertia factor that decreases over time, allowing the particles to exploit the solutions around an optimal point. On the other hand, the HPOFA achieves better results when solving the multimodal Rosenbrock problem, particularly for the higher-dimensional (more complex) instances. In the case of the multimodal Rastrigin benchmark, the HPOFA achieves results similar to those of the PSO-based solution for the 4- and 6-dimensional problems and outperforms the HPOPSO in the 10-dimensional case.

Table VI presents a convergence comparison between the HPOFA architecture (27-bit representation) and a software implementation of the GOFA algorithm written in C using a 32-bit representation. The same experimental conditions listed in Table I were used for both implementations. Both the hardware and software implementations achieve results close to the optimal solution. However, some differences can be highlighted. In the case of the unimodal benchmarks, the HPOFA hardware architecture presents solutions closer to the optimal point than the software implementation for the 10-dimensional problem. In the case of the Rosenbrock function, the software implementation presents better results for the low-dimensional problems (N=4 and N=6), while similar results were achieved for the 10-dimensional problem. Finally, in the case of the Rastrigin function, the software implementation achieves better solutions. This can be explained by the fact that the software implementation uses a 32-bit floating-point representation, whereas the hardware solution uses 27 bits (1 sign bit, 8 bits for the exponent word and 18 bits for the mantissa word).

TABLE VI. CONVERGENCE COMPARISON BETWEEN HARDWARE AND SOFTWARE IMPLEMENTATIONS OF THE GOFA ALGORITHM

Dim., Part.  Implementation  Median
                             f1        f2        f3        f4
N=4, S=8     HPOFA           5.38E-9   5.48E-9   0.5996    6.49E-5
             GOFA            0.0       0.0       0.0806    0.0
N=6, S=8     HPOFA           1.18E-8   2.09E-8   0.9275    8.89E-5
             GOFA            0.0       2.90E-5   0.3019    2.25E-5
N=10, S=8    HPOFA           2.55E-8   6.08E-8   3.6879    1.25E-4
             GOFA            3.55E-5   7.26E-4   3.1022    2.84E-3
N=6, S=10    HPOFA           1.05E-8   1.21E-8   1.0162    7.50E-5
             GOFA            0.0       2.00E-6   0.1691    3.00E-6
N=6, S=12    HPOFA           1.51E-8   1.61E-8   0.6916    8.59E-5
             GOFA            0.0       5.00E-6   0.1104    0.0

C. Execution Time Comparison

The execution time of the proposed HPOFA implementation was compared with that of the hardware PSO solution as well as with two different software approaches. The first is a C implementation executed on an Intel Core Duo processor running at 1.6GHz with 2 GB of RAM under Windows XP. The second is a C implementation executed on a MicroBlaze soft processor embedded in the same FPGA device, running at 100MHz with 64KB of program memory. The purpose of this second implementation is to evaluate the performance of the proposed algorithms in embedded systems applications, where operating frequencies are lower than in conventional desktop solutions.

The execution time per iteration of the HPOFA is on the order of microseconds, whereas that of the MicroBlaze implementation is on the order of milliseconds (see Table VII). Thus, the acceleration factors between these two implementations are 1222, 1156, 1498 and 6940 for the Sphere, Quadric, Rosenbrock and Rastrigin problems, respectively. It can be observed that the HPOPSO achieves shorter execution times for the unimodal benchmarks; however, for more complex problems such as the multimodal Rastrigin function, the execution times of the two architectures are similar. Table VIII presents the total execution time of the proposed architectures after 10000 iterations. It can be observed that the HPOFA architecture (operating at 100MHz) achieves speed-up factors of 1.5, 1.54, 1.30 and 3.81 for the Sphere, Quadric, Rosenbrock and Rastrigin problems, respectively, in comparison with the desktop implementation (operating at 1.6GHz).

TABLE VII. EXECUTION TIME COMPARISON PER ITERATION, 10-DIMENSIONAL PROBLEM

Implementation                              Execution time
                                            f1        f2        f3        f4
Hardware, HPOFA (FPGA, 100MHz)              3.06µs    3.51µs    3.51µs    4.68µs
Hardware, HPOPSO (FPGA, 100MHz)             2.28µs    2.73µs    2.73µs    4.43µs
Software, GOFA (FPGA, MicroBlaze, 100MHz)   3.74ms    4.06ms    5.26ms    32.48ms
Software, O-PSO (FPGA, MicroBlaze, 100MHz)  15.53ms   15.76ms   17.18ms   38.27ms
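The acceleration factors quoted above can be reproduced from Tables VII and VIII with a few lines of Python. The sketch below simply divides the baseline time by the HPOFA time. Note that the reported per-iteration factors appear to be truncated rather than rounded (for instance, 4.06 ms / 3.51 µs ≈ 1156.7 is reported as 1156), so the script truncates them in the same way.

```python
# Per-iteration times for the 10-dimensional problems (Table VII), in microseconds.
PROBLEMS = ["Sphere", "Quadric", "Rosenbrock", "Rastrigin"]
hpofa_us = [3.06, 3.51, 3.51, 4.68]                # HPOFA, FPGA at 100 MHz
microblaze_us = [3740.0, 4060.0, 5260.0, 32480.0]  # GOFA on MicroBlaze at 100 MHz

# Total times after 10000 iterations (Table VIII), in milliseconds.
hpofa_total_ms = [28.00, 32.40, 32.20, 48.80]      # HPOFA, FPGA at 100 MHz
desktop_total_ms = [42.0, 50.0, 42.0, 186.0]       # GOFA on the 1.6 GHz desktop


def speedups(baseline, accelerated):
    """Element-wise ratio baseline / accelerated (how many times faster)."""
    return [b / a for b, a in zip(baseline, accelerated)]


per_iter = speedups(microblaze_us, hpofa_us)
total = speedups(desktop_total_ms, hpofa_total_ms)

for name, s_iter, s_total in zip(PROBLEMS, per_iter, total):
    # int() truncates, matching the per-iteration factors reported in the text.
    print(f"{name:<10} vs MicroBlaze: {int(s_iter):>5}x   vs desktop: {s_total:.2f}x")

print("worst case:", int(min(per_iter)), "and", round(min(total), 2))
```

Running it prints factors of 1222, 1156, 1498 and 6940 against the MicroBlaze and 1.50, 1.54, 1.30 and 3.81 against the desktop. The minima (1156 and 1.3) are the worst-case figures quoted in the conclusion.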

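The precision gap invoked to explain the Rastrigin results can be made concrete with a Python sketch that emulates the 27-bit word (1 sign, 8 exponent and 18 mantissa bits) starting from IEEE-754 single precision. It assumes that the hardware format keeps the single-precision sign and biased exponent fields and drops the five least significant mantissa bits by truncation (round toward zero). The rounding actually performed by the hardware floating-point units may differ, and all function names are hypothetical.

```python
import struct

SINGLE_MANTISSA_BITS = 23  # IEEE-754 single precision
HW_MANTISSA_BITS = 18      # mantissa width of the 27-bit hardware word (from the text)
DROPPED_BITS = SINGLE_MANTISSA_BITS - HW_MANTISSA_BITS  # 5 bits lost


def float_to_bits(x):
    """32-bit pattern of x after rounding it to single precision."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits


def bits_to_float(bits):
    """Interpret a 32-bit pattern as a single-precision value."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x


def encode_27bit(x):
    """Pack x into sign | 8-bit exponent | 18-bit truncated mantissa (assumed layout)."""
    return float_to_bits(x) >> DROPPED_BITS


def decode_27bit(word):
    """Expand a 27-bit word back to a float; the dropped mantissa bits become zero."""
    return bits_to_float(word << DROPPED_BITS)


def quantize_27bit(x):
    """Value of x as seen by the assumed 27-bit hardware format."""
    return decode_27bit(encode_27bit(x))


# Gap between 1.0 and the next representable number in each format.
eps_single = 2.0 ** -SINGLE_MANTISSA_BITS  # about 1.2e-7
eps_hw = 2.0 ** -HW_MANTISSA_BITS          # about 3.8e-6, i.e. 32 times coarser
```

For example, `quantize_27bit(1 + 2**-20)` returns `1.0`: the 2^-20 increment lies below the 18-bit mantissa resolution and is lost, whereas the same value is exact in single precision. Because the exponent field is unchanged under this assumption, the dynamic range is the same as in single precision and only the relative precision drops.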

TABLE VIII. TOTAL EXECUTION TIME, 10-DIMENSIONAL PROBLEM

Implementation                              Execution time
                                            f1        f2        f3        f4
Hardware, HPOFA (FPGA, 100MHz)              28.00ms   32.40ms   32.20ms   48.80ms
Software, GOFA (Intel Core2 Duo, 1.6GHz)    42ms      50ms      42ms      186ms

Based on the above-mentioned results, one can conclude that the hardware parallel HPOFA architecture is a feasible solution for embedded applications, reducing the execution time of the optimization problems. The FPGA implementation allows the FA algorithm to exploit its parallel capabilities. Additionally, a reduction in power consumption can be expected in comparison with conventional desktop implementations. The convergence comparison between the HPFA and HPOFA architectures demonstrates that the opposition-based learning approach is suitable for avoiding the premature convergence problem, improving the solution of the optimization engine.

V. CONCLUSION

This work has presented an FPGA implementation of the FA algorithm with the opposition-based learning approach (HPOFA). The proposed hardware architecture takes advantage of a simple operator (a NOT operator) to improve the quality of the solution and preserve swarm diversity, avoiding premature convergence. The HPOFA architecture uses floating-point arithmetic, allowing high-precision computation with a large dynamic range.

Synthesis results demonstrate that the proposed architecture is effectively mapped on FPGAs and achieves an operating frequency of 130MHz. Convergence results and the statistical significance analysis indicate that the HPOFA achieves satisfactory results when optimizing unimodal and multimodal benchmark problems. It was observed that in the worst case (8 parallel fireflies optimizing a 10-dimensional problem), the proposed HPOFA architecture achieves median values of 2.5E-8, 6.0E-8, 3.7 and 1.2E-4 for the Sphere, Quadric, Rosenbrock and Rastrigin benchmarks, respectively. In addition, the execution time results show that, in the worst case, the HPOFA achieves a speed-up factor of 1156 in comparison with the implementation on a MicroBlaze embedded soft processor. Finally, in the worst case, the HPOFA operating at 100MHz achieves a speed-up factor of 1.3 in comparison with a typical desktop solution operating at 1.6GHz.

As future work, we intend to estimate the power consumption of the proposed hardware architectures and to develop applications in the areas of adaptive digital filters and adaptive learning for mobile robotics.

ACKNOWLEDGMENT

The authors would like to thank the National Council of Scientific and Technological Development of Brazil - CNPq (Process 142033/2008-1) and the Grandes Desafios Project (MCT/CNPq) for their financial support, as well as the Xilinx University Program.

REFERENCES

[1] D. M. Muñoz, D. Sánchez, C. Llanos, and M. Ayala-Rincón, “FPGA-based floating-point library for CORDIC algorithms,” in Proc. International Southern Programmable Logic Conference, Porto de Galinhas, Brazil: IEEE, 2010, pp. 55–60.
[2] D. M. Muñoz, D. Sánchez, C. Llanos, and M. Ayala-Rincón, “Tradeoff of FPGA design of a floating-point library for arithmetic operators,” International Journal of Integrated Circuits and Systems, vol. 5, no. 1, pp. 42–52, 2010.
[3] D. M. Muñoz, “Otimização por inteligência de enxames usando arquiteturas paralelas para aplicações embarcadas,” Ph.D. dissertation, University of Brasília, Brazil, 2012.
[4] L. dos Santos Coelho, D. de Andrade Bernert, and V. Mariani, “A chaotic firefly algorithm applied to reliability-redundancy optimization,” in Proc. Int. Congress on Evolutionary Computation, New Orleans, USA, 2011, pp. 517–521.
[5] T. Hassanzadeh, H. Vojodi, and A. Moghadam, “An image segmentation approach based on maximum variance intra-cluster method and firefly algorithm,” in Proc. International Conference on Natural Computation, vol. 3, Shanghai, China, 2011, pp. 1817–1821.
[6] T. Apostolopoulos and A. Vlachos, “Application of the firefly algorithm for solving the economic emissions load dispatch problem,” International Journal of Combinatorics, vol. 2011, pp. 1–23, 2011.
[7] R. Falcon, M. Almeida, and A. Nayak, “Fault identification with binary adaptive fireflies in parallel and distributed systems,” in Proc. Int. Congress on Evolutionary Computation, 2011, pp. 1359–1366.
[8] S. Yang, Nature-Inspired Metaheuristic Algorithms. Cambridge, UK: Luniver Press, 2008.
[9] S. Yang, “Firefly algorithms for multimodal optimization,” Lecture Notes in Computer Science: Stochastic Algorithms: Foundations and Applications, vol. 5792, pp. 169–178, 2009.
[10] H. Tizhoosh, “Opposition-based learning: a new scheme for machine intelligence,” in Proc. Int. Conference on Computational Intelligence for Modelling, Control and Automation, Vienna, Austria, 2005, pp. 695–701.
[11] F. AlQunaieer, H. Tizhoosh, and S. Rahnamayan, “Opposition based computing: a survey,” in Proc. Int. Joint Conference on Neural Networks, Barcelona, Spain, 2010, pp. 1098–7576.
[12] S. Rahnamayan, H. Tizhoosh, and M. Salama, “Opposition versus randomness in soft computing techniques,” Applied Soft Computing, vol. 8, pp. 906–918, 2008.
[13] A. Malisia and H. R. Tizhoosh, “Applying opposition-based ideas to the ant colony system,” in Proc. IEEE Swarm Intelligence Symposium, Honolulu, HI, 2007, pp. 182–189.
[14] H. Jabeen, Z. Jalil, and A. Baig, “Opposition based initialization in particle swarm optimization (O-PSO),” in Proc. ACM Conference on Genetic and Evolutionary Computation, Montreal, Canada, 2009, pp. 2047–2052.
[15] D. M. Muñoz, C. Llanos, L. S. Coelho, and M. Ayala-Rincón, “Hardware particle swarm optimization based on the attractive-repulsive scheme for embedded applications,” in Proc. Int. Conf. on Reconfigurable Computing and FPGAs, Cancún, México: IEEE, 2010, pp. 55–60.
[16] D. M. Muñoz, C. Llanos, L. S. Coelho, and M. Ayala-Rincón, “Hardware particle swarm optimization with passive congregation for embedded applications,” in Proc. Int. Southern Programmable Logic Conf., Córdoba, Argentina, 2011, pp. 173–178.
[17] J. Durillo, J. García-Nieto, A. J. Nebro, C. Coello, F. Luna, and E. Alba, “Multi-objective particle swarm optimizers: An experimental comparison,” in Proc. Int. Conference on Evolutionary Multi-Criterion Optimization, Nantes, France, 2009, pp. 495–509.
[18] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Machine Learning Research, vol. 7, pp. 1–30, 2006.
