Sorting Unsigned Permutations by Reversals Using Multi-Objective Evolutionary Algorithms With Variable Size Individuals
Abstract—Sorting by reversals is a simplified version of the genome rearrangement problem, which seeks to discover the evolutionary relationship between different genomes. It is one of the many challenging problems in bioinformatics. Solving the problem optimally has been proved to be NP-hard, so a selection of approximation algorithms has been developed. In this paper a new mapping order is introduced to solve the problem of sorting unsigned permutations using a specialized multi-objective genetic algorithm. Our modified genetic algorithm uses a population with variable-length individuals to maintain a worst-case running time of $O(n^4 \log^2 n)$, where $n$ is the problem size. The results show that this approach is more effective than the 3/2 heuristic method and previous genetic algorithm approaches.
Index Terms—Sorting by reversals, genome rearrangement, sorting unsigned permutations, variable size individuals, multi-objective genetic algorithm
I. INTRODUCTION
Analysis of genome rearrangements in molecular evolution was pioneered by Dobzhansky and Sturtevant in 1938. They published a milestone paper with an evolutionary tree presenting a rearrangement scenario with 17 inversions linking the species D. pseudoobscura and D. miranda [1] [2]. Every genome rearrangement study involves solving a combinatorial puzzle to find a series of genome rearrangements that transform one genome into another [3]. Reversal is the most commonly observed mechanism in the rearrangement of genes. This has made sorting by reversals one of the most challenging problems in bioinformatics in the past decade.
In their simplest form, rearrangement events can be modeled by a series of reversals that transform one genome into another [3]. The order of genes in a genome can be represented by a permutation $\Pi = \langle \pi_1, \pi_2, \pi_3, \ldots, \pi_n \rangle$, where $n$ is the number of genes and $\pi_i$ is the gene id in position $i$. The problem then reduces to the sorting by reversals problem: given a permutation $\Pi$, find a shortest series of reversals $\langle \rho_1, \rho_2, \ldots, \rho_m \rangle$ that transforms it into the identity permutation.
To solve this problem, we introduce the breakpoint distance of a permutation to assess the progress of the algorithm towards the solution. We call a pair of neighboring elements $\pi_i$ and $\pi_{i+1} \in \Pi$, with $0 \le i \le n$, an adjacency if $\pi_i$ and $\pi_{i+1}$ are consecutive numbers; otherwise we call the pair a breakpoint. The breakpoint distance is then the number of breakpoints in a permutation. We add $\pi_0 = 0$ to the start and $\pi_{n+1} = n + 1$ to the end of each permutation, so that breakpoints at the start and end of the permutation are also counted. The goal identity permutation has no breakpoints, so sorting by reversals corresponds to finding a series of reversals that eliminates all breakpoints.
There are two types of permutations, signed and unsigned. In a signed permutation each $\pi_i$ has a positive or negative sign reflecting the orientation of that block of genes in the genome. The problem of sorting a signed permutation by reversals can be solved sub-optimally in $O(n^2)$ time [4], with the number of reversal steps no greater than 3× the optimal solution [2]. However, obtaining signed permutations is not always possible due to limitations in equipment and costs. For now, therefore, sorting unsigned permutations by reversals has wider applicability.
From a combinatorial mathematics point of view, identifying the optimal sequence of reversals for sorting unsigned permutations is an NP-hard problem [5]. Error-bounded heuristic solutions have therefore been proposed [2] [6] [7]. The lowest guaranteed error bound thus far is achieved by the 1.375-approximation algorithm of Berman et al. [8]: the length of the sequence it finds is within 1.375× the length of the optimal sequence. Auyeung and Abraham [9] suggested a genetic algorithm (GA) approach. It maps the unsigned reversal problem into $2^n$ possible signed reversal problems and then uses the GA to search this combinatorial space heuristically. Their results showed that the method was effective in many cases, but its estimated time complexity was $O(n^5)$. Here we use a modified version of the standard GA that employs individuals of different sizes to decrease the running time of the algorithm. The result takes $O(n^4 \log^2 n)$ in the worst case, and empirical studies demonstrate that the running time is considerably lower in the average case. It is difficult to determine a guaranteed bound for our algorithm because it is stochastic. However, our empirical results show that the method works well for smaller permutation problems.
The rest of this paper is organized as follows. Section II explains our proposed method and describes our modified genetic algorithm in detail by defining specific crossover and mutation operators. Section III presents the results. Finally, we conclude the paper with some discussion of our proposed method and future open research problems.
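The breakpoint distance defined in the introduction can be made concrete with a short Python sketch. It assumes an unsigned permutation is a list of the integers 1..n. It also assumes a reversal ρ(i, j) flips positions i..j, 1-indexed and inclusive. The function names `breakpoints` and `apply_reversal` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def breakpoints(perm: Sequence[int]) -> int:
    """Breakpoint distance of an unsigned permutation of 1..n.

    The permutation is padded with pi_0 = 0 and pi_{n+1} = n + 1; a pair of
    neighbours is an adjacency when their values differ by exactly 1 (in
    either order) and a breakpoint otherwise.
    """
    padded = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(padded, padded[1:]) if abs(a - b) != 1)


def apply_reversal(perm: Sequence[int], i: int, j: int) -> List[int]:
    """Return a copy of perm with the block pi_i..pi_j (1-indexed, inclusive) reversed."""
    if not 1 <= i <= j <= len(perm):
        raise ValueError("reversal bounds out of range")
    p = list(perm)
    p[i - 1:j] = reversed(p[i - 1:j])
    return p


if __name__ == "__main__":
    pi = [3, 2, 1, 4]
    print(breakpoints(pi))                         # 2: pairs (0,3) and (1,4)
    print(apply_reversal(pi, 1, 3))                # [1, 2, 3, 4]
    print(breakpoints(apply_reversal(pi, 1, 3)))   # 0: identity has no breakpoints
```

A single reversal changes only two neighbouring pairs, so it can remove at most two breakpoints. This is why half the breakpoint distance is a lower bound on the number of reversals needed.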
Authorized licensed use limited to: Ingrid Nurtanio. Downloaded on October 01,2020 at 02:16:02 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Different crossover operators used in the method. (a) Regular 2-point crossover; (b) Absorption crossover; (c) Adjunction crossover.
II. METHOD
The genetic algorithm attempts to solve this problem by balancing these two competing objectives in its fitness function. Consider two individual solutions $\chi_i^{x}$ and $\chi_j^{y}$, where the superscripts are the lengths of the reversal sequences. Individual $i$ is selected over $j$ probabilistically when $i$ dominates $j$, that is, when $x < y$ ($i$ is shorter than $j$) and $B(\Pi_i) < B(\Pi_j)$. The method follows the framework of the standard genetic algorithm. It uses special mutation and crossover operators that extend and shrink the number and arrangement of reversals in each solution:
• Create an initial random population of size $n \log n$ containing a diversity of short individuals, and set it as the current population. The maximum length of each reversal sequence is set to 10% of the input permutation length, $n$.
• Evaluate the current population with the two objective functions; the result is the breakpoint distance of each individual and its length.
• Select the best individuals from the current population, that is, those with the smallest breakpoint distance and length based on domination. These individuals can be of different sizes.
• Apply the modified crossover (see below) to these selected individuals to create new offspring.
• Apply the modified mutation (see below) to these offspring
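The first steps of the loop above and the dominance rule can be sketched in Python as follows. This is a minimal illustration under several assumptions:

- An individual is a list of reversals (i, j) applied left to right.
- Its objective vector is (length, breakpoint distance of the resulting permutation).
- The logarithm in the population size is natural.
- "Selected probabilistically" is modelled as a binary tournament in which a dominating entrant wins with an assumed probability `p_win = 0.9`.

The names `evaluate`, `dominates`, `random_individual`, `initial_population` and `tournament` are hypothetical. The crossover and mutation operators are not reproduced because their details are not given here. The block re-declares a compact breakpoint counter so that it runs on its own.

```python
import math
import random
from typing import List, Sequence, Tuple

Reversal = Tuple[int, int]   # (i, j): reverse positions i..j, 1-indexed, inclusive
Individual = List[Reversal]  # a variable-length sequence of reversals


def breakpoints(perm: Sequence[int]) -> int:
    """Breakpoints of perm after padding with 0 and n + 1."""
    padded = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(padded, padded[1:]) if abs(a - b) != 1)


def evaluate(perm: Sequence[int], ind: Individual) -> Tuple[int, int]:
    """Objective vector (length, breakpoint distance); both are minimised."""
    p = list(perm)
    for i, j in ind:                       # apply the reversals left to right
        p[i - 1:j] = reversed(p[i - 1:j])
    return len(ind), breakpoints(p)


def dominates(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Dominance as stated in the text: strictly shorter AND strictly fewer breakpoints."""
    return a[0] < b[0] and a[1] < b[1]


def random_individual(n: int, max_len: int, rng: random.Random) -> Individual:
    """A random reversal sequence with 1..max_len reversals (requires n >= 2)."""
    ind: Individual = []
    for _ in range(rng.randint(1, max_len)):
        i, j = sorted(rng.sample(range(1, n + 1), 2))
        ind.append((i, j))
    return ind


def initial_population(n: int, rng: random.Random) -> List[Individual]:
    """About n log n short individuals, each at most 10% of n reversals long."""
    size = max(2, math.ceil(n * math.log(n)))  # assumption: natural logarithm
    max_len = max(1, n // 10)                  # 10% of n, but at least one reversal
    return [random_individual(n, max_len, rng) for _ in range(size)]


def tournament(perm: Sequence[int], pop: List[Individual],
               rng: random.Random, p_win: float = 0.9) -> Individual:
    """Binary tournament: a dominating entrant wins with probability p_win (assumed value)."""
    a, b = rng.sample(pop, 2)
    fa, fb = evaluate(perm, a), evaluate(perm, b)
    if dominates(fa, fb):
        return a if rng.random() < p_win else b
    if dominates(fb, fa):
        return b if rng.random() < p_win else a
    return rng.choice([a, b])              # incomparable: pick either at random


if __name__ == "__main__":
    rng = random.Random(42)
    target = [3, 2, 1, 4, 6, 5]
    pop = initial_population(len(target), rng)
    parent = tournament(target, pop, rng)
    print(len(pop), parent, evaluate(target, parent))
```

Because dominance here is strict in both objectives, many pairs are incomparable, and the tournament then falls back to a uniform choice. A Pareto-ranking scheme would be a natural alternative, but it is not described in this excerpt.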
Fig. 2. A sample run of the algorithm for a permutation of length 35, showing the progression of the best individual of the population.
Fig. 3. Quality of solutions found by the three methods for increasing problem size. The method described here produces solutions that are on average 20% shorter than those of the 3/2 algorithm and on average 4% better than those of the previous GA method.
correlation so that common sub-sequences become schemata in the solutions and therefore accelerate problem solving. This issue is left as an open research problem.
ACKNOWLEDGMENT
The authors thank Dr. Minghui Jiang at Utah State University for his help with this research.
REFERENCES
[1] T. Dobzhansky and A. H. Sturtevant, “Inversions in the Chromosomes of Drosophila Pseudoobscura,” Genetics, vol. 23, no. 1, pp. 28–64, Jan. 1938. [Online]. Available: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1209001/
[2] V. Bafna and P. A. Pevzner, “Genome Rearrangements and Sorting by
Reversals,” SIAM J. Comput., vol. 25, no. 2, pp. 272–289, Feb. 1996.
[Online]. Available: http://dx.doi.org/10.1137/S0097539793250627
[3] N. C. Jones and P. A. Pevzner, An Introduction to Bioinformatics Algorithms (Computational Molecular Biology). The MIT Press, Aug. 2004.
[4] H. Kaplan, R. Shamir, and R. E. Tarjan, “Faster and simpler algorithm
for sorting signed permutations by reversals,” in Proceedings of
the eighth annual ACM-SIAM symposium on Discrete algorithms,
ser. SODA ’97. Philadelphia, PA, USA: Society for Industrial
and Applied Mathematics, 1997, pp. 344–351. [Online]. Available:
http://portal.acm.org/citation.cfm?id=314318
[5] A. Caprara, “Sorting by reversals is difficult,” in Proceedings of the first
annual international conference on Computational molecular biology,
ser. RECOMB ’97. New York, NY, USA: ACM, 1997, pp. 75–83.
[Online]. Available: http://dx.doi.org/10.1145/267521.267531
[6] J. Kececioglu and D. Sankoff, “Exact and Approximation Algorithms for Sorting By Reversals, With Application to Genome Rearrangement,” 1995. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.9970
[7] D. A. Christie, “A 3/2-approximation algorithm for sorting by
reversals,” in Proceedings of the ninth annual ACM-SIAM symposium
on Discrete algorithms, ser. SODA ’98. Philadelphia, PA, USA:
Society for Industrial and Applied Mathematics, 1998, pp. 244–252.
[Online]. Available: http://portal.acm.org/citation.cfm?id=314711
[8] P. Berman, S. Hannenhalli, and M. Karpinski, “1.375-Approximation Algorithm for Sorting by Reversals,” 2001. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.9.5673
[9] A. Auyeung and A. Abraham, “Estimating Genome Reversal Distance by Genetic Algorithm,” in Proc. 2003 IEEE Congress on Evolutionary Computation, 2003, pp. 1157–1161. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.75.6866
[10] P. Hart, N. Nilsson, and B. Raphael, “A Formal Basis for the Heuristic
Determination of Minimum Cost Paths,” IEEE Transactions on Systems
Science and Cybernetics, vol. 4, no. 2, pp. 100–107, Feb. 1968.
[Online]. Available: http://dx.doi.org/10.1109/TSSC.1968.300136