Ji 等。 - 2015 - One-dimensional pairwise CNN for the global alignm

Neurocomputing 149 (2015) 505–514
Contents lists available at ScienceDirect
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
One-dimensional pairwise CNN for the global alignment of two

DNA sequences
Luping Ji n, Xiaorong Pu, Hong Qu, Guisong Liu
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, PR China
art ic l e i nf o a b s t r a c t
Article history: The cellular neural network (CNN) is one of the classic artificial neural networks. During the past
Received 17 September 2013 decades, the one-dimensional CNN models and their applications have not yet been paid enough
Received in revised form enthusiasm too. For this reason, this paper proposes a simplified one-dimensional CNN model and then
30 July 2014
designs a pairwise network using this model to demonstrate its applicability. This pairwise CNN consists
Accepted 7 August 2014
of two parallel one-dimensional CNNs, a fixed master and a movable slave. Using this pairwise CNN, an
Communicated by R.W. Newcomb
Available online 23 August 2014 algorithm is developed to perform the global alignment of two DNA sequences. In this algorithm, the
slave moves forward step by step, and the cell states of the master are computed in the meanwhile.
Keywords: Based on all the states obtained in all time steps, a state selection array is generated then a global
Cellular neural network
alignment path is determined from this array. Under the guidance of the alignment path, two DNA
DNA sequences
sequences are globally aligned by inserting blank spaces in the appropriate positions of these two
State selection array
Global alignment sequences. Experiments on aligning the DNA sequences from the publicly available databases of the NCBI
Similarity computation with this method are carried out in this paper and compared with the other two methods. Through
evaluating computation time and similarity, these experiments prove that the proposed one-
dimensional CNN model is effective, and the alignment algorithm based on a pairwise CNN of the
model is efficient, obtaining higher similarity with less computation time than the other two.
& 2014 Elsevier B.V. All rights reserved.
1. Introduction In general, the conventional two-dimensional CNN [1,2] model

can be mathematically expressed in the following differential
As one of the classic neural network models, the cellular neural forms with the following rules of changing cell sates and outputs
network (CNN) was originally proposed by Chua and Yang [1,2]. From at time t:
then on, CNNs have been attracting a great deal of enthusiasm of 8
researchers in the world. During the past decades, most research >
> dx ðtÞ 1
< C ij ¼ xij ðtÞ þ ∑ Akl ykl ðtÞ þ ∑ Bkl ukl þ I ij ;
work on CNNs focused on two main fields. One is on CNN theories, dt Rx kl A Nij ðrÞ kl A Nij ðrÞ ð1Þ
such as the stability analysis [3–7], the convergence analysis [8], the >
>
: y ðtÞ ¼ f ðx ðtÞÞ;
ij ij
chaos [9,10] and the improved CNN models [11–15]; another is on
CNN applications, such as the hardware implementations [16–18],
the image processing [19–21], the pattern recognition [22,23] and the where both the subscripts ‘ij’ and ‘kl’ represent the location
associative memories [24–26]. coordinates of the cells in the CNN, Nij(r) represents a coordinate
Like the other neural networks, CNNs have also all kinds of set, which consists of the position coordinates of all cells in the
possible models with different dimensions. The publications dur- neighbor domain of Cði; jÞ, xij(t) represents the state of cell Cði; jÞ at
ing the past decades clearly show that the research on the two- time t, yij(t) denotes the cell output of Cði; jÞ at time t, and ukl,
dimensional CNNs has attracted the overwhelming attentions of independent of the time t, is the external link input of the
the researchers. As a result, obviously inadequate research efforts neighbor cell Cðk; lÞ in the neighbor domain of Cði; jÞ. Moreover, C
have been made on either studying some other CNN models such and Rx are two constants, and Iij is an independent bias constant
as the one-dimensional CNN [27], the three-dimensional CNN and for cell Cði; jÞ. In addition, A and B are the feedback template and
the CNN even in higher dimensions or continuously exploring the the control template, respectively. Correspondingly, Akl and Bkl are
potential applications of these models. respectively the feedback coefficient and the control coefficient
with the coordinate ’kl’ in the templates A and B for cell Cði; jÞ.
In Eq. (1), the constraint conditions for the CNN include
n
Corresponding author. jxij ð0Þj r 1, juij j r 1, C 4 0, and Rx 4 0. Furthermore, the activation
http://dx.doi.org/10.1016/j.neucom.2014.08.023
0925-2312/& 2014 Elsevier B.V. All rights reserved.
506 L. Ji et al. / Neurocomputing 149 (2015) 505–514
function f(x) for the cell output is defined as the following form: cells in the master at all time steps. Then, an alignment path is
obtained by back-tracing the states in the array. Finally, under the
1
f ðxij ðtÞÞ ¼ ðxij ðtÞ þ1 xij ðtÞ 1Þ guidance of the alignment path, two sequences are globally aligned
2
8 by inserting some blank spaces at the appropriate positions. More-
> 1; xij ðtÞ Z 1; over, in order to evaluate this algorithm, some numerical experi-
<
¼ xij ðtÞ; ∣xij ðtÞ∣o 1; ð2Þ ments on the DNA sequences from the publicly available NCBI
>
: 1; xij ðtÞ r 1: databases have been conducted in this paper. Two typical criteria
including similarity and computation time are adopted to numeri-
Besides those classic two-dimensional CNNs, some one- cally evaluate the algorithm performances in these experiments.
dimensional CNN models have also been proposed, such as the The remainder of this paper is organized as follows. In Section 2,
discrete-time model of Manganaro et al. [14], and the two-layer a simplified one-dimensional CNN structure model is proposed.
model [27] by Takahashi et al. Moreover, the sufficient conditions Section 3 designs a one-dimensional pairwise CNN and develops an
of the one-dimensional CNN for detecting the connected compo- algorithm for the global alignment of two DNA sequences and
nent are also analyzed by Takahashi et al. [28]. These one- similarity computation using the designed pairwise CNN. In order
dimensional models exhibit many novel characteristics, and may to evaluate the model and the algorithms, Section 4 shows a few
be potentially applied to some engineering fields such as sequence numerical experiments on the NCBI databases, and analyzes the
alignment. However, almost no further research work was carried experiment results. Finally, Section 5 concludes this paper.
out since then. By now, the research on the one-dimensional CNNs
has almost fallen into silence completely.
During the past decades, as one of the most important research
problems in bioinformatics [29], the similarity computation of
DNA sequences has attracted many eye-sights of researchers in the 2. The proposed one-dimensional CNN model
world. As we know, DNA carrying the genetic secrets of livings
consists of only the four different nucleotide bases: A, C, G and T. The conventional two-dimensional CNN models usually consist of
It is highly believed that, by similarity comparison on the DNA a typical cell array with n rows and m columns, and each cell in the
sequences obtained from different biological samples, their biolo- network is mutually connected to its neighbor cells [1,2]. Differing
gical structure characteristics can be predicted in advance, their from the two-dimensional models, the proposed one-dimensional
evolutionary paths can be traced, and their biological functions model is only made up of a linear cell chain with m cells which are
can also been accurately identified. arranged in a single row. Thus, any cell C(i), iA f0; 1; …; m 1g, in the
In general, to complete the similarity computation of two DNA one-dimensional network has only one direct link to each of its two
sequences, a global alignment process is extremely critical and neighbor cells, Cði 1Þ and Cðiþ 1Þ. Fig. 1(a) and (b) exhibits the brief
even inevitably necessary. So far, many classic alignment algo- network structure of the model with a linear cell arrangement and
rithms have been developed to do the global alignment of two the single cell diagram, respectively.
DNA sequences, such as the dot pot [30], the dynamic program- As shown in Fig. 1, xi ; ui and yi represent the cell state signal,
ming [29,31], and the heuristic algorithms such as FASTA [32] and the link input signal, and the cell output signal of the given cell C
BLAST [33]. Moreover, as another classic problem, the alignment of (i), respectively. f ðxi Þ is a modulation function formulated for the
multiple DNA sequences is still a very difficult NP problem, and it cell output, which is usually defined as a piecewise-linear func-
has not been well-solved yet though there have been some tion. Moreover, A is a feedback template, which is usually used to
efficient algorithms proposed to do it, for instance, the iteration modulate all the feedback inputs from the output signals of cell C
algorithm in [34], the progressive alignment in [35], and the graph (i) and its neighbors. Similarly, B represents a control template,
theory approach in [36]. Not considering the global alignment of which is usually used to modulate all the link-input signals from
multiple DNA sequences, this paper only addresses the global cell C(i) and its neighbors. Furthermore, both C and Rx are constant
alignment problem of two DNA sequences. circuit parameters, which are usually determined empirically. Ii is
During the past decades, although CNNs have been widely used the cell bias constant (or threshold), and xi ð0Þ indicates the initial
in some application fields such as the image processing [19], they cell state of C(i) at the time step t ¼0.
have not been applied to the similarity computation of DNA Similar to the two-dimensional models, Eq. (1), the dynamic
sequences yet. This paper proposes a new one-dimensional CNN state of cell C(i) in the one-dimensional CNN can be briefly
model differing from those traditional two-dimensional models,
then designs a one-dimensional pairwise CNN using the proposed
model. Based on the one-dimensional pairwise CNN, this paper
also develops a global alignment and similarity computation
algorithm for two DNA sequences.
Significantly differing from the conventional two-dimensional
CNN models with row-column cell structure, the proposed one-
dimensional CNN model consists of only one cell chain, in which
each cell only has two neighbor cells at the most at any time: a
right one and a left one. Furthermore, the pairwise CNN consists of
two parallel one-dimensional CNNs, a master sub-network and a
slave sub-network. The master is fixed, while the slave is movable.
As running the pairwise CNN, the master sub-network always
keeps immobile and the slave regularly moves forward a fixed
distance (the critical distance) at each step along the direction
parallel to the master.
Using the pairwise one-dimensional CNN, an alignment method
is developed for two DNA sequences. First, it generates a state Fig. 1. Illustration of the proposed one-dimensional CNN model. (a) Linear chain
selection array according to the dynamic states and outputs of the network structure; (b) Single cell diagram.
L. Ji et al. / Neurocomputing 149 (2015) 505–514 507
described in the following forms: If let Δt ¼ 1, Eq. (5) is transformed into Eq. (6), which is
a discrete model of the one-dimensional CNN:
8
> dx ðtÞ 1 !
<C i ¼ xi ðtÞ þ ∑ Ak yk ðtÞ þ ∑ Bk uk þ I i 1 1
dt Rx k A Ni ðrÞ k A N i ðrÞ ð3Þ xi ðt þ1Þ ¼ xi ðtÞ þ xi ðtÞ þ ∑ Ak yk ðtÞ þ ∑ Bk uk þI i :
>
: y ðtÞ ¼ f ðx ðtÞ; I Þ C Rx k A Ni ðrÞ k A N i ðrÞ
i i i
ð6Þ
where t indicates the time step, and k represents the position In Eq. (6), t is used to represent the discrete time step,
coordinate of a cell in the one-dimensional CNN. Ni(r) represents t ¼ 0; 1; 2; 3; … . Moreover, all the other symbols or expressions
the coordinate set of the neighbor cells of C(i), and r is defined as in Eq. (6) have the same meanings as those defined in Eq. (3).
the critical distance, beyond which any cell cannot have any link to
its neighbors. Ak and Bk are two matrix coefficients from the
templates A and B, respectively. yk and uk are the outputs from the 3. The global alignment of two DNA sequences using the CNN
neighbor cell C(k) and the link input to C(k), respectively. More-
over, at time t, as shown in Eq. (4), the output modulation function Usually, a DNA sequence consists of only the four nucleotide
f ðÞ is redefined as bases: adenine, guanine, cytosine and thymine. In bioinformatics,
these four nucleotide bases are usually abridged as the corre-
sponding characters ‘A’, ‘G’, ‘C’ and ‘T’, respectively. In nature,
1 if xi ðtÞ ¼ I i ;
yi ðtÞ ¼ f ðxi ðtÞ; I i Þ ¼ ð4Þ a species has its own distinctive DNA sequences with an exclusive
0 else:
arrangement structure of nucleotide bases. It is the differences of
Differing from Eq. (2), this function is not a piecewise-linear the category, quantity and arrangement order of the nucleotide
function. In fact, from the output characteristic's point of view, bases in DNA sequences that makes the DNA sequences different
Eq. (4) is a typical two-valued function with a pulse output of the from each other. Moreover, the DNA sequence is also believed as
cell in the CNN. It means that if xi(t) reaches the threshold Ii at time the most direct and essential reason of the biological diversity in
t, the cell C(i) outputs 1, yi ðtÞ ¼ 1, otherwise no pulse outputs, nature.
yi ðtÞ ¼ 0. Moreover, of course, Eq. (3) still should follow both the Since a DNA sequence consists of only the four fundamental
two constraints jxi ð0Þj r1 and jui j r1. nucleotide bases, any given sequence can be equivalently trans-
As shown in Fig. 1, the proposed one-dimensional CNN consists formed into a pure character sequence of fA; G; C; Tg by converting
of a simple cell chain, so it has a much simpler cell neighbor all the nucleotide bases using the corresponding characters to
domain than a two-dimensional CNN. Fig. 2 gives out a typical them. So, the global alignment of two DNA sequences can be
structure of the cell neighbor domain in the proposed one- equivalently transformed into the global alignment of two char-
dimensional model. acter sequences. The following content will show how to globally
In the one-dimensional CNN above, cell C(i) has only the two align a pair of DNA sequences using the one-dimensional CNN
links to its two neighbor cells, Cði 1Þ and Cði þ 1Þ. So the neighbor model, and how to evaluate the alignment performance.
domain of C(i) consists of only the three cells Cði 1Þ, Cði þ 1Þ and
itself, namely N i ðrÞ ¼ fi 1; i; i þ1g. Furthermore, C(i) just receives 3.1. The designed one-dimensional pairwise CNN
the two feedback inputs Ai 1 yi 1 and Ai þ 1 yi þ 1 from the
outputs of its two neighbor cells Cði 1Þ and Cði þ 1Þ, respectively. As described in Section 2, the one-dimensional CNN model has
These two feedback inputs are modulated by the feedback tem- a linear cell arrangement structure, as shown in Fig. 1(a). Each cell
plate A. At the same time, C(i) also receives the two cell link inputs has only two link cells at the most, as illustrated in Fig. 1(b). These
Bi 1 ui 1 and Bi þ 1 ui þ 1 from its neighbors Cði 1Þ and Cði þ1Þ, characteristics of the model can be effectively utilized to develop
respectively. These two cell link inputs are modulated by the an alignment algorithm for two DNA sequences. In order to
control template B. In general, cell C(i) also receives the two extra develop such an algorithm, first a pairwise CNN is constructed
signals from itself, the self-feedback input Ai yi and the self-link using the one-dimensional model proposed in Section 2. The
input Bi ui . designed pairwise CNN is different from those traditional CNNs
For most of the time, Eq. (3) is usually used for depicting the because it only consists of two separate one-dimensional CNNs, as
state dynamics of the cells in a continuous-time CNN. However, in shown in Fig. 3.
some special applications such as digital image processing, a Fig. 3 exhibits the pairwise network at the initial state, t ¼0.
discrete-time CNN is more suitable. From Eq. (3), the discrete Structurally, this pairwise CNN consists of two separate one-
dynamics of the one-dimensional model can be approximately dimensional CNNs, the master sub-network CNN1 with m cells
deduced as follows: and the slave sub-network CNN2 with n cells. Furthermore, all cells
of CNN1 are fixed and represented as C 1 ðiÞ, i ¼ 0; 1; …; m 1, while
xi ðt þ ΔtÞ xi ðtÞ 1 the cells of CNN2 are movable and represented as C 2 ðjÞ,
C ¼ xi ðtÞ þ ∑ Ak yk ðtÞ þ ∑ Bk j ¼ 0; 1; …; n 1. From the point of view of the interrelation, as
Δt Rx k A N i ðrÞ k A N i ðrÞ
shown in Fig. 3, CNN2 is separate from CNN1 and also in parallel to
uk þ I i ð5Þ CNN1. Moreover, the distance between two adjacent cells in a
same CNN and the distance between CNN1 and CNN2 are fixed as
the critical distance r, same as that in Eq. (6). In addition, CNN1
Ai −1 Ai +1
CNN1 (master, fixed)
yi −1 yi +1 … … m-1
Ai × yi 0 1 i
i -2 i -1 i i +1 i +2 r r r
Bi × ui
ui −1 ui +1
n-1 … 1 0
Bi −1 Bi +1 CNN 2 (slave, movable)
Fig. 2. The typical neighbor domain structure of the cell C(i). Fig. 3. The structure of one-dimensional pairwise CNN, at t¼ 0.
always keeps fixed at any time step t, t ¼ 0; 1; …, while CNN2 the three ones: C 1 ðm 1Þ, C 1 ðm 2Þ and C 2 ðn 1Þ, and occurs at
regularly moves forward step by step along the direction parallel t ¼ m þ n 1.
to CNN1 and moves forward a fixed distance r at each step. Furthermore, in the right neighbor domain as shown in Fig. 5(a),
Moreover, CNN1 is defined as the work center (the master) and C 1 ð0Þ receives the two link inputs and the two feedback inputs from
the dynamics of CNN2 are ignored in the pairwise CNN. As a result, itself and C 1 ð1Þ, and the link input from C 2 ð0Þ, ignoring the feedback
CNN2 only plays a role of the link supplier (the slave) to CNN1 in from the output of C 2 ð0Þ. Similarly, as shown in Fig. 5(b), C 1 ðiÞ
the running of this pairwise CNN. simultaneously receives the three link inputs and the three feed-
As shown in Fig. 3, at time t¼0, no cell in CNN1 can link to CNN2 back inputs from itself, C 1 ði 1Þ and C 1 ði þ 1Þ, and the link input
because the distance between C 1 ðiÞ and C 2 ðjÞ is bigger than the from C 2 ðjÞ. As shown in Fig. 5(c), C 1 ðm 1Þ receives the two link
critical distance r. Since CNN2 moves forward r at each time step, at inputs and the two feedback inputs from itself and C 1 ðm 2Þ, and
t¼1 cell C 2 ð0Þ exactly moves to the vertical location under C 1 ð0Þ. the link input from C 2 ðn 1Þ.
From then on, at each time step t, CNN2 will continue to move For the pairwise CNN, since CNN1 works as the work center and
forward r. So at time t¼ m, C 2 ð0Þ exactly moves to the vertical CNN2 is regarded as the link supplier to CNN1, the state and output of
location under C 1 ðm 1Þ, and at t ¼ m þ n 1 C 2 ðn 1Þ arrives in CNN1 at any time step t 4 0 can be derived from Eqs. (3) and (4) as
the location right under C 1 ðm 1Þ. Finally, at t ¼ m þ n, CNN2 8 !
>
> 1 1
>
completely divorces from CNN1, and no cell in CNN1 can link to < x1;i ðt þ 1Þ ¼ 1 CRx x1;i ðtÞ þ C
> ∑ Ak yl;k ðtÞ þ
C l ðkÞ A N 1;i ðr;tÞ
∑ Bk ul;k þ I 1;i
C l ðkÞ A N1;i ðr;tÞ
CNN2, so the pairwise CNN stops at that time step. (
>
> 1 if x1;i ðtÞ ¼ I 1;i ;
In order to clearly show the movement process of CNN2 in the >
>
: y1;i ðtÞ ¼ f ðx1;i ðtÞ; I 1;i Þ ¼ 0 else;
running of the pairwise CNN, Fig. 4 illustrates the three critical
transition positions of CNN2 moving forward respectively at the ð7Þ
time steps t¼1, t¼m and t ¼ m þ n 1. In this figure, the arrows
where t is the time step, Cl(k) is the neighbor cell of C 1 ðiÞ, x1;i ðtÞ and
represent the links of a cell in CNN1 to its neighbor cells.
y1;i ðtÞ represent the state and the output of C 1 ðiÞ at time t,
Moreover, the cell of CNN1 can have the three typical neighbor
respectively. yl;k ðtÞ and ul;k (independent of t) are the output and
domains, as shown in Fig. 5. In this figure, (a) represents the
the link input to Cl(k) at time t, respectively. Moreover, deriving from
neighbor domain of C 1 ð0Þ, which consists of three cells: C 1 ð0Þ,
Eq. (4), f ðx1;i ðtÞ; I 1;i Þ is still a two-valued function with the pulse
C 1 ð1Þ and C 2 ð0Þ, and occurs at t¼ 1; (b) represents the neighbor
output characteristic, thereinto I 1;i is designed as the pulse threshold
domain of C 1 ðiÞ, 1 o i o m 1, which consists of the four cells: C 1 ðiÞ,
for deciding the output of C 1 ðiÞ. In application, ul;k usually comes
C 1 ði 1Þ, C 1 ði þ 1Þ and C 2 ðjÞ, and occurs at 1 o t o m þ n 1; and
from the initialization of the CNN, and always keeps its value
(c) represents the neighbor domain of C 1 ðm 1Þ, which consists of
without any change all the time. For instance, it is initialized using
the grey-scale values of the pixels in the image processing [23].
Similarly, in the alignment algorithm of this paper it will be
initialized using the numerical codes of the DNA sequences.
CNN1 0 1 … i … m-1 The remaining symbols in Eq. (7) have the same meanings as in
Eq. (3). Because CNN2 moves forward a fixed distance r at each
step, its neighbor domain will change with time t. So in Eq. (7),
n-1 … 1 0 CNN2 N1;i ðr; tÞ is still used to indicate the present neighbor domain of
C 1 ðiÞ at time t. Moreover, as derived from Eq. (3), besides
jx1;i ð0Þj r 1 and ju1;i j r 1, Eq. (7) should also constrain ju2;j j r 1,
considering the link input of CNN2 available to CNN1.
0 1 … i … m-1 CNN1
3.2. The alignment using the one-dimensional pairwise CNN
n-1 … 1 0 CNN2 In practical applications, two DNA sequences to be aligned

often have the different DNA sequence lengths. Therefore, a key
step for a global alignment process is to generate an alignment
path, then under the guidance of the path, blank spaces are
inserted into the two sequences step by step. Finally, the two
0 1 … i … m-1 CNN1
sequences with different lengths will be expanded into two new
sequences of the same lengths [37,38], which, of course, include all
those inserted blank spaces. By this alignment, similarity compu-
n-1 … 1 0 CNN2
tation between two DNA sequences can be conducted.
Fig. 4. Three transition positions of CNN2: (a) at t¼ 1; (b) at t¼ m; and (c) at The algorithm proposed in this paper consists of three main
t ¼ m þ n 1. steps, as detailed in Fig. 6. First, it initializes a one-dimensional
pairwise CNN using the two original DNA sequences. Second, it
generates a global alignment path based on the state selection
array of the CNN. Last, it conducts a global alignment under the
guidance of the path generated in the previous step. For clarity, the
u, y u, y u, y following example with two short DNA sequence fragments is
u, y u, y u, y u, y used to present the detailed computation steps of this algorithm.
0 1 i-1 i i +1 m-2 m-1
This example contains the two DNA sequence fragments,
u, y u, y u, y S1 ¼ fA; A; G; C; T; C; T; Gg and S2 ¼ fC; A; G; C; A; Tg. Their sequence
j n-1 lengths are LenðS1 Þ ¼ 8 and LenðS2 Þ ¼ 6, respectively. Use S1 ðiÞ,
0
i ¼ 0; 1; 2…, LenðS1 Þ 1, and S2 ðjÞ, j ¼ 0; 1; 2…, LenðS2 Þ 1, to respec-
Fig. 5. Three typical neighbor domains: (a) right neighbor domain; (b) double tively represent the ith and jth characters of S1 and S2. For the sake
neighbor domain; and (c) left neighbor domain. of numerical calculations, the five basic characters fn; A; C; G; Tg are
correspondingly quantified as f0; 1; 0:5; 0:5; 1g (which meets where u1;i and u2;j represent the link inputs of C 1 ðiÞ and C 2 ðjÞ,
the constraints ju1;i j r 1 and ju2;j j r1), where ‘n’ represents respectively. After the initialization, the pairwise CNN with the
a blank space. initial state (namely at t¼0) is obtained, as exhibited in Fig. 7.
Then, set these parameters C ¼1, Rx ¼1, I 1;i ¼ 0, the feedback
3.2.1. The initialization of the CNN template A ¼ f0; 0; 0; 0g, the control template B ¼ f0; 1; 0; 1g,
On these five numerical characters, the two DNA sequences S1 x1;i ð0Þ ¼ 0, x2;j ð0Þ ¼ 0, and the selection factor template is set as
and S2 are numerically transformed into: F ¼ fF 0 ; F 1 ; F 2 g ¼ f5; 3; 2g, where the three prime numbers must be
S01 ¼ f 1; 1; 0:5; 0:5; 1; 0:5; 1; 0:5g and different from each other and follow F 0 4 F 1 4F 2 .
S02 ¼ f 0:5; 1; 0:5; 0:5; 1; 1g.
First, CNN1 is initialized as the master network with m ¼ Len 3.2.2. The generation of the global alignment path
ðS1 Þ þ 1 ¼ 9 cells, and CNN2 is initialized as the slave network with First, the state selection function φðÞ is defined for the cell state
n ¼ LenðS2 Þ þ1 ¼ 7 cells. The link input u1;i of cell C 1 ðiÞ, i ¼ 0; 1; …; 8, C 1;i ðtÞ, 1 r t r 15, where φðÞ follows:
and u2;j of cell C 2 ðjÞ, j ¼ 0; 1; …; 6, are respectively initialized using (
x1;i ðtÞ if i ¼ 0 and t ¼ 1;
the numerical codes of S1 and S2, as follows: x1;i ðtÞ ¼ φðx1;i ðtÞÞ ¼ ð9Þ
8
maxðδ0 ; δ1 ; δ2 Þ else;
> (
>
> S01 ði 1Þ if 1 r i r m 1;
>
>
> u ¼ where the three parameters δ1, δ2 and δ3 follow
>
<
1;i
0 if i ¼ 0; 8
>
( 0 ð8Þ < δ0 ¼ x1;i 1 ðt 1Þ F 2 ;
>
> u ¼ S2 ðn 2 jÞ if 0 r j r n 2;
>
> δ1 ¼ x1;i 1 ðt 2Þ þ y1;i ðtÞ F 0 ð1 y1;i ðtÞÞ F 1 ; ð10Þ
>
> 2;j
ifj ¼ n 1; >
>
: 0 : δ ¼ x ðt 1Þ F :
2 1;i 2
Moreover, if the states of the cell selected for δ0, δ1 and δ2 do

not exist at some time steps, these states will be ignored. For
example, for cell C 1 ð0Þ at t ¼2, because i 1 o0, neither state
x1;i 1 ðt 2Þ nor x1;i 1 ðt 1Þ exists. Therefore at this time step, both
x1;i 1 ðt 2Þ and x1;i 1 ðt 1Þ are ignored. As a result, both δ0 and δ1,
which directly depend on x1;i 1 ðt 2Þ and x1;i 1 ðt 1Þ, are also
ignored.
Fig. 8 exhibits the full steps of running this pairwise CNN. In
these steps, as mentioned above, those cells without any link to
CNN2 have been ignored. According to Eqs. (7) and (9), the full
computation results of CNN1 at time step t, where 1 r t r 16, are
listed as
(1) at t¼ 1, y1;0 ð1Þ ¼ 1, x1;0 ð1Þ ¼ 0;

(2) at t¼ 2, y1;0 ð2Þ ¼ 0, x1;0 ð2Þ ¼ 2; y1;1 ð2Þ ¼ 0, x1;1 ð2Þ ¼ 2;
(3) at t¼3, y1;0 ð3Þ ¼ 0, x1;0 ð3Þ ¼ 4; y1;1 ð3Þ ¼ 0, x1;1 ð3Þ ¼ 3;
y1;2 ð3Þ ¼ 0, x1;2 ð3Þ ¼ 3;
(4) at t ¼4, y1;0 ð4Þ ¼ 0, x1;0 ð4Þ ¼ 6; y1;1 ð4Þ ¼ 1, x1;1 ð4Þ ¼ 3;
y1;2 ð4Þ ¼ 0, x1;2 ð4Þ ¼ 5; y1;3 ð4Þ ¼ 0, x1;3 ð4Þ ¼ 6;
(5) at t ¼5, y1;0 ð5Þ ¼ 0, x1;0 ð5Þ ¼ 8; y1;1 ð5Þ ¼ 0, x1;1 ð5Þ ¼ 1;
y1;2 ð5Þ ¼ 1, x1;2 ð5Þ ¼ 2; y1;3 ð5Þ ¼ 0, x1;3 ð5Þ ¼ 7; y1;3 ð5Þ ¼ 0,
x1;3 ð5Þ ¼ 8;
⋮
(14) at t¼ 14, y1;7 ð14Þ ¼ 1, x1;7 ð14Þ ¼ 12; y1;8 ð14Þ ¼ 0, x1;8 ð14Þ ¼ 3;
(15) at t¼ 15, y1;8 ð15Þ ¼ 0, x1;8 ð15Þ ¼ 10;
(16) at t¼ 16, state computation is completed.
According to the movement characteristics of CNN2, since the

last step in this example comes at t¼ 16, the CNN stops at that
time, as shown in step (16).
In general, for the designed pairwise CNN with m cells in CNN1
and n cells in CNN2, both y1;i ðtÞ and x1;i ðtÞ at each time step t are
computed. When the CNN stops at t ¼ m þ n, a series of the cell
Fig. 6. Flow chart of the alignment algorithm. selection states, x1;i ðtÞ, i ¼ 0; 1; …; m 1 and t ¼ 1; 2; …; m þ n 1,
CNN1
u= 0 -1 -1 0.5 -0.5 1 -0.5 1 0.5
0 1 2 3 4 5 6 7 8
CNN2
0 1 2 3 4 5 6
u= 1 -1 -0.5 0.5 -1 -0.5 0
Fig. 7. The initialization results of the pairwise CNN, at time t¼ 0.

u1 = 0 -1 -1 0.5 -0.5 1 -0.5 1 0.5

Fixed, CNN1 0 1 2 3 4 5 6 7 8
CNN2 0 1 2 3 4 5 6 t=1
u2 = 1 -1 -0.5 0.5 -1 -0.5 0
CNN2 0 1 2 3 4 5 6 t=2
u2 = 1 -1 -0.5 0.5 -1 -0.5 0
CNN2 0 1 2 3 4 5 6 t=3
u2 = 1 -1 -0.5 0.5 -1 -0.5 0
CNN2 0 1 2 3 4 5 6 t=4
u2 = 1 -1 -0.5 0.5 -1 -0.5 0
CNN2 0 1 2 3 4 5 6 t=5
u2 = 1 -1 -0.5 0.5 -1 -0.5 0
…
…
u1 = 0 -1 -1 0.5 -0.5 1 -0.5 1 0.5
Fixed, CNN1 t=15
0 1 2 3 4 5 6 7 8
t=15
CNN2 0 1 2 3 4 5 6
u2 = 1 -1 -0.5 0.5 -1 -0.5 0
Fig. 8. The cell state and output computation steps of the pairwise CNN, at t ¼ 1; 2; …; 15.
have been obtained step by step. After that, the selection array Update N with N 0 , and M with M 0 .
R nm with n rows and m columns using all x1;i ðtÞ, is constructed as 3. Obtain the path node set P.
2 3 4. The generation of alignment path is completed.
x1;0 ð1Þ x1;1 ð2Þ x1;2 ð3Þ ⋯ x1;m 2 ðm 1Þ x1;m 1 ðmÞ
6 x1;0 ð2Þ x1;1 ð3Þ x1;2 ð4Þ ⋯ x1;m 2 ðmÞ x1;m 1 ðmþ 1Þ 7
6 7
6 x1;0 ð3Þ x1;1 ð4Þ x1;2 ð5Þ ⋯ x1;m 2 ðm þ1Þ 7x1;m 1 ðmþ 2Þ
6
6
7
7
As shown above, by back-tracing in the selection array R nm ,
nm
R ¼ 6 x1;0 ð4Þ x1;1 ð5Þ x1;2 ð6Þ ⋯ x1;m 2 ðm þ2Þ 7x1;m 1 ðmþ 3Þ
we can obtain the global alignment path:
6 7
6 ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ 7
6
4 x1;0 ðn 1Þ x1;1 ðnÞ
7
x1;2 ðn þ1Þ ⋯ x1;m 2 ðm þn 3Þ x1;m 1 ðmþ n 2Þ 5
P ¼ fð0; 0Þ; ð1; 0Þ; ð2; 1Þ; ð2; 2Þ; ð3; 3Þ; ð4; 4Þ; ð5; 4Þ; ð6; 5Þ; ð6; 6Þ; ð6; 7Þ;
x1;0 ðnÞ x1;1 ðn þ 1Þ x1;2 ðn þ2Þ ⋯ x1;m 2 ðm þn 2Þ x1;m 1 ðmþ n 1Þ ð6; 8Þg.
By now, the key global alignment path P for the two DNA
ð11Þ
sequences has been achieved.
Therefore, based on Eq. (11), all these cell selection states x1;i ðtÞ,
where i ¼ 0; 1; …; 8 and t ¼ 1; 2; …; 15, obtained in the previous
3.2.3. The global alignment of two DNA sequences
step are used to construct the selection array R 79 for CNN1 as
2 3 This part will exhibit how to conduct the global alignment of
0 2 4 6 8 10 12 14 16 two original DNA sequences under the guidance of the global
6 2 3 5 7 3 5 5 7 9 7
6 7 alignment path node set P.
6 7
6 4 3 2 0 2 4 6 8 10 7 First, define P(i), i ¼ 0; 1; 2; …, to represent the ith element of
6 7
79 6 7 the coordinate set P (namely, the path node set), and use lP to
R ¼ 6 6 1 0 7 5 3 1 1 3 7
6 7 represent the size of P. The full alignment process is described as
6 8 1 2 5 12 10 8 6 4 7
6 7 follows:
6 7
4 10 3 4 3 10 9 7 5 3 5
12 5 2 1 8 15 13 12 10 1. Let integer I ¼0, PðIÞ ¼ ðV I ; H I Þ.
After having obtained the selection array, the global alignment 2. While I olP 1, do the following loops:
path is to be determined. Define a node set P as the alignment Compare PðI þ1Þ with P(I).
path, where the initial P ¼ ∅. Moreover, use Rðn0 ; m0 Þ to represent case (i): if V I þ 1 V I ¼ 1 and H I þ 1 H I ¼ 0, a ‘n’ is inserted
the element at coordinate ðn0 ; m0 Þ of the array Rnm . By back- into S1 at the location of S1 ðIÞ;
tracing from the last state x1;m 1 ðm þn 1Þ to the first one x1;0 ð1Þ case (ii): if V I þ 1 V I ¼ 1 and H I þ 1 H I ¼ 1, no action;
of the array R nm step by step, the global alignment path can be case (iii): if V I þ 1 V I ¼ 0 and H I þ 1 H I ¼ 1, a ‘n’ is inserted
determined as the following steps: into S2 at the location of S2 ðIÞ;
case (iv): else, no action.
1. Let P ¼ fðn 1; m 1Þg, and N ¼ n 1; M ¼ m 1. Update I with I ¼ I þ 1;
2. While N a 0 and M a 0, do the following loops: 3. The global alignment is completed.
Find the maximum of RðN; M 1Þ, RðN 1; M 1Þ and
RðN 1; MÞ. If any one of the three does not exist, ignore it. Using the alignment process described above, the two DNA
Save coordinates of the maximum into ðN 0 ; M 0 Þ. sequences, S1 and S2, in this example have been globally aligned.
P ¼ fðN 0 ; M 0 Þg⋃P. Fig. 9 shows the final alignment results. In this figure, the vertical
Fig. 9. An example of the global alignment: (a) two original sequences; (b) two sequences with alignment labels.
Table 1
Time consumption of the CNN algorithm to complete the alignment (ms).
S2 S1
S62051 NM_008134 NM_010422 NM_000405 NG_009059 NG_009301

Len ¼ 226 Len ¼ 625 Len ¼ 1922 Len ¼ 3690 Len ¼ 24 343 Len ¼42 028
S62051 72.23 76.15 80.12 140.27 446.75 731.19

Len ¼226
NM_008134 76.15 78.63 91.46 152.69 466.78 786.61
Len ¼625
NM_010422 80.12 91.46 124.66 225.29 531.13 852.82
Len ¼1922
NM_000405 140.27 152.69 225.29 280.12 615.75 998.58
Len ¼3690
NG_009059 446.75 466.78 531.13 615.75 N/A N/A
Len ¼24 343
NG_009301 731.19 786.61 852.82 998.58 N/A N/A
Len ¼42 028
line ‘j’ between two nucleotide bases indicates one labelled couple t ¼ m þ n þ1 because of the parallel computation of the neural
of the nucleotide bases which have been aligned exactly. More- networks. So the time complexity O(T) of the algorithm in this paper
over, ‘n’ still represents a blank space which has been inserted into is OðTÞ ¼ Oðm; nÞ Oðm þ n þ 1Þ, which is approximately in propor-
these two sequences. tion to the total length of the two DNA sequences to be aligned.
3.2.4. The numerical evaluation on the alignment

For the DNA sequences in the example above, both the lengths 4. Experimental evaluations
LenðS1 Þ and LenðS2 Þ are quite short, so it is easy to evaluate the
alignment results by directly checking these sequences with the In order to evaluate the proposed method and compare it with
alignment labels. However, a majority of DNA sequences are very the others, the following experiments have been done. In these
long, so checking the sequence alignment labels is no longer an experiments, all these alignment approaches are simulated using
effective method to evaluate the performances of the alignment MATLAB software (version: R2011a), and tested on the same
algorithm. computer with 3.2 GHz dual-core CPU, 2 GB RAM and Windows
Another important and frequently used method is to calculate XP operating system. All the DNA sequences in these experiments
the global similarity between two DNA sequences [29,37,38]. In are taken randomly from the publicly available INSDC Tay-Sachs
general, on a same sequence pair, a good alignment algorithm disease section of the NCBI, U.S.A. This section contains forty-seven
often generates a high numerical similarity, while a bad one often different nucleotide sequences.
results in a low similarity. In this paper, we define SCðS1 ; S2 Þ as the In general, both the global similarity and the computation time
numerical evaluation criterion of the global similarity between S1 cost are often taken as the criteria to evaluate a DNA alignment
and S2: algorithm [29,37,38]. This paper also uses these criteria to assess
the outperformance of the proposed CNN algorithm over the
2 N Label others.
SCðS1 ; S2 Þ ¼ 100% ð12Þ
LenðS1 Þ þ LenðS2 Þ
where NLabel indicates the number of these nucleotide bases with 4.1. Algorithm testing results
alignment labels, LenðS1 Þ and LenðS2 Þ still represent the lengths of
the original sequences S1 and S2, respectively. In these experiments on the sequences from the Tay-Sachs
Furthermore, for the example shown in Fig. 9, it shows that section, the computation time and similarity values with this
LenðS1 Þ ¼ 8, LenðS2 Þ ¼ 6 and N label ¼ 4. Therefore, SCðS1 ; S2 Þ ¼ developed CNN algorithm are summarized in Tables 1 and 2,
ð2 4Þ=ð8 þ 6Þ 100% 57:14%, according to Eq. (12). respectively. In these two tables, Len represents the length of a
Generally, those classic alignment algorithms for DNA sequences, sequence, and ‘N/A’ indicates that the corresponding alignment
such as the dot pot [30], the dynamic programming [29,31], the has not been successfully executed due to the insufficient memory
heuristic algorithms such as FASTA [32] and BLAST [33], could of the computer.
achieve good alignment results but often have high time complexity. Table 1 clearly shows that the computation time of this CNN
For these classic algorithms mentioned above, their time complexity algorithm gradually increases with the increase of the total length
is represented as OðTÞ ¼ Oðm nÞ, which implies that the computa- (approximately, its computation time follows Oðm þ nÞ). The longer
tion time cost is in proportion to the product of two sequence the total length of the two sequences is, the more the computation
lengths. However, the alignment algorithm using a one-dimensional time it takes. When the total length of the two sequences reached
pairwise CNN can generate the selection array of the cells just at time to the maximum which the experiment computer can handle, this
Table 2
Global similarity obtained with the CNN algorithm (%).
S2 S1
S62051 NM_008134 NM_010422 NM_000405 NG_009059 NG_009301

Len ¼226 Len ¼ 625 Len ¼1922 Len ¼ 3690 Len ¼24 343 Len ¼ 42 028
S62051 100 39.95 17.04 8.53 1.40 0.82

Len ¼ 226
NM_008134 39.95 100 35.81 20.35 3.52 2.25
Len ¼ 625
NM_010422 17.04 35.81 100 50.11 10.57 6.50
Len ¼ 1922
NM_000405 8.53 21.88 50.11 100 25.15 11.68
Len ¼ 3690
NG_009059 1.40 3.52 10.57 25.15 N/A N/A
Len ¼ 24 343
NG_009301 0.82 2.25 6.50 11.68 N/A N/A
Len ¼ 42 028
Table 3
Computation time comparisons of the three algorithms (ms).
Methods Pairs
S62051 NM_308134 NG_009059 NG_009059 NM_010422 NG_009301

NM_008134 NM_000405 NM_008134 NM_010422 NG_009301 NM_000405
MILP 80.91 237.88 595.71 735.91 1223.67 2212.34

SPA 96.42 262.56 756.45 920.13 1560.22 3043.74
The proposed 76.15 152.69 466.78 531.13 852.82 998.58
CNN algorithm failed to align the sequences. This does not mean 3500
that this CNN algorithm does not work for the long sequences but
just implies that a computer with a sufficient memory can 3000 MILP
The algorithm computation time, ms
facilitate this new CNN algorithm especially in case of aligning SPA

the two long DNA sequences. the proposed
2500
Table 2 exhibits the experiment results in terms of the global
similarity defined by Eq. (12). These experiment results show that
2000
this CNN algorithm always obtains the precise similarity values,
such as 35.81% between NM_008134 and NM_010422, and always
ensures the global similarity of 100% between the sequence and 1500
itself (e.g. S62051 vs. itself). This algorithm exactly behaves as
what are described before. 1000
4.2. Algorithm comparisons

500
In order to compare this CNN algorithm with the others, more

0
experiments with the DNA sequences from NCBI have been 102 103 104 105
executed and compared to each other. These two algorithms used
The sum of two sequence lengths, m+n
for comparisons include the mixed-integer linear optimization
(abbreviated as MILP, 2007) algorithm [37] and the super pairwise Fig. 10. The computation time curves of these three algorithms on different
sequence lengths.
alignment (abbreviated as SPA, 2002) algorithm [38]. Just as
mentioned in the publications [29,37,38], the computation time
and the numerical similarity are still taken as the two criteria to
compare these algorithms. total length is 45718 (NG_009301 vs. NM_000405). Note that no
In order to test and compare these three algorithms, a number sequence with more than 50 000 nucleotide bases has been tested
of nucleotide sequence pairs with various lengths are randomly in this paper due to the insufficient computer memory.
chosen from the NCBI databases and tested one by one using these In terms of the computation time, as shown in Table 3, the
three alignment algorithms. Table 3 summarizes and compares algorithm proposed in this paper is the fastest one for both the
their experiment results. shortest sequence pair S62051 vs. NM_008314 and the longest
The unit for the computation time in Table 3 is millisecond sequence pair NG_009301 vs. NM_000405. It took only 71.16 ms
(ms). These sequence pairs in this table cover the sequences with the proposed CNN algorithm to complete the alignment of
containing 400–50 000 nucleotide bases. The shortest total length S62051 vs. NM_008314. It is 20.27 ms faster than SPA, and 4.76 ms
of two sequences is 851 (S62051 vs. NM_008134), and the longest faster than MILP. For those long sequence pairs, such as NG_009301
Table 4
The Global similarity comparisons (%).
Methods Pairs
S62051 NM_008134 NG_009059 NG_009059 NM_010422 NG_009301

NM_008134 NM_000405 NM_008134 NM_010422 NG_009301 NM_000405
MILP 39.95 21.78 3.49 10.55 6.49 11.67

SPA 39.95 21.74 3.48 10.54 6.48 11.65
The proposed 39.95 21.88 3.52 10.57 6.50 11.68
vs. NM_000405, the proposed algorithm took 998.58 ms to complete a little bit more alignment pairs of the nucleotide bases than the
the global alignment, and it is approximately 2.2 times faster than other two. Table 4 also shows that the proposed CNN algorithm
MILP, approximately 3.0 times faster than SPA, as shown in Table 3. can often obtain the higher similarity than the other two espe-
Table 3 also shows that the new algorithm based on the cially on these long DNA sequence pairs.
pairwise CNN is the fastest one in terms of the computation time.
Although the proposed one is only slightly better than the other
two for the short DNA sequences, for the longer ones (such as, 5. Conclusions
more than 10 000 nucleotides), the proposed one is significantly
better than the other two. When the total length of the two The cellular neural network is one of the classic neural net-
sequences is very large (e.g. NG_009301 vs. NM_000405), the works. During the past decades, the two-dimensional CNNs have
proposed algorithm is approximately 55% faster than MILP, and attracted the most interests from the researchers but the one-
approximately 68% faster than SPA. dimensional models of the CNN have not yet been paid enough
These experiments and these comparisons of the computation attention too. In order to exploit a new model for the CNN and
time prove that the proposed algorithm performs much better develop its applications, this paper proposes a simplified one-
than both MILP and SPA. Especially, the proposed algorithm will dimensional CNN model, which consists of a linear cell chain. Then
over-perform more significantly over the other two as aligning the a pairwise CNN is constructed using two one-dimensional CNNs,
long DNA sequences. including an immovable one and a movable one. Based on the
Moreover, in order to visually demonstrate the advantages of pairwise CNN, the global alignment algorithm for two DNA
the proposed algorithm in terms of the time performance, another sequences is developed in this paper. The simulation and compar-
group of experiments have been done to relate the time and the ison experiments are carried out and evaluated with the DNA
total length of two sequences. In these experiments, a group of sequences from the publicly available databases of the NCBI. By
sequence pairs are aligned using the three algorithms described comparing with the other two algorithms, the developed one
above. All these experiment results are plotted in a Cartesian performs better in the global alignment with the significantly less
coordinate system, as shown in Fig. 10. The three curves illustrate execution time. These experiments in this paper prove that the
the relationships between the computation time and the total proposed model is practically applicable and the algorithm with
length of the sequence pair for these three algorithms. a pairwise one-dimensional CNN is efficient.
In Fig. 10, the horizontal-axis is corresponding to the total
lengths of these sequence pairs, and the vertical-axis is corre-
sponding to the computation time taken by these three algorithms Acknowledgments
to align these sequence pairs. It clearly shows that these three
algorithms have a quite close performance in terms of the time This work is supported by the National Science Foundation of
when the total length, m þ n is not very large. With the increase of China (NSFC) under Grants 61175061 and 61273308, and the
the total length, except the CNN curve, both SPA curve and MILP Fundamental Research Funds for the Central Universities under
curve rise up steeply. Although the curve of the proposed algo- Grant ZYGX2013J076. Authors would also like to thank NCBI for
rithm also rises up with the increase of the total length as well, it the experiment data of DNA sequences, and thank all editors/
rises up much slower than the other two. Same as Table 3, Fig. 10 reviewers for their comments and helps on manuscript
also indicates that the proposed one needs the less computation improvement.
time to align these sequence pairs, and the larger the m þ n is, the
more the superior it is.
Just as mentioned before, besides the computation time, the References
global similarity is also of importance to distinct these three
algorithms. The experiments on the global similarity of the [1] L.O. Chua, L. Yang, Cellular neural networks: theory, IEEE Trans. Circuits Syst.
35 (1988) 1257–1272.
following DNA sequence pairs are also conducted, where the [2] L.O. Chua, L. Yang, Cellular neural networks: applications, IEEE Trans. Circuits
numerical global similarity is calculated using Eq. (12), as exhib- Syst. 35 (1988) 1273–1290.
ited in Table 4. [3] Z. Liu, H. Zhang, Z. Wang, Novel stability criterions of a new fuzzy cellular
neural networks with time-varying delays, Neurocomputing 72 (2009)
Consistent with Table 3, Table 4 shows that each of the three 1056–1064.
algorithms can achieve a good similarity for the DNA sequence [4] S. Long, D. Xu, Stability analysis of stochastic fuzzy cellular neural networks
pair. For the short pair such as S62051 vs. NM_008134, the three with time-varying delays, Neurocomputing 74 (2011) 2385–2391.
[5] L. Wang, T. Chen, Complete stability of cellular neural networks with
algorithms obtain an identical similarity of 39:95%. However, for
unbounded time-varying delays, Neural Netw. 36 (2012) 11–17.
the longer DNA sequence pairs such as NM_010422 vs. [6] W.H. Chen, W.X. Zheng, A new method for complete stability analysis of
NG_009301, the global similarity values with these three algo- cellular neural networks with time delay, IEEE Trans. Neural Netw. 7 (2010)
rithm are slightly different. Usually, the proposed new CNN 1126–1139.
[7] X. Song, X. Xin, W. Huang, Exponential stability of delayed and impulsive
algorithm results in a little bit larger similarity value than the cellular neural networks with partially Lipschitz continuous activation func-
other two. The reason is that this CNN algorithm always generates tions, Neural Netw. 29–30 (2012) 80–90.
[8] M. Di Marco, M. Forti, M. Grazzini, L. Pancioni, Convergence of a class of [35] D.F. Feng, R.F. Doolittle, Progressive sequence alignment as a prerequisite to
cooperative standard cellular neural network arrays, IEEE Trans. Circuits Syst. correct phylogenetic trees, J. Mol. Evol. 25 (1987) 351–360.
I: Regul. Pap. 4 (2012) 772–783. [36] S.A.M.A. Junid, N.M. Tahir, Z.A. Majid, M.F.M. Idros, Potential of graph theory
[9] X. Huang, Z. Zhao, Z. Wang, Y. Li, Chaos and hyperchaos in fractional-order algorithm approach for DNA sequence alignment and comparison, in: Proceedings
cellular neural networks, Neurocomputing 94 (2012) 13–21. of the Third International Conference on Intelligent Systems, Modelling and
[10] L. Wang, W. Liu, H. Shi, J.M. Zurada, Cellular neural networks with transient Simulation (ISMS 2012), vol. 2, 2012, pp. 187–190.
chaos, IEEE Trans. Circuits Syst. II: Express Briefs 5 (2007) 440–444. [37] S.R. Mcallister, R. Rajgaria, C.A. Floundas, Global pairwise sequence alignment
[11] C.L. Chang, K.W. Fan, I.F. Chung, C.T. Lin, A recurrent fuzzy coupled cellular through mixed-integer linear programming: a template-free approach, Optim.
neural network system with automatic structure and template learning, IEEE Methods Softw. 1 (2007) 127–144.
Trans. Circuits Syst. II: Express Briefs 8 (2004) 602–606. [38] S. Shen, J. Yang, A. Yao, P. Hwang, Super pairwise alignment (SPA): an efficient
[12] C.H. Huang, C.T. Lin, Bio-inspired computer fovea model based on hexagonal- approach to global alignment for homologous sequences, J. Comput. Biol. 3
type cellular neural network, IEEE Trans. Circuits Syst. I: Regul. Pap. 1 (2007) (2002) 477–486.
35–47.
[13] L. Zhang, Z. Yi, S.L. Zhang, P.A. Heng, Activity invariant sets and exponentially
stable attractors of linear threshold discrete-time recurrent neural networks,
IEEE Trans. Autom. Control 6 (2009) 1341–1347. Luping Ji received his B.S. degree in Mechanical &
[14] G. Manganaro, J.P. de Gyvez, One-dimensional discrete-time CNN with multi- Electronic Engineering from Beijing Institute of Technol-
plexed template-hardware, IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 5 ogy, Beijing, PR China, 1999. Then, he received his M.S.
(2000) 764–769. and Ph.D. degrees respectively in Computer Application
[15] S.N.T. Polat, O. Yavuz, V. Tavsanoglu, Efficient Simulation of Time-Derivative & Technology, 2005 and in Computer Software & Theory,
Cellular Neural Networks, IEEE Trans. Circuits Syst. I: Regul. Pap. 11 (2012) 2008 from the University of Electronic Science and
2638–2645. Technology of China, Chengdu, PR China. Currently, he
[16] J. Martłnez, J. Garrigs, J. Toledo, J. Ferr¢ndez, An efficient and expandable is working as an associate professor in the School of
hardware implementation of multilayer cellular neural networks, Neurocom- Computer Science and Engineering, University of Elec-
puting 114 (2013) 54–62. tronic Science and Technology of China, Chengdu,
[17] C.Y. Wu, S.H. Chen, The design and analysis of a CMOS low-power large- PR China. His current research interests include neural
neighborhood CNN with propagating connections, IEEE Trans. Circuits Syst. I: networks and pattern recognition.
Regul. Pap. 2 (2009) 440–452.
[18] E. Cesur, N. Yildiz, V. Tavsanoglu, On an improved FPGA implementation of
CNN-based Gabor-type filters, IEEE Trans. Circuits Syst. II: Express Briefs 11
(2012) 815–819. Xiaorong Pu is a professor at the University of Electro-
[19] D. Amanatidis, D. Tsaptsinos, P. Giaccone, G. Jones, Optimizing motion and nic Science and Technology of China, PR China. She has
colour segmented images with neural networks, Neurocomputing 62 (2004) been a visiting scholar at Illinois Institute of Technology
197–223. (IIT), USA, 2008, and at the University of Manchester
[20] Y.W. Shou, C.T. Lin, Image descreening by GA-CNN-based texture classification, Institute of Science and Technology (UMIST), Britain
IEEE Trans. Circuits Syst. I: Regul. Pap. 11 (2004) 2287–2299. 2004. Her research interests include computer vision,
[21] C.T. Lin, C.H. Huang, S.A. Chen, CNN-based hybrid-order texture segregation as biometrics, neural networks and operating system. She
early vision processing and its implementation on CNN-UM, IEEE Trans. has 8 books and more than 30 academic papers
Circuits Syst. I: Regul. Pap. 10 (2007) 2277–2287. published so far.
[22] Q. Gao, G.S. Moschytz, Fingerprint feature matching using CNNs, in: Proceed-
ings of the 2004 International Symposium on Circuits and Systems, vol. 3,
2004, pp. 73–76.
[23] M. Milanova, U. Bker, Object recognition in image sequences with cellular
neural networks, Neurocomputing 31 (2000) 125–141.
[24] A. Delbem, L. Correa, L. Zhao, Design of associative memories using cellular
Hong Qu received the B.S., M.S. and Ph.D. degrees in
neural networks, Neurocomputing 72 (2009) 2180–2188.
Computer Science and Engineering from the University
[25] Q. Han, X. Liao, T. Huang, et al., Analysis and design of associative memories
of Electronic Science and Technology of China,
based on stability of cellular neural networks, Neurocomputing 97 (2012)
Chengdu, China, in 2000, 2003 and 2006, respectively.
192–200.
From March 2007 to February 2008, he was a post-
[26] Z. Zeng, J. Wang, Analysis and design of associative memories based on
doctoral fellow at the Advanced Robotics and Intelli-
recurrent neural networks with linear saturation activation functions and
gent Systems Lab, School of Engineering, University of
time-varying delays, Neural Comput. 8 (2007) 2149–2182.
Guelph, Guelph, ON, Canada. Currently, he is currently a
[27] N. Takahashi, M. Nagayoshi, S. Kawabata, T. Nishi, Stable patterns realized by
professor in the School of Computer Science and
a class of one-dimensional two-layer CNNs, IEEE Trans. Circuits Syst. II: Regul.
Engineering, University of Electronic Science and Tech-
Pap. 11 (2008) 3607–3620.
nology of China. His current research interests include
[28] N. Takahashi, et al., Sufficient conditions for one-dimensional cellular neural
neural networks, robot, neurodynamics, intelligent
networks to perform connected component detection, Nonlinear Anal.: Real
computation, and optimization.
World Appl. 11 (2010) 4202–4213.
[29] S. Needleman, C. Wunsch, A general method applicable to the search for
similarities in the amino acid sequence of two sequences, J. Mol. Biol. 3 (1970)
443–453.
Guisong Liu received his B.S. degree in Mechanics from
[30] A.J. Gibbs, G.A. Mcintyre, A. George, The diagram, a method for comparing
Xi'an Jiao Tong University, Xi'an, China, in 1995, the M.S.
sequences, its use with amino acid and nucleotide sequences, Eur. J. Biochem.
degree in Automatics and the Ph.D. degree in Computer
1 (1970) 1–11.
Science both from the University of Electronic Science
[31] T.F. Smith, M.S. Watermann, Identification of common molecular subsequence,
and Technology of China (UESTC, Chengdu, China), in
J. Mol. Biol. 147 (1981) 196–197.
2000 and 2007, respectively. Now he is an associate
[32] D.J. Lipman, W.R. Pearson, Rapid and sensitive protein similarity searchers,
professor in the Computational Intelligence Laboratory,
Science 4693 (1985) 1435–1441.
the School of Computer Science and Engineering, UESTC.
[33] S.F. Altschul, T.L. Madden, A.A. Schaffer, et al., Gapped BLAST and PSI-BLAST: a
His research interests include computational intelli-
new generation of protein database search programs, Nucleic Acids Res. 7
gence, pattern recognition and machine learning.
(1997) 3389–3402.
[34] F. Corpet, Multiple sequence alignment with hierarchical clustering, Nucleic
Acids Res. 22 (1988) 10881–10890.

Ji 等。 - 2015 - One-dimensional pairwise CNN for the global alignm

Uploaded by

Copyright:

Available Formats

You might also like

Ji 等。 - 2015 - One-dimensional pairwise CNN for the global alignm

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Ji 等。 - 2015 - One-dimensional pairwise CNN for the global alignm

Uploaded by

Copyright:

Available Formats

Neurocomputing 149 (2015) 505–514

Contents lists available at ScienceDirect

One-dimensional pairwise CNN for the global alignment of two

1. Introduction In general, the conventional two-dimensional CNN [1,2] model

n-1 … 1 0 CNN2 In practical applications, two DNA sequences to be aligned

Moreover, if the states of the cell selected for δ0, δ1 and δ2 do

(1) at t¼ 1, y1;0 ð1Þ ¼ 1, x1;0 ð1Þ ¼ 0;

According to the movement characteristics of CNN2, since the

Fig. 7. The initialization results of the pairwise CNN, at time t¼ 0.

u1 = 0 -1 -1 0.5 -0.5 1 -0.5 1 0.5

u2 = 1 -1 -0.5 0.5 -1 -0.5 0

u2 = 1 -1 -0.5 0.5 -1 -0.5 0

S62051 NM_008134 NM_010422 NM_000405 NG_009059 NG_009301

S62051 72.23 76.15 80.12 140.27 446.75 731.19

3.2.4. The numerical evaluation on the alignment

S62051 NM_008134 NM_010422 NM_000405 NG_009059 NG_009301

S62051 100 39.95 17.04 8.53 1.40 0.82

S62051 NM_308134 NG_009059 NG_009059 NM_010422 NG_009301

MILP 80.91 237.88 595.71 735.91 1223.67 2212.34

facilitate this new CNN algorithm especially in case of aligning SPA

4.2. Algorithm comparisons

In order to compare this CNN algorithm with the others, more

S62051 NM_008134 NG_009059 NG_009059 NM_010422 NG_009301

MILP 39.95 21.78 3.49 10.55 6.49 11.67

You might also like