Distributed Sequences and Search Process


Dragana Bajić
Department of Communications and Signal Processing
University of Novi Sad
21000 Novi Sad, Serbia and Montenegro
LMCDRA@eunet.yu

Jakov Stojanović
Network Planning Centar
Mobtel
11 070 Novi Beograd, Serbia and Montenegro
jakov.stojanovic@mobtel.co.yu


Abstract: An analytical approach to a search process for a set
of M fixed sequences in random data covers the problems of a
search for sequences with errors, as well as of a search for
distributed sequences. It is of greater practical interest than
the previously analysed case of a search for a single sequence. This
paper derives the statistical parameters (probability distribution
function, expected value, variance) of this process, introducing
a new term, the cross-bifix. The derived formulae are applied to
the case of a search for distributed sequences with errors.
Keywords: expected duration of a search; distributed
sequences; bifix; cross-bifix
I. INTRODUCTION
Fast and reliable frame synchronization acquisition is a
crucial requirement for establishing a connection between a
transmitter and a receiver. A common method to obtain this
is to periodically insert a synchronization sequence
(frame-alignment word, FAW) into the data sequence, thus indicating the
boundaries between the frames of data. Then a correlation
technique (or a window-sliding search) is used to find the
position of the FAW in the received signal.
Analytical approaches to the synchronization acquisition
process and methods for the construction of sequences with the
best aperiodic autocorrelation properties (alternatively, with
minimal total simulation probability) have been the subject
of numerous analyses over the past decades, e.g. [1-9]. Only
a few of them investigated the relationship between frame
length and synchronization sequence length and structure [10,
11], showing that, if the frame length exceeds a turning point,
other sequence structures might outperform the optimised
ones with respect to the acquisition time. This is a consequence
of a known result [12]: the duration of a search for a fixed
sequence in random data is the shortest for bifix-free
patterns, so they are the most likely to be simulated, thus
prolonging the acquisition. This makes it necessary to
analyse statistically a process of interest to both
mathematicians and engineers: the search-for-a-fixed-sequence
process. Purely mathematical achievements (at the binary level)
were recently summarized in [13], but the older and more suitable
engineering approach [12] introduced the already mentioned
term bifix: a subsequence that is both a prefix and a suffix
of an observed synchronization sequence. Based upon the
bifix analysis, probability density functions (p.d.f.) of the search
process in random data [14] and in a frame (where the overlap
region has to be taken into account) [15] were derived. These
formulae were a necessary mathematical prerequisite for
analytical expressions that verify the simulation-study results
from [11, 12].
However, the analytical derivation [14, 15] of the simulation
curves from [11, 12] had no further application, since
exact matching of the synchronization sequence at the
receiver end has long been abandoned. Current techniques
search for sequences within a specified distance of
the inserted one. At the symbol level, this is equivalent to a
search for a set of sequences; therefore, new analytical tools
for the multiple search process have to be developed. These
analytical tools and their application to distributed sequences
are the topic of this paper.
The next section gives a brief explanation of the derived
formulae for the statistical parameters of the multiple search process
(abstract in [16]). The comparison of the best known
distributed sequences [17] with contiguous synchronization
sequences of the same redundancy is performed in the
third section, followed by the concluding remarks.
II. SEARCH FOR M SEQUENCES IN RANDOM DATA
The goal of the multiple search process is to find any
sequence out of M known sequences while sliding along a
random equiprobable data stream. Each sequence consists of
N L-ary symbols. For example, a search for the binary (L=2)
distributed sequence 0x0x1 means a search for the M=4 sequences
00001, 00011, 01001 and 01011.
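The expansion of a distributed sequence into its set of M fixed sequences can be sketched in a few lines of Python. This is only an illustration: the function name `expand_distributed` and its parameters are hypothetical, and the character x is assumed to mark an unconstrained position.

```python
from itertools import product

def expand_distributed(pattern, alphabet="01", wildcard="x"):
    """Return every fixed sequence covered by a distributed-sequence pattern."""
    # Positions of the unconstrained symbols (marked with the wildcard).
    free = [i for i, c in enumerate(pattern) if c == wildcard]
    sequences = []
    # Fill the free positions with every combination of alphabet symbols.
    for fill in product(alphabet, repeat=len(free)):
        symbols = list(pattern)
        for pos, sym in zip(free, fill):
            symbols[pos] = sym
        sequences.append("".join(symbols))
    return sequences

print(expand_distributed("0x0x1"))
# prints ['00001', '00011', '01001', '01011']
```

With L symbols in the alphabet and N-l free positions, the list has M = L^(N-l) entries.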
The search starts (the first test, k=1) when the first N
received data symbols are compared to each one of the M
sequences. If the test fails (the received N data symbols
match none of the M sequences), the test position shifts
(slides) and symbols 2 to N+1 are compared to the M sequences,
etc. The search stops if the k-th test succeeds, meaning that
the received data stream of length k+N-1 satisfies the extended
matching condition (e.m.c.): the last N symbols of the data stream
equal one of the M sequences, while none of these M
sequences was found at any of the previous test positions.
The probability of this event (i.e. the probability that the number of tests
is exactly k or, rather, the probability that one of the M
sequences is found at the k-th test), which is the p.d.f. of the process, is:
$$\Pr\{k\} = a_k\, p_{k+N-1} = b\, a_k\, p^{k}, \qquad b = p^{N-1}, \qquad p = 1/L \tag{1}$$
0-7803-8533-0/04/$20.00 (c) 2004 IEEE IEEE Communications Society
514

where $p_{k+N-1}$ is the probability of a data stream of length
k+N-1 and $a_k$ is the number of different data streams satisfying
the e.m.c.
The matching condition is particular (p.m.c.) if just one
particular sequence i, i = 1, ..., M, is found at the k-th test.
The number of different data streams satisfying it equals $a_k(i)$.
In order to derive $a_k$ and $a_k(i)$, a new term, the cross-bifix,
is introduced. A cross-bifix is a subsequence of length $n \le N$
that is a suffix of the i-th sequence and a prefix of the j-th sequence,
i, j = 1, ..., M. The corresponding cross-bifix indicator $h_{ij}^{(n)}$
equals 1 if a cross-bifix of length n exists, e.g. binary
sequences $P_i$=0001 and $P_j$=0011 have a 3-bit cross-bifix,
$h_{ij}^{(3)} = 1$, while obviously $h_{ji}^{(3)} = 0$. If i=j,
$h_{ii}^{(n)}$ denotes the classical bifix indicator $h_n$ introduced in [12].
The default values for cross-bifix indicators are:
$$h_{ij}^{(0)} = 1, \qquad h_{ij}^{(N)} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}, \qquad i, j = 1, \ldots, M \tag{2}$$
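As an illustration, the cross-bifix indicator can be computed by a direct suffix/prefix comparison. The helper name `cross_bifix` is hypothetical, and sequences are assumed to be given as equal-length strings.

```python
def cross_bifix(seq_i, seq_j, n):
    """h_ij^(n): 1 if the length-n suffix of seq_i equals the length-n prefix of seq_j."""
    if n == 0:
        return 1  # default value h_ij^(0) = 1
    # For n = N this compares whole sequences, giving 1 only when they are equal.
    return int(seq_i[len(seq_i) - n:] == seq_j[:n])

# Example from the text: P_i = 0001 and P_j = 0011.
print(cross_bifix("0001", "0011", 3))  # prints 1
print(cross_bifix("0011", "0001", 3))  # prints 0
```

For distinct sequences of equal length N, the n = N case reproduces the second default value in (2) without special handling.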
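The required number of different data streams $a_k$ can be
evaluated using the following recursion:
$$\begin{aligned}
a_1(i) &= 1, \qquad i = 1, \ldots, M \\
a_k(i) &= \sum_{j=1}^{M} \sum_{m=1}^{\min(N,\,k-1)} \left( L\, h_{ji}^{(N-m+1)} - h_{ji}^{(N-m)} \right) a_{k-m}(j), \qquad k \ge 2 \\
a_k &= \sum_{i=1}^{M} a_k(i)
\end{aligned} \tag{3}$$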
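A minimal sketch of recursion (3), with an exhaustive count over all data streams as a cross-check for small k. The function names `count_streams` and `brute_force` are hypothetical; the recursion is coded in the form a_k(i) = sum over j and m of (L h_ji^(N-m+1) - h_ji^(N-m)) a_{k-m}(j), which is an assumed reading of (3) that agrees with the worked example (4) for the sequences 00 and 10.

```python
from itertools import product

def h(seq_i, seq_j, n):
    """Cross-bifix indicator h_ij^(n), with the default h_ij^(0) = 1."""
    return 1 if n == 0 else int(seq_i[len(seq_i) - n:] == seq_j[:n])

def count_streams(seqs, L, k_max):
    """Return a list a where a[k][i] is a_k(i+1); a[0] is unused."""
    M, N = len(seqs), len(seqs[0])
    a = [None, [1] * M]                      # a_1(i) = 1
    for k in range(2, k_max + 1):
        row = []
        for i in range(M):
            total = 0
            for j in range(M):
                for m in range(1, min(k - 1, N) + 1):
                    coeff = (L * h(seqs[j], seqs[i], N - m + 1)
                             - h(seqs[j], seqs[i], N - m))
                    total += coeff * a[k - m][j]
            row.append(total)
        a.append(row)
    return a

def brute_force(seqs, alphabet, k):
    """Count streams of length k+N-1 whose first match occurs at test k."""
    N = len(seqs[0])
    counts = [0] * len(seqs)
    for stream in product(alphabet, repeat=k + N - 1):
        s = "".join(stream)
        # Skip streams in which any of the first k-1 tests already succeeds.
        if any(s[t:t + N] in seqs for t in range(k - 1)):
            continue
        tail = s[k - 1:]
        if tail in seqs:
            counts[seqs.index(tail)] += 1
    return counts

seqs = ["00", "10"]
a = count_streams(seqs, L=2, k_max=5)
for k in range(1, 6):
    # Pr{k} = a_k * p^(k+N-1) with p = 1/L
    print(k, a[k], sum(a[k]) / 2 ** (k + len(seqs[0]) - 1))
```

The brute-force count grows as L^(k+N-1), so it is only useful as a check for short streams; the recursion itself costs on the order of M^2 N operations per value of k.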
The value of $a_1(i)$, i = 1, ..., M, must be 1, as there can exist only
one data stream of length N that equals a particular one of
the M sequences. For further discussion, a part of the search trellis
for binary sequences $P_1$=00 and $P_2$=10 (L=2, N=2, M=2) is
drawn in Figure 1a. The relationship between the sequences is
shown in Figure 1b: states are connected if the cross-bifix
indicator equals one, and in this case a sequence prevents
the appearance of subsequent sequences with which it
shares a cross-bifix.
Generally, the number of data streams that satisfy the
p.m.c. is at most twice the number of data streams that have
one bit less ($a_k(1) \le 2\, a_{k-1}(1)$). The prefix of $P_1$ is a suffix
of $P_1$ and $P_2$ ($h_{11}^{(1)} = h_{21}^{(1)} = 1$), so their appearance at k-2
(Figure 1a, bold states) prevents $P_1$ at k-1 (dashed states). $P_1$
at k-1 would prevent the appearance of the same sequence at
k, had it not already been prevented by the sequences at k-2
(arms entering the state $a_{k-1}(1)$ in Figure 1b). So, their
number should be added to $a_{k-1}(1)$ to balance the loss, and
$a_k(1)$ becomes $2\,(a_{k-1}(1) + h_{11}^{(1)} a_{k-2}(1) + h_{21}^{(1)} a_{k-2}(2))$, in
accordance with (3).
[Figure 1: a) search trellis with states 00, 10, 01 and 11 at test positions k-2, k-1 and k; b) relations between the states a_{k-2}(i), a_{k-1}(i) and a_k(i), i = 1, 2 (sequences 00 and 10), with arms labelled by the cross-bifix indicators h_11^(1), h_21^(1), h_11^(0), h_12^(0), h_21^(0) and h_22^(0)]

Figure 1. a) Search trellis for sequences 00 and 10; b) Sequence relations
On the other hand, $P_1$ at k is prevented by its replicas at k-1
and k-2, as well as by $P_2$ at k-2 (bold states in Figure 1a and
the arms entering state $a_k(1)$ in Figure 1b). Their number
should be subtracted, so finally $a_k(1)$ equals:
$$\begin{aligned}
a_k(1) &= 2\left(a_{k-1}(1) + h_{11}^{(1)} a_{k-2}(1) + h_{21}^{(1)} a_{k-2}(2)\right) - h_{11}^{(1)} a_{k-1}(1) \\
&\quad - h_{11}^{(0)} a_{k-2}(1) - h_{21}^{(1)} a_{k-1}(2) - h_{21}^{(0)} a_{k-2}(2) \\
&= a_{k-1}(1) + a_{k-2}(1) - a_{k-1}(2) + a_{k-2}(2)
\end{aligned} \tag{4}$$
The above relation is also in accordance with (3).
It can be shown that (1) is a probability density function, as:
$$\sum_{k=1}^{\infty} \Pr\{k\} = 1 \tag{5}$$
Other statistical parameters are the expected value of the
number of tests (duration of search):
$$E\{k\} = T = \sum_{k=1}^{\infty} k \Pr\{k\} = 1 - N + \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{m=0}^{N-1} L^{N-m}\, h_{ij}^{(N-m)}\, S_i \tag{6}$$
and its second moment:
$$E\{k^2\} = \sum_{k=1}^{\infty} k^2 \Pr\{k\} = 1 - 2NT - N^2 + \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{M} \sum_{m=0}^{N-1} L^{N-m}\, h_{ij}^{(N-m)} \left[ 2 T_i + (2m+1)\, S_i \right] \tag{7}$$
(Note: a detailed derivation of (5), (6), (7) and (10)-(16), approx. 15
single-spaced pages, is available upon request.)

The quantities $S_j$ and $T_j$, necessary for (6) and (7), are
defined as:
$$S_j = b \sum_{k=1}^{\infty} a_k(j)\, p^{k}, \quad j = 1, \ldots, M, \qquad \mathbf{S} = [\,S_1 \;\; S_2 \;\; \cdots \;\; S_M\,] \tag{8}$$
$$T_j = b \sum_{k=1}^{\infty} k\, a_k(j)\, p^{k}, \quad j = 1, \ldots, M, \qquad \mathbf{T} = [\,T_1 \;\; T_2 \;\; \cdots \;\; T_M\,] \tag{9}$$
and can be obtained as solutions of the following sets of
linear equations:
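$$\mathbf{A} \cdot \mathbf{S}^{T} = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; 1\,]^{T} \tag{10}$$
$$\mathbf{A} \cdot \mathbf{T}^{T} = \mathbf{B}^{T} \tag{11}$$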
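Matrix A is an MxM matrix while B is a vector:
$$\mathbf{A} = [A_{ij}]; \qquad A_{ij} = \begin{cases} \displaystyle\sum_{m=0}^{N-1} p^{m} \left( h_{j1}^{(N-m)} - h_{j(i+1)}^{(N-m)} \right), & i = 1, \ldots, M-1 \\ 1, & i = M \end{cases} \qquad j = 1, \ldots, M \tag{12}$$
$$\mathbf{B} = [B_i]; \qquad B_i = \begin{cases} \displaystyle\sum_{j=1}^{M} \sum_{m=0}^{N-1} (m + 0.5)\, p^{m} \left( h_{j(i+1)}^{(N-m)} - h_{j1}^{(N-m)} \right) S_j, & i < M \\ T, & i = M \end{cases} \tag{13}$$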
If M=1, then $a_k$, $E\{k\}$ and $E\{k^2\}$ become the same as in
[12, 14], i.e. the case of a single sequence search.
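The following sketch shows how the linear system (10) and the expected value (6) might be evaluated numerically, using exact fractions. It rests on stated assumptions: row i of A (i < M) is taken as the sum over m of p^m (h_j1^(N-m) - h_j(i+1)^(N-m)), the last row is all ones, and the helper names `solve` and `expected_tests` are hypothetical.

```python
from fractions import Fraction

def h(seq_i, seq_j, n):
    """Cross-bifix indicator h_ij^(n), with the default h_ij^(0) = 1."""
    return 1 if n == 0 else int(seq_i[len(seq_i) - n:] == seq_j[:n])

def solve(A, rhs):
    """Gauss-Jordan elimination over exact fractions (A assumed non-singular)."""
    n = len(A)
    aug = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

def expected_tests(seqs, L):
    """Return (S, T): the vector S solving (10) and T = E{k} from (6)."""
    M, N = len(seqs), len(seqs[0])
    p = Fraction(1, L)
    # Rows 1..M-1 of A (assumed reading of (12)); the last row is all ones.
    A = [[sum(p ** m * (h(seqs[j], seqs[0], N - m) - h(seqs[j], seqs[i + 1], N - m))
              for m in range(N))
          for j in range(M)]
         for i in range(M - 1)]
    A.append([Fraction(1)] * M)
    S = solve(A, [Fraction(0)] * (M - 1) + [Fraction(1)])
    total = sum(L ** (N - m) * h(seqs[i], seqs[j], N - m) * S[i]
                for i in range(M) for j in range(M) for m in range(N))
    return S, 1 - N + total / M

S, T = expected_tests(["00", "10"], L=2)
print(S, T)  # S_1 = 1/4, S_2 = 3/4, T = 2
```

For the pair 00 and 10 the result can be checked by hand: a test succeeds exactly when the second symbol of the window is 0, so the number of tests is geometric with mean 2, and 00 can only be found at the very first test (probability 1/4).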
III. DISTRIBUTED SEQUENCES AND CROSS-BIFIX-FREE
SEQUENCES
A distributed sequence [17] of length N is specified by
$C = c_0 c_1 \ldots c_{N-1}$. Out of N symbols, l are fixed, taking
value 0 or 1, and the remaining N-l are unconstrained data,
identified by the character x. The data symbols will assume
the values 0 and 1 with equal probability. By definition, the
first and last positions of the sequence C are fixed symbols,
e.g. for sequence C=0xx1xx0, l=3 and N=7.
For contiguous sequences it is known that the best ones
are bifix-free; analogously, the term cross-bifix-free
sequences can be defined for a set of sequences with all the
cross-bifix indicators equal to 0, except the default ones.
This sets the values of (12) and (13) to:
$$\mathbf{A} = \begin{bmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \\ 1 & 1 & 1 & \cdots & 1 \end{bmatrix}, \qquad \mathbf{B} = [\,0 \;\; 0 \;\; \cdots \;\; 0 \;\; T\,] \tag{14}$$
from which it follows immediately:
$$S_j = 1/M; \qquad T_j = T/M. \tag{15}$$
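Expressions (3) and (6) are then considerably simplified,
as is the expression for the variance $\sigma^2 = E\{k^2\} - E^2\{k\}$:
$$a_1 = M; \qquad a_k = L\, a_{k-1} - M\, a_{k-N} \tag{16}$$
$$T = L^{N}/M - N + 1 \tag{17}$$
$$\sigma^2 = L^{2N}/M^{2} - (2N-1)\, L^{N}/M. \tag{18}$$
In (16), $a_k = 0$ is taken for $k \le 0$, in line with the summation limit $\min(N, k-1)$ in (3).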
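As a quick arithmetic check, the closed forms T = L^N/M - N + 1 and sigma^2 = L^(2N)/M^2 - (2N-1) L^N/M can be evaluated directly from a pattern string. The sketch below assumes the expanded set is cross-bifix-free and that x marks a free position; the function name `cross_bifix_free_stats` is hypothetical.

```python
def cross_bifix_free_stats(pattern, L=2, wildcard="x"):
    """Expected value and variance of the number of tests for a
    cross-bifix-free distributed sequence (closed forms (17) and (18))."""
    N = len(pattern)
    fixed = N - pattern.count(wildcard)   # redundancy l
    ratio = L ** fixed                    # equals L**N / M, since M = L**(N - l)
    T = ratio - N + 1
    variance = ratio ** 2 - (2 * N - 1) * ratio
    return T, variance

for pattern in ("10100", "11x010", "1110xx0"):
    print(pattern, cross_bifix_free_stats(pattern))
# prints:
# 10100 (28, 736)
# 11x010 (27, 672)
# 1110xx0 (26, 608)
```

These values coincide with the l=5 row of Table I.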
Putting M=1, the expressions for the statistical parameters of an
ordinary bifix-free sequence are obtained [12, 14].
The only detailed contribution dealing with distributed
sequences [17] proposes methods for the construction of optimal
sequences and gives a list of the best ones found. It is not
surprising that such sequences are cross-bifix-free (the
authors of [17] call them bifix-free, obviously in a cross-bifix
sense).
Table I lists the evaluated expected values and variances of the search
duration for the minimal and maximal length distributed sequences [17],
compared to contiguous sequences ((6) and (7) are verified
by a simulation study). As the best sequences should have the
minimal expected value of search duration [12], distributed
sequences outperform contiguous sequences of the same
redundancy l. The relative gains of distributed sequences over
contiguous ones of the same redundancy are defined as:
$$G_{TR} = \left[ (T_{\mathrm{CONT}} - T_{\mathrm{DISTR}}) / T_{\mathrm{CONT}} \right] \cdot 100\ [\%] \tag{19}$$
$$G_{\sigma R} = \left[ (\sigma^2_{\mathrm{CONT}} - \sigma^2_{\mathrm{DISTR}}) / \sigma^2_{\mathrm{CONT}} \right] \cdot 100\ [\%] \tag{20}$$
Gains are plotted in Fig. 2. The p.d.f. of the same set of
sequences is plotted in Fig. 3. It possesses a striking similarity
to the exponential distribution and can be approximated by it;
besides, the values of the variance (18) are approximately the
squares of the mean values of the search time (17), another
characteristic of the exponential distribution. The relative error
of approximation:
$$\varepsilon_R = \frac{\left| \Pr\{k\} - \frac{1}{T}\, e^{-k/T} \right|}{\Pr\{k\}} \cdot 100\% \tag{21}$$
is plotted in Fig. 4, exceeding 10% only in the case of the lowest
redundancy sequences.
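The comparison with the exponential distribution can be reproduced with a short sketch. It assumes a cross-bifix-free set, so that a_1 = M and a_k = L a_{k-1} - M a_{k-N} with a_k = 0 for k <= 0, and it takes the relative error as an absolute value; the function names `pdf_cross_bifix_free` and `relative_error_percent` are hypothetical.

```python
import math

def pdf_cross_bifix_free(N, M, L, k_max):
    """Pr{k} for k = 1..k_max from a_1 = M, a_k = L*a_{k-1} - M*a_{k-N}."""
    a = {k: 0 for k in range(1 - N, 1)}      # assumed convention: a_k = 0 for k <= 0
    a[1] = M
    for k in range(2, k_max + 1):
        a[k] = L * a[k - 1] - M * a[k - N]
    # Pr{k} = a_k * p^(k+N-1), with p = 1/L
    return {k: a[k] / L ** (k + N - 1) for k in range(1, k_max + 1)}

def relative_error_percent(N, M, L, k_max):
    """Relative error (in %) of the approximation Pr{k} ~ exp(-k/T)/T."""
    T = L ** N / M - N + 1
    pdf = pdf_cross_bifix_free(N, M, L, k_max)
    return {k: 100 * abs(pr - math.exp(-k / T) / T) / pr for k, pr in pdf.items()}

# Contiguous bifix-free sequence with l = N = 5 (M = 1, L = 2).
pdf = pdf_cross_bifix_free(N=5, M=1, L=2, k_max=3000)
print(sum(pdf.values()))                     # close to 1
print(sum(k * pr for k, pr in pdf.items()))  # close to T = 28
print(relative_error_percent(5, 1, 2, 5))
```

The sketch divides by Pr{k}, so it is only meaningful for sets where every Pr{k} in the evaluated range is non-zero.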
A synchronization sequence might be considered correctly
received if it is within a certain Hamming distance H from the
correct sequence. Distributed sequences are no exception:
H errors per l fixed bits can be allowed. So, the distributed
sequence 0x0 with a single error allowed (H=1) means a
search for sequences 0x0, 1x0 and 0x1, a total of M=6
sequences (x can be 0 or 1).
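The set of sequences searched for when errors are allowed can be enumerated as in the sketch below. The function name `expand_with_errors` is hypothetical; errors are assumed to be counted over the fixed positions only, while every x position is free.

```python
from itertools import combinations, product

def expand_with_errors(pattern, H, alphabet="01", wildcard="x"):
    """All fixed sequences within Hamming distance H of the pattern,
    with errors counted over the fixed positions only."""
    fixed = [i for i, c in enumerate(pattern) if c != wildcard]
    variants = set()
    for n_err in range(H + 1):
        for err_pos in combinations(fixed, n_err):
            choices = []
            for i, c in enumerate(pattern):
                if c == wildcard:
                    choices.append(alphabet)                          # free position
                elif i in err_pos:
                    choices.append([s for s in alphabet if s != c])   # erroneous symbol
                else:
                    choices.append([c])                               # correct symbol
            variants.update("".join(t) for t in product(*choices))
    return sorted(variants)

print(expand_with_errors("0x0", H=1))
# prints ['000', '001', '010', '011', '100', '110']
```

The resulting list can be passed on as the set of M sequences of the multiple search process.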
The p.d.f. of distributed sequences with errors is shown in Fig.
5. It is obvious that in this case it cannot be approximated by
the exponential distribution of the same expected value (also
plotted in Fig. 5). Furthermore, the expected duration of search
is considerably shortened (logically, as M increases there
are more possibilities for a positive outcome of a test), as shown
in Table II for sequences with l=5. Contrary to the error-free case,
contiguous sequences are the ones with minimal T=E{k} (the gain is
negative!). It might seem an advantage, but this case must be
carefully investigated further, as a shorter search duration guarantees
more frequent simulations of the sync. sequence within the data
region, which outweighs the good properties of the sequence within the
overlap region. This implies the necessity to perform further
research upon (3), to cover the case of a search in a frame where
the synchronization sequence is periodically inserted within the
random (almost always scrambled!) data. Such a formula,
similar to the one derived for the case of a single sequence search
[15], will be a basis for re-plotting the famous Haeberle's
curves [11], for the case of sequences with errors and for the
case of distributed sequences, both with and without errors.

[Figure 2: relative gain over contiguous sequence [%] (log scale) versus number of fixed bits within sequence (redundancy) l; curves for max. length and min. length sequences, expected value and variance]
Figure 2. Relative gain over contiguous sequences

[Figure 3: p.d.f. (log scale) versus duration of search k; curves for l=5, l=10 and l=15; contiguous bifix-free sequence, distributed sequence of minimum length, distributed sequence of maximum length]
Figure 3. Probability distribution function

[Figure 4: relative error of approximation [%] (log scale) versus duration of search k; curves for l=5, l=10 and l=15; contiguous bifix-free sequence, distributed sequence of minimum length, distributed sequence of maximum length]
Figure 4. Cross-bifix-free sequences p.d.f. approximated by exponential distribution: relative error

[Figure 5: p.d.f. (log scale) versus duration of search k for H=0, H=1 and H=2; sequences 10100, 11x010 and 1110xx0, with the exponential approximation]
Figure 5. Distributed sequences with errors: pdf and exp. approximation

TABLE I. EXPECTED VALUE AND VARIANCE OF DURATION OF SEARCH FOR SOME OF THE BEST SEQUENCES - DISTRIBUTED AND CONTIGUOUS

| l | Minimal length sequence [17] | T | σ² | Maximal length sequence [17] | T | σ² | Contiguous sequence: T | σ² |
|---|---|---|---|---|---|---|---|---|
| 5 | 11x010 | 27 | 672 | 1110xx0 | 26 | 608 | 28 | 736 |
| 6 | 11x0010 | 58 | 3264 | 1110xx0xx0 | 55 | 2880 | 59 | 3392 |
| 7 | 1110010 | 122 | 14720 | 111xx0xxx0x10 | 116 | 13184 | 122 | 14720 |
| 8 | 11011x10x0 | 247 | 60672 | 111xx0xxx0xxx0x10 | 240 | 57088 | 249 | 61696 |
| 9 | 111x10x0110 | 502 | 251392 | 111xx0xxx0xxx0x10xx0 | 493 | 242176 | 504 | 253440 |
| 10 | 1110x01001x0 | 1013 | 1025024 | 11xxxx110xx0xx1xxxx0x0x0 | 1001 | 1000448 | 1015 | 1029120 |
| 11 | 11100010010 | 2038 | 4151296 | 111xx0x0xxxx0xxxx0xxxxxx0110 | 2021 | 4081664 | 2038 | 4151296 |
| 12 | 1111x0x1001010 | 4083 | 16666624 | 111xx0x0x0xxxx0xxxx0xxxxxxxx0110 | 4065 | 16519168 | 4085 | 16683008 |
| 13 | 1111x00110x1010 | 8178 | 66871296 | 111xx0x0x0xxxx0xxxx0xxxx0xxxxxxxx0110 | 8156 | 66510848 | 8180 | 66904064 |
| 14 | 11x00111x1xx010010 | 16367 | 267862016 | 11xxxx11x0xxxxxxxxx0xxx0xxx0xx0xxx1xx10x0x0 | 16342 | 2.67E+08 | 16371 | 2.68E+08 |
| 15 | 110x1100000101x0x0 | 32751 | 1072594944 | 111xxx1xxxx1xx0xxxxxxx0xxxxxxx0xxxx0x1011x0xx0 | 32723 | 1070759940 | 32754 | 1072791550 |

TABLE II. EXPECTED VALUE AND VARIANCE OF DURATION OF SEARCH FOR l=5 SEQUENCES WITH ERRORS

| Sequence (l=5) | H=0: T | σ² | G_TR [%] | H=1: T | σ² | G_TR [%] | H=2: T | σ² | G_TR [%] |
|---|---|---|---|---|---|---|---|---|---|
| 10100 | 28 | 736 | - | 4.270936 | 10.56147930 | - | 1.839844 | 1.39231873 | - |
| 11x010 | 27 | 672 | 4 | 4.34768 | 10.68947770 | -2 | 1.882813 | 1.45111084 | -2 |
| 1110xx0 | 26 | 608 | 7 | 4.98806 | 15.49023460 | -17 | 2.118591 | 2.46170514 | -15 |
IV. CONCLUSION
This paper explains an analytical tool for the study of the
multiple-sequence search process. As an application
example, a statistical analysis of distributed sequences is
performed. It is shown that, for sequences where exact
matching is required, optimal distributed sequences
outperform contiguous ones of the same redundancy.
However, if errors are allowed, the advantage is not so clear
and further investigation, considering a search in a frame and not
within a stream of random data, has to be performed (work
in progress).
Other applications of the formulae shown in this paper lie
within the field of telecommunications, information theory or
image analysis, including the analysis of window-sliding
procedures for lossless data compression and two-
dimensional search processes with the dimensions measured
in the same units (images) or different units (Hz/second).
Therefore, results of this mathematical tool can be used to
analyse not only synchronization sequences, but other
practical problems as well.
Further research will include the study of a case when
data are memoryless, but not equiprobable.

REFERENCES
[1] J. L. Massey: "Optimum Frame Synchronization", IEEE Trans. on
Comm., Vol. COM-20, pp. 115-119, April 1972.
[2] R. A. Scholtz: "Frame synchronization techniques", IEEE
Trans. Commun., Vol. COM-28, pp. 1204-1213, Aug. 1980.
[3] P. T. Nielsen: "Some Optimum and Suboptimum Frame
Synchronizer for Binary Data in Gaussian Noise", IEEE Trans.
on Comm., Vol. COM-21, pp. 770-772, June 1973.
[4] G. L. Lui and H. H. Tan: "Frame Synchronization for Gaussian
Channels", IEEE Trans. on Comm., Vol. COM-35, pp. 818-828,
August 1987.
[5] T. Sekimoto and H. Kaneko: "Group Synchronization for Digital
Transmission Systems", IRE Trans. on Commun. Syst., pp. 381-390,
December 1962.
[6] K. Brayer: "Frame synchronization for binary data transmission",
Electronics Letters, Vol. 7, pp. 392-393, July 1971.
[7] O. Brugia and M. Decina: "Reframing statistic of P.C.M.
multiplex transmission", Electronics Letters, Vol. 7, pp. 625-627,
July 1971.
[8] D. E. Dodds, S.-M. Pan and G. W. Wacker: "Statistical distribution
of PCM Framing Times", IEEE Trans. on Comm., Vol. COM-36,
pp. 1236-1241, November 1988.
[9] J. Lindner: "Binary sequences up to length 40 with best possible
autocorrelation function", Electron. Lett., Vol. 11, p. 507, 1975.
[10] M. N. Al-Subbagh and E. V. Jones: "Optimum patterns for frame
alignment", IEE Proc. part F - Commun. Radar & Signal
Processing, Vol. 135 (6), pp. 594-603, December 1988.
[11] H. Häberle: "Frame synchronizing PCM systems", Electrical
Communications, Vol. 44, No. 4, pp. 280-287, 1969.
[12] P. T. Nielsen: "On the Expected Duration of a Search for a Fixed
Pattern in Random Data", IEEE Trans. on Inf. Theory, Vol. IT-19,
pp. 702-704, September 1973.
[13] T. McConell: "The Expected Time to Find a String in a Random
Binary Sequence" [http://barnyard.syr.edu/cover.pdf], January
2001.
[14] D. Bajić, D. Drajić: "Duration of search for a fixed pattern in
random data: Distribution function and variance", Electronics
Letters, Vol. 31, No. 8, pp. 631-632, 1995.
[15] D. Bajić, D. Drajić: "Search Process: The Analysis", Facta
Universitatis, Series: Electronics and Energetics, Vol. 9,
No. 2, pp. 191-205, 1996.
[16] D. Bajić, J. Stojanović and J. Lindner: "Multiple Window-sliding
Search", Proceedings of 2003 IEEE International Symposium on
Information Theory (ISIT-2003), Yokohama, Japan, June 2003,
p. 249.
[17] A. J. de Lind van Wijngaarden, T. J. Willink: "Frame
Synchronization Using Distributed Sequences", IEEE
Transactions on Communications, Vol. 48, No. 12,
pp. 2127-2138, 2000.
