An Overview of Turbo Codes and Their Applications: November 2005




Conference Paper · November 2005


DOI: 10.1109/ECWT.2005.1617639 · Source: IEEE Xplore


All content following this page was uploaded by Ramesh Pyndiah on 30 May 2014.



An Overview of Turbo Codes and Their Applications
(Invited Paper)

Claude Berrou, Ramesh Pyndiah, Patrick Adde, Catherine Douillard and Raphaël Le Bidan
GET/ENST Bretagne, Laboratoire TAMCIC (UMR CNRS 2872), PRACom
Technopôle Brest Iroise, CS 83818, 29238 Brest Cedex 3, FRANCE
E-mail: {firstname.lastname}@enst-bretagne.fr

Abstract: More than ten years after their introduction, Turbo Codes are now a mature technology that has been rapidly adopted for application in many commercial transmission systems. This paper provides an overview of the basic concepts employed in Convolutional and Block Turbo Codes, and reviews the major evolutions in the field with an emphasis on practical issues such as implementation complexity and high-rate circuit architectures. We address the use of these technologies in existing standards and also discuss future potential applications for this error-control coding technology.

I. INTRODUCTION
Error-control codes, also called error-correcting codes or channel codes, are a fundamental component of virtually every digital transmission system in use today. Channel coding is accomplished by inserting controlled redundancy into the transmitted digital sequence, thus allowing the receiver to perform a more accurate decision on the received symbols and even correct some of the errors made during the transmission. In his landmark 1948 paper that pioneered the field of Information Theory, Claude E. Shannon proved the theoretical existence of good error-correcting codes that allow data to be transmitted virtually error-free at rates up to the absolute maximum capacity (usually measured in bits per second) of a communication channel, and with surprisingly low transmitted power (in contrast to common belief at that time). However, Shannon’s work left unanswered the problem of constructing such capacity-approaching channel codes. This problem has motivated intensive research efforts during the following four decades, and has led to the discovery of fairly good codes, usually (but not always; see convolutional codes for example) obtained from sophisticated algebraic constructions. However, 3 dB or more stood between what the theory promised and the practical performance offered by error-correcting codes in the early 90’s.
The introduction of Convolutional Turbo Codes (CTC) in 1993 [1,2], quickly followed by the invention of Block Turbo Codes (BTC) in 1994 [3,4], closed much of the remaining gap to capacity. Today, advanced Forward Error Correction (FEC) systems employing Turbo Codes commonly approach Shannon’s theoretical limit within a few tenths of a decibel. Practical implications are numerous. Using Turbo Codes, a system designer can for example achieve a higher throughput (by a factor of 2 or more) for a given transmitted power, or, alternatively, achieve a given data rate with reduced transmitted energy. Historically, Turbo Codes were first deployed for satellite links and deep-space missions, where they offered impressive Bit-Error Rate (BER) performance beyond existing levels with no additional power requirement (a premium resource for satellites). Since then, they have made their way into 3G wireless phones, Digital Video Broadcast (DVB) systems, and Wireless Metropolitan Area Networks (WMAN). They are also considered for adoption in several emerging standards including enhanced versions of Wi-Fi networks.
A decade after the discovery of Turbo Codes, this paper provides an overview of this advanced FEC technology. The next two sections review the basic concepts and the major evolutions in the field for both Convolutional and Block Turbo Codes. Practical issues relevant to the system designer such as implementation complexity and high-rate circuit architectures are also addressed, and the use of Turbo Codes in existing standards is discussed. Some personal views about the next evolutions expected in the field of channel coding are finally proposed in conclusion.

II. CONVOLUTIONAL TURBO CODES
Classical Convolutional Turbo Codes, also called Parallel Concatenated Convolutional Codes (PCCC), result from a pragmatic construction conducted by C. Berrou and A. Glavieux, based on the intuitions of G. Battail [5], J. Hagenauer and P. Hoeher [6], who, in the late 80’s, highlighted the interest of introducing probabilistic processing in digital communications receivers. Previously, other researchers including P. Elias [7], R. G. Gallager [8] and M. Tanner [9] had already imagined coding and decoding systems closely related to the principles of Turbo Codes.

A. Principles of Turbo Codes
The classical Turbo Code is shown in Fig. 1 and consists of the parallel concatenation of two binary Recursive Systematic Convolutional (RSC) codes C1 and C2 separated by a permutation (interleaver) Π. Serial concatenation is also possible [10] (with its own pros and cons) but will not be discussed here. RSC codes are a key component of Turbo Codes. They are based on Linear Feedback Shift Registers (LFSR) and act as pseudo-random scramblers. RSC codes offer several advantages in comparison with classical non-recursive non-systematic convolutional codes. First, they resemble random codes, and it is known from Shannon’s
pioneering work that random-like codes are the key to approaching capacity. In addition, they perform better than classical convolutional codes at low signal-to-noise ratios [2]. Finally, RSC codes have the interesting property that only a small fraction of finite-weight information sequences yields finite-weight (“low redundancy”) coded sequences at the encoder’s output. These particular sequences are called Return To Zero (RTZ) sequences in the literature and play a fundamental role in the asymptotic performance of the Turbo Code [11,12].

Fig. 1. The classical Turbo Code.

Optimal decoding of the overall Turbo Code is not possible in practice due to a prohibitive number of states to consider. Instead, a clever divide-and-conquer strategy with manageable complexity and near-optimum performance is applied at the receiver side. Turbo decoding relies on the exchange of probabilistic messages between two Soft-Input Soft-Output (SISO) decoders. Usually (but not necessarily), probabilistic information is expressed in Log Likelihood Ratio (LLR) form. Denoting by Pr{d=1} the probability that, at a given step of the decoding process, a binary datum d has logical value “1”, the LLR L(d) about d is given by:

\[ L(d) = \ln\left(\frac{\Pr\{d=1\}}{1-\Pr\{d=1\}}\right) \qquad (1) \]

The sign of L(d) gives the hard decision about d, while the magnitude |L(d)| measures the reliability of this decision. The role of a SISO decoder consists in taking input LLR estimates about the transmitted bits and trying to improve these estimates, using the local redundancy of the considered component code. The output LLR delivered by the SISO decoder may be written as:

\[ L_{\text{output}}(d) = L_{\text{input}}(d) + z(d) \qquad (2) \]

The probabilistic quantity z(d) is called the extrinsic information about bit d. This is the result of the decoder’s estimation of d, but not taking its own input into account. It is precisely this extrinsic information that is exchanged iteratively between the two SISO decoders during the decoding process. Subtracting the decoder’s input from its output prevents the decoder from acting as a positive feedback amplifier and introduces stability (a crucial issue!) in the feedback process. Usually, after a given number of iterations, one observes that the two decoders converge towards a stable final decision for d. In practice and depending on the nature of the SISO decoder, fine-tuning operations (scaling, clipping) may be applied to the extrinsic information in order to ensure convergence within a small number of iterations.

B. Example of performance results
Table I shows some examples of performance results for the DVB-RCS Turbo Code over an AWGN channel using 8 iterations and 4-bit input quantization. We have reported the Eb/N0 level (dB) required to achieve a target Frame Error Rate (FER) of 10^-4 for different code rates and block lengths. The corresponding gap ∆ with respect to the Sphere-Packing Bound (SPB) is also given. We recall that the SPB provides a theoretical lower bound on the minimum Eb/N0 required to achieve a given FER with the best codes of a given finite block size [13]. Although these results do not fully reflect the current state of the art in CTC, we observe that the DVB-RCS code performs very close (from 1.0 to 1.5 dB) to the theoretical limits under real implementation constraints.

Rate   ATM (53 bytes)           MPEG (188 bytes)
1/2    2.3 dB (∆ = 1.0 dB)      1.8 dB (∆ = 1.0 dB)
2/3    3.3 dB (∆ = 1.1 dB)      2.6 dB (∆ = 0.9 dB)
3/4    3.9 dB (∆ = 1.3 dB)      3.2 dB (∆ = 1.1 dB)
4/5    4.6 dB (∆ = 1.5 dB)      3.8 dB (∆ = 1.2 dB)
6/7    5.2 dB (∆ = 1.4 dB)      4.4 dB (∆ = 1.1 dB)
Table I. Minimum Eb/N0 (dB) required to achieve a target FER = 10^-4 with the DVB-RCS CTC over an AWGN channel.

B. Advances in the field
Much progress in the understanding and design of CTC has been made during the last decade. Some of these advances are surveyed in this section.

1) Low-complexity decoding algorithms
CTC were originally decoded using the BCJR-MAP algorithm [2]. However, this algorithm does not lend itself easily to a digital hardware implementation since it involves many real-number multiplications. Efficient implementations have been proposed that operate directly in the logarithm domain, thereby translating multiplications into additions (see for example [14]). Among them, the Max-Log-MAP decoding algorithm realizes a good trade-off between performance and complexity, with the added advantage of not requiring any knowledge of the noise level. In addition, the introduction of sliding-window decoding algorithms has helped in reducing the internal memory requirements and the decoding latency of the SISO decoders [15].

2) Stop criterion
A stop criterion facilitates the convergence of the iterative decoding process and helps reduce the average power consumption of the decoder by reducing the average number of iterations required to decode a block, without compromising performance. Various stop criteria have been proposed for CTC over the years. As an illustration, a detailed investigation of several stopping rules can be found in ref. [16].
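As an illustration of Eqs. (1) and (2) and of the stopping idea, the following minimal Python sketch computes an LLR from a probability, reads the hard decision from its sign, and recovers the extrinsic term by subtraction. The function names, the convention that a zero LLR maps to bit 0, and the "hard decisions unchanged between two consecutive iterations" test are assumptions made for illustration only; the paper does not prescribe a particular stopping rule (see [16]).

```python
import math

def llr(p_one):
    """Eq. (1): L(d) = ln(Pr{d=1} / (1 - Pr{d=1})), valid for 0 < p_one < 1."""
    return math.log(p_one / (1.0 - p_one))

def hard_decision(l_value):
    """The sign of the LLR gives the bit; a zero LLR is mapped to 0 (assumption)."""
    return 1 if l_value > 0 else 0

def extrinsic(l_output, l_input):
    """Eq. (2) rearranged: z(d) = L_output(d) - L_input(d)."""
    return l_output - l_input

def decisions_unchanged(prev_llrs, curr_llrs):
    """Hypothetical stopping test: True when all hard decisions agree."""
    return all(hard_decision(a) == hard_decision(b)
               for a, b in zip(prev_llrs, curr_llrs))

l_in = llr(0.9)              # positive: the bit is probably a 1
l_out = l_in + 1.5           # pretend a SISO decoder contributed 1.5 of new information
z = extrinsic(l_out, l_in)   # recovers that contribution (up to rounding)
```

Only z, not l_out, would be passed to the other decoder, which is what keeps the loop from re-amplifying a decoder's own input.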
3) Circular termination of the component codes
In contrast to classical block codes, convolutional codes are not a priori well suited for the transmission of finite-length information sequences. Several solutions are available to circumvent this problem. The first idea consists in not terminating the two trellises at the end of the encoding process. This, however, introduces some performance loss in the decoding process (possible error floor). Another solution, adopted in several standards (CCSDS, UMTS), forces the termination of at least one of the two trellises in the all-zero state. This is accomplished by inserting additional “dummy” tail bits at the end of the information sequence, thereby reducing the overall spectral efficiency of the transmission. The final solution consists in properly initializing the encoder memory so that the final state of the encoder register becomes equal to the initial state. This technique does not require additional termination bits. The resulting code trellis has a circular representation, hence the name “circular” (or “tailbiting”) termination. Long known for classical non-recursive convolutional codes, this technique has been adapted to RSC codes in [12]. Circular RSC (CRSC) codes have several advantages over terminated RSC codes. In particular, circular termination guarantees a uniform protection level of all the bits in the coded sequence since they benefit from the whole set of redundancy. It also facilitates the design of parallel decoding architectures. Note finally that the usual SISO decoding algorithms can be easily adapted to handle the specificity of circular termination.

4) Non-binary CTC
Classical binary CTC usually employ rate-1/2 RSC codes. In contrast, non-binary CTC are based on parallel concatenation of rate-m/(m+1) (usually m=2) CRSC component codes. 8-state CTC from this family have already been adopted in several standards, including DVB-RCS, DVB-RCT and the IEEE 802.16a standard (WiMAX) for Wireless Metropolitan Area Networks (WMAN). Non-binary CTC indeed exhibit remarkable properties, as shown in [17,18]. First, we observe a better convergence of the iterative process (less correlation effects between the decoders). Non-binary CTC are also less sensitive (less degradation) with respect to puncturing patterns. These two points are illustrated in Fig. 2 where we compare the performance of 8-state binary (m=1) and double-binary (m=2) CTC with octal generator polynomials (15,13) and different code rates. The fact that the encoder and decoder no longer operate on individual bits but rather on non-binary symbols also introduces an additional degree of freedom in the permutation design (use of two-stage intra- and inter-symbol permutation), yielding better minimum distance and lower error floors. Architectures for non-binary SISO decoders provide a higher throughput and reduced latency (although the decoding operations are slightly more complex since there are more edges to consider in the individual trellises). Finally, non-binary SISO decoders have increased robustness with respect to the use of sub-optimum decoding algorithms. As shown in Fig. 3, there is almost no performance degradation between double-binary CTC employing the exact MAP or its sub-optimal Max-Log-MAP approximation, whereas a degradation of 0.5 dB or less is usually observed with binary CTC in the latter case. Hence double-binary CTC provide an efficient and versatile FEC solution with reasonable decoding complexity.

Fig. 2. Performance comparison between 8-state binary (m=1) and duo-binary (m=2) CTC of various code rates with QPSK modulation over an AWGN channel.

Fig. 3. Performance comparison between MAP and Max-Log-MAP decoding of 8-state binary (m=1) and duo-binary (m=2) CTC with QPSK modulation over an AWGN channel.

5) Better understanding of the code performance
Performance curves of Turbo Codes exhibit a very characteristic behavior comprising a (usually steep) turbo cliff (“waterfall region”) at low SNR, followed by a progressive flattening of the performance curve (the so-called “error-floor region”) at moderate to high SNR. The introduction of the notion of a probabilistic “uniform” interleaver has facilitated the asymptotic performance analysis of Turbo Codes and has provided useful guidelines with respect to the choice of the component codes [19]. More recently, several methods have been proposed that make it possible to compute or at least closely estimate the true minimum Hamming distance of the Turbo Code [20,21]. These tools are of great practical
Application                                  Turbo Code            Termination   Polynomials      Rates
CCSDS (deep space missions)                  Binary, 16-state      Tail bits     23, 33, 25, 37   1/6, 1/4, 1/3, 1/2
UMTS, Cdma2000 (3G mobile)                   Binary, 8-state       Tail bits     13, 15, 17       1/4, 1/3, 1/2
DVB-RCS (Return Channel over Satellite)      Duo-binary, 8-state   Circular      15, 13           1/3 up to 6/7
DVB-RCT (Return Channel over Terrestrial)    Duo-binary, 8-state   Circular      15, 13           1/2, 3/4
M4 (Inmarsat)                                Binary, 16-state      None          23, 35           1/2
Skyplex (Eutelsat)                           Duo-binary, 8-state   Circular      15, 13           4/5, 6/7
WiMAX (IEEE 802.16)                          Duo-binary, 8-state   Circular      15, 13           1/2 up to 7/8
Table II. Current known applications of Convolutional Turbo Codes.

importance in order to design good permutations yielding high minimum distance (low error floors). In parallel, the introduction of EXtrinsic Information Transfer (EXIT) charts [22] and other related convergence analysis methods (density evolution, etc.) has led to a better understanding of Turbo Code behavior in the turbo cliff region (convergence threshold, convergence speed). The combination of these various tools allows the system designer to carefully optimize the performance of his Turbo Codes with respect to the now classical “convergence versus minimum distance” dilemma encountered with capacity-approaching codes.

6) The art of permutation design
The pseudo-random permutation is another key component of Turbo Codes. Originally introduced to break correlation effects during the iterative decoding process, the permutation function has been quickly recognized as a fundamental parameter of the code itself. When considering very large block sizes (say 30000 bits or more), a permutation drawn at random will yield a good Turbo Code with high probability. To quote David Forney (MIT): “It sometimes seems that almost any simple codes interconnected by a large pseudo-random interleaver and decoded with sum-product decoding will yield near-Shannon-limit performance”. This is no longer true when one aims at designing CTC operating on small blocks with good performance (low error floor) at low Bit-Error Rates (BER). The way the permutation is devised (together with the choice of the component codes) indeed fixes the minimum Hamming distance dmin of the Turbo Code, and therefore the corresponding achievable asymptotic coding gain Ga ≈ 10 log10(R·dmin), where R is the code rate.
Regularity of the permutation is another important factor that should not be overlooked in practice. Indeed, the more regular the permutation, the easier it is to conceive high-throughput parallel decoding architectures. Designing permutations having both good structural and spreading properties actually remains an on-going area of research that regularly inspires new contributions. Recently however, two permutation models have been proposed that satisfy most of the requirements for a good permutation. Called Dithered Relatively Prime (DRP) permutation [23] and Almost Regular Permutation (ARP) [24] respectively, these two simple models are actually based on similar ideas, and combine a high-level regular permutation with local controlled disorder. These two solutions have good asymptotic performance at low BER and lead to very efficient hardware implementations. Note that the ARP permutation model has been used in the CTC adopted in the DVB-RCS and DVB-RCT standards.

C. Applications of CTC
A decade after their introduction, CTC are already in use in several industry standards. Some of them are described in Table II (see also the EchoStar system for satellite TV developed by Broadcom Corp. for another example). The corresponding four CTC commonly used in practice are shown in Fig. 3. Let us examine the relative merits and limitations of each of these codes. Since the choice of a FEC system is usually dictated by practical system constraints such as latency, residual error rate or silicon area, we will consider here three different FER regions corresponding to different Quality of Service (QoS) requirements:

• Medium error rates (FER > 10^-4)
This is typically the domain of Automatic Repeat reQuest (ARQ) systems and is also the most favorable range of error rates for CTC. 8-state component codes are sufficient to reach near-optimum performance. The binary CTC in Fig. 3a is suitable for rates < 1/2. The duo-binary code of Fig. 3b is preferable for higher rates (less sensitivity to puncturing patterns). In both cases, performance close to the theoretical limit is achieved with existing silicon decoders for most coding rates and block sizes, even the shortest.

• Low error rates (10^-9 < FER < 10^-4)
16-state CTC are usually preferable to 8-state CTC in this context since they offer better performance (by about 1 dB at a FER of 10^-7) in this region. The choice between the two solutions mainly depends on the desired trade-off between performance and decoding complexity. The corresponding Turbo Codes are shown in Fig. 3c and 3d. Again, binary CTC are suitable for coding rates < 1/2 and non-binary CTC should be used for higher rates. Note also that the permutation must be very carefully designed in order to maintain good performance at low error rates.
[Figure: four encoder block diagrams, each with a permutation Π. Panels (a) and (c) take k binary data bits (signal labels X, Y1, Y2); panels (b) and (d) take k/2 binary couples (signal labels A, B, Y1, Y2). Panels (a) and (b): polynomials 15, 13 (or 13, 15). Panels (c) and (d): polynomials 23, 35 (or 31, 27).]

Fig. 3. The four CTC used in practice: a) 8-state binary; b) 8-state duo-binary; c) 16-state binary; d) 16-state duo-binary.

• Very low error rates (FER < 10^-9)
For the time being, the minimum Hamming distances that are currently obtained with CTC cannot prevent a change of slope in the performance curves at very low error rates. An increase of about 25% in the minimum distance of the code would be necessary to make CTC attractive for those applications that operate in this error-rate region (such as optical transmission or mass storage systems for example).

To summarize the previous discussion, 8-state CTC are particularly appropriate for ARQ systems and short to medium block sizes. On the other hand, 16-state CTC are necessary for broadcast systems, long blocks, or high coding rates. Several remaining challenges are currently under investigation. In particular, it would be desirable to reduce by half the number of iterations required to achieve convergence (from 8 to 4), and to decrease the complexity of the Max-Log-MAP decoder for 16-state Turbo Codes.

III. BLOCK TURBO CODES
Block Turbo Codes (BTC), also called Turbo Product Codes (TPC), offer an interesting alternative to CTC for applications requiring either high code rates (R > 0.8), very low error floors, or low-complexity decoders able to operate at several hundreds of megabits per second (and even higher).

A. Construction and iterative decoding
The general concept of Block Turbo Codes is based on iterative SISO decoding of product codes, which were introduced by P. Elias in 1954 [7]. Product codes are constructed by serial concatenation of two (or more) systematic linear block codes C1 and C2 with parameters (n1,k1,δ1) and (n2,k2,δ2), where ni, ki, and δi stand for the code length, code dimension and minimum Hamming distance of each component code Ci. As shown in Fig. 4, data bits are placed in a k1×k2 information matrix [M] and the rows and columns are encoded by the codes C1 and C2 respectively, yielding an n1×n2 coded matrix [C]. The product code has length n = n1·n2, dimension k = k1·k2, and code rate R = R1·R2, where Ri is the code rate of code Ci. All the rows of the coded matrix are code words of C1 and all columns are code words of C2. It follows from this important property that the minimum Hamming distance of the product code is the product δ = δ1·δ2 of the minimum Hamming distances δi of the component codes [4]. Hence it is easy to construct product codes with large minimum distance that do not suffer from error-floor problems, as CTC may in the absence of a careful permutation design.
In the iterative decoding process, all the rows and columns of the received matrix are decoded sequentially at each iteration. Thus not only the data bits but also the parity bits can exploit the extrinsic information, which is another advantage of serial over parallel concatenation.
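The product-code parameters can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the asymptotic-gain helper reuses the expression Ga ≈ 10 log10(R·dmin) quoted earlier in the paper, and the toy encoder uses single-parity-check component codes (not the BCH codes considered in this section) simply because they make the "every row and every column is a code word" property easy to verify by eye.

```python
import math

def product_code(c1, c2):
    """Return (n, k, rate, delta) for the product of two (n_i, k_i, delta_i) codes."""
    (n1, k1, d1), (n2, k2, d2) = c1, c2
    n, k = n1 * n2, k1 * k2
    return n, k, k / n, d1 * d2

def asymptotic_gain_db(rate, dmin):
    """Ga ~ 10 log10(R * dmin), the asymptotic coding gain in dB."""
    return 10.0 * math.log10(rate * dmin)

def spc_product_encode(info):
    """Toy product encoder: single-parity-check code on every row and column."""
    rows = [r + [sum(r) % 2] for r in info]             # append row parity
    parity_row = [sum(col) % 2 for col in zip(*rows)]   # column parities, incl. check-on-checks
    return rows + [parity_row]

# Extended Hamming (16,11,4) used for both rows and columns
n, k, rate, delta = product_code((16, 11, 4), (16, 11, 4))
gain = asymptotic_gain_db(rate, delta)                  # roughly 8.8 dB

coded = spc_product_encode([[1, 0], [1, 1]])            # 2x2 data -> 3x3 coded matrix
rows_ok = all(sum(r) % 2 == 0 for r in coded)
cols_ok = all(sum(c) % 2 == 0 for c in zip(*coded))
```

For the (16,11,4) component code this gives n = 256, k = 121, R ≈ 0.473 and δ = 16.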
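[Figure: one decoding stage. Inputs [W(m)] and [R], with scaling factors α(m) and β(m); [R(m)] feeds the block "Decoding of rows or columns of matrix P", which outputs [W(m+1)]; [R] is forwarded through a delay line.]
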
Fig. 5. Block diagram of the Block Turbo Decoder.
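The SISO step at the heart of the block turbo decoder (Chase-type list decoding followed by the soft-output rule of Eqs. (3)-(5) of this section) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses the small (7,4) Hamming code with syndrome decoding as the bounded-distance algebraic decoder instead of the extended BCH codes of the paper, maps bit b to the symbol 2b-1, and uses a fixed β. All function names are hypothetical.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code: column i is the binary form of i+1.
H = [[(i >> b) & 1 for i in range(1, 8)] for b in range(3)]

def hamming_decode(y):
    """Bounded-distance (single-error) algebraic decoding via the syndrome."""
    syndrome = sum((sum(h * v for h, v in zip(row, y)) % 2) << b
                   for b, row in enumerate(H))
    c = list(y)
    if syndrome:                    # syndrome value = 1-based position of the error
        c[syndrome - 1] ^= 1
    return tuple(c)

def sq_dist(r, c):
    """Squared Euclidean distance between r and the +/-1 image of code word c."""
    return sum((rj - (2 * cj - 1)) ** 2 for rj, cj in zip(r, c))

def chase_siso(r, s=2, beta=1.0):
    """One SISO pass: returns (decision D, soft output r', extrinsic w)."""
    y = [1 if rj > 0 else 0 for rj in r]                      # hard decisions Y
    weakest = sorted(range(len(r)), key=lambda j: abs(r[j]))[:s]
    candidates = set()
    for flips in product((0, 1), repeat=s):                   # 2^s test patterns
        t = list(y)
        for pos, f in zip(weakest, flips):
            t[pos] ^= f
        candidates.add(hamming_decode(t))
    d = min(candidates, key=lambda c: sq_dist(r, c))          # decision D
    soft = []
    for j in range(len(r)):
        dj = 2 * d[j] - 1
        rivals = [c for c in candidates if c[j] != d[j]]      # concurrent code words
        if rivals:
            c = min(rivals, key=lambda cw: sq_dist(r, cw))
            soft.append((sq_dist(r, c) - sq_dist(r, d)) / 4.0 * dj)   # Eq. (3)
        else:
            soft.append(beta * dj)                                    # Eq. (4)
    w = [sj - rj for sj, rj in zip(soft, r)]                          # Eq. (5)
    return d, soft, w

# All-zero code word sent (all symbols -1); position 3 was received with the wrong sign.
r = [-1.2, -0.9, -1.1, 0.2, -1.0, -0.8, -1.3]
d, soft, w = chase_siso(r)
```

In this example the weak, wrongly signed sample at position 3 is corrected, and positions without a concurrent code word in the list fall back to the β rule.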


Fig. 4. Construction of the product code.

The basic component of the Block Turbo Decoder is the SISO decoder used for decoding the rows and columns of the product code. The SISO decoder consists of a modified Chase-II soft-input hard-output decoder [25] augmented by a soft-output computation unit. Given a soft-input sequence R in LLR form corresponding to a row or column of the observation matrix [R], the Chase-II decoder first forms the binary hard-decision sequence Y from R. The reliability of the decision on the jth coded bit is given by the magnitude |rj| of the corresponding soft input. 2^s error patterns are generated by considering all possible combinations of 0 and 1 in the s least reliable bit positions. These error patterns are added to the hard-decision sequence Y to form candidate sequences that are decoded by a bounded-distance algebraic decoder. This procedure returns a list containing at most 2^s distinct candidate code words. Among them, the code word D at minimum Euclidean distance from the observation R is selected as the Maximum Likelihood (ML) estimate. On the basis of this decision, soft-output computation is performed as follows. For a given bit in position j, the list of candidate code words is searched for a concurrent code word C at minimum Euclidean distance from R and such that cj≠dj. If such a code word exists, the soft output r'j on this bit is given by:

\[ r'_j = \left(\frac{|R-C|^2 - |R-D|^2}{4}\right) \times d_j \qquad (3) \]

Otherwise, the soft output is computed empirically using:

\[ r'_j = \beta \times d_j \qquad (4) \]

where β is a positive constant which increases with the iterations. As for CTC, the extrinsic information wj about bit j is finally obtained by subtracting the soft-input contribution from the soft output computed by the decoder:

\[ w_j = r'_j - r_j \qquad (5) \]

The block diagram of a block turbo decoder is illustrated in Fig. 5. [R] is the channel output LLR matrix, [W] is the extrinsic information matrix and α is another scaling factor which increases with the iteration number m.

B. Examples of performance results
The performance of BCH-BTC using single- and double-error-correcting component codes at iteration 4 is given in Table III for QPSK modulation over an AWGN channel. Eb/N0 is the ratio of energy per bit to single-sided noise spectral density required for a target BER of 10^-4 and ∆(Sh) is the gap to the Sphere Packing Bound for the same BER. Extended versions of BCH component codes are used in order to maximize the product (Rδ) which guarantees maximum asymptotic gain. Sixteen test patterns are used to generate the subset Ω of candidate code words based on the four (s=4) least reliable bits.
We observe that product codes using BCH component codes with minimum Hamming distance (MHD) 4 exhibit a gap ∆(Sh) of less than 1 dB while those with a MHD of 6 exhibit a gap ∆(Sh) slightly higher than 1 dB (< 1.2 dB). Hence, BCH-BTC perform close to the theoretical limits in both cases.

C1, C2        N       k       R       δ    Eb/N0 (dB)   ∆(Sh) (dB)
(16,11,4)     256     121     0.473   16   3.35         0.72
(16,7,6)      256     49      0.191   36   3.70         1.04
(32,26,4)     1024    676     0.660   16   3.05         0.81
(32,21,6)     1024    441     0.431   36   2.50         1.18
(64,57,4)     4096    3249    0.793   16   3.45         0.84
(64,51,6)     4096    2601    0.635   36   2.70         1.18
(128,120,4)   16384   14400   0.879   16   4.10         0.88
(128,113,6)   16384   12769   0.779   36   3.35         1.17
(256,247,4)   65536   61009   0.931   16   4.80         0.95
Table III. Performance of different BCH-BTC at iteration 4 for a BER of 10^-4 on an AWGN channel using QPSK.

C. Significant advances in the field
This section reviews some of the most significant improvements proposed for BTC over the last few years.

1) Reduced search for a concurrent codeword
A detailed analysis of the SISO decoding algorithm shows that most of the decoding complexity lies in the soft-output computation. Further investigations have shown that this complexity is mainly due to the search and test (parity and metric) for a concurrent code word C in Ω for each component bit dj of the decision D. In order to reduce the complexity of the decoder, a slightly different strategy has been proposed in [26]. Instead of searching for a concurrent code word for every component dj of D, the simplified algorithm selects once and for all the L code words in Ω at minimum Euclidean distance from the observation R in the list of candidate code words. Computation of the soft output is then
performed by restricting the search for concurrent code words to these L selected candidate code words. For L=1, a single concurrent code word is considered and complexity is divided by a factor of ~10. Metrics in Eq. (3) are computed, and for each component dj of D the simplified algorithm applies Eq. (3) or Eq. (4) depending on the outcome of the parity test (dj=cj). This concept can be extended to higher values of L. However, increasing the value of L improves the performance of the SISO decoder at the expense of complexity. A good trade-off between performance and complexity is obtained for L=3 as shown in Table IV. The results reported in this table have been obtained with BCH(128,120,4) (i.e. extended Hamming) component codes. Using the simplified algorithm described above, the SISO decoder complexity of an extended Hamming code is less than 6000 gates and is nearly independent of the code length.

L               1      3          16
Gain (10^-6)    Ref.   0.06 dB    0.13 dB
Complexity      Ref.   +13.5%     +90%
Table IV. Performance and complexity of the SISO decoder for a BCH(128,120,4) code as a function of the number L of considered concurrent code words.

2) Adaptive computation of the scaling factor β
The main drawback in the SISO algorithm presented previously comes from the use of an empirical constant scaling factor β in the absence of concurrent code words during the soft-output computation operation. This rough approximation of the soft output results in a flattening of the BER curve at high Eb/N0. This effect is amplified when the considered number L of concurrent code words in Ω is reduced. To mitigate this effect, we also proposed in [26] to dynamically compute β for each decoded row (or column) using the following equation:

\[ \beta = \sum_{l \in \omega} r'_l \qquad (5) \]

where ω denotes the subset of least reliable bits at the decoder input. To illustrate the resulting improvement, a comparison of the two methods is given in Fig. 6. A BCH(32,21,6)^2 product code is considered using QPSK over an AWGN channel. The number of test patterns was limited to 16, the number of concurrent code words was limited to L=3, 5-bit quantization was used at the decoder input and 4 iterations were performed. We observe a significant improvement in coding gain (> 0.25 dB) at low BER (10^-8). Note also that the decoder performance is very close (< 0.1 dB) to the theoretical ML asymptotic performance bound [9]. This clearly shows that the iterative decoder is asymptotically optimal and realizes a good trade-off between complexity and performance.

3) Stop criterion
An efficient stop criterion is easily derived based on the particular structure of the product code. If all the rows (resp. columns) after column (resp. row) decoding at a given iteration are code words of C1 (resp. C2), then the decoding algorithm has converged and the decoding process is stopped. This stop criterion is very efficient and the additional complexity is negligible. A detailed study of this method can be found in [27].

[Figure: BER (10^-2 down to 10^-8) versus Eb/N0 (2 to 3.5 dB); curves: Fixed Beta, Variable Beta, Lower Bound.]
Fig. 6. Performance comparison for the BTC(32,21,6)^2 product code using fixed and variable β (QPSK over AWGN).

4) Adaptation of the product code parameters
For practical applications it is very often required to adapt the parameters (code length and dimension) of the product code to those of the application. In [27] a method was introduced to overcome this problem that relies on shortening and puncturing techniques. The idea there is to maximize the number of dummy bits used for shortening and also minimize the number of punctured bits for a given set of code parameters (length, dimension and rate). This strategy is motivated by the fact that dummy bits are known to the decoder and thus carry a high reliability whereas punctured bits carry no information (zero reliability). Simulation results given in [27] show that, within a certain limit, the modified BTC operate within 1 dB of the theoretical limit.

5) Reed-Solomon BTC
BTC constructed from binary BCH component codes have two important limitations. First, very large coded blocks (> 60,000 bits) are required to achieve high code rates (R > 0.9). This is not always compatible with practical system constraints, especially when we consider wireless transmissions over fast time-varying fading channels. In addition, BCH-BTC combined with high-order Quadrature Amplitude Modulation (QAM) in order to achieve spectrally-efficient communications exhibit a gap between actual code performance and the Shannon limit that increases with the size of the modulation alphabet [28]. Recent research results have shown that non-binary BTC constructed from single-error-correcting Reed-Solomon codes overcome the aforementioned limitations of binary BCH-BTC [29,30]. In particular, RS-BTC can achieve reliable performance (within 1 dB of the theoretical bound) with both binary and QAM modulation over an AWGN channel with a low-complexity decoder. In addition, non-binary RS-BTC outperform binary BCH-BTC of similar code rate in terms of memory size, implementation complexity and decoding delay since they exhibit an overall smaller code
length (by a factor of ~2.8).

6) High-rate decoding architectures for BTC

Several companies including TurboConcept (France) or ComTech AHA (USA) already provide IP cores for BTC that can operate at several (typically 20-200) megabits per second. However, there are specific applications where very high speed (gigabits per second) decoders are required. Typical examples are data storage systems or optical transmission. In order to meet such throughput constraints, the trivial solution consists in duplicating decoders operating in parallel with an appropriate scheduling scheme (Mux and Demux). This solution can be extremely expensive in the case of turbo codes as turbo decoders are generally more complex than their classical counterparts. Furthermore, for turbo codes with large block size, very large RAMs (Random Access Memory) are required to store channel data and extrinsic information. Thus, duplicating turbo decoders implies duplicating RAMs, which is not very cost efficient.

An innovative solution was proposed in [31] for increasing the turbo decoding speed of BTC. In the following, we distinguish between the memory size and the Processing Unit (PU) used to perform the SISO. In the case of product codes, all the rows (or columns) at a given decoding step m can be decoded independently. The idea here is to use several PU in parallel, each decoding a different row (or column) of the matrix. By using p parallel PU, the processing speed can be increased by p without increasing the memory size. A classical solution would be to read and store the data at a speed p times faster than the decoding speed of one PU. A more elegant solution is to read the p data elements simultaneously, but this approach is not possible with classical RAM technology. In the innovative solution we propose to break down the memory into p² blocks and organize the data appropriately so that, at each clock cycle, the data elements for the p decoders are read simultaneously (note that the concept is identical for the store operation).

This new architecture has been studied in [31] in terms of trade-off between complexity and decoding speed. Table V below compares different candidate architectures that yield a factor p² on the decoding speed of the BTC. It is clear from this table that the innovative architecture provides an extremely attractive solution for high data rate applications from all points of view (RAM, PU complexity and decoding delay). The main limitation here comes from the maximum number of samples processed simultaneously in the PU. The objective currently is to design processing units capable of processing 8 samples per clock cycle. Thus with a 100 MHz clock and p=8, a decoding speed of 6.4 Gigabits per second becomes feasible.

                  Trivial (PU)_1   Classical (PU)_1   Innovative (PU)_p
Data rate         p²               p²                 p²
RAM               p²               p                  1
PU complexity     p²               p²                 p²/2
Decoding delay    1                ≈ 1/p              ≈ 1/p²

Table V. Comparison of different architectures for high data-rate decoding of BTC.

E. BTC in the standards

BTC are currently in use in several proprietary satellite (VSAT) transmission systems. In addition, they were adopted in 2001 as an optional FEC system for both the uplink and downlink of the IEEE 802.16 standard (WiMAX). The product code standardized in WiMAX is obtained by serial concatenation of two identical extended Hamming component codes. A straightforward shortening strategy (deleting the first rows and columns of the product code matrix) is applied in order to match the required block size. The mother extended Hamming code is either the (64,57,4) or the (32,26,4) code. The resulting BTC configurations are given in Table VI.

More recently, BTC have been selected by the HomePlug Powerline Alliance as part of the FEC system for broadband home networking over the power line [32]. In this context, BTC are used to protect sensitive frame control data from the errors caused by severe impulsive noise events.

Product Code                   Rate     Payload size
(39,32)×(39,32) [Downlink]     0.673    1024 bits
(53,46)×(51,44) [Downlink]     0.749    3136 bits
(30,24)×(25,19) [Uplink]       0.608    456 bits

Table VI. BTC configurations for IEEE 802.16.

It has been recently discovered that both binary and non-binary (RS) BTC also offer near-channel capacity performance over the Binary Symmetric Channel (BSC) and Binary Erasure Channel (BEC) models [28,30]. Combined with the fact that BTC exhibit large minimum distance (low error floors) by construction, this result suggests that high code-rate BTC is a promising low-complexity FEC for applications where soft outputs may not be available at the channel output due to economical (data storage systems) and/or technological (optical transmissions) reasons.

VI. CONCLUSION

The introduction of Turbo Codes in 1993 really took the channel coding community by surprise and initially raised a lot of skepticism. Ten years later, Turbo Codes are now a mature technology that has found its way into practical industry standards. When the aim is to approach closely the performance limit promised by Information Theory, the iterative decoding concept offers considerable savings (by several orders of magnitude) compared to a single code. Furthermore, the additional complexity required by turbo decoding compared to the simple 16 or 64-state Viterbi decoders abundantly used over the last two decades seems to be quite compatible
with the continuing progress in microelectronics. Higher circuit frequencies and larger possibilities of parallelism may even reduce the latency problem (a real weak point of Turbo Codes) down to a negligible level for most applications. And beyond the simple introduction of a new error-correcting solution, the Turbo Principle (i.e. the way to process data iteratively in receivers so that no information is wasted) has also opened up a new way of thinking in the construction of communication algorithms.

ACKNOWLEDGEMENT

The authors wish to acknowledge the assistance of all their colleagues at ENST Bretagne who contributed to development and promotion of Turbo Codes, as well as France Telecom R&D for its continuous financial support over the years.

REFERENCES

[1] C. Berrou, A. Glavieux and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo Codes," in Proc. IEEE Int. Conf. Commun. ICC’93, Geneva, Switzerland, May 1993, pp. 1064-1070.
[2] C. Berrou and A. Glavieux, "Near optimum error correcting and decoding: Turbo Codes," IEEE Trans. Commun., vol. 44, no. 10, Oct. 1996, pp. 1261-1271.
[3] R. Pyndiah, A. Glavieux, A. Picart and S. Jacq, "Near optimum decoding of product codes," in Proc. IEEE Global Telecommun. Conf. GLOBECOM’94, San Francisco, CA, Dec. 1994, pp. 339-343.
[4] R. Pyndiah, "Near optimum decoding of product codes: Block Turbo Codes," IEEE Trans. Commun., vol. 46, no. 8, Aug. 1998, pp. 1003-1010.
[5] G. Battail, "Coding for the Gaussian channel: the promise of weighted-output decoding," Int’l J. Sat. Commun., vol. 7, 1989, pp. 183-192.
[6] J. Hagenauer and P. Hoeher, "A Viterbi algorithm with soft-decisions outputs and its applications," in Proc. IEEE Global Telecommun. Conf. GLOBECOM’89, Dallas, TX, Nov. 1989, pp. 47.11-47.17.
[7] P. Elias, "Error-free coding," IRE Trans. Inform. Theory, vol. 4, no. 4, Sept. 1954, pp. 29-39.
[8] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inform. Theory, vol. IT-8, Jan. 1962, pp. 21-28.
[9] R. M. Tanner, "A recursive approach to low-complexity codes," IEEE Trans. Inform. Theory, vol. 27, Sept. 1981, pp. 543-547.
[10] S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara, "Serial concatenation of interleaved codes: performance analysis, design and iterative decoding," IEEE Trans. Inform. Theory, vol. 44, no. 3, May 1998, pp. 909-926.
[11] S. Dolinar and D. Divsalar, "Weight distributions for Turbo Codes using random and nonrandom permutations," JPL TDA Progress Report, vol. 42-122, 15 Aug. 1995.
[12] C. Berrou, C. Douillard and M. Jézéquel, "Multiple parallel concatenation of circular recursive systematic convolutional (CRSC) codes," Ann. Télécommun., vol. 54, no 3-4, Mar.-Apr. 1999, pp. 166-172.
[13] S. Dolinar, D. Divsalar and F. Pollara, "Code performance as a function of block size," JPL TMO Progress Report, vol. 42-133, May 1998.
[14] P. Robertson, P. Hoeher and E. Villebrun, "Optimal and suboptimal maximum a posteriori algorithms suitable for turbo decoding," European Trans. Telecommun., vol. 8, Mar.-Apr. 1997, pp. 119-125.
[15] S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara, "A soft-input soft-output maximum a posteriori (MAP) module to decode parallel and serial concatenated codes," JPL TDA Progress Report, vol. 42-127, Nov. 1996.
[16] A. Matache, S. Dolinar and F. Pollara, "Stopping rules for turbo decoders," JPL TMO Progress Report, vol. 42-142, Aug. 2000.
[17] C. Berrou, M. Jézéquel, C. Douillard and S. Kérouédan, "The advantages of non binary turbo codes," in Proc. IEEE Inform. Theory Workshop ITW’01, Cairns, Australia, Sept. 2001, pp. 61-63.
[18] C. Douillard and C. Berrou, "Turbo Codes with rate-m/(m+1) constituent convolutional codes," To appear in IEEE Trans. Commun., 2005-2006.
[19] S. Benedetto and G. Montorsi, "Unveiling Turbo Codes: Some results on parallel concatenated coding schemes," IEEE Trans. Inform. Theory, vol. 42, no. 2, Mar. 1996, pp. 409-428.
[20] R. Garello, R. Pierleoni and S. Benedetto, "Computing the free distance of Turbo Codes and serially concatenated codes with interleavers," IEEE J. Select Areas Commun., vol. 19, no. 5, May 2001, pp. 800-812.
[21] C. Berrou, S. Vaton, M. Jézéquel and C. Douillard, "Computing the minimum distance of linear codes by the error impulse method," in Proc. IEEE Global Telecommun. Conf. GLOBECOM’02, Taipei, China, vol. 2, Nov. 2002, pp. 1017-1020.
[22] S. ten Brink, "Convergence behavior of iteratively decoded parallel concatenated codes," IEEE Trans. Commun., vol. 49, no. 10, Oct. 2001, pp. 1727-1737.
[23] S. Crozier and P. Guinand, "Distance upper bounds and true minimum distance results for Turbo Codes with DRP interleavers," in Proc. 3rd Int. Symp. on Turbo Codes & Related Topics ISTC’03, Brest, France, Sept. 2003, pp. 169-172.
[24] C. Berrou, Y. Saouter, C. Douillard, S. Kérouédan and M. Jézéquel, "Designing good permutations for Turbo Codes: Towards a single model," in Proc. IEEE Int. Conf. Commun. ICC’04, Paris, France, vol. 1, Jun. 2004, pp. 341-345.
[25] D. Chase, "A class of algorithms for decoding block codes with channel measurement information," IEEE Trans. Inform. Theory, vol. 18, no. 1, Jan. 1972, pp. 170-182.
[26] P. Adde and R. Pyndiah, "Recent simplifications and improvements of Block Turbo Codes," in Proc. 2nd Int. Symp. on Turbo Codes & Related Topics ISTC’00, Brest, France, Sept. 2000, pp. 133-136.
[27] R. Pyndiah, "Iterative decoding of product codes: Block Turbo Codes," in Proc. 1st Int. Symp. on Turbo Codes & Related Topics ISTC’97, Brest, France, Sept. 1997, pp. 71-79.
[28] R. Pyndiah and P. Adde, "Performance of high code rate BTC for non traditional applications," in Proc. 3rd Int. Symp. on Turbo Codes & Related Topics ISTC’03, Brest, France, Sept. 2003, pp. 157-160.
[29] R. Zhou, A. Picart, R. Pyndiah and A. Goalic, "Reliable transmission with low-complexity Reed-Solomon Block Turbo Codes," in Proc. IEEE 1st Int. Symp. on Wireless Commun. Systems ISWCS’04, Mauritius, Sept. 2004, pp. 193-197.
[30] R. Zhou, A. Picart, R. Pyndiah and A. Goalic, "Potential applications of low-complexity non-binary high-rate Block Turbo Codes," in Proc. IEEE Military Commun. Conf. MILCOM’04, Monterey, CA, Oct. 2004.
[31] J. Cuevas, P. Adde, S. Kérouédan and R. Pyndiah, "New architecture for high rate turbo decoding of product codes," in Proc. IEEE Global Telecommun. Conf. GLOBECOM’02, Taipei, China, vol. 2, Nov. 2002, pp. 1363-1367.
[32] HomePlug 1.0 Technical White Paper. [Online] Available: http://www.homeplug.com .
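
As an illustration of the stop criterion described under "3) Stop criterion", the following minimal Python sketch tests whether all rows (after column decoding) or all columns (after row decoding) of the hard-decision matrix are code words. It is only a sketch under stated assumptions: the function names are hypothetical, the component codes are described by binary parity-check matrices, and the toy example uses a (3,2) single parity-check code for both C1 and C2 rather than the BCH or extended Hamming codes discussed in the paper.

```python
from typing import Iterable, Sequence


def is_codeword(word: Sequence[int], parity_checks: Sequence[Sequence[int]]) -> bool:
    """Return True if every parity check is satisfied over GF(2) (zero syndrome)."""
    return all(
        sum(h * b for h, b in zip(check, word)) % 2 == 0
        for check in parity_checks
    )


def all_codewords(words: Iterable[Sequence[int]],
                  parity_checks: Sequence[Sequence[int]]) -> bool:
    """Return True if every word in the collection is a code word."""
    return all(is_codeword(w, parity_checks) for w in words)


def converged_after_column_decoding(hard_bits, h_c1) -> bool:
    """After a column half-iteration, stop if every row is a code word of C1."""
    return all_codewords(hard_bits, h_c1)


def converged_after_row_decoding(hard_bits, h_c2) -> bool:
    """After a row half-iteration, stop if every column is a code word of C2."""
    return all_codewords(zip(*hard_bits), h_c2)


# Toy example (assumption): C1 = C2 = (3,2) single parity-check code.
H_SPC = [[1, 1, 1]]
decided = [[1, 0, 1],
           [1, 1, 0],
           [0, 1, 1]]
print(converged_after_column_decoding(decided, H_SPC))    # True

# Flip one bit: one row and one column now violate their parity check.
corrupted = [row[:] for row in decided]
corrupted[0][0] ^= 1
print(converged_after_column_decoding(corrupted, H_SPC))  # False
```

In a decoder loop, such a test would be evaluated after each half-iteration so that the remaining iterations can be skipped once it returns True.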
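
The throughput figure quoted in "6) High-rate decoding architectures for BTC" can be reproduced with a short calculation. The sketch below assumes that a baseline PU processes one sample (one bit) per clock cycle, and that the innovative architecture uses p PUs in parallel, each processing p samples per clock cycle, which gives the factor p² on the decoding speed. The function name and the `base_samples_per_cycle` parameter are hypothetical.

```python
def decoding_speed_bps(clock_hz: float, p: int, base_samples_per_cycle: int = 1) -> float:
    """Decoding speed for p parallel PUs, each handling p samples per clock cycle.

    Assumption: a baseline PU handles `base_samples_per_cycle` samples (bits)
    per clock cycle, so the overall speed-up factor is p * p.
    """
    return clock_hz * base_samples_per_cycle * p * p


speed = decoding_speed_bps(100e6, 8)   # 100 MHz clock, p = 8
print(f"{speed / 1e9:.1f} Gbit/s")      # 6.4 Gbit/s
```

With p = 8 the factor is 64, so a 100 MHz clock yields 6.4 Gbit/s, matching the value given in the text.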

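
The code rates listed in Table VI follow from the shortening strategy described in "E. BTC in the standards". The sketch below is a rough check, not part of the standard: the helper names are hypothetical, the shortening amounts are derived here from the tabulated component codes, and the second uplink component code is taken as (25,19), i.e. the (32,26) mother code shortened by 7 bits, which is the value consistent with the tabulated rate of 0.608.

```python
def shorten(n: int, k: int, s: int) -> tuple:
    """Shorten an (n, k) code by s information bits, giving an (n - s, k - s) code."""
    return (n - s, k - s)


def product_code_rate(n1: int, k1: int, n2: int, k2: int) -> float:
    """Rate of the product of an (n1, k1) code and an (n2, k2) code."""
    return (k1 * k2) / (n1 * n2)


# Component codes obtained from the (64,57) and (32,26) extended Hamming codes.
# The shortening amounts are inferred from Table VI (assumption).
configs = {
    "(39,32)x(39,32)": shorten(64, 57, 25) + shorten(64, 57, 25),
    "(53,46)x(51,44)": shorten(64, 57, 11) + shorten(64, 57, 13),
    "(30,24)x(25,19)": shorten(32, 26, 2) + shorten(32, 26, 7),
}

for name, (n1, k1, n2, k2) in configs.items():
    print(name, round(product_code_rate(n1, k1, n2, k2), 3))
# (39,32)x(39,32) 0.673
# (53,46)x(51,44) 0.749
# (30,24)x(25,19) 0.608
```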