Design Space Exploration of High Throughput Finite Field Multipliers For Channel Coding On Xilinx Fpgas


Adv. Radio Sci., 10, 175–181, 2012
www.adv-radio-sci.net/10/175/2012/
doi:10.5194/ars-10-175-2012
© Author(s) 2012. CC Attribution 3.0 License.
Advances in Radio Science

Design space exploration of high throughput finite field multipliers for channel coding on Xilinx FPGAs
C. de Schryver, S. Weithoffer, U. Wasenmüller, and N. Wehn
Microelectronic Systems Design Research Group, University of Kaiserslautern, Erwin-Schrödinger-Str., Germany

Correspondence to: C. de Schryver (schryver@eit.uni-kl.de)

Abstract. Channel coding is a standard technique in all wireless communication systems. In addition to the typically employed methods like convolutional coding, turbo coding or low density parity check (LDPC) coding, algebraic codes are used in many cases. For example, outer BCH coding is applied in the DVB-S2 standard for satellite TV broadcasting. A key operation for BCH and the related Reed-Solomon codes is multiplication in finite fields (Galois fields), where extension fields of prime fields are used. Many architectures for multiplication in finite fields have been published over the last decades. This paper examines in detail four different multiplier architectures that offer the potential for very high throughputs. We investigate the implementation performance of these multipliers on FPGA technology in the context of channel coding. We study the efficiency of the multipliers with respect to area, frequency and throughput, as well as configurability and scalability. The implementation data of the fully verified circuits are provided for a Xilinx Virtex-4 device after place and route.

1 Introduction

Galois or finite field multiplications (FFMs) are key in a wide range of technical applications, for example in cryptography or channel coding. In particular, BCH and Reed-Solomon codes require many Galois field (GF) multiplications to be performed in the decoding process. Nowadays, BCH codes are widespread and employed in modern communication standards like DVB and in error correction units for flash memory devices (Liu et al., 2006), for instance. In Sect. 2 we summarize the theoretical background of FFM and BCH decoding.

However, the specific FFM requirements vary over the application range: high Galois field dimensions are used in cryptography, whereas for FFMs in channel coding the focus lies on high throughput and low latency. FFMs are key components in BCH decoder implementations and consume a significant amount of hardware resources in the overall design (Chen et al., 2009; Zhang et al., 2010).

Up to now, a lot of research has been done on area efficient and fast FFM architectures. However, Ahlquist et al. showed in 1999 (Ahlquist et al., 1999) that not every VLSI optimized multiplier architecture performs optimally on FPGAs.

Already in 1986, Scott et al. proposed a scalable FFM that is easily adaptable to different GF sizes and primitive polynomials (Scott et al., 1986). Their bit-slice architecture makes efficient use of area while providing flexibility for different field sizes. Building on the work of Scott et al., Kitsos et al. in 2003 proposed a flexible, reconfigurable FFM architecture for extension fields $GF(2^m)$ (Kitsos et al., 2003). Their bit-serial polynomial basis multiplier features reconfigurability for field sizes up to $m$ and reduced power consumption due to gated clocking. The low hardware complexity promises good area performance.

In this work, we investigate the potential benefit of these published FFM architectures as well as two "standard" architectures for a high-throughput DVB-S2 standard BCH decoder. For this purpose we have implemented the selected FFMs as described in Sect. 3 and fully validated them with a hardware-software co-simulation setup. We have synthesized our implementations and compare area consumption and achievable throughput for a common Xilinx Virtex-4 FPGA target device as described in Sect. 4. Furthermore, we evaluate them with respect to their suitability for FPGA target devices, area consumption, latency and scalability. Finally, we comment on the impact of using sophisticated pipelined FFMs in a DVB-S2 BCH decoder and show which parts can profit most from those.

Published by Copernicus Publications on behalf of the URSI Landesausschuss in der Bundesrepublik Deutschland e.V.
2 Theoretical background

For a better understanding of the finite field multiplier architectures and the context of BCH coding, we give an overview of the underlying mathematics in this section. For more details we refer to available standard literature (Bossert, 1998; Friedrichs, 1995).

2.1 Finite fields and multiplication

The basis for the finite field multiplication architectures implemented in this work is a reformulation of the finite field multiplication in polynomial representation. The derivation of the algorithm originally proposed by Scott et al. (Scott et al., 1986) is outlined in the following.

The nonzero elements of the extension field $GF(2^m)$ can be constructed by powers of the primitive element $\alpha$, where $\alpha$ is a root of a primitive polynomial $P(x) = x^m + p_{m-1} x^{m-1} + \cdots + p_1 x + p_0$ over $GF(2)$. Since $P(\alpha) = 0$ ($\alpha$ being a root of $P$), $\alpha^m = p_{m-1} \alpha^{m-1} + \cdots + p_1 \alpha + p_0$ and therefore the nonzero elements can be written in the canonical base representation:

$$\{ a_{m-1} \alpha^{m-1} + \cdots + a_1 \alpha + a_0 \mid a_i \in GF(2) \text{ for } 0 \le i \le m-1 \}$$

2.1.1 Multiplication algorithm

Based on the work of Scott et al., an algorithm for FFM can be derived similar to the design proposed by Kitsos et al. as follows: Let $A, B \in GF(2^m)$ in canonical base representation and $P$ a primitive polynomial over $GF(2)$. Then let

$$C = A(x) \cdot B(x) \bmod P \tag{1}$$

$$= A(x) \cdot \sum_{k=0}^{m-1} b_k x^k \bmod P \tag{2}$$

$$= \Big( \underbrace{(0 \cdot x + b_{m-1} A(x))}_{K^0(x)}\, x^{m-1} + \cdots + b_0 A(x) \Big) \bmod P(x) \tag{3}$$

$$= \Big( \underbrace{(K^0(x)\, x + b_{m-2} A(x))}_{K^1(x)}\, x^{m-2} + \cdots + b_0 A(x) \Big) \bmod P(x) \tag{4}$$

$$\vdots$$

$$= K^{m-1}(x) \bmod P(x) \tag{5}$$

Thus, the product $C(x)$ can be obtained by performing iteratively $m-1$ times:

$$K^i(x) = \big( K^{i-1}(x)\, x + b_{m-1-i} A(x) \big) \bmod P(x) \tag{6}$$

where $K^{-1}(x) = 0$.

Equation (6) can be simplified by using the fact that $x^m \bmod P(x) = p_{m-1} x^{m-1} + \cdots + p_1 x + p_0$ and by writing $K^{i-1}(x)$ and $A(x)$ as $k^{i-1}_{m-1} x^{m-1} + \cdots + k^{i-1}_0$ and $a_{m-1} x^{m-1} + \cdots + a_0$, respectively. One obtains:

$$K^i(x) = \sum_{j=0}^{m-1} k^i_j x^j \tag{7}$$

with

$$k^i_j = k^{i-1}_{m-1}\, p_j + k^{i-1}_{j-1} + b_{m-1-i}\, a_j \quad \text{and} \quad k^{i-1}_{-1} = 0 \tag{8}$$

Equations (6) and (8) define the multiplication algorithm over $GF(2^m)$ that is used for the architectures of the Scott- and Kitsos-multiplier.

2.2 BCH decoding

BCH codes were invented in 1959 by Hocquenghem (Hocquenghem, 1959), and independently in 1960 by Bose and Ray-Chaudhuri (Bose and Ray-Chaudhuri, 1960). These codes provide high flexibility and predictable error correction ability, and hence are used in a wide range of technical applications. Details about their construction and decoding can be found in literature (Bossert, 1998; Friedrichs, 1995).

Two main categories of BCH codes exist: so-called primitive and non-primitive ones. Code words of primitive BCH codes are always of fixed length $n = 2^m - 1$, with $n$ being the length of the code word and $m$ the dimension of the Galois field $GF(2^m)$. Non-primitive BCH codes provide flexibility in choosing the appropriate code word size. In contrast to code puncturing, where one or more positions in a code word are omitted, the minimum distance in a reduced code is not changed (Bossert, 1998). Non-primitive BCH codes are used in standards like DVB-T2, DVB-S2, DVB-C2, DVB-H, ITU-T H.261 or ITU-T G.975.1, for example.

BCH codes are usually decoded by algebraic syndrome decoding. Figure 1 shows the generic structure of a syndrome based BCH decoder.

Fig. 1. Block Diagram of a BCH Decoder (blocks: syndrome generator, key equation solver, Chien search, FIFO buffer, error correction).

It is important to notice that the syndrome $S(x)$ only depends on a possible error vector $e$ and not on the correct code word that has been sent. The decoder therefore has to find a possible error vector $e$ with minimum weight, that is, an $e$ with a minimum number of coefficients unequal to zero. In order to calculate $e$ algebraically, the search for minimum
weight is transformed into a search for a polynomial with minimum degree (Bossert, 1998).

The syndrome generator unit (SGU) computes the syndrome $S(x)$ that is fed into a so-called key equation solver (KES). The KES computes the error locator polynomial $\Lambda(x)$ out of the syndrome. The Chien search unit (CSU) afterwards checks for all elements in the specified Galois field $GF(2^m)$ if $\Lambda(x) = 0$. In this case, the error vector $e$ has a 1 at the corresponding position; the error correction then flips the bits at all recognized error positions in the received code word $r$.
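The decoder chain described above (SGU, KES, CSU, error correction) can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not taken from the paper: a toy field $GF(2^4)$ with $P(x) = x^4 + x + 1$, the (15, 7) BCH code with $t = 2$, and a closed-form (Peterson-style) key equation solution that is swapped in for brevity in place of a Euclidean key equation solver. All function names are hypothetical.

```python
# Toy BCH(15, 7), t = 2 over GF(2^4); field and code are illustrative assumptions.
M = 4
PRIM = 0b10011          # P(x) = x^4 + x + 1 (assumed primitive polynomial)
N = (1 << M) - 1        # code word length n = 2^m - 1 = 15
ALPHA = 0b0010          # primitive element alpha = x

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^M) with reduction by PRIM."""
    r = 0
    for _ in range(M):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):
            a ^= PRIM
    return r

def gf_pow(a, e):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def gf_inv(a):
    return gf_pow(a, N - 1)          # a^(2^M - 2) is the inverse of a != 0

def poly_eval(bits, x):
    """Evaluate a polynomial with GF(2) coefficients bits[i] (of x^i) at x."""
    acc = 0
    for i, bit in enumerate(bits):
        if bit:
            acc ^= gf_pow(x, i)
    return acc

def syndromes(r, t):
    """SGU: S_j = r(alpha^j) for j = 1 .. 2t."""
    return [poly_eval(r, gf_pow(ALPHA, j)) for j in range(1, 2 * t + 1)]

def key_equation_t2(s):
    """KES stand-in for t = 2: Lambda(x) = 1 + S1 x + ((S3 + S1^3) / S1) x^2."""
    s1, s3 = s[0], s[2]
    if s1 == 0:
        return [1, 0, 0]             # no correctable error detected
    l2 = gf_mul(s3 ^ gf_pow(s1, 3), gf_inv(s1))
    return [1, s1, l2]

def chien_search(lam):
    """CSU: position i is in error if Lambda(alpha^-i) == 0."""
    positions = []
    for i in range(N):
        x = gf_pow(ALPHA, (N - i) % N)
        val = lam[0] ^ gf_mul(lam[1], x) ^ gf_mul(lam[2], gf_mul(x, x))
        if val == 0:
            positions.append(i)
    return positions

def decode(r):
    """Flip the bits of r at all positions reported by the Chien search."""
    out = list(r)
    for pos in chien_search(key_equation_t2(syndromes(r, 2))):
        out[pos] ^= 1
    return out

# Generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1 used as one code word.
codeword = [1, 0, 0, 0, 1, 0, 1, 1, 1] + [0] * 6
zero = [0] * N
error = [0] * N
error[2] = error[9] = 1
r1 = [c ^ e for c, e in zip(codeword, error)]
r2 = [c ^ e for c, e in zip(zero, error)]

print(syndromes(r1, 2) == syndromes(r2, 2))   # same error, same syndrome
print(decode(r1) == codeword)
```

The two received words `r1` and `r2` carry the same error pattern on different code words, which makes the statement that the syndrome depends only on $e$ directly checkable. Every call to `gf_mul` in `syndromes`, `key_equation_t2` and `chien_search` corresponds to a place where a hardware decoder would need an FFM.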
3 Hardware implementation

The hardware implementation of the FFMs should be flexible and generic with respect to the primitive polynomial and field size. Furthermore, they should exploit the underlying structures of the Virtex-4 FPGA platform well to yield small area usage while providing sufficient throughput.

For the target DVB-S2 standard, BCH codes based on the Galois fields $GF(2^{14})$ and $GF(2^{16})$ are used. As this research was performed in the context of a more complex communication system, the common clock frequency of the whole system was predefined to 200 MHz.

The hardware implementations considered in our work are based on Scott et al. (1986) and Kitsos et al. (2003). Furthermore, we give insight into the architecture of the plain combinatorial FFM implementations used.

Fig. 2. Scott-Multiplier Basic Cell.
3.1 Scott-multiplier

The architecture of the Scott-multiplier resembles the multiplication algorithm derived in Sect. 2.1. Equation (8) is mapped to a basic cell (see Fig. 2). It consists of two 2-input AND gates, two 2-input XOR gates and a flip-flop. Additionally, our implementation contains flip-flops for storing the coefficients of the A-operand during a calculation and control logic for that. Still, such cells fit comfortably into a Virtex-4 FPGA slice that offers two 4-input LUTs, two multiplexers, two 1-bit registers and arithmetic logic.

For a multiplier over $GF(2^m)$, $m$ basic cells are connected into an array that calculates the product $C = A \cdot B \bmod P$ in $m$ clock cycles. This array can be viewed as a linear feedback shift register and is illustrated in Fig. 3. The primitive polynomial of the Galois field can be set arbitrarily at design time.

Fig. 3. Scott-Multiplier Array of Four Cells for m = 4.

3.2 Kitsos-multiplier

This architecture extends the Scott-multiplier by a reconfigurable feedback path and gated clocking, and is also based on Eq. (8). The reconfigurable feedback path allows for a run-time reconfiguration of the multiplier. This reconfigurability aims to support not only arbitrary primitive polynomials but also field sizes $1 \le i \le m$ for an array length $m$. The gated clocking leads to an increased power efficiency when the multiplier is reconfigured to lower dimension fields.

In addition to the two 2-input AND gates, two 2-input XOR gates and flip-flop required due to Eq. (8), the basic cells of the Kitsos-multiplier contain a demultiplexer and a two-input XOR and AND gate as shown in Fig. 4. Similar to the Scott-multiplier, $m$ basic cells form an array to allow for multiplication over $GF(2^m)$. The Kitsos-multiplier can be reconfigured at run time to support any field dimension $j$ with $1 \le j \le m$ by selecting a shorter feedback path and disabling part of the flip-flops. Our implementation also features built-in default primitive polynomials and further control logic to be able to use an arbitrary primitive polynomial. An example for $m = 4$ is given in Fig. 5.

3.3 Plain multiplier architectures

In the original BCH decoder implementation, the multipliers used were based on a pure polynomial division algorithm. As a first step, plain unrolling of this polynomial division led to pure combinatorial FFMs. In the second step we have introduced pipeline registers to shorten the critical path of the circuit.

We use both plain FFM types as a comparison for our results. Detailed numbers for all multipliers are given in Fig. 8 and Table 2.
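The difference between the bit-serial array of Sects. 3.1 and 3.2 and a plain multiplier based on polynomial division can be made concrete with a small software model. The sketch below is a behavioural illustration only, not the VHDL used in this work: `ffm_bit_serial` steps through the recurrence of Eqs. (6) and (8) starting from $K^{-1}(x) = 0$, one loop iteration per clock cycle, while `ffm_plain` multiplies first and then reduces by long division. The function names, the integer encoding of polynomials (bit $j$ holds the coefficient of $x^j$, with `p` including the $x^m$ term) and the two test polynomials are assumptions made for the example.

```python
def ffm_bit_serial(a, b, p, m):
    """Bit-serial model: K^i = (K^(i-1) * x + b_(m-1-i) * A) mod P for i = 0 .. m-1."""
    mask = (1 << m) - 1
    k = 0                                   # K^(-1)(x) = 0
    for i in range(m):                      # one iteration per clock cycle
        feedback = (k >> (m - 1)) & 1       # k^(i-1)_(m-1)
        k = (k << 1) & mask                 # shift supplies k^(i-1)_(j-1)
        if feedback:
            k ^= p & mask                   # add p_j where the feedback bit is set
        if (b >> (m - 1 - i)) & 1:
            k ^= a                          # add b_(m-1-i) * a_j
    return k

def ffm_plain(a, b, p, m):
    """Reference model: carry-less multiplication followed by division by P."""
    prod = 0
    for j in range(m):
        if (b >> j) & 1:
            prod ^= a << j
    for d in range(2 * m - 2, m - 1, -1):   # cancel degrees 2m-2 down to m
        if (prod >> d) & 1:
            prod ^= p << (d - m)
    return prod

def count_mismatches(p, m):
    """Compare both models for every operand pair of GF(2^m)."""
    size = 1 << m
    return sum(
        ffm_bit_serial(a, b, p, m) != ffm_plain(a, b, p, m)
        for a in range(size)
        for b in range(size)
    )

# Toy fields chosen for the example: x^4 + x + 1 and x^8 + x^4 + x^3 + x^2 + 1.
print(count_mismatches(0b10011, 4))
print(count_mismatches(0x11D, 8))
```

In this model the loop in `ffm_bit_serial` runs $m$ times, matching the $m$ clock cycles of the array, whereas `ffm_plain` has no notion of cycles and corresponds to a fully unrolled, combinatorial circuit whose critical path grows with $m$.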
Fig. 4. Kitsos-Multiplier Basic Cell.

Fig. 5. Kitsos-Multiplier Array for Maximum m = 4.

3.4 BCH decoder architecture

Our original decoder implementation is derived from the generic template shown in Fig. 1. The key equation solver is based on a fully parallel Euclidean implementation that uses non-pipelined combinatorial FFMs as described in Sect. 3.3. SGU and CSU use the same type of FFM. Since we intend to enhance our decoder to Reed-Solomon codes in the future, we have decided to go for a fully parallel Euclidean based key equation solver. The decoder can be configured at design time with the following parameters:

– n: length of the code word
– m: dimension of the extension field $2^m$
– t: number of errors that can be corrected
– p: number of bits that are processed in parallel in the syndrome generator unit and the Chien search unit

In all three units in the upper part of Fig. 1, FFMs are the key components (Chen et al., 2009; Zhang et al., 2010). However, the complexity of some FFMs can be reduced at design time, since they have one input fixed to a constant value out of the specific Galois field used. This is the case for all $2t$ multipliers in the syndrome computation unit and for all $t \cdot p$ multipliers in the Chien search unit.

The key equation solver only contains full FFMs with non-constant inputs. To support both GF dimensions ($2^{14}$ and $2^{16}$) in the DVB-S2 BCH decoder, every FFM instance internally consists of two FFMs, one for each GF dimension. They share the same wires and are selected via multiplexers at the inputs and outputs.

Figure 6 clearly shows for two explicit configuration cases that these $8t + 2$ FFMs in total consume the vast majority of the overall area required by the complete BCH decoder.

Fig. 6. Slice Distribution of BCH Decoder Components (inner: m=8, n=128, t=3, p=1; outer: m=13, n=4096, t=7, p=1).

In Table 1 the latency of the single decoder stages is given for combinatorial and Scott FFMs based on the decoder parameters. Due to the high parallelism in the KES, the majority of decoding time is spent in SGU and CSU. Nevertheless, these units can easily be sped up by increasing $p$, which scales the latency in these units by $1/p$. However, this will also increase the consumed area by a factor of nearly $p$.

4 Results

4.1 Validation

With increasing complexity in current hardware designs, permanent evaluation and verification is crucial throughout the design process. To speed up the otherwise very time consuming simulation process, the VHDL models of the FFMs have been exhaustively co-simulated with C++ reference code for GF-multiplication. This co-simulation has been carried out using Mentor Graphics ModelSim and a SystemC test bench as illustrated in Fig. 7. As only field sizes up to $2^{16}$ are of interest for our BCH decoder, all meaningful stimuli have been applied and the models have therefore been fully verified.

4.2 Synthesis results

All multiplier architectures described in Sect. 3 have been implemented in VHDL. They have been synthesized for low area consumption on a Xilinx Virtex-4 FPGA (XC4VLX100, package FF1148). The applied tool chain was Xilinx ISE IDE Release 11.3 and the optimization effort for place and route was set to high. Table 2 gives detailed synthesis results (post place and route).
for all t ∗ p multipliers in the Chien search unit. area consumption on a Xilinx Virtex 4 FPGA (XC4VLX1
The key equation solver only contains full FFMs with non- package FF1148). The applied tool chain was Xilinx I
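The exhaustive checking idea can be illustrated in software. The following Python sketch is not the authors' ModelSim/SystemC setup. It assumes a small field, GF($2^8$) with the primitive polynomial $x^8 + x^4 + x^3 + x^2 + 1$, and compares a shift-and-add multiplier (standing in for the model under test) against an independent multiply-then-reduce reference for every operand pair. All function names are hypothetical.

```python
M = 8          # field dimension (assumption: small enough to run quickly)
POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial over GF(2)


def gf_mul_reference(a, b, m=M, poly=POLY):
    """Reference model: carry-less multiply, then reduce modulo poly."""
    prod = 0
    for k in range(m):
        if (b >> k) & 1:
            prod ^= a << k
    # Reduce the (2m-1)-bit product from the top bit downwards.
    for bit in range(2 * m - 2, m - 1, -1):
        if (prod >> bit) & 1:
            prod ^= poly << (bit - m)
    return prod


def gf_mul_under_test(a, b, m=M, poly=POLY):
    """Stand-in for the model under test: LSB-first shift-and-add."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:   # degree m reached: XOR with the polynomial
            a ^= poly
    return result


def exhaustive_check(m=M, poly=POLY):
    """Apply every operand pair and return the list of mismatching pairs."""
    size = 1 << m
    return [
        (a, b)
        for a in range(size)
        for b in range(size)
        if gf_mul_reference(a, b, m, poly) != gf_mul_under_test(a, b, m, poly)
    ]


mismatches = exhaustive_check()
print(f"{(1 << M) ** 2} operand pairs applied, {len(mismatches)} mismatches")
```

For m = 16 the same double loop would cover $2^{32}$ operand pairs, so a compiled reference model is likely the more practical choice at that size.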
Two main categories of BCH codes exist: so-called primitive and non-primitive ones. Code words of primitive BCH codes are always of fixed length $n = 2^m - 1$, with $n$ being the length of the code word and $m$ the dimension of the Galois field $2^m$. Non-primitive BCH codes provide flexibility in choosing the appropriate code word size. In contrast to code puncturing, where one or more positions in a code word are omitted, the minimum distance in a reduced code is not changed (Bossert, 1998). Non-primitive BCH codes are used in standards like DVB-T2, DVB-S2, DVB-C2, DVB-H, ITU-T H.261 or ITU-T G.975.1, for example.

BCH codes are usually decoded by algebraic syndrome decoding. Figure 1 shows the generic structure of a syndrome based BCH decoder.

Fig. 1. Block diagram of a BCH decoder (blocks: syndrome generator, key equation solver, Chien search, FIFO buffer, error correction).

It is important to notice that the syndrome $S(x)$ only depends on a possible error vector $e$ and not on the correct code word that has been sent. The decoder therefore has to find a possible error vector $e$ with a minimum weight, that means an $e$ with a minimum number of coefficients unequal to zero. In order to calculate $e$ algebraically, the search for minimum weight is transformed into a search for a polynomial with the minimum degree (Bossert, 1998).

Table 1. Latency distribution of BCH decoder components.

Component                  Latency, combinatorial FFMs   Latency, Scott FFMs
Key Equation Solver        4 × t + 2                     (4 × t + 2) × m
Syndrome Generator Unit    n/p − 1                       (n/p − 1) × m
Chien Search Unit          n/p + 1                       (n/p + 1) × m

2.1.1 Multiplication Algorithm

Based on the work of Scott et al., an algorithm for FFM can be derived similar to the design proposed by Kitsos et al. as follows: Let $A, B \in GF(2^m)$ in canonical base representation and $P$ a primitive polynomial over $GF(2)$. Then let

$$
\begin{aligned}
C(x) &= A(x) \cdot B(x) \bmod P(x) && (1) \\
     &= A(x) \cdot \sum_{k=0}^{m-1} b_k x^k \bmod P(x) && (2) \\
     &= \big( \underbrace{(0 \cdot x + b_{m-1} A(x))}_{K^0(x)} \, x^{m-1} + \dots + b_0 A(x) \big) \bmod P(x) && (3) \\
     &= \big( \underbrace{(K^0(x)\, x + b_{m-2} A(x))}_{K^1(x)} \, x^{m-2} + \dots + b_0 A(x) \big) \bmod P(x) && (4) \\
     &\;\;\vdots \\
     &= K^{m-1}(x) \bmod P(x) && (5)
\end{aligned}
$$

Thus, the product $C(x)$ can be obtained by iteratively evaluating

$$
K^i(x) = \big( K^{i-1}(x)\, x + b_{m-1-i} A(x) \big) \bmod P(x) \qquad (6)
$$

for $i = 0, \dots, m-1$, where $K^{-1}(x) = 0$.
Fig. 7. HW/SW co-simulation schematic.

4.2 Synthesis results

All multiplier architectures described in Sect. 3 have been implemented in VHDL. They have been synthesized for low area consumption on a Xilinx Virtex 4 FPGA (XC4VLX100, package FF1148). The applied tool chain was Xilinx ISE IDE Release 11.3 and the optimization effort for place and route was set to high. Table 2 gives detailed synthesis results (post place and route).

Due to the given system requirements, the main focus lay on a resulting small footprint on the FPGA for given throughputs. Therefore the metric considered most important was throughput per slice at a fixed clock frequency. Table 2 shows that the required target clock frequency of 200 MHz could be achieved by all implementations, except the Kitsos-multiplier for field dimensions m ≥ 15.

For the Scott-multiplier, we can achieve possible clock frequencies of more than 1 GHz due to the short critical path (see Fig. 2). Furthermore, the Scott-multiplier outperforms the other multiplier architectures and even the plain combinatorial multiplier implementation with respect to the metric introduced above. This is shown in Fig. 8.

Fig. 8. Throughput per slice at 200 MHz.

When comparing the Scott-multiplier and the plain combinatorial multiplier, the area saving outweighs the lesser throughput (factor $1/m$) by a factor from 14.75 (for m = 12) up to 18.1 (for m = 16).

The overhead of the Kitsos-multiplier implementation heavily impacts the area usage. While it still needs less area than either one of the polynomial-division-based multipliers, the Kitsos-multiplier requires roughly ten times more area than the Scott-multiplier while providing the same throughput. A BCH decoder for DVB-S2 requires FFMs for two different GF dimensions: $2^{14}$ and $2^{16}$. As a result, two instances of the Scott-multiplier (one for each dimension) outperform the Kitsos-multiplier clearly.

4.3 Impact on BCH decoder

With respect to the consumed area, Table 2 shows that the total number of slices occupied by the Scott-multiplier is 9 for GF $2^{14}$. That is 5.4 % of the 168 slices required by the plain combinatorial FFM for this GF dimension. For GF $2^{16}$, the Scott-FFM only consumes 5.5 % of the area. For the FFM pairs in the DVB-S2 decoder as explained in Sect. 3.4, this results in a total area saving of around 94 % for the FFMs. With respect to Fig. 6, we thus expect a very high potential for area saving by employing Scott-FFMs in the key equation solver part. Nevertheless, the area occupied by the multiplexers and registers will not be reduced. Furthermore, the area reduction of FFMs with constant inputs as in the SGU and CSU will also be significantly lower.

By using pipelined multipliers, the latency in each decoder component is increased by a factor of m (see Table 1). For the configurations shown in Fig. 6, the latencies of SGU and CSU are already massively dominating over the KES latency. Using pipelined FFMs in SGU and CSU would therefore lead to a significant decrease of the overall decoder throughput, with only a small area reduction in total.

However, Fig. 6 shows that the KES is the dominating part in the decoder with respect to area, but hardly contributes to the overall latency. This means that adding additional latency in the KES will only insignificantly increase the overall latency of the decoder. On the other hand, smaller FFMs in the KES will show a considerable impact on the overall consumed area, since those FFMs use up most of the hardware resources (see Sect. 3.4).
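To make the latency argument concrete, the expressions from Table 1 can be evaluated for one parameter set. The values n = 32400, t = 12, p = 8 and m = 16 below are illustrative assumptions, not the configurations from Fig. 6, and the function name is hypothetical. Adding up the three unit latencies is also a simplification made only for this sketch.

```python
def decoder_latencies(n, t, p, m):
    """Evaluate the Table 1 latency expressions (in clock cycles).

    Assumes that p divides n, so that n/p is an integer.
    """
    combinatorial = {
        "KES": 4 * t + 2,
        "SGU": n // p - 1,
        "CSU": n // p + 1,
    }
    # Table 1: every latency is multiplied by m when Scott FFMs are used.
    scott = {unit: cycles * m for unit, cycles in combinatorial.items()}
    return combinatorial, scott


comb, scott = decoder_latencies(n=32400, t=12, p=8, m=16)

# Mixed setup: Scott FFMs only in the KES, combinatorial FFMs elsewhere.
mixed_total = scott["KES"] + comb["SGU"] + comb["CSU"]
comb_total = sum(comb.values())

print("combinatorial:", comb)
print("Scott:        ", scott)
print(f"all combinatorial: {comb_total} cycles, mixed: {mixed_total} cycles")
```

With these assumed values the KES accounts for 50 cycles in the combinatorial case, while SGU and CSU account for about 4050 cycles each, so multiplying only the KES latency by m changes the sum far less than multiplying the SGU or CSU latency would.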

Table 2. Detailed synthesis results. Throughput and throughput per slice are given at 200 MHz.

Type                    m    slice-FFs   4-input LUTs   slices total   throughput [results/s]   throughput per slice [results/(s·slice)]   max. freq. [MHz]
Scott                   12   12          15             8              1.66 × 10^7              2.08 × 10^6                                1034.1
                        13   13          16             8              1.53 × 10^7              1.92 × 10^6                                1177.9
                        14   14          17             9              1.42 × 10^7              1.58 × 10^6                                1189.1
                        15   15          16             8              1.33 × 10^7              1.66 × 10^6                                1116.1
                        16   16          19             10             1.25 × 10^7              1.25 × 10^6                                1083.4
Kitsos                  12   59          136            83             1.66 × 10^7              0.2 × 10^6                                 226.50
                        13   64          160            100            1.53 × 10^7              0.15 × 10^6                                212.27
                        14   69          186            110            1.42 × 10^7              0.12 × 10^6                                209.03
                        15   74          191            114            ∗                        ∗                                          169.81
                        16   79          169            104            ∗                        ∗                                          198.77
Plain (pipelined)       12   392         177            200            20 × 10^7                10^6                                       833.33
                        13   457         205            246            20 × 10^7                0.81 × 10^6                                840.34
                        14   527         236            270            20 × 10^7                0.74 × 10^6                                889.68
                        15   602         240            323            20 × 10^7                0.61 × 10^6                                871.84
                        16   682         302            348            20 × 10^7                0.57 × 10^6                                701.26
Plain (combinatorial)   12   53          169            115            20 × 10^7                1.73 × 10^6                                373.13
                        13   61          182            130            20 × 10^7                1.53 × 10^6                                394.01
                        14   75          237            168            20 × 10^7                1.19 × 10^6                                365.63
                        15   66          220            148            20 × 10^7                1.35 × 10^6                                401.61
                        16   71          281            181            20 × 10^7                1.1 × 10^6                                 340.14

∗: Maximum clock frequency was below 200 MHz.
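The derived columns of Table 2 and the percentages quoted in Sect. 4.3 can be reproduced from the slice counts. The sketch below assumes that a Scott-multiplier delivers one result every m clock cycles and a plain combinatorial multiplier one result per cycle at 200 MHz, which is consistent with the throughput column. All names are hypothetical, and the printed values are not truncated the way the table entries appear to be.

```python
F_CLK_HZ = 200e6  # target clock frequency used in Table 2

# Total slices taken from Table 2, keyed by field dimension m.
SCOTT_SLICES = {12: 8, 13: 8, 14: 9, 15: 8, 16: 10}
PLAIN_COMB_SLICES = {12: 115, 13: 130, 14: 168, 15: 148, 16: 181}


def throughput_per_slice(slices, cycles_per_result):
    """Results per second and slice at F_CLK_HZ."""
    return F_CLK_HZ / cycles_per_result / slices


for m in sorted(SCOTT_SLICES):
    scott = throughput_per_slice(SCOTT_SLICES[m], cycles_per_result=m)
    plain = throughput_per_slice(PLAIN_COMB_SLICES[m], cycles_per_result=1)
    print(f"m={m}: Scott {scott:.3e}, plain combinatorial {plain:.3e}")

# Area of a Scott FFM relative to a plain combinatorial FFM (Sect. 4.3).
ratio_14 = SCOTT_SLICES[14] / PLAIN_COMB_SLICES[14]
ratio_16 = SCOTT_SLICES[16] / PLAIN_COMB_SLICES[16]

# An FFM pair holds one multiplier per GF dimension (m = 14 and m = 16).
pair_saving = 1 - (SCOTT_SLICES[14] + SCOTT_SLICES[16]) / (
    PLAIN_COMB_SLICES[14] + PLAIN_COMB_SLICES[16]
)
print(f"{ratio_14:.1%}, {ratio_16:.1%}, pair saving {pair_saving:.1%}")
```

The two ratios come out at about 5.4 % and 5.5 %, and the saving for an FFM pair at roughly 94.6 %, in line with the "around 94 %" stated in the text.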

In order to achieve a well-balanced decoder design, we therefore suggest employing two different FFM types in the decoder: combinatorial FFMs in the SGU and CSU, and small pipelined FFMs in the KES. This setup will exploit the benefits of both FFM types and reduce the overall area of the decoder, while only adding an insignificant amount of total latency.

5 Conclusions

Finite field multiplications are key in many technical applications, particularly in decoding BCH codes. In this work we have investigated available pipelined architectures with respect to scalability, throughput and area consumption on FPGAs. We have implemented and synthesized those designs for a common target device, a Xilinx Virtex-4 XC4VLX100 FPGA. Our implementations have been fully verified in a hardware-software co-simulation. We have evaluated and compared the selected architectures and show that for the chosen efficiency metric the Scott-multiplier features in principle the best performance. For application of the considered multipliers in a DVB-S2 BCH decoder the Kitsos-multiplier is outperformed by the other multipliers, despite the inherent architectural flexibility of the Kitsos-multiplier. Furthermore, we conclude that the key equation solver part can profit most from pipelined finite field multipliers, and that simple combinatorial multipliers provide a higher benefit in the syndrome computation and the Chien search.

References

Ahlquist, G. C., Nelson, B. E., and Rice, M.: Optimal Finite Field Multipliers for FPGAs, in: FPL '99: Proceedings of the 9th International Workshop on Field-Programmable Logic and Applications, pp. 51–60, Springer-Verlag, London, UK, 1999.
Bose, R. C. and Ray-Chaudhuri, D. K.: On a class of error correcting binary group codes, Inform. Control, 3, 68–79, doi:10.1016/S0019-9958(60)90287-4, 1960.
Bossert, M.: Kanalcodierung, B. G. Teubner Stuttgart, 2nd edn., 1998.
Chen, Z., Zhang, Y., Ying, Y., Wu, C., and Zeng, X.: An Area-Efficient and Degree-Computationless BCH Decoder for DVB-S2, in: Proceedings of the IEEE 8th International Conference on ASIC (ASICON) 2009, 489–492, doi:10.1109/ASICON.2009.5351625, 2009.
Friedrichs, B.: Kanalcodierung, Springer Verlag Berlin Heidelberg, 1995, ISBN 3-540-59353-5.

Hocquenghem, A.: Codes correcteurs d'erreurs, Chiffres, 2, 147–56, 1959.
Kitsos, P., Theodoridis, G., and Koufopavlou, O.: An efficient reconfigurable multiplier architecture for Galois field GF($2^m$), Microelectr. J., 34, 975–980, 2003.
Liu, W., Rho, J., and Sung, W.: Low-Power High-Throughput BCH Error Correction VLSI Design for Multi-Level Cell NAND Flash Memories, in: Proceedings of the IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS) 2006, 303–308, doi:10.1109/SIPS.2006.352599, 2006.
Scott, P., Tavares, S., and Peppard, L.: A Fast VLSI Multiplier for GF($2^m$), in: IEEE Journal on Selected Areas in Communications (J-SAC) 1986, 4, 62–66, 1986.
Zhang, B., Liu, D., Wang, S., Chen, X., and Liu, H.: Design and Implementation of Area-Efficient DVB-S2 BCH Decoder, in: Computer Engineering and Technology (ICCET), 2010 2nd International Conference on, 3, V3-179–V3-184, doi:10.1109/ICCET.2010.5485823, 2010.