
Implementation of Viterbi Encoder and Decoder on FPGA

A Project Report
Submitted for
Digital System Design with FPGAs
(E3: 231)

by

Batch 28
Pramod M and Makarand K Patil

ECE and CEDT


Indian Institute of Science
BANGALORE – 560 012

November 2009
Acknowledgements

We would like to thank Prof. Kuruvilla Varghese for his guidance and support. We also
thank him for providing us with an opportunity to work with CAD tools. We thank
Mrs. Vedavalli for her support in the course of the project.

Abstract

In this project, a Viterbi encoder and decoder are implemented on a Spartan-3E FPGA. The
transmitter uses a rate 1/2, four-state convolutional encoder of constraint length 3. The
decoder implements the trace back algorithm on a set of 16 bits of the received sequence. The
Viterbi decoder can operate at a frequency of up to 82.7 MHz. It is implemented on an FPGA
operating at a frequency of 50 MHz.

Contents

1 Introduction
1.1 Viterbi Algorithm
1.2 Error Correction
1.3 Encoder Specifications
1.3.1 Rate of Encoder
1.3.2 Constraint length (K)
1.4 Encoder Implementation
1.5 Implementation Options of Decoder on Hardware
1.5.1 Register Exchange Method
1.5.2 Trace Back Method

2 Implementation of Decoder
2.1 Branch Metric Unit
2.2 Accumulate Compare Unit
2.3 Normalizer
2.4 Path Metric Memory
2.5 Predictor
2.6 LIFO Unit
2.7 Controller

3 Performance
3.1 Timing
3.1.1 Timing Constraints
3.1.2 Timing Report
3.2 Resource Utilization

Chapter 1

Introduction

1.1 Viterbi Algorithm


The Viterbi algorithm is commonly used in a wide range of communications and data
storage applications. It performs maximum likelihood detection of a digital stream.
In general, convolutional codes are decoded with the Viterbi algorithm.

1.2 Error Correction


For a rate 1/2 encoder with a constraint length of 3, the code can correct up to 2 errors in
16 bits of transmitted data, assuming that the errors do not occur consecutively.
This report details the implementation of the algorithm.

1.3 Encoder Specifications


1.3.1 Rate of Encoder
The code rate is expressed as the ratio of the number of bits into the convolutional encoder
(k) to the number of channel symbols output by the convolutional encoder (n) in a given
encoder cycle. A rate 1/2 encoder (k = 1, n = 2) is implemented in the design.

1.3.2 Constraint length (K)


The constraint length parameter, K, denotes the "length" of the convolutional encoder,
i.e. how many k-bit stages are available to feed the combinatorial logic that produces
the output symbols. Closely related to K is the parameter m, which indicates how many
encoder cycles an input bit is retained and used for encoding after it first appears at the
input to the convolutional encoder. The following are the specifications of the encoder:
1. K = 3
2. Rate = 1/2
3. Hard decision decoding
4. g00 = (111)
5. g01 = (011)
6. Trellis truncation: 16
1.4 Encoder Implementation


The convolutional encoder shown in Fig. 1.1 takes one input data bit and gives out two bits.
Convolutional encoding is a process of adding redundancy to a signal stream. In general it
allows variable code rates, constraint lengths and generator polynomials; this design uses
rate 1/2 and K = 3. To convolutionally encode data, start with 2 memory registers, each
holding 1 input bit. The registers start with a value of 0. The encoder has 2 modulo-2
adders, which are implemented with XOR gates. There are 2 generator polynomials, one for
each adder.

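[Figure 1.1 (schematic and trellis): Din feeds two D flip-flops in series (Q0, Q1) clocked
by CLK; modulo-2 adders produce the outputs X0 and X1. The trellis has four states,
S0 (00), S1 (01), S2 (10) and S3 (11), with each branch labelled by its output pair X1X0.]

Figure 1.1: Rate 1/2 Viterbi Encoder with Constraint length of 3 along with corresponding
trellis diagram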
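
The encoder described above can be sketched in a few lines of Python. This is only an illustration, not the report's HDL: the report gives the generators (111) and (011) but not their bit ordering, so the tap assignment below (X0 from Din, Q0 and Q1; X1 from Din and Q0), the function name conv_encode and the (X1, X0) output ordering are all assumptions.

```python
def conv_encode(bits):
    """Encode a list of 0/1 input bits with a rate-1/2, K=3 convolutional code.

    Returns one (x1, x0) pair per input bit. The tap assignment is an
    assumption; the report lists the generators but not their bit ordering.
    """
    q0 = q1 = 0                      # both memory registers start at 0
    pairs = []
    for din in bits:
        x0 = din ^ q0 ^ q1           # assumed taps for generator (111)
        x1 = din ^ q0                # assumed taps for generator (011)
        pairs.append((x1, x0))
        q0, q1 = din, q0             # shift: Din -> Q0 -> Q1
    return pairs


if __name__ == "__main__":
    print(conv_encode([1, 0, 1, 1]))
```

Each input bit produces exactly two output bits, which is what the rate of 1/2 expresses.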

1.5 Implementation Options of Decoder on Hardware

In a Viterbi decoder, there are two known memory organization techniques for the storage
of survivor sequences, from which the decoded information sequence is retrieved: the
register exchange method and the trace back method. The register exchange method is good
for codes with a small number of states and a small truncation length of the trellis, while
the trace back method is more suitable for codes having many states and long truncation
lengths. Since the design decodes blocks of 16 bits, the trace back method is chosen.

1.5.1 Register Exchange Method
The register exchange (RE) method is the simplest conceptually and a commonly used
technique. Its disadvantage is the large power consumption and large area required in
VLSI Implementations. The RE method is not practical when K is large.

1.5.2 Trace Back Method


In the TB method, tracing back of the survivor sequences starts from the node with the
minimum accumulated path metric, and its surviving path is followed to a depth that is
equal to the traceback length (16 here). In the trace back method the stages of the trellis
are processed in reverse order.
The trace back method is suitable for codes having many states. During state metric
accumulation, the transition bits for all states and all stages need to be saved in order
to perform traceback. The traceback method stores path information in the form of an
array of recursive pointers.
It is advantageous to think of the traceback memory as organized in a two-dimensional
structure, with rows and columns. The number of rows is equal to the number of states
(4 here). Each column stores the results of comparisons corresponding to one symbol
interval or one stage in the trellis diagram. The block diagram of the trace back decoder
is shown in Fig. 1.2.
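
To make the trace back idea concrete, the following Python sketch decodes one block in software. It is a behavioural illustration under stated assumptions, not the hardware design: the state encoding (bit 0 of the state is the newest stored input bit), the generator tap assignment, the tie-breaking rule and all function names are hypothetical choices made for this example.

```python
def branch_output(state, din):
    """Encoder output (2-bit integer x1x0) for a transition out of `state`.

    Assumed state encoding: bit 0 is the newest stored input bit (Q0) and
    bit 1 the older one (Q1). The tap assignment is also an assumption.
    """
    q0, q1 = state & 1, state >> 1
    x0 = din ^ q0 ^ q1
    x1 = din ^ q0
    return (x1 << 1) | x0


def encode(bits):
    """Produce one 2-bit symbol per input bit, starting from state 0."""
    state, symbols = 0, []
    for din in bits:
        symbols.append(branch_output(state, din))
        state = ((state << 1) | din) & 0b11
    return symbols


def traceback_decode(symbols):
    """Hard-decision Viterbi decoding with traceback over the whole block."""
    inf = float("inf")
    metrics = [0, inf, inf, inf]         # the encoder starts in state 0
    memory = []                          # one group of 4 decisions per stage
    for y in symbols:
        new_metrics, column = [], []
        for state in range(4):
            din = state & 1              # input bit that leads into `state`
            sums = []
            for sel in (0, 1):           # the two possible predecessors
                prev = (sel << 1) | (state >> 1)
                dist = bin(branch_output(prev, din) ^ y).count("1")
                sums.append(metrics[prev] + dist)
            sel = 0 if sums[0] <= sums[1] else 1
            new_metrics.append(sums[sel])
            column.append(sel)           # saved "pointer" to the survivor
        metrics = new_metrics
        memory.append(column)

    state = metrics.index(min(metrics))  # start from the best final state
    decoded = []
    for column in reversed(memory):      # stages are visited in reverse order
        decoded.append(state & 1)
        state = (column[state] << 1) | (state >> 1)
    decoded.reverse()                    # undo the reverse-order output
    return decoded
```

The list `memory` plays the role of the two-dimensional traceback memory: one entry per stage, each holding one decision per state. The final `reverse()` corresponds to the job of a LIFO in hardware.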

The subsequent sections detail the implementation of the TB decoder.

[Figure 1.2 (block diagrams): (a) Source -> Encoder -> Channel (with additive Noise) ->
Decoder -> Destination. (b) BMU -> ACS -> Memory -> Decoder -> LIFO.]

Figure 1.2: (a) Communication system scenario (b) Block diagram implementation of
Trace back Viterbi decoder

Chapter 2

Implementation of Decoder

The decoder contains the following blocks. The function of each of the blocks will be
discussed in subsequent sections.
1. Branch metric unit
2. Accumulate compare select unit
3. Normalizer
4. Path metric memory
5. Predictor
6. Last in first out (LIFO) unit
7. Controller

2.1 Branch Metric Unit


The branch metric (BM) unit generates the branch metrics, which are the Hamming distances
of the input data from 00, 01, 10 and 11. It calculates the branch metric for all trellis
branches from the input data. The absolute difference is chosen as the measure for the
branch metric; for hard-decision bits this is the same as the Hamming distance. These
branch metrics are viewed as being the weights of the branches. The schematic of the BMU
is shown in Fig. 2.1.
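
As an illustration, the four branch metrics for a received pair can be computed as follows. This is a minimal Python sketch; the function names are hypothetical, and the hardware uses lookup tables rather than a bit count.

```python
def hamming2(a, b):
    """Hamming distance between two 2-bit values."""
    return bin((a ^ b) & 0b11).count("1")


def branch_metrics(y):
    """Return [H0, H1, H2, H3]: distances of the received pair y
    from the reference pairs 00, 01, 10 and 11."""
    return [hamming2(y, ref) for ref in range(4)]


if __name__ == "__main__":
    for y in range(4):
        print(format(y, "02b"), branch_metrics(y))
```

Because there are only four possible inputs, the whole function fits in a small ROM, which is how the synthesis tool maps it.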

2.2 Accumulate Compare Unit


The objective of this unit is to choose the path that has the least cumulative weight. It is
designed using two adders, one comparator, and a register, as shown in Fig. 2.2. It first
adds the node values (pmu, pmd) to the Hamming distances (bmu, bmd) of the two paths
arriving at a node and then compares the two sums. The smaller sum is stored in
a 3-bit register (pm register). Depending on the result of the comparator, a decision
output is given. The decision is stored in the path metric memory.
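
The add, compare and select steps can be sketched in Python as below. The signal names follow the text; the function name is hypothetical, the tie handling (strict A<B, so a tie selects the lower path) is an assumption taken from the comparator label in Fig. 2.2, and the 3-bit register width is not modelled.

```python
def acs(pmu, pmd, bmu, bmd):
    """Add-compare-select for one trellis node.

    pmu, pmd: path metrics of the upper and lower predecessor states.
    bmu, bmd: branch metrics of the upper and lower incoming branches.
    Returns (pm, decision), where decision is 1 for the upper path and
    0 for the lower path.
    """
    a = pmu + bmu                    # candidate metric via the upper path
    b = pmd + bmd                    # candidate metric via the lower path
    decision = 1 if a < b else 0     # assumed tie rule: lower path wins
    pm = a if decision else b        # keep the smaller sum
    return pm, decision
```

For example, `acs(2, 1, 0, 2)` compares 2 against 3 and keeps the upper path.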

y(1:0)   H0   H1   H2   H3
0        0    1    1    2
1        1    0    2    1
2        1    2    0    1
3        2    1    1    0

Figure 2.1: Branch Metric Unit

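[Figure 2.2 (schematic): two adders form A = pmu + bmu and B = pmd + bmd; a comparator
(A<B) drives the Decision output and a multiplexer that passes the selected sum
(pm_unreg). A second multiplexer controlled by load selects between Din and pm_unreg
as the input of the 3-bit register (rst, clk), whose output is pm.]

Figure 2.2: Accumulate Compare Select Unit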

2.3 Normalizer
When the decoder runs continuously, the weights of the branches keep adding to the
registers of the ACS units. Once a register holds its maximum value, a further addition
may overflow. An overflow can cause erroneous results. Hence a normalizer unit needs to be used. This
unit shifts the contents of all the inputs one bit to the right, halving them, if any of
the inputs is greater than 6.
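
A small Python sketch of the normalization step follows. It assumes that the shift halves each metric (a one-bit shift toward the least significant bit), since that is what keeps a 3-bit register in range; the function name and the threshold parameter are illustrative.

```python
def normalize(metrics, threshold=6):
    """Halve all path metrics if any of them exceeds the threshold.

    Assumption: the shift is toward the least significant bit, so every
    metric is divided by two and the relative ordering of distinct
    metrics is roughly preserved.
    """
    if any(m > threshold for m in metrics):
        return [m >> 1 for m in metrics]
    return list(metrics)
```

All four metrics are scaled together, so the state with the smallest metric before normalization is still among the smallest afterwards.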
The path metric calculator unit, comprising the normalizer, the branch metric unit and the
ACS units, is shown in Fig. 2.3. Each ACS unit stores the accumulated path metric
for one state in the trellis. The Min State Decoder shown in Fig. 2.3 is a unit that
picks the state with the minimum accumulated weight and gives a two-bit output. This
information is used by the Predictor Unit (Sec. 2.5) to decode the bits.

[Figure 2.3 (block diagram): the input drives the BMU, which feeds ACS0, ACS1, ACS2 and
ACS3 (clocked by clk); the ACS units produce the Decision outputs and feed the Normalizer
and the Min state decoder.]

Figure 2.3: Path metric calculator for each state

2.4 Path Metric Memory


It is a register bank of 32 words of 4 bits. The schematic of the path metric memory unit
is shown in Fig. 2.4. It writes 4 bits of data at the clock edge, at the location pointed
to by wraddr, while the 'we' input is enabled. It reads one bit of data when the 're'
input is enabled; the read operation is asynchronous. The path metric memory stores the
decisions provided by the four ACS units. An upper path is encoded as '1' and a lower
path as '0'. The data is written in the forward direction (location 0 to location 31)
while the data is read in the reverse direction. This avoids simultaneous read and write
of the same memory location.

[Figure 2.4 (block symbol): 32x4 path metric memory with inputs wraddr(4:0), rraddr(4:0),
rcaddr(1:0), din(3:0), re, we and clk, and output dout.]

Figure 2.4: Path metric memory unit
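
The forward-write, reverse-read access pattern can be illustrated with a small Python model. The class and function names are hypothetical, and the address rule in read_address (read row = 31 - write row) is only one scheme consistent with the description; the report does not spell out how rraddr is derived.

```python
DEPTH = 32   # number of memory locations, as stated in the report


def read_address(count):
    """Assumed read row for a given write row: walk backwards through
    the half of the memory that is not currently being written."""
    return (DEPTH - 1 - count) % DEPTH


class PathMetricMemory:
    """Behavioural model: 32 locations, each holding 4 decision bits."""

    def __init__(self):
        self.rows = [[0, 0, 0, 0] for _ in range(DEPTH)]

    def write(self, wraddr, din):
        """Store the four ACS decisions at location wraddr."""
        self.rows[wraddr] = list(din)

    def read(self, rraddr, rcaddr):
        """Return the single decision bit for state rcaddr at rraddr."""
        return self.rows[rraddr][rcaddr]
```

With this rule, while locations 16 to 31 are being written the reads visit locations 15 down to 0, and the read and write addresses never coincide.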

2.5 Predictor
This unit (shown in Fig. 2.5) is a state machine that is loaded with the state with the
minimum accumulated path metric after every 16 clock cycles. It uses the state value as
the column address to access a bit from the path metric memory unit (PMM); the row
address is provided by the counter. The rows of the PMM are accessed in reverse order
during trace back. The bit that is accessed from the PMM is used by the output decoder to
predict the value of the transmitted data. This bit also serves as an input to the next
state decoder. The state diagram is also shown in Fig. 2.5.

[Figure 2.5 (schematic and state diagram): a 2-bit state register Reg(1:0), loaded from
min_state_decoder when load is '1' (after every 16 clks), provides the column address for
the path memory; a next state decoder (NSD) and an output decoder (OD) take din from the
path memory and produce dout. The state diagram has four states with outputs S0: dout=0,
S1: dout=1, S2: dout=0 and S3: dout=1, and transitions labelled din=0 and din=1.]

Figure 2.5: Predictor unit which determines the decoded bit using the present state and
the input din. Here din is accessed from the PMM unit while the state gets refreshed
every 16 clock cycles.
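
One traceback step of such a state machine might look like the Python sketch below. The output rule (dout is the least significant state bit) is read from the state diagram; the next-state rule assumes a shift-register state encoding and a particular meaning of the stored decision bit, neither of which is stated in the report, and the function names are hypothetical.

```python
def predictor_step(state, din):
    """One traceback step: return (dout, next_state).

    dout follows the state diagram (S0 -> 0, S1 -> 1, S2 -> 0, S3 -> 1).
    The next-state rule is an assumption: the stored decision bit din
    becomes the upper bit of the predecessor state.
    """
    dout = state & 1
    next_state = ((din & 1) << 1) | (state >> 1)
    return dout, next_state


def trace_block(min_state, columns):
    """Trace back through one block of stored decisions.

    columns: list of 4-entry decision lists, oldest stage first.
    Returns the decoded bits in reverse (newest-first) order.
    """
    state, out = min_state, []
    for column in reversed(columns):
        din = column[state]          # the state selects which bit is read
        dout, state = predictor_step(state, din)
        out.append(dout)
    return out
```

The bits come out newest first, which is why a reordering stage is needed after the predictor.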

2.6 LIFO Unit


Each block of 16 decoded bits put out by the predictor unit is in the reverse order of the
transmitted data, which necessitates a LIFO unit. This unit has two 16-bit registers, one
of which is read while the other is written. The schematic of the LIFO is shown in Fig. 2.6.
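
The two-register arrangement can be modelled behaviourally as below. This Python sketch uses indexed lists instead of shift registers, and the class name and clock method are hypothetical; it only illustrates how one bank is filled while the other is emptied in reverse order.

```python
class PingPongLifo:
    """Two banks of `depth` bits: one is written while the other is read
    back in reverse order; the roles swap after every `depth` clocks."""

    def __init__(self, depth=16):
        self.depth = depth
        self.banks = [[0] * depth, [0] * depth]
        self.wr_bank = 0
        self.count = 0

    def clock(self, din):
        """Store one bit and return one bit of the previous block."""
        rd_bank = 1 - self.wr_bank
        dout = self.banks[rd_bank][self.depth - 1 - self.count]
        self.banks[self.wr_bank][self.count] = din
        self.count += 1
        if self.count == self.depth:     # block complete: swap the banks
            self.count = 0
            self.wr_bank = rd_bank
        return dout
```

A block pushed in during one group of 16 clocks comes out reversed during the next group, so the output lags the input by one block.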

2.7 Controller
A controller is used to synchronize the different modules described in Sec. 2.2 to
Sec. 2.6. The state diagram of the controller is shown in Fig. 2.7. The complete Viterbi
decoder is shown in Fig. 2.8.

[Figure 2.6 (schematic): two 16-bit registers, REG(15:0)1 and REG(15:0)2, with input Din,
a Left_Shift/Right_shift control, rst and output multiplexing onto dout; COUNT(4) is the
control signal shown.]

Figure 2.6: 16 bit last in first out unit

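[Figure 2.7 (state diagram): three states with outputs S0: load=0, we=0, ce=0, re=0;
S1: load=1, we=0, ce=0, re=0; S2: load=0, we=1, ce=1, re=1. rst=1 leads to S0; the
transitions are labelled start=0 and start=1.]

Figure 2.7: State diagram of controller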

[Figure 2.8 (block diagram): top level of the Viterbi decoder, showing the ACS(3:0) units
with Normalizer and Min_State_Decoder, the 32x4 Path Metric Memory, the Predictor, the
LIFO, the Controller and a Mod 32 Counter (32-count, Max_count/2). Signals shown include
Din(1:0), Decision(3:0), minstate(1:0), count(4), waddr, rraddr, rcaddr, we, re, CE, load,
start, rst, clk and Dout_decoded.]

Figure 2.8: Viterbi Decoder which includes all the modules described in Sec. 2.2 to
Sec. 2.7
Chapter 3

Performance

3.1 Timing
3.1.1 Timing Constraints
Timing constraint:

TS_clk = PERIOD TIMEGRP "clk" 12.261 ns HIGH 50%;
OFFSET = IN 3.226 ns BEFORE COMP "clk";
OFFSET = OUT 10.149 ns AFTER COMP "clk";

3.1.2 Timing Report


Device, package, speed: xc3s500e, fg320, -5

Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Setup/Hold to clock clk
------------+------------+------------+------------------+--------+
            |  Setup to  |  Hold to   |                  | Clock  |
Source      | clk (edge) | clk (edge) |Internal Clock(s) | Phase  |
------------+------------+------------+------------------+--------+
rst         |    1.989(R)|    1.095(R)|clk_BUFGP         |   0.000|
start       |    2.019(R)|   -0.190(R)|clk_BUFGP         |   0.000|
------------+------------+------------+------------------+--------+

Clock clk to Pad
------------+------------+------------------+--------+
            | clk (edge) |                  | Clock  |
Destination |   to PAD   |Internal Clock(s) | Phase  |
------------+------------+------------------+--------+
dout<0>     |    8.656(R)|clk_BUFGP         |   0.000|
dout<1>     |    8.914(R)|clk_BUFGP         |   0.000|
dout<2>     |    8.576(R)|clk_BUFGP         |   0.000|
dout<3>     |    8.526(R)|clk_BUFGP         |   0.000|
dout<4>     |    8.529(R)|clk_BUFGP         |   0.000|
dout<5>     |    8.598(R)|clk_BUFGP         |   0.000|
dout<6>     |    8.927(R)|clk_BUFGP         |   0.000|
dout<7>     |    8.578(R)|clk_BUFGP         |   0.000|
------------+------------+------------------+--------+

Clock to Setup on destination clock clk
---------------+---------+---------+---------+---------+
               | Src:Rise| Src:Fall| Src:Rise| Src:Fall|
Source Clock   |Dest:Rise|Dest:Rise|Dest:Fall|Dest:Fall|
---------------+---------+---------+---------+---------+
clk            |   12.079|         |         |         |
---------------+---------+---------+---------+---------+

OFFSET = OUT 10.149 ns AFTER COMP "clk";
Bus Skew: 0.401 ns;
------------+-------------+----------------+
PAD         | Delay (ns)  | Edge Skew (ns) |
------------+-------------+----------------+
dout<0>     |        8.656|           0.130|
dout<1>     |        8.914|           0.388|
dout<2>     |        8.576|           0.050|
dout<3>     |        8.526|           0.000|
dout<4>     |        8.529|           0.003|
dout<5>     |        8.598|           0.072|
dout<6>     |        8.927|           0.401|
dout<7>     |        8.578|           0.052|
------------+-------------+----------------+

Constraints cover 48945 paths, 0 nets, and 1649 connections

Design statistics:
   Minimum period: 12.079 ns {1} (Maximum frequency: 82.788 MHz)
   Minimum input required time before clock: 2.019 ns
   Minimum output required time after clock: 8.927 ns
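
The relation between the reported minimum period and the maximum frequency, and the margin available at the 50 MHz board clock, can be checked with a few lines of Python. The variable names are illustrative; the input figures are taken from the report above.

```python
min_period_ns = 12.079            # "Minimum period" from the timing report
board_clock_mhz = 50.0            # clock frequency used on the FPGA board

f_max_mhz = 1000.0 / min_period_ns            # period in ns -> frequency in MHz
board_period_ns = 1000.0 / board_clock_mhz    # 50 MHz -> 20 ns
slack_ns = board_period_ns - min_period_ns    # margin when running at 50 MHz

print(f"maximum frequency: {f_max_mhz:.3f} MHz")
print(f"slack at 50 MHz:   {slack_ns:.3f} ns")
```

The computed maximum frequency agrees with the 82.788 MHz quoted in the design statistics.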

3.2 Resource Utilization


Encoder         : 4-bit register for signal <A>.

PM Memory       : 4-bit 32-to-1 multiplexer for signal <dread>
                  128-bit register for signal <pmmem>.

Predictor       : 4x2-bit ROM for signal <rcaddr>.
                  2-bit register for signal <prst>.

LIFO            : 1-bit register for signal <pr<0>>.
                  16-bit register for signal <reg1>.
                  16-bit register for signal <reg2>.

BMU             : 4x8-bit ROM for signal <y>

ACS             : 3-bit comparator lessequal for signal <decision$cmp_le0000> created at line 48.
                  3-bit adder for signal <downsum>.
                  3-bit register for signal <pm_temp>.
                  3-bit adder for signal <upsum>.

Viterbi Decoder : finite state machine <FSM_0> for signal <pr>.
                  5-bit up counter for signal <count>.
                  5-bit adder for signal <rraddr>.

Macro Statistics

# ROMs                           : 2
  4x2-bit ROM                    : 1
  4x8-bit ROM                    : 1
# Adders/Subtractors             : 9
  3-bit adder                    : 8
  5-bit adder                    : 1
# Counters                       : 2
  5-bit up counter               : 1
  7-bit up counter               : 1
# Registers                      : 224
  Flip-Flops                     : 224
# Comparators                    : 7
  3-bit comparator less          : 3
  3-bit comparator lessequal     : 4
# Multiplexers                   : 1
  4-bit 32-to-1 multiplexer      : 1
# Xors                           : 3
  1-bit xor2                     : 3

=========================================================================

Device utilization summary:
---------------------------

Selected Device : 3s500efg320-5

 Number of Slices                  : 237 out of 4656   5%
 Number of Slice Flip Flops        : 237 out of 9312   2%
 Number of 4 input LUTs            : 330 out of 9312   3%
    Number used as logic           : 329
    Number used as Shift registers : 1
 Number of IOs                     : 12
 Number of bonded IOBs             : 12 out of 232     5%
 Number of GCLKs                   : 1 out of 24       4%

Conclusion
A Viterbi encoder and decoder are implemented on a Spartan-3E FPGA. A rate 1/2 encoder
with constraint length 3 is used. The decoder uses the trace back approach for decoding
since it is less demanding on hardware resources. The decoder is capable of correcting two
bits for every 16 bits of transmitted information. The whole communication system can
operate at a frequency of 82.7 MHz. The design is implemented on an FPGA operating
at 50 MHz. A total of 237 slices out of 4656 (5%) are used for the implementation.

