FPGA Implementation of An OFDM Modem: Aifeng Ren, Ming Luo, Fangming Hu


FPGA Implementation of an OFDM Modem

Aifeng Ren*, Ming Luo*, Fangming Hu*

*School of Electronic and Engineering, XiDian University, Xi’an 710071, P. R. China


E-mail: afren@mail.xidian.edu.cn; Fax: 8629-88202830

Keywords: OFDM, FPGA, Intellectual property, FFT, EDA

Abstract

Orthogonal frequency division multiplexing (OFDM) is a multi-carrier system in which data bits are encoded onto multiple sub-carriers and transmitted simultaneously. OFDM modulation can reduce the influence of inter-symbol interference (ISI) and enables high-quality communication, and it is increasingly being used in environments that exhibit severe multipath. Although OFDM has existed in theory for a long time, recent developments in digital signal processing (DSP) and field programmable gate array (FPGA) technologies have made it a feasible option. In this paper, an implementation of an OFDM transceiver on an FPGA, built by instantiating parameterizable signal processing intellectual property (IP) functions, is presented. The FPGA resource requirements of the various sub-systems are reported, and the design methodology, which covers IP design, verification and FPGA implementation, is described.

1 Introduction

OFDM is a multi-carrier modulation scheme that encodes data onto a radio frequency (RF) signal. Unlike conventional single-carrier modulation schemes, such as amplitude or frequency modulation (AM/FM), which send only one signal at a time using one radio frequency, OFDM sends multiple high-speed signals concurrently on specially computed, orthogonal carrier frequencies [1]. The result is much more efficient use of bandwidth as well as robust communication in the presence of noise and other interference.

Using OFDM to combat ISI and inter-channel interference (ICI) is not new. However, practical implementation of OFDM has historically been limited by the speed and efficiency of the fast Fourier transform (FFT) function. High-performance programmable logic devices (PLDs) have enabled modern OFDM systems. Recently, OFDM has been adopted into several European wireless communications applications such as the digital audio broadcast (DAB) and terrestrial digital video broadcast (DVB-T) systems. In the United States, OFDM has been adopted in multichannel multipoint distribution services (MMDS). Both wireless LAN applications, using standards such as IEEE 802.11a, and the new European Telecommunications Standards Institute (ETSI) HiperLAN/2 specification have also adopted OFDM as the modulation scheme. Wired applications have already implemented OFDM-based systems as discrete multitone (DMT) systems in xDSL and cable modem applications. In the home networking space, working groups such as HomeRF and HomePlug have adopted OFDM multi-carrier modulation [2,3]. Key advantages of adopting OFDM in the PHY layer for these applications include simplified equalization for narrowband channels, high system throughput, and immunity to noise.

In recent years, FPGAs have become key components in the implementation of high-performance DSP systems, especially in the areas of digital communications, networking, video and imaging. In addition, the main PLD manufacturers offer a great variety of IP cores for communications and DSP applications, such as the FFT function, FIR compiler, numerically controlled oscillator (NCO) function, convolutional encoder, Viterbi compiler and so on. They also offer electronic design automation (EDA) development kits that allow a communication system to be designed quickly, such as DSP Builder, SOPC (system on a programmable chip) Builder, Quartus II and the Nios II embedded soft processor from Altera.

2 OFDM Baseband Model

In this paper we focus on the OFDM baseband structure shown in Fig. 1. At the transmitter, the data is coded and interleaved. If there are to be M subcarriers, baseband processing allows M parallel subcarrier modulation streams to be generated in the frequency domain as complex vectors reflecting the amplitude and phase of each subcarrier. An inverse FFT (IFFT) of size $N \ge M$ converts the complex data from the frequency domain into the time domain, effectively modulating the parallel streams onto M subcarriers. The cyclic prefix (CP) is then inserted into each symbol prior to digital-to-analog conversion (DAC) and transmission.

Fig. 1 The baseband block diagram of implemented OFDM transceiver. (Diagram not reproduced. Transmitter block labels: Data, FEC Encoder, Interleaver, Constellation Mapper, S/P, Pilot Symbols, IFFT, P/S, Inserting CP. Channel Model. Receiver block labels: Removing CP, S/P, FFT, Channel Estimation, P/S, Constellation Demapper, Deinterleaver, Viterbi Decoder, Data.)

At the receiver, after analog-to-digital conversion (ADC) and removal of the CP, a size-N FFT acts as a bank of matched filters to translate the received signal into a parallel stream of $M \le N$ complex data representations of the received modulation constellation values for each of the M subcarriers. Equalization for channel distortions, deinterleaving, and decoding then result in the receiver’s estimate of the
transmitted data stream.

The OFDM PHY layer baseband transmitter is composed of three major parts: channel coding, modulation, and the OFDM transmitter. At the receiver, complementary operations are applied in the reverse order. The channel coding includes randomization, forward error correction (FEC), and interleaving. The goal of channel coding is to improve the bit error rate (BER) performance of power-limited and bandwidth-limited channels by adding structured redundancy to the transmitted data [4]. Modulation is the process of mapping the digital information to analog form so it can be transmitted over the channel. In an OFDM system the phase and amplitude of a subcarrier can be changed, but its frequency cannot, because the subcarriers have to be kept orthogonal. The constellation mapper implemented in FPGA user logic can take symbols as inputs and map them to appropriate constellation points as dictated by the modulation method specified.

The OFDM transmitter stage includes assembling the OFDM frame, creating the OFDM signal by performing the IFFT and FFT, and inserting the CP used to cancel ISI. Practical implementations of an OFDM modem rely upon the IFFT/FFT to eliminate the need to separately modulate and demodulate the many different subcarriers. The advantage of using available signal processing IP cores in an OFDM implementation is that each of the functional blocks of the OFDM baseband can be mapped onto dedicated, parallel hardware resources within the programmable logic device (PLD), avoiding the difficult programming and optimization challenges of scheduling time-critical operations through a single DSP device.

3 System Implementation

3.1 Forward Error Correction (FEC)

The signal processing FEC IP cores from Altera include high-performance encoding and decoding for Reed-Solomon (RS), convolutional, Viterbi, and turbo codes.

(1) Reed-Solomon encoder/decoder IP function: Fig. 2 shows the RS codeword example.

Fig. 2 RS codeword example. (Diagram not reproduced. It shows a codeword built from symbols of 4 to 10 bits each, for example 0010 0110 1010 0011 0111 1011, divided into information symbols, which contain the original data, and check symbols.)

RS codes are described as (L, K), where L is the total number of symbols per codeword and K is the number of information symbols. R = L - K is the number of check symbols, known as redundancy. To use RS codes, a data stream is first broken into a series of codewords. Each codeword consists of several information symbols followed by several check symbols. In theory, symbols can contain an arbitrary number of bits. The Altera RS IP function supports four to ten bits per symbol. In an error correction system, the encoder adds check symbols to the data stream prior to its transmission over a communications channel. Once the data is received, the RS decoder checks for and corrects any errors. In the RS IP function, the variable option allows us to vary L and R from their minimum allowable values up to their selected values.

(2) Convolutional Encoder/Viterbi Decoder IP functions: A convolutional encoder and Viterbi decoder can be used together to provide error correction over a noisy channel. Viterbi decoding, which is also known as maximum likelihood (ML) decoding or forward dynamic programming, is the most common way of decoding convolutional codes and is an asymptotically optimum decoding technique. The Viterbi algorithm finds the most likely sequence of bits, that is, the one closest to the actual received sequence. The Viterbi decoder uses the redundancy that the convolutional encoder imparted to decode the bit stream and remove the errors.

The Viterbi decoder IP function works on blocks of data or on continuous streams. It takes in n symbols at a time for processing, where n is the number of encoded symbols. The traceback length is the number of trellis states processed before the decoder makes a decision on a bit. This IP function is capable of a throughput (decoded bits output) of over 240 Mbps; 218 Mbps is achievable for relatively large constraint lengths, such as 7. The IP function can support many different puncturing rates, which can be changed in real time.

(3) Turbo encoder/decoder IP function: Turbo codes, which outperform convolutional encoding, have been advocated for channel encoding.

Fig. 3 The block diagram of Turbo Encoder/Decoder IP function. (Diagram not reproduced. Turbo Encoder labels: Input bits, Interleaver, Encoder 1, Encoder 2, Puncture, Transmitted bits, Transmitted parity bits. Channel. Turbo Decoder labels: Received bits, Received parity bits, De-puncture, max-logMAP Decoder 1, Interleaver, max-logMAP Decoder 2, De-interleaver, Estimated bits.)

Turbo encoding gives a relatively large encoding gain with a reasonable computational complexity. This encoding scheme is useful for data services that permit longer transmission delays. The Altera Turbo Encoder/Decoder IP function dramatically shortens design cycles. Fig. 3 shows the block diagram of the Altera Turbo Encoder/Decoder IP function. Table 1 shows the performance of the turbo encoder and decoder IP function using the Quartus II EDA software and default parameters available in Stratix and Cyclone FPGA
devices, which are EP1S10F780C6 and EP1C20F400C7, including all of the off-IP function memories.

Table 1 The performance of turbo encode/decode IP function

Device          Logic Elements (LEs)   Memory (Bits)   Frequency (MHz)
EP1S10F780C6    7517                   73216           95
EP1C20F400C7    7517                   73216           83

3.2 Interleaver and Deinterleaver

Depending on the practical design of the transmitter, we can use the symbol interleaver/deinterleaver IP-core wizard to implement interleaving/deinterleaving functions, including convolutional and block interleaving.

Convolutional interleaver/deinterleaver functions process data in a continuous stream, which makes them ideal for high-speed applications that require correction for burst errors, such as DVB. Typically, these functions are used with Reed-Solomon encoder/decoder functions.

Block interleaver/deinterleaver functions process data in a discrete stream and are used in applications such as GSM (i.e., mobile phones). These functions are often used with RS functions or Turbo code encoder/decoder functions.

Compared to block interleaver/deinterleaver functions, convolutional interleaver/deinterleaver functions provide reduced delay and lower memory usage for the same distribution of errors.

3.3 Constellation mapper

The constellation mapper/demapper IP-core function allows designers to rapidly prototype constellation mappers/demappers for use in many digital modulation/demodulation schemes. Using this core, both the mapper and demapper can be designed rapidly in tandem, which provides automatically matched symbol encoding and decoding.

Multi-carrier OFDM systems are considered superior to n-many independent subbands, each modulated by a single-carrier modulation technique. The constellation mapper takes symbols as inputs and maps them to appropriate constellation points as dictated by the modulation method specified. This process generates I and Q values, which are then filtered and sent to the IFFT for transformation.

The constellation mapper/demapper IP function provides error computation and received signal power estimation. This IP function outputs the cartesian/rectangular error $(I_e, Q_e)$ between the received signal point coordinates $(I_r, Q_r)$ and the expected I-Q components that map directly to a signal in the constellation $(I_s, Q_s)$, which is computed as

$$ I_e = I_r - I_s, \qquad Q_e = Q_r - Q_s \tag{1} $$

This error indicates how much signal degradation has occurred due to channel noise, and it is often fed back into channel equalization circuits and clock data recovery (CDR) circuits [5].

The power estimate of the received signal, which automatic gain control (AGC) loops can use to determine if there is sufficient energy in the constellation to maintain the same bit error rate (BER), is computed as

$$ P_r = C \left( I_r^2 + Q_r^2 \right) \tag{2} $$

where $C = 2^{-(2m - p - 1)}$ is a scaling constant, m is the resolution of the I-Q components and p is the desired resolution.

Table 2 shows the performance of the demapper function for the constellation mapper/demapper IP-core. For user-defined decoding, the resource usage varies depending on the specified demapping. The results were generated using Quartus II 8.0 software and the EP1C20F400C7 device.

Table 2 The performance of demapper function for constellation mapper/demapper IP-core

Demodulation Scheme   I/Q Resolution   Decoding Scheme    Eb/No (dB)   Logic Elements (LEs)   Frequency (MHz)
BPSK                  8                Binary Decoding    15           347                    274.53
8-PSK                 8                User-Defined       15           452                    256.87
16-QAM                11               Gray Decoding      15           376                    245.22
256-QAM               16               User-Defined       15           413                    228.50

3.4 IFFT Transmitter and FFT Receiver

Since an OFDM symbol can be defined by an IFFT, the mathematical model of an OFDM symbol to be transmitted is given by

$$ x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, e^{j \frac{2\pi}{N} nk}, \qquad n = 0, 1, \dots, N-1 \tag{3} $$

In order to avoid inter-channel interference, zeros are padded equally at the beginning and end of an OFDM symbol to perform the N-point IFFT at the transmitter [2].

The FFT IP-core function, which can be parameterized to use a quad-output or single-output engine architecture, can implement a complex FFT or IFFT for high-performance applications. A quad-output FFT engine architecture is optimal for applications where transform time is to be minimized, and a single-output engine is most suitable for applications where the minimum-size FFT function is desired. To increase the overall throughput of the FFT function, we can use multiple parallel engines of a variation.

Fig. 4 The diagram of the quad-output FFT engine from Altera FFT IP function. (Diagram not reproduced. Labels: RAM-0 to RAM-3, SW, x[k,0] to x[k,3], G[k,0] to G[k,3], BFPU, ROM, X[k,0] to X[k,3].)

Fig. 4 shows a diagram of the quad-output FFT engine from the Altera FFT IP function [6]. The engine implementation computes all four radix-4 butterfly complex outputs in a single clock cycle.

Complex data samples $x[k, m]$ are read from internal memory in parallel and reordered by the switch (SW). Then the ordered samples are processed by the radix-4 butterfly processor to form the complex outputs $G[k, m]$. Due to the inherent mathematics of the radix-4 decimation-in-frequency
(DIF) decomposition, only three complex multipliers are required to perform the three non-trivial twiddle-factor multiplications on the outputs of the butterfly processor. To discern the maximum dynamic range of the samples, the four outputs are evaluated in parallel by the block-floating point units (BFPU). The appropriate least significant bits (LSBs) are discarded and the complex values are rounded and reordered before being written back to internal memory.

Table 3 shows the performance of the FFT IP function using the streaming data flow engine architecture in two FPGA devices, including all of the off-IP function memories.

Table 3 The performance of FFT IP function using the streaming data flow engine architecture

Device          Points   Logic Elements   Frequency (MHz)   Clock Cycle Count   Transform Time (us)
EP1S10F780C6    512      4510             255.62            512                 1.03
EP1C20F400C7    512      4671             243.18            512                 2.0

3.5 Cyclic Prefix (CP)

After performing the IFFT and serialization, user logic in the FPGA can be designed to copy the last part of the OFDM symbol and prepend it to the symbol, creating a cyclic prefix.

3.6 Channel Estimation

Channel estimation can be done in both the time domain and the frequency domain. For an OFDM system the FFT needs to be performed for all carriers, so frequency domain processing is straightforward. The impulse response of a time-varying radio channel is usually represented as a discrete-time finite impulse response (FIR) filter [7].

$$ h(\tau; t) = \sum_n \alpha_n(t) \, e^{-j 2\pi f_c \tau_n(t)} \, \delta(\tau - \tau_n(t)) \tag{4} $$

where $\alpha_n(t)$ is the attenuation factor for the signal received on the nth path and $\tau_n(t)$ is the propagation delay for this path.

Practical applications generally assume that the channel is quasistationary, that is, the channel does not change during the data packet [8]. With this assumption the time dependency in (4) can be dropped:

$$ h(\tau) = \sum_n \alpha_n \, e^{-j 2\pi f_c \tau_n} \, \delta(\tau - \tau_n) \tag{5} $$

Then the discrete-time frequency response of the channel is the Fourier transform of the channel impulse response

$$ \eta_k = \mathrm{DFT}\{h_n\}, \qquad k = 0, 1, \dots, N-1 \tag{6} $$

Hence $\hat{\eta}_k$ denotes the channel estimate of $\eta_k$.

Frequency domain channel estimation can be performed with training data transmitted on every subcarrier. We define $\mathbf{x}_t$ to be the transmitted training vector, $\boldsymbol{\eta} = [\eta_0, \dots, \eta_{N-1}]^T$ to be the frequency domain channel response, and $\mathbf{w}$ to be the white noise vector of the channel, where $[\cdot]^T$ denotes transpose. Then the received signal vector $\mathbf{y}$ will be [3]

$$ \mathbf{y} = \mathrm{diag}(\boldsymbol{\eta}) \cdot \mathbf{x}_t + \mathbf{w} \tag{7} $$

Without loss of generality, the training data amplitudes have been selected to be equal to one, and each entry of the noise vector is independent identically distributed (i.i.d.) complex zero-mean Gaussian noise with variance $\sigma_w^2$.

We choose two statistical criteria, which are Least Squares (LS) estimation and Linear Minimum Mean Square Error (LMMSE) estimation.

$$ \hat{\boldsymbol{\eta}}_{LS} = \mathbf{y} \,./\, \mathbf{x}_t \tag{8} $$

$$ \hat{\boldsymbol{\eta}}_{LMMSE} = R_{hh} \left[ R_{hh} + \sigma_w^2 \left( \mathbf{x}_t \mathbf{x}_t^H \right)^{-1} \right]^{-1} \hat{\boldsymbol{\eta}}_{LS} \tag{9} $$

where $R_{hh} = E[\hat{\boldsymbol{\eta}}_{LS} \cdot \hat{\boldsymbol{\eta}}_{LS}^H]$, $./$ denotes element-wise division, and $[\cdot]^H$ represents Hermitian transpose.

From (8) and (9), we can see that the computational complexity of LS-based channel estimation is much lower than that of LMMSE, but in terms of estimation accuracy LMMSE gives a better solution.

The channel estimation algorithms can be implemented in a high-density FPGA with a Nios II embedded processor. The channel estimator occupies 738 logic units (LU), which includes 1 Nios II CPU and 2 embedded memory blocks.

4 Conclusion

Fourth generation (4G) wireless communications systems will offer exciting new functionality such as high quality, high data rates and a broad range of services, at the price of increased complexity in the design and verification of 4G equipment. However, high-throughput platforms, such as new system architectures in application-specific integrated circuit (ASIC), system-on-chip (SoC) and embedded software design, are required to support capacity enhancement techniques. At the same time, new design methodologies, based on system-level simulation platforms and on the availability of pre-designed IP functions, have to be employed in order to accelerate the development process and to meet aggressive product development schedules.

The architectures proposed in this paper serve as a solid foundation for future cost-effective ASIC implementation of the baseband processing circuit in OFDM systems.

References

[1] V. N. Richard, P. Ramjee. “OFDM Wireless Multimedia Communications”, Artech House, Boston (2000).
[2] H. Shinsuke, P. Ramjee. “Multicarrier Techniques for 4G Mobile Communications”, Artech House, Boston (2003).
[3] L. L. Zhang. “A study of IEEE 802.16a OFDM-PHY Baseband”, Master thesis, Electronics Systems at the Department of Electrical Engineering, Linkoping Institute of Technology (2005).
[4] H. Juha, T. John. “OFDM Wireless LANs: A Theoretical and Practical Guide”, First Edition, Sams Publishing (2001).
[5] C. E. Shannon. “A Mathematical Theory of Communication”, Bell System Technical Journal, volume 27, pp. 379-423 (1948).
[6] Altera. “FFT Megacore Function User Guide”, Altera Corporation (2004).
[7] J. G. Proakis. “Digital Communications”, 3rd ed., McGraw-Hill, Boston (1995).
[8] Joha Heiskala, John Terry. “OFDM Wireless LANs: a Theoretical and Practical Guide”, Sams, Indianapolis, Ind. (2002).
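
As an illustration of the baseband chain of Section 2 together with Sections 3.5 and 3.6, the following Python sketch runs one training symbol through an IFFT, a cyclic prefix, a noise-free FIR channel and an FFT, and then forms the LS estimate of Eq. (8). It is a software model under stated assumptions, not the FPGA implementation: the function names, the 8-point size, the 2-tap channel `TAPS` and the training pattern are all hypothetical, and a naive DFT stands in for the FFT IP function.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; the inverse carries the 1/N factor of Eq. (3)."""
    n_pts = len(x)
    sign = 1 if inverse else -1
    out = []
    for n in range(n_pts):
        acc = sum(x[k] * cmath.exp(sign * 2j * cmath.pi * n * k / n_pts)
                  for k in range(n_pts))
        out.append(acc / n_pts if inverse else acc)
    return out

def add_cyclic_prefix(symbol, cp_len):
    """Copy the last cp_len samples (cp_len >= 1) to the front of the symbol."""
    return symbol[-cp_len:] + symbol

def fir_channel(samples, taps):
    """Quasi-stationary multipath channel modelled as an FIR filter, no noise."""
    return [sum(h * samples[n - l] for l, h in enumerate(taps) if n - l >= 0)
            for n in range(len(samples))]

def ofdm_link(freq_symbols, taps, cp_len):
    """IFFT -> add CP -> channel -> remove CP -> FFT."""
    tx = add_cyclic_prefix(dft(freq_symbols, inverse=True), cp_len)
    rx = fir_channel(tx, taps)[cp_len:]
    return dft(rx)

# Hypothetical parameters chosen for illustration only.
N, CP_LEN = 8, 2
TAPS = [1.0, 0.5j]                        # channel memory shorter than the CP
TRAINING = [1, -1, 1, 1, -1, 1, -1, -1]   # unit-amplitude training symbols

# LS estimate, Eq. (8): element-wise division by the known training symbols.
eta_ls = [y / x for y, x in zip(ofdm_link(TRAINING, TAPS, CP_LEN), TRAINING)]

# One-tap equalization of a QPSK data symbol using the LS estimate.
DATA = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
equalized = [y / h for y, h in zip(ofdm_link(DATA, TAPS, CP_LEN), eta_ls)]
```

Because the cyclic prefix is at least as long as the channel memory, the channel acts as a circular convolution on each symbol, so without noise `eta_ls` matches the DFT of the zero-padded taps as in Eq. (6), and dividing each received subcarrier by its estimate recovers `DATA`.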

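
The radix-4 decimation-in-frequency decomposition mentioned in Section 3.4 can be sketched in software as follows. This is a recursive floating-point model written for clarity, assuming an input length that is a power of 4. It does not model the quad-output engine's memory banks, switch or block-floating-point scaling, and the function names are hypothetical.

```python
import cmath

def radix4_butterfly(a, b, c, d):
    """4-point DFT using only add/subtract and multiplication by +/-j."""
    s0, s1 = a + c, a - c
    s2, s3 = b + d, b - d
    return (s0 + s2, s1 - 1j * s3, s0 - s2, s1 + 1j * s3)

def fft_radix4_dif(x):
    """Recursive radix-4 DIF FFT; len(x) must be a power of 4."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    q = n // 4
    branches = [[], [], [], []]
    for i in range(q):
        g = radix4_butterfly(x[i], x[i + q], x[i + 2 * q], x[i + 3 * q])
        branches[0].append(g[0])          # twiddle factor is 1: no multiplier
        for m in (1, 2, 3):               # the three non-trivial twiddles
            branches[m].append(g[m] * cmath.exp(-2j * cmath.pi * m * i / n))
    out = [0j] * n
    for m in range(4):
        # Branch m yields the output bins whose index is congruent to m mod 4.
        for r, value in enumerate(fft_radix4_dif(branches[m])):
            out[4 * r + m] = value
    return out

def naive_dft(x):
    """Reference O(N^2) DFT used only to check the result."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * p * k / n) for k in range(n))
            for p in range(n)]
```

Each butterfly produces four outputs, and only outputs 1 to 3 are multiplied by a twiddle factor, which corresponds to the three complex multipliers described in the text.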
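
To make the convolutional encoder and Viterbi decoder pairing of Section 3.1 concrete, here is a minimal hard-decision sketch. It assumes a rate-1/2 code with constraint length 3 and generators 7 and 5 (octal), chosen only to keep the trellis small. The Altera IP function and its parameters (for example constraint length 7, traceback length, puncturing) are not modelled, and the function names are hypothetical.

```python
def conv_encode(bits, g=(0b111, 0b101), k=3):
    """Rate 1/len(g) convolutional encoder with constraint length k."""
    state = 0
    out = []
    for b in bits:
        reg = (b << (k - 1)) | state              # newest bit in the MSB
        out.extend(bin(reg & poly).count("1") % 2 for poly in g)
        state = reg >> 1                          # drop the oldest bit
    return out

def viterbi_decode(received, g=(0b111, 0b101), k=3):
    """Hard-decision Viterbi decoder that keeps one survivor path per state."""
    n_states = 1 << (k - 1)
    n_out = len(g)
    inf = float("inf")
    metrics = [0] + [inf] * (n_states - 1)        # encoder starts in state 0
    paths = [[] for _ in range(n_states)]
    for t in range(0, len(received), n_out):
        symbol = received[t:t + n_out]
        new_metrics = [inf] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue                          # state not reachable yet
            for b in (0, 1):
                reg = (b << (k - 1)) | state
                expected = [bin(reg & poly).count("1") % 2 for poly in g]
                nxt = reg >> 1
                # Branch metric: Hamming distance to the received symbol.
                cand = metrics[state] + sum(e != r for e, r in zip(expected, symbol))
                if cand < new_metrics[nxt]:
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(n_states), key=lambda s: metrics[s])
    return paths[best]
```

In use, a message is typically followed by k - 1 zero tail bits so that the encoder returns to state 0. The decoder then picks the path with the smallest accumulated Hamming distance, which is the sense in which it finds the sequence closest to the received one.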

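
The block interleaving described in Section 3.2 can be illustrated with a short sketch. It assumes the common row-write, column-read arrangement. The interleaver IP-core's actual memory organisation and parameters are not given in the paper, so the function names and the 3 x 4 block size below are hypothetical.

```python
def block_interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols block, read column by column."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def block_deinterleave(symbols, rows, cols):
    """Inverse of block_interleave for the same block dimensions."""
    if len(symbols) != rows * cols:
        raise ValueError("block must contain exactly rows * cols symbols")
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

# A burst hitting three consecutive transmitted symbols lands on symbols
# that were `cols` positions apart in the original stream.
block = list(range(12))
sent = block_interleave(block, rows=3, cols=4)
burst_hits = sent[3:6]
```

After deinterleaving, the symbols damaged by the burst are spread out across the block, which is what allows a following RS or turbo decoder to treat them as isolated errors.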