You are on page 1of 5

A Rapid Prototyping Methodology for Algorithm Development in Wireless

Communications

A.Burg, B. Haller, E. Beck, M. Guillaud, M. Rupp, L. Mailaender


Bell-Labs Wireless Research, Lucent Tech.
791 Holmdel-Keyport Rd., Holmdel NJ, 07733-400
apburg@iis.ee.ethz.ch, {bhaller1, ericbeck, maxime, rupp, lm}@lucent.com

Abstract: are therefore being used to work around this problem.


However, even these simulation models often do not
Rapid prototyping has become an important means to represent the actual system in sufficient detail. Working
verify the performance and feasibility of algorithms and in a real environment becomes especially critical when
concepts for wireless communications. This paper algorithms rely on special properties of the transmission
presents an FPGA/DSP based rapid prototyping channel. Multiple antenna systems employing transmit
methodology and platform that requires only minimal diversity or space-time coding (e.g., BLAST [5]) are
IC/FPGA design skills and merely a very basic good examples. It is also important for establishing
knowledge of hardware description languages such as reliable and reasonable standards, and for developing
VHDL. It allows algorithm designers to implement their more advanced channel models.
ideas quickly and to test them in real-time under real Real-time testbeds also become increasingly
world conditions. The design and implementation of a important as functions at the application layer start to
WCDMA downlink is used as an example to describe interact more closely with functions at the physical
our initial experience with the proposed flow. layer. The design of speech codecs and of joint
source/channel or source/modulation coding algorithms
for wireless multimedia applications are good examples.
I. Introduction The goal of the “Bell Labs Algorithm Development
and Evaluation” (BLADE) initiative is to provide a
In many areas rapid prototyping has become an platform and a methodology that allows a fast and easy
important means for the functional verification of a verification of research ideas from the physical through
design in a well-defined environment. Its main goal is to the application layer at an early stage in the
to avoid extremely long simulation runs and to improve development process. It allows researchers to complete
the functional assessment of complex systems. simulations in a real environment and to get realistic
In the development of wireless communication assessments about the complexity and implementation
systems rapid prototyping becomes important at an costs of their ideas.
early stage of the development [1−3]. The fact that these This paper is organized as follows: In Section II a
systems have to deal with a variety of effects in a hardware platform that fits the needs of prototyping for
multifaceted environment makes the development of wireless communications is presented and the used
new algorithms and their accurate verification via software tools as well as the BLADE methodology are
computer simulations a difficult task. A first evaluation described. The system is designed to provide
of the performance of a novel idea is usually done researchers with a means to take their algorithms to a
mathematically based on very basic, typically simplistic hardware implementation, without requiring in-depth
assumptions. Gaining confidence in algorithms knowledge of IC/FPGA design methodologies or
developed with theoretical models by means of hardware description languages (e.g., VHDL or
simulations is an essential part of the early development Verilog). We briefly present a prototype implementation
stage, since wireless channels suffer from a wide variety of a WCDMA downlink system together with some
of effects that are often difficult to model accurately. measurement results and the integration of a voice
Simulation tools like THE MATHWORKS Simulink [4] codec, based on the BLADE methodology in Section III.
In Section IV we discuss our experiences from this first language or HDLs in order to rapidly realize ideas in
experiment and give some suggestions for future hardware. It should also provide algorithm designers
improvements. Finally, Section V concludes the paper. with a way to integrate their blocks into an existing
high-level Simulink simulation setup for verification
II Setup and Methodology and successive refinement down to the implementation
level.
A. System Hardware To achieve this the suggested design methodology
relies on the use of C as the primary description
language for the implementation and on the 3L
The hardware setup consists of a flexible combination
Diamond real-time DSP operating system [9] for inter-
of TI ’C6x series digital signal processors (DSPs) [6]
process communication. The proposed design flow is
and XILINX Virtex series field-programmable gate
outlined in Figure 2. The right side of the figure shows
arrays (FPGAs) [7]. The basis for the system is a
the simulation flow and the left side depicts the flow
development platform from SUNDANCE [8] that consists
that leads to the real-time implementation. C code that
of carrier boards with a PCI bus interface along with a
describes cycle or block oriented operations can be
variety of high performance DSP and FPGA modules.
simulated using the block oriented Simulink
Each module is equipped with four asynchronous bi-
environment. Automatically generated wrappers are
directional 20Mbytes/s links. Some FPGA modules
used to create the required S-functions. However typical
provide additional synchronous 16-bit parallel links. A
systems also contain a certain amount of control flow
typical PCI carrier board supports up to four modules.
that can either be described as a finite state machine
Evaluation boards from different vendors are also used
(FSM) or as procedural (i.e., linear) C code with
for data conversion (i.e., ADC and DAC). The use of
synchronization capabilities. Procedural descriptions
well supported off-the-shelf components saves a lot of
are often more convenient and intuitive They are also
time during the implementation phase as compared to
required for complex block oriented functions that
custom-made systems. Figure 1 shows a basic setup.
execute asynchronously with respect to the other
Supplementing the high speed DSPs with FPGAs is
functions in the system and are preferably mapped onto
necessary since the DSPs’ capabilities of handling high
a DSP. During the simulation phase, these pieces of
rates are very limited both in terms of compute power
code need to be co-simulated, running as separate tasks
and I/O bandwidth. The FPGAs also help relieve the
in a multitasking environment in parallel to the cycle
DSPs from performing the relatively simple but high
oriented Simulink simulation. The interfaces into the
rate data path type operations that frequently occur in
Simulink environment (and later to the FPGA) are
the front end of wideband communication systems. [14]
provided through a library of interface templates that
SUNDANCE are specific to the current simulation (or realization)
environment. In a UNIX simulation for example
sockets are used to emulate the com-port interface.
PCI
DSP-1 Memories, ADCs or DACs and their interfaces are also
TI-C6x
ADC/DAC

instantiated from the template library.


Host-PC

FPGA
FPGA

Implementation Simulation
RTL
DSP-2 C-Code
TI-C6x
A|RT-Builder Wrapper
Template Library
VHDL

Synthesis, C-compiler
Figure 1: Basic hardware setup for BLADE P&R
procedural
B. BLADE Methodology C-Code

The main goal of the BLADE methodology is to provide FPGA DSP/3L UNIX SIMULINK
a way to make use of the combined capabilities of DSPs
and FPGAs without having to employ assembler Figure 2: BLADE design flow
After a successful initial simulation of the system, the CDMA system, with a separate sync/pilot-channel and a
design can be transferred to the real-time platform. A variable set of users transmitting at different rates. The
final decision has to be made regarding whether algorithm designers’ motivation behind putting this
functions should be mapped onto the DSP or FPGA. system onto the test platform was to explore the
Procedural functions are placed on the DSP, while both performance gain that can be achieved in such a system
DSPs and FPGAs are possible targets for the remaining by using an equalizer structure instead of the usual rake
cycle/block oriented functions, depending on their receiver. The system also contains many of the
performance requirements. frequently used blocks in wideband communications
For those block operations that are mapped onto the and it was therefore a good candidate to build up a
DSP a real-time operating system (3L Diamond) is used collection of blocks for future reuse.
to provide the communication interfaces between them.
Each block runs as a separate task. Activity on the ports
can control the scheduling process. Communication Voice- CDMA- IF
channels replace the ports of the blocks in the Simulink Coder Transmitter
block diagram and a configuration file is used to specify
the links between them. The linear code that was used Analog Signal-Processing
for the simulation in the Unix environment can be Noise
reused with the DSPs real-time operating system by Channel-
linking it with the system specific interface template Emulator
library for the interfaces to the FPGA and to other
DSPs. This system also allows transferring functionality
onto the host PC if necessary. CDMA- Voice-
For the mapping of blocks that will be placed onto IF Receiver DeCoder
the FPGA we use FRONTIER DESIGN’s A|RT Builder
[10] to automatically create synthesizable VHDL code Figure 3: WCDMA downlink prototype
from the original C code. It performs a direct
translation of C code into VHDL, without doing any A. Transmitter
type of behavioral synthesis. The designers are therefore
responsible for refining their C code designed for FPGA
The transmitter consists of a combination of one Virtex
implementation into a syntax that describes a register
400k gate FPGA and a TI ’C67 DSP. The DSP runs a
transfer level (RTL) structure of the block that can be
standard ACELP voice encoder and generates the data
mapped into hardware. In contrast to System-C [11] the
stream. It also configures the FPGA and controls the
tools expect sequential C code that has no capability of
transmitter parameters. The FPGA contains the actual
expressing concurrency. Floating-point operations in
WCDMA downlink functionality for a variable data rate
these translated functions are replaced with fixed-point
main user. Besides the transmission of the desired user,
operations by using the A|RT Library [10] fixed-point
“dummy-users”, sending random data, can be turned
data types. Simulink blocks are either replaced by
on. Their number and power can be adjusted to simulate
VHDL templates or also described in RTL-C. Note that
different traffic situations. A sync/pilot signal is also
each of these blocks can still be used in the Simulink
added before the signal is digitally up converted to an IF
simulation, as RTL-C code is standard executable C.
frequency and sent to the DAC.
Due to the moderate size of the designs a top-down
In the laboratory the wireless channel is modeled
synthesis approach using the SYNOPSYS Design
with a radio channel emulator and an additive white
Compiler [12] can be used to translate the generated
Gaussian noise (AWGN) generator. Alternatively, an
VHDL code into an EDIF netlist, which is imported
RF air interface may be used to do measurements in an
into the XILINX Alliance place & route tool. Scripts are
exterior environment.
used to hide the details of the synthesis process from the
designer.
B. Receiver
III. Design Example The receiver is the more interesting and more
challenging part of the system. It consists of a ’C62
fixed-point and a ’C67 floating-point DSP and a Virtex
As an example we implemented a basic WCDMA
1000k gate FPGA. It incorporates the following
downlink system. Its high-level block diagram is
functionality:
depicted in Figure 3. The system is a 4Mchips/s DS-
• IF interface simulation and of a theoretical analysis are given as
• Synchronization well for comparison.
• Tracking of frequency offset
• 4-finger rake receiver
• Polyphase equalizer
• Data interface/buffer and BER measurement
Table 1 shows some of the criteria that were applied for
the partitioning of the functionality between the DSP
and the FPGA. It also shows where the main blocks of
the system were finally placed. Putting all the functions
that operate on a sample by sample basis in a 4Mchips/s
4× oversampled system onto the FPGA is unavoidable,
because of the tremendous number of operations. An
important reason for implementing almost all the
control flow operations in the DSP was to allow a high
degree of flexibility to make changes quickly and on the
fly, without the need to redo the time consuming FPGA Figure 4: Simulink model of the receiver
synthesis. This was particularly important, since the
turnaround time for the receiver FPGA implementation Besides the actual receiver functionality a voice decoder
was on the order of 5 hours. was implemented on the second DSP, together with
The DSP-FPGA interface uses a global bus on the independent interface blocks to the CDMA receiver and
FPGA that connects to all modules and can be used to to an audio codec. It works on a frame by frame basis
read and write parameters to and from each block. The and is embedded in the system using 3L Diamond to
designer needs to handle these interface commands in handle the blocks connections as described above.It was
his C code that describes the blocks on the FPGA. A also simulated in Simulink just like the actual WCDMA
relatively simple protocol, incorporating only marginal receiver and then mapped solely onto the second DSP.
overhead was designed. Figure 4 shows the Simulink
model of the receiver. The interfaces from the blocks
are collected in the left bottom corner of the top level
schematic. They connect to the “DSP” interface from
the template library at the next level of hierarchy. The
procedural code that handles the control flow and the
other DSP-based operations listed in Table 1 is not
shown, as it runs as separate task in the Unix
environment in parallel to the cycle oriented Simulink
simulation. In the real-time environment it run on the
DSP in parallel to the FPGA implementation.
Figure 5 shows the BER-performance of the RAKE-
receiver plotted vs. the number of users at a rate of
256kb/s for a standard outdoor-to-indoor channel model
at 5Hz doppler frequency. The results of a floating point
Figure 5: BER vs. number of users
Table 1: DSP/FPGA partitioning for the receiver
Criterion Function
DSP • Control flow (linear code) • Synchronization and tracking control
• Complex & irregular operations • Rake finger assignment
• Complex data path ops. Operations at low rates • Equalizer computation (matrix-inversion)
with low I/O requirements
• Floating-point or high precision fixed-point
operations
FPGA • Data path operations • IF down-conversion, filters
• Operations at sampling and chip rate • Rake receiver, equalizer
• Correlators for sync. and channel estimation
IV. Summary V. Conclusions

Slow turn-around times, mainly due to long P&R runs We presented a methodology that allows rapid
for the FPGA turned out to be one of the major prototyping for research in wireless communication. It
problems in the rapid prototyping and evaluation uses C as the primary language and integrates common
process. This overhead was reduced by keeping system design tools such as Simulink into the flow. The
parameters in the FPGA modules programmable and C based approach is well suited for hardware/software
controlling them from the DSP at run time. This also co-design, allowing one to exploit the unique features of
keeps modules flexible and easier to reuse. The fact that DSPs and FPGAs in a combined system. The BLADE
C code is being used to describe RTL-type parallel methodology also provides a way to co-simulate DSPs
structures imposes certain restrictions on the design and FPGAs in a multitasking environment. It offers
hierarchy that are sometimes less elegant and less very high flexibility and a methodology to combine
intuitive than the structure that one would impose when cycle oriented code with procedural code to efficiently
writing VHDL. The use of multiple clock designs is describe control flow and relatively irregular operations
currently not supported in the proposed flow. This is in a natural and intuitive way. For algorithm research
one of the main restrictions that limits the possibility to and development it is a real alternative to the use of
make use of optimizations that are especially well suited relatively expensive high-level system design tools such
for medium data rate systems on FPGAs (e.g., bit-serial as COSSAP [12] and SPW [13]. However, it does not
architectures or decomposition of multipliers). The use claim to be a solution for a production environment, as
of C as design language also limits the use of its capabilities to achieve performance optimized
specifically optimized blocks from vendor specific designs are certainly limited.
FPGA libraries, such as the XILINX CoreGen library.
This is because our methodology does not provide hooks References
to instantiate these blocks directly from within a C
module. [1] M. RUPP, E. BECK, AND R. KRISHNAMOORTHY, “Rapid
However, using a C-based approach for an FPGA Prototyping for High Data Rate Wireless Local Loop,” in Proc
implementation yielded a lot of advantages. Avoiding 33rd Asilomar Conf., 1999, Vol. 2, pp. 993−997.
the use of VHDL was highly appreciated by the [2] S. DAS, S. RAJAGOPAL, C. SENGUPTA, AND J. CAVALLARO,
algorithm designers working on the project, even “Arithmetic Acceleration Techniques for Wireless
though the C to VHDL translation does not involve any Communication Receivers,” in Proc 33rd Asilomar Conf.,
1999, Vol. 2, pp. 1469−1474.
behavioral synthesis and therefore forces the designer to
[3] V. SUNDARAMURTHY AND J. CAVALLARO, “A Software
write register transfer level (RTL)-type C code. A new
Simulation Testbed for Third Generation CDMA Wireless
tool called A|RT Designer [10] may avoid this problem Systems,” in Proc 33rd Asilomar Conf., 1999, Vol. 2, pp.
in the future by offering behavioral synthesis capability. 1680−1684.
Using Simulink for debugging of the design, it [4] MATLAB and Simulink are trademarks of THE
turned out that a capability to store waveforms of the MATHWORKS INC. http://www.mathworks.com.
signals and/or variables inside the C code is missing. [5] G.D. GOLDEN, G.J. FOSCHINI, R.A. VALENZUELA, AND P.W.
Waveforms can only be recorded on the block WOLNIANSKY, “Detection Algorithm and Initial Laboratory
boundaries in Simulink. A regular C debugger was used Results Using V-BLAST Space-Time Communication
to track changes inside the C code on a cycle by cycle Architecture”, Electronics Letters, Vol. 35, No. 1, pp. 11−14,
basis, but this was a tedious task. Using a standard Jan. 1999.
[6] TEXAS INSTRUMENTS TMS320C6000 DSP Platform
VHDL simulation tool with a Simulink interface or an
http://www.ti.com/sc/docs/products/dsp/c6000/index.htm.
on-chip logic analyzer like XILINX’ ChipScope [7] [7] Virtex and ChipScope are trademarks of XILINX INC.
might be a valuable add-on. Additionally, an add-on to [8] Sundance is a trademark of SUNDANCE MULTIPROCESSOR
Frontier Design's fixed-point library also offers a TECHNOLOGY LTD. & SUNDANCE DSP INC.
statistics capability to track overflows and other [9] Diamond is a trademark of 3L LTD.
numerical problems in the simulation. Using Simulink [10] A|RT Builder, A|RT Designer and A|RT Library are
proved to be a valuable means for system verification as trademarks of FRONTIER DESIGN http://www.frontierd.com.
existing blocks could be incorporated at an early [11] http://www.systemc.org.
evaluation/design stage. However functional Simulink [12] Design Compiler and COSSAP are trademarks of
SYNOPSYS INC. http://www.synopsys.com.
blocks need to be replaced with custom C code in the
[13] SPW is a trademark of CADANCE CORP.
implementation phase or need to be replaced by [14] R. BAINES, “The DSP Bottleneck,” IEEE Comm.
templates. Magazine, 33(5):46–54, 1995.

You might also like