Figure 1: Basic hardware setup for BLADE
Figure 2: BLADE design flow
B. BLADE Methodology
The main goal of the BLADE methodology is to provide a way to make use of the combined capabilities of DSPs and FPGAs without having to employ assembler
After a successful initial simulation of the system, the design can be transferred to the real-time platform. A final decision has to be made regarding whether functions should be mapped onto the DSP or FPGA. Procedural functions are placed on the DSP, while both DSPs and FPGAs are possible targets for the remaining cycle/block oriented functions, depending on their performance requirements.
For those block operations that are mapped onto the DSP, a real-time operating system (3L Diamond) is used to provide the communication interfaces between them. Each block runs as a separate task. Activity on the ports can control the scheduling process. Communication channels replace the ports of the blocks in the Simulink block diagram, and a configuration file is used to specify the links between them. The linear code that was used for the simulation in the Unix environment can be reused with the DSP's real-time operating system by linking it with the system-specific interface template library for the interfaces to the FPGA and to other DSPs. This system also allows transferring functionality onto the host PC if necessary.
For the mapping of blocks that will be placed onto the FPGA we use FRONTIER DESIGN’s A|RT Builder [10] to automatically create synthesizable VHDL code from the original C code. It performs a direct translation of C code into VHDL, without doing any type of behavioral synthesis. The designers are therefore responsible for refining their C code designed for FPGA implementation into a syntax that describes a register transfer level (RTL) structure of the block that can be mapped into hardware. In contrast to System-C [11], the tools expect sequential C code that has no capability of expressing concurrency. Floating-point operations in these translated functions are replaced with fixed-point operations by using the A|RT Library [10] fixed-point data types. Simulink blocks are either replaced by VHDL templates or also described in RTL-C. Note that each of these blocks can still be used in the Simulink simulation, as RTL-C code is standard executable C.
Due to the moderate size of the designs, a top-down synthesis approach using the SYNOPSYS Design Compiler [12] can be used to translate the generated VHDL code into an EDIF netlist, which is imported into the XILINX Alliance place & route tool. Scripts are used to hide the details of the synthesis process from the designer.
III. Design Example
As an example we implemented a basic WCDMA downlink system. Its high-level block diagram is depicted in Figure 3. The system is a 4 Mchips/s DS-CDMA system, with a separate sync/pilot channel and a variable set of users transmitting at different rates. The algorithm designers’ motivation behind putting this system onto the test platform was to explore the performance gain that can be achieved in such a system by using an equalizer structure instead of the usual rake receiver. The system also contains many of the frequently used blocks in wideband communications, and it was therefore a good candidate to build up a collection of blocks for future reuse.
Figure 3: WCDMA downlink prototype
A. Transmitter
The transmitter consists of a combination of one Virtex 400k gate FPGA and a TI ’C67 DSP. The DSP runs a standard ACELP voice encoder and generates the data stream. It also configures the FPGA and controls the transmitter parameters. The FPGA contains the actual WCDMA downlink functionality for a variable data rate main user. Besides the transmission of the desired user, “dummy-users”, sending random data, can be turned on. Their number and power can be adjusted to simulate different traffic situations. A sync/pilot signal is also added before the signal is digitally upconverted to an intermediate frequency (IF) and sent to the DAC.
In the laboratory the wireless channel is modeled with a radio channel emulator and an additive white Gaussian noise (AWGN) generator. Alternatively, an RF air interface may be used to do measurements in an exterior environment.
B. Receiver
The receiver is the more interesting and more challenging part of the system. It consists of a ’C62 fixed-point and a ’C67 floating-point DSP and a Virtex 1000k gate FPGA. It incorporates the following functionality:
• IF interface
• Synchronization
• Tracking of frequency offset
• 4-finger rake receiver
• Polyphase equalizer
• Data interface/buffer and BER measurement
Table 1 shows some of the criteria that were applied for the partitioning of the functionality between the DSP and the FPGA. It also shows where the main blocks of the system were finally placed. Putting all the functions that operate on a sample-by-sample basis in a 4 Mchips/s, 4× oversampled system onto the FPGA is unavoidable because of the tremendous number of operations. An important reason for implementing almost all the control flow operations in the DSP was to allow a high degree of flexibility to make changes quickly and on the fly, without the need to redo the time-consuming FPGA synthesis. This was particularly important, since the turnaround time for the receiver FPGA implementation was on the order of 5 hours.
The DSP-FPGA interface uses a global bus on the FPGA that connects to all modules and can be used to read and write parameters to and from each block. The designer needs to handle these interface commands in the C code that describes the blocks on the FPGA. A relatively simple protocol, incorporating only marginal overhead, was designed. Figure 4 shows the Simulink model of the receiver. The interfaces from the blocks are collected in the bottom left corner of the top-level schematic. They connect to the “DSP” interface from the template library at the next level of hierarchy. The procedural code that handles the control flow and the other DSP-based operations listed in Table 1 is not shown, as it runs as a separate task in the Unix environment in parallel to the cycle-oriented Simulink simulation. In the real-time environment it runs on the DSP in parallel to the FPGA implementation.
Figure 4: Simulink model of the receiver
Figure 5 shows the BER performance of the rake receiver plotted vs. the number of users at a rate of 256 kb/s for a standard outdoor-to-indoor channel model at 5 Hz Doppler frequency. The results of a floating-point simulation and of a theoretical analysis are given as well for comparison.
Figure 5: BER vs. number of users
Besides the actual receiver functionality, a voice decoder was implemented on the second DSP, together with independent interface blocks to the CDMA receiver and to an audio codec. It works on a frame-by-frame basis and is embedded in the system using 3L Diamond to handle the blocks' connections as described above. It was also simulated in Simulink just like the actual WCDMA receiver and then mapped solely onto the second DSP.
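The paper does not spell out the DSP-FPGA bus protocol, so the following C fragment is only a sketch of how a block described in RTL-style C might handle parameter reads and writes arriving over a shared bus. The command word layout, the names `bus_pack`, `block_t` and `block_bus_cycle`, and the four-register parameter file are all assumptions made for illustration; they are not taken from the BLADE template library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit command word (an assumption, not the paper's format):
 *   bit  31     : 1 = write, 0 = read
 *   bits 30..24 : module id
 *   bits 23..16 : register index inside the module
 *   bits 15..0  : write data (ignored for reads)                        */
enum { BUS_READ = 0, BUS_WRITE = 1 };
enum { BLOCK_NUM_REGS = 4 };

/* Pack a command on the DSP side. */
uint32_t bus_pack(int write, unsigned module, unsigned reg, uint16_t data)
{
    assert(module < 128 && reg < 256);
    return ((uint32_t)(write & 1) << 31) | ((uint32_t)module << 24)
         | ((uint32_t)reg << 16) | (uint32_t)data;
}

/* State of one FPGA block: its bus id and a small parameter register file. */
typedef struct {
    unsigned id;
    uint16_t regs[BLOCK_NUM_REGS];
} block_t;

/* Called once per simulated clock cycle with the word currently on the bus.
 * Returns 1 and stores the value in *rdata when this block answers a read.
 * Returns 0 for writes and for commands addressed to other modules.     */
int block_bus_cycle(block_t *b, uint32_t cmd, uint16_t *rdata)
{
    unsigned module = (cmd >> 24) & 0x7Fu;
    unsigned reg    = (cmd >> 16) & 0xFFu;

    if (module != b->id || reg >= BLOCK_NUM_REGS)
        return 0;                       /* not addressed to this block */
    if (cmd >> 31) {
        b->regs[reg] = (uint16_t)(cmd & 0xFFFFu);   /* parameter write */
        return 0;
    }
    *rdata = b->regs[reg];              /* parameter read */
    return 1;
}
```

Because every block sees every bus word and reacts only to its own id, the overhead per block in this sketch is one address compare per cycle, which is in the spirit of the marginal overhead mentioned in the text.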
Table 1: DSP/FPGA partitioning for the receiver
        Criterion                                        Function
DSP     • Control flow (linear code)                     • Synchronization and tracking control
        • Complex & irregular operations                 • Rake finger assignment
        • Complex data path ops. Operations at low       • Equalizer computation (matrix inversion)
          rates with low I/O requirements
        • Floating-point or high precision fixed-point
          operations
FPGA    • Data path operations                           • IF down-conversion, filters
        • Operations at sampling and chip rate           • Rake receiver, equalizer
                                                         • Correlators for sync. and channel estimation
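To make the "operations at sampling and chip rate" criterion concrete, the following C sketch works through the arithmetic. The 4 Mchips/s chip rate and the 4× oversampling factor come from the text; the tap count, the DSP budget and the function names are hypothetical values chosen only to illustrate the comparison, not measurements from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the text. */
#define CHIP_RATE_HZ 4000000ULL   /* 4 Mchips/s      */
#define OVERSAMPLING 4ULL         /* 4x oversampling */

typedef enum { TARGET_DSP, TARGET_FPGA } target_t;

/* Sample rate at the IF interface: 4 Mchips/s * 4 = 16 Msamples/s. */
uint64_t sample_rate_hz(void)
{
    return CHIP_RATE_HZ * OVERSAMPLING;
}

/* Multiply-accumulate operations per second for a block that performs
 * `taps` MACs for every input sample arriving at `rate_hz`.            */
uint64_t macs_per_second(uint64_t rate_hz, unsigned taps)
{
    assert(taps > 0);
    return rate_hz * taps;
}

/* Simplified partitioning rule (an assumption): a block goes to the FPGA
 * when its load exceeds the MAC budget the DSP can spare for it.        */
target_t choose_target(uint64_t required_macs, uint64_t dsp_budget_macs)
{
    return required_macs > dsp_budget_macs ? TARGET_FPGA : TARGET_DSP;
}
```

For example, a hypothetical 32-tap filter at the sample rate needs 16,000,000 × 32 = 512,000,000 MACs per second. With an assumed DSP budget of 100,000,000 MACs per second, `choose_target` returns `TARGET_FPGA`, whereas a control task needing a few thousand operations per second stays on the DSP.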
IV. Summary
Slow turn-around times, mainly due to long P&R runs for the FPGA, turned out to be one of the major problems in the rapid prototyping and evaluation process. This overhead was reduced by keeping parameters in the FPGA modules programmable and controlling them from the DSP at run time. This also keeps modules flexible and easier to reuse. The fact that C code is being used to describe RTL-type parallel structures imposes certain restrictions on the design hierarchy that are sometimes less elegant and less intuitive than the structure that one would impose when writing VHDL. The use of multiple clock designs is currently not supported in the proposed flow. This is one of the main restrictions that limits the possibility to make use of optimizations that are especially well suited for medium data rate systems on FPGAs (e.g., bit-serial architectures or decomposition of multipliers). The use of C as design language also limits the use of specifically optimized blocks from vendor-specific FPGA libraries, such as the XILINX CoreGen library. This is because our methodology does not provide hooks to instantiate these blocks directly from within a C module.
However, using a C-based approach for an FPGA implementation yielded a lot of advantages. Avoiding the use of VHDL was highly appreciated by the algorithm designers working on the project, even though the C to VHDL translation does not involve any behavioral synthesis and therefore forces the designer to write register transfer level (RTL)-type C code. A new tool called A|RT Designer [10] may avoid this problem in the future by offering behavioral synthesis capability.
Using Simulink for debugging of the design, it turned out that a capability to store waveforms of the signals and/or variables inside the C code is missing. Waveforms can only be recorded on the block boundaries in Simulink. A regular C debugger was used to track changes inside the C code on a cycle-by-cycle basis, but this was a tedious task. Using a standard VHDL simulation tool with a Simulink interface or an on-chip logic analyzer like XILINX’ ChipScope [7] might be a valuable add-on. Additionally, an add-on to Frontier Design's fixed-point library also offers a statistics capability to track overflows and other numerical problems in the simulation. Using Simulink proved to be a valuable means for system verification, as existing blocks could be incorporated at an early evaluation/design stage. However, functional Simulink blocks need to be replaced with custom C code in the implementation phase or need to be replaced by templates.
V. Conclusions
We presented a methodology that allows rapid prototyping for research in wireless communication. It uses C as the primary language and integrates common system design tools such as Simulink into the flow. The C-based approach is well suited for hardware/software co-design, allowing one to exploit the unique features of DSPs and FPGAs in a combined system. The BLADE methodology also provides a way to co-simulate DSPs and FPGAs in a multitasking environment. It offers very high flexibility and a methodology to combine cycle-oriented code with procedural code to efficiently describe control flow and relatively irregular operations in a natural and intuitive way. For algorithm research and development it is a real alternative to the use of relatively expensive high-level system design tools such as COSSAP [12] and SPW [13]. However, it does not claim to be a solution for a production environment, as its capabilities to achieve performance-optimized designs are certainly limited.
References
[1] M. RUPP, E. BECK, AND R. KRISHNAMOORTHY, “Rapid Prototyping for High Data Rate Wireless Local Loop,” in Proc. 33rd Asilomar Conf., 1999, Vol. 2, pp. 993-997.
[2] S. DAS, S. RAJAGOPAL, C. SENGUPTA, AND J. CAVALLARO, “Arithmetic Acceleration Techniques for Wireless Communication Receivers,” in Proc. 33rd Asilomar Conf., 1999, Vol. 2, pp. 1469-1474.
[3] V. SUNDARAMURTHY AND J. CAVALLARO, “A Software Simulation Testbed for Third Generation CDMA Wireless Systems,” in Proc. 33rd Asilomar Conf., 1999, Vol. 2, pp. 1680-1684.
[4] MATLAB and Simulink are trademarks of THE MATHWORKS INC. http://www.mathworks.com.
[5] G.D. GOLDEN, G.J. FOSCHINI, R.A. VALENZUELA, AND P.W. WOLNIANSKY, “Detection Algorithm and Initial Laboratory Results Using V-BLAST Space-Time Communication Architecture,” Electronics Letters, Vol. 35, No. 1, pp. 11-14, Jan. 1999.
[6] TEXAS INSTRUMENTS TMS320C6000 DSP Platform http://www.ti.com/sc/docs/products/dsp/c6000/index.htm.
[7] Virtex and ChipScope are trademarks of XILINX INC.
[8] Sundance is a trademark of SUNDANCE MULTIPROCESSOR TECHNOLOGY LTD. & SUNDANCE DSP INC.
[9] Diamond is a trademark of 3L LTD.
[10] A|RT Builder, A|RT Designer and A|RT Library are trademarks of FRONTIER DESIGN http://www.frontierd.com.
[11] http://www.systemc.org.
[12] Design Compiler and COSSAP are trademarks of SYNOPSYS INC. http://www.synopsys.com.
[13] SPW is a trademark of CADENCE CORP.
[14] R. BAINES, “The DSP Bottleneck,” IEEE Comm. Magazine, 33(5):46-54, 1995.