FFT v1 0 ds002

CONFIDENTIAL
Fast Fourier Transform v1.0

Introduction
The Fast Fourier Transform (FFT) IP core implements a computationally efficient algorithm, the Cooley-Tukey algorithm, for computing the Discrete Fourier Transform (DFT) of a real or complex input data set. It is designed specifically for the eASIC Nextreme-2 and Nextreme devices, with a focus on the throughput required for OFDM modulation and demodulation as well as other applications requiring FFTs.

Implementation Summary
Families Supported       Nextreme-2, Nextreme
Design File Formats      Verilog
Certification            Level 2*
Implementation Details   See Performance and Resource Section
Support                  eASIC
Note: * Level 2 denotes that the core has been taken through the eASIC design flow, including synthesis, placement and routing.
Features
- Device support for Nextreme-2 and Nextreme devices
- FFT point sizes from 8 to 16K points in powers of 2 (e.g. 256, 512, 1024)*
- Fixed-point C-Model available for system modeling
- Support for two architectures, Radix-2 and Radix-4 Loop Engine, trading off area against latency
- Support for both FFT and iFFT, run-time configurable
- Optional run-time configurable transform length
- Input data bit width: 2's complement, 8 to 18 bits
- Phase factor bit width: 2's complement, 8 to 18 bits
- Convergent rounding
- Decimation in Frequency (DIF) FFT
- Scaling: fixed
- Bit-reversed or natural order input
- Complete Verilog RTL code
- Testbench for simulation
* The Radix-4 Loop Engine only supports point sizes that are powers of 4.

Release Information
This release of the eASIC FFT IP core contains the following files and documents:
1) RTL files in Verilog
2) Bit-true C-Model
3) Test bench for RTL simulation, with test vectors covering the FFT features
4) Documentation (data sheet, test case list, functional verification plan, testcase register)
5) Scripts for running the regression or individual testcases

Rev: FFT_v1_0_ds002 www.eASIC.com
Performance and Resource Utilization

The table below lists the maximum clock performance, the corresponding transform time and the resource usage for a selected set of parameters. The eCell usage changes significantly with the timing constraints used in the target design: the tighter the constraints, the larger the eCell count. The latency is measured from asserting the START input to the last sample of output data leaving the core, assuming that the UNLOAD input is asserted immediately after the transform is completed. The device family detailed below is Nextreme.
For the determination of maximum frequency, the core was generated with double registers on each input and output. The registers directly connected to the core run on the core clock, whereas the outer registers run off a separate clock. This ensures that all paths in the core are included in the timing constraint without artificially distorting the design to fit the chip. The device voltage library used for the implementation is specified at the top of each table.
Nextreme
Note: All implementations use 16-bit data and phase factors and the 1.2 V library.

Table 1: Performance and Resource Usage for Nextreme

Point Size  Input Order  Run Time N  Performance (MHz)  Latency (Cycles)  Latency (µs)  FFT Architecture  eCells  bRAM
512         Natural      Yes         178                3,479             19.51         R-2 Loop Engine   3,504   5
1024        Natural      Yes         181                7,335             40.52         R-2 Loop Engine   3,585   5
2048        Natural      Yes         178                15,543            87.07         R-2 Loop Engine   3,563   5
4096        Natural      Yes         178                32,967            184.68        R-2 Loop Engine   3,613   10
1024        Natural      Yes         177                3,440             19.43         R-4 Loop Engine   9,169   11
4096        Natural      Yes         176                14,469            82.18         R-4 Loop Engine   9,188   11
1024        Reverse      Yes         181                7,335             40.52         R-2 Loop Engine   3,585   5
1024        Reverse      No          180                7,335             40.63         R-2 Loop Engine   3,519   5
2048        Reverse      Yes         178                15,543            87.07         R-2 Loop Engine   3,563   5
2048        Reverse      No          181                15,543            85.87         R-2 Loop Engine   3,557   5
1024        Reverse      Yes         176                3,440             19.43         R-4 Loop Engine   9,169   11
1024        Reverse      No          181                7,335             40.52         R-4 Loop Engine   9,078   11
16384       Natural      No          180                147,687           820.48        R-2 Loop Engine   3,802   40
16384       Natural      No          174                61,594            353.58        R-4 Loop Engine   9,353   44
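The latency in microseconds follows from the cycle count and the clock rate: cycles divided by the clock frequency in MHz. The helper below is a minimal sketch of that arithmetic; the name latency_us is an illustrative assumption and not part of the core deliverables. Several published entries differ slightly from this quotient, presumably because the listed clock figures are rounded.

```c
#include <assert.h>
#include <stdint.h>

/* Transform latency in microseconds for a given cycle count and a core
 * clock in MHz (cycles / MHz = microseconds). Hypothetical helper. */
static double latency_us(uint32_t latency_cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)latency_cycles / clock_mhz;
}
```

For example, latency_us(7335, 181.0) evaluates to roughly 40.52, in line with the 1024-point Radix-2 entry.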
General Description
The DFT is evaluated with the following formulae.

Forward DFT:
\[ X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N} \]
where k ranges from 0 to N-1.

Inverse DFT:
\[ x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N} \]
where n ranges from 0 to N-1.

Note that the inverse DFT differs from the forward DFT only in that the phase factor is conjugated and the result is scaled by 1/N.

The Fast Fourier Transform is an efficient algorithm for finding the DFT of a given block of input data. A divide-and-conquer rule is applied so that the long computation is broken down into smaller, repetitive ones. This repetitive structure is called the butterfly structure. The basic structure can be implemented so that it takes 2 inputs at a time (Radix-2) or 4 inputs at a time (Radix-4). The eASIC FFT core supports both Radix-2 and Radix-4 butterfly architectures for computation of the DFT. Both architectures of the eASIC FFT core use the Decimation in Frequency (DIF) decomposition method. The iFFT is calculated by conjugating the phase factors of the corresponding forward FFT and scaling the result by 1/N.

The computation of an input frame always uses a loop engine structure and takes place in 3 stages:
1) Loading stage (takes input data into the core)
2) Computation stage (computes the DFT)
3) Unloading stage (outputs the data after computation)

The user can select between 2 architecture options for this FFT core:
1) Radix-2 Loop Engine
2) Radix-4 Loop Engine
Figure 1 illustrates the throughput and area difference between the two architectures.
Figure 1: Resource Utilization vs Throughput of the FFT architectures
Radix-2 Loop Engine

This architecture uses a Radix-2 butterfly structure for the FFT computation. It is the smallest-area implementation of the two architectures. Figure 2 shows the Radix-2 computation block.

Figure 2: Conceptual block diagram of Radix-2 Loop Engine
(Blocks shown: Radix-2 butterfly, twiddle factor memory.)
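As an illustration of the Radix-2 DIF butterfly that this engine iterates, the sketch below computes upper = a + b and lower = (a - b) * w for integer complex samples. It is a hedged example rather than the core's RTL behavior: the type cplx_t, the function name dif_butterfly_r2 and the frac_bits parameter (number of fractional bits assumed in the twiddle factor) are illustrative assumptions, and the core's rounding and scaling are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Complex sample with integer real and imaginary parts (hypothetical type). */
typedef struct {
    int32_t re;
    int32_t im;
} cplx_t;

/*
 * Radix-2 decimation-in-frequency butterfly:
 *   upper = a + b
 *   lower = (a - b) * w
 * The twiddle factor w is assumed to carry frac_bits fractional bits, so the
 * product is divided by 2^frac_bits (truncating toward zero).
 */
static void dif_butterfly_r2(cplx_t a, cplx_t b, cplx_t w, unsigned frac_bits,
                             cplx_t *upper, cplx_t *lower)
{
    assert(frac_bits < 31);
    assert(upper != 0 && lower != 0);

    const int64_t one = (int64_t)1 << frac_bits;
    const int64_t dr = (int64_t)a.re - b.re;   /* real part of a - b */
    const int64_t di = (int64_t)a.im - b.im;   /* imaginary part of a - b */

    upper->re = a.re + b.re;
    upper->im = a.im + b.im;
    lower->re = (int32_t)((dr * w.re - di * w.im) / one);
    lower->im = (int32_t)((dr * w.im + di * w.re) / one);
}
```

With a = 1 + 2j, b = 3 + 4j, w = -j and frac_bits = 0, the butterfly yields upper = 4 + 6j and lower = -2 + 2j.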
Radix-4 Loop Engine
This architecture uses a Radix-4 butterfly structure for the computation of the FFT. It is faster than the Radix-2 Loop Engine, as 4 complex inputs are processed every clock cycle. However, the higher throughput requires more resources. A block diagram of the Radix-4 computation is shown in Figure 3. This core supports scaled fixed-point arithmetic.

Figure 3: Conceptual block diagram of the Radix-4 Engine
(Blocks shown: stages ST1, ST2 and ST3; twiddle factors TF 1, TF 2 and TF 3; Radix-4 butterfly.)
Run Time Configurable Point Size

Both architectures support changing the point size on a frame-by-frame basis. When this option is selected, an input port is provided to set the desired point size. Both architectures grow slightly in size when the option is selected. This capability is often required for wireless communications applications such as OFDM systems (WiMAX, LTE), where the point size routinely changes over short time intervals.
Natural or Bit Order Input / Output

Both architectures provide the option of natural or bit-reversed order for data input and output. Natural order is where the data points are output in the same order as the input data points, i.e., 0, 1, 2, 3, and so on.
The bit-reversed order is simple to calculate: take the index of the data point, written in binary, and reverse the order of the bits. Hence, 0000, 0001, 0010, 0011, 0100,... (0, 1, 2, 3, 4,...) becomes 0000, 1000, 0100, 1100, 0010,... (0, 8, 4, 12, 2,...). In the case of the Radix-4 Loop Engine, the reversal applies to pairs of bits. Hence, 0000, 0001, 0010, 0011, 0100,... (0, 1, 2, 3, 4,...) becomes 0000, 0100, 1000, 1100, 0001,... (0, 4, 8, 12, 1,...), as the pairs of bits are reversed. When the transform size requires an odd number of index bits, the odd bit in the least significant place is moved to the most significant place, so 00000, 00001, 00010, 00011, 00100,... (0, 1, 2, 3, 4,...) becomes 00000, 10000, 00100, 10100, 01000,... (0, 16, 4, 20, 8,...).
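The index mappings described above can be reproduced with two small helpers. This is a sketch written from the examples in the text, not code taken from the core or its C-Model; the function names bit_reverse_r2 and digit_reverse_r4 are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Radix-2 ordering: reverse all index bits. */
static uint32_t bit_reverse_r2(uint32_t index, unsigned bits)
{
    assert(bits >= 1 && bits <= 16);
    uint32_t out = 0;
    for (unsigned i = 0; i < bits; ++i) {
        out = (out << 1) | ((index >> i) & 1u);
    }
    return out;
}

/*
 * Radix-4 ordering: reverse the index in 2-bit pairs. With an odd number of
 * index bits, the least significant bit moves to the most significant place
 * and the remaining bits are reversed in pairs.
 */
static uint32_t digit_reverse_r4(uint32_t index, unsigned bits)
{
    assert(bits >= 2 && bits <= 16);
    uint32_t top = 0;
    unsigned pair_bits = bits;
    if (bits & 1u) {
        top = (index & 1u) << (bits - 1);  /* odd bit goes to the MSB */
        index >>= 1;
        pair_bits = bits - 1;
    }
    uint32_t out = 0;
    for (unsigned i = 0; i < pair_bits; i += 2) {
        out = (out << 2) | ((index >> i) & 3u);
    }
    return top | out;
}
```

With 4 index bits, bit_reverse_r2 maps 0, 1, 2, 3, 4 to 0, 8, 4, 12, 2 and digit_reverse_r4 maps them to 0, 4, 8, 12, 1. With 5 index bits, digit_reverse_r4 maps them to 0, 16, 4, 20, 8, matching the sequences given in the text.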
Scaling
The FFT processes an array of data by successive passes over the input data array. On each pass, the algorithm performs Radix-4 or Radix-2 butterflies, where each butterfly picks up four or two complex numbers, respectively, and returns four or two complex numbers to the same memory. The numbers returned to memory by the core are potentially larger than the numbers picked up from memory, so a strategy must be employed to accommodate this dynamic range expansion. For a Radix-4 Loop Engine FFT, the values computed in a butterfly stage can grow by up to 3 bits. For Radix-2, the growth is up to 2 bits.
Currently, only one option is available to handle this bit growth (v2.0 will support Block Floating Point):
1. Scaling at each stage using a fixed scaling schedule
When using scaling, a scaling schedule is used to scale by a factor of 1, 2, 4, or 8 in each stage. If the scaling is insufficient, a butterfly output may grow beyond the dynamic range and cause an overflow. As a result of the scaling applied in the FFT implementation, the transform computed is a scaled transform. If a Radix-4 algorithm scales by a factor of 4 in each stage, the overall scaling equals the factor of 1/N in the inverse FFT equation. For Radix-2, scaling by a factor of 2 in each stage provides the factor of 1/N. Otherwise, additional scaling is necessary.
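The effect of a fixed scaling schedule can be sketched in C as follows. The first helper sums the per-stage shifts, so a schedule provides the full 1/N factor when the sum equals log2(N). The second helper divides a value by a power of two with round-half-to-even behavior. Both names are illustrative assumptions, and applying the convergent rounding listed under Features at the point where the scaled bits are dropped is also an assumption, not a statement about the RTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of the per-stage shifts (each 0..3 bits) in a fixed scaling schedule. */
static unsigned total_scale_bits(const uint8_t *schedule, size_t stages)
{
    unsigned total = 0;
    for (size_t i = 0; i < stages; ++i) {
        assert(schedule[i] <= 3);
        total += schedule[i];
    }
    return total;
}

/* Divide by 2^shift, rounding to nearest with ties going to the even result
 * (convergent rounding). Uses floor division so negatives behave the same. */
static int32_t scale_convergent(int32_t value, unsigned shift)
{
    assert(shift <= 3);
    if (shift == 0) {
        return value;
    }
    const int32_t divisor = (int32_t)1 << shift;
    int32_t q = value / divisor;
    int32_t r = value % divisor;
    if (r < 0) {            /* convert truncation to floor division */
        r += divisor;
        q -= 1;
    }
    const int32_t half = divisor / 2;
    if (r > half || (r == half && (q & 1))) {
        q += 1;
    }
    return q;
}
```

For a 128-point Radix-2 transform, a schedule of {1, 1, 1, 1, 0, 1, 2} sums to 7 = log2(128), so it supplies the full 1/N factor. For the rounding helper, scale_convergent(5, 1) gives 2 and scale_convergent(7, 1) gives 4, since ties go to the even value.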
FFT Loop Engine Structure

Figure 4 shows the block diagram of the complete FFT structure (the Radix-2 Loop Engine core is taken for illustration).

Figure 4: Conceptual block diagram of the FFT core
(Blocks shown: Address Control Generation, Data Memory, Radix Computation, Data Reorder Block. Signals shown: Start, Done, Data valid, Ctrl_sigs, W_addr, R_addr, Tw_addr, Input data, Output data.)
Address Generation Control

This block is the main controlling block for the entire FFT operation and implements the following key functions:
a) Controls the entire core
b) Generates the read and write addresses for the data
c) Generates the read address for fetching the phase factors
d) Indicates when the loading, computation and unloading stages occur
e) Generates the data valid and done signals
Data Memory
This block implements the following key functions:
a) Stores the input data given by the user.
b) Stores the intermediate data after the Radix computation.
c) Takes complex data as input.
d) Uses block memory, with a registered output.
Radix Computation
This block implements the following key functions:
a) Contains the basic butterfly structure (Radix-2 or Radix-4).
b) Accepts complex data as input and produces complex data as output.
Data Reorder Block

This block implements the following key functions:
a) Transposes the result of the butterfly structure before it is written back to the memory. This is required because the computation is done in place.
b) In the loading/unloading stage, routes the input/output data between the input/output pins and the memory.
Phase factor memory

a) This block stores the phase factors for the given N-point value.
b) At start-up the core is in the initialization stage. In this stage the phase factors are computed and stored in the block RAM.
c) The user should not supply data during the initialization stage. Data should be fed only after the config_o signal goes low.
FFT Core Symbol

Figure 5 shows the core symbol of the FFT core.

Figure 5: FFT core symbol
(Inputs shown: clk_i, rst_ni, ce_i, xn_re_i, xn_im_i, start_i, unload_i, nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i. Outputs shown: config_o, xk_re_o, xk_im_o, xn_index_o, xk_index_o, rfd_o, busy_o, dv_o, done_o, blk_exp_o, ovflo_o.)
Port Interface
Table 2: Input Port Descriptions

Port Name       Direction  Width  Description
clk_i           Input      1      Rising-edge clock.
rst_ni          Input      1      Master asynchronous reset (active low).
ce_i            Input      1      Clock enable (active high).
xn_re_i         Input      B      Input data bus: real component (B = 8 to 18) in 2's complement format.
xn_im_i         Input      B      Input data bus: imaginary component (B = 8 to 18) in 2's complement format.
start_i         Input      1      FFT start signal (active high): START is asserted to begin the data loading and transform calculation (for the Burst I/O architectures).
unload_i        Input      1      When this port is high, unloading happens in natural order; when low, unloading happens in bit-reversed order.
nfft_i          Input      5      Specifies the transform length the core is configured with: the point size is 2^nfft_i. If this port is zero, the smallest point size supported by the architecture is selected.
nfft_we_i       Input      1      Write enable for the NFFT port.
fwd_inv_i       Input      1      Control signal that indicates whether a forward FFT or an inverse FFT is performed. When FWD_INV = 1, a forward transform is computed. When FWD_INV = 0, an inverse transform is computed.
fwd_inv_we_i    Input      1      Write enable for FWD_INV (active high).
scale_sch_i     Input      (*)    Scaling schedule: specified with two bits for each stage, starting at the two LSBs. The scaling can be specified as 3, 2, 1, or 0, which represents the number of bits to be shifted from the computed result of the Radix block. For N = 128 with Radix-2, one possible scaling schedule is [1, 1, 1, 1, 0, 1, 2].
scale_sch_we_i  Input      1      Write enable for SCALE_SCH (active high). This port is available only with scaled arithmetic and not with full precision.

(*) 2 x ceil(number_of_stages/2) for the Radix-4 Loop Engine, or 2 x number_of_stages for the Radix-2 Loop Engine.
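A sketch of how a scaling schedule could be packed into the scale_sch_i word is shown below. The port description only says that the schedule uses two bits per stage starting at the two LSBs, so the stage ordering (schedule[0] in the LSBs) is an assumption, as are the helper names and the reading of number_of_stages as log2(N).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack per-stage shifts (0..3) into one word, two bits per entry.
 * Assumed ordering: schedule[0] occupies the two LSBs. */
static uint32_t pack_scale_schedule(const uint8_t *schedule, size_t entries)
{
    assert(entries <= 16);
    uint32_t word = 0;
    for (size_t i = 0; i < entries; ++i) {
        assert(schedule[i] <= 3);
        word |= (uint32_t)schedule[i] << (2 * i);
    }
    return word;
}

/* Width of scale_sch_i in bits, following the width rule in Table 2.
 * stages is taken to be log2(N) (assumption). */
static unsigned scale_sch_width(unsigned stages, int radix4)
{
    return radix4 ? 2u * ((stages + 1u) / 2u) : 2u * stages;
}
```

Under these assumptions, the N = 128 Radix-2 schedule [1, 1, 1, 1, 0, 1, 2] packs to 0x2455, and scale_sch_width(7, 0) gives a 14-bit port.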
Table 3: Output Port Descriptions

Port Name   Direction  Width               Description
config_o    Output     1                   Indicates that the core is still in the configuration stage, i.e. still evaluating the phase factors. Nothing should be done until this signal goes low.
xk_re_o     Output     B                   Output data bus: real component in 2's complement format.
xk_im_o     Output     B                   Output data bus: imaginary component in 2's complement format.
xn_index_o  Output     log2 (max pt size)  Index of input data. The maximum point size is the point size configured when the core is generated.
xk_index_o  Output     log2 (max pt size)  Index of output data. The maximum point size is the point size configured when the core is generated.
rfd_o       Output     1                   Ready for data (active high): RFD is high during the load operation.
busy_o      Output     1                   Core computation stage (active high): this signal goes high while the core is computing the transform.
dv_o        Output     1                   Data valid (active high): this signal is high when valid data is presented on the output bus.
done_o      Output     1                   FFT computation complete strobe (active high): DONE goes high for one clock cycle when the transform calculation has completed for the frame.
blk_exp_o   Output     1                   Block exponent: the number of bits scaled for every point in the data frame. Available only when block floating point is used.
ovflo_o     Output     1                   Arithmetic overflow indicator (active high): OVFLO is high during result unloading if any value in the data frame overflowed. The OVFLO signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic.
I/O Data Flow Architectures

All architectures currently supported by FFT v1.0 are buffered I/O data flow. Users can modify them to become a buffered streaming with an external memory as long as the data rate is slow enough for the FFT to have processed the current FFT frame before the complete
Input Data Flow Waveform

Figure 1 shows the signals to observe when feeding data into the core.
1) config_o should be low before feeding the data (i.e., before the start pulse is asserted).
2) The run-time configuration signals should be asserted before the start pulse.
3) After the start pulse is asserted, the rfd_o signal goes high one clock cycle later.
4) The user should feed each data sample on the next rising clock edge after receiving its index.

Figure 1: Input data flow
(Signals shown: nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_o, xn_index_o, x(n); the index and data run from 0 to n-1.)
Output Data Flow Waveform

Figure 2 shows the signals to observe when reading the output data.
1) When the busy_o signal is deasserted, done_o goes high for one clock cycle.
2) After 3 clock cycles, the data valid signal dv_o goes high.
3) The index and the data are presented in the same clock cycle.

Figure 2: Output data flow
(Signals shown: busy_o, done_o, dv_o, xk_index_o, X(k); the index and data run from 0 to n-1.)
Top level Timing diagram

Figure 3: Top level Timing diagram
(Signals shown: nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_o, xn_index_o, x(n), busy_o, done_o, dv_o, xk_index_o, X(k); input and output indices each run from 0 to n-1.)
FFT User Parameters

The table below details the user parameters to configure the core.
Table 4: User Parameters for the FFT

Parameter Name     Values    Description
data_width         16 to 36  Real = data_width/2, Imag = data_width/2.
phase_width        16 to 36  Real = phase_width/2, Imag = phase_width/2.
n_point            8 to 16K  Specifies the Fourier transform length: powers of 2 (8, 16, 32, ...) for the Radix-2 architecture, powers of 4 (16, 64, 256, ...) for the Radix-4 architecture.
INPUT_ORDER        Define    Makes the core configurable for taking bit-reversed order input.
STATIC_IFFT        Define    Configures the core to compute only the IFFT.
DYN_FFT_IFFT       Define    Makes the core configurable for run-time FFT/iFFT computation.
NX                 Define    If defined, the core works for the Nextreme device. If NOT defined, the core works for the N2X device (Nextreme-2).
RUN_TIME_N_CONFIG  Define    Makes the FFT core run-time N-point configurable.
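The parameter constraints above can be checked before generating the core. The helpers below are a minimal sketch of such a check; their names are hypothetical and they are not part of the release package. They encode only the rules stated in the table: a power-of-2 length from 8 to 16K for Radix-2, a power-of-4 length from 16 to 16K for Radix-4, and a component width of half the combined width.

```c
#include <assert.h>
#include <stdint.h>

/* log2 of a power of two; returns -1 if n is not a power of two. */
static int log2_exact(uint32_t n)
{
    if (n == 0 || (n & (n - 1)) != 0) {
        return -1;
    }
    int bits = 0;
    while (n > 1) {
        n >>= 1;
        ++bits;
    }
    return bits;
}

/* Returns 1 if n_point is a legal transform length for the architecture. */
static int n_point_is_valid(uint32_t n_point, int radix4)
{
    const int stages = log2_exact(n_point);
    if (stages < 0 || n_point > 16384u) {
        return 0;
    }
    if (radix4) {
        return n_point >= 16u && (stages % 2) == 0;  /* powers of 4 only */
    }
    return n_point >= 8u;
}

/* Width of the real (or imaginary) component for a combined width. */
static unsigned component_width(unsigned combined_width)
{
    assert(combined_width >= 16 && combined_width <= 36);
    assert(combined_width % 2 == 0);
    return combined_width / 2;
}
```

For example, n_point_is_valid(2048, 1) returns 0 because 2048 is not a power of 4, while log2_exact(1024) returns 10, the value that selects a 1024-point transform under the 2^nfft_i rule from the port description.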
Component Instantiation
The FFT v1.0 can be instantiated into Verilog or VHDL code (VHDL will require a mixed-mode design flow). Below are component declarations for Verilog and VHDL design flows.
Verilog Module Declaration

The Radix-2 architecture is taken as an example here.

    module fft_r2_top_rtl (
        clk_i,
        rst_ni,
        ce_i,
        xn_re_i,
        xn_im_i,
        start_i,
        unload_i,
        nfft_i,
        nfft_we_i,
        fwd_inv_i,
        fwd_inv_we_i,
        scale_sch_i,
        scale_sch_we_i,
        config_o,
        xk_re_o,
        xk_im_o,
        xn_index_o,
        xk_index_o,
        rfd_o,
        busy_o,
        dv_o,
        done_o,
        blk_exp_o,
        ovflo_o
    );
VHDL Component Declaration
    component fft_r2_top_rtl is
      generic(
        data_width    : natural := `FFT_DATA_WIDTH;
        phase_width   : natural := `FFT_PHASE_WIDTH;
        n_point       : natural := `N_POINT;
        latency_radix : natural := 5
      );
      port(
        clk_i          : in  std_logic;
        rst_ni         : in  std_logic;
        ce_i           : in  std_logic;
        xn_re_i        : in  std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        xn_im_i        : in  std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        start_i        : in  std_logic;
        unload_i       : in  std_logic;
        nfft_i         : in  std_logic_vector(4 downto 0);
        nfft_we_i      : in  std_logic;
        fwd_inv_i      : in  std_logic;
        fwd_inv_we_i   : in  std_logic;
        scale_sch_i    : in  std_logic_vector(SCALING_WIDTH - 1 downto 0);
        scale_sch_we_i : in  std_logic;
        config_o       : out std_logic;
        xk_re_o        : out std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        xk_im_o        : out std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        xn_index_o     : out std_logic_vector(FFT_INDEX_WIDTH - 1 downto 0);
        xk_index_o     : out std_logic_vector(FFT_INDEX_WIDTH - 1 downto 0);
        rfd_o          : out std_logic;
        busy_o         : out std_logic;
        dv_o           : out std_logic;
        done_o         : out std_logic;
        blk_exp_o      : out std_logic;
        ovflo_o        : out std_logic
      );
    end component;
Bit-Accurate C Model
The C-Model is designed for bit-accurate modelling of the FFT core. It produces exactly the same results as the Verilog implementation of the FFT core. Note that the C-Model is not cycle accurate and does not model interface or clock latency.
The files provided with the C-Model are:
1. fft_bitacc_cmodel.c - The complete C-Model
2. fft_inter_parameter.h - Internal parameters required for the IP
3. fft_user_defines.h - User parameters for the FFT
The C-Model mainly consists of two functions, which mimic the two loop engine architectures:
1. r2_fft_m() for the Radix-2 architecture
2. r4_fft_m() for the Radix-4 architecture
System Requirements
A 64-bit C compiler is required to use the C-Model
User Defines
Edit the "fft_user_defines.h" file to set the required parameters for the C-Model.

Table 5: User Parameters for the C Model

Parameter Name    Description
N_POINT           The point size of a transform.
RADIX2            Define RADIX2 if Radix-2 computation is required.
RADIX4            Define RADIX4 if Radix-4 computation is required.
FFT_DATA_WIDTH    FFT data width, including real and imaginary data.
FFT_PHASE_WIDTH   FFT phase factor width, including real and imaginary data.
NO_OF_FRAMES      Number of frames.
F1_N_POINT        First frame N_POINT value.
F2_N_POINT        Second frame N_POINT value. As many F*_N_POINT defines are needed as the value of NO_OF_FRAMES.
F1_FWD_INV        First frame transform value.
F2_FWD_INV        Second frame transform value. Forward transform value = 1, reverse transform value = 0.
F1_SCA_VAL        First frame scaling value.
F2_SCA_VAL        Second frame scaling value. This value indicates the scaling after each stage.
SCALING_EN        Whether to include scaling or not. If it is 1, scaling is enabled; if it is 0, scaling is disabled and default scaling is applied.
STATIC_IFFT       When STATIC_IFFT = 0, negates the imaginary values of the phase factors to calculate the IFFT in a multi-frame transform.
input             Input file name.
output            Output file name.
EN_STAGE_RESULT   Prints intermediate stage results.
PRINT_TWIDDLE     Prints phase factor values.
F_N_POINT         The array of F*_N_POINT values.
F_FWD_INV         The array of F*_FWD_INV values.
F_SCA_VAL         The array of F*_SCA_VAL values.
Input Data File Format
In "fft_user_defines.h" the character array "input" specifies the input file name. This file should contain the input data to be transformed. The data should be in decimal format, with the real and imaginary values separated by a space. An example is shown below:
+19783 +47534
+61825 +16308
+118822 +43074
+96314 +117995
The left column is the real part and the right column is the imaginary part.
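A line in this format can be read with a single sscanf call. The helper below is a sketch for users preparing or checking input files; it is not taken from fft_bitacc_cmodel.c, and the name parse_input_sample is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Parse one "real imag" line in signed decimal format.
 * Returns 1 if both values were read, 0 otherwise. */
static int parse_input_sample(const char *line, long *re, long *im)
{
    assert(line != NULL && re != NULL && im != NULL);
    return sscanf(line, "%ld %ld", re, im) == 2;
}
```

Calling parse_input_sample("+19783 +47534", &re, &im) returns 1 and stores 19783 and 47534, while a line with only one value returns 0.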
Output Data File Format

In "fft_user_defines.h" the character array "output" specifies the output file name. This file contains the transformed values of the input data. The data is in decimal format and contains the real part, the imaginary part and the overflow indication bit for that frame. An example is shown below:
+19783 +47534 1
+61825 +16308 1
+118822 +43074 1
+96314 +117995 1
The left column is the real part of the output data. The next column is the imaginary part. The last column is the overflow indication bit. The C-Model is tested using the GCC compiler in a Linux environment.
Steps to run FFT C-Model in Linux environment:

1. Provide the correct parameters required for your purpose in the fft_user_defines.h file.
2. Place the input file in the same folder where you are running the model.
3. Type: $ gcc fft_bitacc_cmodel.c -lm
4. This produces an executable a.out.
5. Type: $ ./a.out
6. The output file is created in the same folder.
Directory structure
Figure 4 shows the directory structure after unpacking the release package. Make sure the directory structure is correct before using the core.
Figure 4: Top level directory
Figure 5: Interface folder containing testcases
Figure 6: Simulation folder
Figure 7: Testcase folder arrangement
Compile & Simulate the Design

The following steps are required for running the regression for the available testcases. Each testcase is described in a spreadsheet (fft_test_cases.xls). If a particular testcase needs to be simulated, refer to the list in the spreadsheet fft_test_cases.xls in the DOCUMENT folder.
1. Before you start the simulation, ensure that the Modelsim present working directory is set to the \data\dsp_cores\fft\SIMULATION folder.
2. Open the file \data\dsp_cores\fft\SIMULATION\scripts\run_r2_all.do and set the variable ELIBS to the path where the files below are available:
   a) nxfc_logic_bram.v
   b) nxfc_logic_bram_wrapper.v
   c) nxfc_logic_core.v
   d) eip_nx_bram_2p_v2.v
   e) ecell_delay.v
   f) eip_nx_bram_v2.v
   Note: These files should be in the same folder.
3. Repeat the above step for the file \data\dsp_cores\fft\SIMULATION\scripts\run_r4_all.do.
4. Once the directory is set, go to the Modelsim command/transcript window and type do ./scripts/run_all.do. Modelsim calls the macro run_all.do and executes its commands.
5. The FFT is implemented for both the Radix-2 and Radix-4 architectures, and the run_all.do macro runs both. To run only the Radix-2 architecture, type do ./scripts/run_r2_all.do in the transcript window; to run only the Radix-4 architecture, type do ./scripts/run_r4_all.do.
6. Open the simulation script run_r2_all.do or run_r4_all.do, located in the folder \data\dsp_cores\fft\SIMULATION\scripts\, in any text editor. This file has the commands to run each of the test cases. Test case names are assigned to the TESTCASE variable in the script. The configuration used for a particular test case is available in the fft_config.vh file under the folder \data\dsp_cores\fft\INTERFACE\<testcase_name>.
7. Once the simulation for a test case is finished, the message **** Simulation End **** is displayed in the Modelsim command/transcript window.
8. To generate the verdict report, go to the folder \data\dsp_cores\fft\SIMULATION\ in Modelsim, then type do fft_gen_verdict_rpt.tcl
9. The final PASS/FAIL status of the test cases is given in the verdict report (verdict.rpt) in the folder data\dsp_cores\fft\TEST. The verdict indicates whether any value mismatch has occurred. The report does not contain the reason for a failure or a dump of the DUT.
10. RTL dumps of each test case are written to \data\dsp_cores\fft\TESTBENCH\simdata\<testcase>\report\<testcase>.rpt
NOTE:
1. Before compiling any testcase, set the variable ELIBS to the correct path where the files for the simulation of the single-port RAM are available.
2. For running one testcase, the following changes are required:
   a. Go to the ./scripts/run_r2_all.do file.
   b. Look for quietly set TESTCASE {<testcase_name>}. This holds the list of the testcases to run.
   c. Set TESTCASE to the particular testcase that needs to be run.
   d. After the modification, follow the steps above for compilation and simulation of the core.
   e. The same procedure applies to the Radix-4 architecture; only the list in the file ./scripts/run_r4_all.do needs to be changed.
Script Descriptions
The scripts for running a testcase are available in the folder SIMULATION/scripts/. The following files are present:
1) compile_all.do           // Compilation of Radix-2 architecture source & TB files
2) compile_r4_all.do        // Compilation of Radix-4 architecture source & TB files
3) run_r2_all.do            // Compilation of library files; loading the design & running the simulation of Radix-2 architecture files
4) run_r4_all.do            // Compilation of library files; loading the design & running the simulation of Radix-4 architecture files
5) run_all.do               // Simulation of both Radix-2 & Radix-4 architectures
6) fft_gen_verdict_rpt.tcl  // Generation of final verdict
compile_all.do
This script compiles the source files and test bench files required for the Radix-2 architecture. The script compiles the following files.
Test bench files
a) fft_tb_top.v
b) fft_clock.v
c) fft_data_driver.v
d) fft_protocol_checker.v
e) fft_result_analyser.v
f) fft_throughput.v
g) fft_test_script_driver.v
RTL files
a) fft_r2_top_rtl.v
b) fft_r2_comparator_rtl.v
c) fft_r2_counter_rtl.v
d) fft_r2_masking_rtl.v
e) fft_r2_nfft_gen_rtl.v
f) fft_r2_addr_ctrl_rtl.v
g) fft_r2_addr_manip_rtl.v
h) fft_r2_data_reordr_rtl.v
i) fft_r2_twi_spt_rtl.v
j) fft_r2_comp_rtl.v
k) fft_r2_scale_rtl.v
l) fft_mux_2to1_rtl.v
m) fft_delay_line_rtl.v
n) fft_delay_line_enb_rtl.v
o) fft_cmplx_add_rtl.v
p) fft_cmult_rtl.v
q) fft_cmplx_sub_rtl.v
r) fft_saturation_indication_rtl.v
s) fft_saturation_rtl.v
t) fft_r2_phase_mem_rtl.v
u) fft_r2_data_mem_rtl.v
compile_r4_all.do
This script compiles the source files and test bench files required for the Radix-4 architecture. The script compiles the following files.
Test bench files
a) fft_tb_top_r4
b) fft_clock.v
c) fft_data_driver.v
d) fft_protocol_checker.v
e) fft_result_analyser.v
f) fft_throughput.v
g) fft_test_script_driver.v
RTL files
a) fft_r4_top_rtl.v
b) fft_r4_comparator_rtl.v
c) fft_r4_masking_rtl.v
d) fft_r4_nfft_gen_rtl.v
e) fft_r4_counter_rtl.v
f) fft_r4_comp_rtl.v
g) fft_r4_scale_rtl.v
h) fft_r4_addr_ctrl_rtl.v
i) fft_r4_addr_manip_rtl.v
j) fft_r4_data_reordr_rtl.v
k) fft_r4_twi_spt_rtl.v
l) fft_r4_comparator_rtl.v
m) fft_generic_mux_rtl.v
n) fft_mux_2to1_rtl.v
o) fft_delay_line_rtl.v
p) fft_delay_line_enb_rtl.v
q) fft_cmplx_add_rtl.v
r) fft_cmult_rtl.v
s) fft_cmplx_sub_rtl.v
t) fft_saturation_indication_rtl.v
u) fft_saturation_rtl.v
v) fft_r4_phase_mem_rtl.v
w) fft_r4_data_mem_rtl.v
run_r2_all.do
This script creates the work directory when the user runs it for the first time. It compiles the following library files, which are specific to eASIC technology:
nxfc_logic_bram.v
nxfc_logic_bram_wrapper.v
nxfc_logic_core.v
eip_nx_bram_2p_v2.v
ecell_delay.v
eip_nx_bram_v2.v
The script then calls compile_all.do to compile the source and TB files. To run the regression or a particular testcase, follow the procedure given in the section above. This script is for simulation of the Radix-2 FFT architecture.

run_r4_all.do
This script is very similar to the Radix-2 script run_r2_all.do. The only difference is that this script is for the simulation of the Radix-4 architecture.

run_all.do
This is the top-level script, which in turn calls the run_r2_all.do and run_r4_all.do scripts.

fft_gen_verdict_rpt.tcl
This script generates the verdict file, which reports the status of all the testcases. The report also provides the time stamp of each of the test cases.
An Example Testcase
Select a testcase to be run. Consider running TV_FFT_21h. According to the testcase register (.xls), TV_FFT_21h has the following configuration:
1) Run-time configurable
2) N point = 1024
3) iFFT
4) Data width = 36 (Real = 18 & Imag = 18)
5) Phase width = 36 (Real = 18 & Imag = 18)
Hence the fft_config.vh file under the directory \ip_libs\data\dsp_cores\fft\INTERFACE\TV_FFT_21h\, which is the configuration file for the core, should have the following definitions:
    `define N_POINT 1024
    `define FFT_DATA_WIDTH 36
    `define FFT_PHASE_WIDTH 36
    `define RUN_TIME_N_CONFIG
    `define STATIC_IFFT
In addition, the user can also define the clock period and duty cycle. This completes the FFT core configuration.
The next step is to change the testcase name in run_r2_all.do and run the simulation. In the run_r2_all.do file, which is under the folder \ip_libs\data\dsp_cores\fft\SIMULATION\scripts, set the test case name to be run as follows and save the file:
    quietly set TESTCASE {" TV_FFT_21h "}
Run the script using the command
    do ../scripts/run_r2_all.do
After the simulation is complete, the output report file can be viewed for the final results. The report is written to the file TV_FFT_21h.rpt under the folder \ip_libs\data\dsp_cores\fft\TESTBENCH\simdata\TV_FFT_21h\reports
References
1) J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, 3rd Ed.
Revision History
Date        Version     Summary of Changes
01/28/2009  v1.0 ds001  Initial release
01/28/2009  v1.0 ds002  Removal of Block Floating Point Scaling