FFT v1 0 ds002


CONFIDENTIAL

Fast Fourier Transform v1.0



Introduction

The Fast Fourier Transform (FFT) IP core implements a computationally efficient algorithm, the Cooley-Tukey algorithm, for computing the Discrete Fourier Transform (DFT) of a given input data set (real or complex). It is optimized for the eASIC Nextreme-2 and Nextreme devices, with a focus on the throughput characteristics required for OFDM modulation / demodulation as well as other applications requiring FFTs.

Implementation Summary

Families Supported        Nextreme-2, Nextreme
Design File Formats       Verilog
Certification             Level 2*
Implementation Details    See Performance and Resource Section
Support                   eASIC

Note: * Level 2 denotes that the core has been taken through the eASIC design flow including synthesis, placement and routing.

Features

- Device support for Nextreme-2 and Nextreme devices
- FFT point sizes from 8 to 16K points in powers of 2 (e.g. 256, 512, 1024)*
- Fixed point C-Model available for system modeling
- Support for two architectures, Radix-2 & Radix-4 Loop Engine, trading off area vs latency
- Support for both FFT & iFFT, run-time configurable
- Optional run-time configurable transform length
- Input data bit width: 2s complement, 8 to 18 bits
- Phase factor bit width: 2s complement, 8 to 18 bits
- Convergent rounding
- Decimation in Frequency (DIF) FFT
- Scaling: fixed
- Bit reversed or natural order input
- Complete Verilog RTL code
- Testbench for simulation

* The Radix-4 Loop Engine only supports point sizes that are powers of 4.

Release Information

Below is a list of the files and documents contained in this release of the eASIC FFT IP Core function:
1) RTL files in Verilog
2) Bit true C-Model
3) Test bench for RTL simulation with test vectors covering the FFT features
4) Documentation (data sheet, test case list, functional verification plan, testcase register)
5) Scripts for regression run or individual run of testcases

Rev: FFT_v1_0_ds002 www.eASIC.com

Performance and Resource Utilization


The following table lists the maximum clock performance, corresponding transform time and resource usage for a selected set of parameters. The eCell usage changes significantly with the timing constraints used in the target design: the tighter the constraints, the larger the eCell count. The latency is measured from asserting the START input to the last sample of output data leaving the core, assuming that the UNLOAD input is asserted immediately after the transform is completed.

The following device family is detailed in the table below: Nextreme.

For the determination of maximum frequency, the core was generated with double registers on each input and output. The registers directly connected to the core run on the core clock, whereas the outer registers run off a separate clock. This ensures that all paths in the core are included in the timing constraint without artificially distorting the design to fit the chip. The device voltage library used for the implementation is specified at the top of each table.

Nextreme

Note: All implementations use 16-bit data and phase factors and the 1.2 V library.

Table 1: Performance and Resource Usage for Nextreme

FFT Architecture   Point Size   Input Order   Run Time N   Performance (MHz)   Latency (Cycles)   Latency (µs)   eCells   bRAM
R-2 Loop Engine    512          Natural       Yes          178                 3,479              19.51          3,504    5
R-2 Loop Engine    1024         Natural       Yes          181                 7,335              40.52          3,585    5
R-2 Loop Engine    2048         Natural       Yes          178                 15,543             87.07          3,563    5
R-2 Loop Engine    4096         Natural       Yes          178                 32,967             184.68         3,613    10
R-4 Loop Engine    1024         Natural       Yes          177                 3,440              19.43          9,169    11
R-4 Loop Engine    4096         Natural       Yes          176                 14,469             82.18          9,188    11
R-2 Loop Engine    1024         Reverse       Yes          181                 7,335              40.52          3,585    5
R-2 Loop Engine    1024         Reverse       No           180                 7,335              40.63          3,519    5
R-2 Loop Engine    2048         Reverse       Yes          178                 15,543             87.07          3,563    5
R-2 Loop Engine    2048         Reverse       No           181                 15,543             85.87          3,557    5
R-4 Loop Engine    1024         Reverse       Yes          176                 3,440              19.43          9,169    11
R-4 Loop Engine    1024         Reverse       No           181                 7,335              40.52          9,078    11
R-2 Loop Engine    16384        Natural       No           180                 147,687            820.48         3,802    40
R-4 Loop Engine    16384        Natural       No           174                 61,594             353.58         9,353    44
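The latency columns of Table 1 are related by the clock frequency: cycles divided by the clock rate in MHz gives microseconds. The sketch below (the helper name `latency_us` is hypothetical, not part of the release) reproduces a table entry to within the rounding of the published frequency; small differences against other rows are expected because the MHz column is rounded.

```python
def latency_us(cycles, clock_mhz):
    """Transform time in microseconds for a cycle count and a clock in MHz."""
    return cycles / clock_mhz


# Radix-2 Loop Engine, 1024 points, from Table 1: 7,335 cycles at 181 MHz.
print(round(latency_us(7335, 181), 2))
```

The same helper can be used to estimate the transform time at a lower system clock than the maximum listed in the table.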


General Description
The formulae for evaluating the DFT are:

Forward DFT

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$$

where k ranges from 0 to N-1.

Inverse DFT

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{+j 2\pi n k / N}$$

where n ranges from 0 to N-1.

Note that the inverse DFT differs from the forward DFT only in that the phase factor is the conjugate of the forward one and the result is scaled by 1/N.

The Fast Fourier Transform is an efficient algorithm to find the DFT of a given block of input data. A divide and conquer rule is applied so that the long computation is broken down into smaller repetitive ones. This repetitive structure is called the butterfly structure. It can be implemented so that it takes 2 inputs at a time (Radix-2) or 4 inputs at a time (Radix-4). The eASIC FFT core supports both Radix-2 and Radix-4 butterfly architectures for computation of the DFT. Both architectures use the Decimation in Frequency (DIF) decomposition method. The iFFT is calculated by conjugating the phase factors of the corresponding forward FFT & scaling the result by 1/N.

The computation of an input frame always uses a loop engine structure and takes place in 3 stages:
1) Loading stage (takes input data into the core)
2) Computation stage (computes the DFT)
3) Unloading stage (gives out the data after computation)

The FFT core has 2 options that the user can select between:
1) Radix-2 Loop Engine
2) Radix-4 Loop Engine

Figure 1 illustrates the throughput and area difference between the two architectures.
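The relationship between the forward and inverse transforms can be checked with a direct (non-fast) DFT. The following is a floating-point sketch for illustration only; the function name `dft` is an assumption and it does not model the core's fixed-point arithmetic. The inverse path uses the conjugated phase factor and divides by N.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) DFT; inverse=True conjugates the phase factor and scales by 1/N."""
    n_points = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            acc += x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_points)
        out.append(acc / n_points if inverse else acc)
    return out


frame = [1 + 0j, 2j, -1 + 0.5j, 0.25 - 3j]
restored = dft(dft(frame), inverse=True)
print(all(abs(a - b) < 1e-9 for a, b in zip(frame, restored)))
```

Applying the forward and then the inverse transform returns the original frame, which is a convenient sanity check when comparing against any FFT implementation.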


Figure 1: Resource Utilization vs Throughput of the FFT architectures

Radix-2 Loop Engine


This architecture uses a Radix-2 butterfly structure for the FFT computation. It is the smallest-area implementation of the FFT computation. Figure 2 shows the Radix-2 computation block.

[Figure 2 block labels: Radix-2, Twiddle factor Memory]

Figure 2: Conceptual block diagram of Radix-2 Loop Engine
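As an illustration of a loop over Radix-2 DIF butterflies, the sketch below runs the stages in place on one array, in floating point. It is not the core's RTL or C-Model; the function name `fft_r2_dif` is hypothetical. A DIF decomposition computed in place leaves its results in bit-reversed index order.

```python
import cmath


def fft_r2_dif(x):
    """In-place radix-2 DIF FFT. Output position p holds X(bit-reversed p)."""
    data = list(x)
    n_points = len(data)
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("length must be a power of 2")
    span = n_points // 2                      # distance between butterfly inputs
    while span >= 1:
        for start in range(0, n_points, 2 * span):
            for j in range(span):
                # Phase (twiddle) factor for this butterfly.
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))
                a = data[start + j]
                b = data[start + j + span]
                data[start + j] = a + b            # upper output
                data[start + j + span] = (a - b) * w  # lower output, then twiddle
        span //= 2
    return data
```

Each pass over the array corresponds to one stage of the loop engine: two values are read, combined, and written back to the same locations.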

Radix-4 Loop Engine

This architecture uses a Radix-4 butterfly structure for the computation of the FFT. It is faster than the Radix-2 Loop Engine, as 4 complex inputs are processed every clock cycle. However, the higher throughput requires more resources. A block diagram of the Radix-4 computation is shown in Figure 3. This core supports scaled fixed point arithmetic.

[Figure 3 block labels: ST1, ST2, ST3, TF 1, TF 2, TF 3, j, Radix-4, Twiddle factor Memory]

Figure 3: Conceptual block diagram of the Radix-4 Engine
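A Radix-4 DIF butterfly can be written as a 4-point DFT followed by three phase-factor multiplications. The sketch below uses the textbook form in floating point; the function names and the mapping of the three multipliers to the figure's TF 1 to TF 3 labels are assumptions, not taken from the RTL.

```python
import cmath


def twiddles(n_index, n_points):
    """Phase factors W^n, W^2n, W^3n for butterfly index n of an N-point stage."""
    return tuple(cmath.exp(-2j * cmath.pi * m * n_index / n_points) for m in (1, 2, 3))


def radix4_dif_butterfly(a, b, c, d, w1=1, w2=1, w3=1):
    """4-point DFT of (a, b, c, d); outputs 1..3 are then multiplied by w1..w3."""
    y0 = a + b + c + d
    y1 = (a - 1j * b - c + 1j * d) * w1
    y2 = (a - b + c - d) * w2
    y3 = (a + 1j * b - c - 1j * d) * w3
    return y0, y1, y2, y3
```

The multiplications by j need no multiplier in hardware (they swap real and imaginary parts with a sign change), so only the three phase-factor products are costly.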

Run Time Configurable Point Size


Both architectures support changing the point size on a frame-by-frame basis. When this option is selected, an input port is provided to set the desired point size. There is a minor size increase in both architectures when this option is selected. This capability is often required for wireless communications applications like OFDM systems (WiMAX, LTE) where the point size routinely changes over short time intervals.

Natural or Bit Reversed Order Input / Output


Both architectures provide the option of Natural or Reversed order of data input and output. Natural order is where the data points are output in the same order as the input data points, i.e., 0, 1, 2, 3, and so on.

The Bit Reverse order is simple to calculate: take the index of the data point, written in binary, and reverse the order of the bits. Hence, 0000, 0001, 0010, 0011, 0100,... (0, 1, 2, 3, 4,...) becomes 0000, 1000, 0100, 1100, 0010,... (0, 8, 4, 12, 2,...). In the case of the Radix-4 Loop Engine, the reversal applies to pairs of bits. Hence, 0000, 0001, 0010, 0011, 0100,... (0, 1, 2, 3, 4,...) becomes 0000, 0100, 1000, 1100, 0001,... (0, 4, 8, 12, 1,...), as the pairs of bits are reversed. When the transform size requires an odd number of index bits, the odd bit in the least significant place is moved to the most significant place, so 00000, 00001, 00010, 00011, 00100,... (0, 1, 2, 3, 4,...) becomes 00000, 10000, 00100, 10100, 01000,... (0, 16, 4, 20, 8,...).
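The index mappings described above can be sketched as two small functions. The names `bit_reverse` and `digit_reverse_r4` are hypothetical; the behaviour follows the worked examples in the text, including the odd-width case where the least significant bit moves to the most significant place.

```python
def bit_reverse(index, bits):
    """Reverse the order of the lowest `bits` bits of index (Radix-2 ordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def digit_reverse_r4(index, bits):
    """Reverse the order of 2-bit pairs (Radix-4 ordering).

    For an odd number of index bits, the least significant bit is first
    moved to the most significant place and the remaining pairs are reversed.
    """
    msb = 0
    if bits % 2:
        msb = (index & 1) << (bits - 1)
        index >>= 1
        bits -= 1
    result = 0
    for _ in range(bits // 2):
        result = (result << 2) | (index & 3)
        index >>= 2
    return msb | result


print([bit_reverse(i, 4) for i in range(5)])       # [0, 8, 4, 12, 2]
print([digit_reverse_r4(i, 4) for i in range(5)])  # [0, 4, 8, 12, 1]
print([digit_reverse_r4(i, 5) for i in range(5)])  # [0, 16, 4, 20, 8]
```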

Scaling
The FFT processes an array of data by successive passes over the input data array. On each pass, the algorithm performs Radix-4 or Radix-2 butterflies, where each butterfly picks up four or two complex numbers, respectively, and returns four or two complex numbers to the same memory. The numbers returned to memory by the core are potentially larger than the numbers picked up from memory, so a strategy must be employed to accommodate this dynamic range expansion. For a Radix-4 Loop Engine FFT, the values computed in a butterfly stage can grow by up to 3 bits. For Radix-2, the growth is up to 2 bits.

Currently, only one option is available to handle this bit growth (v2.0 will support Block Floating Point):
1. Scaling at each stage using a fixed-scaling schedule

When using scaling, a scaling schedule is used to scale by a factor of 1, 2, 4, or 8 in each stage. If scaling is insufficient, a butterfly output may grow beyond the dynamic range and cause an overflow. As a result of the scaling applied in the FFT implementation, the transform computed is a scaled transform. If a Radix-4 algorithm scales by a factor of 4 in each stage, the overall scaling factor is equal to the factor of 1/N in the inverse FFT equation. For Radix-2, scaling by a factor of 2 in each stage provides the factor of 1/N. Otherwise, additional scaling is necessary.
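The effect of a fixed scaling schedule can be sketched with integer arithmetic. In the code below, `total_scaling` gives the overall divisor of a schedule of per-stage shifts, and `scale_convergent` shows a right shift with round-half-to-even. Both names are hypothetical, and applying convergent rounding at the scaling shift is an assumption: the data sheet lists convergent rounding as a feature but does not say where in the datapath it is applied.

```python
def scale_convergent(value, shift):
    """Divide an integer by 2**shift, rounding ties to the nearest even result."""
    if shift == 0:
        return value
    quotient = value >> shift                # floor division, also for negatives
    remainder = value & ((1 << shift) - 1)   # always in [0, 2**shift)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient


def total_scaling(schedule):
    """Overall divisor applied by a schedule of per-stage shifts (0 to 3 bits)."""
    if any(not 0 <= s <= 3 for s in schedule):
        raise ValueError("each stage shift must be 0, 1, 2 or 3")
    return 2 ** sum(schedule)


# Radix-2, N = 1024: ten stages, one bit each, gives a divisor of N.
print(total_scaling([1] * 10))   # 1024
# Radix-4, N = 1024: five stages, two bits each, gives the same divisor.
print(total_scaling([2] * 5))    # 1024
```

A schedule whose divisor is smaller than N trades overflow margin for output precision, which is why the overflow indicator matters when the schedule is relaxed.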


FFT Loop Engine Structure


Figure 4 shows the block diagram of the complete FFT structure (Radix-2 Loop Engine core taken for illustration)

[Figure 4 block labels: Address Control Generation, Data Memory, Radix Computation, Data Reorder Block, Twiddle factor Memory. Signal labels: Start, Done, Data valid, Ctrl_sigs, W_addr, R_addr, Tw_addr, Input data, Output data]

Figure 4: Conceptual block diagram of the FFT core

Address Generation Control


This block is the main controlling block for the entire FFT operation and implements the following key functions:
a) Controls the entire core
b) Generates the read & write addresses for the data
c) Generates the read address for fetching the phase factors
d) Indicates when the Loading, Computation & Unloading stages occur
e) Generates the data valid and done signals


Data Memory
This block implements the following key functions:
a) Stores the input data given by the user
b) Stores the intermediate data after the Radix computation
c) Takes complex data as input
d) Uses block memory, with a registered output

Radix Computation
This block implements the following key functions:
a) Contains the basic butterfly structure (Radix-2 or Radix-4)
b) Accepts complex data as input & gives out complex data

Data Reorder Block


This block implements the following key functions:
a) Transposes the result of the butterfly structure before it is written back to memory. This is required because the computation is done in place.
b) In the loading/unloading stage, directs the input/output data between the input/output pins and the memory.

Phase factor memory


a) This block stores the phase factors for the given N-point value.
b) At start-up the core is in the initialization stage. In this stage the phase factors are computed & stored in the block RAM.
c) The user should not supply data during the initialization stage. Data should be fed only after the config_o signal goes low.
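A phase factor table of the kind stored in this memory could be generated as in the sketch below. Everything specific here is an assumption for illustration: the function name, the table depth of N/2 entries, the full-scale value of 2^(width-1) - 1, and the use of round-half-to-even (Python's built-in `round`) for quantization. The core's actual initialization logic may differ.

```python
import math


def phase_factor_table(n_points, width):
    """Quantized (real, imag) pairs of exp(-j*2*pi*k/N) for k = 0 .. N/2 - 1.

    width is the number of bits per component in 2s complement.
    """
    full_scale = (1 << (width - 1)) - 1
    table = []
    for k in range(n_points // 2):
        angle = -2.0 * math.pi * k / n_points
        table.append((round(math.cos(angle) * full_scale),
                      round(math.sin(angle) * full_scale)))
    return table


for real, imag in phase_factor_table(8, 16):
    print(real, imag)
```

For an inverse transform the same table can be reused with the sign of the imaginary component flipped, which matches the conjugation described in the General Description.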


FFT Core Symbol


Figure 5 shows the core symbol of the FFT core.

[Figure 5 symbol pins. Inputs: clk_i, rst_ni, ce_i, xn_re_i, xn_im_i, start_i, unload_i, nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i. Outputs: config_o, xk_re_o, xk_im_o, xn_index_o, xk_index_o, rfd_o, busy_o, dv_o, done_o, blk_exp_o, ovflo_o]

Figure 5: FFT core symbol


Port Interface
Table 2: Input Port Descriptions

Each entry gives the port name, its direction and width (where listed), and its description.

clk_i (Input, width 1): Rising-edge clock.

rst_ni (Input, width 1): Master asynchronous reset (Active Low).

ce_i (Input, width 1): Clock enable (Active High).

xn_re_i (Input, width B): Input data bus: real component (B = 8 to 18) in 2s complement format.

xn_im_i (Input): Input data bus: imaginary component (B = 8 to 18) in 2s complement format.

start_i (Input): FFT start signal (Active High): START is asserted to begin the data loading and transform calculation (for the Burst I/O architectures).

unload_i (Input): When this port is high, unloading happens in natural order; when low, unloading happens in bit reverse order.

nfft_i (Input): Specifies the N value that the user configures the core with. The point size is 2^nfft_i. If this port is zero, the least value is selected (according to the architecture).

nfft_we_i (Input): Write enable for the NFFT port.

fwd_inv_i (Input): Control signal that indicates whether a forward FFT or an inverse FFT is performed. When FWD_INV = 1, a forward transform is computed. If FWD_INV = 0, an inverse transform is computed.

fwd_inv_we_i (Input, width 1): Write enable for FWD_INV (Active High).

scale_sch_i (Input, width 2 x ceil(number_of_stages/2) for Radix-4 Loop Engine or 2 x (number_of_stages) for Radix-2 Loop Engine): Scaling schedule. The scaling schedule is specified with two bits for each stage, starting at the two LSBs. The scaling can be specified as 3, 2, 1, or 0, which represents the number of bits to be shifted from the computed result of the Radix block. For N = 128, Radix-2, one possible scaling schedule is [1, 1, 1, 1, 0, 1, 2].

scale_sch_we_i (Input): Write enable for SCALE_SCH (Active High). This port is available only with scaled arithmetic and not with full precision.
Table 3: Output Port Descriptions

config_o (Output): Indicates that the core is still in the configuration stage (that is, the core is still evaluating the phase factors). Nothing should be done until this signal goes low.

xk_re_o (Output, width B): Output data bus: real component in 2s complement format.

xk_im_o (Output, width B): Output data bus: imaginary component in 2s complement format.

xn_index_o (Output, width log2(max pt size)): Index of input data. (Here the maximum point size is the point size that is configured while generating the core.)

xk_index_o (Output, width log2(max pt size)): Index of output data. (Here the maximum point size is the point size that is configured while generating the core.)

rfd_o (Output): Ready for data (Active High): RFD is High during the load operation.

busy_o (Output): Core computation stage (Active High): this signal goes High while the core is computing the transform.

dv_o (Output): Data valid (Active High): this signal is High when valid data is presented at the output bus.

done_o (Output): FFT computation complete strobe (Active High): DONE transitions High for one clock cycle when the transform calculation has completed for the frame.

blk_exp_o (Output): Block exponent: the number of bits scaled for every point in the data frame. Available only when block floating point is used.

ovflo_o (Output): Arithmetic overflow indicator (Active High): OVFLO is High during result unloading if any value in the data frame overflowed. The OVFLO signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic.


I/O Data Flow Architectures


All architectures currently supported by FFT v1.0 use a buffered I/O data flow. Users can adapt them to buffered streaming with an external memory, as long as the data rate is slow enough for the FFT to have processed the current FFT frame before the complete

Input Data Flow Waveform


Figure 1 shows the signals to observe when feeding data into the core.
1) config_o should be low before feeding the data (i.e., before the start pulse is asserted).
2) The run-time configuration signals should be asserted before the start pulse.
3) After the start pulse is asserted, the rfd_o signal goes high one clock later.
4) The user should feed each data sample on the next positive clock edge after getting its index.

[Figure 1 waveform signals: nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_o, xn_index_o (0, 1, 2, ..., n-1), x(n) (0, 1, 2, ..., n-1)]

Figure 1: Input data flow

Output Data Flow Waveform


Figure 2 shows the signals to observe when reading the output data.
1) When the busy_o signal is deasserted, done_o goes high for one clock pulse.
2) After 3 clock pulses, the data valid signal goes high.
3) The index and the data are presented in the same clock pulse.

[Figure 2 waveform signals: busy_o, done_o, dv_o, xk_index_o (0, 1, 2, ..., n-1), X(k) (0, 1, 2, ..., n-1)]

Figure 2: Output data flow

Top level Timing diagram

[Figure 3 waveform signals: nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_o, xn_index_o, x(n), busy_o, done_o, dv_o, xk_index_o, X(k)]

Figure 3: Top level Timing diagram


FFT User Parameters


The table below details the user parameters to configure the core.

Table 4: User Parameters for the FFT

Parameter Name       Values     Description
data_width           16 to 36   Real = data_width/2, Imag = data_width/2
phase_width          16 to 36   Real = phase_width/2, Imag = phase_width/2
n_point              8 to 16K   Specifies the Fourier transform length: powers of 2 (8, 16, 32, ...) for the Radix-2 architecture, powers of 4 (16, 64, 256, ...) for the Radix-4 architecture
INPUT_ORDER          Defines    Makes the core configurable for taking bit reverse order input
STATIC_IFFT          Defines    Configures the core to compute only the IFFT
DYN_FFT_IFFT         Defines    Makes the core configurable for run-time FFT/iFFT computation
NX                   Defines    If defined, the core works for the Nextreme device. If NOT defined, the core works for the N2X device (Nextreme-2)
RUN_TIME_N_CONFIG    Defines    Makes the FFT core run-time N-point configurable


Component Instantiation
The FFT v1.0 can be instantiated into Verilog or VHDL code (VHDL will require a mixed-mode design flow). Below are component declarations for Verilog and VHDL design flows.

Verilog Module Declaration


Here the Radix-2 architecture is taken as an example.

    module fft_r2_top_rtl (
        clk_i, rst_ni, ce_i,
        xn_re_i, xn_im_i,
        start_i, unload_i,
        nfft_i, nfft_we_i,
        fwd_inv_i, fwd_inv_we_i,
        scale_sch_i, scale_sch_we_i,
        config_o,
        xk_re_o, xk_im_o,
        xn_index_o, xk_index_o,
        rfd_o, busy_o, dv_o, done_o,
        blk_exp_o, ovflo_o
    );


VHDL Component Declaration

    component fft_r2_top_rtl is
      generic(
        data_width    : natural := `FFT_DATA_WIDTH;
        phase_width   : natural := `FFT_PHASE_WIDTH;
        n_point       : natural := `N_POINT;
        latency_radix : natural := 5
      );
      port(
        clk_i          : in  std_logic;
        rst_ni         : in  std_logic;
        ce_i           : in  std_logic;
        xn_re_i        : in  std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        xn_im_i        : in  std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        start_i        : in  std_logic;
        unload_i       : in  std_logic;
        nfft_i         : in  std_logic_vector(4 downto 0);
        nfft_we_i      : in  std_logic;
        fwd_inv_i      : in  std_logic;
        fwd_inv_we_i   : in  std_logic;
        scale_sch_i    : in  std_logic_vector(SCALING_WIDTH - 1 downto 0);
        scale_sch_we_i : in  std_logic;
        config_o       : out std_logic;
        xk_re_o        : out std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        xk_im_o        : out std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
        xn_index_o     : out std_logic_vector(FFT_INDEX_WIDTH - 1 downto 0);
        xk_index_o     : out std_logic_vector(FFT_INDEX_WIDTH - 1 downto 0);
        rfd_o          : out std_logic;
        busy_o         : out std_logic;
        dv_o           : out std_logic;
        done_o         : out std_logic;
        blk_exp_o      : out std_logic;
        ovflo_o        : out std_logic
      );
    end component;


Bit-Accurate C Model
The C-Model is designed for bit-accurate modelling of the FFT core: it produces exactly the same results as the Verilog implementation of the FFT core. Note that the C-Model is not cycle accurate and does not model interface or clock latency.

The files provided with the C-Model are:
1. fft_bitacc_cmodel.c - The complete C-Model
2. fft_inter_parameter.h - Internal parameters required for the IP
3. fft_user_defines.h - User parameters for the FFT

The C-Model mainly consists of two functions which mimic the two loop engine architectures:
1. r2_fft_m() for the Radix-2 architecture
2. r4_fft_m() for the Radix-4 architecture

System Requirements
A 64-bit C compiler is required to use the C-Model


User Defines
Edit the "fft_user_defines.h" file to set the required parameters for the C-Model.

Table 5: User Parameters for the C Model

Parameter Name      Description
N_POINT             The point size of a transform
RADIX2              Define RADIX2 if radix-2 computation is required
RADIX4              Define RADIX4 if radix-4 computation is required
FFT_DATA_WIDTH      FFT data width including real and imaginary data
FFT_PHASE_WIDTH     FFT phase factor width including real and imaginary data
NO_OF_FRAMES        Number of frames
F1_N_POINT          First frame N_POINT value
F2_N_POINT          Second frame N_POINT value. Depending on the NO_OF_FRAMES value, that many F*_N_POINT defines are needed
F1_FWD_INV          First frame transform value
F2_FWD_INV          Second frame transform value. Forward transform = 1, reverse transform = 0
F1_SCA_VAL          First frame scaling value
F2_SCA_VAL          Second frame scaling value. This value indicates the scaling after each stage
SCALING_EN          Whether to include scaling or not. If it is 1, scaling is enabled; if it is 0, scaling is disabled and default scaling is applied
STATIC_IFFT         When STATIC_IFFT = 0, negates the imaginary values of the phase factor to calculate the IFFT in a multi-frame transform
input               Input file name
output              Output file name
EN_STAGE_RESULT     Prints intermediate stage results
PRINT_TWIDDLE       Prints phase factor values
F_N_POINT           Array of the F*_N_POINT values
F_FWD_INV           Array of the F*_FWD_INV values
F_SCA_VAL           Array of the F*_SCA_VAL values

Input Data File Format

In "fft_user_defines.h" the character array "input" specifies the input file name. This file should contain the input data to be transformed. The data should be in decimal format, with the real and imaginary values separated by a space. An example is shown below:

+19783 +47534
+61825 +16308
+118822 +43074
+96314 +117995
.

The left column is the real part and the right column is the imaginary part.
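For preparing or checking stimulus files outside the C-Model, the input format described above can be read with a few lines of code. This is a minimal sketch; the function name `parse_input_samples` is hypothetical and is not part of the release. Lines that do not hold two numeric fields (such as a trailing ".") are skipped.

```python
def parse_input_samples(text):
    """Return a list of (real, imag) integer pairs from decimal input text."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or terminator
        samples.append((int(fields[0]), int(fields[1])))
    return samples


example = "+19783 +47534\n+61825 +16308\n+118822 +43074\n+96314 +117995\n.\n"
print(parse_input_samples(example))
```

Because only the first two fields of each line are used, the same reader also accepts lines that carry a third column.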

Output Data File Format


In "fft_user_defines.h" the character array "output" specifies the output file name. This file contains the transformed values of the input data. The data is in decimal format and contains the real part, the imaginary part and the overflow indication bit for that frame. An example is shown below:

+19783 +47534 1
+61825 +16308 1
+118822 +43074 1
+96314 +117995 1
.

The left column is the real part of the output data, the next column is the imaginary part, and the last column is the overflow indication bit.

The C-Model is tested using the GCC compiler in the Linux environment.

Steps to run FFT C-Model in Linux environment:


1. Provide the correct parameters required for your purpose in the fft_user_defines.h file.
2. Place the input file in the same folder where you are running the model.
3. Type: $ gcc fft_bitacc_cmodel.c -lm
4. This produces an executable a.out.
5. Type: $ ./a.out
6. The output file is created in the same folder.


Directory structure
Figure 4 shows the directory structure after unpacking the release package. Make sure the directory structure is correct before using the core:

Figure 4: Top level directory

Figure 5: Interface folder containing testcases

Figure 6: Simulation folder

Figure 7: Testcase folder arrangement

Compile & Simulate the Design


The following steps are required for running the regression for the available testcases. Each testcase is described in the spreadsheet fft_test_cases.xls in the DOCUMENT folder; refer to that list if a particular testcase needs to be simulated.

1. Before you start simulation, ensure that the Modelsim present working directory is set to the \data\dsp_cores\fft\SIMULATION folder.
2. Open the file \data\dsp_cores\fft\SIMULATION\scripts\run_r2_all.do and set the variable ELIBS to the path where the files below are available:
   a) nxfc_logic_bram.v
   b) nxfc_logic_bram_wrapper.v
   c) nxfc_logic_core.v
   d) eip_nx_bram_2p_v2.v
   e) ecell_delay.v
   f) eip_nx_bram_v2.v
   Note: These files should be in the same folder.
3. Repeat the above step for the file \data\dsp_cores\fft\SIMULATION\scripts\run_r4_all.do.
4. Once the directory is set, go to the Modelsim command/transcript window and type do ./scripts/run_all.do. Modelsim calls the macro run_all.do and executes its commands.
5. The FFT is implemented for both the Radix-2 and Radix-4 architectures, and the run_all.do macro runs both. To run only the Radix-2 architecture, type do ./scripts/run_r2_all.do in the transcript window; to run only the Radix-4 architecture, type do ./scripts/run_r4_all.do.
6. Open the simulation script run_r2_all.do/run_r4_all.do, located in the folder \data\dsp_cores\fft\SIMULATION\scripts\, in any text editor. This file has commands to run each of the test cases. Test case names are assigned to the TESTCASE variable in the script. The configuration used for a particular test case is available in the fft_config.vh file under the folder \data\dsp_cores\fft\INTERFACE\<testcase_name>.
7. Once the simulation for a test case is finished, the message **** Simulation End **** is displayed in the Modelsim command/transcript window.
8. To generate the verdict report, go to the folder \data\dsp_cores\fft\SIMULATION\ in Modelsim, then type do fft_gen_verdict_rpt.tcl
9. The final PASS/FAIL status of the test cases is given in the verdict report (verdict.rpt) in the folder data\dsp_cores\fft\TEST. The verdict indicates whether any value mismatch has occurred. The report does not contain the reason for a failure or a dump of the DUT.
10. RTL dumps of each test case are written to \data\dsp_cores\fft\TESTBENCH\simdata\<testcase>\report\<testcase>.rpt

NOTE:
1. Before compilation of any testcase, set the variable ELIBS to the correct path where the files for the simulation of the single port RAM are available.
2. For running one testcase, the following changes are required:
   a. Go to the ./scripts/run_r2_all.do file.
   b. Look for quietly set TESTCASE {<testcase_name>}. This holds the list of the testcases that will be run.
   c. Set TESTCASE to the particular testcase that needs to be run.
   d. After the modification, follow the steps mentioned above for compilation & simulation of the core.
   e. The same procedure applies to the Radix-4 architecture; only the list in the file ./scripts/run_r4_all.do needs to be changed.

Script Descriptions
The scripts to run a testcase are available in the folder SIMULATION/scripts/. The following files are present:
1) compile_all.do // Compilation of Radix-2 architecture source & TB files
2) compile_r4_all.do // Compilation of Radix-4 architecture source & TB files
3) run_r2_all.do // Compilation of library files. Loading the design & running the simulation of Radix-2 architecture files
4) run_r4_all.do // Compilation of library files. Loading the design & running the simulation of Radix-4 architecture files
5) run_all.do // Simulation of both Radix-2 & Radix-4 architectures
6) fft_gen_verdict_rpt.tcl // Generation of the final verdict

compile_all.do

This script compiles the source files & test bench files required for the Radix-2 architecture. The script compiles the following files:

Test bench files
a) fft_tb_top.v
b) fft_clock.v
c) fft_data_driver.v
d) fft_protocol_checker.v
e) fft_result_analyser.v
f) fft_throughput.v
g) fft_test_script_driver.v

RTL files
a) fft_r2_top_rtl.v
b) fft_r2_comparator_rtl.v
c) fft_r2_counter_rtl.v
d) fft_r2_masking_rtl.v
e) fft_r2_nfft_gen_rtl.v
f) fft_r2_addr_ctrl_rtl.v
g) fft_r2_addr_manip_rtl.v
h) fft_r2_data_reordr_rtl.v
i) fft_r2_twi_spt_rtl.v
j) fft_r2_comp_rtl.v
k) fft_r2_scale_rtl.v
l) fft_mux_2to1_rtl.v
m) fft_delay_line_rtl.v
n) fft_delay_line_enb_rtl.v
o) fft_cmplx_add_rtl.v
p) fft_cmult_rtl.v
q) fft_cmplx_sub_rtl.v
r) fft_saturation_indication_rtl.v
s) fft_saturation_rtl.v
t) fft_r2_phase_mem_rtl.v
u) fft_r2_data_mem_rtl.v

compile_r4_all.do

This script compiles the source files & test bench files required for the Radix-4 architecture. The script compiles the following files:

Test bench files
a) fft_tb_top_r4
b) fft_clock.v
c) fft_data_driver.v
d) fft_protocol_checker.v
e) fft_result_analyser.v
f) fft_throughput.v
g) fft_test_script_driver.v

RTL files
a) fft_r4_top_rtl.v
b) fft_r4_comparator_rtl.v
c) fft_r4_masking_rtl.v
d) fft_r4_nfft_gen_rtl.v
e) fft_r4_counter_rtl.v
f) fft_r4_comp_rtl.v
g) fft_r4_scale_rtl.v
h) fft_r4_addr_ctrl_rtl.v
i) fft_r4_addr_manip_rtl.v
j) fft_r4_data_reordr_rtl.v
k) fft_r4_twi_spt_rtl.v
l) fft_r4_comparator_rtl.v
m) fft_generic_mux_rtl.v
n) fft_mux_2to1_rtl.v
o) fft_delay_line_rtl.v
p) fft_delay_line_enb_rtl.v
q) fft_cmplx_add_rtl.v
r) fft_cmult_rtl.v
s) fft_cmplx_sub_rtl.v
t) fft_saturation_indication_rtl.v
u) fft_saturation_rtl.v
v) fft_r4_phase_mem_rtl.v
w) fft_r4_data_mem_rtl.v

run_r2_all.do

This script is for simulation of the Radix-2 FFT architecture. It creates the work directory the first time the user runs it. The following library files (specific to eASIC technology) are compiled:
nxfc_logic_bram.v
nxfc_logic_bram_wrapper.v
nxfc_logic_core.v
eip_nx_bram_2p_v2.v
ecell_delay.v
eip_nx_bram_v2.v

The script then calls the compile_all.do script to compile the source & TB files. To run the regression or a particular testcase, follow the procedure given in the section above.

run_r4_all.do

This script is very similar to the Radix-2 run_r2_all.do script. The only change is that this script is for the simulation of the Radix-4 architecture.

run_all.do

This is the top level script, which in turn calls the run_r2_all.do & run_r4_all.do scripts.

fft_gen_verdict_rpt.tcl

This script generates the verdict file, which gives the status of all the testcases. The report also provides the time stamp of each of the test cases.


An Example Testcase
Select a testcase to be run. Let us consider that TV_FFT_21h is to be run. As per the testcase register .xls, TV_FFT_21h has the following configuration:
1) Run time configurable
2) N point = 1024
3) iFFT
4) Data width = 36 (Real = 18 & Imag = 18)
5) Phase width = 36 (Real = 18 & Imag = 18)

Hence, the fft_config.vh file under the directory \ip_libs\data\dsp_cores\fft\INTERFACE\TV_FFT_21h\, which is the configuration file for the core, should have the following definitions:

    `define N_POINT 1024
    `define FFT_DATA_WIDTH 36
    `define FFT_PHASE_WIDTH 36
    `define RUN_TIME_N_CONFIG
    `define STATIC_IFFT

In addition to this, the user can also define the clock period & duty cycle. This completes the FFT core configuration.

The next step is to change the testcase name in run_r2_all.do and run the simulation. In the run_r2_all.do file, which is under the folder \ip_libs\data\dsp_cores\fft\SIMULATION\scripts, set the test case name to be run as follows and save the file:

    quietly set TESTCASE {"TV_FFT_21h"}

Run the script using the command:

    do ../scripts/run_r2_all.do

After the simulation is complete, the output report file can be viewed for the final results. The report is available in the file TV_FFT_21h.rpt under the folder \ip_libs\data\dsp_cores\fft\TESTBENCH\simdata\TV_FFT_21h\reports
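When several testcase configurations have to be prepared, the define lines listed for a testcase can be generated rather than typed by hand. The sketch below is only an illustration: the function name `render_fft_config` is hypothetical, and a real fft_config.vh may contain further definitions (for example the clock period and duty cycle) that are not modelled here.

```python
def render_fft_config(n_point, data_width, phase_width, flags=()):
    """Return the text of the `define lines for one testcase configuration."""
    lines = [
        f"`define N_POINT {n_point}",
        f"`define FFT_DATA_WIDTH {data_width}",
        f"`define FFT_PHASE_WIDTH {phase_width}",
    ]
    # Value-less defines such as RUN_TIME_N_CONFIG or STATIC_IFFT.
    lines += [f"`define {flag}" for flag in flags]
    return "\n".join(lines) + "\n"


# Configuration listed for testcase TV_FFT_21h.
print(render_fft_config(1024, 36, 36, ["RUN_TIME_N_CONFIG", "STATIC_IFFT"]), end="")
```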



Revision History
Date         Version      Summary of Changes
01/28/2009   v1.0 ds001   Initial release
01/28/2009   v1.0 ds002   Removal of Block Floating Point Scaling
