FFT v1 0 ds002
eASIC
Note: * Level 2 denotes that the core has been taken through the eASIC design flow including synthesis, placement and routing.
Features
- Device support for Nextreme-2 and Nextreme devices
- FFT point sizes from 8 to 16K points in powers of 2 (e.g. 256, 512, 1024)*
- Fixed-point C-Model available for system modeling
- Support for two architectures, Radix-2 and Radix-4 Loop Engine, trading off area vs. latency
- Support for both FFT and iFFT, run-time configurable
- Optional run-time configurable transform length
- Input data bit width: 2s complement, 8 to 18 bits
- Phase factor bit width: 2s complement, 8 to 18 bits
- Convergent rounding
- Decimation in Frequency (DIF) FFT
- Scaling: fixed
- Bit-reversed or natural order input
- Complete Verilog RTL code
- Testbench for simulation
*The Radix-4 Loop Engine only supports point sizes that are powers of 4.

Release Information
The files and documents contained in this release of the eASIC FFT IP Core function are listed below:
1) RTL files in Verilog
2) Bit-true C-Model
3) Test bench for RTL simulation, with test vectors covering the FFT features
4) Documentation (data sheet, test case list, functional verification plan, testcase register)
5) Scripts for a regression run or an individual run of testcases

Rev: FFT_v1_0_ds002 www.eASIC.com
CONFIDENTIAL
Nextreme
Note: All implementations use 16-bit Data and Phase Factors & 1.2v library
Table 1:
FFT Architecture | eCells | bRAM
R-2 Loop Engine | 3,504 | 5
R-2 Loop Engine | 3,585 | 5
R-2 Loop Engine | 3,563 | 5
R-2 Loop Engine | 3,613 | 10
R-4 Loop Engine | 9,169 | 11
R-4 Loop Engine | 9,188 | 11
R-2 Loop Engine | 3,585 | 5
R-2 Loop Engine | 3,519 | 5
R-2 Loop Engine | 3,563 | 5
R-2 Loop Engine | 3,557 | 5
R-4 Loop Engine | 9,169 | 11
R-4 Loop Engine | 9,078 | 11
R-2 Loop Engine | 3,802 | 40
R-4 Loop Engine | 9,353 | 44
General Description
The formula for evaluating the forward DFT is

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \qquad k = 0, 1, \ldots, N-1$$

where n and k range from 0 to N-1. The inverse DFT differs from the forward DFT only in that its phase factor is the conjugate of the forward phase factor.

The Fast Fourier Transform (FFT) is an efficient algorithm for computing the DFT of a given block of input data. It applies a divide-and-conquer rule so that the long computation is broken down into smaller, repetitive ones. This repetitive structure is called the butterfly structure. It can be implemented so that it takes 2 inputs at a time (Radix-2) or 4 inputs at a time (Radix-4). The eASIC FFT core supports both Radix-2 and Radix-4 butterfly architectures for computation of the DFT. Both architectures use the Decimation in Frequency (DIF) decomposition method. The iFFT is calculated by conjugating the phase factors of the corresponding forward FFT and scaling the result by 1/N.

An input frame is always processed by a loop engine structure, and the computation takes place in 3 stages:
1) Loading stage (takes input data into the core)
2) Computation stage (computes the DFT)
3) Unloading stage (delivers the data after computation)

The user can select between 2 options for this FFT core:
1) Radix-2 Loop Engine
2) Radix-4 Loop Engine

Figure 1 illustrates the throughput and area difference between the two architectures.
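The decomposition described above can be sketched in floating point as follows. This is only an illustrative model, not the core's fixed-point implementation: the function names `dft` and `fft_dif_radix2` are hypothetical, and no scaling schedule or rounding is modelled. It shows a Radix-2 DIF butterfly loop, the bit-reversed order in which an in-place DIF pass leaves its results, and the iFFT obtained by conjugating the phase factors and scaling by 1/N.

```python
import cmath


def dft(x):
    """Direct O(N^2) evaluation of the forward DFT, used only as a reference."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out


def fft_dif_radix2(x, inverse=False):
    """Radix-2 decimation-in-frequency FFT (illustrative, floating point).

    The butterflies run in place and leave the result in bit-reversed
    order; the final step reorders it to natural order.
    """
    a = list(x)
    n_pts = len(a)
    stages = n_pts.bit_length() - 1
    if n_pts < 1 or (1 << stages) != n_pts:
        raise ValueError("length must be a power of 2")
    sign = 1 if inverse else -1          # conjugate phase factors for the iFFT
    span = n_pts // 2
    while span >= 1:
        for start in range(0, n_pts, 2 * span):
            for j in range(span):
                w = cmath.exp(sign * 2j * cmath.pi * j / (2 * span))
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v               # upper butterfly output
                a[start + j + span] = (u - v) * w  # lower output times phase factor
        span //= 2
    out = [a[bit_reverse(k, stages)] for k in range(n_pts)]
    if inverse:
        out = [value / n_pts for value in out]     # 1/N scaling of the iFFT
    return out
```

Comparing `fft_dif_radix2(x)` against `dft(x)` for a short frame is a quick sanity check of the butterfly ordering.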
Radix-4 Loop Engine
This architecture uses the Radix-4 butterfly structure for the computation of the FFT. It is faster than the Radix-2 Loop Engine, as 4 complex inputs are processed every clock cycle. However, the higher throughput requires more resources. A block diagram of the Radix-4 computation is shown in the figure below. This core supports scaled fixed-point arithmetic.
[Figure: Radix-4 computation block diagram, with stages ST1, ST2 and ST3 and phase factors TF 1, TF 2 and TF 3]
The bit-reversed order is simple to calculate: take the index of the data point, written in binary, and reverse the order of the bits. Hence, 0000, 0001, 0010, 0011, 0100,... (0, 1, 2, 3, 4,...) becomes 0000, 1000, 0100, 1100, 0010,... (0, 8, 4, 12, 2,...). In the case of the Radix-4 Loop Engine, the reversal applies to pairs of bits. Hence, 0000, 0001, 0010, 0011, 0100,... (0, 1, 2, 3, 4,...) becomes 0000, 0100, 1000, 1100, 0001,... (0, 4, 8, 12, 1,...), as the pairs of bits are reversed. When the transform size requires an odd number of index bits, the odd bit in the least significant place is moved to the most significant place, so 00000, 00001, 00010, 00011, 00100,... (0, 1, 2, 3, 4,...) becomes 00000, 10000, 00100, 10100, 01000,... (0, 16, 4, 20, 8,...).
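The index reordering described above can be sketched as two small functions. The names `bit_reverse` and `digit_reverse_radix4` are illustrative and are not taken from the core or its C-Model; the logic follows only the rules stated in the text.

```python
def bit_reverse(index, bits):
    """Radix-2 ordering: reverse all `bits` bits of `index`."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out


def digit_reverse_radix4(index, bits):
    """Radix-4 ordering: reverse the order of the 2-bit pairs of `index`.

    When `bits` is odd, the least significant bit is first moved to the
    most significant place and the remaining bits are reversed in pairs.
    """
    odd_bit = 0
    pair_bits = bits
    if bits % 2:
        odd_bit = index & 1
        index >>= 1
        pair_bits = bits - 1
    out = 0
    for _ in range(pair_bits // 2):
        out = (out << 2) | (index & 3)
        index >>= 2
    return (odd_bit << pair_bits) | out


print([bit_reverse(i, 4) for i in range(5)])           # [0, 8, 4, 12, 2]
print([digit_reverse_radix4(i, 4) for i in range(5)])  # [0, 4, 8, 12, 1]
print([digit_reverse_radix4(i, 5) for i in range(5)])  # [0, 16, 4, 20, 8]
```

The three printed sequences reproduce the three examples given in the text.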
Scaling
The FFT processes an array of data by successive passes over the input data array. On each pass, the algorithm performs Radix-4 or Radix-2 butterflies, where each butterfly picks up four or two complex numbers, respectively, and returns four or two complex numbers to the same memory. The numbers returned to memory by the core are potentially larger than the numbers picked up from memory, so a strategy must be employed to accommodate this dynamic range expansion.

For a Radix-4 Loop Engine FFT, the values computed in a butterfly stage can grow by up to 3 bits. For Radix-2, the growth is up to 2 bits. Currently, only one option is available to handle this bit growth (v2.0 will support Block Floating Point):
1. Scaling at each stage using a fixed scaling schedule

When using scaling, a scaling schedule is used to scale down by a factor of 1, 2, 4, or 8 in each stage. If the scaling is insufficient, a butterfly output may grow beyond the dynamic range and cause an overflow. As a result of the scaling applied in the FFT implementation, the transform computed is a scaled transform. If a Radix-4 algorithm scales by a factor of 4 in each stage, the overall scaling is equal to the factor of 1/N in the inverse FFT equation. For Radix-2, scaling by a factor of 2 in each stage provides the factor of 1/N. Otherwise, additional scaling is necessary.
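The arithmetic behind a fixed scaling schedule can be sketched as below. All names here are hypothetical, and applying convergent (round-half-to-even) rounding at the per-stage shift is an assumption based on the feature list, not a confirmed detail of the core.

```python
def total_scaling(schedule):
    """Overall divisor applied by a schedule of per-stage shifts (0 to 3 bits)."""
    return 1 << sum(schedule)


def scale_convergent(value, shift):
    """Shift `value` right by `shift` bits, rounding ties to the even result.

    Assumption: the core rounds this way when it scales a stage output.
    """
    if shift == 0:
        return value
    floor = value >> shift             # arithmetic shift, rounds toward -infinity
    remainder = value - (floor << shift)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and floor & 1):
        floor += 1
    return floor


def overflows(value, width):
    """True if `value` does not fit a signed 2s complement word of `width` bits."""
    return not -(1 << (width - 1)) <= value < (1 << (width - 1))


# Radix-2, N = 128: seven stages, one bit per stage gives the 1/N factor.
assert total_scaling([1] * 7) == 128
# Radix-4, N = 1024: five stages, two bits per stage gives the 1/N factor.
assert total_scaling([2] * 5) == 1024
```

A schedule whose shifts sum to less than log2(N) leaves the result larger than the 1/N-scaled transform, which is where the overflow risk mentioned above comes from.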
[Figure: FFT core block diagram showing the Data Memory and Radix Computation blocks, the Tw_addr signal, and the Start, Done and Data valid signals]
Data Memory
This block implements the following key functions:
a) Stores the input data given by the user.
b) Stores the intermediate data after the Radix computation.
c) Takes complex data as input.
d) Uses block memory, with a registered output.
Radix Computation
This block implements the following key functions:
a) Contains the basic butterfly structure (Radix-2 or Radix-4).
b) Accepts complex data as input and gives out complex data.
[Figure: FFT CORE port diagram. Inputs: clk_i, rst_ni, ce_i, xn_re_i, xn_im_i, start_i, unload_i, nfft_i, nfft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i. Outputs: config_o, xk_re_o, xk_im_o, xn_index_o, xk_index_o, rfd_o, busy_o, dv_o, done_o, blk_exp_o, ovflo_o]
Port Interface
Table 2: Input Port Descriptions
Port Name | Direction | Width | Description
clk_i | Input | 1 | Rising-edge clock
rst_ni | Input | 1 | Master asynchronous reset (Active Low)
ce_i | Input | 1 | Clock enable (Active High)
xn_re_i | Input | B | Input data bus: real component (B = 8 to 18) in 2s complement format
xn_im_i | Input | B | Input data bus: imaginary component (B = 8 to 18) in 2s complement format
start_i | Input | 1 | FFT start signal (Active High): START is asserted to begin the data loading and transform calculation (for the Burst I/O architectures)
unload_i | Input | 1 | When this port is high, unloading happens in natural order; when low, unloading happens in bit-reversed order
nfft_i | Input | 5 | Specifies the transform length the core is configured with. The point size is 2^nfft_i. If this port is zero, the smallest point size supported by the architecture is selected
nfft_we_i | Input | 1 | Write enable for the NFFT port
fwd_inv_i | Input | 1 | Control signal that indicates whether a forward FFT or an inverse FFT is performed. When FWD_INV = 1, a forward transform is computed. When FWD_INV = 0, an inverse transform is computed
fwd_inv_we_i | Input | 1 | Write enable for FWD_INV (Active High)
scale_sch_i | Input | 2 x ceil(number_of_stage/2) for Radix-4 Loop Engine or 2 x (number_of_stages) for Radix-2 Loop Engine | Scaling schedule: specified with two bits for each stage, starting at the two LSBs. The scaling can be specified as 3, 2, 1, or 0, which represents the number of bits to be shifted from the computed result of the Radix block. For N = 128, Radix-2, one possible scaling schedule is [1, 1, 1, 1, 0, 1, 2]
scale_sch_we_i | Input | 1 | Write enable for SCALE_SCH (Active High). This port is available only with scaled arithmetic and not with full precision
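As a rough illustration of the two encoded ports, the sketch below packs a scaling schedule into a scale_sch_i word and derives the nfft_i value for a point size. The helper names are hypothetical. Which stage of the transform occupies the two LSBs is not stated in the text, so the mapping of list position 0 to the two LSBs is an assumption.

```python
def pack_scale_schedule(schedule):
    """Pack per-stage shifts (each 0 to 3) into one word, two bits per stage.

    Assumption: schedule[0] occupies the two LSBs, schedule[1] the next
    two bits, and so on.
    """
    word = 0
    for position, shift in enumerate(schedule):
        if not 0 <= shift <= 3:
            raise ValueError("each stage shift must be 0, 1, 2 or 3")
        word |= shift << (2 * position)
    return word


def nfft_value(n_point):
    """Value to drive on nfft_i so that the point size equals 2 ** nfft_i."""
    if n_point < 1 or n_point & (n_point - 1):
        raise ValueError("point size must be a power of 2")
    return n_point.bit_length() - 1


# The N = 128, Radix-2 example schedule needs 2 x 7 = 14 bits.
word = pack_scale_schedule([1, 1, 1, 1, 0, 1, 2])
print(f"{word:014b}")  # 10010001010101
print(nfft_value(128))  # 7
```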
Table 3: Output Port Descriptions
Port Name | Direction | Width | Description
config_o | Output | 1 | Indicates that the core is still in the configuration stage (that is, the core is still evaluating the phase factors). Nothing should be done until this signal goes low
xk_re_o | Output | B | Output data bus: real component in 2s complement format
xk_im_o | Output | B | Output data bus: imaginary component in 2s complement format
xn_index_o | Output | FFT_INDEX_WIDTH | Index of input data (the maximum point size is the point size configured when generating the core)
xk_index_o | Output | FFT_INDEX_WIDTH | Index of output data (the maximum point size is the point size configured when generating the core)
rfd_o | Output | 1 | Ready for data (Active High): RFD is High during the load operation
busy_o | Output | 1 | Core computation stage (Active High): this signal goes High while the core is computing the transform
dv_o | Output | 1 | Data valid (Active High): this signal is High when valid data is presented on the output bus
done_o | Output | 1 | FFT computation complete strobe (Active High): DONE transitions High for one clock cycle when the transform calculation has completed for the frame
blk_exp_o | Output | 1 | Block exponent: the number of bits scaled for every point in the data frame. Available only when block floating point is used
ovflo_o | Output | 1 | Arithmetic overflow indicator (Active High): OVFLO is High during result unloading if any value in the data frame overflowed. The OVFLO signal is reset at the beginning of a new frame of data. This port is optional and only available with scaled arithmetic
[Figure: timing diagram of data loading, showing n_fft_i, n_fft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_i, xn_index_i (0 to n-1) and x(n)]
[Figure: timing diagram of a complete frame (load, compute, unload), showing n_fft_i, n_fft_we_i, fwd_inv_i, fwd_inv_we_i, scale_sch_i, scale_sch_we_i, start_i, rfd_i, xn_index_i, x(n), busy_o, done_o, dv_o, xk_index_o (0 to n-1) and X(k)]
Table 4: User Parameters for the FFT
Parameter Name | Values | Description
data_width | 16 to 36 | Real = data_width/2, Imag = data_width/2
phase_width | 16 to 36 | Real = phase_width/2, Imag = phase_width/2
n_point | 8 to 16K | Specifies the Fourier transform length: powers of 2 (8, 16, 32, ...) for the Radix-2 architecture, powers of 4 (16, 64, 256, ...) for the Radix-4 architecture
 |  | Makes the core configurable to take bit-reversed order input
 |  | Configures the core to compute only the iFFT
 |  | Makes the core configurable for run-time FFT/iFFT computation
NX | Defines | If defined, the core works for the Nextreme device. If not defined, the core works for the N2X device (Nextreme-2 device)
RUN_TIME_N_CONFIG | Defines | Makes the FFT core run-time N-point configurable
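The n_point constraint in Table 4 can be checked with a few lines before a configuration is generated. This is a sketch only: `is_valid_n_point` is a hypothetical helper, and it follows the Table 4 wording (powers of 2 for Radix-2, powers of 4 for Radix-4, within 8 to 16K) rather than any behaviour verified against the RTL.

```python
def is_valid_n_point(n_point, radix):
    """Check a transform length against the limits stated in Table 4."""
    if radix not in (2, 4):
        raise ValueError("radix must be 2 or 4")
    if not 8 <= n_point <= 16 * 1024:
        return False
    if n_point & (n_point - 1):          # more than one bit set: not a power of 2
        return False
    exponent = n_point.bit_length() - 1  # log2 of the point size
    if radix == 4:
        return exponent % 2 == 0         # powers of 4 have an even log2
    return True
```

For example, 512 passes for Radix-2 but not for Radix-4, while 1024 passes for both.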
Component Instantiation
The FFT v1.0 can be instantiated into Verilog or VHDL code (VHDL will require a mixed-mode design flow). Below are component declarations for Verilog and VHDL design flows.
component fft_r2_top_rtl is
  generic(
    data_width    : ... := ...;
    phase_width   : ... := ...;
    n_point       : ... := ...;
    latency_radix : ... := ...
  );
  port(
    clk_i          : in  std_logic;
    rst_ni         : in  std_logic;
    ce_i           : in  std_logic;
    xn_re_i        : in  std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
    xn_im_i        : in  std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
    start_i        : in  std_logic;
    unload_i       : in  std_logic;
    nfft_i         : in  std_logic_vector(4 downto 0);
    nfft_we_i      : in  std_logic;
    fwd_inv_i      : in  std_logic;
    fwd_inv_we_i   : in  std_logic;
    scale_sch_i    : in  std_logic_vector(SCALING_WIDTH - 1 downto 0);
    scale_sch_we_i : in  std_logic;
    config_o       : out std_logic;
    xk_re_o        : out std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
    xk_im_o        : out std_logic_vector(FFT_DATA_WIDTH/2 - 1 downto 0);
    xn_index_o     : out std_logic_vector(FFT_INDEX_WIDTH - 1 downto 0);
    xk_index_o     : out std_logic_vector(FFT_INDEX_WIDTH - 1 downto 0);
    rfd_o          : out std_logic;
    busy_o         : out std_logic;
    dv_o           : out std_logic;
    done_o         : out std_logic;
    blk_exp_o      : out std_logic;
    ovflo_o        : out std_logic
  );
end component;
Bit-Accurate C Model
The C-Model is designed for bit-accurate modelling of the FFT core. The model produces exactly the same result as the Verilog implementation of the FFT core. Note that the C-Model is not cycle accurate and does not model interface or clock latency.

The files provided with the C-Model are:
1. fft_bitacc_cmodel.c - The complete C-Model
2. fft_inter_parameter.h - Internal parameters required for the IP
3. fft_user_defines.h - User parameters for the FFT

The C-Model mainly consists of two functions, which mimic the two loop engine architectures:
1. r2_fft_m() for the Radix-2 architecture
2. r4_fft_m() for the Radix-4 architecture
System Requirements
A 64-bit C compiler is required to use the C-Model
User Defines
Edit the "fft_user_defines.h" file to set the required parameters for the C-Model.

Table 5: User Parameters for the C Model
Parameter Name | Description
N_POINT | The point size of a transform
RADIX2 | Define RADIX2 if Radix-2 computation is required
RADIX4 | Define RADIX4 if Radix-4 computation is required
FFT_DATA_WIDTH | FFT data width, including real and imaginary data
FFT_PHASE_WIDTH | FFT phase factor width, including real and imaginary data
NO_OF_FRAMES | Number of frames
F1_N_POINT | First frame N_POINT value
F2_N_POINT | Second frame N_POINT value. As many F*_N_POINT defines are needed as the NO_OF_FRAMES value
F1_FWD_INV | First frame transform value
F2_FWD_INV | Second frame transform value. For a forward transform the value is 1, for a reverse transform the value is 0
F1_SCA_VAL | First frame scaling value
F2_SCA_VAL | Second frame scaling value. This value indicates the scaling after each stage
SCALING_EN | Whether to include scaling or not. If 1, scaling is enabled. If 0, scaling is disabled and default scaling is applied
STATIC_IFFT | When STATIC_IFFT = 0, negates the imaginary values of the phase factors to calculate the IFFT in a multi-frame transform
input | Input file name
 | Output file name
 | Prints intermediate stage results
 | Prints phase factor values
 | Array of the F*_N_POINT values
 | Array of the F*_FWD_INV values
 | Array of the F*_SCA_VAL values
In "fft_user_defines.h", the character array "input" specifies the input file name. This file should contain the input data to be transformed. The data should be in decimal format, with the real and imaginary values separated by a space. An example is shown below:
+19783 +47534 +61825 +16308 +118822 +43074 +96314 +117995 ...
The leftmost column is the real part and the right column is the imaginary part.
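A reader for this input format might look like the sketch below. It is not part of the C-Model: `parse_samples` is a hypothetical helper, and it assumes one sample per line with the real value first and the imaginary value second, as the description above suggests.

```python
def parse_samples(text):
    """Parse signed decimal 'real imag' pairs, one sample per line.

    Assumption: each non-empty line holds exactly two whitespace-separated
    integers, real part first.
    """
    samples = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 2 values, got {len(fields)}")
        samples.append((int(fields[0]), int(fields[1])))  # int() accepts a leading '+'
    return samples


# Illustrative data only; the second line is made up for the example.
print(parse_samples("+19783 +47534\n-12 +7\n"))  # [(19783, 47534), (-12, 7)]
```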
Directory structure
Figure 4 shows the directory structure after unpacking the release package. Make sure the directory structure is correct before using the core:
8. To generate the verdict report, go to the folder \data\dsp_cores\fft\SIMULATION\ in ModelSim, then type
   do fft_gen_verdict_rpt.tcl
9. The final PASS/FAIL status of the test cases is written to the verdict report (verdict.rpt) in the folder data\dsp_cores\fft\TEST. The verdict indicates whether any value mismatch occurred. The report does not contain the reason for a failure or a dump of the DUT.
NOTE:
1. Before compiling any testcase, set the variable ELIBS to the path where the files for the simulation of the single-port RAM are available.
2. To run a single testcase, the following changes are required:
   a. Go to the ./scripts/run_r2_all.do file.
   b. Look for quietly set TESTCASE {<testcase_name>}. This holds the list of the testcases that need to run.
   c. Set TESTCASE to the particular testcase that needs to be run.
   d. After the modification, follow the steps mentioned above for compilation and simulation of the core.
   e. The same procedure applies to the Radix-4 architecture. Only the list in the file ./scripts/run_r4_all.do needs to be changed.
Script Descriptions
The scripts to run a testcase are available in the folder SIMULATION/scripts/. The following files are present:
1) compile_all.do     // Compilation of Radix-2 architecture source & TB files
2) compile_r4_all.do  // Compilation of Radix-4 architecture source & TB files
3) run_r2_all.do      // Compilation of library files. Loading the design & running the simulation of Radix-2 architecture files
4) run_r4_all.do      // Compilation of library files. Loading the design & running the simulation of Radix-4 architecture files
5) run_all.do         // Simulation of both Radix-2 & Radix-4 architecture
compile_all.do
This script compiles the source files and test bench files required for the Radix-2 architecture. The files it compiles are listed below.

Test bench files
a) fft_tb_top.v
b) fft_clock.v
c) fft_data_driver.v
d) fft_protocol_checker.v
e) fft_result_analyser.v
f) fft_throughput.v
g) fft_test_script_driver.v

RTL files
a) fft_r2_top_rtl.v
b) fft_r2_comparator_rtl.v
c) fft_r2_counter_rtl.v
d) fft_r2_masking_rtl.v
e) fft_r2_nfft_gen_rtl.v
f) fft_r2_addr_ctrl_rtl.v
g) fft_r2_addr_manip_rtl.v
h) fft_r2_data_reordr_rtl.v
i) fft_r2_twi_spt_rtl.v
j) fft_r2_comp_rtl.v
k) fft_r2_scale_rtl.v
l) fft_mux_2to1_rtl.v
m) fft_delay_line_rtl.v
n) fft_delay_line_enb_rtl.v
o) fft_cmplx_add_rtl.v
p) fft_cmult_rtl.v
q) fft_cmplx_sub_rtl.v
r) fft_saturation_indication_rtl.v
s) fft_saturation_rtl.v
t) fft_r2_phase_mem_rtl.v
u) fft_r2_data_mem_rtl.v
compile_r4_all.do
This script compiles the source files and test bench files required for the Radix-4 architecture. The files it compiles are listed below.

Test bench files
a) fft_tb_top_r4
b) fft_clock.v
c) fft_data_driver.v
d) fft_protocol_checker.v
e) fft_result_analyser.v
f) fft_throughput.v
g) fft_test_script_driver.v

RTL files
a) fft_r4_top_rtl.v
b) fft_r4_comparator_rtl.v
c) fft_r4_masking_rtl.v
d) fft_r4_nfft_gen_rtl.v
e) fft_r4_counter_rtl.v
f) fft_r4_comp_rtl.v
g) fft_r4_scale_rtl.v
h) fft_r4_addr_ctrl_rtl.v
i) fft_r4_addr_manip_rtl.v
j) fft_r4_data_reordr_rtl.v
k) fft_r4_twi_spt_rtl.v
l) fft_r4_comparator_rtl.v
m) fft_generic_mux_rtl.v
n) fft_mux_2to1_rtl.v
o) fft_delay_line_rtl.v
p) fft_delay_line_enb_rtl.v
q) fft_cmplx_add_rtl.v
r) fft_cmult_rtl.v
s) fft_cmplx_sub_rtl.v
t) fft_saturation_indication_rtl.v
u) fft_saturation_rtl.v
v) fft_r4_phase_mem_rtl.v
w) fft_r4_data_mem_rtl.v
run_r2_all.do
This script is for simulation of the Radix-2 FFT architecture. It creates the work directory when the user runs it for the 1st time. The following library files, which are specific to eASIC technology, are compiled:
nxfc_logic_bram.v
nxfc_logic_bram_wrapper.v
nxfc_logic_core.v
eip_nx_bram_2p_v2.v
ecell_delay.v
eip_nx_bram_v2.v
The script then calls the compile_all.do script to compile the source and TB files. To run a regression or a particular testcase, follow the procedure given in the section above.
run_r4_all.do
This script is very similar to the Radix-2 run_r2_all.do script. The only difference is that this script is for the simulation of the Radix-4 architecture.

run_all.do
This is the top-level script, which in turn calls the run_r2_all.do and run_r4_all.do scripts.

fft_gen_verdict_rpt.tcl
This script generates the verdict file, which reports the status of all the testcases. The report also provides the time stamp of each test case.
An Example Testcase
Select a testcase to be run. Consider running TV_FFT_21h. According to the testcase register (.xls), TV_FFT_21h has the following configuration:
1) Run-time configurable
2) N point = 1024
3) iFFT
4) Data width = 36 (Real = 18 & Imag = 18)
5) Phase width = 36 (Real = 18 & Imag = 18)

Hence the fft_config.vh file under the directory \ip_libs\data\dsp_cores\fft\INTERFACE\TV_FFT_21h\, which is the configuration file for the core, should contain the following definitions:
    `define N_POINT 1024
    `define FFT_DATA_WIDTH 36
    `define FFT_PHASE_WIDTH 36
    `define RUN_TIME_N_CONFIG
    `define STATIC_IFFT
In addition, the user can also define the clock period and duty cycle. This completes the FFT core configuration.

The next step is to change the testcase name in run_r2_all.do and run the simulation. In the run_r2_all.do file, which is under the folder \ip_libs\data\dsp_cores\fft\SIMULATION\scripts, set the test case name to be run as follows and save the file:
    quietly set TESTCASE {"TV_FFT_21h"}
Run the script using the command
    do ../scripts/run_r2_all.do
After the simulation is complete, the output report file can be viewed for the final results. The report is available in the file TV_FFT_21h.rpt under the folder \ip_libs\data\dsp_cores\fft\TESTBENCH\simdata\TV_FFT_21h\reports
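When several testcases need their own fft_config.vh, the definitions can be generated rather than typed by hand. The sketch below only renders the text of the defines used in this example. `render_fft_config` is a hypothetical helper and not one of the release scripts, and it does not write to the INTERFACE directory.

```python
def render_fft_config(n_point, data_width, phase_width, flags=()):
    """Return the text of a configuration header with the given defines.

    `flags` lists value-less defines such as RUN_TIME_N_CONFIG or STATIC_IFFT.
    """
    lines = [
        f"`define N_POINT {n_point}",
        f"`define FFT_DATA_WIDTH {data_width}",
        f"`define FFT_PHASE_WIDTH {phase_width}",
    ]
    lines += [f"`define {flag}" for flag in flags]
    return "\n".join(lines) + "\n"


# Configuration of the TV_FFT_21h example above.
print(render_fft_config(1024, 36, 36, ("RUN_TIME_N_CONFIG", "STATIC_IFFT")), end="")
```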
References
1) Proakis & Manolakis, Digital Signal Processing - Principles, Algorithms & Applications, 3rd Ed.
Revision History
Date | Version | Summary of Changes
01/28/2009 | v1.0 ds001 | Initial release
01/28/2009 | v1.0 ds002 | Removal of Block Floating Point Scaling