MJC 010233
MASAUM Journal of Computing, Volume 1 Issue 2, September 2009
III. FPGA INTERCONNECT ARCHITECTURE
Based on the arrangement of the logic and interconnect resources, FPGAs are broadly categorized into four main types.
A. Island Style FPGAs
An island-style FPGA consists of an array of programmable logic blocks connected via vertical and horizontal programmable routing channels [6]. A logic block input or output can connect to the routing channels with the connection box that consists of
IV. CASE STUDY OF XILINX FPGA ARCHITECTURES
Owing to their parallel nature, high operating frequency, and high density, modern FPGAs are an ideal platform for implementing computationally intensive and massively parallel architectures. An FPGA can accommodate multiple processors and control units that work in parallel. This section presents a case study of state-of-the-art FPGAs from Xilinx. These include Spartan-3, Virtex-4 and Virtex-5 FPGAs.
A. Spartan-3 FPGAs
The Spartan-3 FPGA belongs to the fifth generation Xilinx
family. It is specifically designed to meet the needs of high-volume, low-unit-cost electronic systems. The family consists of eight members offering densities ranging from 50,000 to five million system gates [10]. The Spartan-3 FPGA consists of five fundamental programmable functional elements: CLBs, IOBs, Block RAMs, dedicated multipliers (18×18) and digital clock managers (DCMs). The Spartan-3 family includes the Spartan-3L, Spartan-3E, Spartan-3A, Spartan-3A DSP, Spartan-3AN and extended Spartan-3A FPGAs.
Spartan-3L FPGAs consume less static current than the corresponding members of the standard Spartan-3 family. Their ability to operate in hibernate mode lowers device power consumption to the lowest possible levels. The Spartan-3E family builds on the success of the earlier Spartan-3 family by increasing the amount of logic per I/O, significantly reducing the cost per logic cell. The Spartan-3A family builds on the success of the earlier Spartan-3E and Spartan-3 FPGA families by increasing the amount of I/O per logic, significantly reducing the cost per I/O.
The Spartan-3A DSP FPGA extends the Spartan-3A FPGA family by increasing the amount of memory per logic and adding XtremeDSP DSP48A slices. The XtremeDSP DSP48A slices replace the 18×18 multipliers found in the Spartan-3A devices. The Spartan-3AN FPGA family combines all the features of the Spartan-3A FPGA family with in-system flash memory for configuration and nonvolatile data storage. It is well suited to applications such as blade servers, medical devices, automotive infotainment, GPS and other small consumer products. The extended Spartan-3A family includes the non-volatile Spartan-3AN devices, which combine FPGA and flash technologies to provide improved security, protection and functionality, making them suitable for space-critical or secure applications.
B. Virtex-4 FPGAs
Virtex-4 FPGAs are produced on a state-of-the-art 90 nm copper process, using 300 mm wafer technology [11]. The family consists of three platform families: LX, SX and FX. Virtex-4 hard-IP core blocks include the IBM PowerPC (PPC) 405 32-bit reduced instruction set computer (RISC) processors, tri-mode Ethernet media access controllers (MACs), 622 Mbps to 6.5 Gbps serial transceivers, dedicated DSP slices and high-speed clock management circuitry. Virtex-4 devices consume approximately 50% of the power of the respective Virtex-II Pro devices, owing to static and dynamic power reduction enabled by triple-oxide technology and by reduced core voltage and capacitance, respectively. The Virtex-4 FPGA family comprises CLBs, Block RAMs, XtremeDSP slices and DCMs.
Block RAM stores relatively large amounts of data more efficiently than distributed RAM. The Virtex-4 Block RAM resources are 18 Kb true dual-port RAM blocks, programmable from 16K×1 to 512×36 in various depth and width configurations. Each port is fully synchronous and independent. Block RAM is cascadable to implement large embedded storage blocks.
XtremeDSP slices contain a dedicated 18×18-bit 2’s complement signed multiplier, an adder, and a 48-bit accumulator. Each multiplier or accumulator can be used independently, and the slices can be used to implement extremely efficient, high-speed DSP and image processing algorithms [11].
C. Virtex-5 FPGAs
The Virtex-5 devices, built on a 65 nm state-of-the-art copper process technology, are a programmable alternative to custom ASIC technology [12]. The Virtex-5 LX platform also contains many hard-IP system-level blocks, including Block RAM/first-in first-out (FIFO) memory, second-generation 25×18 DSP slices, SelectIO technology with built-in digitally controlled impedance, ChipSync source-synchronous interface blocks, enhanced clock management tiles with integrated DCM and phase-locked loop (PLL) clock generators, and advanced configuration options.
In addition to the regular programmable functional elements, the Virtex-5 family provides power-optimized high-speed serial transceiver blocks for enhanced serial connectivity, tri-mode Ethernet MACs and high-performance PPC 440 microprocessor embedded blocks. Virtex-5 devices also use triple-oxide technology to reduce static power consumption. Their 1.0 V core voltage and 65 nm implementation process also reduce dynamic power consumption compared to Virtex-4 devices.
The Virtex-5 family is the first FPGA platform to offer a real 6-input look-up table (LUT) with fully independent inputs, as shown in Fig. 1. This leads to increased logic fabric performance due to the reduced critical path delay through the LUTs.
Fig. 1. Virtex-5 6-input LUT architecture (Courtesy of Xilinx Inc.)
The 6-input LUT implements significantly more logic than a LUT with four inputs. Power consumption is also reduced because the larger LUT reduces the amount of interconnect required [12]. The Virtex-5 family uses a new diagonally symmetric interconnect pattern to minimize the number of interconnects required from CLB to CLB in order to realize major performance improvements [12].
Advanced DSP48E slices available in Virtex-5 FPGAs help accelerate computation-intensive DSP and
image processing algorithms. These slices can operate at a maximum frequency of 550 MHz, drawing only 1.38 mW/100 MHz.
V. CASE STUDY OF ALTERA FPGA ARCHITECTURES
This section presents a case study of state-of-the-art FPGAs from Altera. These include Cyclone/Cyclone II, Stratix/Stratix GX and Stratix II/Stratix II GX devices.
A. Cyclone/Cyclone II FPGAs
Cyclone FPGAs use a two-dimensional row- and column-based architecture to implement custom logic. They are based on a 1.5 V, 0.13 µm, all-layer-copper SRAM process [13]. Cyclone II FPGAs are based on the TSMC 90 nm low-k dielectric process in order to extend the functionality of Cyclone FPGAs [14].
The smallest unit of logic in the Altera FPGA family is the logic element (LE). Each LE contains a four-input LUT. A collection of ten LEs constitutes a logic array block (LAB), which is a fundamental programmable functional element in Altera FPGAs.
A significant addition to the Cyclone/Cyclone II FPGA feature set is the embedded multiplier block for the efficient implementation of multiplier-intensive DSP functions [14]. In addition, Cyclone FPGAs include embedded memory blocks and PLLs for clock management.
B. Stratix/Stratix GX FPGAs
The Stratix and Stratix GX families are also based on a 1.5 V, 0.13 µm, all-layer-copper SRAM process, with higher densities. Stratix devices offer up to 22 embedded DSP blocks that enable efficient implementation of high-performance filters and multipliers. Stratix devices support various I/O standards and also offer a complete clock management solution, with a hierarchical clock structure providing up to 420 MHz performance and up to 12 PLLs [15].
The Stratix GX family of devices is Altera’s second FPGA family to combine high-speed serial transceivers with a scalable, high-performance logic array. Stratix GX devices include 4 to 20 high-speed transceiver channels, each incorporating clock data recovery (CDR) technology and embedded serializer/deserializer (SERDES) capability at data rates of up to 3.1875 Gbps. The Stratix GX FPGA technology is built upon the Stratix architecture. This scalable, high-performance architecture makes Stratix GX devices ideal for high-speed backplane interface, chip-to-chip, and communications protocol-bridging applications [16].
Stratix/Stratix GX FPGAs use advanced TriMatrix memory, which consists of three types of RAM blocks: M512, M4K and MRAM blocks. In the Stratix family, connections between LEs, TriMatrix memory, DSP blocks, and device I/O pins are provided by the MultiTrack interconnect structure with DirectDrive technology. The MultiTrack interconnect consists of continuous, performance-optimized routing lines of different lengths and speeds used for inter- and intra-design block connectivity. DirectDrive technology is a deterministic routing technology that ensures identical routing resource usage for any function regardless of placement within the device.
C. Stratix II/Stratix II GX FPGAs
The Stratix II and Stratix II GX FPGA families are based on a 1.2 V, 90 nm, all-layer-copper SRAM process. They offer up to 9 Mbits of on-chip TriMatrix memory for demanding, memory-intensive applications and have up to 96 DSP blocks with up to 384 multipliers for efficient implementation of high-performance DSP functions [17]. Stratix II devices support various I/O standards along with support for 1 Gbps source-synchronous signaling with dynamic phase alignment (DPA) circuitry.
Stratix II devices offer a complete clock management solution with an internal clock frequency of up to 550 MHz and up to 12 PLLs. Stratix II devices can decrypt a configuration bitstream using the advanced encryption standard (AES) algorithm to protect designs. Stratix II GX devices have somewhat fewer logic resources than the respective Stratix II devices due to the space occupied by the embedded transceivers [18].
The adaptive logic module (ALM) is the basic building block of the Stratix II architecture. The ALM packs more combinational logic into less area, providing a higher logic density than a standard 4-input LUT architecture and more logic per register, as shown in Fig. 2.
Each ALM contains a variety of LUT-based resources that can be divided between two adaptive LUTs (ALUTs). In addition to the ALUT-based resources, each ALM contains two programmable registers, two dedicated full adders, a carry chain, a shared arithmetic chain, and a register chain. Through these dedicated resources, the ALM can efficiently implement high-performance arithmetic functions and shift registers. Each LAB of the Stratix II FPGA family consists of eight ALMs [17]-[18].
Fig. 2. Block diagram of Stratix II ALM (Courtesy of Altera Inc.)
VI. 3D-FPGA ARCHITECTURE
Although the two-dimensional (2D) FPGA architecture discussed so far has several advantages, such as a high degree of flexibility and inherent parallelism, it suffers from a major problem of long interconnect delays. As discussed in [19]-[20], almost 80% of the total power is dissipated in
interconnects and clock networks. To reduce the interconnect delay, the 3D-FPGA was proposed in [21]-[22].
The 3D-FPGA model is based on 2D-FPGA architectures that are vertically stacked, with interconnects provided between vertically adjacent 3D switch blocks. The vertical stacking reduces the total interconnect length, which in turn reduces interconnect delay and improves performance and speed.
However, the advantages associated with 3D-FPGAs can be fully exploited only when supported by an efficient placement and routing algorithm, which is currently an active research topic. Several research groups from academia and industry are currently working to achieve this objective. This topic is of great importance and will be explored in detail as future work.
VII. CONCLUSION
For efficient hardware implementation of computation-intensive algorithms with proper utilization of available resources, it is important to study the targeted FPGA architecture and technology in detail. The objective of this paper has been to present a state-of-the-art review of advanced FPGA architectures and technologies that can be utilized by designers and researchers working in this area to achieve high-performance and efficient FPGA-based circuits and systems. We have also presented a case study of the most widely used state-of-the-art Xilinx and Altera FPGAs. We also briefly presented the advantages of, and current research in, 3D FPGAs.
REFERENCES
[1] T. J. Todman, G. A. Constantinides, S. J. E. Wilton, O. Mencer, W. Luk, and P. Y. K. Cheung, “Reconfigurable computing: architectures and design methods,” IEE Proc. Computer and Digital Techniques, vol. 152, no. 2, pp. 193–207, Mar. 2005.
[2] G. R. Goslin, “A Guide to Using Field Programmable Gate Arrays for Application-Specific Digital Signal Processing Performance,” Microelectronics Jour., vol. 28, no. 4, pp. 24–35, May 1997.
[3] P. Chow, S. O. Seo, J. Rose, K. Chung, G. P. Monzon, and I. Rahardja, “The Design of a SRAM-Based Field-Programmable Gate Array–Part I: Architecture,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 7, no. 2, pp. 191–197, June 1999.
[4] P. Chow, S. O. Seo, J. Rose, K. Chung, G. P. Monzon, and I. Rahardja, “The Design of a SRAM-Based Field-Programmable Gate Array–Part II: Circuit Design and Layout,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 7, no. 3, pp. 321–330, Sept. 1999.
[5] P. Marchal, “Field Programmable Gate Arrays,” Comm. of the ACM, vol. 42, no. 4, pp. 57–59, April 1999.
[6] V. George and J. M. Rabaey, Low-Energy FPGAs: Architecture and Design, Kluwer Academic Publishers, USA, 2001.
[7] Actel Corporation, Accelerator Series FPGAs–ACT3 Family, Sunnyvale, California, Sept. 1997.
[8] Actel Corporation, SX Family of High Performance FPGAs, Sunnyvale, California, Feb. 2007.
[9] M. Butts and J. Batcheller, “Methods of using electronically reconfigurable logic circuits,” US Patent 5036473, 1991.
[10] Xilinx, Spartan-3 FPGA family datasheet, June 2008.
[11] Xilinx, Virtex-4 FPGA family overview, Sept. 2007.
[12] Xilinx, Virtex-5 FPGA family overview, June 2008.
[13] Altera, Cyclone Architecture, v. 1.5, May 2008.
[14] Altera, Cyclone II FPGA family datasheet, v. 3.2, May 2008.
[15] Altera, Stratix Device family datasheet, v. 3.2, July 2005.
[16] Altera, Stratix GX Device family datasheet, v. 3.2, July 2005.
[17] Altera, Stratix II Device family datasheet, v. 4.2, May 2007.
[18] Altera, Stratix II GX Device family datasheet, v. 1.6, Oct. 2006.
[19] S. M. Qasim, S. A. Abbasi, and B. Almashary, “A review of FPGA-based design methodology and optimization techniques for efficient hardware realization of computation intensive algorithms,” in Proc. of IEEE Intl. Conf. on Multimedia, Signal Processing and Communication Technologies, Mar. 2009, pp. 313–316.
[20] S. M. Qasim, S. A. Abbasi, and B. Almashary, “An overview of advanced FPGA architectures for optimized hardware realization of computation intensive algorithms,” in Proc. of IEEE Intl. Conf. on Multimedia, Signal Processing and Communication Technologies, Mar. 2009, pp. 300–303.
[21] M. J. Alexander, J. P. Cohoon, J. L. Colflesh, J. Karro, and G. Robins, “Three-dimensional field-programmable gate arrays,” in Proc. of Eighth Annual IEEE Intl. ASIC Conf. and Exhibit, Sept. 1995, pp. 253–256.
[22] M. Leeser, W. M. Meleis, M. M. Vai, S. Chiricescu, W. Xu, and P. M. Zavracky, “Rothko: a three-dimensional FPGA,” IEEE Design and Test of Computers, vol. 15, no. 1, pp. 16–23, Jan.–Mar. 1998.
Syed M. Qasim received the B.Tech and M.Tech Degrees in Electronics Engineering from Z. H. College of Engineering and Technology, Aligarh Muslim University, India in 2000 and 2002 respectively. Currently, he is working as a Researcher in the Electronics Group, Department of Electrical Engineering, King Saud University, Saudi Arabia. He is the author or coauthor of more than 30 papers in international journals and refereed conferences. He is a member of the Institution of Electronics and Telecommunication Engineers, India and the International Association of Engineers, Hong Kong. His current research interests include Digital VLSI System Design and High Performance Reconfigurable Computing using FPGAs.
Shuja A. Abbasi was born in Amroha, India in 1950. He obtained the B.Sc and M.Sc Degrees in Electrical Engineering from Z. H. College of Engineering and Technology, Aligarh Muslim University, India in 1970 and 1972 respectively. He received the Ph.D Degree in Microelectronics from University of Southampton, England in 1980. He joined Aligarh Muslim University as a Lecturer in 1971 and was promoted to the positions of Reader and Professor in 1982 and 1986 respectively. He served as the Chairman, Department of Electronics Engineering, Z. H. College of Engineering and Technology, Aligarh Muslim University from 1996 to 1999. He joined the Department of Electrical Engineering, College of Engineering, King Saud University, Riyadh as a Professor of Electronics Engineering in 1999 and has continued there since. He has a fairly large number of research publications to his credit and has completed many client-funded projects for various organizations. He is a senior member of IEEE and a Fellow of the Institution of Electronics and Telecommunication Engineers, India.
Bandar A. Almashary is an Associate Professor in the Department of Electrical Engineering, King Saud University, Saudi Arabia. He received the BS and MS Degrees in Electrical Engineering from King Saud University, Saudi Arabia. He received the Ph.D Degree in Optoelectronics from University of Pittsburgh, USA in 1996. He has a fairly large number of research publications to his credit. He is a member of IEEE.