Digital Circuit Design 2 10636321: Dr. Ashraf Armoush

© 2023 Dr. Ashraf Armoush
Field Programmable Gate Array (FPGA)

Outline
• PLD vs. ASIC
• FPGA
• FPGA Structure
• Programming Technologies
• FPGA Architecture
• Embedded RAM, Multipliers, Adders and MACs
• Embedded Processor Cores
• Clock Trees and Clock Managers
• Programming (Configuring) an FPGA
• JTAG Port
© 2023 Dr. Ashraf Armoush, An-Najah National University
Programmable Logic Devices (PLDs)
✓ Highly configurable.
✓ Fast design and modification times.
✗ Cannot support large or complex functions.

ASIC: Application-Specific Integrated Circuits
An ASIC is an integrated circuit designed specifically for a special
purpose or application.
➢ This also implies that an ASIC is built for one and only one
customer (e.g. an IC designed for a specific line of mobile phones of a
company).
✓ Supports extremely large and complex functions.
✗ Painfully expensive and time-consuming to design.
The Gap between PLD and ASICs
• To bridge the gap between PLDs and ASICs, Xilinx developed a new class
of IC called the field-programmable gate array (FPGA) in 1985.
✓ An FPGA can be customized in the field, like a PLD.
✓ An FPGA can contain millions of logic gates and implement extremely
complex functions that previously could be realized only with ASICs.
✓ The cost of an FPGA design is much lower than that of an ASIC.
✓ Implementing design changes is much easier in FPGAs, and the time to
market for such designs is much shorter.

Technology Timeline
FPGA
• FPGA: a digital integrated circuit (IC) that contains configurable
blocks of logic along with configurable interconnects between these
blocks.
– “Field programmable” refers to the fact that its programming takes
place “in the field” (as opposed to devices whose internal functionality is
hardwired by the manufacturer).
– In-System Programmable (ISP): a device is ISP if it can be
programmed while remaining resident in a higher-level system.
– FPGAs can be programmed at a higher level using various hardware
description languages (HDLs).
– The translation to gate level is done automatically by tools.
FPGA Applications
• Today’s FPGAs contain millions of gates and can be used in many
applications, such as digital signal processing, software-defined radio,
aerospace and defense systems, ASIC prototyping, medical imaging, computer
vision, speech recognition, cryptography, computer hardware emulation, radio
astronomy, and metal detection, as well as a growing range of other areas.
• In general, FPGA applications can be categorized into 5 major segments:

1. ASIC and custom silicon: FPGAs implement a variety of designs that could
previously be realized only with ASICs and custom silicon.
2. Digital signal processing (DSP): Today’s FPGAs can contain embedded
multipliers, dedicated arithmetic routing, and large amounts of on-chip RAM
to facilitate high-speed DSP.
3. Embedded microcontrollers: FPGAs are becoming attractive for embedded
control applications due to the falling price of FPGAs and the ability
to implement a soft processor core.
4. Physical layer communication: FPGAs implement the interfaces between the
physical layer communication chip and higher-level network protocol layers.
5. Reconfigurable computing: The inherent parallelism and reconfigurability
of FPGAs make it possible to change the data path itself substantially, in
addition to the control flow, during runtime.
FPGA Structure
• The most common architecture consists of:
1. Configurable Logic Block (CLB) or Logic Array Block (LAB)
2. Configurable I/O Block (IOB)
3. Programmable Interconnect

General Structure of an FPGA
Programmable (Configurable) Logic Block

• A simple programmable logic block consists of:
1. Lookup table (LUT): By means of SRAM programming cells, every logic block can
be configured to perform a different function.
2. A register that can act as a flip-flop or a latch: if the flip-flop option is selected,
the register can be configured to be triggered by a positive- or negative-going clock
edge (the clock is common to all of the logic blocks).
3. Multiplexer: can be configured to accept the output from the LUT or a separate
input.
• The logic blocks in modern FPGAs can be significantly more complex.

• Each FPGA contains a large number of programmable logic blocks.
Lookup Tables (LUTs)
• Assume that a LUT is required to perform the function y = (a·b) + c′.
Simplified LUT
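As a minimal sketch of the idea, the snippet below models a 3-input LUT as eight stored bits indexed by the inputs (a, b, c) and fills it for y = (a AND b) OR (NOT c). The names `build_lut` and `lut_read`, and the choice of `a` as the most significant index bit, are illustrative assumptions rather than any vendor's convention.

```python
def build_lut(func, n_inputs):
    """Return the LUT contents: one stored bit per input combination."""
    cells = []
    for index in range(2 ** n_inputs):
        # Unpack the index into input bits, most significant bit first.
        bits = [(index >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        cells.append(func(*bits) & 1)
    return cells


def lut_read(cells, *inputs):
    """Use the input signals as an index into the stored cells."""
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return cells[index]


# y = (a AND b) OR (NOT c); (1 - c) is NOT c for a single bit.
lut = build_lut(lambda a, b, c: (a & b) | (1 - c), 3)
print(lut)                     # stored bits for abc = 000 .. 111
print(lut_read(lut, 1, 1, 1))  # evaluate y for a=1, b=1, c=1
```

Changing the function only changes the eight stored bits; the read path stays the same, which is why a LUT can implement any function of its inputs.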
• Note: by means of its own SRAM cells, the interconnect can be programmed
such that the primary inputs are connected to the inputs of one or more
CLBs, and the outputs from any logic block can be used to drive the inputs
to any other logic block, the primary outputs from the device, or both.
SRAM-Based Devices
• The majority of FPGAs are based on SRAM configuration cells, which
can be configured over and over again.
• Advantages:
✓ A new design can be quickly implemented and tested.
✓ The FPGA can initially be programmed to perform some tests during
start-up and then be reprogrammed to perform its main task.
✓ The SRAM cells are created using exactly the same CMOS technology as
the rest of the device (no special processing steps are required).
• Disadvantages:
✗ SRAM-based devices have to be reconfigured every time the system is
powered up (this requires a special external memory device).
✗ Security: it can be difficult to protect your intellectual property (IP),
because the configuration file is stored in some form of external memory.
➢ Some of today’s SRAM-based FPGAs support bitstream
encryption, where the final configuration data is encrypted before being
stored in the external memory.

Antifuse-Based Devices
• Unlike SRAM-based devices, which are programmed while resident in the
system, antifuse-based devices are programmed off-line using a special device
programmer.
• Advantages:
✓ Nonvolatile, which means that they are available as soon
as power is applied, without the need for external memory.
✓ Their interconnect structure is naturally “rad hard”, which means that
they are relatively immune to the effects of radiation (suitable for
military and aerospace applications).
✓ Reported to have lower power consumption and to be faster than SRAM-based
devices (this point is open to question).
• Disadvantages:
✗ The main disadvantage of antifuse-based devices is that
they are OTP (one-time programmable). This makes these
components a poor choice for use in a development or prototyping
environment.
EEPROM/Flash-Based Devices

• Once programmed, the data they contain is nonvolatile.
• Can be programmed off-line.
• Some versions are in-system programmable, but their programming time is
about 3 times that of an SRAM-based component.
• Protection:
– Some of these devices use a multibit key, which can range
from 50 bits to several hundred bits.
– Once you have programmed the device, you can load your user-defined key
to secure its configuration data.
– After the key has been loaded, the only way to read the data out of the
device, or to write new data, is to load a copy of your key via the JTAG port.
– With the current speed of the JTAG port (about 20 MHz), it would take billions
of years to crack a key of a few hundred bits by exhaustively trying every
possible value.
• Disadvantages:
– Require around 5 additional process steps on top of the standard CMOS
technology, which results in their lagging one generation behind SRAM-based
devices.
– Have relatively high static power consumption.
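The exhaustive-search estimate for the protection key can be checked with a rough calculation. The sketch below assumes that one key bit is shifted per JTAG clock at 20 MHz and ignores all protocol overhead; the function name `years_to_try_all_keys` is hypothetical, and real attempts would be slower than this lower bound.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_try_all_keys(key_bits, jtag_hz=20e6):
    """Worst-case time to shift every possible key through the JTAG port.

    Assumes one key bit per JTAG clock and no protocol overhead, so this
    is an optimistic (attacker-friendly) estimate.
    """
    attempts = 2 ** key_bits
    seconds = attempts * key_bits / jtag_hz
    return seconds / SECONDS_PER_YEAR


for bits in (50, 128, 256):
    print(f"{bits}-bit key: about {years_to_try_all_keys(bits):.3g} years")
```

Under these assumptions a 50-bit key would take on the order of 90 years to exhaust, while a 128-bit key is already far beyond billions of years, which is why the longer keys matter.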
Summary of programming technologies
Feature                     | SRAM                  | Antifuse                       | EEPROM/Flash
Technology node             | State-of-the-art      | One or more generations behind | One or more generations behind
Reprogrammable              | Yes (in-system)       | No                             | Yes (in-system or offline)
Reprogramming speed         | Fast                  | ---                            | 3x slower than SRAM
Volatile                    | Yes                   | No                             | No
External configuration file | Yes                   | No                             | No
Good for prototyping        | Yes (very good)       | No                             | Yes (reasonable)
Instant-on                  | No                    | Yes                            | Yes
IP security                 | Acceptable            | Very good                      | Very good
Size of configuration cell  | Large (6 transistors) | Very small                     | Medium-small (2 transistors)
Power consumption           | Medium                | Low                            | Medium
Rad hard                    | No                    | Yes                            | Not really
Fine-, Medium-, Coarse-grained Architecture

• Based on the size of the logic blocks, it is common to categorize FPGA offerings
as being either:
➢ Fine-grained: Each logic block can be used to implement only a
very simple function (AND, OR, flip-flop, etc.).
➢ Coarse-grained: Each block contains a relatively large amount of logic
compared to the fine-grained architecture (for example, a logic block might
contain 4-input LUTs, four MUXs, four D flip-flops, and some fast carry logic).
➢ A number of companies have recently started developing really coarse-grained
device architectures comprising arrays of nodes, where each node is a highly
complex processing element ranging from an algorithmic function to a
complete general-purpose microprocessor core. [Are these devices classed as
FPGAs?]
➢ Medium-grained: LUT-based FPGAs are now often classed as medium-grained
to leave the coarse-grained label free to be applied to these new node-based
devices.
MUX-based Logic Blocks
• The device can be programmed such that each input to the block is
presented with a logic 0, a logic 1, or the true or inverse of a signal
coming from another block or from a primary input to the device.
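To illustrate how tying multiplexer inputs to constants or signals yields logic functions, here is a minimal sketch built on a single 2:1 multiplexer. The helper names (`mux2`, `and_gate`, and so on) are hypothetical, and real MUX-based logic blocks combine several multiplexers.

```python
def mux2(sel, in0, in1):
    """2:1 multiplexer: passes in0 when sel is 0 and in1 when sel is 1."""
    return in1 if sel else in0


def and_gate(a, b):
    # Data inputs programmed as: in0 = logic 0, in1 = signal b.
    return mux2(a, 0, b)


def or_gate(a, b):
    # Data inputs programmed as: in0 = signal b, in1 = logic 1.
    return mux2(a, b, 1)


def not_gate(a):
    # Data inputs programmed as: in0 = logic 1, in1 = logic 0.
    return mux2(a, 1, 0)


def xor_gate(a, b):
    # Data inputs programmed as: in0 = b (true), in1 = b (inverse).
    return mux2(a, b, 1 - b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_gate(a, b), or_gate(a, b), xor_gate(a, b))
```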
LUT-based Logic Blocks

• A group of input signals is used as an index into a lookup table.
• The contents of this table are arranged such that each cell contains the
output value for a different combination of the input signals.
• The LUT is commonly formed from SRAM cells (but it could be formed using
antifuses, EEPROM, or Flash cells).

MUX-based vs. LUT-based
• When engineers handcrafted their circuits, prior to the advent of
today’s sophisticated CAD tools, some folks say that it was possible
to achieve the best results using MUX-based architectures.
• During the 1990s, FPGAs were widely used in the
telecommunications and networking markets. Both areas involve
pushing lots of data around, in which case LUT-based architectures
hold the high ground.
• As designs grew larger and synthesis technology increased in
sophistication, handcrafting circuits became a thing of the past.
➔ The end result is that the majority of today’s FPGA architectures
are LUT-based.
3-, 4-, 5-, or 6-input LUTs?

➢ Adding more inputs allows you to represent more complex
functions.
➢ Every time you add an input, you double the number of SRAM cells.
➢ The LUT size affects the area (more inputs mean more wires) and
the speed, both of which affect the performance of the FPGA.
➢ In the past, some devices were created using a mixture of different
LUT sizes (e.g. 3-input and 4-input LUTs).
➢ Many studies have been conducted on the effect of LUT size.
➢ Many of the most successful architectures have been based on
4-input LUTs, although more recent device families also use 6-input LUTs.
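The doubling rule can be made concrete with a short calculation. This is only a sketch of the arithmetic: an n-input LUT stores one bit per input combination, so it needs 2^n cells and can realize 2^(2^n) distinct functions. The function names are illustrative.

```python
def lut_sram_cells(n_inputs):
    """Number of SRAM cells needed to store an n-input truth table."""
    return 2 ** n_inputs


def lut_functions(n_inputs):
    """Number of distinct Boolean functions an n-input LUT can realize."""
    return 2 ** lut_sram_cells(n_inputs)


for n in (3, 4, 5, 6):
    print(f"{n}-input LUT: {lut_sram_cells(n)} SRAM cells")
```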
LUT vs. distributed RAM vs. Shift Register (SR)
The SRAM cells inside a LUT offer a number of interesting possibilities:
1. Their primary role is to act as a lookup table (LUT).
2. Some vendors allow the cells to be used as a small block of RAM
(a 16 x 1 RAM for a 4-input LUT). This is referred to as distributed RAM.
3. All of the FPGA’s configuration cells (including those forming the LUTs)
are effectively strung together in a long chain. Therefore, some vendors
allow the SRAM cells forming a LUT to be treated independently of the main
body of the chain and to be used as a shift register (SR).
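The three roles can be sketched with one small model in which the same sixteen storage cells are read as a LUT, addressed as a 16 x 1 RAM, or shifted as a register. The class `Lut4Cells` and its method names are hypothetical; actual devices select the mode through configuration bits and dedicated control signals that this sketch leaves out.

```python
class Lut4Cells:
    """Sixteen storage cells behind a 4-input LUT (illustrative model)."""

    def __init__(self, cells=None):
        self.cells = list(cells) if cells is not None else [0] * 16
        if len(self.cells) != 16:
            raise ValueError("a 4-input LUT has exactly 16 cells")

    def lookup(self, a, b, c, d):
        """LUT mode: the four inputs select one stored bit."""
        return self.cells[(a << 3) | (b << 2) | (c << 1) | d]

    def ram_write(self, address, bit):
        """Distributed-RAM mode: treat the cells as a 16 x 1 RAM."""
        self.cells[address & 0xF] = bit & 1

    def ram_read(self, address):
        return self.cells[address & 0xF]

    def shift(self, bit_in):
        """Shift-register mode: push one bit in, return the bit pushed out."""
        bit_out = self.cells[-1]
        self.cells = [bit_in & 1] + self.cells[:-1]
        return bit_out


cells = Lut4Cells()
cells.ram_write(5, 1)               # write through the RAM view
print(cells.lookup(0, 1, 0, 1))     # read the same cell through the LUT view
```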
A Xilinx Logic Cell

• Each vendor has its own names for things.
• The core building block in a modern FPGA from Xilinx is called a logic
cell (LC).
• A logic cell (LC) contains:
1. A 4-input LUT (which can also act as a 16 x 1 RAM or a 16-bit shift register)
2. A multiplexer
3. A register (which acts as a flip-flop or as a latch)
4. Some special fast carry logic for use in arithmetic
• The equivalent core building block in an FPGA from Altera is called a
logic element (LE).
• There are a number of differences between a Xilinx LC and an Altera
LE, but the overall concepts are very similar.
Slicing
• The next step up the hierarchy is what Xilinx calls a slice.
• Altera and the other vendors have their own equivalent names.
• A slice contains two logic cells:
- Each logic cell has its own data inputs and outputs.
- The slice has one set of clock, clock enable, and set/reset signals
common to both logic cells.
CLB and LAB

• Moving one more level up the hierarchy, we come to the
configurable block.
• Xilinx calls such a block a configurable logic block (CLB), and Altera
refers to it as a logic array block (LAB).
• Some Xilinx FPGAs have two slices in each CLB, while others have four.
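The hierarchy above translates into simple arithmetic. The sketch below uses the two figures stated on the slides (two logic cells per slice, two or four slices per CLB); the CLB count in the example is a made-up number, not a real device.

```python
LOGIC_CELLS_PER_SLICE = 2  # from the slides: a slice contains two logic cells


def logic_cells_per_clb(slices_per_clb):
    """Logic cells in one CLB, given how many slices it holds (2 or 4)."""
    return slices_per_clb * LOGIC_CELLS_PER_SLICE


def total_logic_cells(num_clbs, slices_per_clb):
    """Logic cells in a device with num_clbs CLBs (hypothetical device size)."""
    return num_clbs * logic_cells_per_clb(slices_per_clb)


print(logic_cells_per_clb(2), logic_cells_per_clb(4))
print(total_logic_cells(1000, 4))  # 1000 CLBs is an illustrative figure only
```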

Embedded RAMs
• A lot of applications require the use of memory.
• FPGAs now include relatively large chunks of embedded RAM called
e-RAM or block RAM.
• Depending on the architecture of the component, these blocks might
be:
1. Positioned around the periphery of the device.
2. Scattered across the face of the chip in relative isolation.
3. Organized in columns.
• Each block of RAM can be used independently, or multiple blocks
can be combined to construct larger blocks.
Embedded Multipliers, Adders, MACs, etc.

• Some functions, like multipliers, are inherently slow if they are
implemented by connecting a large number of logic blocks together.
• Since these functions are required by a lot of applications, many FPGAs
incorporate special hardwired multiplier blocks, typically located near
the e-RAM.
• Similarly, some FPGAs offer dedicated adder blocks.
• One common operation in DSP applications is called a
multiply-and-accumulate (MAC). The function multiplies two numbers and
adds the result to a running total stored in an accumulator.
• Some FPGAs provide entire MACs as embedded functions.
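A software sketch of the MAC operation described above follows. It only mirrors the arithmetic (multiply, then add to a running total); an embedded MAC block performs the same step in dedicated hardware, typically once per clock cycle. The names `mac` and `dot_product` are illustrative.

```python
def mac(accumulator, a, b):
    """One multiply-and-accumulate step: returns accumulator + a * b."""
    return accumulator + a * b


def dot_product(xs, ys):
    """Repeated MAC steps, as used in FIR filters and similar DSP kernels."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6
```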
Embedded Processor Cores (Hard and Soft)
• Any portion of an electronic design can be realized in:
– Hardware (using logic gates, registers, etc.)
or
– Software (as instructions to be executed on a microprocessor)
• One of the main partitioning criteria is how fast you wish the functions to
perform their tasks:
– Picosecond and nanosecond logic: has to run insanely fast (hardware).
– Microsecond logic: reasonably fast (either hardware or software).
– Millisecond logic: it is a pain to slow the hardware down to implement such
slow functions, e.g. using huge counters to generate delays (software).
• In the past, discrete microprocessors on the circuit board were used to
execute the software for the required functions.
• Some FPGAs are now available that contain one or more embedded
microprocessors (called microprocessor cores):
✓ Saves the cost of having two devices.
✓ Eliminates a large number of tracks, pads, and pins (making the board smaller).
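The remark about huge counters for millisecond-scale logic can be quantified. The sketch below computes how many clock cycles, and therefore how many counter bits, a hardware delay would need; the 100 MHz clock is an assumed example value and `delay_counter` is a hypothetical helper.

```python
def delay_counter(delay_seconds, clock_hz):
    """Return (cycles, bits) for a counter that times delay_seconds.

    The counter is assumed to count from 0 up to cycles - 1.
    """
    cycles = round(delay_seconds * clock_hz)
    bits = max(1, (cycles - 1).bit_length())
    return cycles, bits


for delay in (1e-6, 1e-3, 1.0):
    cycles, bits = delay_counter(delay, 100e6)  # assumed 100 MHz clock
    print(f"{delay} s delay: {cycles} cycles, {bits}-bit counter")
```

Each such counter also consumes flip-flops and carry logic, whereas a processor core can produce the same delay with a few instructions.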
Hard Microprocessor Cores

• A hard microprocessor core is implemented as a dedicated predefined block.
• Two approaches:
1. The first is to locate it in a strip to the side of the main FPGA fabric.
2. The second is to embed one or more cores directly into the main FPGA fabric.

Soft Microprocessor Cores (Soft Cores)

• It is possible to configure a group of programmable logic
blocks to act as a microprocessor.
• Soft cores are simpler and slower than their hard-core
counterparts (a soft core typically runs at 30% to 50% of the speed of a hard core).
• You implement a core only if you need it, and you can instantiate
as many cores as you require until you run out of resources
(programmable logic blocks).

Clock Trees
• All of the synchronous elements inside an FPGA (e.g. flip-flops) need to be
driven by a clock signal.
• Such a clock signal typically comes into the FPGA via a special input pin.
• It is routed through the device in a tree structure and connected to the
appropriate registers.
• The tree structure is used to ensure that all of the flip-flops see the clock
edges as close together in time as possible.
• The clock tree is implemented using special tracks and is separate from
the general-purpose interconnect.
• In reality, multiple clock pins are available (unused clock pins can be
employed as general-purpose I/O pins), and there are multiple clock
trees inside the device.
Clock Managers
• The input clock pin can be used to drive a special hard-wired function
(block) called a clock manager, which generates a number of daughter clocks.
• The daughter clocks may be used to drive internal clock trees or external
output pins that can provide clocking services to other devices.
• Each family of FPGAs has its own type of clock manager.

Programming (Configuring) an FPGA
• Each FPGA vendor has its own unique terminology and its own
technology and protocols for doing things.
• Moreover, the detailed mechanisms for programming FPGAs
can vary on a family-by-family basis.
• The end result of all techniques is a configuration file.
• Configuration file (bit file): contains the information
(configuration bitstream) that will be uploaded into the FPGA
in order to program it to perform a specific function.
Programming an FPGA (cont)

• In SRAM-based FPGAs:
– We can visualize all of the SRAM configuration cells as comprising a single (long)
shift register.
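The single long shift register view can be sketched as follows. The model assumes a plain serial chain with one data input and one bit shifted per clock; the function `load_bitstream` is hypothetical, and real devices add framing, addressing, and error checking that are not shown here.

```python
def load_bitstream(num_cells, bitstream):
    """Shift a bitstream into a chain of configuration cells.

    cells[0] is the cell nearest the data input; each clock moves every
    stored bit one cell further along the chain.
    """
    cells = [0] * num_cells
    for bit in bitstream:
        cells = [bit & 1] + cells[:-1]
    return cells


# The first bit sent ends up in the cell furthest from the input.
print(load_bitstream(4, [1, 0, 1, 1]))
```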

JTAG Port
• JTAG: Joint Test Action Group.
• Like many other modern devices, today’s FPGAs are equipped with a
JTAG port to overcome test and programming challenges.
• JTAG was originally designed to implement the boundary scan
technique for testing circuit boards and ICs.
• Boundary scan is also widely used as a debugging method to
watch integrated circuit pin states, measure voltage, or analyze
sub-blocks inside an integrated circuit.
• An FPGA has a number of pins that are used as a JTAG port. One of
these pins is used to input JTAG data, and another is used to output
that data.
• Each of the remaining I/O pins has an associated JTAG register (a
flip-flop), and these registers are daisy-chained together.
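The daisy chain of JTAG registers can be modeled as a shift register sitting between the data-input and data-output pins (named TDI and TDO in IEEE 1149.1). The sketch below is a simplified assumption: it ignores the test clock, the mode-select pin, and the TAP state machine, and the class `BoundaryScanChain` is hypothetical.

```python
class BoundaryScanChain:
    """Daisy-chained one-bit registers between a data-in and a data-out pin."""

    def __init__(self, num_pins):
        self.registers = [0] * num_pins

    def capture(self, pin_values):
        """Load the current pin states into the chain in parallel."""
        if len(pin_values) != len(self.registers):
            raise ValueError("one value per pin is required")
        self.registers = [v & 1 for v in pin_values]

    def shift(self, data_in):
        """Clock the chain once: one bit enters, one bit leaves."""
        data_out = self.registers[-1]
        self.registers = [data_in & 1] + self.registers[:-1]
        return data_out

    def shift_out_all(self, fill=0):
        """Read every captured bit, starting with the register nearest the output."""
        return [self.shift(fill) for _ in range(len(self.registers))]


chain = BoundaryScanChain(4)
chain.capture([1, 0, 1, 1])      # sample four pin states
print(chain.shift_out_all())     # observe them serially at the output pin
```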
