Unit 3 and 4 PPTs of RC2

UNIT 3
BY
Prof. K. R. Saraf
Qn. Compare the architectures and capabilities of
ASIC, PDSP, GPP, FPGA, and memory 10
Qn. Write a note on
1. RALU (6) page 41 R.C. Bobda
2. Limitations of current FPGAs (6)
3. Area consumption by different architectural building blocks in
reconfigurable device. (6)
4. Weak upper bound and interconnect (6) page no. 104 General purpose
computing by Andrew Dehon
5. Bus-based Communication (6) page no. 203 Bobda
6. Direct Communication (6) page no. 201 Bobda
7. Circuit Switching (6) page no. 203 Bobda
8. The Dynamic Network on Chip (DyNoC) (6) page no. 219 Bobda
9. Comparison of DPGA with FPGA (6)
10. MATRIX (6) page no. 273 General purpose computing by Andrew Dehon
1. RALU (6) page 41 R.C. Bobda
• The Xputer concept was presented in the early 1980s by Reiner Hartenstein, a
researcher at the University of Kaiserslautern in Germany.
• The goal of the Xputer was to provide a very high degree of programmable parallelism
in the hardware, at the lowest possible level, to obtain performance not possible with
von Neumann computers.
• Instead of sequencing instructions, the Xputer sequenced data, thus
exploiting the regularity in the data dependencies of some classes of applications,
such as image processing, where repetitive processing is performed on a large
amount of data.
• An Xputer consists of three main parts: the data sequencer, the data memory
and the reconfigurable ALU (rALU) that permits the run-time configuration of
communication at levels below instruction set level.
• Within a loop, data to be processed were accessed via a data structure called
scan window. Data manipulation was done by the rALU that had access to many
scan windows.
• The most essential part of the data sequencer was the generic address generator
(GAG) that was able to produce address sequences corresponding to the data of
up to three nested loops.
• An rALU subnet that could be configured to perform all computations on the data
of a scan window was required for each level of a nested loop.
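The data-driven scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Hartenstein's implementation: the names gag and run_xputer are hypothetical, the data memory is a flat row-major list, the scan window is modelled as a second call to the address generator, and the rALU is simply a Python function applied to the window contents.

```python
from typing import Callable, Iterator, List, Sequence


def gag(bounds: Sequence[int], strides: Sequence[int], base: int = 0) -> Iterator[int]:
    """Hypothetical generic address generator: yields base + sum(i_k * stride_k)
    for up to three nested loops (outermost loop first)."""
    if not 1 <= len(bounds) <= 3 or len(bounds) != len(strides):
        raise ValueError("this sketch supports one to three nested loops")

    def walk(level: int, addr: int) -> Iterator[int]:
        if level == len(bounds):
            yield addr
            return
        for i in range(bounds[level]):
            yield from walk(level + 1, addr + i * strides[level])

    yield from walk(0, base)


def run_xputer(memory: List[int], width: int, height: int, win: int,
               ralu: Callable[[List[int]], int]) -> List[int]:
    """Slide a win x win scan window over a row-major image; the address
    sequence (data), not an instruction stream, drives the computation."""
    results = []
    for top_left in gag([height - win + 1, width - win + 1], [width, 1]):
        window = [memory[a] for a in gag([win, win], [width, 1], base=top_left)]
        results.append(ralu(window))  # the rALU works on the whole scan window
    return results


image = list(range(16))                  # 4 x 4 image stored row-major
print(run_xputer(image, 4, 4, 2, sum))   # [10, 14, 18, 26, 30, 34, 42, 46, 50]
```

Here the two outer loops of the GAG walk the window positions and the inner two walk the window itself; a real Xputer would configure an rALU subnet per loop level instead of calling a Python function.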
4. Weak upper bound and interconnect (6) page no. 104 General purpose computing by Andrew Dehon
6. Direct Communication (6) page no. 201 Bobda
• Direct communication paradigm allows modules placed on the chip to
communicate using dedicated physical channels, configured at compile-time.
• The configuration of the channels remains until the next full reconfiguration of
the device.
• A configuration defines the set of physical lines to be used, their direction, their
bandwidth and speed, as well as the terminals, i.e., the components that are
connected by the lines.
• Components must be designed and placed on the device in such a way that their
ports can be connected to the predefined terminals.
• Feed-through channels must also be available in each component to allow signals
used by other modules to cross the component.
• Example: The configuration shown in the figure provides an example of 1-D
communication using direct lines.
• For this purpose, a set of 10 predefined channels is fixed, and the components must
adapt their position and direction to make use of the channels.
• The physical channels 1, 5, 6 and 7 are not used.
• Line 3 is used by module C2 for connection with the device pins on the left and
right sides.
• It is fed through component C1 that provides the necessary channel for the
signal to cross.
• Line 4 is used by the two components C1 and C2 for a direct communication.
• Additionally, the two components use the same line to access the device pins.
• Lines 8 and 10 are used by component C5 to access the device pins on the left
and right sides.
• They cross components C3 and C4.

• Line 9 is used to connect the components C3 and C4 and runs through
component C5.
• The main disadvantage of this approach is the restriction imposed on the design
of components.
• For each component, dedicated channels must be foreseen to allow signals that
are not used by this component in its placement location to cross.
• This increases the amount of resources needed to implement the component.
• Also, the placement algorithm must deal with additional restrictions like the
availability of signals in a given location.
• This increases its complexity and makes the approach only possible for an
offline temporal placement, where all the configurations can be defined and
implemented at compile-time.
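The feed-through overhead in the 1-D example can be made concrete with a small table. This is a hypothetical sketch: the line-to-component mapping is transcribed from the bullets above, while the dictionary layout and the feed_through helper are illustrative assumptions. Line 2 is not described in the text, so it is left out.

```python
from typing import Dict, List, Set

# Channel usage from the 1-D direct-communication example (line 2 omitted).
CHANNELS: Dict[int, Dict[str, Set[str]]] = {
    3:  {"users": {"C2"},       "crosses": {"C1"}},
    4:  {"users": {"C1", "C2"}, "crosses": set()},
    8:  {"users": {"C5"},       "crosses": {"C3", "C4"}},
    9:  {"users": {"C3", "C4"}, "crosses": {"C5"}},
    10: {"users": {"C5"},       "crosses": {"C3", "C4"}},
}


def feed_through(component: str) -> List[int]:
    """Lines a component must let pass through it without using them itself."""
    result = []
    for line, usage in sorted(CHANNELS.items()):
        if component in usage["users"] and component in usage["crosses"]:
            raise ValueError(f"{component} cannot both use and merely cross line {line}")
        if component in usage["crosses"]:
            result.append(line)
    return result


for comp in ["C1", "C2", "C3", "C4", "C5"]:
    print(comp, feed_through(comp))
```

Every entry returned by feed_through is a channel the component's designer has to reserve routing resources for, which is exactly the extra cost and placement restriction described in the disadvantages above.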
5. Bus-based Communication (6) page no. 203 Bobda
• Communication over a third party is used, for example, by Brebner and by Walder et
al.
• A central component exists that behaves as a message reflector.
• Each message is first sent to the central module, which forwards it to the
destination.
• Brebner uses this approach to allow communication between a
reconfigurable module connected to a processor through a bus and a user
program running on the host processor.
• The module inputs and outputs are controlled by registers that are mapped into
the address space of the processor.
• This approach can be used not only to allow the communication between a
reconfigurable module and a user program but also between several
reconfigurable modules connected together through a bus.
• Bus-based Communication
• The communication between the reconfigurable modules on a given device can
also be done using a common bus.
• To prevent the bus resources from being destroyed at run-time by components
dynamically placed on the device, predefined slots must be available where the
modules can be placed at run-time, such that no alteration of the bus is possible.
• On the predefined slots, connection ports must be available to dynamically
attach the placed component to the bus.
• While predefining the locations where components can be placed simplifies
the placement algorithms, it is not flexible at all.
• Using a common bus reduces the amount of resources needed in the system,
because only one medium is required for all components.
• However, the additional delay introduced by bus arbitration can drastically
affect the performance of the system.
• The approaches of Walder et al. and of Brebner are both based on a
restricted bus in which no arbitration module is required to manage bus
access.
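The cost of arbitration on a shared bus can be illustrated with a small cycle-level model. This is a hypothetical sketch under simple assumptions that the text does not state: every transfer takes one bus cycle, each module issues one transfer every `period` cycles, and the arbiter is round-robin. Restricted buses such as those of Walder et al. and Brebner avoid the arbiter altogether, so this model applies only to the arbitrated case.

```python
from typing import List


def round_robin_bus(n_modules: int, period: int, cycles: int) -> List[int]:
    """Hypothetical shared bus: each module issues one single-cycle transfer every
    `period` cycles; a round-robin arbiter grants the bus to one module per cycle.
    Returns the waiting time (in cycles) of every completed transfer."""
    pending: List[List[int]] = [[] for _ in range(n_modules)]  # issue cycles per module
    pointer = 0                     # module with the highest priority next cycle
    waits: List[int] = []
    for t in range(cycles):
        if t % period == 0:
            for queue in pending:
                queue.append(t)
        for k in range(n_modules):
            m = (pointer + k) % n_modules
            if pending[m]:
                waits.append(t - pending[m].pop(0))
                pointer = (m + 1) % n_modules   # rotate priority past the winner
                break
    return waits


for n in (1, 2, 4):
    w = round_robin_bus(n, period=4, cycles=8)
    print(n, w, sum(w) / len(w))
```

The average waiting time grows with the number of modules competing for the single medium, which is the trade-off against the resource savings of a common bus.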
7. Circuit Switching (6) page no. 203 Bobda
• Introduced in the 1980s under the name reconfigurable massively parallel
computers, circuit switching is the technique of dynamically establishing a connection
between two processing elements (PEs) at run-time, using a set of physical lines
connected by switches.
• The system consists of a set of processing elements arranged in a mesh.
• Switches are available at the row and column intersections, allowing a longer
connection to be built from the vertical and horizontal lines that meet at an intersection point.
• In this way, two arbitrary processing elements can be dynamically connected at
run-time by dynamically setting the switches on the path from the first processor
to the second one.
• Once the connection is established, the data are transmitted from the source PE
to the destination in a single clock cycle.
• Circuit switching was used in several systems, such as YUPPIE, in the
1980s.
• It is available today in many systems, such as the PACT XPP device.
• Also the connection mechanism in FPGA follows the same paradigm.
• Although dedicated lines exist in some devices to allow connections between
remote modules on the device, longer connections are normally established
using the switch matrices to connect segmented lines in order to realize the long
communication path.
• Despite its application in fine-grained parallel image computing systems, where
it helped to dynamically change the topology of a parallel computer to
accommodate the best computation structure of an application under
computation, using the circuit switching in reconfigurable devices presents
some drawbacks:
• Long communication delay: This will happen, if the connection between two
processing elements has to go through many other processors.
• The number of switches on the communication path therefore increases, thus
increasing communication delay and therefore slowing down the clock.
• Dynamic computation of routes: In a dynamically changing environment, the
communication needs between components change with the placement of
components. Computing new routes at run-time is very time-consuming, and therefore,
the performance of the overall computation may be drastically affected by run-time
routing.
• Exclusive use of chip space: Because of the complexity of the synthesis algorithm,
components that will be downloaded later onto the reconfigurable device are
implemented at compile-time.
• The synthesis process constrains a module on a given area of the device, where it
uses all the resources, i.e. the processing elements and the interconnects in that area.
• This has the consequence that the placement of a module at run-time in a region,
where a route is running, will destroy the route, because the connection used by the
route will be assigned to the component.
• To avoid this, we can place components only at locations where no route will destroy
them, a restriction that will however increase the chip fragmentation.
• Consider the placement in the figure below, with three routes connecting PE1 to PE2, PE3
to PE4, and PE5 to PE6.
• For a new component that needs four processing elements arranged in a
square, the regions where the routes are implemented must be prohibited.
• No region that can accommodate the new component is therefore left on the device,
and the component must be rejected, although enough resources are available on the
device.
• Figure: Drawback of circuit switching in temporal placement. Placing a component
using 4 PEs is not possible, although enough free resources are available.
• The previous example has shown how circuit switching might increase the
device fragmentation in devices allowing a 2-D placement.
• In devices allowing only 1-D placement, as is the case with Xilinx Virtex-II
FPGAs, which are only column-wise reconfigurable, circuit switching can be
used to connect a small number of modules, usually 2–8, and allow dynamic
communication to be established between the components running on the
device at run-time.
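Both drawbacks of circuit switching (delay growing with the number of switches on a path, and routes fragmenting the chip) can be reproduced on a tiny mesh. This is a hypothetical sketch: the 4 x 4 mesh, the three route endpoints, and the dimension-ordered (row-first, then column) path choice are assumptions made for illustration, since the text does not specify a routing algorithm; the number of hops is used only as a proxy for delay.

```python
from typing import List, Optional, Set, Tuple

Coord = Tuple[int, int]  # (row, column) of a PE in the mesh


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """PEs crossed by a path that first moves along the row (changing the
    column) and then along the column (changing the row)."""
    (r, c), (r2, c2) = src, dst
    path = [(r, c)]
    step = 1 if c2 >= c else -1
    while c != c2:
        c += step
        path.append((r, c))
    step = 1 if r2 >= r else -1
    while r != r2:
        r += step
        path.append((r, c))
    return path


def find_free_square(rows: int, cols: int, occupied: Set[Coord], size: int = 2) -> Optional[Coord]:
    """Top-left corner of the first size x size block of PEs no route passes through."""
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            block = {(r + i, c + j) for i in range(size) for j in range(size)}
            if not block & occupied:
                return (r, c)
    return None


# Hypothetical 4 x 4 mesh with three established routes.
routes = [xy_route((0, 0), (0, 3)), xy_route((3, 0), (3, 3)), xy_route((1, 1), (1, 2))]
occupied = {pe for path in routes for pe in path}
print("switch hops per route:", [len(p) - 1 for p in routes])
print("free PEs:", 16 - len(occupied))
print("2 x 2 slot:", find_free_square(4, 4, occupied))
```

Six PEs remain free, yet no 2 x 2 block is available, so a four-PE component would be rejected even though enough resources exist, mirroring the situation in the figure.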
8. The Dynamic Network on Chip (DyNoC) (6) page no. 219 Bobda
• A Dynamic Network on Chip (DyNoC) is a Network on Chip whose structure can
be dynamically changed at run-time. In a DyNoC, a router is a programmable
element that is basically configured as a router but can be reconfigured at run-time to
implement any function that fits on it.
• This concept provides a means to overcome the deficiency of networks on chip as a
communication paradigm in reconfigurable computing (figure below).
• Whenever a module is placed at a given location where it uses the resources of
several PEs, the redundant routers within the module boundary can be used as
additional resources for the module.
• The placed module only needs one router to access the network.
• Figure: Implementation of a large reconfigurable module on a Network on Chip
• As a result, the routers in the area of a placed component are no longer accessible
by other components in the network.
• Upon completion, the module is removed and the network must be reactivated in
the area where the component was previously placed.
• This can be done quickly, because the routers are programmable components that
can be quickly reset to their basic configuration, i.e., routers.
• With this, we have a network in which some parts may be deactivated at a given
period of time and reactivated in the future.
• In order for such a network to operate efficiently, some prerequisites on the
communication infrastructure of the network must be fulfilled.
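The deactivate/reactivate behaviour of a DyNoC can be sketched as a grid whose routers are absorbed by placed modules and released on removal. This is a hypothetical model: the class DyNoCGrid, its methods, and the 5 x 5 example are assumptions, routing is a plain breadth-first search over active routers (not the routing algorithm proposed for DyNoC), and the single router a module uses to access the network is not modelled.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Coord = Tuple[int, int]


class DyNoCGrid:
    """Hypothetical mesh of programmable elements that act as routers
    until a module is placed over them."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        self.modules: Dict[str, Set[Coord]] = {}  # module name -> routers it absorbed

    def blocked(self) -> Set[Coord]:
        return set().union(*self.modules.values())

    def place(self, name: str, top: int, left: int, height: int, width: int) -> None:
        cells = {(r, c) for r in range(top, top + height) for c in range(left, left + width)}
        if cells & self.blocked():
            raise ValueError("region already used by another module")
        self.modules[name] = cells  # these routers now serve as module resources

    def remove(self, name: str) -> None:
        del self.modules[name]  # routers return to their basic router configuration

    def route(self, src: Coord, dst: Coord) -> Optional[List[Coord]]:
        """Breadth-first search over active routers only."""
        blocked = self.blocked()
        if src in blocked or dst in blocked:
            return None
        prev: Dict[Coord, Optional[Coord]] = {src: None}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            if cur == dst:
                path: List[Coord] = []
                node: Optional[Coord] = cur
                while node is not None:
                    path.append(node)
                    node = prev[node]
                return path[::-1]
            r, c = cur
            for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nxt[0] < self.rows and 0 <= nxt[1] < self.cols
                if inside and nxt not in blocked and nxt not in prev:
                    prev[nxt] = cur
                    queue.append(nxt)
        return None


noc = DyNoCGrid(5, 5)
noc.place("M", top=1, left=1, height=3, width=3)
detour = noc.route((2, 0), (2, 4))
noc.remove("M")
direct = noc.route((2, 0), (2, 4))
print(len(detour) - 1, len(direct) - 1)  # 8 hops around the module, 4 after removal
```

The detour only exists because active routers remain around the module; this hints at why the text says the network needs certain prerequisites on its communication infrastructure to operate efficiently.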
10. MATRIX (6) page no. 273 General purpose computing by Andrew Dehon
• MATRIX is a novel, general-purpose computing architecture which does not take
a pre-fabrication stand on the assignment of space, distribution, and control for
instructions.
• Rather, MATRIX allows the user or application to determine the actual
organization and deployment of resources as needed.
• Post-fabrication, the user can allocate instruction stores, instruction distribution,
control elements, datapaths, data stores, dedicated and fixed data interconnect,
and the interaction between data streams and instruction streams.
• MATRIX is designed to maintain flexibility in instruction control.
• Primary instruction distribution paths are not defined at fabrication time.
• Instruction memories are not dedicated to datapath elements.
• Datapath widths are not fully predetermined.
• MATRIX neither binds control elements to datapaths nor predetermines elements
that can only serve as control elements.
• To provide this level of flexibility, MATRIX is based on a uniform array of primitive
elements and interconnect which can serve instruction, control, and data
functions.
• A single network is shared by both instruction and data distribution.
• A single integrated memory and computing element can serve as an instruction
store, data store, datapath element, or control element.
• MATRIX’s primitive resources are, therefore, deployable, in that the primitives
may be deployed on a per-application basis to serve the role of instruction
distribution, instruction control, and datapath elements as appropriate to the
application.
• This allows tasks to have just as much regularity, dynamic control, or dedicated
datapaths as needed.
• Datapaths can be composed efficiently from primitives since instructions are not
prededicated to datapath elements, but rather delivered through the uniform
interconnection network.
• The key to providing this flexibility is a multilevel configuration scheme that
allows the device to control the way it delivers configuration information.
• To first order, MATRIX uses a two-level configuration scheme.
• Traditional “instructions” direct the behavior of datapath and network elements
on a cycle-by-cycle basis.
• Metaconfiguration data configures the device behavior at a more primitive level
defining the architectural organization for a computation.
• Metaconfiguration data can be used to define the traditional architectural
characteristics, such as instruction distribution paths, control assignment, and
datapath width.
• The metaconfiguration “wires up” configuration elements which do not change
from cycle to cycle, including “wiring” instruction sources for elements whose
configuration does change from cycle to cycle.
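The two configuration levels can be illustrated with a toy element. This is a hypothetical sketch, not the MATRIX hardware: the Element class, the OPS table, the 8-bit masking, and the stream name "ctrl0" are assumptions. The metaconfiguration decides once where an element's instruction comes from (a fixed instruction, making it a dedicated datapath, or a stream wired up to it); the per-cycle instruction then selects the operation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical 8-bit operations an element might perform.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & 0xFF,
    "sub": lambda a, b: (a - b) & 0xFF,
    "and": lambda a, b: a & b,
}


@dataclass
class Element:
    """Metaconfiguration of one element: set once per application, not per cycle."""
    static_op: Optional[str] = None      # fixed instruction -> dedicated datapath
    instr_source: Optional[str] = None   # wired instruction stream -> changes every cycle

    def __post_init__(self) -> None:
        if (self.static_op is None) == (self.instr_source is None):
            raise ValueError("metaconfigure exactly one instruction source")


def run(elem: Element, streams: Dict[str, List[str]], a: List[int], b: List[int]) -> List[int]:
    out = []
    for cycle, (x, y) in enumerate(zip(a, b)):
        # Per-cycle "instruction": fixed, or fetched from the source the metaconfiguration wired up.
        op = elem.static_op if elem.static_op is not None else streams[elem.instr_source][cycle]
        out.append(OPS[op](x, y))
    return out


fixed = Element(static_op="add")
streamed = Element(instr_source="ctrl0")
print(run(fixed, {}, [1, 2, 3], [4, 5, 6]))                                   # [5, 7, 9]
print(run(streamed, {"ctrl0": ["add", "sub", "and"]}, [1, 2, 3], [4, 5, 6]))  # [5, 253, 2]
```

The same element type serves as a fixed datapath or as an instruction-driven unit purely through metaconfiguration, which is the deployability the notes describe.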
Qn. Write a note on
1. Various reconfigurable devices deployed yet. (6)
2. Wire delays and solutions. (6)
3. Switch requirement in reconfigurable device. (6) page no. 86 General purpose computing by Andrew Dehon
4. Static interconnect (6) page no. 30 General purpose computing by Andrew Dehon
5. Interconnect hierarchy (6) page no. 161 General purpose computing by Andrew Dehon
6. Overhead in network design Or overheads in Design (6)
7. Bisection BW (6) page no. 81 General purpose computing by Andrew Dehon
8. Crossbars (6) page no. 81 General purpose computing by Andrew Dehon

3. Switch requirement in reconfigurable device. (6) page no. 86 General purpose computing by Andrew Dehon
• We briefly review the number of switches conventionally employed by networks
supporting 100 to 1000 4-LUTs.
• Brown and Rose suggest that each 4-LUT in a moderate-sized FPGA with hundreds of
4-LUTs will require 200–400 switches.
• Agarwal and Lewis suggest approximately 100 switches per LUT for hierarchical
FPGAs with some reduction in logic utilization.
• Conventional, commercial FPGAs do little or no encoding on their interconnect
bitstreams; that is, each interconnect switch is controlled by a single
configuration bit.
• Commercial devices also exhibit on the order of 200 switches per 4-LUT.
• The fact that conventional FPGAs can, with difficulty, route almost all designs
using less than 80–90% of the device LUTs suggests that their designers chose a number
of switches that provides reasonably “adequate” interconnect for current
device sizes (hundreds to a couple of thousand 4-LUTs).
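Because each switch is controlled by one unencoded configuration bit, the switch counts above translate directly into configuration bits. The sketch below compares them with the 16 truth-table bits of a 4-LUT. It is a back-of-the-envelope illustration: config_bits is a hypothetical helper, it counts bits rather than silicon area, and it ignores flip-flop and I/O configuration.

```python
LUT_BITS = 2 ** 4  # a 4-LUT truth table has 16 entries


def config_bits(n_luts: int, switches_per_lut: int) -> dict:
    """Configuration bits split into interconnect and LUT contents,
    assuming one unencoded bit per interconnect switch."""
    interconnect = n_luts * switches_per_lut
    logic = n_luts * LUT_BITS
    return {
        "interconnect": interconnect,
        "logic": logic,
        "interconnect_share": interconnect / (interconnect + logic),
    }


for switches in (100, 200, 400):
    bits = config_bits(1000, switches)
    print(switches, bits["interconnect"], round(bits["interconnect_share"], 3))
```

Even at the low end of the quoted range, interconnect configuration outweighs LUT programming by a wide margin, which is why switch count dominates configuration size.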
4. Static interconnect (6) page no. 30 General purpose computing by Andrew Dehon
5. Interconnect hierarchy (6) page no. 161 General purpose computing by Andrew Dehon
7. Bisection BW (6) page no. 81 General purpose computing by Andrew Dehon
8. Crossbars (6) page no. 81 General purpose computing by Andrew Dehon
• Exact answer containing equations

Qn. Write a note on Multi context FPGA (6) page no. 63 General
purpose computing by Andrew Dehon
OR
Qn. What is single context and multi context FPGA?
Discuss these issues pertaining to present FPGA
architecture. (8)
• Like FPGAs, multicontext FPGAs are composed of a collection of programmable
gates embedded in a programmable interconnect.
• Unlike FPGAs, multicontext devices store several configurations for the logic
and the interconnect on the chip.
• The additional area for the extra contexts decreases functional density, but it
increases functional diversity by allowing each LUT element to perform several
different functions.
• Like FPGAs, these devices may suffer from limited interconnect or application
pipelining limits.
• The additional context memory makes them less susceptible to functionality
limits than traditional components.
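The density/diversity trade-off of a multicontext device can be seen on a single LUT. This is a hypothetical sketch: MultiContextLUT is an illustrative class, a 2-input LUT is used for brevity (a 4-LUT would have 16-entry tables), and input 0 is taken as the least significant address bit. Switching context only changes which stored truth table is selected, while the memory cost grows linearly with the number of contexts.

```python
from typing import Sequence


class MultiContextLUT:
    """Hypothetical k-input LUT storing one truth table per context."""

    def __init__(self, k: int, tables: Sequence[int]) -> None:
        self.k = k
        self.tables = list(tables)
        for t in self.tables:
            if not 0 <= t < 2 ** (2 ** k):
                raise ValueError(f"truth table does not fit a {k}-LUT")
        self.active = 0

    def memory_bits(self) -> int:
        # Extra contexts cost area: c * 2**k configuration bits instead of 2**k.
        return len(self.tables) * 2 ** self.k

    def switch_context(self, ctx: int) -> None:
        if not 0 <= ctx < len(self.tables):
            raise ValueError("no such context")
        self.active = ctx  # on-chip select, no configuration reload

    def eval(self, inputs: Sequence[int]) -> int:
        index = sum(bit << i for i, bit in enumerate(inputs))  # input 0 is the LSB
        return (self.tables[self.active] >> index) & 1


lut = MultiContextLUT(2, [0b1000, 0b0110])  # context 0: AND, context 1: XOR
print(lut.eval([1, 1]))   # 1 (AND)
lut.switch_context(1)
print(lut.eval([1, 1]))   # 0 (XOR)
print(lut.memory_bits())  # 8
```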
Qn. “Rich Flexible interconnects of FPGA is good resource
as well as headache to designer”. Comment on this
statement. Give suitable example. 8
Qn. Compare various computing architectures w.r.t.
multiplier operation. How will you locate reconfigurable
device? Where? 8
Qn. Write a note on Partial Reconfiguration (6) page no. 101 RC FPGA by
Andrew Dehon
Qn. Write a note on Partial and full reconfiguration with
example 6
OR
Qn. What is parallel reconfigurability? Is it supported in any
present device? How do you decide that the task needs
partially or fully reconfigurable device? 8
page no. 170 by Bobda
• An application can be divided into partitions that are implemented on the reconfigurable device one after another.
• The result is a set of partitions, each of which occupies the complete device.
• While the implementation of single partitions is easy, the amount of wasted
resources in partitions can be very high.
• Recall that the wasted resource of a component is the amount of resources occupied
by that component multiplied by the time during which the component is idle on the
device.
• Wasting resources on the chip can be avoided if each component is placed
on the chip only when its computation is required and remains on the device
only for the time it is active.
• With this, idle components can be replaced by new ones, ready to run at a given
point of time.
• Exchanging a single component on the chip means reconfiguring the chip only
on the location previously occupied by that component.
• This process is called partial reconfiguration in contrast to full reconfiguration
where the full device must be reconfigured, even for the replacement of a single
component.
• To be exploited, partial reconfiguration must be technically supported by
the device, which is not the case for all available devices.
• While most of the existing devices support full reconfiguration, only few are
able to be partially reconfigured.
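The waste definition above (occupied resources multiplied by idle time) can be computed directly. In this hypothetical sketch, the Component class, the areas, and the time intervals are made up; under full reconfiguration every component of a partition is assumed to stay on the chip from the partition's start to its end. With ideal partial reconfiguration (each component loaded only while active, reconfiguration time ignored) the waste would be zero.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    area: int   # resources occupied on the device (hypothetical units)
    start: int  # first time step in which the component computes
    end: int    # time step after its last computation


def waste_full(partition: List[Component]) -> int:
    """Full reconfiguration: each component stays on the chip for the whole
    partition and is idle outside its own [start, end) interval."""
    p_start = min(c.start for c in partition)
    p_end = max(c.end for c in partition)
    return sum(c.area * ((p_end - p_start) - (c.end - c.start)) for c in partition)


partition = [Component("A", 10, 0, 100), Component("B", 20, 0, 20), Component("C", 5, 50, 100)]
print(waste_full(partition))  # 1850 = 20 * 80 (B idle) + 5 * 50 (C idle)
```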
Qn. What is relationship of interconnect growth and
requirement of area on chip? Why do interconnects
consume dominant area on chip? Give example. 8
page no. 109 General purpose computing by Andrew Dehon

Qn. Give the growth equations for wire, channel and
associated hardware in reconfigurable device design. 8
Qn. Explain the concept of peak performance density in RP
space area model. 8 page no. 129 General purpose computing by
Andrew Dehon
• Using the model, we can examine the peak computational densities of
various architectural configurations in RP-space.
• Figure 1 plots computational density against datapath width, w, and the number
of instructions per function group, c.
• As w increases, there is more sharing of instruction memories and fewer switches are
required in the interconnect, resulting in smaller per-bit processing element cell
sizes, i.e., higher densities.
• As c increases, there are more instructions per compute element, resulting in
lower densities.
• Figure 1: Peak Computational Density Versus Contexts and Datapath Width
• The effect of more instructions is more severe for smaller datapath widths, w,
since there are fewer processing elements over which to amortize instruction
overhead.
• For single context designs, there is only a factor of 2.5 difference in density
between single bit granularity and 128-bit granularity.
• At this size, network effects dominate instruction effects, and the difference
comes almost entirely from the difference in switching requirements.
• For heavily multicontext devices at the same number of instruction contexts, the
difference between fine and coarse granularity is greater since the instruction
memory area dominates (See also Figure 2).
• At 1024 contexts, the 128-bit datapath is 36× denser than an array with bit-level
granularity.
• Figure 2: Compute and Instruction Densities Versus Contexts and Datapath Width
• As the number of contexts, c, increases, the device supports more loaded
instructions; that is, a larger on-chip instruction diversity.
• Figure 2 shows how instruction density increases with increasing numbers of
contexts alongside the decrease in peak computational density.
• These same density trends hold if we set aside a fixed amount of data memory.
• The area outside of the data memory will follow the same density curves shown
here.
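The qualitative trends above can be reproduced with a toy area model. This is a sketch under loud assumptions: the form of the model and every constant below are hypothetical choices made only to show the trends, not DeHon's RP-space area model, so its numbers will not match the 2.5× and 36× factors quoted above. Switch area and instruction memory are charged per processing element and amortized over its w bits; instruction memory grows with c.

```python
# Toy area model; form and constants are assumptions, not DeHon's RP-space parameters.
A_DATAPATH = 1.0      # datapath area per bit
A_SW_PER_BIT = 1.5    # switch area every bit needs regardless of w
A_SW_SHARED = 8.0     # switch area per processing element, shared by its w bits
INSTR_BITS = 64       # bits of one instruction (per context) for a processing element
A_MEM_BIT = 0.05      # area of one instruction-memory bit


def area_per_bit(w: int, c: int) -> float:
    """Area of one bit of a w-bit processing element that stores c instructions."""
    shared = A_SW_SHARED + c * INSTR_BITS * A_MEM_BIT  # amortized over the w bits
    return A_DATAPATH + A_SW_PER_BIT + shared / w


def density(w: int, c: int) -> float:
    """Peak bit operations per unit area (one bit operation per bit per cycle)."""
    return 1.0 / area_per_bit(w, c)


for c in (1, 16, 1024):
    ratio = density(128, c) / density(1, c)
    print(f"c={c:5d}  w=1: {density(1, c):.4f}  w=128: {density(128, c):.4f}  ratio: {ratio:.1f}")
```

Density rises with w and falls with c, and the coarse-to-fine density ratio is much larger at 1024 contexts than at one, because instruction memory dominates the area of narrow, heavily multicontext elements.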
Qn. Give Rents Rule based hierarchical model for
interconnect. 8 page no. 87 General purpose computing by Andrew Dehon

Qn. What is network utilization efficiency? How to achieve
it? Give mathematical model for it. 8 page no. 94 General purpose computing by Andrew Dehon
Qn. Give the mathematical model to compute area needed for
crossbar and memory. Assume a wire pitch of 8λ. 8
Qn. State the relationship between efficiency and task path
length, architectural context. Give suitable example 8

Qn. What is meant by data bandwidth? What is its impact
in design of new architecture? 8
Qn. Why is the excess stress being given on optimization
of interconnect? What are challenges in it? 8
Qn. Draw and explain the architecture of DPGA. 8
OR
Qn. With the help of suitable block diagram, explore the
architecture of DPGA in detail. Which section is
responsible for bottleneck in throughput? List the merits
and limitations of DPGA along with the causes. 16
Dynamically Programmable Gate Arrays (DPGAs) are fine-grained, multicontext devices
that are often more area-efficient than FPGAs.
The DPGA is a multicontext (c > 1), fine-grained (w = 1) computing device.
The DPGA exploits two facts:
1. The description of an operation is much smaller than the active area necessary to
perform the operation.
2. It is seldom necessary to evaluate every gate or bit computation in a design
simultaneously in order to achieve the desired task latency or throughput.
• Figure 1 below depicts the basic architecture of this DPGA. Each array element is a
conventional 4-input lookup table (4-LUT).
• Small collections of array elements, in this case 4 × 4 arrays, are grouped together into
sub-arrays.
• These sub-arrays are then tiled to compose the entire array.
• Crossbars between sub-arrays serve to route inter-sub-array connections.
• A single, 2-bit, global context identifier is distributed throughout the array to select the
configuration for use.
• Figure 1: Architecture and Composition of DPGA
• Additionally, programming lines are distributed to read and write configuration
memories.
• DRAM memory: The basic memory primitive is a 4 × 32-bit DRAM array, which
provides four context configurations for both the LUT and the interconnection
network (see Figure 2).
• The memory cell is a standard three transistor DRAM cell.
• Notably, the context memory cells are built entirely out of N-well devices,
allowing the memory array to be packed densely and avoiding the large cost of
N-well to P-well separation.
• The active context data is read onto a row of standard, complementary CMOS
inverters which drive LUT programming and selection logic.
• Figure 2: DRAM Memory Primitive
• Array element: The array element is a 4-LUT that includes an optional flip-flop
on its output (Figure 3).
• Each array element contains a context memory array. For our prototype, this is
the 4 × 32-bit memory described above.
• 16 bits provide the LUT programming, 12 configure the four 8-input multiplexors
which select each input to the 4-LUT, and one selects the optional flip-flop.
• The remaining three memory bits are presently unused.

• Figure 3: Array Element
• Subarrays: The subarray organizes the lowest level of the interconnect hierarchy.
• Each array element output is run vertically and horizontally across the entire
span of the subarray (Figure 4).
• Each array element can, in turn, select as an input the output of any array
element in its subarray which shares the same row or column.
• This topology allows a reasonably high degree of local connectivity.
• This leaf topology is limited to moderately small subarrays since it ultimately
does not scale.
• The row and column widths remain fixed regardless of array size, so the
horizontal and vertical interconnect would eventually saturate the row and
column channel capacity if the topology were scaled up.
• Figure 4: Subarray Local Interconnect
• Additionally, the delay on the local interconnect increases with each additional
element in a row or column.
• For small subarrays, there is adequate channel capacity to route all outputs
across a row and column without increasing array element size, so the topology
is feasible and desirable.
• Further, the additional delay for the few elements in the row or column of a small
subarray is moderately small compared to the fixed delays in the array element
and routing network.
• In general, the subarray size should be carefully chosen with these properties in
mind.
• Non-local interconnect: In addition to the local outputs which run across each row
and column, a number of non-local lines are also allocated to each row and column.
• The non-local lines are driven by the global interconnect (Figure 4).
• Each LUT can then pick inputs from among the lines which cross its array element.
• In the prototype, each row and column supports four non-local lines.
• Each array element could thus pick its inputs from eight global lines, six row and
column neighbor outputs, and its own output.
• Each input is configured with an 8:1 selector as noted above (Figure 3).
• Of course, not all combinations of 15 inputs taken 4 at a time are available with this
scheme.
• The inputs are arranged so any combination of local signals can be selected along with
many subsets of global signals.
• Freedom available at the crossbar in assigning global lines to tracks reduces the
impact of this restriction, but complicates placement.
• Local decode: Row select lines for the context memories are decoded and buffered
locally from the 2-bit context identifier.
• A single decoder services each row of array elements in a subarray.
• One decoder also services the crossbar memories for four of the adjacent crossbars.
• In our prototype, this placed five decoders in each subarray, each servicing four array
element or crossbar memory blocks for a total of 128 memory columns.
• Each local decoder also contains circuitry to refresh the DRAM memory on contexts
which are not being actively read or written.
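The context-memory organisation of a DPGA array element can be sketched as follows. The field widths (16 LUT bits, four 3-bit selects for the 8:1 input multiplexers, one flip-flop bit, three unused bits) and the 2-bit context identifier come from the description above; the bit ordering inside the 32-bit word, the function names, and the example signal values are hypothetical, and the optional flip-flop is not modelled (only the combinational LUT output is computed).

```python
from typing import List, Sequence, Tuple

LUT_BITS, MUX_BITS, N_INPUTS, FF_BITS, WORD_BITS = 16, 3, 4, 1, 32
UNUSED_BITS = WORD_BITS - (LUT_BITS + N_INPUTS * MUX_BITS + FF_BITS)  # 3 spare bits


def decode_context_id(ctx_id: int) -> List[int]:
    """Local decode: 2-bit context identifier -> one-hot select of the 4 memory rows."""
    if not 0 <= ctx_id < 4:
        raise ValueError("the prototype stores four contexts")
    return [1 if row == ctx_id else 0 for row in range(4)]


def decode_word(word: int) -> Tuple[int, List[int], int]:
    """Hypothetical layout: bits 15..0 LUT, 27..16 four 3-bit selects, 28 flip-flop."""
    lut = word & 0xFFFF
    mux = [(word >> (LUT_BITS + MUX_BITS * i)) & 0b111 for i in range(N_INPUTS)]
    use_ff = (word >> (LUT_BITS + N_INPUTS * MUX_BITS)) & 1
    return lut, mux, use_ff


def evaluate(memory: Sequence[int], ctx_id: int, candidates: Sequence[Sequence[int]]) -> int:
    """Combinational output of one array element; candidates[i] holds the
    8 signals visible to input multiplexer i."""
    row = decode_context_id(ctx_id).index(1)
    lut, mux, _ = decode_word(memory[row])
    inputs = [candidates[i][sel] for i, sel in enumerate(mux)]
    index = sum(bit << i for i, bit in enumerate(inputs))
    return (lut >> index) & 1


AND4, OR4 = 0x8000, 0xFFFE                 # truth tables of a 4-input AND and OR
memory = [AND4, OR4, AND4 | (5 << 22), 0]  # context 2: input 2 uses selector value 5
candidates = [[1, 0, 0, 0, 0, 0, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0, 0, 0]]
print([evaluate(memory, ctx, candidates) for ctx in range(4)])  # [0, 1, 1, 0]
```

Changing the global context identifier swaps both the LUT function and the input routing in one step, which is the essence of the DPGA's multicontext operation.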
Qn. Explore the architectural building blocks of iDPGA and DPGA. What is difference
between them? 16
OR
Qn. With the help of a detailed block diagram, explain the architecture of iDPGA. List its
features, merits and limitations. 16
Exact answer containing equations

Qn. Draw the detailed block diagram of TSFPGA. Explain
each block in detail. Discuss merits and demerits. 16
Qn. What are the concepts behind time switched FPGA
and dynamically programmable gate array? Explore the
architecture of any one of them in detail. 16
Qn. How is task switching innovative in TSFPGA? Explain
the architectural blocks of TSFPGA in detail. What are its
limitations? 16
• The Time-Switched Field Programmable Gate Array (TSFPGA) is a multicontext device
designed explicitly around the idea of time-switching the internal interconnect in
order to implement more effective connectivity with less physical interconnect.
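The idea of time-switching the interconnect can be illustrated by letting several logical connections share a few physical wires across microcycles. This is a hypothetical sketch: time_switch is a greedy scheduler written only for illustration, it is not the TSFPGA's actual routing or scheduling algorithm, and it assumes one signal per physical wire per microcycle.

```python
from typing import Dict, List, Tuple


def time_switch(connections: List[Tuple[str, int]], wires: int) -> Dict[str, Tuple[int, int]]:
    """Greedy, hypothetical scheduler: give each logical connection (name, ready
    microcycle) the earliest (microcycle, wire) slot at or after it is ready.
    One physical wire carries one signal per microcycle."""
    used: Dict[int, int] = {}  # microcycle -> wires already taken
    slots: Dict[str, Tuple[int, int]] = {}
    for name, ready in sorted(connections, key=lambda conn: conn[1]):
        t = ready
        while used.get(t, 0) >= wires:
            t += 1
        slots[name] = (t, used.get(t, 0))
        used[t] = used.get(t, 0) + 1
    return slots


conns = [(n, 0) for n in "abcdef"]  # six logical connections, all ready at microcycle 0
print(time_switch(conns, wires=2))
```

Six logical connections are carried by only two physical wires over three microcycles: less physical interconnect in exchange for extra time steps, which is the trade-off TSFPGA is built around.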
All Questions in Unit 3 and Unit 4
• Compare the architectures and capabilities of ASIC, PDSP, GPP, FPGA, and
memory 10
• RALU 6
• Multicontext FPGA. 6
• What is single context and multi context FPGA? Discuss these issues pertaining to
present FPGA architecture. 8
• “Rich Flexible interconnects of FPGA is good resource as well as headache to
designer”. Comment on this statement. Give suitable example. 8
• Compare various computing architectures w.r.t. multiplier operation. How will
you locate reconfigurable device? Where? 8
• Limitations of current FPGAs 6
• Partial Reconfiguration 6
• Partial and full reconfiguration with example 6
• What is parallel reconfigurability? Is it supported in any present device? How do
you decide that the task needs partially or fully reconfigurable device? 8
• What is relationship of interconnect growth and requirement of area on chip?
Why do interconnects consume dominant area on chip? Give example. 8
• Give the growth equations for wire, channel and associated hardware in
reconfigurable device design. 8
• Various reconfigurable devices deployed yet. 6
• Wire delays and solutions. 6
• Switch requirement in reconfigurable device. 6
• Static interconnect 6
• Interconnect hierarchy 6
• Give Rents Rule based hierarchical model for interconnect. 8
• Explain the concept of peak performance density in RP space area model. 8
• What is network utilization efficiency? How to achieve it? Give mathematical
model for it. 8
• Overhead in network design 6
• Bisection BW 6
• Crossbars 6
• Area consumption by different architectural building blocks in reconfigurable
device. 6
• Weak Upper Bound 6
• Bus-based Communication 6
• Direct Communication 6
• Circuit Switching 6
• The Dynamic Network on Chip ( DyNoC) 6
• Comparison of DPGA with FPGA 6
• Give the mathematical model to compute area needed for crossbar and memory.
Assume a wire pitch of 8λ. 8
• Weak upper bound and interconnect 6
• Overheads in design 6
• State the relationship between efficiency and task path length, architectural
context. Give suitable example 8
• What is meant by data bandwidth? What is its impact in design of new
architecture? 8
• Why is the excess stress being given on optimization of interconnect? What are
challenges in it? 8
• Draw and explain the architecture of DPGA. 8
• With the help of suitable block diagram, explore the architecture of DPGA in
detail. Which section is responsible for bottleneck in throughput? List the merits
and limitations of DPGA along with the causes. 16
• Explore the architectural building blocks of iDPGA and DPGA. What is difference
between them? 16
• With the help of a detailed block diagram, explain the architecture of iDPGA. List its
features, merits and limitations. 16
• RP space area model. 6
• MATRIX 6
• Draw the detailed block diagram of TSFPGA. Explain each block in detail. Discuss
merits and demerits. 16
• What are the concepts behind time switched FPGA and dynamically
programmable gate array? Explore the architecture of any one of them in detail.
16
• How is task switching innovative in TSFPGA? Explain the architectural blocks of
TSFPGA in detail. What are its limitations? 16

Unit 3 and 4 PPts of RC2

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Unit 3 and 4 PPts of RC2

Uploaded by

Copyright:

Available Formats

UNIT 3

• Line 4 is used by the two components C1 and C2 for a direct communication.

• They cross components C3 and C4.

• This increases the amount of resources needed to implement the component.

• A central component exists that behaves as a message reflector.

• Brebner uses this approach to allow the communication between a

• To prevent the bus resources from being destroyed at run-time by components

• On the predefined slots, connection ports must be available to dynamically

• While the predefinition of locations where to place components allows a

• The approaches of Walder et al. as well as that of Brebner are both based on
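The message-reflector idea above can be sketched as a small Python model. It is a sketch under assumptions: components sit in numbered predefined slots, the shared bus moves one message per cycle, and the names `MessageReflector`, `Slot` and `bus_cycle` are hypothetical, not taken from Brebner's or Walder et al.'s designs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    src: int      # slot of the sending component
    dst: int      # slot of the receiving component
    payload: int


@dataclass
class Slot:
    """A predefined placement location with a fixed port onto the shared bus."""
    index: int
    module: Optional[str] = None                          # component placed here, if any
    inbox: List[Message] = field(default_factory=list)


class MessageReflector:
    """Hypothetical central component: every component sends to it over the bus,
    and it forwards ("reflects") each message to the destination slot."""

    def __init__(self, n_slots: int):
        self.slots = [Slot(i) for i in range(n_slots)]
        self.pending = deque()   # messages waiting for the single shared bus

    def place(self, slot: int, module: str) -> None:
        self.slots[slot].module = module

    def send(self, msg: Message) -> None:
        self.pending.append(msg)

    def bus_cycle(self) -> Optional[Message]:
        """Forward at most one message per cycle: the bus is a shared resource."""
        if not self.pending:
            return None
        msg = self.pending.popleft()
        target = self.slots[msg.dst]
        if target.module is not None:   # messages to an empty slot are dropped
            target.inbox.append(msg)
        return msg


bus = MessageReflector(n_slots=4)
bus.place(0, "C1")
bus.place(2, "C2")
bus.send(Message(src=0, dst=2, payload=42))
bus.send(Message(src=2, dst=0, payload=7))
bus.bus_cycle()   # only the first queued message is delivered in this cycle
```

Routing every transfer through one component keeps placement simple, but in this model it also serialises all traffic, so the bus becomes the throughput limit as more components communicate.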

• The system consists of a set of processing elements arranged in a mesh.

• In this way, two arbitrary processing elements can be dynamically connected at

• It is available today in many systems, such as the PACT XPP device.

• The connection mechanism in FPGAs also follows the same paradigm.

• Although dedicated lines exist in some devices to allow connection between

• Dynamic computation of routes: In a dynamically changing environment, the

• Figure: Drawback of circuit switching in
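The route-setup step of circuit switching can be illustrated with a short Python sketch. Assumptions: processing elements form a 2-D mesh, each link carries one circuit at a time, and a breadth-first search finds the path; real devices such as the PACT XPP use their own routing, and the function names here are hypothetical.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Set, Tuple

Node = Tuple[int, int]          # (row, column) of a processing element
Link = Tuple[Node, Node]


def neighbours(node: Node, rows: int, cols: int) -> Iterator[Node]:
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def link(a: Node, b: Node) -> Link:
    """Undirected link key, so (a, b) and (b, a) name the same wire segment."""
    return (a, b) if a <= b else (b, a)


def setup_circuit(src: Node, dst: Node, rows: int, cols: int,
                  busy: Set[Link]) -> Optional[List[Node]]:
    """Breadth-first search for a shortest path over free links, then reserve
    every link on it for the whole connection. Returns None if blocked."""
    prev: Dict[Node, Optional[Node]] = {src: None}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            break
        for nxt in neighbours(node, rows, cols):
            if nxt not in prev and link(node, nxt) not in busy:
                prev[nxt] = node
                frontier.append(nxt)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    path.reverse()
    for a, b in zip(path, path[1:]):
        busy.add(link(a, b))    # links stay reserved until the circuit is released
    return path


def release_circuit(path: List[Node], busy: Set[Link]) -> None:
    for a, b in zip(path, path[1:]):
        busy.discard(link(a, b))


busy: Set[Link] = set()
first = setup_circuit((0, 0), (0, 2), 1, 3, busy)   # holds both links of a 1x3 row
second = setup_circuit((0, 0), (0, 1), 1, 3, busy)  # blocked: link already reserved
```

The usage lines show the drawback: while one circuit holds a link, a second connection that needs that link is blocked until the first one is released.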

• This concept provides a means to overcome the deficiency of networks on chip as

• Whenever a module is placed at a given location where it uses the resources of

• In order for such a network to efficiently operate, some prerequisites on the
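The DyNoC principle can be sketched in Python under loudly stated assumptions: routers form a 2-D mesh, a module placed on the device deactivates the routers it covers, packets detour around it through the remaining active routers, and the placement prerequisite is simplified to "the remaining routers must stay connected". The shortest-path search stands in for the actual DyNoC routing algorithm, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Iterator, Optional, Set, Tuple

Router = Tuple[int, int]


class DyNoC:
    """Toy mesh of routers; a placed module switches off the routers it covers."""

    def __init__(self, rows: int, cols: int):
        self.active: Set[Router] = {(r, c) for r in range(rows) for c in range(cols)}

    @staticmethod
    def _neighbours(node: Router) -> Iterator[Router]:
        r, c = node
        yield from ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))

    def _connected(self, nodes: Set[Router]) -> bool:
        """True if every router in `nodes` can reach every other one."""
        if not nodes:
            return True
        start = next(iter(nodes))
        seen = {start}
        queue = deque([start])
        while queue:
            for n in self._neighbours(queue.popleft()):
                if n in nodes and n not in seen:
                    seen.add(n)
                    queue.append(n)
        return len(seen) == len(nodes)

    def place(self, top: int, left: int, height: int, width: int) -> bool:
        """Place a module; refuse if it overlaps another module, leaves the array,
        or would split the remaining network (simplified prerequisite)."""
        covered = {(r, c) for r in range(top, top + height)
                   for c in range(left, left + width)}
        if not covered <= self.active:
            return False
        remaining = self.active - covered
        if not self._connected(remaining):
            return False
        self.active = remaining   # covered routers are deactivated
        return True

    def hops(self, src: Router, dst: Router) -> Optional[int]:
        """Shortest hop count between two active routers, detouring around modules."""
        if src not in self.active or dst not in self.active:
            return None
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                return dist[node]
            for n in self._neighbours(node):
                if n in self.active and n not in dist:
                    dist[n] = dist[node] + 1
                    queue.append(n)
        return None


noc = DyNoC(5, 5)
placed = noc.place(1, 1, 3, 3)   # 3x3 module in the centre; a ring of routers remains
```

With the centre module in place, a packet crossing the array must take the 8-hop path around the ring instead of the direct 4-hop path, and a second placement is rejected when it would cut a router off from the rest of the network.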

• Rather, MATRIX allows the user or application to determine the actual

• Post-fabrication, the user can allocate instruction stores, instruction distribution,

• MATRIX is designed to maintain flexibility in instruction control.

• Primary instruction distribution paths are not defined at fabrication time.

• MATRIX neither binds control elements to datapaths nor predetermines elements

• To provide this level of flexibility, MATRIX is based on a uniform array of primitive

• A single network is shared by both instruction and data distribution.

• A single integrated memory and computing element can serve as an instruction

• The key to providing this flexibility is a multilevel configuration scheme which

• To first order, MATRIX uses a two level configuration scheme.

• Traditional “instructions” direct the behavior of datapath and network elements

• Metaconfiguration data can be used to define the traditional architectural

• The metaconfiguration “wires up” configuration elements which do not change
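The two-level configuration scheme above can be sketched in Python. This is an illustration, not the MATRIX design: the 8-bit width, the three operations, and the names `MetaConfig`, `FunctionalUnit` and `port` are assumptions. The metaconfiguration decides once whether an element takes a static instruction or receives its instruction over the shared network; the instruction then selects the operation each cycle.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MASK = 0xFF   # assumed 8-bit datapath per functional element

OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: (a + b) & MASK,
    "and": lambda a, b: a & b,
    "xor": lambda a, b: a ^ b,
}


@dataclass(frozen=True)
class MetaConfig:
    """Level 1: set once when the application is loaded, then left alone.
    It decides where this element's per-cycle instruction comes from."""
    instr_source: str                 # "static" or "network"
    static_op: Optional[str] = None   # used when instr_source == "static"
    port: Optional[str] = None        # used when instr_source == "network"


class FunctionalUnit:
    """Hypothetical stand-in for one element of the uniform array."""

    def __init__(self, meta: MetaConfig):
        self.meta = meta

    def step(self, a: int, b: int, network: Dict[str, str]) -> int:
        # Level 2: the traditional instruction for this cycle.
        if self.meta.instr_source == "static":
            op = self.meta.static_op
        else:
            # The instruction travels on the same network as data,
            # e.g. produced by another element acting as a controller.
            op = network[self.meta.port]
        return OPS[op](a, b)


fixed = FunctionalUnit(MetaConfig("static", static_op="add"))
steered = FunctionalUnit(MetaConfig("network", port="ctl"))
```

Because the same element type can either hold a fixed instruction or take one from the network, the split between control and datapath is decided by the metaconfiguration rather than at fabrication time.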

2. Wire delays and solutions (6)

3. Switch requirement in reconfigurable device. (6) page no. 86 General purpose

7. Bisection BW (6) page no. 81 General purpose computing by Andrew Dehon

8. Crossbars (6) page no. 81 General purpose computing by Andrew Dehon

• Conventional, commercial FPGAs do little or no encoding on their interconnect
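The cost of leaving the interconnect unencoded can be counted with a short Python sketch. It assumes an n-input, m-output crossbar where the unencoded form stores one configuration bit per crosspoint and the encoded form stores each output multiplexer's select value in binary; the function names are hypothetical.

```python
def unencoded_bits(n_inputs: int, n_outputs: int) -> int:
    """One configuration bit per crosspoint (one per pass transistor)."""
    return n_inputs * n_outputs


def encoded_bits(n_inputs: int, n_outputs: int) -> int:
    """Each output is a multiplexer whose select value is stored in binary:
    ceil(log2(n_inputs)) bits per output, computed exactly with integers."""
    return n_outputs * (n_inputs - 1).bit_length()


for n in (4, 16, 64):
    print(f"{n}x{n} crossbar: {unencoded_bits(n, n)} bits unencoded, "
          f"{encoded_bits(n, n)} bits encoded")
```

For a 16x16 crossbar this gives 256 bits unencoded against 64 bits encoded. The saving in configuration memory is paid for with decoder logic at each multiplexer.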

• Exact answer containing equations

Qn. What are single-context and multi-context FPGAs?

• The additional context memory makes them less susceptible to functionality

Qn. Write a note on partial and full reconfiguration with

• Partitions can be made in a reconfigurable device.

• To be exploited, the partial reconfiguration must be technically supported by
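A toy Python model contrasts the two modes, assuming reconfiguration cost is proportional to the number of configuration frames written. The partition names, frame counts and the `Device` class are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Device:
    frames: Dict[str, int]                                # partition -> configuration frames
    loaded: Dict[str, str] = field(default_factory=dict)  # partition -> module
    frames_written: int = 0

    def full_reconfigure(self, modules: Dict[str, str]) -> None:
        """Rewrite the whole device; every partition is reloaded (all modules halt)."""
        self.frames_written += sum(self.frames.values())
        self.loaded = dict(modules)

    def partial_reconfigure(self, partition: str, module: str) -> None:
        """Rewrite one partition only; the others keep running undisturbed."""
        self.frames_written += self.frames[partition]
        self.loaded[partition] = module


dev = Device(frames={"static": 60, "p1": 20, "p2": 20})   # hypothetical sizes
dev.full_reconfigure({"static": "controller", "p1": "fir", "p2": "aes"})
dev.partial_reconfigure("p1", "fft")
```

Swapping the module in `p1` costs 20 frames instead of 100 in this model, and the modules in the other partitions are not interrupted.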

• Exact answer containing equations

• As increases, there is more sharing of instruction memories and fewer switches

• As increases, there are more instructions per compute element resulting in

• Figure 2 shows how instruction density increases with increasing numbers of
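A deliberately simplified area model illustrates these trends, using the datapath width w and the context count c that this document uses for the DPGA (w = 1, c > 1). Assumptions: each compute element has w bits of datapath and interconnect at a fixed area per bit and stores c instructions, each shared by all w bits; the area constants are hypothetical and normalised. This captures only the instruction-sharing term, not a full RP-space model.

```python
def area_per_element(w: int, c: int, a_bit: float = 1.0, a_instr: float = 0.5) -> float:
    """Area of one w-bit compute element holding c instructions.
    a_bit: datapath + interconnect area per bit (hypothetical, normalised).
    a_instr: area to store one instruction (hypothetical), shared by all w bits."""
    return w * a_bit + c * a_instr


def instr_area_per_bit(w: int, c: int, a_instr: float = 0.5) -> float:
    """Instruction-memory area charged to each bit of the datapath."""
    return c * a_instr / w


def instruction_density(w: int, c: int) -> float:
    """Instructions stored per unit of area."""
    return c / area_per_element(w, c)
```

Widening w spreads each instruction's memory over more bits, while adding contexts raises the number of instructions held per unit area (approaching 1 / a_instr), at the cost of a larger element.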

• Exact answer containing equations

The DPGA is a multi-context (c > 1), fine-grained (w = 1) computing device.

The DPGA exploits two facts:

• These sub-arrays are then tiled to compose the entire array.

• Crossbars between sub-arrays serve to route inter-sub-array connections.

• The memory cell is a standard three transistor DRAM cell.

• The remaining three memory bits are presently unused.

• This topology allows a reasonably high degree of local connectivity.

• This leaf topology is limited to moderately small subarrays since it ultimately

• A single decoder services each row of array elements in a subarray.
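The multi-context idea (c > 1) can be sketched for fine-grained (w = 1) elements: each lookup table keeps one truth table per context, and a context identifier broadcast to the subarray selects which one is active. Four contexts and 4-input LUTs are an illustrative choice, the classes are hypothetical, and the sketch ignores the DRAM cells, decoders and crossbars described above.

```python
from typing import List, Sequence


class MultiContextLUT:
    """k-input LUT holding one truth table per context."""

    def __init__(self, k: int, tables: Sequence[Sequence[int]]):
        if any(len(t) != 2 ** k for t in tables):
            raise ValueError("each truth table needs 2**k entries")
        self.k = k
        self.tables = [list(t) for t in tables]

    def evaluate(self, context: int, inputs: Sequence[int]) -> int:
        index = 0
        for pos, bit in enumerate(inputs):   # inputs[0] is the least significant bit
            index |= (bit & 1) << pos
        return self.tables[context][index]


class Subarray:
    """A group of LUTs that all follow one broadcast context identifier."""

    def __init__(self, luts: List[MultiContextLUT]):
        self.luts = luts
        self.context = 0

    def switch_context(self, context: int) -> None:
        # Select another stored configuration; nothing is reloaded from off chip.
        self.context = context

    def evaluate(self, inputs: List[Sequence[int]]) -> List[int]:
        return [lut.evaluate(self.context, ins) for lut, ins in zip(self.luts, inputs)]


AND4 = [0] * 15 + [1]
OR4 = [0] + [1] * 15
XOR4 = [bin(i).count("1") % 2 for i in range(16)]
NAND4 = [1 - v for v in AND4]

sub = Subarray([MultiContextLUT(4, [AND4, OR4, XOR4, NAND4])])
```

The same physical LUT computes AND, OR, XOR or NAND depending only on the broadcast context, which is what lets a multi-context device reuse its logic across several configurations without a full reload.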

Exact answer containing equations

Qn. What are the concepts behind the time-switched FPGA

Qn. How is task switching innovative in TSFPGA? Explain