VLSI Unit IV

SUBSYSTEM DESIGN PROCESSES AND ILLUSTRATION
INTRODUCTION
• Objectives:
– Design considerations, problems and solutions
– Design processes
• Basic digital processor structure
• Datapath
• Bus Architecture
– Design of a 4-bit shifter
– Design of ALU subsystem
– Adders
– Multipliers
UNIT – V SUBSYSTEM DESIGN PROCESSES AND ILLUSTRATION

GENERAL CONSIDERATIONS
• Lower unit cost
• Higher reliability
• Lower power dissipation, lower weight and lower volume
• Better performance
• Enhanced repeatability
• Possibility of reduced design/development periods

SOME PROBLEMS
1. How to design complex systems in a reasonable time and with reasonable effort.
2. The nature of architectures best suited to take full advantage of VLSI and the technology.
3. The testability of large/complex systems once implemented on silicon.

SOME SOLUTIONS
• Problems 1 and 3 are greatly reduced if two aspects of standard practice (items 1 and 2 below) are accepted:
1. a) Top-down design approach with adequate CAD tools to do the job
b) Partitioning the system sensibly
c) Aiming for simple interconnections
d) High regularity within subsystems
e) Generate and then verify each section of the design
2. Devote a significant portion of the total chip area to test and diagnostic facilities.
3. Select architectures that allow design objectives and high regularity in realization (this addresses problem 2).

ILLUSTRATION OF DESIGN PROCESSES
• Structured design begins with the concept of hierarchy
• It is possible to divide any complex function into less complex subfunctions, down to leaf cells
• This process is known as top-down design
• As a system's complexity increases, its organization changes as different factors become relevant to its creation
• Coupling can be used as a measure of how much submodels
interact
• It is crucial that components interacting with high frequency
be physically proximate, since one may pay severe penalties
for long, high-bandwidth interconnects
• Concurrency should be exploited – it is desirable that all
gates on the chip do useful work most of the time
• Because technology changes so fast, the adaptation to a new
process must occur in a short time.

Approaches used at Different Stages
• Conventional circuit symbols
• Logic symbols
• Stick diagrams
• Any mixture of logic symbols and stick diagrams that is convenient at a stage
• Mask layouts
• Architectural block diagrams and floor plans

General Arrangement of 4-bit Arithmetic Processor
Figure 6.1: Basic digital processor structure

Figure 6.2: Communication strategy for the datapath

Figure 6.3: Subunits and basic interconnection for datapath

Figure 6.4: One bus architecture

Sequence:
1. The 1st operand is sent from the registers to the ALU, where it is stored.
2. The 2nd operand is sent from the registers to the ALU and added.
3. The result is passed through the shifter and stored in the registers.

Figure 6.5: Two bus architecture

Sequence:
1. Two operands (A & B) are sent from the register(s) to the ALU and operated upon; the result (S) is held in the ALU.
2. The result is passed through the shifter and stored in the registers.

Figure 6.6: Three bus architecture

Sequence:
The two operands (A & B) are sent from the registers and operated upon, and the shifted result (S) is returned to another register, all in the same clock period.

Figure 6.7: Tentative floor plan for 4-bit datapath

Points to be noted for design:
• Metal can cross poly or diffusion.
• Poly crossing diffusion forms a transistor.
• Whenever lines touch on the same level, an interconnection is formed.
• Simple contacts can be used to join diffusion or poly to metal.
• Buried contacts or butting contacts can be used to join diffusion and poly.
• Some processes use a second metal layer.
• The 1st and 2nd metal layers may be joined using a via.
• Each layer has particular electrical properties which must be taken into account.
• For CMOS layouts, p- and n-diffusion wires must not directly join each other, nor may they cross either a p-well or an n-well boundary.

Design of a 4-bit Shifter
• Any general-purpose n-bit shifter should be able to shift incoming data by up to (n – 1) places in a right-shift or left-shift direction.
• If we further specify that all shifts are on an end-around basis, so that any bit shifted out at one end of a data word is shifted in at the other end of the word, then the problem of right shift or left shift is greatly eased: a right shift by s places is then equivalent to a left shift by (n – s) places.
• The shifter must have:
– input from a four-line parallel data bus
– four output lines for the shifted data
– means of transferring input data to output lines with any shift from 0 to 3 bits
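A minimal C sketch of the 4 × 4 end-around shifter. It assumes bit j of the data bus drives input line j and that a shift of s closes the crossbar switches connecting output line j to input line (j + s) mod 4, which yields a right rotate; this direction convention and the function names are assumptions, not taken from Figure 6.9.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 4 x 4 end-around (barrel) shifter.
 * The shift amount s (0..3) selects one wrapped diagonal of crossbar
 * switches: output line j is connected to input line (j + s) mod 4.
 * With this assumed convention the word is rotated right by s. */
static uint8_t barrel_rotate_right4(uint8_t word, unsigned s)
{
    assert(s < 4);                       /* shifts of 0 to 3 places only */
    uint8_t out = 0;
    for (unsigned j = 0; j < 4; j++) {
        unsigned src = (j + s) % 4;      /* switch closed on this diagonal */
        uint8_t bit = (word >> src) & 1u;
        out |= (uint8_t)(bit << j);
    }
    return out;
}

/* End-around shifting makes a left rotate by s the same as a right
 * rotate by (4 - s) mod 4, so one structure covers both directions. */
static uint8_t barrel_rotate_left4(uint8_t word, unsigned s)
{
    assert(s < 4);
    return barrel_rotate_right4(word, (4u - s) % 4u);
}
```

For example, `barrel_rotate_right4(0x1, 1)` gives `0x8` and `barrel_rotate_left4(0x1, 1)` gives `0x2`: the left-rotate wrapper needs no extra switches, which is the easing the end-around specification buys.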

Figure 6.8: 4 X 4 crossbar switch using MOS

Figure 6.9: 4 X 4 barrel shifter

Summary of Design Processes
• Set out the specifications
• Partition the architecture into subsystems
• Set a tentative floor plan
• Determine the interconnects
• Choose layers for the bus & control lines
• Conceive a regular architecture
• Develop stick diagrams
• Produce mask layouts for the standard cells
• Cascade & replicate standard cells as required to complete the design

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
Figure 6.10: 4-bit data path for processor

• Design of 4-bit adder:
From the table, one form of the equations is:
Sum: $S_k = H_k C_{k-1}' + H_k' C_{k-1}$
New carry: $C_k = A_k B_k + H_k C_{k-1}$
where the half sum is $H_k = A_k' B_k + A_k B_k'$
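These three equations can be checked in software. The C sketch below (type and function names are illustrative) evaluates them literally for one bit, with 1 and 0 as logic levels, and ripples four such elements together as one possible reading of a 4-bit adder built from this element.

```c
#include <assert.h>

/* Outputs of one adder element. */
typedef struct { unsigned sum, carry; } AdderOut;

/* 1-bit adder element written directly from the equations. */
static AdderOut adder_element(unsigned a, unsigned b, unsigned c_in)
{
    assert(a <= 1 && b <= 1 && c_in <= 1);
    unsigned h = (!a & b) | (a & !b);          /* H_k = A_k'B_k + A_kB_k'          */
    unsigned s = (h & !c_in) | (!h & c_in);    /* S_k = H_kC_{k-1}' + H_k'C_{k-1}  */
    unsigned c = (a & b) | (h & c_in);         /* C_k = A_kB_k + H_kC_{k-1}        */
    AdderOut out = { s, c };
    return out;
}

/* Four elements cascaded (ripple carry): returns the 4-bit sum and
 * stores the final carry in *c_out. */
static unsigned add4(unsigned a, unsigned b, unsigned c0, unsigned *c_out)
{
    unsigned sum = 0, carry = c0;
    for (unsigned k = 0; k < 4; k++) {
        AdderOut e = adder_element((a >> k) & 1u, (b >> k) & 1u, carry);
        sum |= e.sum << k;
        carry = e.carry;                       /* C_k feeds element k + 1 */
    }
    *c_out = carry;
    return sum;
}
```

For instance, adding 9 and 7 with no carry-in gives a 4-bit sum of 0 and a final carry of 1 (9 + 7 = 16).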

• Adder element requirement:
The table reveals that the adder requirement may be stated as:
If $A_k = B_k$ then $S_k = C_{k-1}$, else $S_k = C_{k-1}'$.
And for the carry $C_k$:
If $A_k = B_k$ then $C_k = A_k = B_k$, else $C_k = C_{k-1}$.
Thus the standard 1-bit adder element is as shown in Figure 6.11.
Figure 6.11: Adder element
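The requirement above maps naturally onto two 2:1 multiplexers whose select line is the comparison $A_k = B_k$. A C sketch of that reading follows; `mux2` and `mux_adder_element` are hypothetical names, and the exhaustive check in the test confirms the element adds correctly for all eight input combinations.

```c
#include <assert.h>

/* 2:1 multiplexer: returns in1 when sel is 1, in0 when sel is 0. */
static unsigned mux2(unsigned sel, unsigned in1, unsigned in0)
{
    return sel ? in1 : in0;
}

/* Adder element in multiplexer form; (A_k == B_k) drives both selects. */
static void mux_adder_element(unsigned a, unsigned b, unsigned c_in,
                              unsigned *s, unsigned *c_out)
{
    assert(a <= 1 && b <= 1 && c_in <= 1);
    unsigned equal = (a == b);
    *s     = mux2(equal, c_in, !c_in);  /* A=B: S = C_{k-1}; else S = C_{k-1}' */
    *c_out = mux2(equal, a, c_in);      /* A=B: C = A (= B); else C = C_{k-1}  */
}
```

The design choice mirrors the table: once the element knows whether its two operand bits agree, both outputs are simple selections of signals already present.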

Figure 6.12: Multiplexer based adder

Figure 6.13: CMOS based adder

• Standard cells required for adder:
Figure 6.14: Multiplexer cell with or without cut

Figure 6.15: NMOS (butting contact) inverters

Figure 6.16: NMOS (buried contact) inverters

Figure 6.17: CMOS inverter design

• Adder element bounding box:
Figure 6.18: Approximate bounding box and floor plan for CMOS adder element

• Adder element bounding box:
Figure 6.19: 4-bit adder element

• Implementing ALU functions with an adder:
The adder equations are:
Sum $S_k = H_k C_{k-1}' + H_k' C_{k-1}$
New carry $C_k = A_k B_k + H_k C_{k-1}$
Half sum $H_k = A_k' B_k + A_k B_k'$
Consider the sum output first. If the previous carry $C_{k-1}$ is at logical 0, then
$S_k = H_k \cdot 1 + H_k' \cdot 0 = H_k = A_k' B_k + A_k B_k'$ – an Ex-OR operation.
Now, if $C_{k-1}$ is at logical 1, then
$S_k = H_k \cdot 0 + H_k' \cdot 1 = H_k'$ – an Ex-NOR operation.
Next, consider the carry output of each element. First, with $C_{k-1}$ held at logical 0:
$C_k = A_k B_k + H_k \cdot 0 = A_k B_k$ – an AND operation.
Now, if $C_{k-1}$ is at logical 1, then
$C_k = A_k B_k + H_k \cdot 1 = A_k B_k + A_k' B_k + A_k B_k' = A_k + B_k$ – an OR operation.
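The derivation can be exercised with a small C model. As a simplifying assumption, the carry input of every element is driven by a control value instead of the previous element's carry, and XOR/XNOR are read from the sum outputs while AND/OR are read from the carry outputs. `alu4` and `AluOp` are illustrative names, not taken from Figure 6.20.

```c
#include <assert.h>

enum AluOp { ALU_XOR, ALU_XNOR, ALU_AND, ALU_OR };

/* Hypothetical 4-bit logic unit built from four adder elements whose
 * carry inputs are all forced to the same control value. */
static unsigned alu4(unsigned a, unsigned b, enum AluOp op)
{
    assert(a < 16 && b < 16);
    /* Forced C_{k-1}: 0 for XOR/AND, 1 for XNOR/OR. */
    unsigned forced_c = (op == ALU_XNOR || op == ALU_OR) ? 1u : 0u;
    /* XOR/XNOR come from the sum outputs, AND/OR from the carry outputs. */
    int use_carry = (op == ALU_AND || op == ALU_OR);
    unsigned result = 0;
    for (unsigned k = 0; k < 4; k++) {
        unsigned ak = (a >> k) & 1u, bk = (b >> k) & 1u;
        unsigned hk = ak ^ bk;                                 /* half sum H_k */
        unsigned sk = (hk & !forced_c) | (!hk & forced_c);     /* sum equation */
        unsigned ck = (ak & bk) | (hk & forced_c);             /* carry equation */
        result |= (use_carry ? ck : sk) << k;
    }
    return result;
}
```

With A = 1100 and B = 1010 the four operations give 0110 (XOR), 1001 (XNOR), 1000 (AND) and 1110 (OR), exactly as the four cases above predict.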
• Implementing ALU functions with an adder:
Figure 6.20: 1-bit adder element and 4-bit ALU

Further Consideration of Adder
Generation:
• If $A_k = B_k$, the carry out $C_k$ equals $A_k$ (= $B_k$) whatever the incoming carry, so it can be generated locally. This principle of generation allows the system to take advantage of the occurrences "$A_k = B_k$".
Propagation:
• If we are able to localize a chain of bits $A_k A_{k+1} \ldots A_{k+p}$ and $B_k B_{k+1} \ldots B_{k+p}$ for which $A_i \neq B_i$ for every $i$ in $[k, k+p]$, then the output carry bit of this chain will be equal to the input carry bit of the chain.
• These remarks constitute the principle of generation and propagation used to speed the addition of two numbers.
• All adders which use this principle calculate, in a first stage:
$P_k = A_k \oplus B_k$
$G_k = A_k B_k$
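Since $P_k$ is the same signal as the half sum $H_k$, the carry equation can be rewritten as $C_k = G_k + P_k C_{k-1}$. The C sketch below (names are hypothetical) computes the carries this way and also demonstrates the propagation rule: when every $P_k = 1$, the output carry equals the input carry.

```c
#include <assert.h>

/* Ripple addition written in propagate/generate form:
 * P_k = A_k XOR B_k, G_k = A_k B_k, C_k = G_k + P_k C_{k-1}. */
static unsigned add_pg(unsigned a, unsigned b, unsigned c_in,
                       unsigned width, unsigned *c_out)
{
    assert(width >= 1 && width <= 16 && c_in <= 1);
    unsigned carry = c_in, sum = 0;
    for (unsigned k = 0; k < width; k++) {
        unsigned p = ((a ^ b) >> k) & 1u;   /* propagate: carry passes through */
        unsigned g = ((a & b) >> k) & 1u;   /* generate: carry created locally */
        sum |= (p ^ carry) << k;            /* S_k = P_k XOR C_{k-1} */
        carry = g | (p & carry);            /* C_k = G_k + P_k C_{k-1} */
    }
    *c_out = carry;
    return sum;
}
```

For example, 0101 + 1010 has $P_k = 1$ in every position, so the carry out simply equals whatever carry was fed in.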

Figure 6.21: CMOS adder element and using pass/generate concept

• The Manchester Carry Chain:
• If the carry path is precharged to VDD, the transmission gate is then reduced to a simple NMOS transistor.
• In the same way, the PMOS transistors of the carry generation are removed.
• The Manchester cell is very fast, but a large set of such cascaded cells would be slow due to the distributed RC effect and the body effect, making the propagation time grow with the square of the number of cells.
Figure 6.22: Manchester carry-chain element
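The square-law growth can be illustrated with a first-order Elmore-delay model, which is an assumption added here rather than part of the notes: each pass transistor is treated as a resistance R and each carry node as a capacitance C, giving a delay of $RC\,n(n+1)/2$ for n cells. The second function sketches why inserting a buffer every g cells, as in Figure 6.23, keeps the growth linear in n. Both function names and the buffer delay parameter are hypothetical.

```c
#include <assert.h>

/* Elmore delay to the end of an n-stage uniform RC ladder:
 * node i sees i series resistances, so delay = R*C*(1 + 2 + ... + n). */
static double manchester_elmore_delay(unsigned n, double r, double c)
{
    assert(r > 0.0 && c > 0.0);
    double delay = 0.0;
    for (unsigned i = 1; i <= n; i++)
        delay += (i * r) * c;
    return delay;
}

/* With a buffer (delay t_buf) after every group of g cells, each group
 * restarts the ladder, so total delay grows linearly with n / g. */
static double buffered_delay(unsigned n, unsigned g, double r, double c,
                             double t_buf)
{
    assert(g >= 1 && n % g == 0);
    return (n / g) * (manchester_elmore_delay(g, r, c) + t_buf);
}
```

With R = C = 1, a 16-cell unbuffered chain costs 136 units, whereas four buffered groups of 4 cells (buffer delay 2) cost 48.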
• The Manchester Carry Chain:
Figure 6.23: Cascaded Manchester carry-chain elements with buffering

• Adder Enhancement Techniques:
– Carry select adders:
Figure 6.24: Carry select adder structure

Figure 6.25: Carry select adder structure (6-bit)

Optimization of the carry select adder:

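• Computational time of a ripple-carry adder of n cells:
$T = k_1 n$, where $k_1$ is the delay through one adder cell
• Dividing the adder into two blocks, with 2 parallel paths (for carry-in 0 and carry-in 1) in the upper block:
$T = k_1 n/2 + k_2$, where $k_2$ is the time needed by the multiplexer of the next block to select the actual output carry
• For an n-bit adder of M blocks, each containing P adder cells in series, so that $n = M \cdot P$:
$T = P k_1 + (M - 1) k_2$
The minimum value of T is obtained when $M = \sqrt{k_1 n / k_2}$ (substituting $P = n/M$ and setting $dT/dM = -k_1 n/M^2 + k_2 = 0$).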
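A small C sketch, using integer delays in arbitrary units, evaluates $T = P k_1 + (M - 1) k_2$ for every block count that divides n and picks the smallest, so the result can be compared with the closed form. The function names and the $k_1$, $k_2$ values in the usage note are illustrative assumptions.

```c
#include <assert.h>

/* Worst-case delay of an n-bit carry select adder split into m equal
 * blocks of p = n/m cells: T = p*k1 + (m - 1)*k2. */
static unsigned carry_select_time(unsigned n, unsigned m,
                                  unsigned k1, unsigned k2)
{
    assert(m >= 1 && n % m == 0);      /* equal-sized blocks */
    unsigned p = n / m;
    return p * k1 + (m - 1) * k2;
}

/* Exhaustive search over block counts that divide n. */
static unsigned carry_select_best_m(unsigned n, unsigned k1, unsigned k2)
{
    unsigned best_m = 1, best_t = carry_select_time(n, 1, k1, k2);
    for (unsigned m = 2; m <= n; m++) {
        if (n % m != 0)
            continue;
        unsigned t = carry_select_time(n, m, k1, k2);
        if (t < best_t) {
            best_t = t;
            best_m = m;
        }
    }
    return best_m;
}
```

With n = 64, $k_1 = 1$ and $k_2 = 4$, the search returns M = 4 (T = 28), matching $M = \sqrt{1 \cdot 64 / 4} = 4$.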

– Carry skip adders:
Figure 6.26: Carry skip adder structure

Figure 6.27: Carry skip adder structure

Figure 6.28: Carry skip adder structure (24-bit)

Optimization of the carry skip adder:
• Let the total adder be made of n adder cells, arranged as M blocks of P adder cells each. The total number of adder cells is then
$n = M \cdot P$
• The time T needed by the carry signal to propagate through P adder cells is
$T = k_1 P$
• The time T' needed by the carry signal to skip through M adder blocks is
$T' = k_2 M$
• The problem to solve is to minimize the worst-case delay, which is
$T_{worst} = 2(P - 1) k_1 + (M - 2) k_2$, where $P = n/M$
• $T_{worst}$ is minimum when $M = \sqrt{2 n k_1 / k_2}$ (from $dT_{worst}/dM = -2 n k_1/M^2 + k_2 = 0$)
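As with the carry select adder, the optimum can be checked numerically. The C sketch below uses integer delays in arbitrary units; the formula only makes sense for at least two blocks, so the search starts at M = 2. Function names and the sample $k_1$, $k_2$ values are illustrative assumptions.

```c
#include <assert.h>

/* Worst-case carry skip delay, T_worst = 2(P - 1)k1 + (M - 2)k2,
 * with P = n/M cells per block. */
static int carry_skip_worst(int n, int m, int k1, int k2)
{
    assert(m >= 2 && n % m == 0);   /* equal blocks, at least two of them */
    int p = n / m;
    return 2 * (p - 1) * k1 + (m - 2) * k2;
}

/* Exhaustive search over block counts (>= 2) that divide n. */
static int carry_skip_best_m(int n, int k1, int k2)
{
    int best_m = 0, best_t = 0;
    for (int m = 2; m <= n; m++) {
        if (n % m != 0)
            continue;
        int t = carry_skip_worst(n, m, k1, k2);
        if (best_m == 0 || t < best_t) {
            best_m = m;
            best_t = t;
        }
    }
    return best_m;
}
```

With n = 32, $k_1 = 1$ and $k_2 = 4$, the search returns M = 4 ($T_{worst} = 22$), matching $M = \sqrt{2 \cdot 32 \cdot 1 / 4} = 4$.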
Figure 6.29: Worst case carry propagation carry skip adder

Figure 6.30: Block propagation carry skip adder

– Carry look-ahead (CLA) adders:
Figure 6.31: Carry look-ahead adder structure

Figure 6.32: Carry look-ahead and ripple through compromise

Figure 6.33: 4-bit Carry look-ahead adder unit

Figure 6.34: 16-bit, 4X4 block Carry look-ahead adder unit

Figure 6.35: Generation of carry out (from 4-bits and carry in)

Figure 6.36: Four-cell Manchester carry-chain
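The CLA figures do not spell out the look-ahead equations. As a sketch under stated assumptions, the standard approach expands $C_k = G_k + P_k C_{k-1}$ so that each carry depends only on the P/G signals and the input carry. The indexing used here (c[0] is the adder carry-in, c[k+1] the carry out of bit k) and the name `cla4` are assumptions of the sketch.

```c
#include <assert.h>

/* 4-bit carry look-ahead: every carry is a two-level AND-OR of the
 * P/G signals and c[0], so no carry waits for its neighbour. */
static unsigned cla4(unsigned a, unsigned b, unsigned c0, unsigned *c_out)
{
    assert(a < 16 && b < 16 && c0 <= 1);
    unsigned p[4], g[4], c[5];
    for (int k = 0; k < 4; k++) {
        p[k] = ((a ^ b) >> k) & 1u;     /* propagate */
        g[k] = ((a & b) >> k) & 1u;     /* generate  */
    }
    c[0] = c0;
    c[1] = g[0] | (p[0] & c0);
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
         | (p[2] & p[1] & p[0] & c0);
    c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0);
    unsigned sum = 0;
    for (int k = 0; k < 4; k++)
        sum |= (p[k] ^ c[k]) << k;      /* S_k = P_k XOR carry into bit k */
    *c_out = c[4];
    return sum;
}
```

The wider product terms for the higher carries are the price of parallelism, which is why larger adders group 4-bit units into blocks as in Figure 6.34.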

Introduction to CPLDs and FPGAs
CPLD Families
CPLD Block Diagram
Figure: function blocks (FBs, each ~ a PLA with one output that can be FF'ed) with I/Ps and O/Ps, interconnected through a programmable crossbar switch; an individual switch in the crossbar is a diamond switch
CPLD Function Block
Figure: literal inputs (e.g., a, b, c) feed a PLA-like AND array; extra function inputs (e.g., g, h) feed the OR term; the OR output goes to a D-FF and a 2:1 mux (registered or combinational output). Example function: f = ab + bc' + g + h
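The example function f = ab + bc' + g + h can be modelled in C as a sum of product terms. In this sketch the (care, value) mask encoding of each AND-array row, the literal packing and all names are hypothetical choices for illustration, and the D-FF/mux output stage is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* One row of a PLA-like AND array. */
typedef struct {
    unsigned care;   /* bit i set: literal i appears in the term            */
    unsigned value;  /* required level of literal i (1 true, 0 complemented) */
} ProductTerm;

/* OR of product terms: true if any term matches the literal vector. */
static bool sop_eval(const ProductTerm *terms, int n_terms, unsigned lits)
{
    for (int t = 0; t < n_terms; t++)
        if ((lits & terms[t].care) == terms[t].value)
            return true;
    return false;
}

/* f = ab + bc' + g + h, literals packed as bit0 = a, bit1 = b, bit2 = c;
 * g and h enter the OR term directly, as extra function inputs. */
static bool example_f(bool a, bool b, bool c, bool g, bool h)
{
    static const ProductTerm terms[] = {
        { 0x3, 0x3 },   /* a b  : a = 1, b = 1 */
        { 0x6, 0x2 },   /* b c' : b = 1, c = 0 */
    };
    unsigned lits = (unsigned)a | ((unsigned)b << 1) | ((unsigned)c << 2);
    return sop_eval(terms, 2, lits) || g || h;
}
```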
Field Programmable Gate Arrays (FPGAs)
FPGA Types
(Anti-fuse technology)
FPGA Families
SRAM-type FPGA Interconnect Architecture
Figure: CLBs (Configurable Logic Blocks, programmable logic cells) placed between horizontal and vertical routing (interconnect) channels, joined by diamond switches. PSM: Programmable Switch Matrix (for making connections between interconnects of different channels); the structure shown only allows i-to-i connections.
SRAM-type FPGA Interconnect Architecture (contd)
Figure: Cell Connection Matrix (CCM) and PSM
Configurable Logic Block (CLB)
• A 5-input function is implemented using the G, F and H LUTs (Look-Up Tables) via Shannon's expansion: $p(a,b,c,d,e) = a\,p(1,b,c,d,e) + a'\,p(0,b,c,d,e) = a\,q(b,c,d,e) + a'\,r(b,c,d,e)$.
q( ) is implemented using LUT G, r( ) using LUT F, and $p = a\,q + a'\,r$ (combining the G and F outputs) using LUT H.
• The LUT o/ps can go through a FF (for seq. ckt design) or bypass it for a combinational o/p
• This is called technology mapping: mapping the logic to CLB logic components
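A C sketch of the Shannon-expansion mapping, under stated assumptions: each LUT is modelled as a 16-entry truth table held in SRAM bits (so LUT H appears here as a 4-input table with one input unused, although real devices may provide a smaller H LUT), the LUT address bit ordering is arbitrary, and `majority5` is only an example 5-input function.

```c
#include <assert.h>
#include <stdint.h>

/* A 4-input LUT: the inputs form an address into a 16-bit truth table. */
static unsigned lut4(uint16_t table, unsigned i0, unsigned i1,
                     unsigned i2, unsigned i3)
{
    unsigned addr = i0 | (i1 << 1) | (i2 << 2) | (i3 << 3);
    return (table >> addr) & 1u;
}

/* Example 5-input function: majority of five. */
static unsigned majority5(unsigned a, unsigned b, unsigned c,
                          unsigned d, unsigned e)
{
    return (a + b + c + d + e) >= 3;
}

/* Cofactor table for LUT G (a_val = 1, q) or LUT F (a_val = 0, r). */
static uint16_t make_cofactor_table(
    unsigned (*p)(unsigned, unsigned, unsigned, unsigned, unsigned),
    unsigned a_val)
{
    uint16_t t = 0;
    for (unsigned addr = 0; addr < 16; addr++)
        if (p(a_val, addr & 1u, (addr >> 1) & 1u, (addr >> 2) & 1u,
              (addr >> 3) & 1u))
            t |= (uint16_t)(1u << addr);
    return t;
}

/* LUT H implements a*q + a'*r; inputs (q, r, a, unused). */
static uint16_t make_h_table(void)
{
    uint16_t t = 0;
    for (unsigned addr = 0; addr < 16; addr++) {
        unsigned q = addr & 1u, r = (addr >> 1) & 1u, a = (addr >> 2) & 1u;
        if (a ? q : r)
            t |= (uint16_t)(1u << addr);
    }
    return t;
}

/* Evaluate the CLB: G and F see (b, c, d, e); H combines them with a. */
static unsigned clb_eval(uint16_t g_tab, uint16_t f_tab, uint16_t h_tab,
                         unsigned a, unsigned b, unsigned c,
                         unsigned d, unsigned e)
{
    unsigned q = lut4(g_tab, b, c, d, e);   /* LUT G: q(b,c,d,e) */
    unsigned r = lut4(f_tab, b, c, d, e);   /* LUT F: r(b,c,d,e) */
    return lut4(h_tab, q, r, a, 0);         /* LUT H: a q + a' r */
}
```

Building the G and F tables from `majority5` and evaluating the CLB over all 32 input combinations reproduces the original function, which is the point of the expansion: two 4-input LUTs plus a selector cover any 5-input function.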
Technology Mapping
Programming a CLB (contd)
Components of Modern FPGAs
Digital System: Implementation Spectrum
Figure: Microprocessor (software) | Reconfigurable hardware (firmware) | ASIC (hardware)
– ASIC gives high performance at the cost of inflexibility.
– A processor is very flexible but not tuned to the application.
– Reconfigurable hardware is a nice compromise.
Simplified FPGA Logic Element
Figure: Inputs → Look-Up Table (LUT) → Out, with a State element driven by Clock and Enable
High-level Compilers & FPGAs
– Difficult to estimate hardware resources.
– Some parts of the program are more appropriate for the processor (hardware/software codesign).
– The compiler must parallelize computation across many resources.
– Engineers like to write in C/VHDL/Verilog rather than pushing little blocks around.
for (i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];   /* iterations are independent, so they can run in parallel */
}
Some success stories
Translating a Design to an FPGA
Figure: RTL (C = A+B) → Circuit (adder with inputs A, B and output C) → Array
– CAD to translate a circuit from a text description to a physical implementation is well understood.
– Most current FPGA designers use register-transfer level specification (allocation and scheduling).
– Same basic steps as ASIC design.
Circuit Compilation & Implementation: Basic Steps
1. Technology Mapping: map the logic onto LUTs.
2. Placement: assign a logical LUT to a physical location.
3. Routing: select wire segments and switches for interconnection.
4. Convert all implementation "details" to FPGA programming info (configuration bits): LUT RAM bits, CCM & PSM FF/SRAM bits, etc.
• Config bits can be stored on disk or ROM and loaded into the FPGA as needed.
• The FPGA can thus be used to implement multiple digital systems (at different times, or sometimes simultaneously in different FPGA partitions).
Technology Mapping: A Simple Example
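Figure: an adder D = A + B made of full adders (FA), each with inputs A, B, Ci and outputs S, Co
The logic synthesis tool reduces the circuit to SOP form:
$S = A'B'C_i + A'BC_i' + AB'C_i' + ABC_i$
$C_o = A'BC_i + AB'C_i + ABC_i' + ABC_i$
Figure: S and Co are each mapped onto a 3-input LUT with inputs A, B, Ci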
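A C sketch of this mapping step: each SOP function of (A, B, Ci) is turned into the 8-bit contents of a 3-input LUT, and the LUT pair then behaves as a full adder. The address ordering (A, B, Ci) → (bit 2, bit 1, bit 0) and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Fill an 8-entry LUT table by evaluating f on every input combination. */
static uint8_t table_from_sop(unsigned (*f)(unsigned, unsigned, unsigned))
{
    uint8_t t = 0;
    for (unsigned addr = 0; addr < 8; addr++)
        if (f((addr >> 2) & 1u, (addr >> 1) & 1u, addr & 1u))
            t |= (uint8_t)(1u << addr);
    return t;
}

/* The SOP forms above, with ' written as ! */
static unsigned sop_s(unsigned a, unsigned b, unsigned ci)
{
    return (!a && !b && ci) || (!a && b && !ci) || (a && !b && !ci) || (a && b && ci);
}

static unsigned sop_co(unsigned a, unsigned b, unsigned ci)
{
    return (!a && b && ci) || (a && !b && ci) || (a && b && !ci) || (a && b && ci);
}

/* A 3-input LUT: the inputs address one bit of the stored table. */
static unsigned lut3(uint8_t table, unsigned a, unsigned b, unsigned ci)
{
    return (table >> ((a << 2) | (b << 1) | ci)) & 1u;
}
```

Under this ordering the S table comes out as 0x96 and the Co table as 0xE8; these eight bits per LUT are exactly the kind of configuration data produced in step 4 of the compilation flow.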

Processor + FPGA
Three possibilities:
Figure: processor (Proc) chip and FPGA daughtercard connected by a backplane bus (e.g. PCI)
1. FPGA serves as a coprocessor for data-intensive applications (possible project).
Figure: processor (Proc) and FPGA chip
2. FPGA serves as an embedded digital system for lower-latency processing.
"Reconfigurable Functional Unit"
