VLSI Unit IV


SUBSYSTEM DESIGN PROCESSES AND ILLUSTRATION

INTRODUCTION

• Objectives:
– Design consideration, problem and solution
– Design processes
• Basic digital processor structure
• Datapath
• Bus Architecture
– Design 4 – bit shifter
– Design of ALU subsystem
– Adders
– Multipliers

UNIT – VI SUBSYSTEM DESIGN PROCESSES AND ILLUSTRATION


GENERAL CONSIDERATIONS

• Lower unit cost
• Higher reliability
• Lower power dissipation, lower weight and lower volume
• Better performance
• Enhanced repeatability
• Possibility of reduced design/development periods

SOME PROBLEMS

1. How to design complex systems in a reasonable time and with reasonable effort.
2. The nature of architectures best suited to take full advantage of VLSI and the technology.
3. The testability of large/complex systems once implemented on silicon.

SOME SOLUTIONS

• Problems 1 and 3 are greatly reduced if two aspects of standard practice are accepted:
1. a) A top-down design approach with adequate CAD tools to do the job
b) Partitioning the system sensibly
c) Aiming for simple interconnections
d) High regularity within each subsystem
e) Generating and then verifying each section of the design
2. Devote a significant portion of the total chip area to test and diagnostic facilities
3. Select architectures that allow the design objectives to be met with high regularity in realization

ILLUSTRATION OF DESIGN PROCESSES
• Structured design begins with the concept of hierarchy
• It is possible to divide any complex function into less complex subfunctions, repeating the division until the leaf cells are reached
• This process is known as top-down design
• As a system's complexity increases, its organization changes as different factors become relevant to its creation
• Coupling can be used as a measure of how much submodels
interact
• It is crucial that components interacting with high frequency
be physically proximate, since one may pay severe penalties
for long, high-bandwidth interconnects
• Concurrency should be exploited – it is desirable that all
gates on the chip do useful work most of the time
• Because technology changes so fast, the adaptation to a new
process must occur in a short time.

ILLUSTRATION OF DESIGN PROCESSES
Approaches used at Different Stages

• Conventional circuit symbols
• Logic symbols
• Stick diagram
• Any mixture of logic symbols and stick diagram that is convenient at a stage
• Mask layouts
• Architectural block diagrams and floor plans

ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor

Figure 6.1: Basic digital processor structure

ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor

Figure 6.2: Communication strategy for the datapath

ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor

Figure 6.3: Subunits and basic interconnection for datapath

ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor

Figure 6.4: One bus architecture


Sequence:
1. 1st operand from registers to ALU. Operand is stored there.
2. 2nd operand from register to ALU and added.
3. Result is passed through shifter and stored in the register
ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor

Figure 6.5: Two bus architecture


Sequence:
1. Two operands (A & B) are sent from register(s) to ALU & are operated upon, result (S) in ALU.
2. Result is passed through the shifter & stored in registers.

ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor

Figure 6.6: Three bus architecture


Sequence:
Two operands (A & B) are sent from registers, operated upon, and shifted result (S) returned to
another register, all in same clock period.

ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor

Figure 6.7: Tentative floor plan for 4 – bit datapath

ILLUSTRATION OF DESIGN PROCESSES
General Arrangement of 4-bit
Arithmetic Processor
Points to be noted for design:
• Metal can cross poly or diffusion.
• Poly crossing diffusion forms a transistor.
• Whenever lines touch on the same level, an interconnection is formed.
• Simple contacts can be used to join diffusion or poly to metal.
• Buried contacts or butting contacts can be used to join diffusion and poly.
• Some processes use a second metal layer.
• First and second metal layers may be joined using a via.
• Each layer has particular electrical properties which must be taken into account.
• For CMOS layouts, p- and n-diffusion wires must not directly join each other, nor may they cross either a p-well or an n-well boundary.
ILLUSTRATION OF DESIGN PROCESSES
Design of a 4-bit Shifter
• Any general-purpose n-bit shifter should be able to shift incoming data by up to (n – 1) places in a right-shift or left-shift direction.
• If all shifts are further specified to be on an end-around basis, so that any bit shifted out at one end of a data word is shifted in at the other end, then the problem of right shift versus left shift is greatly eased: a left shift by s places is the same as a right shift by (n – s) places.
• The shifter must have:
– input from a four-line parallel data bus
– four output lines for the shifted data
– a means of transferring input data to output lines with any shift from 0 to 3 bits
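The end-around requirement can be illustrated with a small behavioural sketch. This is only a software model, not a circuit or layout description, and the names `rotate_right` and `rotate_left` are illustrative assumptions rather than anything defined in the slides. It shows why a single shift direction is enough when shifts wrap around.

```python
def rotate_right(word, shift, n=4):
    """End-around right shift of an n-bit word by `shift` places."""
    shift %= n
    mask = (1 << n) - 1
    # Bits that fall off the right end re-enter at the left end.
    return ((word >> shift) | (word << (n - shift))) & mask


def rotate_left(word, shift, n=4):
    """End-around left shift, expressed as a right shift by n - shift."""
    return rotate_right(word, n - (shift % n), n)


if __name__ == "__main__":
    for s in range(4):
        print(s, format(rotate_right(0b0001, s), "04b"))
```

Because a left shift is obtained from the right-shift routine, a model of the shifter only needs shift amounts 0 to 3 in one direction.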

ILLUSTRATION OF DESIGN PROCESSES
Design of a 4-bit Shifter

Figure 6.8: 4 X 4 crossbar switch using MOS

ILLUSTRATION OF DESIGN PROCESSES
Design of a 4-bit Shifter

Figure 6.9: 4 X 4 barrel shifter

ILLUSTRATION OF DESIGN PROCESSES
Summary of Design Processes
• Set out the specifications
• Partition the architecture into subsystems
• Set a tentative floor plan
• Determine the interconnects
• Choose layers for the bus & control lines
• Conceive a regular architecture
• Develop stick diagram
• Produce mask layouts for standard cells
• Cascade & replicate standard cells as required to complete
the design

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem

Figure 6.10: 4-bit data path for processor

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Design of 4-bit adder:

From the truth table, one form of the equations is:
Sum: $S_k = H_k C_{k-1}' + H_k' C_{k-1}$
New carry: $C_k = A_k B_k + H_k C_{k-1}$
where the half sum is $H_k = A_k' B_k + A_k B_k'$
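The sum, carry and half-sum equations can be checked with a short behavioural sketch. The function names `adder_element` and `ripple_add` are assumptions made for illustration; the code simply evaluates the equations bit by bit and chains four elements into a ripple adder.

```python
def adder_element(a, b, c_prev):
    """One adder bit: returns (sum, carry) from the half-sum equations."""
    h = ((1 - a) & b) | (a & (1 - b))            # Hk = Ak'Bk + AkBk'
    s = (h & (1 - c_prev)) | ((1 - h) & c_prev)  # Sk = HkCk-1' + Hk'Ck-1
    c = (a & b) | (h & c_prev)                   # Ck = AkBk + HkCk-1
    return s, c


def ripple_add(x, y, n=4, c_in=0):
    """Chain n adder elements, least significant bit first."""
    carry, total = c_in, 0
    for k in range(n):
        s, carry = adder_element((x >> k) & 1, (y >> k) & 1, carry)
        total |= s << k
    return total, carry


if __name__ == "__main__":
    print(ripple_add(5, 6))   # 4-bit sum and carry out
    print(ripple_add(9, 7))
```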

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Adder element requirement:

The truth table reveals that the adder requirement may be stated as:
If $A_k = B_k$ then $S_k = C_{k-1}$, else $S_k = C_{k-1}'$
And for the carry $C_k$:
If $A_k = B_k$ then $C_k = A_k = B_k$, else $C_k = C_{k-1}$
Thus the standard 1-bit adder element is as shown in Figure 6.11.

Figure 6.11: Adder element
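The "if Ak = Bk" statement of the adder requirement maps naturally onto a selection (multiplexer-style) description. The sketch below is a behavioural illustration only; `mux_adder_element` is a hypothetical name and the code does not model the transistor-level cell in the figures.

```python
def mux_adder_element(a, b, c_prev):
    """Adder element written as two selections on the condition a == b."""
    if a == b:
        # Equal inputs: sum follows the incoming carry, carry is generated.
        return c_prev, a
    # Unequal inputs: sum is the inverted carry, carry is passed through.
    return 1 - c_prev, c_prev


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, mux_adder_element(a, b, c))
```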

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Adder element requirement:

Figure 6.12: Multiplexer based adder

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Adder element requirement:

Figure 6.13: CMOS based adder

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Standard cells required for adder:

Figure 6.14: Multiplexer cell with or without cut

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Standard cells required for adder:

Figure 6.15: NMOS (butting contact) inverters

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Standard cells required for adder:

Figure 6.16: NMOS (buried contact) inverters

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Standard cells required for adder:

Figure 6.17: CMOS inverter design

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Adder element bounding box:

Figure 6.18: Approximate bounding box and floor plan for CMOS adder element

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Adder element bounding box:

Figure 6.19: 4-bit adder element

COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Implementing ALU functions with an adder:
The adder equations are:
Sum: $S_k = H_k C_{k-1}' + H_k' C_{k-1}$
New carry: $C_k = A_k B_k + H_k C_{k-1}$
Half sum: $H_k = A_k' B_k + A_k B_k'$
Consider the sum output first. If the previous carry is at logical 0, then
$S_k = H_k \cdot 1 + H_k' \cdot 0 = H_k = A_k' B_k + A_k B_k'$ (an Ex-Or operation)
If $C_{k-1}$ is at logical 1, then
$S_k = H_k \cdot 0 + H_k' \cdot 1 = H_k'$ (an Ex-Nor operation)
Next, consider the carry output of each element. If $C_{k-1}$ is held at logical 0, then
$C_k = A_k B_k + H_k \cdot 0 = A_k B_k$ (an And operation)
If $C_{k-1}$ is at logical 1, then
$C_k = A_k B_k + H_k \cdot 1 = A_k B_k + A_k' B_k + A_k B_k' = A_k + B_k$ (an Or operation)
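The four logic functions obtained by forcing the carry input can be confirmed exhaustively. In the sketch below, `adder_element` restates the adder equations and `logic_ops` is a hypothetical helper that holds the carry input at 0 and then at 1; neither name comes from the slides.

```python
def adder_element(a, b, c_prev):
    """Sum and carry of one adder bit, from the half-sum equations."""
    h = ((1 - a) & b) | (a & (1 - b))
    s = (h & (1 - c_prev)) | ((1 - h) & c_prev)
    c = (a & b) | (h & c_prev)
    return s, c


def logic_ops(a, b):
    """Logic functions seen at the adder outputs with the carry-in forced."""
    s0, c0 = adder_element(a, b, 0)  # carry-in held at logical 0
    s1, c1 = adder_element(a, b, 1)  # carry-in held at logical 1
    return {"xor": s0, "and": c0, "xnor": s1, "or": c1}


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, logic_ops(a, b))
```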
COMPUTATIONAL ELEMENTS
Design of an ALU Subsystem
• Implementing ALU functions with an adder:

Figure 6.20: 1-bit adder element and 4-bit ALU


COMPUTATIONAL ELEMENTS
Further Consideration of Adder
Generation:
• The principle of generation allows the system to take advantage of the occurrences of $A_k = B_k$: in that case the carry out is $C_k = A_k$, whatever the incoming carry.
Propagation:
• If we can localize a chain of bits $A_k, A_{k+1}, \ldots, A_{k+p}$ and $B_k, B_{k+1}, \ldots, B_{k+p}$ for which $A_i \neq B_i$ for every $i$ in $[k, k+p]$, then the output carry bit of this chain will be equal to the input carry bit of the chain.
• These remarks constitute the principle of generation and propagation used to speed up the addition of two numbers.
• All adders which use this principle calculate, in a first stage:
$P_k = A_k \oplus B_k$
$G_k = A_k B_k$
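A short sketch can make the generate and propagate signals concrete. It assumes bit lists ordered least significant bit first, and the function name `carries` is illustrative. Each carry is formed as G OR (P AND previous carry), which is the standard way the first-stage signals are combined.

```python
def carries(a_bits, b_bits, c_in=0):
    """Carry out of every bit position, computed from P and G (LSB first)."""
    c, out = c_in, []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b          # propagate: inputs differ, carry passes through
        g = a & b          # generate: both inputs are 1, carry is created
        c = g | (p & c)
        out.append(c)
    return out


if __name__ == "__main__":
    # Every position propagates, so the chain's carry out equals its carry in.
    print(carries([1, 0, 1, 0], [0, 1, 0, 1], c_in=1))
    print(carries([1, 0, 1, 0], [0, 1, 0, 1], c_in=0))
```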

COMPUTATIONAL ELEMENTS
Further Consideration of Adder

Figure 6.21: CMOS adder element and using pass/generate concept

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• The Manchester Carry Chain:
• If the carry path is precharged to VDD, the transmission gate is then reduced to a simple NMOS transistor.
• In the same way, the PMOS transistors of the carry generation are removed.
• The Manchester cell is very fast, but a large set of such cascaded cells would be slow due to the distributed RC effect and the body effect, making the propagation time grow with the square of the number of cells.
Figure 6.22: Manchester carry-chain element
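The quadratic growth can be seen in a first-order model. The sketch below treats each pass transistor as a resistance r feeding a node capacitance c and uses the Elmore approximation for a uniform RC ladder. This is an assumed simplification: the body effect is ignored, and the names, unit values and buffer delay `t_buf` are illustrative, not taken from the slides.

```python
def elmore_delay(n_cells, r=1.0, c=1.0):
    """Elmore delay to the end of a uniform RC ladder of n_cells stages.

    Node i sees resistance i*r back to the driver, so the delay is
    r*c*(1 + 2 + ... + n) = r*c*n*(n + 1)/2, roughly quadratic in n.
    """
    return sum(i * r * c for i in range(1, n_cells + 1))


def buffered_delay(n_cells, group, t_buf=2.0, r=1.0, c=1.0):
    """Delay when a buffer (delay t_buf) is inserted after every `group` cells.

    Assumes n_cells is a multiple of group.
    """
    return (n_cells // group) * (elmore_delay(group, r, c) + t_buf)


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, elmore_delay(n), buffered_delay(n, 4))
```

Under these assumptions, breaking the chain with buffers makes the total delay grow linearly with the number of groups, which is the motivation for buffering cascaded elements.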
COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• The Manchester Carry Chain:

Figure 6.23: Cascaded Manchester carry-chain elements with buffering

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry select adders:

Figure 6.24: Carry select adder structure

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry select adders:

Figure 6.25: Carry select adder structure (6-bit)

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry select adders:

Optimization of the carry select adder:
• Computational time: $T = k_1 n$, where $k_1$ is the delay through one adder cell
• Dividing the adder into blocks with 2 parallel paths: $T = k_1 n / 2 + k_2$, where $k_2$ is the time needed by the multiplexer of the next block to select the actual output carry
• For an n-bit adder of M blocks, each containing P adder cells in series so that $n = M \cdot P$: $T = P k_1 + (M - 1) k_2$
• The minimum value of T occurs when $M = \sqrt{k_1 n / k_2}$
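The optimum follows from substituting P = n/M into T and setting dT/dM = 0. A numeric sketch of that result is given below; the delay constants are arbitrary illustrative values and the function names are assumptions. In practice M must be an integer number of blocks, so the square-root value is a guide rather than an exact answer.

```python
import math


def carry_select_delay(n, m, k1, k2):
    """T = P*k1 + (M - 1)*k2 with P = n/M."""
    return (n / m) * k1 + (m - 1) * k2


def optimal_blocks(n, k1, k2):
    """M that minimizes T: sqrt(k1*n/k2)."""
    return math.sqrt(k1 * n / k2)


if __name__ == "__main__":
    n, k1, k2 = 16, 1.0, 1.0
    print("optimal M:", optimal_blocks(n, k1, k2))
    for m in (1, 2, 4, 8, 16):
        print(m, carry_select_delay(n, m, k1, k2))
```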

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry skip adders:

Figure 6.26: Carry skip adder structure

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry skip adders:

Figure 6.27: Carry skip adder structure

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry skip adders:

Figure 6.28: Carry skip adder structure (24-bit)

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry skip adders:
Optimization of the carry skip adder:
• Let the total adder be made of n adder cells, arranged as M blocks of P adder cells each, so that $n = M \cdot P$
• The time T needed by the carry signal to propagate through P adder cells is $T = k_1 P$
• The time T' needed by the carry signal to skip through M adder blocks is $T' = k_2 M$
• The problem to solve is to minimize the worst-case delay, which is $T_{worst} = 2(P - 1) k_1 + (M - 2) k_2$, where $P = n/M$
• $T_{worst}$ is minimum when $M = \sqrt{2 n k_1 / k_2}$
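The stated optimum can be checked numerically. The sketch below assumes the skip term of the worst-case delay is (M − 2)·k2, which is the form that yields the quoted square-root result when P = n/M is substituted and the derivative with respect to M is set to zero. Function names and the delay values are illustrative assumptions.

```python
import math


def carry_skip_worst_delay(n, m, k1, k2):
    """Tworst = 2*(P - 1)*k1 + (M - 2)*k2 with P = n/M (assumes M >= 2)."""
    p = n / m
    return 2 * (p - 1) * k1 + (m - 2) * k2


def optimal_skip_blocks(n, k1, k2):
    """M that minimizes Tworst: sqrt(2*n*k1/k2)."""
    return math.sqrt(2 * n * k1 / k2)


if __name__ == "__main__":
    n, k1, k2 = 32, 1.0, 1.0
    print("optimal M:", optimal_skip_blocks(n, k1, k2))
    for m in (2, 4, 8, 16, 32):
        print(m, carry_skip_worst_delay(n, m, k1, k2))
```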
COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry skip adders:
Optimization of the carry skip adder:

Figure 6.29: Worst case carry propagation carry skip adder

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry skip adders:
Optimization of the carry skip adder:

Figure 6.30: Block propagation carry skip adder

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry look-ahead (CLA) adders:

Figure 6.31: Carry look-ahead adder structure

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry look-ahead (CLA) adders:

Figure 6.32: Carry look-ahead and ripple through compromise

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry look-ahead (CLA) adders:

Figure 6.33: 4-bit Carry look-ahead adder unit

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry look-ahead (CLA) adders:

Figure 6.34: 16-bit, 4X4 block Carry look-ahead adder unit

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry look-ahead (CLA) adders:

Figure 6.35: Generation of carry out (from 4-bits and carry in)

COMPUTATIONAL ELEMENTS
Further Consideration of Adder
• Adder Enhancement Techniques:
– Carry look-ahead (CLA) adders:

Figure 6.36: Four-cell Manchester carry-chain

Introduction to CPLDs and FPGAs
CPLD Families
CPLD Block Diagram
[Figure: CPLD block diagram. Labels: I/Ps; O/Ps; function block (approximately a PLA with one output that can be registered in a flip-flop); crossbar switch (a programmable switch for interconnecting the various function blocks, FBs). An individual switch in a crossbar is a diamond switch.]
CPLD Function Block

[Figure: CPLD function block. Labels: literal inputs (e.g., a, b, c); PLA-like AND array; extra function inputs (e.g., g, h) for the OR term; 2:1 mux; D-FF.]
Example function: f = ab + bc' + g + h
Field Programmable Gate Arrays (FPGAs)
FPGA Types

(Anti-fuse technology)
FPGA Families
SRAM-type FPGA Interconnect Architecture

[Figure: SRAM-type FPGA interconnect. Labels:]
• Horizontal routing (interconnect) channel
• Vertical routing channels
• Diamond switch
• PSM: Programmable Switch Matrix (for making connections between interconnects of different channels). The structure shown only allows i-to-i connections.
• CLB: Configurable Logic Block (programmable logic cell)
SRAM-type FPGA Interconnect Architecture (contd)
[Figure labels: Cell Connection Matrix (CCM); PSM]
Configurable Logic Block (CLB)

• A 5-input function is implemented using the G, F and H LUTs (look-up tables) via Shannon's expansion: $p(a,b,c,d,e) = a\,p(1,b,c,d,e) + a'\,p(0,b,c,d,e) = a\,q(b,c,d,e) + a'\,r(b,c,d,e)$. Here q is implemented using LUT G, r using LUT F, and $p = a g + a' f$ using LUT H, where g and f are the outputs of LUTs G and F.
• The LUT outputs can go through a FF (for sequential circuit design) or bypass it for a combinational output.
• This is called technology mapping: mapping the logic to CLB logic components.
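The Shannon-expansion mapping can be sketched in software. The model below represents each LUT as a dictionary from input tuples to an output bit, which is an assumption made for illustration and not any vendor's configuration format. The names `make_lut` and `map_to_clb` and the sample function are hypothetical.

```python
from itertools import product


def make_lut(func, n_inputs):
    """Tabulate func over all input combinations, as a LUT would store it."""
    return {bits: func(*bits) for bits in product((0, 1), repeat=n_inputs)}


def map_to_clb(p):
    """Split a 5-input function p(a,b,c,d,e) across LUTs G, F and H."""
    lut_g = make_lut(lambda b, c, d, e: p(1, b, c, d, e), 4)  # cofactor a = 1
    lut_f = make_lut(lambda b, c, d, e: p(0, b, c, d, e), 4)  # cofactor a = 0
    lut_h = make_lut(lambda a, g, f: (a & g) | ((1 - a) & f), 3)

    def clb(a, b, c, d, e):
        return lut_h[(a, lut_g[(b, c, d, e)], lut_f[(b, c, d, e)])]

    return clb


if __name__ == "__main__":
    def sample(a, b, c, d, e):
        return (a & b) ^ (c | d) ^ e

    clb = map_to_clb(sample)
    print(all(clb(*x) == sample(*x) for x in product((0, 1), repeat=5)))
```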
Technology Mapping
Programming a CLB (contd)
Components of Modern FPGAs
Digital System: Implementation Spectrum

[Figure: implementation spectrum running from Microprocessor (software) through Reconfigurable Hardware (firmware) to ASIC (hardware)]

– ASIC gives high performance at the cost of inflexibility.
– Processor is very flexible but not tuned to the application.
– Reconfigurable hardware is a nice compromise.
Simplified FPGA Logic Element

[Figure: simplified FPGA logic element. Labels: Inputs; Look-Up Table (LUT); Out; State; Clock; Enable]
High-level Compilers & FPGAs
–Difficult to estimate hardware resources.
–Some parts of program more appropriate for
processor (hardware/software codesign).
–Compiler must parallelize computation
across many resources.
–Engineers like to write in C/VHDL/Verilog
rather than pushing little blocks around.
for (i = 0; i < n; i++)
{
    c[i] = a[i] + b[i];
}
Some success stories
Translating a Design to an FPGA
[Figure: an RTL description (C = A+B) is translated to a circuit (an adder with inputs A and B and output C) and then onto the FPGA array]

– CAD to translate a circuit from text description to physical implementation is well understood.
– Most current FPGA designers use register-transfer level specification (allocation and scheduling).
– Same basic steps as ASIC design.
Circuit Compilation & Implementation:
Basic Steps
1. Technology Mapping: map the logic to LUTs.
2. Placement: assign a logical LUT to a physical location.
3. Routing: select wire segments and switches for interconnection.
4. Convert all implementation "details" to FPGA programming info (configuration bits): LUT RAM bits, CCM & PSM FF/SRAM bits, etc.
• Config bits can be stored on disk or ROM and loaded into the FPGA as needed.
• The FPGA can thus be used to implement multiple digital systems (at different times, or sometimes simultaneously in different FPGA partitions).
Technology Mapping: A Simple Example
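[Figure: an adder computing A + B = D, made of full adders (FA), each with inputs A, B, Ci and outputs S, Co]
Logic synthesis tool reduces the circuit to SOP form:
$S = A'B'C_i + A'BC_i' + AB'C_i' + ABC_i$
$C_o = A'BC_i + AB'C_i + ABC_i' + ABC_i$
[Figure: two 3-input LUTs, each with inputs A, B, Ci; one produces Co, the other S]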
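The mapping of a full adder onto two 3-input LUTs can be sketched as truth tables. The sum-of-products functions below use the canonical full-adder minterms, on the assumption that the complement bars were lost from the slide text. The names `sop_sum`, `sop_carry` and `full_adder_luts` are illustrative.

```python
from itertools import product


def sop_sum(a, b, ci):
    """S = A'B'Ci + A'BCi' + AB'Ci' + ABCi."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    return (na & nb & ci) | (na & b & nci) | (a & nb & nci) | (a & b & ci)


def sop_carry(a, b, ci):
    """Co = A'BCi + AB'Ci + ABCi' + ABCi."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    return (na & b & ci) | (a & nb & ci) | (a & b & nci) | (a & b & ci)


def full_adder_luts():
    """LUT contents for S and Co, indexed by the tuple (A, B, Ci)."""
    lut_s = {x: sop_sum(*x) for x in product((0, 1), repeat=3)}
    lut_co = {x: sop_carry(*x) for x in product((0, 1), repeat=3)}
    return lut_s, lut_co


if __name__ == "__main__":
    lut_s, lut_co = full_adder_luts()
    for x in product((0, 1), repeat=3):
        print(x, lut_s[x], lut_co[x])
```

Each dictionary plays the role of the eight configuration bits that one 3-input LUT would hold in this simplified model.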


Processor + FPGA
Three possibilities
1. FPGA serves as coprocessor for data-intensive applications (possible project).
[Figure labels: daughtercard; Proc; FPGA; chip; backplane bus (e.g. PCI)]
2. FPGA serves as embedded digital system for lower latency processing.
[Figure labels: Proc; FPGA; chip; "Reconfigurable Functional Unit"]
