
Assignment Report on 8-bit Booth Multiplier

MOS VLSI Design


(ELL734)

Saahil Kr Nakami 2022EEY7581


Yash Juyal 2022EEY7582
Madhav 2023EEY7503
Ayanabho Banerjee 2023EEY7505

Under the guidance of


Prof. Kaushik Saha

Department of Electrical Engineering


Indian Institute of Technology Delhi
2023-24
1 Introduction

1.1 Referenced Documents


[1] I. S. Abu-Khater, A. Bellaouar and M. I. Elmasry, "Circuit techniques for CMOS low-power high-performance multipliers," IEEE Journal of Solid-State Circuits, vol. 31, no. 10, pp. 1535-1546, Oct. 1996, DOI: 10.1109/4.540066.
[2] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems. Boston: Kluwer, 1995.
[3] A. Morgenshtein, A. Fish and I. A. Wagner, "Gate-diffusion input (GDI) - a technique for low power design of digital circuits: analysis and characterization," Proc. IEEE International Symposium on Circuits and Systems, Phoenix-Scottsdale, AZ, USA, 2002, pp. I-I.
[4] J. J. F. Cavanagh, Computer Science Series: Digital Computer Arithmetic. New York: McGraw-Hill, 1984.
[5] Hwang-Cherng Chow and I-Chyn Wey, "A 3.3 V 1 GHz high speed pipelined Booth multiplier," 2002 IEEE International Symposium on Circuits and Systems (ISCAS), Phoenix-Scottsdale, AZ, USA, 2002, pp. I-I, DOI: 10.1109/ISCAS.2002.1009876.

1.2 Design Library Name


• /afs/iitd.ac.in/user/y/yash_juyal1/TSMC_65/MOS_VLSI_ASSG1_TB
• /afs/iitd.ac.in/user/s/sa/saahiln/tsmc_65_MOSVLSI/MOS_VLSI_SAAHIL_SCH_V1

1.3 People Involved in the Design


• Saahil Kr Nakami
• Yash Juyal
• Ayanabho Banerjee
• Madhav
2 Function

2.1 Brief Overview


• Specifications

Sl. No.   Parameter                       Value
1.)       VDD                             1.2 V
2.)       PMOS & NMOS width               > 80 nm
3.)       Operating Frequency post PEX    500 MHz
4.)       Area                            20,000-50,000 um2
5.)       Load Cap                        0.1 pF
6.)       VDD-VSS Pitch                   < 1.8 um

Table 2.1: Specifications

• Inputs and Outputs


Two 8-bit inputs (X and Y) and one 16-bit output (P).

• Algorithm used
The modified Booth algorithm recodes the two's-complement multiplier into signed digits. This
halves the number of partial products compared with a conventional array multiplier, which
gives faster operation with less hardware. The radix-4 (modified Booth) algorithm segments the
multiplier into overlapping 3-bit groups, each decoded to generate one partial product. Let us
consider Y as the multiplier and X as the multiplicand.
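An n-bit multiplier Y in two's complement is written in [2] as:

$$Y = -Y_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} Y_i\,2^{i}$$

This is rewritten in [2] as

$$Y = \sum_{i=0}^{n/2-1} \left( Y_{2i-1} + Y_{2i} - 2\,Y_{2i+1} \right) 2^{2i}, \qquad Y_{-1} = 0$$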

Using the modified Booth algorithm, the recoded version of Y uses a set of five signed
digits: -2, -1, 0, +1, and +2. Each recoded digit, as shown in Table 2.2, selects a
distinct operation on X in the multiplication process.
Y2i+1   Y2i   Y2i-1   Recoded digit   Operation on X
0       0     0        0               0*X
0       0     1        1               1*X
0       1     0        1               1*X
0       1     1        2               2*X
1       0     0       -2              -2*X
1       0     1       -1              -1*X
1       1     0       -1              -1*X
1       1     1        0               0*X

Table 2.2: Partial Product Selection [2]
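
The recoding can be checked with a few lines of Python. The sketch below is a behavioural model only, not the circuit used in this design; the function name `booth_recode` is an illustrative assumption. It applies the standard radix-4 rule, digit = Y2i-1 + Y2i - 2*Y2i+1, to each overlapping 3-bit group.

```python
def booth_recode(y, n=8):
    """Radix-4 (modified Booth) recoding of an n-bit two's-complement pattern.

    Returns n/2 digits from {-2, -1, 0, 1, 2}, least significant digit first.
    """
    if n % 2:
        raise ValueError("n must be even")
    y &= (1 << n) - 1              # keep only the n-bit pattern
    padded = y << 1                # append Y(-1) = 0 on the right
    digits = []
    for i in range(n // 2):
        group = (padded >> (2 * i)) & 0b111   # bits Y(2i+1), Y(2i), Y(2i-1)
        y_hi = (group >> 2) & 1
        y_mid = (group >> 1) & 1
        y_lo = group & 1
        digits.append(y_lo + y_mid - 2 * y_hi)
    return digits


if __name__ == "__main__":
    # Multiplier used in the worked example of this section.
    print(booth_recode(0b01101001))   # [1, -2, -1, 2]
```

Reading the digits with weights 1, 4, 16, 64 gives 1 - 8 - 16 + 128 = 105, which is the value of 01101001.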

Multiplier bits Y are grouped into overlapping 3-bit sets, each facilitating the calculation of a
specific partial product. These five multiples of the multiplicand are generated as shown in
Table 2.2 from the recoding of the multiplier Y. The process of generating the partial product
is shown below.

Recoded digit   Operation on X
 0              Add 0 to the partial product
 1              Add X to the partial product
 2              Shift X left by one position and add it to the partial product
-1              Add the two's complement of X to the partial product
-2              Take the two's complement of X, shift it left by one position and add it to the partial product
Table 2.3: Partial Product Generation [2]
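
As a cross-check of the operations listed above, the following sketch accumulates the partial products digit by digit and compares the result with ordinary signed multiplication. It is a software golden model written for illustration; the names `to_signed` and `booth_multiply` are assumptions, and the hardware details (sign-extension bits, flag F, adder array) are deliberately not modelled.

```python
def to_signed(value, bits):
    """Interpret a bit pattern as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_multiply(x, y, n=8):
    """Multiply two n-bit two's-complement patterns with radix-4 Booth recoding.

    Returns the 2n-bit two's-complement pattern of the product.
    """
    xs = to_signed(x, n)
    padded = (y & ((1 << n) - 1)) << 1        # append Y(-1) = 0
    acc = 0
    for i in range(n // 2):
        g = (padded >> (2 * i)) & 0b111
        digit = (g & 1) + ((g >> 1) & 1) - 2 * ((g >> 2) & 1)
        acc += (digit * xs) << (2 * i)        # partial product, weight 4**i
    return acc & ((1 << (2 * n)) - 1)


if __name__ == "__main__":
    p = booth_multiply(0b10010101, 0b01101001)   # X = -107, Y = 105
    print(f"{p:016b}", to_signed(p, 16))          # product is -11235
```

With X = 10010101 (-107) and Y = 01101001 (105) the model returns the 16-bit pattern of -11235, i.e. 0xD41D.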

Table 2.4 shows the relationship between the generated partial product (Pi) and the
multiplicand, with Pn representing the sign bit, where Pn = Pn-1 when the partial product is
not shifted. Note that the partial product is represented with n + 1 bits.

Table 2.4: Partial Product Generation Relations [2]

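To illustrate the algorithm, take X = 10010101 and Y = 01101001.

With Y-1 = 0 appended, the overlapping 3-bit groups of Y, from least to most significant, are
010, 100, 101 and 011, so Y can be recoded as the digits +1, -2, -1, +2 (least significant first).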
Two 8-bit numbers produce four partial products: the multiplier bits are grouped into 3-bit
groups with a 1-bit overlap, and a 0 is appended as Y-1 to the right of Y. Because of sign
extension the partial-product array is trapezoidal, whereas a rectangular array is preferred
for VLSI implementation. To obtain a rectangular array, two extra bits are used in each partial
product. The two additional bits are equal to the sign bit of the partial product, as given in [2].

For the second partial product, the expression given in [2] applies if the first partial product
is positive; otherwise, two different cases must be considered, as given in [2].

To indicate sign propagation efficiently, a flag bit F is used. It signals whether a previous
partial product had a negative sign bit that must be propagated. In the example of Figure 2.1,
F0 = 0 (there is no partial product before the first one), and F1 = F2 = F3 = 1, indicating
sign propagation from the first partial product to all subsequent ones. In [2] this flag is
expressed by a Boolean equation in terms of the sign bit of the jth partial product.

Figure 2.1: 8-bit Booth Multiplication Algorithm [2]


2.2 Interfaces

• Table 2.5 shows all the interfaces

Signals     Signal Type
X<0:7>      Input
Y<0:7>      Input
P<0:15>     Output
DVDD        Input/output
DGND        Input/output

Table 2.5: Interfaces

• DVDD – 1.2 V
• DGND – 0 V

2.3 Architecture

Figure 2.2: 8-bit Booth Multiplier Architecture

• The inputs are X<0:7> , Y<0:7>


• Figure 2.2 shows the entire 8-bit Booth Multiplier Architecture.
• The output is P<0:15>
• No separate PP-HA block is used; where a half adder is needed, a PP-FA is used with its Cin input grounded.

2.4 Detailed functional description


• The following blocks are implemented in order for the booth multiplier to function:
• Booth Encoder
• PP-MUX
• PP-FA
• 8-bit Adder
• Add Cell
• Tapered Buffer

• The circuit and description of the above-mentioned blocks are shown below:

1. Booth Encoder
The Booth encoder is an essential component of the Booth algorithm used in binary
multiplication. It encodes three bits of the multiplier, handles sign extension, and generates
control signals to guide the multiplier operation. A high-level description of the Booth
encoder's logic and implementation is discussed below.

• Input Bits:

𝑌2𝑖−1 : The least significant bit of the current group of three bits.

𝑌2𝑖 : The middle bit of the current group of three bits.

𝑌2𝑖+1 : The most significant bit of the current group of three bits.

• Complement Bits:

𝑌2𝑖−1′ , 𝑌2𝑖′ , 𝑌2𝑖+1′ : The complements of the input bits.

• Sign Extension:
The encoder also takes part in sign extension: it generates the two extra sign-extension bits
𝑃𝑃𝑛+1 and 𝑃𝑃𝑛+2 of each partial product (Figures 2.9 and 2.10), so that negative partial
products in two's complement are extended correctly.

• Encoding Logic:
The Booth encoder's primary function is to generate the control signals that select which
multiple of the multiplicand (0, ±X, or ±2X) forms the partial product. The encoding logic
examines the three input bits and asserts the control signal given by the Booth recoding rules.
In this design the encoder produces five control signals, 0x, 1x, 2x, -1x and -2x, whose
generation is shown in Figures 2.4 to 2.8. The logic can be derived from a truth table and
implemented with a small combination of logic gates.
Figure 2.3: Booth Encoder
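
The encoding rules can also be written as Boolean expressions. The sketch below is one possible formulation, derived from the standard radix-4 recoding rather than read off the schematics in Figures 2.4 to 2.8, so the gate-level structure of the actual encoder may differ. The function name `booth_encoder` and the dictionary keys are illustrative assumptions.

```python
def booth_encoder(y_hi, y_mid, y_lo):
    """Return the five select signals for one 3-bit group.

    y_hi, y_mid, y_lo correspond to Y(2i+1), Y(2i), Y(2i-1); each is 0 or 1.
    Exactly one of the returned signals is 1 for any input combination.
    """
    differ = y_mid ^ y_lo                       # 1 when the two lower bits differ
    all_equal = (1 - differ) & (1 - (y_hi ^ y_mid))
    return {
        "0x": all_equal,                        # groups 000 and 111
        "1x": (1 - y_hi) & differ,              # groups 001 and 010
        "2x": (1 - y_hi) & y_mid & y_lo,        # group 011
        "-1x": y_hi & differ,                   # groups 101 and 110
        "-2x": y_hi & (1 - y_mid) & (1 - y_lo), # group 100
    }


if __name__ == "__main__":
    for bits in range(8):
        a, b, c = (bits >> 2) & 1, (bits >> 1) & 1, bits & 1
        active = [name for name, v in booth_encoder(a, b, c).items() if v]
        print(a, b, c, active)
```

Because the five signals are mutually exclusive, they can drive the select inputs of the PP-MUX directly.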

• 0x

Figure 2.4: Generation of 0x Control Signal

• 1x

Figure 2.5: Generation of 1x Control Signal


• 2x

Figure 2.6: Generation of 2x Control Signal

• -1x

Figure 2.7: Generation of -1x Control Signal

• -2x

Figure 2.8: Generation of -2x Control Signal


• 𝑷𝑷𝒏+𝟐

Figure 2.9: Generation of 𝑃𝑃𝑛+2

• 𝑷𝑷𝒏+𝟏

Figure 2.10: Generation of 𝑃𝑃𝑛+1

2. PP-MUX

The PP-MUX selects one of 𝑋𝑖 , 𝑋𝑖′ , 𝑋𝑖−1 or 𝑋𝑖−1′ (or 0 for the 0x case) according to the
active control signal, i.e. 0x, 1x, 2x, -1x or -2x. These signals are generated from the
recoded bits of multiplier Y, as described in the encoder section above.
By choosing the input that matches the current recoded multiplier digit, the PP-MUX ensures
that the correct partial product bit is generated at each stage of the multiplication.
Figure 2.11 shows the circuit implementation of the PP-MUX.
Figure 2.11: PP-MUX Architecture
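
A behavioural sketch of the selection is given below. It assumes the usual mapping (1x selects Xi, -1x selects Xi', 2x selects Xi-1, -2x selects Xi-1'), with X sign-extended by one bit so that each row has n + 1 bits; the source does not spell this mapping out, and the name `pp_mux_row` is hypothetical. For negative digits the row is only the bitwise complement, so it is one less than the true multiple until the "+1" is added.

```python
N = 8  # multiplicand width


def pp_mux_row(x, digit, n=N):
    """Build one (n+1)-bit partial-product row bit by bit, as a row of PP-MUXes would."""
    x &= (1 << n) - 1

    def xbit(i):
        if i < 0:
            return 0                      # X(-1) = 0, shifted in for 2x / -2x
        if i >= n:
            return (x >> (n - 1)) & 1     # sign extension of X
        return (x >> i) & 1

    row = 0
    for i in range(n + 1):
        if digit == 0:
            bit = 0
        elif digit == 1:
            bit = xbit(i)
        elif digit == -1:
            bit = 1 - xbit(i)
        elif digit == 2:
            bit = xbit(i - 1)
        elif digit == -2:
            bit = 1 - xbit(i - 1)
        else:
            raise ValueError("digit must be in {-2, -1, 0, 1, 2}")
        row |= bit << i
    return row


if __name__ == "__main__":
    x = 0b10010101
    for d in (0, 1, 2, -1, -2):
        print(f"{d:>2}x -> {pp_mux_row(x, d):09b}")
```

All comparisons against digit*X hold modulo 2^(n+1), which is how the row is represented in the array.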

3. PP-FA (Multiplier Cell)

Figure 2.12: PP-FA Architecture

The primary responsibility of a multiplier cell is to compute one bit of the selected partial
product and add it to the cumulative sum propagated from the preceding cells, so that the
product bits accumulate progressively towards the final result.
The multiplier cell comprises two components:

• Partial Product Multiplexer (PP-MUX): Generates the partial product bit by selecting the
appropriate multiplicand bit, or its complement, according to the control signals from the
Booth encoder.
• Adder Unit (Full Adder or Half Adder): Adds the generated partial product bit to the
cumulative sum, updating the sum for the subsequent stages of the multiplication. Depending
on the position in the array, either a Full Adder (FA) or a Half Adder (HA) is required.
In the first row of the multiplier, which corresponds to the least significant bits, only
partial products need to be generated, so each cell in this row consists of a PP-MUX alone,
without an adder.
Figure 2.12 shows the block diagram of the multiplier cell and the interconnection of its
components.

4. 8-bit Adder

• The 8-bit Adder block uses a ripple-carry architecture, with optimizations applied to the
individual Full Adder blocks.

• Each Full Adder is built in mGDI (Modified Gate Diffusion Input) logic: mGDI XOR gates and
the mGDI basic cell generate the sum and carry signals.

mGDI method

• The mGDI (Modified Gate Diffusion Input) method is based on a basic cell, shown in
Figure 2.13. At first glance this cell resembles a standard CMOS inverter, but it differs
from it in several important ways.

Figure 2.13: mGDI Basic Cell [3]


• Three Inputs: The mGDI cell has three inputs: G (common gate input of the nMOS and pMOS),
P (input connected to the source/drain of the pMOS), and N (input connected to the
source/drain of the nMOS). In a standard CMOS inverter, by contrast, the gate is the only
signal input and the sources are tied to the supply rails.

• The mGDI basic cell can implement a wide range of functions using only two transistors,
which gives this technique an edge over a standard static CMOS implementation.
• The individual blocks used to construct the 8-bit Adder are shown below:

• mGDI XOR Gate

Figure 2.14: mGDI XOR Gate

• mGDI Carry Block

Figure 2.15: mGDI Carry Block


• mGDI FA (Full Adder)

Figure 2.16: mGDI FA (Full Adder)

• mGDI 8-bit Ripple Carry Adder (RCA)

Figure 2.17: mGDI 8-bit RCA
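
The adder hierarchy can be mirrored in a small behavioural model. The sketch below treats the basic cell as an ideal switch (output = N when G is 1, P when G is 0) and ignores electrical effects such as threshold drops. Building the carry as a multiplexer controlled by A XOR B is a common formulation and is an assumption here; the schematic of the actual carry block (Figure 2.15) may be organised differently. All function names are illustrative.

```python
def gdi_cell(g, p, n):
    """Ideal basic cell: the pMOS passes P when G = 0, the nMOS passes N when G = 1."""
    return n if g else p


def inv(a):
    return gdi_cell(a, 1, 0)              # P tied high, N tied low


def xor(a, b):
    return gdi_cell(a, b, inv(b))         # A = 0 -> B, A = 1 -> not B


def full_adder(a, b, cin):
    prop = xor(a, b)
    s = xor(prop, cin)
    cout = gdi_cell(prop, a, cin)         # prop = 1 -> Cin, prop = 0 -> A (= B)
    return s, cout


def ripple_carry_add(a, b, cin=0, width=8):
    """Chain `width` full adders; returns (sum pattern, carry out)."""
    result, carry = 0, cin
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


def add16(a, b):
    """16-bit addition from two 8-bit ripple-carry adders, as in the final stage."""
    low, carry = ripple_carry_add(a & 0xFF, b & 0xFF)
    high, _ = ripple_carry_add((a >> 8) & 0xFF, (b >> 8) & 0xFF, carry)
    return (high << 8) | low


if __name__ == "__main__":
    print(ripple_carry_add(0b10010101, 0b01101001))
    print(hex(add16(0x12FF, 0x0001)))
```

The carry output of the lower 8-bit adder feeds the carry input of the upper one, so the ripple path spans all 16 bits.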

5. Add Cell
The two's complement of a binary number is obtained by inverting every bit and then adding a
logical "1" to the inverted result. The Add Cell generates this "1" when it is required,
i.e. when a partial product has to be negated. Figure 2.18 shows the architecture.

Figure 2.18: Add Cell
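
The invert-and-add-one rule is easy to verify numerically. In the sketch below the 9-bit width matches the n + 1 bit partial product mentioned earlier; the function name `negate_with_add_cell` is an illustrative assumption and the code models only the arithmetic, not the cell's circuit.

```python
def negate_with_add_cell(value, width=9):
    """Two's-complement negation: bitwise inversion followed by the Add Cell's '+1'."""
    mask = (1 << width) - 1
    inverted = ~value & mask      # complemented bits, as selected for -1x / -2x
    add_bit = 1                   # the '1' supplied by the Add Cell
    return (inverted + add_bit) & mask


if __name__ == "__main__":
    x9 = 0b110010101              # X = 10010101 (-107) sign-extended to 9 bits
    print(f"{negate_with_add_cell(x9):09b}")   # 001101011, i.e. +107
```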


6. Tapered Buffer

• A tapered buffer is a chain of inverters in which each stage is larger than the one before
it, so that a small internal gate can drive a large capacitive load without a large delay
penalty. Its key aspects are:

• Signal Conditioning:
The buffer restores full logic levels and sharp edges on the signal it drives.

• Signal Amplification:
The progressively larger stages increase the drive strength, so that the signal is strong
enough to charge and discharge the load and be reliably processed by subsequent circuitry.

• Signal Isolation:
The buffer isolates the internal logic from the output load, preventing undesired
interactions between the two.

• CMOS Tapered Buffer:

The tapered buffer is implemented in CMOS (Complementary Metal-Oxide-Semiconductor)
technology, as shown in Figure 2.19. Each stage is an inverter consisting of an n-type
(NMOS) and a p-type (PMOS) transistor.

Figure 2.19: Tapered Buffer
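
A first-order sizing estimate for such a chain is sketched below. It uses the textbook result that, ignoring self-loading, a uniform taper factor f = (C_load / C_in)^(1/N) is used for N stages and the delay-optimal N is about ln(C_load / C_in). The input capacitance of 1 fF is a purely hypothetical value chosen for illustration; only the 100 fF load comes from the specification. The report does not state how its buffers were actually sized.

```python
import math


def taper_sizes(c_in, c_load, stages):
    """Uniform taper: return (taper factor, input capacitance of each stage)."""
    f = (c_load / c_in) ** (1.0 / stages)
    return f, [c_in * f ** k for k in range(stages)]


def optimal_stage_count(c_in, c_load):
    """First-order estimate N = ln(C_load / C_in), ignoring self-loading."""
    return max(1, round(math.log(c_load / c_in)))


if __name__ == "__main__":
    C_IN_FF = 1.0      # assumed input capacitance in fF (hypothetical)
    C_LOAD_FF = 100.0  # output load from the specification, in fF
    for n in (2, 4):
        f, sizes = taper_sizes(C_IN_FF, C_LOAD_FF, n)
        print(n, round(f, 3), [round(s, 2) for s in sizes])
    print(optimal_stage_count(C_IN_FF, C_LOAD_FF))
```

In practice the self-loading of each inverter pushes the preferred taper factor above the ideal value, so fewer and larger stages are often chosen.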


3 Design Parameters

3.1 Performance Requirements


• The operating frequency of the multiplier must be >= 500MHz with a load capacitor of
0.1pF.

3.2 Clock distribution


N/A

3.3 Reset
N/A

3.4 Timing Description


N/A

4 Verification Strategy

4.1 Objectives

• Design:
The primary objective of this design is to implement an 8x8 Booth Multiplier, a key
component in digital arithmetic, which multiplies two 8-bit binary numbers. The design
uses the Booth algorithm to reduce the number of partial products generated and so
improve computational speed.

• Objective of Verification:
The main goal of verification is to test the functionality of each block within an integrated
circuit.

• Functionality Testing:
To verify functionality, the first step involves performing a DC (Direct Current) analysis.
This analysis aims to determine if the block generates the required output under steady-
state conditions.
• Critical Path Identification:
After confirming functionality, the next step is to identify the critical path within the block.
The critical path represents the longest delay path through the circuit, which determines the
overall circuit speed.

• Delay Calculation:
To calculate the critical path delay, a transient analysis is performed. A transient pulse
(input signal change) is applied to one of the inputs while keeping the other inputs constant.
The time difference between the generation of the output and the input transition is
measured, giving the critical path delay.

• PVT Analysis:
The same analysis process is repeated across different Process, Voltage, and Temperature
(PVT) variations. These variations include different Process corners (TT, FF, FS, SF, SS)
and multiple Voltage and Temperature variations. This comprehensive analysis across
various conditions is referred to as PVT analysis.

• Robustness Check:
PVT analysis is essential to assess the robustness of the circuit. It helps ensure that the
circuit functions correctly under different operating conditions, accounting for
manufacturing process variations, voltage fluctuations, and temperature changes.

• Post-Layout Analysis:
The entire verification and PVT analysis process is repeated after the layout of the
integrated circuit has been finalized. This is crucial because layout can impact signal
propagation delays and other factors that affect circuit performance.
In summary, this process ensures that the circuit meets its specified functionality and
performance criteria over a range of real-world operating conditions, including variations
in process technology, voltage levels, and temperature.

4.2 Tools and version


• Cadence Virtuoso 6.1.8-64b is the version used
• ADE_L and ADE_XL for simulations
• Calibre for DRC, LVS and Post Layout Extraction (PEX)
4.3 Checking mechanisms
The verification and analysis process for digital integrated circuits is a critical phase in ensuring
the functionality, speed, and reliability of the designed circuit. This process involves a
systematic approach to validate each block and the top-level design. Here is an overview of the
key steps involved:

• DC Analysis and Critical Path Estimation


The verification process commences with DC analysis to assess the fundamental functionality
of each block within the integrated circuit. Following this, critical path delay estimation is
performed to gauge the circuit's speed.

• Block-Level Verification
This verification procedure is applied to all individual blocks within the integrated circuit. Each
block's output is connected to a load capacitor of 0.05pF to evaluate its ability to drive a specific
load. This step confirms both the functionality and output driving capability of the blocks.

• Process Variations Analysis


The circuit's robustness is assessed across five process corners, denoted TT, FF, SS, FS,
and SF, to account for manufacturing process variations. These variations can change circuit
performance significantly.

• Voltage and Temperature Variations


Additionally, the functionality and speed of the circuit are evaluated at three supply voltages
and three temperatures. Voltage and temperature both affect circuit behavior, so they must be
accounted for in the analysis.

• Top-Level Design Verification


Once all sub-blocks have been thoroughly checked and verified, the top-level integrated circuit
is assembled and subjected to a similar verification mechanism. This ensures that the integrated
system functions cohesively and meets design specifications.

• Output Load Analysis


To assess the ability of the top-level circuit to drive external loads, a load capacitor of 0.1pF is
connected to its output. This analysis confirms whether the circuit can effectively drive a load
of that specific capacitance.

• Post-Layout Extraction
The entire verification and analysis process is repeated after the layout phase, where the
physical placement and routing of components are finalized. This step is essential as layout
changes can impact signal propagation delays and other critical parameters.
5 Functional Checklist
In a Booth multiplier design, Booth encoders and array cells are essential for efficient partial
product generation. The Booth encoders handle the encoding, decoding, and sign-extension
logic. The array cells generate the partial product bits and add them to the cumulative sum. In
two's complement, negating a number involves flipping its bits and adding one; the "ADD" cell
supplies this added one. The final stage is a 16-bit adder, split into two 8-bit adders, which
produces the 16-bit output of the multiplier.
To verify the functionality of each block, a DC analysis is performed first. The critical path
is then identified and its delay calculated, which determines the operating frequency of the
multiplier.

6 Testbench

6.1 Overview
The verification and analysis process for digital integrated circuits is a critical phase in ensuring
the functionality, speed, and reliability of the designed circuit. This process involves a
systematic approach to validate each block and the top-level design. Here is an overview of the
key steps involved:

• DC Analysis and Critical Path Estimation


The verification process commences with DC analysis to assess the fundamental functionality
of each block within the integrated circuit. Following this, critical path delay estimation is
performed to gauge the circuit's speed. This is the most important analysis.

• Top-Level Design Verification


Once all sub-blocks have been thoroughly checked and verified, the top-level integrated circuit
is assembled and subjected to a similar verification mechanism. This ensures that the integrated
system functions cohesively and meets design specifications.

• Output Load Analysis


To assess the ability of the top-level circuit to drive external loads, a load capacitor of 0.1pF is
connected to its output. This analysis confirms whether the circuit can effectively drive a load
of that specific capacitance.

• Post-Layout Extraction
The entire verification and analysis process is repeated after the layout phase, where the
physical placement and routing of components are finalized. This step is essential as layout
changes can impact signal propagation delays and other critical parameters.
• Critical Path of the Top Level
Figure 6.1 shows the critical path within the top-level structure of the 8x8 Booth Multiplier
Architecture. The critical path is the path with the longest propagation delay, and therefore
represents the worst-case scenario.

Figure 6.1: Critical Path of Top Level

6.2 Architecture

• The evaluation of the circuit begins with a DC analysis, which checks whether the circuit
produces the intended outputs.

• Next, the critical path delay is calculated. It is the worst-case propagation delay of the
circuit, along the path identified in Figure 6.1. To measure it, a pulse is applied at the
input where the critical path starts, and the delay is taken as the time between the rising
edge of the input signal and the resulting change at the circuit's output.

• Figure 6.2 shows the testbench setup used for both the DC analysis and the propagation
delay measurement.

Figure 6.2: Test Bench Setup


Figure 6.3: ADE_L Window for Running DC Simulations (Schematic)

• ADE_L Window for running DC Simulations for the top-level is shown in Figure 6.3
Figure 6.4: ADE_L Window for Critical Path Delay Pre Layout (Schematic)

• ADE_L Window for running transient sims for the calculation of critical path delay Pre-
layout is shown in Figure 6.4.
Figure 6.5: Corner Setup

• The ADE_XL window is used to run the Process, Voltage, and Temperature (PVT) analysis
over 45 distinct corners, as shown in Figure 6.5.
• The PVT analysis covers the following conditions:
• Process Corners: typical (TT), fast-fast (FF), slow-slow (SS), fast-slow (FS), and
slow-fast (SF).
• Temperature Variations: -40°C, 27°C, and 140°C, covering both sub-zero and elevated
temperatures.
• Voltage Variations (±10%): supply voltages of 1.08V, 1.2V, and 1.32V.
• Together these give 5 x 3 x 3 = 45 corners, each simulated to find the worst-case
critical path delay of the design.
• Figure 6.6 shows the ADE_XL window with the 45-corner analysis of the
Critical Path delay (Pre-Layout).
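
The corner count can be reproduced by enumerating the three axes. The sketch below only builds the list of corners and shows how a worst case would be picked from a table of measured delays; it does not run any simulation, and the helper name `worst_corner` and its input format are hypothetical.

```python
from itertools import product

PROCESS = ["TT", "FF", "SS", "FS", "SF"]
VDD_V = [1.08, 1.20, 1.32]          # 1.2 V nominal, +/-10 %
TEMP_C = [-40, 27, 140]

# Every combination of process corner, supply voltage and temperature.
CORNERS = list(product(PROCESS, VDD_V, TEMP_C))


def worst_corner(delays_ps):
    """Return the (corner, delay) pair with the largest delay.

    `delays_ps` maps a (process, vdd, temp) tuple to a measured delay in ps;
    the values would come from the simulator, none are supplied here.
    """
    corner = max(delays_ps, key=delays_ps.get)
    return corner, delays_ps[corner]


if __name__ == "__main__":
    print(len(CORNERS))   # 45
    print(CORNERS[0])
```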

Figure 6.6: ADE_XL Window for PVT Analysis Pre-Layout (Schematic)

• The same analysis is repeated Post-Layout. After parasitic extraction of the layout, the
Calibre view of the top level is exported and the simulations are run on it.
Figure 6.7: ADE_L Window for Critical Path Estimation Post Layout Extraction

• ADE_L Window for running transient sims for the calculation of critical path delay Post-
layout extraction is shown in Figure 6.7.
Figure 6.8: ADE_XL Window for PVT Analysis Post-Layout Extraction

• Figure 6.8 shows the ADE_XL window with the 45-corner analysis of the
Critical Path delay post layout.
7 Tests Specifications
• All the test suites and functional tests performed are described in detail in the previous
section.

8 Design Microarchitecture

8.1 Top Level Interface

Figure 2.2: 8-bit Booth Multiplier Architecture

• The inputs are X<0:7> Y<0:7>

• Figure 2.2 shows the entire 8-bit Booth Multiplier Architecture.

• The output is P<0:15>

• No separate PP-HA block is used; where a half adder is needed, a PP-FA is used with its Cin input grounded.
8.2 Sub-Block Description

• The following blocks are implemented in order for the booth multiplier to function:
• Booth Encoder
• PP-MUX
• PP-FA
• 8-bit Adder
• Add Cell
• Tapered Buffer

• The circuit and detailed description of the above-mentioned blocks is shown in detail in
Section 2.4.

8.3 Structural Mapping Process


N/A
9 Physical Hierarchy

9.1 Floorplanning

Figure 9.1: Complete Layout View

Figure 9.1 provides the complete Layout view with major pins marked using TEXT layer.
A. Placement Plan:

1.) Booth Encoder


a.) Placed on the left, as its outputs are global encoded signals that go as inputs to the PP
Generators.
b.) Input Y<0:7> can be routed vertically.
2.) PP Generators
a.) Arranged as an array of two so that they fit within the same height as the Booth Encoder.
b.) Placed in pairs with consecutive Couts and Cins, as these form the critical path.
3.) ADD
a.) Placed between the PP Generators because of its small area, nearest to the Full Adder.
4.) MUX
a.) Placed along with the PP Generators; the area left over by the difference in height
is used for routing.
5.) 8-bit Ripple Carry Adder
a.) A set of two placed at the bottom, so that the outputs can be routed vertically and in parallel.

B. Routing Plan:

Sl. No.   Metal Layer   Width (um)   Signal                               Direction
1.        M1            0.1          Std Cell routing, Internal routes    X/Y
2.        M2            0.1          Couts, Cins, Sin, Souts              X
                        0.25         Internal Power/GND                   X
3.        M3            0.1          Inter Block connection               Y
                        0.25         Global routes                        Y
                        3            Global Power/GND                     Y
4.        M4            0.4          Multi-Global Power/GND               X
Table 9.1: Routing Plan

• No metal above M4 has been used.


Figure 9.2: Sub-Blocks Highlighted

• Sub-Blocks are highlighted in Figure 9.2.


Sub blocks:
1. Inverter

Figure 9.3: INVERTER

X*Y= 0.85 µm x 1.245 µm; Area=1.05825 µm2


2. mGDI Basic Cell

Figure 9.4: mGDI_basic_cell

X*Y= 2.19 µm x 1.245 µm; Area=2.72655 µm2


3. Pass Transistor

Figure 9.5: Pass Transistor

X*Y= 0.85 µm x 1.445 µm; Area=1.22825 µm2


4. mGDI XOR

Figure 9.6: mGDI_XOR

X*Y= 2.86 µm x 1.245 µm; Area=3.5607 µm2


5. 2 Stage Buffer

Figure 9.7: 2 Stage Buffer

X*Y= 1.67 µm x 1.51 µm; Area=2.5217 µm2


6. 4 Stage Buffer

Figure 9.8: 4 Stage Buffer

X*Y= 3.365 µm x 3.03 µm; Area=10.19595 µm2


7. mGDI Full Adder

Figure 9.9: mGDI_FULL_ADDER

X*Y= 2.955 µm x 3.905 µm; Area=11.539275 µm2


8. mGDI 8-bit RCA Adder

Figure 9.10: mGDI_8BIT_RCA

X*Y= 39.83 µm x 3.905 µm; Area=155.53615 µm2


9. MUX

Figure 9.11: MUX

X*Y= 2.955 µm x 9.595 µm ; Area=28.353225 µm2


10. Partial Product Generator

Figure 9.12: Partial Product Generator

X*Y= 2.955 µm x 13.7 µm ; Area=40.4835 µm2


11. Booth Encoder

Figure 9.13: BOOTH_ENCODER

X*Y= 10.87 µm x 11.025 µm ; Area=119.84175µm2


9.2 Clocktree Insertion
N/A

9.3 Layout Strategies


Some notable strategies used are as follows:
A.) Design Related:

• Std cells like inverter, XOR and buffer have a fixed height of 1.2um to ease abutment in X
direction in top level.
• PP/NP implant removed at PCELL level and custom drawn to compress standard cell
height.
• Each cell, irrespective of it being a standard cell or a complex block, has a standard
power/GND rail with 0.4um width in M4 at equal intervals.
• This creates uniformity in the top-level power/GND rail.
• Half DRC for abutting blocks maintained.
• Custom TAP cells are used instead of strips of MPPs, as substrate biasing is needed only
every 30um to avoid latch-up.
• In blocks with globally routed signals, those signals are left floating inside the block, so
that a distinct track forms when multiple such blocks are instantiated in the top level.

B.) EDA Tool Related:

• Annotation Browser: Used to track shorts/open dynamically as the layout is designed.


• SKILL codes: Used to create bindkeys for cadence inbuilt functions and custom pattern
generators.
• Auto Pin Placer: Detects and auto places pins on signals.
• Clone Generator: Generates one or more instances with similar connectivity/property in
the same pattern.
• Auto via gen: Places vias over multiple intersections having same connectivity.
• Gravity: Auto places cursor over edges, reduces zoom ins at top level.
• Infix: Used to reduce no. of mouse clicks per movement for greater speed
10 Results

• This section reports the speed of the 8x8 Booth Multiplier across process variations,
voltage levels, and temperature conditions.
• It also gives the area used, both at the individual block level and for the top-level
layout.

10.1 Area

Table 10.1: Area of Each Sub-Block

Top Block              ND2-equivalent gate count   % Utilization   Dimensions (X*Y)   Layout Area (um2)
8X8 BOOTH MULTIPLIER   ≈3,000                      45%             116 um x 50 um     5,800

Table 10.2: Area of Top Block


10.2 Timing

• This section presents the critical path delay and the resulting worst-case operating
frequency.
• The evaluations are carried out at the typical corner (TT) and the slowest corner (SS),
both pre and post layout.
• The analysis covers every individual block of the architecture as well as the complete
multiplier, as shown below:

1. Booth Encoder
Load Capacitance used: 50fF

Table 10.3: Critical Path Delay for Booth Encoder Block

2. Multiplexer
Load Capacitance used: 30fF

Table 10.4: Critical Path Delay for Multiplexer Block

3. Add Cell
Load Capacitance used: 20fF

Table 10.5: Critical Path Delay for Add Cell Block

4. Full Adder
Load Capacitance used: 10fF

Table 10.6: Critical Path Delay for Full Adder Block

5. Ripple Carry Adder
Load Capacitance used: 50fF

Table 10.7: Critical Path Delay for RCA Block

6. Tapered Buffer
Load Capacitance used: 100fF

Table 10.8: Critical Path Delay for Tapered Buffer Block

7. Multiplier (8-bit) (Total Integration)
Load Capacitance used: 100fF

Table 10.9: Critical Path Delay for 8-bit Booth Multiplier Block

Conclusion
• The table above shows that, with an output load of 0.1pF, the pre-layout speed is 3.418
GHz at the typical corner (TT) and 1.935 GHz at the slow corner (SS).
• Post layout, under the same 0.1pF output load, the speed is 1.524 GHz at the typical
corner (TT) and 0.886 GHz at the slow corner (SS).
• The specification requires a minimum operating frequency of 500MHz. The design therefore
meets the performance requirement at both corners, pre and post layout; the worst case
(post-layout SS) is about 1.77 times the required frequency.
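
The margin against the specification can be tabulated as follows. The sketch assumes that the reported operating frequency is simply the reciprocal of the critical path delay, which the report implies but does not state explicitly; the delays printed are therefore implied values, not measurements taken from the delay tables.

```python
REQUIRED_GHZ = 0.5   # 500 MHz specification

# Frequencies reported in the conclusion above, in GHz.
REPORTED_GHZ = {
    ("pre-layout", "TT"): 3.418,
    ("pre-layout", "SS"): 1.935,
    ("post-layout", "TT"): 1.524,
    ("post-layout", "SS"): 0.886,
}


def implied_delay_ps(freq_ghz):
    """Critical path delay in ps, assuming f = 1 / t_critical."""
    return 1000.0 / freq_ghz


def margin(freq_ghz, required_ghz=REQUIRED_GHZ):
    """Ratio of achieved to required frequency (greater than 1 means the spec is met)."""
    return freq_ghz / required_ghz


if __name__ == "__main__":
    for (stage, corner), f in REPORTED_GHZ.items():
        print(f"{stage:12s} {corner}: {implied_delay_ps(f):8.1f} ps, "
              f"margin x{margin(f):.2f}")
```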

10.3 Testability Analysis


This has been explained in Section 6 in detail
10.4 LVS/DRC rule violations

• LVS and ERC are clean with no violations.

Figure 10.1: LVS ERC Clean snapshot


• DRC is clean apart from the violations shown below.

Figure 10.2: DRC violations snapshot Part 1/2

Figure 10.3: DRC violations snapshot Part 2/2


• The DRC errors shown above are minimum density, ESD and chip-level integration errors,
which can be ignored for this block.

11 Bugs Known at Submission Date

• No specific bugs to be reported.
