
128 Computer Architecture and Parallel Processing

Unit 9: Synchronous Parallel Processing


Structure
9.1 Introduction
9.2 SIMD Architecture and Programming Principles
9.3 SIMD Architecture
9.4 MIMD Architecture
9.5 SIMD Computer and Performance Enhancement
9.6 Issues with SIMD
9.7 Summary
9.8 Check Your Progress
9.9 Questions and Exercises
9.10 Key Terms
9.11 Further Readings

Objectives
After studying this unit, you should be able to:
• Understand SIMD architecture
• Discuss the programming principles of SIMD
• Understand MIMD architecture
• Know about SIMD computers and performance enhancement

9.1 Introduction
Parallel processors are computer systems consisting of multiple processing units
connected via some interconnection network plus the software needed to make the
processing units work together. There are two major factors used to categorize such
systems: the processing units themselves, and the interconnection network that ties
them together. The processing units can communicate and interact with each other
using either shared memory or message passing methods. The interconnection network
for shared memory systems can be classified as bus-based versus switch-based. In
message passing systems, the interconnection network is divided into static and
dynamic. Static connections have a fixed topology that does not change while programs
are running. Dynamic connections create links on the fly as the program executes. The
main argument for using multiprocessors is to create powerful computers by simply
connecting multiple processors. A multiprocessor is expected to achieve higher speed
than the fastest single-processor system. In addition, a multiprocessor consisting of a
number of single processors is expected to be more cost-effective than building a high-
performance single processor. Another advantage of a multiprocessor is fault tolerance.
If a processor fails, the remaining processors should be able to provide continued
service, albeit with degraded performance.

9.2 SIMD Architecture and Programming Principles


The most popular taxonomy of computer architecture was defined by Flynn in 1966.
Flynn’s classification scheme is based on the notion of a stream of information. Two
types of information flow into a processor: instructions and data. The instruction stream
is defined as the sequence of instructions performed by the processing unit. The data
stream is defined as the data traffic exchanged between the memory and the
processing unit.

Amity Directorate of Distance & Online Education
According to Flynn’s classification, either of the instruction or data streams can be
single or multiple. Computer architecture can be classified into the following four distinct
categories:
• Single-instruction single-data streams (SISD);
• Single-instruction multiple-data streams (SIMD);
• Multiple-instruction single-data streams (MISD); and
• Multiple-instruction multiple-data streams (MIMD).
Conventional single-processor von Neumann computers are classified as SISD
systems. Parallel computers are either SIMD or MIMD. When there is only one control
unit and all processors execute the same instruction in a synchronized fashion, the
parallel machine is classified as SIMD. In a MIMD machine, each processor has its own
control unit and can execute different instructions on different data. In the MISD
category, the same stream of data flows through a linear array of processors executing
different instruction streams. In practice, there is no viable MISD machine; however,
some authors have considered pipelined machines (and perhaps systolic-array
computers) as examples for MISD. Figures 9.1, 9.2, and 9.3 depict the block diagrams
of SISD, SIMD, and MIMD, respectively. An extension of Flynn’s taxonomy was
introduced by D. J. Kuck in 1978. In his classification, Kuck extended the instruction
stream further to single (scalar and array) and multiple (scalar and array) streams. The
data stream in Kuck’s classification is called the execution stream and is also extended
to include single (scalar and array) and multiple (scalar and array) streams. The
combination of these streams results in a total of 16 categories of architectures.

Figure 9.1: SISD Architecture

Figure 9.2: SIMD Architecture




9.3 SIMD Architecture


The SIMD model of parallel computing consists of two parts: a front-end computer of
the usual von Neumann style, and a processor array, as shown in Figure 9.4. The
processor array is a set of identical synchronized processing elements capable of
simultaneously performing the same operation on different data. Each processor in the
array has a small amount of local memory where the distributed data resides while it is
being processed in parallel. The processor array is connected to the memory bus of the
front end so that the front end can randomly access the local processor memories as if
they were another memory.

Figure 9.3: MIMD Architecture

Figure 9.4: SIMD Architecture Model

Thus, the front end can issue
special commands that cause parts of the memory to be operated on simultaneously or
cause data to move around in the memory. A program can be developed and executed
on the front end using a traditional serial programming language. The application
program is executed by the front end in the usual serial way, but issues commands to
the processor array to carry out SIMD operations in parallel. The similarity between
serial and data parallel programming is one of the strong points of data parallelism.
Explicit synchronization is made unnecessary by the lock-step operation of the processors:
they either do nothing or perform exactly the same operation at the same time. In SIMD
architecture, parallelism is exploited by applying simultaneous operations across large
sets of data. This paradigm is most useful for solving problems that have lots of data
that need to be updated on a wholesale basis. It is especially powerful in many regular
numerical calculations. There are two main configurations that have been used in SIMD
machines (see Figure 9.5). In the first scheme, each processor has its own local memory.
Processors can communicate with each other through the interconnection network.



Basic Principles
There is a two-dimensional array of processing elements, each connected to its four
nearest neighbors.
• All processors execute the same instruction simultaneously.
• Each processor incorporates local memory.
• The processors are programmable, that is, they can perform a variety of functions.
• Data can propagate quickly through the array.
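The lock-step, data-parallel style described above can be sketched in a few lines of Python. This is an illustrative simulation only; the function name `simd_apply` and the plain list standing in for the processing elements' local memories are inventions for this sketch, not any real SIMD machine's API.

```python
# A minimal sketch of the SIMD data-parallel style: one "instruction"
# (a single operation) is applied simultaneously across many data elements.
# Here the processor array is simulated with a plain Python list.

def simd_apply(operation, data):
    """Apply the same operation to every element, as a lock-step PE array would."""
    return [operation(x) for x in data]

local_memories = [1, 2, 3, 4]            # one value per processing element
result = simd_apply(lambda x: x * 10, local_memories)
print(result)                            # [10, 20, 30, 40]
```

In an actual SIMD machine the loop disappears: the front end broadcasts the operation once, and every processing element applies it to its own local data in the same clock cycle.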

Figure 9.5: Two SIMD Schemes

9.4 MIMD Architecture


Multiple-instruction multiple-data streams (MIMD) parallel architectures are made of
multiple processors and multiple memory modules connected together via some
interconnection network. They fall into two broad categories: shared memory and
message passing. Figure 9.6 illustrates the general architecture of these two
categories. Processors exchange information through their central shared memory in
shared memory systems, and exchange information through their interconnection
network in message passing systems.
A shared memory system typically accomplishes interprocessor coordination
through a global memory shared by all processors. These are typically server systems
that communicate through a bus and cache memory controller. The bus/ cache
architecture alleviates the need for expensive multiported memories and interface
circuitry as well as the need to adopt a message-passing paradigm when developing
application software. Because access to shared memory is balanced, these systems
are also called SMP (symmetric multiprocessor) systems. Each processor has equal
opportunity to read/write to memory, including equal access speed.
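The shared-memory style of coordination described above can be sketched with ordinary threads, which all read and write the same global memory. This is only an illustration under stated assumptions: the names `shared_memory` and `worker` are invented for the sketch, and a Python lock stands in for the hardware's coherence and synchronization machinery.

```python
# Sketch of shared-memory coordination: every "processor" (thread) accesses
# the same global memory, so a lock is needed to keep concurrent updates
# consistent.

import threading

shared_memory = {"counter": 0}
lock = threading.Lock()

def worker(increments):
    for _ in range(increments):
        with lock:                       # coordinate through the shared memory
            shared_memory["counter"] += 1

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(shared_memory["counter"])          # 4000
```

Note that no data is ever copied between the workers; communication happens implicitly through the common memory, which is exactly what makes this model easy to program.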

Figure 9.6: Shared Memory Versus Message Passing Architecture


Commercial examples of SMPs are Sequent Computer’s Balance and Symmetry,
Sun Microsystems multiprocessor servers, and Silicon Graphics Inc. multiprocessor
servers. A message passing system (also referred to as distributed memory) typically
combines the local memory and processor at each node of the interconnection network.
There is no global memory, so it is necessary to move data from one local memory to
another by means of message passing. This is typically done by a Send/Receive pair of
commands, which must be written into the application software by a programmer. Thus,
programmers must learn the message-passing paradigm, which involves data copying
and dealing with consistency issues.
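The Send/Receive pairing described above can be sketched using standard-library queues in place of a real interconnection network. The names `channel`, `sender`, and `receiver` are illustrative only; real message-passing systems expose their own Send/Receive primitives (e.g. via a communication library), not Python queues.

```python
# Sketch of distributed-memory message passing: each "node" owns its local
# data, and the only way to share it is an explicit message -- the data is
# copied across the (simulated) interconnection network.

import threading
import queue

channel = queue.Queue()                   # stands in for the interconnection network

def sender():
    local_data = [1, 2, 3]                # exists only in this node's local memory
    channel.put(list(local_data))         # Send: an explicit copy crosses the network

received = []

def receiver():
    received.append(channel.get())        # Receive: blocks until the message arrives

t_send = threading.Thread(target=sender)
t_recv = threading.Thread(target=receiver)
t_recv.start()
t_send.start()
t_send.join()
t_recv.join()
print(received[0])                        # [1, 2, 3]
```

The explicit copy in `sender` is the data-copying cost the text mentions: unlike the shared-memory model, nothing is visible to the receiver until a message has been sent and received.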
Commercial examples of message passing architectures c. 1990 were the nCUBE,
iPSC/2, and various Transputer-based systems. These systems eventually gave way to
Internet-connected systems in which the processor/memory nodes were either Internet
servers or clients on individuals’ desktops. It also became apparent that distributed memory
is the only way to efficiently increase the number of processors managed by a parallel
and distributed system. If scalability to larger and larger systems (as measured by the
number of processors) was to continue, systems had to use distributed memory
techniques.
These two forces created a conflict: programming in the shared memory model was
easier, and designing systems in the message passing model provided scalability. The
distributed-shared memory (DSM) architecture began to appear in systems like the SGI
Origin2000, and others. In such systems, memory is physically distributed; that is, the
hardware architecture follows the message passing school of design, but the
programming model follows the shared memory school of thought. In effect, software
covers up the hardware. As far as a programmer is concerned, the architecture looks
and behaves like a shared memory machine, but a message passing architecture lives
underneath the software. Thus, the DSM machine is a hybrid that takes advantage of
both design schools.

9.5 SIMD Computer and Performance Enhancement


SIMD Machines

The three SIMD machines covered in this section are the Connection
Machine by Danny Hillis, the Abacus Project at the MIT AI Lab, and the CAM-8 machine
by Norman Margolus. These three machines give a fairly representative sampling of the
types of SIMD machines that were constructed, as well as an idea of the motivations for
creating them in the first place.
The Connection Machine was composed of 65,536 one-bit processors. Each die
contained 16 processors, each capable of communicating with the others via a switch.
These 4,096 dies formed the nodes of a 12-dimensional hypercube
network. Thus, a processor was guaranteed to be within 12 hops of any other processor
in the machine. The hypercube network also facilitated communication by providing
alternative routes from source processor to destination. Each node was given a 12-bit
node ID, and different paths between two nodes in the network could be traversed
based on how the node ID was read. The network allowed for both packet- and circuit-
based communication for flexibility.
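The 12-hop guarantee follows from the hypercube numbering: two nodes are neighbors when their IDs differ in exactly one bit, so the minimum hop count between any two nodes is the Hamming distance of their IDs. A small sketch (the function name `hops` is invented for illustration):

```python
# Sketch: routing distance in a 12-dimensional hypercube. With 12-bit node
# IDs, the minimum number of hops between two nodes is the number of bit
# positions in which their IDs differ -- at most 12 for a 4,096-node network.

def hops(a: int, b: int) -> int:
    """Minimum number of hops between hypercube nodes a and b."""
    return bin(a ^ b).count("1")

print(hops(0b000000000000, 0b111111111111))  # 12, the worst case
print(hops(0b101010101010, 0b101010101011))  # 1, nearest neighbors
```

Each set bit of `a ^ b` names one dimension that must be crossed, and the order in which those bits are processed picks out one of the alternative routes the text mentions.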
The second machine discussed is the Abacus machine created at the MIT AI Lab.
This machine was constructed primarily for vision processing. The machine consisted of
1,024 one-bit processing elements set in a 2D mesh. The primary concept of interest in
the design was that the processing elements were configurable and used
reconfigurable bit-parallel ("RBP") algorithms instead of traditional bit-serial computation.
This means that each PE emulated logic for part of an arithmetic circuit (be it an adder,
shifter, multiplier, etc.) based on an RBP algorithm. The motivation for having these
configurable processing elements was to save on the silicon area needed to implement
arithmetic. However, because there was a necessary overhead for reconfiguration and
the implementation did not easily allow for pipelining due to data dependencies, it was
not clear that having configurable processing elements was a definite win.
The final machine discussed is the CAM-8 cellular automata machine, a
machine that mimics the basic spatial locality of physics. Thus, each processing
element gets its own "piece" of the physical entity being modeled, such as
a location in space or a particle. The processing elements are connected in a 3D
mesh, a natural topology for describing microphysical simulations. Each processing
element consisted of a programmable lookup-table with an associated local memory.
Since there are usually not enough processing elements to assign one to each cell of
the system that was to be simulated, each PE was assigned a specific region and
performed updates to each cell virtually. Cellular automata applications include
microscopic physics/lattice gas simulations (for which the machine was originally
intended), statistical mechanics, and data visualization/ image processing.
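The lookup-table style of update used by CAM-8 can be illustrated with a much simpler cellular automaton. The sketch below uses a one-dimensional elementary CA (rule 110) purely as an illustration; CAM-8 itself operated on a 3D mesh with each PE virtually updating a whole region of cells.

```python
# Sketch of the CAM-8 update style: each processing element is essentially a
# programmable lookup table applied uniformly, in lock step, across the cells
# it owns. Here: a 1D cellular automaton with periodic boundary conditions.

RULE = 110
# Build the lookup table: neighborhood (a, b, c) -> bit (a*4 + b*2 + c) of RULE.
TABLE = {(a, b, c): (RULE >> (a * 4 + b * 2 + c)) & 1
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

def step(cells):
    """One lock-step update: every cell is rewritten from the lookup table."""
    n = len(cells)
    return [TABLE[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

cells = [0, 0, 0, 1, 0, 0, 0, 0]
cells = step(cells)
print(cells)                              # [0, 0, 1, 1, 0, 0, 0, 0]
```

Reprogramming the table changes the "physics" being simulated without touching the update machinery, which is what made the lookup-table design so flexible.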

9.6 Issues with SIMD


Although SIMD machines are very effective for certain classes of problems (namely
embarrassingly parallel problems with very high parallelism), they do not perform well
universally. The architecture is specifically tailored for data-computation-intensive work;
as such, SIMD machines are rather inflexible and perform poorly on some classes of
important problems. In addition, SIMD architectures typically do not scale down in a
competitive fashion when compared to other styles of multiprocessor architecture. That is
to say, the desirable price/performance ratio achieved when they are constructed as
massive arrays becomes less attractive at smaller scales. Finally, since SIMD
machines almost exclusively rely on very simple processing elements, the architecture
cannot leverage low-cost advances in microprocessor technology. A combination of
these factors and others has led SIMD architectures to remain in only specialized
areas of use.

9.7 Summary
In this unit, we have gone over a number of concepts and system configurations related
to obtaining high-performance computing via parallelism. In particular, we have
provided the general concepts and terminology used in the context of multiprocessors.
The popular Flynn’s taxonomy of computer systems has been provided. An introduction
to SIMD and MIMD systems was given.

9.8 Check Your Progress


Multiple Choice Questions
1. Von Neumann architecture is
(a) SISD
(b) SIMD
(c) MIMD
(d) MISD
2. SIMD represents an organization that ……………..
(a) refers to a computer system capable of processing several programs at the
same time.
(b) represents organization of single computer containing a control unit, processor
unit and a memory unit.
(c) includes many processing units under the supervision of a common control unit
(d) none of the above.
3. SIMD represents an organization that ………………..
(a) refers to a computer system capable of processing several programs at the
same time.
(b) represents organization of single computer containing a control unit, processor
unit and a memory unit.
(c) includes many processing units under the supervision of a common control unit
(d) none of the above.
4. MIMD stands for
(a) Multiple instruction multiple data
(b) Multiple instruction memory data
(c) Memory instruction multiple data
(d) Multiple information memory data
5. SIMD represents an organization that …………………
(a) refers to a computer system capable of processing several programs at the
same time.
(b) represents organization of single computer containing a control unit, processor
unit and a memory unit.



(c) includes many processing units under the supervision of a common control unit
(d) none of the above.
6. The key factor/s in commercial success of a computer is/are ………………
(a) Performance
(b) Cost
(c) Speed
(d) Both a and b
7. The main objective of the computer system is
(a) To provide optimal power operation.
(b) To provide best performance at low cost.
(c) To provide speedy operation at low power consumption.
(d) All of the above.
8. A common measure of performance is
(a) Price/performance ratio.
(b) Performance/price ratio.
(c) Operation/price ratio.
(d) None of the above.
9. The performance depends on
(a) The speed of execution only.
(b) The speed of fetch and execution.
(c) The speed of fetch only
(d) The hardware of the system only.
10. The main purpose of having memory hierarchy is to
(a) Reduce access time.
(b) Provide large capacity.
(c) Reduce propagation time.
(d) Both a and b.

9.9 Questions and Exercises


1. Discuss Flynn’s classification of computers.
2. Draw the SIMD architecture.
3. What are the programming principles of SIMD architecture?
4. Discuss the MIMD architecture.
5. Explain shared memory architecture.
6. Describe the message passing architecture.
7. Explain SIMD computer performance.
8. Discuss the issues with SIMD.

9.10 Key Terms


• SISD: Single-instruction single-data streams.
• SIMD: Single-instruction multiple-data streams.
• MISD: Multiple-instruction single-data streams.
• MIMD: Multiple-instruction multiple-data streams.




Check Your Progress: Answers


1. (a) SISD
2. (c) includes many processing units under the supervision of a common control unit
3. (c) includes many processing units under the supervision of a common control unit
4. (a) Multiple instruction multiple data
5. (c) includes many processing units under the supervision of a common control unit
6. (d) Both a and b
7. (b) To provide best performance at low cost.
8. (a) Price/performance ratio.
9. (b) The speed of fetch and execution.
10. (d) Both a and b.

9.11 Further Readings


• S. S. Jadhav, Advanced Computer Architecture and Computing, Technical
Publications, 2009
• V. Rajaraman, T. Radhakrishnan, Computer Organization and Architecture, PHI
Learning Pvt. Ltd., 2007
• P. V. S. Rao, Computer System Architecture, PHI Learning Pvt. Ltd., 2008
• Sajjan G. Shiva, Advanced Computer Architectures, CRC Press, 2005



VLSI Computing 137

Unit 10: VLSI Computing


Structure
10.1 Introduction
10.2 VLSI
10.3 Design
10.4 Applications of VLSI
10.5 Advantages of VLSI
10.6 VLSI and Systems
10.7 CMOS Technology
10.7.1 Power Consumption Constraints
10.7.2 Design and Testability
10.7.3 Testability as a Design Process
10.8 Integrated Circuit Design Techniques
10.9 Summary
10.10 Check Your Progress
10.11 Questions and Exercises
10.12 Key Terms
10.13 Further Readings

Objectives
After studying this unit, you should be able to:
• Understand VLSI design
• Discuss the applications of VLSI
• Understand the concept of CMOS technology
• Discuss integrated circuit design techniques

10.1 Introduction
The expansion of VLSI is ‘Very-Large-Scale Integration’. Here, the term ‘Integration’
refers to the complexity of the integrated circuitry (IC). An IC is a well-packaged
electronic circuit on a small piece of single-crystal silicon, measuring a few millimetres
on a side, comprising active devices, passive devices and their interconnections. The
technology of making ICs is known as ‘MICROELECTRONICS’, because the size of
the devices is in the range of micrometres to sub-micrometres. Examples range from
basic gates to microprocessors, and from op-amps to consumer electronic ICs. So much
evolution has taken place in the field of microelectronics that the IC industry today has
the expertise to successfully fabricate an IC with more than 100 million MOS transistors.
ICs are classified keeping many parameters in mind. Based on the transistor count on
the IC, ICs are classified as SSI, MSI, LSI and VLSI. The minimum number of
transistors on a VLSI IC is in excess of 40,000.




10.2 VLSI
VLSI is ‘Very Large Scale Integration’. It is the process of designing, verifying,
fabricating and testing a VLSI IC or chip. A VLSI chip is an IC which has transistors
in excess of 40,000. Only MOS technology is used; the active devices are CMOS FETs.
The small piece of single-crystal silicon that is used to build this IC is
called a ‘DIE’. The size of this die could be 1.5 cm x 1.5 cm. The die is part of a bigger
circular silicon disc of diameter 30 cm, called a ‘WAFER’. Using a batch process,
wherein 40 wafers are processed simultaneously, one can fabricate as many as 12,000
ICs in one fabrication cycle. Even if a low yield rate of 40% is considered, you are liable
to get as many as 5,000 good ICs. These could be complex and versatile ICs, such as
Pentium microprocessor ICs of Intel, or DSP processors of TI, each
costing around Rs 10,000. Thus you are likely to make Rs 50 million (Rs 5 crore) out of
one process flow, so there is a lot of money in the VLSI industry. The initial investment
to set up a silicon fabrication unit (called a ‘FAB’ for short, and also sometimes a silicon
foundry) runs into a few billion dollars. In India, we have only one silicon foundry: SCL
(Semiconductor Complex Ltd.) in Chandigarh, Punjab. Very stringent and critical
requirements of power supply, cleanliness of the environment and purity of water are
the reasons why there are not many FABs in India.
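The batch-economics arithmetic above can be checked in a few lines. All numbers (wafers per batch, dies per cycle, yield, price) are the illustrative figures quoted in the text, not real fab data, and the variable names are invented for this sketch.

```python
# A rough sketch of the fabrication batch economics described in the text.

wafers_per_batch = 40
ics_per_batch = 12_000          # dies fabricated in one fabrication cycle
yield_rate = 0.40               # the "low yield" case from the text
price_per_ic = 10_000           # rupees, for a high-end processor IC

good_ics = int(ics_per_batch * yield_rate)
revenue = good_ics * price_per_ic
print(good_ics)                 # 4800, roughly the 5,000 quoted
print(revenue)                  # 48000000, close to the Rs 50 million quoted
```

The slight gap between 4,800 computed here and the 5,000 quoted in the text reflects the text's rounding, not an error in the yield model.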
Complexity
Producing a VLSI chip is an extremely complex task involving a number of design and
verification steps, followed by fabrication. To set up facilities for VLSI, one needs a lot
of money, and the complexity is best explained by what is known as the ‘VLSI design
funnel’, shown in Figure 1.1.

Figure 1.1: The VLSI Design Funnel

The design starts at the highest level of abstraction, in the designer’s mind, as an initial
idea. Engineers using CAD tools further expand this idea. Good marketing information
is also needed. All of these are then fed into the funnel, along with a pile of sand as raw
material, to yield the wonderful item called ‘the VLSI chip’.

10.3 Design
A state-of-the-art VLSI IC will have tens of millions of transistors. One human mind
cannot assimilate all the information required to design and implement such a
complex chip. A design team comprising hundreds of engineers, scientists and
technicians has to work on a modern VLSI project. It is important that each member of
the team has a clear understanding of his or her part of the contribution to the design.
This is accomplished by means of the design hierarchy. Any complex digital system
may be broken down into gates and memory elements by successively subdividing the
system in a hierarchical manner. Highly automated and sophisticated CAD tools are
commercially available to achieve this decomposition. They take very high-level
descriptions of system behavior and convert them into a form that can ultimately be
used to specify how a chip is manufactured. A specific set of abstractions helps in
describing the digital system targeted for a VLSI chip. These are depicted in the
Y-chart of Figure 1.2. In this figure, three distinct domains are marked in three
directions in the form of the letter Y. These domains are behavioral, structural and
physical. The behavioral domain specifies what a particular system does. The structural
domain specifies how entities are connected together to effect the prescribed behavior
(or function). The physical domain finally specifies how to actually build a structure that
has the required connectivity to implement the required functionality.

Figure 1.2: The Y-chart


Each design domain may be specified at various levels of abstraction, such as
circuit, logic and architectural. Concentric circles around the center indicate these levels
of abstraction. The design hierarchy is shown in Figure 1.3 in the form of a flow.

Figure 1.3: VLSI Design Flow




The starting point of a VLSI design (of a chip) is the system specifications. The
specifications (specs) provide design targets such as functions, speed, size, power
dissipation, cost, etc. This is the top level of the design hierarchy. These specifications
are used to create an abstract, high-level model. The modeling is usually done using a
high-level hardware description language (HDL), such as VHDL or Verilog. You have
already done a course on digital system design using VHDL in the last semester;
therefore this aspect of modeling a system will not be dealt with here. Using an HDL
compiler, the functional simulation is done.
The next step in the flow is ‘SYNTHESIS’. A synthesizer is a CAD tool (software)
that generates a gate-level netlist (a netlist is a description of the logic gates and
their interconnections) using the VHDL code as input. This forms the basis for
transferring the design to the electronic circuit level. At this level of design, the MOS
transistors are used as switches, and Boolean variables are treated as varying voltage
signals (‘0’ or ‘1’ for digital) in the circuit. Transistors cannot be decomposed further;
these are the components which have to be siliconised. To make a transistor on silicon,
one should know the layout details of the same at the various integration layer levels.
This corresponds to ‘physical design’. The physical design level constitutes the bottom
of the design hierarchy. After the design process, the VLSI design project moves on to
the manufacturing line. The final result is a finished electronic VLSI chip.

Design Styles
The design process described above is called the ‘top-down’ approach, because the
design moves from the highest abstraction level to the lowest level of abstraction,
namely the transistor. The other design approach starts at the transistor (switch) level.
Using transistors, logic gates are built. Using gates, smaller functional units such as
adders, registers and multipliers are built. These are used to build bigger blocks and
the system. This approach is known as ‘bottom-up’. It is also called the ‘full-custom’
approach, and it suits small projects well. For most projects, the ‘semi-custom’
approach is used. Semi-custom design is classified as below:

In standard-cell-based design, the design details up to synthesis are the same as in the
top-down approach. After synthesis, standard cells such as flip-flops, registers and
multipliers are borrowed from a library to fit (map) into the design. These cells would
have been custom designed and fully characterized for a given technology.
The design cycle time is shorter compared to full custom, and the IC is cheaper.
Gate-array-based design makes use of an array of transistors or an array of standard
gates. Wafers are pre-diffused to have an M x N array of transistors or gates in silicon.
It is up to the designer to use whatever number of transistors is needed and connect
them in whatever manner is required.
That is to say, the only design done in this style is the design of the
interconnections, the VDD and VSS buses, and the communication paths. Masks have
to be prepared only for the above-said metal layers. Therefore this approach has a very
short design cycle time. Full-custom design, as already discussed, starts at the transistor
(switch) level. Here the logic style (CMOS, BiCMOS, CCMOS, dynamic CMOS, etc.)
is decided and the transistor layout details (W/L ratios) are designed. So this is
sometimes called ‘hand-crafted design’. In this approach, silicon area is optimized
and the IC comes out to be of high density. The performance is also very high, and the
design is very flexible (it can easily be changed). This approach is suitable for large and



complex designs. Circuit simulators such as SPICE (Simulation Program with
Integrated Circuit Emphasis) and layout editors such as Magic or LASI are used.
10.4 Applications of VLSI
Electronic systems now perform a wide variety of tasks in daily life. Electronic systems
in some cases have replaced mechanisms that operated mechanically, hydraulically, or
by other means; electronics are usually smaller, more flexible, and easier to service. In
other cases electronic systems have created totally new applications. Electronic
systems perform a variety of tasks, some of them visible, some more hidden:
• Personal entertainment systems such as portable MP3 players and DVD players
perform sophisticated algorithms with remarkably little energy.
• Electronic systems in cars operate stereo systems and displays; they also control
fuel injection systems, adjust suspensions to varying terrain, and perform the
control functions required for anti-lock braking (ABS) systems.
• Digital electronics compress and decompress video, even at high-definition data
rates, on the fly in consumer electronics.
• Low-cost terminals for Web browsing still require sophisticated electronics, despite
their dedicated function.
• Personal computers and workstations provide word processing, financial analysis,
and games. Computers include both central processing units (CPUs) and special-
purpose hardware for disk access, faster screen display, etc.
• Medical electronic systems measure bodily functions and perform complex
processing algorithms to warn about unusual conditions.
The availability of these complex systems, far from overwhelming consumers, only
creates demand for even more complex systems.
The growing sophistication of applications continually pushes the design and
manufacturing of integrated circuits and electronic systems to new levels of complexity.
And perhaps the most amazing characteristic of this collection of systems is its
variety—as systems become more complex, we build not a few general-purpose
computers but an ever wider range of special-purpose systems. Our ability to do so is a
testament to our growing mastery of both integrated circuit manufacturing and design,
but the increasing demands of customers continue to test the limits of design and
manufacturing.

10.5 Advantages of VLSI


While we will concentrate on integrated circuits in this unit, the properties of integrated
circuits—what we can and cannot efficiently put in an integrated circuit—largely
determine the architecture of the entire system. Integrated circuits improve system
characteristics in several critical ways. ICs have three key advantages over digital
circuits built from discrete components:
z Size. Integrated circuits are much smaller—both transistors and wires are shrunk to
micrometer sizes, compared to the millimeter or centimeter scales of discrete
components. Small size leads to advantages in speed and power consumption,
since smaller components have smaller parasitic resistances, capacitances, and
inductances.
z Speed. Signals can be switched between logic 0 and logic 1 much quicker within a
chip than they can between chips. Communication within a chip can occur hundreds
of times faster than communication between chips on a printed circuit board. The
high speed of circuits on-chip is due to their small size—smaller components and
wires have smaller parasitic capacitances to slow down the signal.
z Power consumption. Logic operations within a chip also take much less power.
Once again, lower power consumption is largely due to the small size of circuits on
the chip—smaller parasitic capacitances and resistances require less power to drive
them.
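The size and speed advantages above both come down to parasitics. A back-of-the-envelope sketch makes the point; all component values below are invented for illustration and are not data for any real process:

```python
# Why on-chip signalling is faster: first-order delay grows with the
# parasitic RC product, and off-chip paths see far larger capacitance.
# All component values here are illustrative assumptions only.

def rc_time_constant(resistance_ohms: float, capacitance_farads: float) -> float:
    """First-order RC time constant tau = R * C (seconds)."""
    return resistance_ohms * capacitance_farads

# Hypothetical on-chip net: modest driver resistance, femtofarad-scale load.
tau_on_chip = rc_time_constant(100.0, 10e-15)

# Hypothetical chip-to-chip path: similar driver, but package pins and a
# board trace add picofarad-scale capacitance.
tau_off_chip = rc_time_constant(50.0, 10e-12)

print(f"on-chip  tau ~ {tau_on_chip:.1e} s")
print(f"off-chip tau ~ {tau_off_chip:.1e} s")
print(f"ratio ~ {tau_off_chip / tau_on_chip:.0f}x")
```

With these assumed values the off-chip path is hundreds of times slower, which matches the qualitative claim in the speed bullet above.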

Amity Directorate of Distance & Online Education


142 Computer Architecture and Parallel Processing

10.6 VLSI and Systems


These advantages of integrated circuits translate into advantages at the system level:
z Smaller physical size. Smallness is often an advantage in itself—consider portable
televisions or handheld cellular telephones.
z Lower power consumption. Replacing a handful of standard parts with a single chip
reduces total power consumption. Reducing power consumption has a ripple effect
on the rest of the system: a smaller, cheaper power supply can be used; since less
power consumption means less heat, a fan may no longer be necessary; a simpler
cabinet with less electromagnetic shielding may be feasible, too.
z Reduced cost. Reducing the number of components, the power supply
requirements, cabinet costs, and so on, will inevitably reduce system cost. The
ripple effect of integration is such that the cost of a system built from custom ICs
can be less, even though the individual ICs cost more than the standard parts they
replace.
Understanding why integrated circuit technology has such profound influence on the
design of digital systems requires understanding both the technology of IC
manufacturing and the economics of ICs and digital systems.

10.7 CMOS Technology


CMOS is the dominant integrated circuit technology. In this section we will introduce
some basic concepts of CMOS to understand why it is so widespread and some of the
challenges introduced by the inherent characteristics of CMOS.
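One reason CMOS became dominant is its complementary gate structure: a pMOS pull-up network is paired with a dual nMOS pull-down network, so in steady state exactly one network conducts and there is no direct current path from VDD to ground. The toy switch-level model below (a teaching sketch, not a circuit simulator) checks that property for a NAND gate:

```python
# Toy switch-level model of a static CMOS NAND gate, illustrating the
# complementary pull-up (pMOS) / pull-down (nMOS) structure that gives
# CMOS its near-zero static power draw. A teaching sketch only.

def cmos_nand(a: int, b: int) -> int:
    # Pull-down network: series nMOS transistors conduct only when both
    # inputs are 1, tying the output to ground (logic 0).
    pull_down = (a == 1) and (b == 1)
    # Pull-up network: parallel pMOS transistors conduct when either
    # input is 0, tying the output to VDD (logic 1).
    pull_up = (a == 0) or (b == 0)
    # In steady state exactly one network conducts, so no direct
    # VDD-to-ground current flows: static power is (ideally) zero.
    assert pull_up != pull_down
    return 1 if pull_up else 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", cmos_nand(a, b))
```

The assertion inside the gate is the key observation: for every input combination the two networks are in opposite states.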

10.7.1 Power Consumption Constraints


The huge chips that can be fabricated today are possible only because of the relatively
tiny power consumption of CMOS circuits. Power consumption is critical at the chip level
because much of the power is dissipated as heat, and chips have limited heat
dissipation capacity. Even if the system in which a chip is placed can supply large
amounts of power, most chips are packaged to dissipate less than 10 to 15 Watts of
power before they suffer permanent damage (though some chips dissipate well over 50
Watts thanks to special packaging). The power consumption of a logic circuit can, in the
worst case, limit the number of transistors we can effectively put on a single chip.
Limiting the number of transistors per chip changes system design in several ways.
Most obviously, it increases the physical size of a system. Using high-powered circuits
also increases power supply and cooling requirements. A more subtle effect is caused
by the fact that the time required to transmit a signal between chips is much larger than
the time required to send the same signal between two transistors on the same chip; as
a result, some of the advantage of using a higher-speed circuit family is lost.
Another subtle effect of decreasing the level of integration is that the electrical
design of multi-chip systems is more complex: microscopic wires on-chip exhibit
parasitic resistance and capacitance, while macroscopic wires between chips have
capacitance and inductance, which can cause a number of ringing effects that are
much harder to analyze.
The close relationship between power consumption and heat makes low-power
design techniques important knowledge for every CMOS designer. Of course, low-
energy design is especially important in battery-operated systems like cellular
telephones. Power can be reduced by running circuits more slowly or at lower supply
voltages; energy, in contrast, must be saved by avoiding unnecessary work. We will
see throughout the rest of this book that minimizing power and energy consumption
requires careful attention to detail at every level of abstraction, from system architecture
down to layout.
As CMOS features become smaller, additional power consumption mechanisms
come into play. Traditional CMOS consumes power when signals change but consumes
only negligible power when idle. In modern CMOS, leakage mechanisms start to drain
current even when signals are idle. In the smallest geometry processes, leakage power
consumption can be larger than dynamic power consumption. We must introduce new
design techniques to combat leakage power.
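The dynamic-versus-leakage contrast above can be made concrete with the standard first-order power model P_dyn = alpha * C * Vdd^2 * f plus a static leakage term. The parameter values in this sketch are illustrative assumptions, not figures for any real process:

```python
# First-order CMOS power model. The switching-power approximation
# P_dyn = alpha * C * Vdd^2 * f is standard; the numbers below are
# illustrative assumptions only.

def dynamic_power_watts(alpha, c_switched_farads, vdd_volts, freq_hz):
    """Average switching power: activity factor * capacitance * Vdd^2 * f."""
    return alpha * c_switched_farads * vdd_volts ** 2 * freq_hz

def leakage_power_watts(i_leak_amps, vdd_volts):
    """Static power drawn even when no signals are changing."""
    return i_leak_amps * vdd_volts

p_dyn = dynamic_power_watts(alpha=0.1, c_switched_farads=1e-9,
                            vdd_volts=1.0, freq_hz=1e9)
p_leak = leakage_power_watts(i_leak_amps=0.05, vdd_volts=1.0)

# Lowering Vdd cuts dynamic power quadratically -- one reason supply
# voltages have fallen with each process generation.
p_dyn_lowv = dynamic_power_watts(alpha=0.1, c_switched_farads=1e-9,
                                 vdd_volts=0.8, freq_hz=1e9)

print(p_dyn, p_leak, p_dyn_lowv)
```

Note that in this hypothetical example the leakage term is half the dynamic term, echoing the text's point that in the smallest geometries leakage can rival or exceed switching power.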

10.7.2 Design and Testability


Design verification. Our ability to build large chips of unlimited variety introduces the
problem of checking whether those chips have been designed and manufactured
correctly. Designers accept the need to verify or validate their designs to make sure
that the circuits perform the specified function. (Some people use the terms
verification and validation
interchangeably; a finer distinction reserves verification for formal proofs of correctness,
leaving validation to mean any technique which increases confidence in correctness,
such as simulation.) Chip designs are simulated to ensure that the chip’s circuits
compute the proper outputs for a sequence of inputs chosen to exercise the chip.
Manufacturing test. Each chip that comes off the manufacturing line must also
undergo manufacturing test—the chip must be exercised to demonstrate that no
manufacturing defects have rendered the chip useless.
Because IC manufacturing tends to introduce certain types of defects and because
we want to minimize the time required to test each chip, we can’t just use the input
sequences created for design verification to perform manufacturing test.
Each chip must be designed to be fully and easily testable. Finding out that a chip is
bad only after you have plugged it into a system is annoying at best and dangerous at
worst. Customers are unlikely to keep using manufacturers who regularly supply bad
chips.
Defects introduced during manufacturing range from the catastrophic—
contamination that destroys every transistor on the wafer—to the subtle—a single
broken wire or a crystalline defect that kills only one transistor. While some bad chips
can be found very easily, each chip must be thoroughly tested to find even subtle flaws
that produce erroneous results only occasionally. Tests designed to exercise
functionality and expose design bugs don’t always uncover manufacturing defects.
We use fault models to identify potential manufacturing problems and determine
how they affect the chip’s operation. The most common fault model is stuck-at-0/1: the
defect causes a logic gate’s output to be always 0 (or 1), independent of the gate’s input
values. We can often determine whether a logic gate’s output is stuck even if we can’t
directly observe its outputs or control its inputs. We can generate a good set of
manufacturing tests for the chip by assuming each logic gate’s output is stuck at 0 (then
1) and finding an input to the chip which causes different outputs when the fault is
present or absent. (Both the stuck-at-0/1 fault model and the assumption that faults
occur only one at a time are simplifications, but they often are good enough to give
good rejection of faulty chips.)
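The test-generation procedure just described (assume each gate output is stuck at 0, then 1, and search for an input vector on which the faulty and fault-free circuits disagree) can be brute-forced for a tiny circuit. The two-gate circuit and node name below are invented for illustration:

```python
# Brute-force stuck-at test generation for a tiny circuit,
# out = NOT(a AND b): inject a stuck-at fault on the internal node,
# then search for an input vector that distinguishes the faulty
# circuit from the fault-free one.
from itertools import product

def circuit(a, b, stuck=None):
    """stuck=None is the good circuit; stuck=('n1', v) holds node n1 at v."""
    n1 = a & b                      # internal AND node
    if stuck is not None and stuck[0] == "n1":
        n1 = stuck[1]               # fault injection: node forced to 0 or 1
    return 1 - n1                   # inverting output

def find_test(fault):
    """Return an input vector that detects `fault`, or None if untestable."""
    for a, b in product((0, 1), repeat=2):
        if circuit(a, b) != circuit(a, b, stuck=fault):
            return (a, b)
    return None

print(find_test(("n1", 0)))  # (1, 1): good output is 0, faulty output is 1
print(find_test(("n1", 1)))  # (0, 0): good output is 1, faulty output is 0
```

Real automatic test pattern generation (ATPG) tools use far smarter search than this exhaustive loop, but the underlying detect-by-disagreement criterion is the same.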

10.7.3 Testability as a Design Process


Unfortunately, not all chip designs are equally testable. Some faults may require long
input sequences to expose; other faults may not be testable at all, even though they
cause chip malfunctions that aren’t covered by the fault model. Traditionally, chip
designers have ignored testability problems, leaving them to a separate test engineer
who must find a set of inputs to adequately test the chip. If the test engineer can’t
change the chip design to fix testability problems, his or her job becomes both difficult
and unpleasant. The result is often poorly tested chips whose manufacturing problems
are found only after the customer has plugged them into a system. Companies now
recognize that the only way to deliver high-quality chips to customers is to make the
chip designer responsible for testing, just as the designer is responsible for making the
chip run at the required speed. Testability problems can often be fixed easily early in the
design process at relatively little cost in area and performance.


But modern designers must understand testability requirements, analysis
techniques which identify hard-to-test sections of the design, and design techniques
which improve testability.

10.8 Integrated Circuit Design Techniques


To make use of the flood of transistors given to us by Moore’s Law, we must design
large, complex chips quickly. The obstacle to making large chips work correctly is
complexity—many interesting ideas for chips have died in the swamp of details that
must be made correct before the chip actually works. Integrated circuit design is hard
because designers must juggle several different problems:
z Multiple levels of abstraction. IC design requires refining an idea through many
levels of detail. Starting from a specification of what the chip must do, the designer
must create an architecture which performs the required function, expand the
architecture into a logic design, and ultimately refine the logic down to the
detailed layout of the chip.
z Multiple and conflicting costs. In addition to carrying a design through many
levels of detail, the designer must also take into account costs—not dollar costs, but
criteria by which the quality of the design is judged. One critical cost is the speed at
which the chip runs. Two architectures that execute the same function
(multiplication, for example) may run at very different speeds. We will see that chip
area is another critical design cost: the cost of manufacturing a chip is exponentially
related to its area, and chips much larger than 1 cm2 cannot be manufactured at all.
Furthermore, if multiple cost criteria—such as area and speed requirements—must
be satisfied, many design decisions will improve one cost metric at the expense of
the other. Design is dominated by the process of balancing conflicting constraints.
z Short design time. In an ideal world, a designer would have time to contemplate
the effect of a design decision. We do not, however, live in an ideal world. Chips
which appear too late may make little or no money because competitors have
snatched market share. Therefore, designers are under pressure to design chips as
quickly as possible. Design time is especially tight in application-specific IC design,
where only a few weeks may be available to turn a concept into a working ASIC.
Designers have developed two techniques to eliminate unnecessary detail:
hierarchical design and design abstraction. Designers also make liberal use of
computer-aided design tools to analyze and synthesize the design.
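Hierarchical design can be shown in miniature. The sketch below is an illustrative analogy in Python, not a hardware description language: a 2-bit adder is built from full adders, which are themselves built from half adders, so each level of the hierarchy hides the detail of the level beneath it:

```python
# Hierarchical design in miniature: a 2-bit ripple-carry adder composed
# from full adders, themselves composed from half adders. Each level
# hides the detail of the level below -- the same idea chip designers
# use to keep large designs manageable.

def half_adder(a, b):
    return a ^ b, a & b          # (sum, carry)

def full_adder(a, b, cin):
    s1, c1 = half_adder(a, b)    # reuse the lower-level component
    s2, c2 = half_adder(s1, cin)
    return s2, c1 | c2           # (sum, carry-out)

def ripple_adder_2bit(a, b):
    """Add two 2-bit numbers given as (lsb, msb) tuples; return an int."""
    s0, c0 = full_adder(a[0], b[0], 0)
    s1, c1 = full_adder(a[1], b[1], c0)
    return s0 + 2 * s1 + 4 * c1

print(ripple_adder_2bit((1, 1), (1, 0)))  # 3 + 1 = 4
```

Design abstraction is the complementary idea: a user of `ripple_adder_2bit` needs only its interface, not the gate-level detail inside.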

10.9 Summary
Integrated circuit manufacturing is a key technology: it makes possible a host of
important, useful new devices. ICs help us make better digital systems because they
are small, stingy with power, and cheap. However, the temptation to build ever more
complex systems by cramming more functions onto chips leads to an enormous design
problem.

10.10 Check Your Progress


Multiple Choice Questions
1. Identify the following type of serial communication error condition: “The prior
character that was received had not yet been read by the CPU and is overwritten
by a newly received character.”
(a) Framing error
(b) Parity error
(c) Overrun error
(d) Under-run error

2. What type of integration is chosen to fabricate integrated circuits like counters,
multiplexers, and adders?
(a) Small Scale Integration (SSI)
(b) Medium Scale Integration (MSI)
(c) Large Scale Integration (LSI)
(d) Very Large Scale Integration (VLSI)
3. Determine the chip area for Large Scale Integration ICs.
(a) 1,00,000 mil2
(b) 10,000 mil2
(c) 1,60,000 mil2
(d) 16,000 mil2
4. Ultra Large Scale Integration is used in the fabrication of
(a) 8-bit microprocessors, RAM, ROM
(b) 16 and 32- bit microprocessors
(c) Special processors and Smart sensors
(d) All of the mentioned
5. The concept of integrated circuits was introduced at the beginning of the 1960s by
(a) Texas Instruments and Fairchild Semiconductor
(b) Bell Telephone Laboratories and Fairchild Semiconductor
(c) Fairchild Semiconductor
(d) Texas Instruments and Bell Telephone Laboratories
6. How many gates per chip are used in first generation Integrated Circuits?
(a) 3-30
(b) 30-300
(c) 300-3000
(d) More than 3000
7. Find the chip area for a Medium Scale Integration IC?
(a) 8 mm3
(b) 4 mm2
(c) 64 mm3
(d) 16 mm2
8. The number of transistors used in Very Large Scale Integration is
(a) 10^7 transistors/chip
(b) 10^6 – 10^7 transistors/chip
(c) 10^3 – 10^5 transistors/chip
(d) 10^2 – 10^3 transistors/chip
9. The following IC chip integrates 100 thousand electronic components per chip:
(a) SSI
(b) MSI
(c) LSI
(d) VLSI


10. IBM 7000 digital computer
(a) Belongs to second generation
(b) Uses VLSI
(c) Employs semiconductor memory
(d) Has modular constructions

10.11 Questions and Exercises


1. Define VLSI systems.
2. What are the applications of VLSI?
3. Give the various advantages of VLSI.
4. Discuss CMOS technology.
5. Explain the design and testability of VLSI.
6. What are the power consumption constraints of VLSI?
7. Discuss integrated circuit design techniques.
8. Write a short note on the design styles of VLSI.

10.12 Key Terms


z VLSI: Very Large Scale Integration; refers to the complexity of the integrated
circuitry (IC) on a chip
z CMOS: Complementary Metal–Oxide–Semiconductor
z SSI: Small Scale Integration
z MSI: Medium Scale Integration
z LSI: Large Scale Integration

Check Your Progress: Answers


1. (c) Overrun error
2. (b) Medium Scale Integration (MSI)
3. (c) 1,60,000 mil2
4. (c) Special processors and Smart sensors
5. (a) Texas Instruments and Fairchild Semiconductor
6. (a) 3-30
7. (d) 16 mm2
8. (c) 10^3 – 10^5 transistors/chip
9. (c) LSI
10. (d) Has modular constructions

10.13 Further Readings


z S. S. Jadhav, Advanced Computer Architecture and Computing, Technical
Publications, 2009
z V. Rajaraman, T. Radhakrishnan, Computer Organization and Architecture, PHI
Learning Pvt. Ltd., 2007
z P. V. S. Rao, Computer System Architecture, PHI Learning Pvt. Ltd., 2008
z Sajjan G. Shiva, Advanced Computer Architectures, CRC Press, 2005
