A SEMINAR REPORT ON
March, 07
ACKNOWLEDGEMENT
At this moment I would like to thank my guide, Mr. H. C. Parmar. Without his
constant support this seminar would have been difficult, if not impossible. I am thankful
to him for guiding me along the right path in this seminar.
ABSTRACT
When people ask how fast a computer is, they are typically referring to the frequency of a
minuscule clock inside the computer, a crystal oscillator that sets the basic rhythm used
throughout the machine. In a computer with a speed of one gigahertz, for example, the
crystal "ticks" a billion times a second. Every action of the computer takes place in tiny
steps, each a billionth of a second long. A simple transfer of data may take only one step;
complex calculations may take many steps. All operations, however, must begin and end
according to the clock's timing signals.
The use of a central clock also creates problems. As speeds have increased,
distributing the timing signals has become more and more difficult. Present-day
transistors can process data so quickly that they can accomplish several steps in the time
that it takes a wire to carry a signal from one side of the chip to the other. Keeping the
rhythm identical in all parts of a large chip requires careful design and a great deal of
electrical power. Wouldn't it be nice to have an alternative?
For these reasons, clockless technology is considered the technology that will
drive the majority of electronic chips in the coming years.
1. INTRODUCTION
1.1 CONCEPT OF CLOCKS
The clock is a tiny crystal oscillator that resides at the heart of every
microprocessor chip. It sets the basic rhythm used throughout the machine and
orchestrates the synchronous dance of electrons that course through the hundreds of
millions of wires and transistors of a modern computer.
Such crystals, which tick up to 2 billion times each second in the fastest of today’s
desktop personal computers, dictate the timing of every circuit in every one of the chips
that add, subtract, divide, multiply and move the ones and zeros that are the basic stuff of
the information age.
Today's conventional chips are synchronous: they operate under the control of a central
clock, which samples data in the registers at precisely timed intervals and controls the
timing of the entire chip. One advantage of a clock is that it signals to the devices on
the chip when to input or output data. This property of synchronous design makes
designing the chip much easier. There are problems that go along with the clock, however.
Clock speeds are now in the gigahertz range, and there is not much room for
speedup before physical realities start to complicate things. With a gigahertz clock
driving a chip, signals barely have enough time to make it across the chip before the
next clock tick. At this point, raising the clock frequency further could become disastrous.
This is when a chip that is not constrained by clock speed could become very valuable.
1.2 WORKING OF A SYNCHRONOUS CIRCUIT
The figure shows how conventional chips operate under the control
of a central clock, which samples data in the registers at precisely timed intervals. The
only thing the designers have to think about is how to complete one operation during a
single tick of the clock. It is extremely important to design the circuits so that all
computations settle and are ready for the next logical operation before the next clock
tick.
One problem is speed. A chip can only work as fast as its slowest
component. Therefore, if one part of the chip is especially slow, the other parts of the
chip are forced to sit idle. This wasted computing time is obviously detrimental to the
speed of the chip.
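The rule that the slowest component sets the pace can be put in numbers. The following is a minimal sketch rather than a timing tool; the block names and delay values are hypothetical and chosen only for illustration.

```python
def min_clock_period_ns(stage_delays_ns):
    """Smallest clock period that lets every block settle before the next tick."""
    return max(stage_delays_ns.values())


def max_clock_frequency_ghz(stage_delays_ns):
    """Highest clock frequency (in GHz) permitted by the slowest block."""
    return 1.0 / min_clock_period_ns(stage_delays_ns)


# Hypothetical settling times of three logic blocks, in nanoseconds.
delays = {"fetch": 0.4, "decode": 0.5, "multiply": 0.8}

period = min_clock_period_ns(delays)
frequency = max_clock_frequency_ghz(delays)
print(f"minimum period: {period} ns, maximum clock: {frequency:.2f} GHz")
```

In this sketch the fetch and decode blocks are ready well before the tick, yet the whole chip is limited by the multiply block.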
New problems with speeding up a clocked chip are just around the corner. Clock
frequencies are getting so high that signals can barely cross the chip in one clock cycle.
When we get to the point where the clock cannot drive the entire chip, we will be forced to
come up with a solution. One possible solution is a second clock, but this incurs
overhead and extra power consumption, so it is a poor solution. It is also important to
note that doubling the frequency of the clock does not double the chip speed; blindly
trying to increase chip speed by increasing frequency, without considering other options,
is therefore foolish.
The other major problem with a clocked design is power consumption. The
clock consumes more power than any other component of the chip. The most
disturbing thing about this is that the clock serves no direct computational use. A clock
does not perform operations on information; it simply orchestrates the computational
parts of the computer.
The natural solution to the above problems, as you may have guessed, is to
eliminate the source of these headaches: the clock.
By throwing out the clock, chipmakers will be able to escape from the problems
of the synchronous circuits. Clockless chips draw power only when there is useful work
to do, enabling a huge savings in battery-driven devices; an asynchronous-chip-based
pager marketed by Philips Electronics, for example, runs almost twice as long as
competitors' products, which use conventional clocked chips.
Like a team of horses that can only run as fast as its slowest member, a clocked
chip can run no faster than its most slothful piece of logic; the answer isn't guaranteed
until every part completes its work. By contrast, the transistors on an asynchronous chip
can swap information independently, without needing to wait for everything else. The
result? Instead of the entire chip running at the speed of its slowest components, it can
run at the average speed of all components. At both Intel and Sun, this approach has led
to prototype chips that run two to three times faster than comparable products using
conventional circuitry.
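The difference between running at the speed of the slowest component and running at the average speed can be illustrated with a small calculation. This is only a sketch under simplifying assumptions: the delay values are hypothetical, and the cost of the handshakes themselves is ignored.

```python
def synchronous_time(delays):
    """Total time when every operation waits for a full clock period.

    The clock period must cover the slowest operation, so each of the
    len(delays) operations costs max(delays).
    """
    return len(delays) * max(delays)


def asynchronous_time(delays):
    """Total time when each operation hands off as soon as it finishes."""
    return sum(delays)


# Hypothetical operation delays in arbitrary time units.
operation_delays = [1, 1, 4, 1, 1]

sync_total = synchronous_time(operation_delays)    # 5 operations * 4 units
async_total = asynchronous_time(operation_delays)  # 1 + 1 + 4 + 1 + 1 units
print(sync_total, async_total)
```

The gain depends entirely on how far the typical delay lies below the worst case; if all operations take the same time, the two totals are equal.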
Another advantage of clockless chips is that they give off very low levels of
electromagnetic noise. The faster the clock, the more difficult it is to prevent a device
from interfering with other devices; dispensing with the clock all but eliminates this
problem.
As we can see above, there is the usual logic circuitry, but instead of a clock
signal controlling the circuit there are two lines, one at the top and one at the bottom.
These wires are used to transfer the data bits and the control bits together, so no
separate control signal goes across the circuit. The control signal is encoded within the
data that is being transferred. These control signals act as handshaking and handoff
signals, which indicate when the component is ready for the next logical operation.
There are different ways to implement an asynchronous circuit. The next part
describes the various types of implementation.
2. CLOCKLESS TECHNIQUES
There are three main kinds of implementation of an asynchronous circuit: bounded
delay, delay insensitive, and Null Conventional Logic (NCL).
2.1 BOUNDED DELAY
In the circuit we can see that, compared with the general model, the circuit
that introduces the delay acts as the completion detection circuit in the bounded
delay method. That is, a component is considered to have finished its work when the
introduced delay is over.
This kind of implementation has a disadvantage. The delay is set to the assumed
maximum time the component can take, so early completion is not possible even when the
circuit does not need the maximum time: it is forced to wait until the delay is over.
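The cost of the fixed delay can be sketched in a few lines. The function names and the numbers used below are hypothetical; the sketch only restates the rule that completion is signalled when the bound expires, not when the logic actually finishes.

```python
def completion_time(actual_delay, bound):
    """Time at which a bounded-delay stage signals completion.

    The delay element always waits for the full bound, so the result
    does not depend on how quickly the logic actually finished.
    """
    if actual_delay > bound:
        raise ValueError("bound is too small: the logic may not have settled")
    return bound


def wasted_time(actual_delay, bound):
    """Idle time between the logic finishing and completion being signalled."""
    return completion_time(actual_delay, bound) - actual_delay


# A stage that finishes after 3 time units under a bound of 10 units.
print(completion_time(3, 10), wasted_time(3, 10))
```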
2.2 DELAY INSENSITIVE
Contrary to the bounded delay method, which assumes bounds on time, the
delay-insensitive method does not assume any bounds on time. Therefore communication
between independent components is essential. This is done with the help of handshake
and handoff signals, which indicate when the job of a component is over.
There are many ways in which the delay-insensitive method can be implemented. The
most popular and efficient is the “dual-rail encoding” method. In this method
two wires are used for each signal, and the values on both wires
together convey the data and the control information.
In one method each signal X is encoded with two wires, XH and XL. The encoding
scheme is shown below.
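A common dual-rail scheme can be sketched as follows. It is an assumption, not a reproduction of the report's table: the codes (1, 0) for 1 and (0, 1) for 0 match those used for the half adder later in the report, (0, 0) is the empty code word used in the handshake description, and treating (1, 1) as invalid is a conventional choice made here. The pair order (XH, XL) and the function names are also assumptions.

```python
EMPTY = (0, 0)  # no data on the wire pair


def encode(bit):
    """Return the (XH, XL) wire pair for a bit, or the empty code word for None."""
    if bit is None:
        return EMPTY
    if bit not in (0, 1):
        raise ValueError("bit must be 0, 1 or None")
    return (1, 0) if bit == 1 else (0, 1)


def decode(pair):
    """Return 0 or 1 for a valid code word and None for the empty code word."""
    table = {(0, 0): None, (1, 0): 1, (0, 1): 0}
    if pair not in table:
        raise ValueError("both wires high is not a valid code word")
    return table[pair]


for value in (0, 1, None):
    print(value, encode(value))
```

Because the empty code word is distinct from both data values, a receiver can tell from the wire pair alone whether valid data has arrived.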
As the coding above shows, each wire in the logic circuit now
needs two wires in a dual-rail circuit. A two-input gate therefore has a total of four
input wires and two output wires, so special kinds of gates are
required to implement the logic. The AND, OR and NOT gates are shown below.
2.3 NULL CONVENTIONAL LOGIC
NCL uses threshold gates with hysteresis: Threshold gates provide the basic building
block of NCL designs. Threshold gate inputs and outputs can be in one of two states,
DATA or NULL. A threshold gate starting with its output in a NULL state will remain in
the NULL state until the specified number of inputs is placed in the DATA state. Once
the gate reaches the DATA state, it remains in this state until all of the inputs return to
the NULL state. The hysteresis in the threshold gate provides the threshold needed to
keep from switching during the intermediate state when the number of inputs in the
DATA state is greater than zero, but less than the threshold limit. In addition, hysteresis
provides the storage to remain at DATA until all of the inputs have returned to NULL.
Since these gates use two values, as traditional Boolean logic does, they can be
constructed with traditional CMOS (or bipolar) processes.
2.3.1 M of N Threshold gates
For example, starting from NULL, an m-of-n threshold gate drives its output high only
when at least m of its n inputs are in the DATA state (high); until then the output stays at 0, or NULL.
Fig 2.3 A two-of-three threshold gate: the output goes high only if at least 2 of the three inputs are high.
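The behaviour of a threshold gate with hysteresis can be modelled in a few lines. This is a behavioural sketch only, with 1 standing for DATA and 0 for NULL; the class name and interface are hypothetical.

```python
class ThresholdGate:
    """Behavioural model of an m-of-n threshold gate with hysteresis."""

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m = m
        self.n = n
        self.output = 0  # start in the NULL state

    def update(self, inputs):
        """Apply n input values (1 = DATA, 0 = NULL) and return the output."""
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        high = sum(1 for value in inputs if value)
        if high >= self.m:
            self.output = 1      # threshold reached: switch to DATA
        elif high == 0:
            self.output = 0      # all inputs NULL: return to NULL
        # Otherwise the gate holds its previous output (hysteresis).
        return self.output


gate = ThresholdGate(2, 3)
for pattern in ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0]):
    print(pattern, gate.update(pattern))
```

The third pattern shows the hysteresis: only one input is still high, yet the output stays at DATA until all inputs have returned to NULL.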
3. HANDSHAKE PROTOCOLS & COMPONENTS
A proper handshake requires a defined protocol. Depending on the requirements, one
of the following protocols can be used.
3.1 BUNDLED-DATA HANDSHAKING PROTOCOLS
This protocol is used in the AMULET processors. Bundled-data protocols are also called
single-rail protocols, although the two terms describe different things: bundled-data
describes the simultaneous transmission of control and data signals,
whereas single-rail describes the use of one wire for each data bit.
The 2-phase bundled-data protocol uses a regular data path to transfer data and two
additional signals, one for the data send request and one for the data receive
acknowledgment. The 2-phase bundled-data protocol is used in the AMULET3 processor and
is often referred to as Micropipeline, which was developed by Ivan Sutherland.
To issue a transfer on the data bus, the sender toggles the request signal from “0” ->
“1” or from “1” -> “0” (phase 1). When the receiver has read all the data from the bus
(which may take an arbitrary time), it confirms this by toggling the acknowledge signal
in the same way (phase 2). The sender has to guarantee that the data on its output is valid
and stable until the receiver toggles the acknowledge signal (phase 2), and no new data
can be transferred until phase 2 has finished. Because this protocol has very little
switching activity, it seems very efficient in both time and energy. However, components
sensitive to transitions are more complex than elements that just react to signal levels.
In practice, this protocol is a good choice where high speed is preferred over energy or
space efficiency.
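The two phases can be modelled with a small channel object. This is a behavioural sketch only: the class and method names are hypothetical, and timing is reduced to the order of the method calls.

```python
class TwoPhaseChannel:
    """Behavioural model of a 2-phase bundled-data channel."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None
        self.transitions = 0  # number of request/acknowledge events

    def idle(self):
        """The channel is free when every request has been acknowledged."""
        return self.req == self.ack

    def send(self, value):
        """Phase 1: place the data on the bus, then toggle request."""
        if not self.idle():
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value
        self.req ^= 1
        self.transitions += 1

    def receive(self):
        """Phase 2: read the data, then toggle acknowledge."""
        if self.idle():
            raise RuntimeError("no pending request")
        value = self.data
        self.ack ^= 1
        self.transitions += 1
        return value


channel = TwoPhaseChannel()
channel.send(5)
print(channel.receive(), channel.req, channel.ack)
```

Each transfer costs exactly two signal events, and the request and acknowledge wires do not return to “0” between transfers.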
In the 4-phase bundled-data protocol, the sender issues a transfer on the data bus by
raising the request signal from “0” -> “1” (phase 1). When the receiver has read all the
data from the bus, it confirms this by raising the acknowledge signal from “0” -> “1”
(phase 2). In reaction to the acknowledgment, the sender sets the request signal to “0”
(phase 3), which is similarly followed by the receiver setting the acknowledge signal to
“0” (phase 4). The sender has to guarantee that the data on its output is stable and valid
until the receiver lowers the acknowledge signal again. Again, no new data can be
transferred before the last phase has finished. This protocol has more switching activity
than the 2-phase protocol, which at first sight may lead to slower and more
energy-consuming circuits, but implementations sensitive to transitions are often more
complex than those sensitive to levels.
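The four phases can be traced step by step. The function below is an illustrative sketch with a hypothetical name; it records the (request, acknowledge) levels after each phase of a single transfer.

```python
def four_phase_transfer(value):
    """Carry one value across a 4-phase bundled-data channel.

    Returns the value seen by the receiver and the list of
    (request, acknowledge) levels after each phase.
    """
    req, ack = 0, 0
    trace = [(req, ack)]

    bus = value            # the sender places the data on the bus
    req = 1                # phase 1: request goes to 1
    trace.append((req, ack))

    received = bus         # the receiver reads the data
    ack = 1                # phase 2: acknowledge goes to 1
    trace.append((req, ack))

    req = 0                # phase 3: request returns to 0
    trace.append((req, ack))

    ack = 0                # phase 4: acknowledge returns to 0
    trace.append((req, ack))
    return received, trace


received, trace = four_phase_transfer(42)
print(received, trace)
```

Both control wires end where they started, so every transfer costs four signal events, twice as many as in the 2-phase protocol.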
Because the request is encoded in the data wires themselves and there is no separate
request signal, these protocols are completely insensitive to all wire delays, with the
drawback of requiring 2n+1 wires to relay n data bits in contrast to n+2 wires for the
bundled-data protocols.
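The wire counts quoted above can be compared for a few bus widths. This is a small sketch; the function names and the chosen widths are arbitrary examples.

```python
def dual_rail_wires(n):
    """Two wires per data bit plus one acknowledge wire."""
    return 2 * n + 1


def bundled_data_wires(n):
    """One wire per data bit plus request and acknowledge."""
    return n + 2


for width in (1, 8, 32):
    print(width, dual_rail_wires(width), bundled_data_wires(width))
```

For wide buses the ratio approaches two, which matches the statement that dual-rail circuits need nearly twice the space.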
When issuing a data relay, the sender alters the bit-pair (n,m) from (0,0) to the
code word that represents the data bit (phase 1). When the receiver has valid code
words on all its input wire pairs, it sets acknowledge to “1” and absorbs the data (phase
2). The sender confirms this by returning the bit-pair (n,m) to the empty code word (0,0)
(phase 3). When the receiver sees the empty code word on all its input wire pairs, it sets
acknowledge to “0” (phase 4). At this point, new data may be relayed.
In this example the one-bit-wide sequence “0-1-1” is transmitted. Only one data wire
per pair changes at a time. The example also shows that
after every valid code word an empty code word must be transmitted, which has a
negative impact on utilization.
Fig 3.5 4-phase dual-rail transmission of a one bit wide sequence “0-1-1”
Because of its low pipeline utilization, this protocol is less suited to high-speed circuits
and better suited to low-power and very robust circuits, with the drawback of needing
nearly twice the space of the bundled-data protocols.
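The 4-phase dual-rail transmission of the sequence “0-1-1” can be traced in code. The sketch assumes the code words (1, 0) for 1 and (0, 1) for 0, which the report uses for the half adder; the text above does not say which wire of the pair (n,m) carries which value, so that mapping and the function name are assumptions.

```python
CODE = {0: (0, 1), 1: (1, 0)}  # assumed code words for the bit-pair (n, m)
EMPTY = (0, 0)


def send_sequence(bits):
    """Send bits one at a time; return the received bits and a signal trace.

    Each trace entry is ((n, m), acknowledge) after one phase.
    """
    received, trace = [], []
    ack = 0
    for bit in bits:
        pair = CODE[bit]                 # phase 1: sender sets a code word
        trace.append((pair, ack))
        received.append(1 if pair == (1, 0) else 0)
        ack = 1                          # phase 2: receiver acknowledges
        trace.append((pair, ack))
        pair = EMPTY                     # phase 3: sender returns to empty
        trace.append((pair, ack))
        ack = 0                          # phase 4: receiver releases
        trace.append((pair, ack))
    return received, trace


received, trace = send_sequence([0, 1, 1])
for entry in trace:
    print(entry)
```

Three bits take twelve signal events, and every fourth entry is the empty code word with acknowledge low, which is the utilization cost described above.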
Fig 3.6 2-phase dual-rail transmission of the one-bit-wide sequence “110100”.
As one can see there is no need for an empty code word compared to the 4-phased dual-
rail transmission. This leads to an optimal utilization of the pipeline. Again, the 2-phase
protocol seems to be faster and more efficient. This doesn’t necessarily lead to a higher
energy-efficiency, because the implementation of transition-sensitive elements is more
complex than that of level-sensitive elements. This protocol is best for high-speed, very
robust but less energy and space-efficient circuits.
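A 2-phase dual-rail transfer can be sketched with transition signalling. The report does not specify the encoding, so the scheme below is an assumption: a transition on wire d1 signals a 1, a transition on wire d0 signals a 0, and the receiver answers each bit by toggling acknowledge. All names are hypothetical.

```python
def two_phase_dual_rail(bits):
    """Send bits by toggling one wire per bit; return received bits and trace.

    Each trace entry is (d1, d0, acknowledge) after one phase.
    """
    d1 = d0 = ack = 0
    seen = (d1, d0)  # wire levels the receiver observed last
    received, trace = [], []
    for bit in bits:
        # Phase 1: the sender toggles exactly one of the two data wires.
        if bit == 1:
            d1 ^= 1
        else:
            d0 ^= 1
        trace.append((d1, d0, ack))
        # Phase 2: the receiver checks which wire moved, then toggles ack.
        received.append(1 if d1 != seen[0] else 0)
        seen = (d1, d0)
        ack ^= 1
        trace.append((d1, d0, ack))
    return received, trace


received, trace = two_phase_dual_rail([1, 1, 0, 1, 0, 0])
print(received, len(trace))
```

Six bits take twelve signal events, two per bit, with no empty code word in between.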
while true do
    if a == b then
        y = a
    else
        y = y
    end if
done
Fig 3.7 : Muller's-C Element and its Symbol
In other words, the Muller-C element changes its output value only if both inputs have
the same value; otherwise it retains its value. Using this element together with the
handshaking methods explained earlier leads to a very fundamental pipeline technique
that solves the problem of propagating only valid data: the Muller pipeline.
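The behaviour of the Muller-C element can be expressed as a small runnable model. The class name and interface are hypothetical, and the initial output of 0 is an assumption.

```python
class MullerC:
    """Behavioural model of a two-input Muller-C element."""

    def __init__(self, initial=0):
        self.y = initial  # stored output value

    def update(self, a, b):
        """Copy the inputs to the output when they agree, else hold."""
        if a == b:
            self.y = a
        return self.y


element = MullerC()
for a, b in ((1, 0), (1, 1), (0, 1), (0, 0)):
    print(a, b, element.update(a, b))
```

The output follows the inputs only in the second and fourth steps, where both inputs agree; in the other two steps it keeps its previous value.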
Fig. 3.8 An asynchronous Muller pipeline with 4-phase bundled-data handshake and
combinational parts
In the following example, a simple Muller pipeline with 4-phase bundled-data
handshake is shown (see Fig. 3.8). The bold parts are the “backbone” of the pipeline,
consisting of Muller-C elements and inverters. The dotted boxes mark the part
responsible for the handshake (wide, dotted box) and the part responsible for the
computation (small, dotted box). Without the parts in the small boxes this would be a
simple asynchronous FIFO.
Assume that the first Muller-C element has a “1” at (a) and the first (not shown)
combinatorial function block has valid and stable data at its output. The first latch is
enabled by this signal (a), captures the data and propagates it to the next combinatorial
block. As shown, the request signal from (a) is delayed so that it takes at least as much
time as the critical path in the corresponding combinatorial block. When the request signal
arrives at the next Muller-C element (c), there are two possibilities. If the rightmost part
of the pipeline is free and its “Ack” signal is therefore “0” (g), the second latch is
immediately enabled (d) to propagate the data. If the rightmost part is not available,
the second pipeline stage is stalled until its successor becomes available. After this,
the middle part is finished and ready to accept new data, which it signals back to
its predecessor by setting its “Ack” to “0” (d) after (c) has also returned to “0”.
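The stalling behaviour of the pipeline backbone can be reproduced with a small model. This is a simplified sketch under stated assumptions: it models only the chain of Muller-C elements and inverters, without latches, delay elements or combinatorial blocks. Each stage's output serves both as the request to its successor and as the acknowledge to its predecessor. The helper `settle` and the list representation are hypothetical.

```python
def c_element(a, b, y):
    """Muller-C element: follow the inputs when they agree, else hold y."""
    return a if a == b else y


def settle(stages, req_in, ack_in):
    """Re-evaluate all stages until nothing changes; return the new state.

    stages  -- outputs of the C-elements, leftmost stage first
    req_in  -- request from the sender on the left
    ack_in  -- acknowledge from the receiver on the right
    """
    state = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(state)):
            left = req_in if i == 0 else state[i - 1]
            right = ack_in if i == len(state) - 1 else state[i + 1]
            # The acknowledge from the right passes through an inverter.
            new = c_element(left, 1 - right, state[i])
            if new != state[i]:
                state[i] = new
                changed = True
    return state


state = settle([0, 0, 0], req_in=1, ack_in=0)  # first request travels right
state = settle(state, req_in=0, ack_in=0)      # sender releases, receiver busy
state = settle(state, req_in=1, ack_in=0)      # second request enters
print(state)
```

With the receiver never acknowledging, the first request is held in the last stage and the second request cannot advance past the first stage; the empty stage between them is why such a pipeline holds data in at most every other stage.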
Domino logic, named for its transistor precharging phase and subsequent rapid
discharge, like toppling dominoes, offers a higher-performance avenue into asynchronous
logic. It is delay-insensitive, precharges during the logic block handshake, and offers the
same efficiency as other clockless circuits. Its properties match the 4-phase handshake.
4. ARCHITECTURE OF CLOCKLESS CHIP
Several asynchronous processors have been implemented in the past. The most important
of them are:
Amulet has a 6-stage pipeline architecture and is based on the 2-phase bounded-delay
handshake method. It was built in 1-µm CMOS technology. The pipelines used in Amulet
were a slightly modified version of the Muller pipeline described in Chapter 3.
The address interface is responsible for issuing read and write requests to
memory. It issues instruction prefetch requests autonomously and accepts data transfer
and branch target addresses from the execution unit as required. Branch target addresses
are immediately issued to memory and also change the prefetching stream to continue
from the target location; data transfer addresses temporarily interrupt the prefetching
stream, which resumes once the data address has been issued. The ARM architecture
makes the program counter readily accessible to the programmer as register 15 in the
register bank. PC values are therefore copied from the address interface to the register
bank through a PC pipeline, which buffers the values until the associated instruction
arrives from memory.
streams with data dependencies between successive instructions and enables register read
and write processes to proceed asynchronously without arbitration and without risk of
metastability in the control and data circuits.
1. Half Adder
The half adder can be built with NCL techniques. Here (10) represents 1 and
(01) represents 0. The output of a 2-of-2 threshold gate is high only if both of its inputs
are high; this gate is the same as Muller's C element. Stage 1 covers the four possible
input cases: depending on the input values, the output of exactly one gate goes high, and
that gate sets the final logic level of the output.
Fig 4.2 Half Adder using NCL
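The two-stage structure can be modelled behaviourally. The sketch below is not a transcription of Fig 4.2. It assumes one 2-of-2 threshold gate per input case in stage 1, and a second stage of plain ORs that collects those outputs onto the dual-rail sum and carry. All class and function names are hypothetical. Inputs must return to NULL between data values, because the 2-of-2 gates hold their outputs otherwise.

```python
NULL = (0, 0)


def encode(bit):
    """Dual-rail code used in the text: (1, 0) is 1 and (0, 1) is 0."""
    return (1, 0) if bit else (0, 1)


class TH22:
    """2-of-2 threshold gate with hysteresis (behaves like a Muller C-element)."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a and b:
            self.out = 1
        elif not a and not b:
            self.out = 0
        return self.out


class NCLHalfAdder:
    def __init__(self):
        # Stage 1: one TH22 gate per input case (value of A, value of B).
        self.stage1 = {(a, b): TH22() for a in (0, 1) for b in (0, 1)}

    def update(self, a_pair, b_pair):
        """Apply dual-rail inputs; return (sum_pair, carry_pair)."""
        a_rail = {1: a_pair[0], 0: a_pair[1]}
        b_rail = {1: b_pair[0], 0: b_pair[1]}
        hit = {
            key: gate.update(a_rail[key[0]], b_rail[key[1]])
            for key, gate in self.stage1.items()
        }
        # Stage 2: plain ORs collect the stage-1 outputs onto the output rails.
        sum_pair = (hit[(0, 1)] | hit[(1, 0)], hit[(0, 0)] | hit[(1, 1)])
        carry_pair = (hit[(1, 1)], hit[(0, 0)] | hit[(0, 1)] | hit[(1, 0)])
        return sum_pair, carry_pair


adder = NCLHalfAdder()
adder.update(NULL, NULL)
print(adder.update(encode(1), encode(1)))
```

While the inputs are NULL both outputs are NULL, so the arrival of a valid sum and carry also tells the next stage that the computation is complete.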
5. PROS, CONS AND APPLICATIONS
5.1 PROS:
There are mainly four advantages of clockless design. They are,
Reduced power consumption.
High Performance Efficiency
Less electromagnetic noise
No Clock Skew
5.1.3 LESS ELECTROMAGNETIC NOISE
When a clocked circuit is used in mobile devices, the noise generated by
the high frequency of the clock interferes with the working frequency of the
device. To avoid errors caused by these noise signals, designers are not
free to use the scale of integration they wish.
Asynchronous systems produce less radio interference than synchronous machines.
Because a clocked system uses a fixed rhythm, it broadcasts a strong radio signal at its
operating frequency and at the harmonics of that frequency.
5.2 CONS: LIMITATIONS OF ASYNCHRONOUS CIRCUITS
Design difficulties.
Lack of good tools.
Testing difficulties.
And of course, there is the basic obstacle that asynchronous design techniques
have been out of favor since the 1980s and are therefore not typically taught in
universities. If a microprocessor design company today wanted to use asynchronous
logic, it would have to begin by training its engineering staff in the basics.
Smart cards
With its ultra-low power consumption, Handshake Technology was the natural choice
for a number of market-leading contactless and dual-interface smart card ICs. It has
given these products a real competitive edge, allowing larger memories and enhanced
features within the constraints of an extremely limited power supply.
Automotive systems
Already employed in a range of networking transceivers, Handshake Technology boasts
dramatically lowered electromagnetic emission and current peaks, simplifying on-chip
integration of digital and analog / RF components. This improved reliability and enabled
the creation of the low-cost integrated components required for drive-by-wire, control,
safety and entertainment applications.
Wireless applications
Handshake Technology is bringing the advantages of longer battery lifetimes to a
number of 900 MHz mobile phones as well as a wireless controller for a leading games
console. What’s more, by making it easier to integrate analog and RF components into
digital designs, Handshake Technology lets manufacturers create connected handheld
devices at attractive prices.
Multi-standard pager IC
Remember pagers? Well, Handshake Technology was even used to improve their
capabilities. Because RF components can receive signals while Handshake Technology
circuits on the same chip are active, many functions could be implemented in software.
This allowed multiple standards to be handled by one low-cost, easily upgradable device.
Finally, asynchronous design may find its way into the commercial PC, but before that
it has to go through a long evolution. In that market it is essential to create an
efficient design that is reasonably priced.
6. CONCLUSION
Why isn’t it popular?
Should it be used?
My conclusion is an emphatic yes! Clocks are getting faster, while chips are
getting bigger, both of which make clock distribution harder. Chips are also becoming
more heterogeneous, with functions like memory and network interfaces being
considered, all of which complicates the global timing analysis necessary for a
synchronous design. Finally, we are entering an age when processors will be just about
everywhere, and this will require very low power designs. It’s just not practical to
expect a clean, skew-free clock for every (say) piece of clothing with a processing
element.
But this can only happen if more focus, especially at the university level, is given to
asynchronous design. Most of today’s designers don’t understand it well enough to use
it, and may even regard it with suspicion. It is certainly a challenge, but just as the
software community is moving towards more concurrency, the hardware community
must move to incorporate asynchronous logic.
APPENDIX 1 CLOCKLESS ACHIEVEMENTS
INTEL, Santa Clara, CA
    Clockless prototype in 1997 ran three times faster than the conventional-chip
    equivalent, on half the power.
    Stay current with clockless R&D.

ASYNCHRONOUS DIGITAL DESIGN, Pasadena, CA
    Founded by students of Caltech's Alain Martin, who developed the first
    asynchronous microprocessor.
    Produce chips for cell phones and other low-power communications devices;
    expected to announce plans by year-end.