Messerschmitt Timing PDF
OCTOBER 1990
Abstract: In digital system design, synchronization ensures that operations occur in the logically correct order, and is a critical factor in ensuring the correct and reliable system operation. As the physical size of a system increases, or as the speed of operation increases, synchronization plays an increasingly dominant role in the system design. Digital communication has developed a number of techniques to deal with synchronization on a global and even cosmic scale; and as the clock speeds of chip, board, and room-sized digital systems increase, they may benefit from similar techniques. Yet, the digital system and digital communication communities have evolved synchronization techniques independently, choosing different techniques and different terminology. In this paper, we attempt to present a unified framework and terminology for synchronization design in digital systems, borrowing techniques and terminologies from both digital system and digital communication design disciplines. We then compare the throughput of synchronous and asynchronous interconnect, emphasizing how it is impacted by interconnect delay. Finally, we discuss opportunities to apply principles long employed in digital communications to the design of digital systems, with the goal of reducing this dependence on interconnect delay.

I. INTRODUCTION

tem design, and hence there are a number of opportunities to apply digital communications principles to VLSI and digital system design. However, because of the wide difference between system physical sizes in relation to clock speeds, the design styles of the communities have developed almost independently. This is in spite of the fact that digital design is a necessary element of digital communication design (for example, in modems, switches, etc.). In this paper, we place the synchronization problem and design approaches in digital communication and digital system design in a common framework, and then examine opportunities for cross-fertilization between the two fields.

In attaining the unification of design methodologies that we attempt in this paper, the first difficulty we face is the inconsistencies, and even contradictions, between terms as used in the two fields. For example, the term "self-timed" generally means "no independent clock," but as used in digital communication it means no clock at all (in the sense that if a clock is needed it can be derived from the data), and in digital system design it indicates that
Fig. 1. An example of some abstractions applied to digital system design. [Diagram labels: SYSTEM, MODULE.]

straction relies on the essential features of the abstraction level below, and hides unessential details from the higher level. This is illustrated for digital system design in Fig. 1. At the base of the design, we have the physical representation, where we have semiconductor materials, interconnect metalization, etc., and we are very concerned about the underlying physical laws that govern their properties. Above this, we have the circuit abstractions, where we deal with circuit entities such as transistors, interconnections, etc. Our descriptions of the circuit entities attempt to ignore the details of the physical laws underlying them, but rather characterize them in terms that emphasize their operational properties, such as current-voltage curves, transfer functions, etc. Above the circuit abstraction is the element, which groups circuit entities to yield slightly higher functions, such as flip-flops and gates. We describe elements in terms that deal with their operational characteristics (timing diagrams, Boolean functions, etc.) but ignore the details of how they are implemented. Above the element abstraction, we have the module (sometimes called "macrocell") abstraction, where elements are grouped together to form more complex entities (such as memories, register files, arithmetic-logic units, etc.). Finally, we group these modules together to form systems (like microcomputers).

Many other systems of abstractions are possible, depending on circumstances. For example, in digital communication system protocols, we extend the abstractions in Fig. 1, which carry us to the extreme of hardware design, to add various abstractions that hierarchically model the logical (usually software-defined) operation of the system (including many layers of protocols). Similarly, the computer industry defines other system abstractions such as instruction sets, operating system layers, etc.

In the present paper, we are concerned specifically with synchronization in the design of digital systems and digital communications. In the context of digital systems, by synchronization we mean the set of techniques used to ensure that operations are performed in the proper order. The following subsections define some appropriate abstractions for the synchronization design. While these abstractions are by no means new, perhaps this is the first time that a systematic treatment of them has been attempted. This systematic treatment gives us a common base of terminology for synchronization design, and is utilized in the following subsections.

While abstractions are very useful, and in fact absolutely necessary, they should be applied with care. The essence of an abstraction is that we are ignoring some details of the underlying behavior, which we hope are irrelevant to the operation, while emphasizing others that are most critical to the operation. It should always be verified that the characteristics being ignored are in fact irrelevant, and with considerable margin, or else the final system may turn out to be inoperative or unreliable. This is especially true of synchronization, which is one of the most frequent causes of unreliable operation of a system. In the following, we therefore highlight behaviors that are ignored or hidden by the abstraction.

A. Some Basic Synchronization Abstractions

1) Boolean Signals: A Boolean signal (voltage or current) is assumed to represent, at each time, one of two possible levels. At the physical level, this signal is generated by saturating circuits and bistable memory elements. There are a couple of underlying behaviors that are deliberately ignored in this abstraction: finite rise-time and the metastable behavior of memory elements.

Finite rise-time behavior is illustrated in Fig. 2 for a simple RC time constant. The deleterious effects of rise-time can often be bypassed by the sampling of the signal. In digital systems this is often accomplished using edge-triggered memory elements. In digital communications, rise-time effects are often much more severe because of the long distances traversed, and manifest themselves in the more complex phenomenon of intersymbol interference [3]. In this case, one of several forms of equalization can precede the sampling operation.

Metastability is an anomalous behavior of all bistable devices, in which the device gets stuck in an unstable equilibrium midway between the two states for an indeterminate period of time [4]. Metastability is usually associated with the sampling (using an edge-triggered bistable device) of a signal whose Boolean state can change at any time, with the result that sampling at some point very near the transition will occasionally occur. Metastability is less severe a problem than rise-time in the sense that it happens only occasionally, but more severe in that the condition will persist for an indeterminate time (like a rise-time which is random in duration).

The Boolean abstraction is most valid when the rise-time is very much shorter than the interval between transitions, and metastability is avoided or carefully controlled. But the designer must be sure that these effects are negligible, or unreliable system operation will result.

2) Signal Transitions: For purposes of synchronization we are often less concerned with the signal level than with the times at which the signal changes state. In digital systems this time of change would be called an edge or transition of the signal. The notion of the transition time ignores rise-time effects. In fact, the transition time is subject to interpretation. For example, if we define the transition time as the instant that the waveform crosses some slicer level (using terminology from digital communication), then it will depend on the slicer level as illustrated in Fig. 3. Even this definition will fail if the
[Figure: waveform plots; axes VOLTAGE versus TIME.]
where $p(t)$ is a 50% duty cycle pulse
stantaneous phase,
quite
significant in digital communications, especially where the
a chain of regenerative
are plesiochronous (from
the
Fig. 5. A taxonomy of synchronization approaches. [Diagram labels: SYNCHRONOUS, MESOCHRONOUS, PLESIOCHRONOUS, HETEROCHRONOUS, ANISOCHRONOUS.]
4) Synchronization: Having carefully defined some terms, we can now define some terminology related to the synchronization of two signals. A taxonomy of synchronization possibilities is indicated in Fig. 5 [5], [3]. As previously mentioned, a Boolean signal can be either isochronous or anisochronous. Given two data signals, if both are isochronous, the frequency offsets are the same, and the instantaneous phase difference is zero,

$$\Delta\phi(t) = 0 \qquad (8)$$

then they are said to be synchronous (from the Greek "syn" for "together"). Common examples would be a Boolean signal that is synchronous with its associated clock (by definition), or two signals slaved to the same clock at their point of generation. Any two signals that are not synchronous are asynchronous (or "not together"). Some people would relax the definition of synchronous signals to allow a nonzero phase difference that is constant and known.

In practice, we have to deal with several distinct forms of asynchrony. Any signal that is anisochronous will be asynchronous to any other signals, except in the special and unusual case that two anisochronous signals have identical transitions (this can happen if they are co-generated by the same circuit). Thus, anisochrony is a form of asynchrony.

If two isochronous signals have exactly the same average frequency $f + \Delta f$, then they are called mesochronous (from the Greek "meso" for "middle"). For mesochronous signals, the fact that each of their phases is bounded per (4) ensures that the phase difference is also bounded,

$$\Delta\phi(t) \le 2\phi_{\max}, \qquad (9)$$

a fact of considerable practical significance. Two signals generated from the same clock (even different phases of the same clock), but suffering indeterminate interconnect delays relative to one another, are mesochronous.

$$\Delta\phi(t) = (\Delta f_1 - \Delta f_2)\,t + (\phi_1(t) - \phi_2(t)), \qquad (10)$$

where the first term increases (or decreases) linearly with time. For two plesiochronous signals, it cannot be predicted which has the higher frequency.

Finally, if two signals have nominally different average frequencies, they are called heterochronous (from the Greek "hetero" for "different"). Usually the tolerances on frequency are chosen to guarantee that one signal will have an actual frequency guaranteed higher than the other (naturally the one with the higher nominal rate). For example, if they have nominal frequencies $f_1$ and $f_2$, where $f_1 < f_2$, and the worst-case frequency offsets are $|\Delta f_1| \le \eta_1$ and $|\Delta f_2| \le \eta_2$, then we would guarantee this condition if $f_1$ and $f_2$ were chosen such that

$$f_1 + \eta_1 < f_2 - \eta_2. \qquad (11)$$

B. Additional Timing Abstractions in Digital Systems

This section has covered some general considerations in the modeling of Boolean signals from a synchronization perspective. In this subsection we describe a couple of additional abstractions that are sometimes applied in digital system design, and represent simplifications due to the relatively small physical size of such systems.

1) Equipotential Region: Returning to Fig. 1, it is often assumed at the level of the element abstraction that the signal is identical at all points along a given wire. The largest region for which this is true is called (by Seitz [6]) the equipotential region. Like our other models, this is never strictly valid, but is useful if the actual time it takes to equalize the potential along a wire is small in relation to other aspects of the signals, such as the associated clock period or rise-time. The equipotential region is a useful concept in digital system design because of the relatively small size of such systems. However, the element dimensions for which it is valid are decreasing because of increases in clock frequency with scaling, and a single chip
1408 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 8, NO. 8, OCTOBER 1990
Fig. 7. Synchronizing a computation C1 by latching its input. (a) A register R1 at the input controls the starting time for the computation. (b) A second register R2 at the output samples the output signal during the certainty period. (c) Timing diagram showing the uncertainty and certainty period (crosshatched) as a function of the settling time $t_s$, the propagation time $t_p$, and the clock period $T$.
Given the propagation and settling times, the certainty period is shown as the crosshatched period in Fig. 7(c). The region of time not crosshatched, during which the signal may possibly be changing, is known as the uncertainty period. If two successive clock transitions come at times $t_0$ and $(t_0 + T)$, the certainty period starts at time $(t_0 + t_s)$, the time when the steady-state output due to the computation starting at time $t_0$ is guaranteed, and ends at time $(t_0 + T + t_p)$, which is the earliest that the output can transition due to the new computation starting at time $(t_0 + T)$. The length of the certainty period is $(T + t_p - t_s)$. An acceptable register clocking phase for R2 requires that this period must be positive in length, or

$$T + t_p - t_s > 0.$$
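The certainty-period bookkeeping described above can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names `certainty_period` and `has_valid_sampling_phase` and the example values (in nanoseconds) are assumptions made for illustration only.

```python
def certainty_period(t0, T, t_s, t_p):
    """Return (start, end, length) of the certainty period seen by register R2.

    start = t0 + t_s      : output of the computation launched at t0 has settled
    end   = t0 + T + t_p  : earliest change caused by the computation launched at t0 + T
    length = T + t_p - t_s
    All arguments are times in the same unit (hypothetical values, e.g. ns).
    """
    start = t0 + t_s
    end = t0 + T + t_p
    return start, end, end - start


def has_valid_sampling_phase(T, t_s, t_p):
    """True when the certainty period has positive length, so R2 can be clocked safely."""
    return T + t_p - t_s > 0


# Hypothetical example: 10 ns clock period, 8 ns settling time, 1 ns propagation time.
start, end, length = certainty_period(t0=0, T=10, t_s=8, t_p=1)
print(start, end, length)
print(has_valid_sampling_phase(T=10, t_s=8, t_p=1))
print(has_valid_sampling_phase(T=5, t_s=8, t_p=1))
```

With these assumed numbers, R2 may be clocked anywhere between 8 ns and 11 ns after the first clock edge; shortening the clock period to 5 ns leaves no positive-length certainty period.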
Optimistic Case: For simple topologies like a one-dimensional pipeline, much higher throughput can be obtained by routing the signals and clocks in such a way that $d$ and $\delta$ can be coordinated with one another [8]. Assume that the interconnect delay is known to be $d_0$ with variation $\Delta d$, and the skew is chosen to be $\delta_0$ with variation $\Delta\delta$. Further assume that $d_0$ and $\delta_0$ are chosen most advantageously to maximize the throughput. Then the throughput is bounded by

$$\frac{1}{T} < \frac{1}{(t_s - t_p) + 2(\Delta\delta + \Delta d)}, \qquad (20)$$

which is a considerable improvement over (19) if the delay and skew variations are small. This analysis shows that the reliable operation of the idealized synchronous interconnection of Section III-A-1) can be extended to accommodate interconnect delays and clock skew, even with variations of these parameters, albeit with some necessary reduction in throughput.

To get a feeling for the numbers, consider a couple of numerical examples.

Example 6: Consider the pessimistic case, which would be typical of a digital system with an irregular interconnection topology that prevents easy coordination of interconnect delay and clock skew. For a given clock speed or throughput, we can determine from (19) the largest interconnect delay $d_{\max}$ that can be tolerated, namely, $(T - t_s)/3$, assuming that the interconnect delay and clock skew are not coordinated and assuming the worst-case propagation delay, $t_p = 0$. For a 100 MHz clock frequency (a clock period of 10 ns), and assuming the settling time is 80% of the clock period, the maximum interconnect delay is 667 ps. The delay of a data or clock signal on a printed circuit board is on the order of 5-10 ps/mm (as compared to a free-space speed of light of 3.3 ps/mm). The maximum interconnect distance is then 6.7-13.3 cm. Clearly, synchronous interconnect is not viable on PC boards at this clock frequency under these pessimistic assumptions. This also does not take into account the delay in passing through the pins of a package, roughly 1-3 ns (for ECL or CMOS, respectively) due to capacitive and inductive loading effects. Thus, we can see that interconnect delays become a very serious limitation in the board-level interconnection with 100 MHz clocks. □

Example 7: On a chip, the interconnect delays are much greater (about 90 ps/mm for Al-SiO2-Si interconnect), and are also somewhat variable due to dielectric and capacitive processing variations. Given the same 667 ps interconnect delay, the maximum interconnect distance is now about 8 mm. (This is optimistic since it neglects the delay due to source resistance and line capacitance, which will be dominant effects for relatively short interconnects.) Thus, we see difficulties in using synchronous interconnect on a single chip for a complex and global interconnect topology. □

Again, it should be emphasized that greater interconnect distance is possible if the clock skew and interconnect delay can be coordinated, which may be possible if the interconnect topology is simple, as in a one-dimensional pipeline. This statement applies at both the chip and board levels.

4) Parallel Signal Paths: An important practical use of pipeline registers is in synchronizing the signals on parallel paths. The transition phase offset between these parallel paths tends to increase through computational blocks and interconnect, and can be reduced by a pipeline register to the order of the clock skew across the bits of this multibit register. Again, the register can be viewed as reducing the size of the uncertainty region, in this case spatially as well as temporally.

In Section III-A-1), we defined the total uncertainty region for a collection of parallel signals as the union of the uncertainty regions for the individual signals. From the preceding, the total throughput is then bounded by the reciprocal of the length of this aggregate uncertainty period. In contrast, if each signal path from among the parallel paths were treated independently (say using the mesochronous techniques to be described later), the resulting throughput could in principle be increased to the reciprocal of the maximum of the individual uncertainty periods. For many practical cases, we would expect the longest uncertainty period to include the other uncertainty periods as subsets, in which case these two bounds on throughput would be equal; that is, there is no advantage in treating the signals independently. The exception to this rule would be where the uncertainty periods were largely nonoverlapping due to a relative skew between the paths that is larger than the uncertainty period for each path, in which case there would be considerable advantage to dealing with the signals independently.

B. Anisochronous Interconnect

The synchronous interconnect approach uses isochronous signals throughout the system, since all signals are slaved to an isochronous clock. A popular alternative to synchronous interconnect has been to abandon the isochronous assumption, and further abandon the use of a global clock signal altogether. Rather, the elements of the system are chosen to fall within an equipotential region, and the interconnection between elements is designed to operate in a delay-insensitive manner: that is, operate reliably regardless of what the delays are. This is accomplished by having each element of the system generate a completion signal, which has a transition coincident with the settling time of that element. The completion signal is a sort of locally generated clock, and is used to synchronize the different elements. Since the completion signal depends on the settling time, which can be data-dependent, the resulting signals are anisochronous. For example, if an ALU has two instructions with different settling times, then the signal frequencies will depend on the mix of instructions, and will thus be anisochronous.
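The arithmetic in Examples 6 and 7 above, together with the optimistic bound (20), can be reproduced with a short script. This is a minimal sketch under stated assumptions: the function names are hypothetical, the pessimistic limit is taken as $(T - t_s)/3$ with $t_p = 0$ exactly as quoted in Example 6, and the variation values passed to the optimistic bound are made up for illustration.

```python
def max_interconnect_delay_ps(clock_hz, settling_fraction):
    """Pessimistic limit from Example 6: d_max = (T - t_s) / 3, with t_p = 0."""
    period_ps = 1e12 / clock_hz               # clock period T in picoseconds
    t_s = settling_fraction * period_ps       # settling time as a fraction of T
    return (period_ps - t_s) / 3.0


def max_distance_mm(delay_ps, ps_per_mm):
    """Distance a signal can travel within delay_ps at the given delay per millimetre."""
    return delay_ps / ps_per_mm


def min_period_optimistic_ps(t_s, t_p, delay_variation, skew_variation):
    """Smallest clock period allowed by (20): T > (t_s - t_p) + 2 * (skew var + delay var)."""
    return (t_s - t_p) + 2.0 * (skew_variation + delay_variation)


d_max_ps = max_interconnect_delay_ps(100e6, 0.8)
print(f"d_max = {d_max_ps:.0f} ps")
print(f"PC board: {max_distance_mm(d_max_ps, 10.0):.1f} to "
      f"{max_distance_mm(d_max_ps, 5.0):.1f} mm")
print(f"on chip: {max_distance_mm(d_max_ps, 90.0):.1f} mm")

# Hypothetical variations of 50 ps each, with t_s = 8000 ps and t_p = 0.
print(f"optimistic T_min = {min_period_optimistic_ps(8000.0, 0.0, 50.0, 50.0):.0f} ps")
```

Under these assumptions the script gives a delay budget of about 667 ps, a board-level reach of roughly 67 to 133 mm, and an on-chip reach of roughly 7.4 mm, in line with the 6.7-13.3 cm and "about 8 mm" figures quoted in the examples.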
MESSERSCHMITT: SYNCHRONIZATION IN DIGITAL SYSTEM DESIGN 1413
[Figure: ORIGINATING MODULE and RECEIVING MODULE, with CLOCK C1.]
Limitations to throughput due to propagation delay are avoided in digital communications as shown in Fig. 14. First, two communicating modules are each provided a clock, C1 and C2. A clock is also derived from the incoming signal in the timing recovery circuit, and is denoted C3. These clocks have the following relationships.

C3 is synchronous with the signal at the input to the FIFO, since it is derived from that signal.

C3 is mesochronous to C1, since it has the same average frequency as dictated by the common signal but has an indeterminate phase due to the interconnect delay. It can also have significant phase jitter due to a number of effects in the long-distance transmission [3].

C1 and C2 are either mesochronous, if they originated from a common source, or they are independent. In the latter case, they are either plesiochronous or heterochronous.

The purpose of the FIFO (also called an elastic store) is to adjust for the differences in phase, likely to be time varying, between C3 and C2. For mesochronous C1 and C2, this phase difference is guaranteed to be bounded, so that the FIFO can be chosen with sufficient storage capacity to never overflow. For heterochronous C1 and C2, where the average frequency of C1 is guaranteed to be lower than the average frequency of C2, the FIFO occupancy will decrease with time, and hence no data will ever
be lost. For plesiochronous C1 and C2, the FIFO could overflow, with the loss of data, if the average frequency of C1 happens to be higher than C2. This loss of data is acceptable on an occasional basis, but may not be permissible in a digital system design.

IV. ISOCHRONOUS INTERCONNECT IN DIGITAL SYSTEMS

We found previously that the performance of both synchronous and anisochronous interconnects in digital systems is limited as a practical matter by the interconnect delays in the system. With the anisochronous approach, this limitation was fundamental to the use of roundtrip handshaking to control synchronization. In the synchronous (but not anisochronous) case, we showed that this limitation is not fundamental, but rather comes from the inability to tightly control clock phases at synchronization points in the system. The reason is the "open-loop" nature of the clock distribution, making us susceptible to processing variations in delay. If we can more precisely control clock phase using a "closed-loop" approach, the throughput of the synchronous approach can more nearly approach the fundamental limit of (13), and considerably exceed that of anisochronous interconnect in the presence of significant interconnect delays. In this section, we explore some possibilities in that direction, borrowing techniques long used in digital communication.

A. Mesochronous Interconnect

Consider the case where a signal has passed over an interconnect and experienced interconnect delay. The interconnect delay does not increase the uncertainty period, and thus does not place a fundamental limitation on throughput. If this signal has a small uncertainty period, as for example when it has been resynchronized by a register, then the certainty period is likely to be a significant portion of the clock cycle, and the phase with which this signal is resampled by another register is not even very critical. The key is to avoid a sampling phase within the small uncertainty period, which in synchronous interconnect can be ensured only by reducing the throughput. But if the sampling phase can be controlled in closed-loop fashion, the interconnect delay should not be a factor, as demonstrated in digital communication systems.

Another perspective on clock skew is that it results in an indeterminate phase relationship between local clock and signal; in other words, the clock and signal are actually mesochronous. In mesochronous interconnect, we live with this indeterminate phase, rather than attempting to circumvent it by careful control of interconnect delays for clock and signal. This style of interconnect is illustrated in Fig. 15. Variations on this method were proposed some years ago [13] and pursued into actual chip realizations by a group at M.I.T. and BBN [14], [15] (although they did not use the term "mesochronous" to describe their technique). We have adapted our version of this approach from the mesochronous approach used worldwide in digital communication, except that in this case we can make the simplifying assumption that the phase variation of any signal or clock arriving at a node can be ignored. The primary cause of the residual phase modulation will be variations in temperature of the wires, and this should occur at very slow rates and at most requires infrequent phase adjustments. This simplification implies that clocks need not be derived from incoming signals, as in the timing recovery of Fig. 14; but rather a distributed clock can be used as a reference with which to sample the received signal. However, we must be prepared to adjust the phase of the clock used for sampling the incoming signal to account for indeterminate interconnect delays of both clock and signal. We thus arrive at Fig. 15.

First, we divide the digital system into functional entities called "synchronous islands." The granularity of partitioning into synchronous islands is guided by the principle that within an island the interconnect delays are small relative to logic speed, and traditional synchronous design can be used with minimal impact from interconnect delays. The maximum size of the synchronous island depends to a large extent on the interconnect topology, as we have seen previously. In near-term technology, a synchronous island might encompass a single chip within a digital system and, for the longer term, a single chip may be partitioned into two or more synchronous islands. The interconnection of a pair of synchronous islands is shown in Fig. 15(a): externally, the connection is identical to the synchronous interconnection in Fig. 6. The difference is in the relaxed assumptions on interconnect delays.

The mesochronous case requires a more complicated internal interface circuit, as illustrated in Fig. 15(b). This circuit performs the function of mesochronous-to-synchronous conversion, similar in function but simpler than the FIFO in Fig. 14. This conversion requires the following.

A clock phase generator to create a set of discrete phases of the clock. This does not require any circuitry running at speeds greater than the clock speed, but rather can be accomplished using a circuit such as a ring oscillator phase-locked to the external clock. A single phase generator can be shared among synchronous islands (such as one per chip), although to reduce the routing overhead it can be duplicated on a per-island or even on a multiple generator per-island basis.

A phase detector determines the certainty period of the signal, for example, by estimating the phase difference between a particular phase of the clock and the transitions of the signal. Generally, one phase detector is required for each signal line or group of signal lines with a common source and destination and similarly routed.

A sampling register R1 with sampling time chosen well within the certainty period of the signal, as controlled by the phase detector. This register reduces the uncertainty period of the signal, for example, eliminating any finite rise-time effects due to the dispersion of the interconnect, and also controls the phase of the uncertainty period relative to the local clock. Depending on the effect of temperature variations in the system, this appropriate
[Figure: block diagram with labels INPUT CLOCK, PHASE GENERATOR, SELECT, SYNCHRONOUS ISLAND, DATA, SYNCHRONIZER, FRAMING; panel (c).]
phase may be chosen once at power-up, or may be adjusted infrequently if the phase variations are a significant portion of a clock cycle.

A second register R2 resamples the signal using the clock phase used internally to the synchronous island. At this point, the signal and this internal clock are synchronous. By inference, all the signals at the input to the synchronous island are also synchronous.

If the system interconnect delays are larger than one clock cycle, it may be convenient at the architectural level to add an optional frame synchronizer, which is detailed in Fig. 15(c). The purpose of this block is to synchronize all incoming signal lines at the word-level (for example, at byte boundaries) by adding a programmable delay. This framing also generally requires additional framing information added to the signal.

The practicality of the mesochronous interconnection depends to a large extent on the complexity of the interconnection circuit and the speed at which it can be implemented. All the elements of this circuit are simple and pretty standard in digital systems with the possible exception of the phase detector. Thus, the practicality of mesochronous interconnection hinges largely on innovative approaches to implementing a phase detector. A phase detector can be shared over a group of signals with the same origin and destination and same path length, and it can even be time-shared over multiple such functions since we do not expect the phase to change rapidly. Nevertheless, the phase detector should be realized in a small area, at least competitive with anisochronous handshaking circuits. Further, it should be fast, not restricting the clock rate of the interconnect, since the whole point is to obtain the maximum speed. This should not be a problem on slower out-of-package interconnections, but may be difficult to achieve on-chip. Any phase detector design is also likely to display metastable properties, which must be minimized. The design of the phase detector may also place constraints on the signal, such as requiring a minimum number of transitions. In view of all these factors, the phase detector is a challenging analog circuit design problem.

B. Heterochronous and Plesiochronous Interconnect

Like synchronous interconnect, and unlike anisochronous interconnect, mesochronous interconnect requires distribution of a clock to all synchronous islands, although any interconnect delays in this distribution are not critical. If the power consumption of this clock is of concern, in principle it would be possible to distribute a submultiple of the clock frequency, and phase-lock to it at each synchronous island. But if the interconnect wires for clock distribution are of concern, they can be eliminated as in the digital communication system of Fig. 14, where the timing recovery is now required. There are two useful cases.

If clocks C1 and C2 are heterochronous, such that C1 is guaranteed to have a lower frequency than C2, then the FIFO can be very small and overflow can never occur. However, the signal from the FIFO to the receiving module would have to contain "dummy bits," inserted whenever necessary to prevent FIFO underflow, and a protocol to signal where those bits occur.

If clocks C1 and C2 are plesiochronous, then overflow of the FIFO can occur in the absence of flow control. Flow control would insert the "dummy bits" at the originating module whenever necessary to prevent FIFO overflow. Flow control could be implemented with a reverse handshaking signal that signaled the originating module, or if another interconnect link existed in the opposite direction, that could be used for this purpose.

V. ARCHITECTURAL ISSUES

While synchronous interconnect is still the most common, both anisochronous and mesochronous interconnects are more successful in abstracting the effects of interconnect delay and clock skew. Specifically, both design styles ensure reliable operation independent of interconnect delay, without the global constraints of sensitivity to clock distribution phase. Each style of interconnect has disadvantages. Mesochronous interconnect will have metastability problems and requires phase detectors, while anisochronous interconnect requires extra handshake wires, handshake circuits, and a significant silicon area for completion-signal generation.

The style of interconnection will substantially influence the associated processor architecture. The converse is also true: the architecture can be tailored to the interconnect style. As an example, if at the architectural level we can make a significant delay one stage of a pipeline, then the effects of this delay are substantially mitigated. (Consider, for example, (22) with $t_s = 0$.)

Many of these issues have been discussed with respect to anisochronous interconnection in [16]. On the surface one might presume that mesochronous interconnection architectural issues are similar to synchronous interconnection which, if true, would be a considerable advantage

information (beginning and end of message, etc.), and design the architecture to use this information. We have studied this problem in some detail in the context of interprocessor communication [17].

As previously mentioned, each data transfer in an anisochronous interconnect requires two to four propagation delays (four in the case of the most reliable four-phase handshake). In contrast, the mesochronous interconnection does not have any feedback signals and is thus able to achieve whatever throughput can be achieved by the circuitry, independent of propagation delay. This logic applies to feedforward-only communications. The more interesting case is the command-response situation or, more generally, systems with feedback, since delay will have a considerable impact on the performance of such systems. To some extent, the effect of delay is fundamental and independent of interconnect style: the command-response cycle time cannot be smaller than the roundtrip propagation delay. However, using the "delay build-out" frame synchronizer approach described earlier would have the undesirable effect of unnecessarily increasing the command-response time of many interconnections in the
because of the long history and experience with synchron- system. Since this issue is an interesting point of contrast
ous design. However, the indeterminate delay in the in- between the anisochronous and mesochronous ap-
terconnection (measured in bit intervals at the synchron- proaches, we w i l l now discuss it in more detail.
ous output of the mesochronous-to-synchronous
conversion circuit) must be dealt with at the architectural A . Commund-Response Processing
level. For synchronous interconnection, normally the de-
lay between modules is guaranteedtobeless than one Suppose two synchronous islands are interconnected i n
clock period (that docs not have to be the case as illus- a bilateral fashion, where one requests data and the other
trated in Fig. S), but with mesochronous interconnection responds. An example of this situation would be a pro-
the delay can be multiples of a bit period. This has fun- cessor requesting data from a memory-the request is the
damental implications to the architecture. In particular, it address and the response is the data residing at that ad-
implies a highcr level of synchronization (usually called dress. Assuming for the momentthat there is no delay
“framing“ in digital communications) which line up sig- generating the response within the responding synchron-
nal word boundaries at computational and arithmeticunits. ous island, the command-response cycle can be modeled
In the course of addressing this issue, the following key simply as a delay by N clock cycles, corresponding to
question
must
probably answered:
be rouihly a delay of 2t,,, as illustrated
in Fig. 16.
It is intercsting thai this delay is precisely analogous to
Is there a maximum propagation delay that can be as-
the delay (measured in clock cycles) introduced by pipe-
sumed betwccn synchronous islands? If so, is this de-
lining-we can consider this command-response delay as
lay modest?
being- generated
- by N pipeline
~. registers. As in pipelining.
The answer to this question is most certainly “yes” in a this delay can be deleterious in reducing the throughput
given chip design, but difficult to answer for a chip de- ofthe processing, since the result of an action is not avail-
signed to be incorporated into a mesochronous board- or able for N clock cycles. One way of handling this delay
larger-level system. As an example of an architectural ap- is for the requesting synchronous island to go into a wait
proach suitable for worst-case propagation delay assump- state for N clock cycles after each request. The analogous
tions, we can include frame synchronizers like Fig. 15(c) approach in pipelining is to launch a new data sample each
which “build-out” eachinterconnection to some worst- N cycles. which is known as N-slow [Ig]. and the
casedelay. Everyinterconnectionthusbecomes worst- throughput will bc inversely proportional to N. This ap-
case and, more importantly, predictable in the design of proach is analogous to the anisochronous interconnection
the remainder of the architecture. However, this approach (which may require considerably more delay, such as four
is probably undesirahle for reasons that will soon he elab- propagation delays for the transfer in each direction).
orated. Another approach is to build into the architecture For the mesochronouscase,therearesomearchitec-
adjustment for the actual propagation delays, which can tural alternatives that can result in considerably improved
easily be detected at power-up. Yet another approach is performance under some circumstances, but only if this
to use techniquessimilarto packet switching in which issue is addressed at the architectural level. Some exam-
cach interconnection carries associated synchronization plcs of these include the following.
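The wait-state baseline described above can be illustrated with a small numerical sketch. It is not taken from the paper: the function names are hypothetical, N is assumed to be the round-trip delay rounded up to whole clock cycles, the wait-state throughput is assumed to be exactly 1/N, and the anisochronous case is assumed to cost four propagation delays for the transfer in each direction, as in the four-phase handshake mentioned in the text.

```python
import math


def round_trip_cycles(t_prop: float, clock_period: float) -> int:
    """Whole clock cycles N needed to cover the round-trip delay 2 * t_prop."""
    return max(1, math.ceil(2.0 * t_prop / clock_period))


def wait_state_throughput(n_cycles: int) -> float:
    """Requests per clock cycle when the requester idles N cycles per request."""
    return 1.0 / n_cycles


def anisochronous_cycle_time(t_prop: float, delays_per_transfer: int = 4) -> float:
    """Command-response time with one handshaked transfer in each direction."""
    return 2 * delays_per_transfer * t_prop


# Illustrative numbers only (nanoseconds): 10 ns one-way delay, 5 ns clock.
t_prop, clock_period = 10.0, 5.0
n = round_trip_cycles(t_prop, clock_period)
mesochronous_cycle = n * clock_period               # wait-state cycle time
handshake_cycle = anisochronous_cycle_time(t_prop)  # four-phase, both directions
print(n, wait_state_throughput(n), mesochronous_cycle, handshake_cycle)
```

With these assumed numbers the wait-state cycle spans N = 4 clock periods (20 ns), while the four-phase handshake needs 80 ns for the same command-response exchange. Both scale with the propagation delay, which is why the paper treats the wait-state approach as the analog of the anisochronous case.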
[Figure: REQUEST and RESPONSE paths]
Fig. 16. A model for command-response in a mesochronous interconnection. The delay is with respect to cycles of the common clock.

• If we have some forward-only communications coincident with the command-response communications, we can interleave these feedforward communications on the same lines.

• If we have a set of N independent command-response communications, we can interleave them on the interconnection. This is analogous to pipeline interleaving [19] (which can make full use of the throughput capabilities of a pipelined processing element provided that we can decompose the processing into N independent streams). If the responding island cannot accommodate this high a throughput, then it can be duplicated as many times as necessary (in pipelining, an analogous example would be memory interleaving).

• If we cannot fully utilize the interconnection because of the propagation delay, then at least we should be able to allow each processing element to do useful processing (at its full throughput) while awaiting the response. This is analogous to what is sometimes done when a synchronous processor interacts asynchronously with a slower memory.

The last of these options would be available to an anisochronous system, since the communication portion could be made a separate pipeline stage. However, the first two options (two forms of interleaving of communications) are not available in anisochronous systems because the total throughput is bounded by the propagation delay. If the communication bottlenecks are taken into account at the architectural level, it appears that mesochronous interconnection offers considerable opportunity for improved performance.

VI. CONCLUSIONS

In this paper we have attempted to place the comparison of digital system synchronization on a firm theoretical foundation, and compare the fundamental limitations of the synchronous, anisochronous, and mesochronous approaches. A firm conclusion is that interconnect delays place a fundamental limitation on the communication throughput for anisochronous interconnect (equal to the reciprocal of two or four delays); this limitation does not exist for mesochronous interconnect. Further, mesochronous interconnect can actually achieve pipelining in the interconnect (as illustrated in Example 1) without additional pipeline registers, whereas anisochronous cannot. Further, anisochronous requires extra interconnect wires and completion signal generation. Thus, as clock speeds increase and interconnect delays become more important, mesochronous interconnect shows a great deal of promise. However, the advantages of any synchronization technique cannot be fully exploited without modifications at the architectural level.

Synchronization is the most difficult issue faced in digital system interconnect viewed as a digital communication problem, but there are some other techniques that can be considered. One of the more interesting is line code design [3]. Electrical interconnect displays dispersive properties which cause intersymbol interference at high speeds, through both bandwidth constraints and microreflections. Multilevel line codes would be an interesting possibility from the perspective of increasing the data rate within a bandwidth constraint. Also, the line code can constrain the number of transitions per unit time, easing, for example, the design of phase detectors [20]. Equalization of the intersymbol interference is a well-proven technology, but it is likely not practical at the high speeds of digital system interconnect.

APPENDIX A

In this Appendix, we determine the certainty region for the parameters in Fig. 10. Consider a computation initiated by the clock at R1 at time $t_0$. Extending our earlier results, the conditions for reliable operation are now as follows.

The earliest the signal can change at R2 is after the clock transition $t_0 + \delta$, or

    $t_0 + \delta < t_0 + t_p + d + \epsilon$.    (23)

The latest time the signal has settled at R2 must be before the next clock transition $t_0 + \delta + T$, where $T$ is the clock period,

    $t_0 + t_s + d + \epsilon < t_0 + \delta + T$.    (24)

Simplified, these equations become

    $\delta < t_p + d + \epsilon$    (25)
    $T + \delta > t_s + d + \epsilon$.    (26)

Together, (25) and (26) specify a certainty region for $\{\delta, T\}$ where reliable operation is guaranteed.

With the aid of Fig. 17, we gain some interesting insights. First, if the skew is sufficiently positive, reliable operation cannot be guaranteed because the signal at R2 might start to change before the clock transition. If the skew is negative, reliable operation is always guaranteed if the clock period is large enough.

Idealistic Case: The most advantageous choice for the skew is $\delta = t_p + d + \epsilon$, at which point we get reliable operation for $T > t_s - t_p$. Choosing this skew, the fundamental limit of (13) would be achieved. This requires precise knowledge of the interconnect delay $d$ as well.

Pessimistic Case: In Fig. 17 we see the beneficial effect of $\epsilon$, because if it is large enough, reliable operation is guaranteed for any $\delta_{max}$. In particular, the condition is $t_p + d + \epsilon > \delta_{max}$, or

    $t_p + d > \delta_{max} - \epsilon$.    (27)

Since the interconnect delay $d$ is always positive, (27) is guaranteed for any $t_p$ and $d$ so long as $\epsilon > \delta_{max} - t_p$. Since $\epsilon$ also has the effect of increasing $T$, it is advantageous to choose $\epsilon$ as small as possible; namely, $\epsilon = (\delta_{max} - t_p)$. Referring