Low Power Clock Distribution - II
• Tolerable skews are the maximum values of clock skew between each
pair of clock sinks with which the system can function correctly at the
desired frequency.
Derivation of Tolerable Skew
• Setup time is defined as the minimum amount
of time before the clock's active edge that the data
must be stable for it to be latched correctly. ... Hold
time is defined as the minimum amount of time after
the clock's active edge during which data must be
stable.
• The setup time is the interval before the clock edge during
which the data must be held stable. The hold time is the
interval after the clock edge during which the data must be held
stable. Hold time can be negative, which means the
data can change slightly before the clock edge and still
be properly captured.
• Figure 5.9 illustrates two cases of correct synchronous operations with
tolerable skews. In Figure 5.9(a), the clock arrives at CO2 later than the
previous
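The setup and hold definitions above bound the skew that one launch/capture pair of clock sinks can tolerate. The sketch below is a minimal illustration, not the derivation from the original slides. It assumes skew is measured as the clock arrival at the capturing sink minus the arrival at the launching sink, and the names `PathTiming`, `t_cq`, `d_min` and `d_max` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathTiming:
    """Timing of one register-to-register path (all values in ns, assumed)."""
    t_cq: float     # clock-to-output delay of the launching element
    d_min: float    # shortest combinational delay on the path
    d_max: float    # longest combinational delay on the path
    t_setup: float  # setup time of the capturing element
    t_hold: float   # hold time of the capturing element


def tolerable_skew_range(path: PathTiming, period: float) -> tuple[float, float]:
    """Return (lowest, highest) skew for which the path still works.

    Skew is taken as capture-clock arrival minus launch-clock arrival.
    Setup: t_cq + d_max + t_setup <= period + skew
    Hold:  t_cq + d_min >= t_hold + skew
    """
    lowest = path.t_cq + path.d_max + path.t_setup - period
    highest = path.t_cq + path.d_min - path.t_hold
    if lowest > highest:
        raise ValueError("no skew satisfies both setup and hold at this period")
    return lowest, highest


example = PathTiming(t_cq=0.5, d_min=1.0, d_max=6.0, t_setup=0.5, t_hold=0.25)
print(tolerable_skew_range(example, period=10.0))  # (-3.0, 1.25)
```

Under these assumptions, any skew inside the returned interval keeps the pair working at the chosen period, which is the slack a low power clock tree can exploit.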
A Two-level Clock Distribution Scheme
• The tolerable skew between each pair of clock sinks is
well defined.
• In designing a low power system, tolerable skew instead
of minimum skew should be used during clock tree
construction.
• Placing the clock tree on a single metal layer reduces the delay
and attenuation caused by vias and decreases the
sensitivity to process-induced wire or via variations.
• The clock wiring capacitance is also substantially
reduced.
• However, it is not always practical to embed the entire
clock tree on a single layer.
A two-level clock distribution scheme:
• Tolerable skew differs from one pair of clock sinks to another
as logic path delays vary from one combinational block to
another. The clock sinks that are close to each other and have
very small tolerable skews among them are grouped into
clusters.
• A global level clock tree connects the clock source to the
clusters and is routed on a single layer with the smallest RC
parameters by a planar routing algorithm.
• For clock sinks that are located close to each other, tolerable
skews among them can be easily satisfied.
• Little savings within a local cluster can be gained if the sinks
within the cluster have large tolerable skews.
• Local trees may be routed on multiple layers since the
total wiring capacitance inside each cluster is very
small and has less impact on total power.
• The tolerable skews between two clusters can be
determined from the smallest tolerable skew
between a clock sink in one cluster and a clock sink in
the other cluster.
• During clustering, the tolerable skews are maximized
between clusters.
• This will give the global level clock tree construction
more opportunity to reduce wire length, save buffer
sizes, and reduce power consumption since the global
level clock tree has much more impact on power.
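The rule for the tolerable skew between two clusters can be sketched as a minimum over cross-cluster sink pairs. This is an illustrative sketch only: the sink names, the `pair_skew` table and the choice to treat pairs without a table entry as unconstrained are assumptions, not details from the source.

```python
from itertools import product


def cluster_tolerable_skew(cluster_a, cluster_b, pair_skew):
    """Smallest tolerable skew between any sink in cluster_a and any in cluster_b.

    pair_skew maps frozenset({sink_1, sink_2}) to that pair's tolerable skew.
    Pairs missing from the table are assumed to have no timing constraint.
    Returns None when no cross-cluster pair is constrained.
    """
    values = [
        pair_skew[frozenset((a, b))]
        for a, b in product(cluster_a, cluster_b)
        if frozenset((a, b)) in pair_skew
    ]
    return min(values) if values else None


# Hypothetical tolerable skews in ns between individual clock sinks.
pair_skew = {
    frozenset(("s1", "s3")): 0.4,
    frozenset(("s2", "s3")): 0.25,
    frozenset(("s2", "s4")): 0.9,
}
print(cluster_tolerable_skew(["s1", "s2"], ["s3", "s4"], pair_skew))  # 0.25
```

A clustering step that aims to maximize inter-cluster tolerable skew could evaluate this value for each candidate grouping.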
Power reduction in two level clock distribution
• 1. Clock gating
• 2. Reduced swing clock
• 3. Oscillator circuit for clock generation
• 4. Frequency division and multiplication
• 5. Reduce the capacitance of the clock signal
1. Clock Gating
• Clock gating, as depicted in Figure 6.1, is the most popular
method for power reduction of clock signals. When the clock
signal of a functional module (ALUs, memories, FPUs, etc.) is
not required for some extended period, a gating function
(typically a NAND or NOR gate) is used to turn off the clock
feeding the module.
• The gating signal should be enabled and disabled at a much
slower rate than the clock frequency. Otherwise the
power required to drive the enable signal may outweigh the
power saving. Clock gating saves power by reducing
unnecessary clock activity inside the gated module.
The masking gate simply replaces one of the buffers in the clock
distribution tree. If the gating signal appears in a critical delay
path and degrades the overall speed, the designer can always
choose not to gate a particular module.
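The trade-off between gating savings and the cost of driving the enable signal can be illustrated with a first-order estimate. The model below is an assumption made for illustration: it applies the P = C·V²·f relation to both the gated clock load and the enable net, and all capacitances, rates and the idle fraction are hypothetical numbers.

```python
def switching_power(cap_farads, volts, freq_hz):
    """Switching power from the first-order relation P = C * V^2 * f."""
    return cap_farads * volts ** 2 * freq_hz


def net_gating_saving(c_clk, c_enable, volts, f_clk, f_enable, idle_fraction):
    """Clock power saved while the module idles, minus enable-signal power.

    Assumed model: the gated clock load c_clk stops switching for
    idle_fraction of the time, and the enable net c_enable toggles at
    f_enable all the time.
    """
    saved = idle_fraction * switching_power(c_clk, volts, f_clk)
    overhead = switching_power(c_enable, volts, f_enable)
    return saved - overhead


# Enable toggles far slower than the clock: gating pays off.
slow = net_gating_saving(2e-12, 1e-12, 5.0, 100e6, 1e3, 0.6)
# Enable toggles at the clock rate on a larger net: gating costs power.
fast = net_gating_saving(2e-12, 5e-12, 5.0, 100e6, 100e6, 0.6)
print(slow > 0, fast < 0)  # True True
```

In this simplified model the saving only stays positive while the enable signal switches much more slowly than the clock, which matches the guideline above.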
• Clock gating can significantly reduce the switching activity
in a circuit and on the clock nets; thus, it has been viewed
as one of the most effective logic, RTL and architectural
approaches to dynamic power minimization.
• Complex algorithms have been devised for calculating the
idle conditions of a circuit and for automatically inserting
the clock gating logic into the netlist.
• Side effects of the clock-gating paradigm, such as its
impact on circuit testability, have been explored in detail,
making this technology very mature from the
industrial standpoint as well.
• As of today, most commercial EDA tools for power-driven
synthesis feature automatic clock-gating capabilities at
different levels of design abstraction.
2. Reduced Swing Clock
• In the P = CV²f equation, the most attractive parameter to attack is the
voltage swing V due to its quadratic effect. Generally, it is difficult to
reduce the load capacitance or frequency of clock signals for
obvious performance reasons.
• In CMOS design, a clock signal is only connected to the gate of a
transistor when it reaches a sequential element. The clock signal is
seldom connected to the source or drain of a transistor. Inside a
sequential cell, the clock signal is used to turn transistors on or
off.
• Consider a 5V digital CMOS chip with an N-transistor threshold
voltage of 0.8V. For a regular 5V full swing clock signal, an N-
transistor gated by the clock will turn on if the clock signal is above
0.8V.
• If the swing of the N-transistor clock signal is limited to the range
from zero to 2.5V (half swing), the on-off characteristics of all N
transistors remain digitally identical.
A similar observation can be made for the clock signal feeding a P-
transistor, where the swing is limited to the range from 2.5V to 5V.
The power saved from the reduced swing is 75% on the clock
signal, since power scales with V² and (2.5/5)² = 0.25. The penalty
incurred is the reduced speed of the sequential
elements. The sequential delay, expressed in propagation delay
and setup and hold times, is approximately doubled.
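The 75% figure follows directly from the quadratic dependence of power on voltage swing. A minimal sketch of that arithmetic, assuming the load capacitance and frequency are unchanged (the function name is hypothetical):

```python
def swing_power_saving(full_swing_volts, reduced_swing_volts):
    """Fraction of clock power saved when only the swing V changes.

    Assumes P is proportional to V^2 with capacitance and frequency fixed.
    """
    return 1.0 - (reduced_swing_volts / full_swing_volts) ** 2


print(swing_power_saving(5.0, 2.5))  # 0.75
```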
3. Oscillator Circuit for Clock Generation
• Clocks below 50MHz are generated using crystal oscillators.
• Crystal oscillators can easily reach tens of MHz.
Above that, in most cases a PLL (Phase Locked Loop) is used.
• In a PLL, the frequency of a high-frequency oscillator is divided
by a suitable factor (dividing a signal by a power of 2 is easy
and exact), and then compared to a reference, for example a 10
MHz crystal oscillator. The comparison is used to adjust the high-
frequency oscillator. Thus a high frequency is generated with
(almost) the accuracy of the lower frequency crystal
oscillator.
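The divide, compare and adjust loop can be illustrated with a toy numerical model. This is a heavily simplified sketch: a real PLL compares phase through a phase detector and loop filter, whereas the model below compares frequencies directly, and the `gain` and `steps` parameters are assumptions chosen only so the loop settles.

```python
def lock_oscillator(f_ref_hz, divider, f_start_hz, gain=0.5, steps=200):
    """Steer a high-frequency oscillator until f_osc / divider matches f_ref.

    Toy frequency-domain model of the feedback loop, not a circuit model.
    """
    f_osc = f_start_hz
    for _ in range(steps):
        # Compare the divided-down oscillator with the reference.
        error_hz = f_ref_hz - f_osc / divider
        # Use the comparison to adjust the high-frequency oscillator.
        f_osc += gain * divider * error_hz
    return f_osc


# A 10 MHz reference with a divide-by-16 feedback path settles near 160 MHz.
f_out = lock_oscillator(f_ref_hz=10e6, divider=16, f_start_hz=100e6)
print(round(f_out / 1e6, 3))  # 160.0
```

With a divide-by-16 feedback path the output settles at 16 times the reference, so its accuracy is inherited from the crystal.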
4. Frequency Division and Multiplication
• Another power reduction scheme that has been successfully applied is
frequency division and multiplication, shown in Figure 6.6. This is
especially common for off-chip clock signals because they drive
very large capacitance.
• The off-chip clock signal runs at a slower speed and an on-chip
phase-locked loop circuit is used to multiply the frequency to
the desired rate.
• The slower signal also eases the off-chip signal distribution in
terms of electromagnetic interference and reliability.
• The frequency multiplier N is a trade-off between power
dissipation and the phase-locked loop circuit complexity.
• Larger values of N reduce power dissipation but increase
the design complexity and the performance required of the
phase-locked loop circuit.
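The power side of this trade-off can be sketched with the same P = C·V²·f relation: the off-chip net runs at the on-chip frequency divided by N. The capacitance, voltage and frequency values below are hypothetical, and the added power of the on-chip phase-locked loop is deliberately not modeled.

```python
def offchip_clock_power(c_offchip_farads, volts, f_core_hz, multiplier):
    """Power spent driving the off-chip clock when the chip multiplies it by N.

    Assumes P = C * V^2 * f with the off-chip clock running at f_core / N.
    The power of the on-chip PLL itself is not included.
    """
    if multiplier < 1:
        raise ValueError("multiplier N must be at least 1")
    return c_offchip_farads * volts ** 2 * (f_core_hz / multiplier)


# Hypothetical 50 pF off-chip load, 5 V swing, 100 MHz on-chip clock.
for n in (1, 2, 4, 8):
    power_mw = offchip_clock_power(50e-12, 5.0, 100e6, n) * 1e3
    print(f"N={n}: {power_mw:.3f} mW")
```

Under these assumptions, doubling N halves the off-chip clock power, while the complexity cost of the loop has to be weighed separately.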
5. Reduce the Capacitance of the Clock Signal