Chapter 3
3.1 INTRODUCTION
The total capacitance load Cload at the inverter output consists of the
inverter transistor drain diffusion capacitance, total interconnect capacitance, and
the input gate oxide capacitance of the driven gates which are connected to the
output of the inverter. The switching power is the dominant component in power
dissipation in most digital CMOS circuits. The inverter's average switching
power dissipation can be determined from the energy needed to charge the output
node to VDD and discharge the total output load capacitance to ground. The
generalized expression for a CMOS logic gate's switching power dissipation can
be written as:

$$P_{switching} = \alpha_T \, C_{load} \, V_{DD}^{2} \, f_{CLK} \qquad (3.1)$$

where αT is the switching activity factor of the gate, Cload represents the total load
capacitance, VDD is the supply voltage, and fCLK represents the operating
frequency. Equation (3.1) indicates that the supply voltage is the dominant factor in
switching power dissipation, because the power depends on the square of VDD.
Thus, the most effective method to reduce power dissipation is to reduce the
supply voltage.
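The quadratic dependence on the supply voltage can be checked numerically. The following is a minimal sketch, assuming the usual product form of the switching power (αT · Cload · VDD² · fCLK); the operating point (activity factor, capacitance, frequency, and both voltages) is hypothetical and chosen only for illustration.

```python
def switching_power(alpha_t, c_load, v_dd, f_clk):
    """Average switching power in watts: alpha_T * C_load * V_DD^2 * f_CLK."""
    return alpha_t * c_load * v_dd ** 2 * f_clk


# Hypothetical operating point, not taken from the chapter.
ALPHA_T = 0.1      # switching activity factor
C_LOAD = 50e-15    # total load capacitance: 50 fF
F_CLK = 100e6      # operating frequency: 100 MHz

p_nominal = switching_power(ALPHA_T, C_LOAD, 1.2, F_CLK)  # V_DD = 1.2 V
p_scaled = switching_power(ALPHA_T, C_LOAD, 0.6, F_CLK)   # V_DD halved

ratio = p_nominal / p_scaled
print(f"P at 1.2 V: {p_nominal * 1e6:.3f} uW")
print(f"P at 0.6 V: {p_scaled * 1e6:.3f} uW")
print(f"reduction factor: {ratio:.2f}")
```

Halving VDD reduces the switching power by a factor of four, whereas halving αT, Cload, or fCLK would only halve it.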
When the gates are idle, static power is dissipated because of leakage currents.
Theoretically, CMOS gates should not consume any power in this mode, because
either the pull-down or the pull-up network is turned off, which prevents
dissipation of static power. In reality, however, there is always some leakage
current passing through the transistors, so the CMOS gates consume
a certain amount of power. Although the static power consumption associated
with an individual logic gate is extremely small, the overall effect becomes
significant when tens of millions of gates are used in today's integrated circuits.
Furthermore, as the size of the transistors decreases, the level of doping has to be
increased, causing leakage currents to increase.
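The scaling argument can be made concrete with a small calculation. This is only a sketch: it assumes static power equals leakage current times supply voltage, and the per-gate leakage, supply voltage, and gate count below are hypothetical values, not measurements from this work.

```python
def total_leakage_power(n_gates, i_leak_per_gate, v_dd):
    """Static power in watts for n_gates identical gates (P = N * I_leak * V_DD)."""
    return n_gates * i_leak_per_gate * v_dd


# Hypothetical values chosen only for illustration.
I_LEAK = 1e-9          # 1 nA of leakage per gate
V_DD = 1.0             # 1.0 V supply
N_GATES = 50_000_000   # 50 million gates

per_gate = total_leakage_power(1, I_LEAK, V_DD)
whole_chip = total_leakage_power(N_GATES, I_LEAK, V_DD)

print(f"per gate:   {per_gate * 1e9:.1f} nW")
print(f"whole chip: {whole_chip * 1e3:.1f} mW")
```

A contribution of nanowatts per gate becomes tens of milliwatts at chip level under these assumed numbers.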
During the system and clock design phases, clock enabling signals are
generally implemented by designers, where the interdependencies of the different
functions are well understood. On the other hand, defining such signals at the gate
level is very difficult, especially in control logic, as the interdependencies between
the states of different flip-flops depend on automatically synthesized logic.
information. It can be applied from the system level where the entire functional
unit can be selectively put into sleep mode, or from the combinational/sequential
circuit level, where some parts of the circuit are in sleep mode while the rest of the
block is operating.
However, clock gating does not come for free. To generate the clock enabling
signals, extra logic and interconnections are needed, and consideration must be
given to the resulting area and power overhead. In the extreme case, it is possible
to disable the clock input of each flip-flop separately, resulting in maximum
separation of the clock. However, this leads to a high overhead. Thus, a group of
several flip-flops shares the clock disabling circuit in an attempt to decrease the
overhead.
To decrease dynamic power efficiently, hardware designers need to
understand a variety of clock-gating transformations and have practical
experience in knowing when they should be applied. The trade-off between power
reduction and verification cost is not always clear, so designers tend to be cautious,
leaving power savings on the table. Power has become a primary consideration
during hardware design. Dynamic power can contribute up to 50% of the total
power dissipation. Clock gating is the most common RTL optimization for
reducing dynamic power.
Designers can use a wide range of clock gating techniques. Obviously,
not all of them are equally effective at reducing switching activity.
Many transformations are easy, while others are patented algorithms that are
closely guarded. Most clock gating is performed at the Register Transfer Level
(RTL). It is possible to group RTL clock-gating algorithms into three
categories: system-level, sequential and combinational. System-level clock gating
stops the clock for a whole block, effectively disabling all of its functionality. In
contrast, when the block continues to produce output, combinational and
Since combinational clock-gated flops retain a one-to-one state mapping
with the initial RTL, it is possible to use combinational equivalence checking tools
for functional verification. This makes verification easy to set up and thorough.
On the other hand, since switching activity is only eliminated when the data does
not change, the actual power savings are limited. Combinational clock gating can
decrease dynamic power by around 5-10 percent in typical designs.
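The behaviour of combinational clock gating can be illustrated with a cycle-level model. The sketch below is an assumption-laden illustration, not the circuit used in this work: a single register is clocked only when its data input differs from its stored value, and the function name `run_register` and the input stream are hypothetical.

```python
def run_register(data_stream, gated):
    """Simulate one register; return its output sequence and the number
    of clock edges that actually reach the flip-flop."""
    q = 0
    clock_edges = 0
    outputs = []
    for d in data_stream:
        # With gating, the clock is enabled only when the data would change q.
        enable = (d != q) if gated else True
        if enable:
            clock_edges += 1
            q = d
        outputs.append(q)
    return outputs, clock_edges


stream = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]  # hypothetical input data
out_plain, edges_plain = run_register(stream, gated=False)
out_gated, edges_gated = run_register(stream, gated=True)

print("same outputs:", out_plain == out_gated)
print("clock edges without gating:", edges_plain)
print("clock edges with gating:   ", edges_gated)
```

The register state is identical in every cycle for both versions, which is why combinational equivalence checking applies, and clock activity is only removed in cycles where the data repeats.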
In idle cycles, the flip-flop receiving the derived clock is not activated;
the corresponding dynamic power dissipation is therefore saved.
The excitation function of the flip-flop triggered by the derived clock can
be simplified, since it has a don't-care condition in the cycles when the
derived clock does not trigger the flip-flop.
RTL is the best stage of the design process at which to optimize power. At
this point in the design flow there is still enough flexibility to make significant
changes in power efficiency. Accurate data is available to reflect
the overall impact on power, timing and area as well. What is needed is a good
RTL metric to determine how well a design is clock gated and to help
identify candidate clock gating optimizations within the design.
Latch-free clock gating is shown in Figure 3.5. This is one of the
simplest clock gating methods. It uses a simple AND or OR gate (depending on
the edge on which the flip-flops are triggered) in the sequential circuit to gate the
clock signal to the registers. The AND gate is used for sequential circuits that are
negative-edge triggered, and the OR gate is used for circuits that are positive-edge
triggered. These gates must not change the waveform of the clock signal; they
should simply switch the clock signal on or off as required. This type of clock
gating may lead to setup or hold time violations. The issue is that if the enable
signal goes inactive in the middle of a clock pulse, the gated clock pulse may
terminate early or the gated clock may contain pulses of unequal width. Glitches
may occur in the gated clock if the clock gating is not properly implemented, for
example, when the AND gate is used for circuits operating on the positive edge of
the clock pulse. The EX-OR of the d input and the q output of the flip-flop is used
as the enable signal for the clock gate. When the output of the flop is the same as
its input, which is detected by EX-ORing the two, the clock to the flip-flop can be
gated. For example, in the case of an AND clock gate, the EN signal (represented
here by K1) must be stable during the rising edge of
the clock. Otherwise, the EN signal may cause the clock signal to be corrupted at
the register. Note that if EN only changes when the clock is low, this is not the
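The enable-timing hazard of an AND-type latch-free clock gate can be reproduced with a sampled-waveform model. This is a minimal sketch under stated assumptions: the clock is sampled 8 times per period with a 50% duty cycle, the gated clock is simply `clk AND en` with no gate delay, and the helper `pulse_widths` and both enable waveforms are hypothetical.

```python
def pulse_widths(signal):
    """Return the lengths (in samples) of consecutive runs of 1s."""
    widths, run = [], 0
    for s in signal:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Four clock periods, 8 samples each: high for 4 samples, then low for 4.
clk = ([1] * 4 + [0] * 4) * 4
n = len(clk)

# Enable toggles at samples 14 and 22, both while the clock is low.
en_safe = [1 if (t < 14 or t >= 22) else 0 for t in range(n)]
# Enable toggles at samples 10 and 18, both while the clock is high.
en_unsafe = [1 if (t < 10 or t >= 18) else 0 for t in range(n)]

gated_safe = [c & e for c, e in zip(clk, en_safe)]
gated_unsafe = [c & e for c, e in zip(clk, en_unsafe)]

print("safe enable pulse widths:  ", pulse_widths(gated_safe))
print("unsafe enable pulse widths:", pulse_widths(gated_unsafe))
```

When the enable changes only while the clock is low, every surviving pulse keeps its full width; when it changes while the clock is high, the model produces shortened pulses, which is the kind of corrupted clock that a latch-free gate can deliver to the register.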
A latch is transparent when its input passes directly to its output, and
closed otherwise. Latch-based design can deliver area and power savings over
register-based design at the cost of additional timing verification effort. Latches
are also often able to support a higher operating frequency when driven by a single
global clock. A latch's transparent mode allows a limited amount of delay balancing
between adjacent combinational paths, possibly overcoming the limit imposed
by the longest single path, by which register-based designs are always bound.
If clock skew scheduling is considered, this performance advantage over register-
based design tends to disappear.
Figure 3.7 shows the block diagram used for pipeline design.
The granularity of clock gating is an important consideration for ASIC designers
implementing clock gating. In its simplest form, the enable logic for clock gating
is relatively easy to identify. The effect of clock gating can be multiplied in a
pipelined system. If the inputs to one pipeline stage remain the same, then all
subsequent pipeline stages may also be frozen. Figure 3.7 shows the same clock
gating logic used for the gating of multiple pipeline stages. This is a multi-cycle
optimization with various trade-offs in implementation, and it can save significant
power, typically decreasing switching activity by 15–25%.
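The multiplying effect in a pipeline can be sketched with a cycle-level model. The code below is an illustration under assumptions, not the design of Figure 3.7: each stage is a plain register that copies the previous stage, the enable computed at the first stage travels down the pipeline one stage per cycle, and the function name `simulate_pipeline` and the input sequence are hypothetical.

```python
def simulate_pipeline(inputs, n_stages=3, gated=True):
    """Return (outputs of the last stage, number of register clock events)."""
    regs = [0] * n_stages
    # enable_chain[i] is True when stage i must load in the current cycle.
    enable_chain = [False] * n_stages
    clocked = 0
    outputs = []
    for x in inputs:
        # Stage 0 loads only when the primary input changes; the same
        # enable reaches stage i exactly i cycles later.
        enable_chain = [x != regs[0]] + enable_chain[:-1]
        new_regs = list(regs)
        for i in range(n_stages):
            if enable_chain[i] or not gated:
                new_regs[i] = x if i == 0 else regs[i - 1]
                clocked += 1
        regs = new_regs
        outputs.append(regs[-1])
    return outputs, clocked


data = [5, 5, 5, 5, 7, 7, 7, 7]  # hypothetical input that changes rarely
out_ref, clk_ref = simulate_pipeline(data, gated=False)
out_gated, clk_gated = simulate_pipeline(data, gated=True)

print("outputs match:", out_ref == out_gated)
print("register clock events without gating:", clk_ref)
print("register clock events with gating:   ", clk_gated)
```

One enable decision at the pipeline input suppresses clocking in every downstream stage while the output sequence stays unchanged. The savings in this toy run depend entirely on the chosen input sequence and say nothing about the figures reported for real designs.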
Figure 3.9 shows the pipeline design with the C17 benchmark circuit. This
circuit consumes more power and its LUT utilization is high; the
explanation is provided in Section 3.7 with Figure 3.12.
(a) RTL schematic and Power summary of Latch free clock gating
Figure 3.10(b) shows the simulation result of Latch free clock gating.
Figure 3.10(c) shows the FPGA Floorplan Layout result of Latch free
clock gating.
From Figure 3.10(d) it is inferred that the number of LUTs used is
1, the number of flip-flops/latches used is 1, and the I/O count is 1.
(a) RTL schematic and power summary of Latch based clock gating
Figure 3.11(b) shows the simulation result for Latch based clock
gating. Here a level sensitive latch is added as a bit storage element.
Figure 3.11(c) shows the FPGA Floorplan Layout result of Latch based
clock gating.
From Figure 3.11(d) it is inferred that the number of LUTs used is
2, the number of flip-flops/latches used is 2, and the I/O count is 3.
Figure 3.12(b) shows the simulation result for pipeline circuit which is
based on C17 circuit.
Figure 3.12(c) shows the FPGA floorplan layout result of the pipeline circuit.
Figure 3.12(d) shows the estimated resource utilization report of the pipeline circuit.
From Figure 3.12(d) it is inferred that the number of LUTs used is
7, the number of flip-flops/latches used is 4, and the I/O count is 11.
and hence, for the latch-free design, the estimated dynamic power was 0.265 W, the
estimated static power was 0.245 W, and the estimated total on-chip power was 0.51 W.
From Figure 3.13(d) it is inferred that the number of LUTs used is
0, the number of flip-flops/latches used is 1, and the I/O count is 3.
Table 3.1 compares the power and the number of LUTs for
latch-free clock gating, latch-based clock gating, the pipeline circuit, and
the design without clock gating.
From Table 3.1, it is inferred that the pipeline circuit consumes more
power, and the number of LUTs used for the pipeline circuit is also high compared
to the other gating techniques. Because no extra logic is involved in the design
without clock gating, it consumes comparatively less power and area than the
designs with clock gating techniques.