CHAPTER 3
CLOCK GATING LOGIC FOR LOGIC CIRCUITS
3.1 INTRODUCTION
In today's semiconductor designs, low power consumption is essential: in mobile
and handheld applications it extends battery life, and even in networking and
storage devices it helps meet low-carbon-footprint requirements. In CMOS
digital circuits, power dissipation is classified into two types: peak power and
time-averaged power. Peak power is a reliability issue that determines both the
lifetime and the performance of the chip. Excessive instantaneous current
flowing through the resistive power network causes a voltage drop, which
degrades a design's performance by increasing gate and interconnect delays.
High power consumption also causes the system to overheat, which reduces the
circuit's reliability and lifespan. Noise margins are reduced as well,
increasing the risk of chip failure due to crosstalk. In conventional CMOS
digital circuits, time-averaged power consumption takes two forms: dynamic and
static. An overview of the different types of power dissipation is given in
Figure 3.1.
Figure 3.1 Different power dissipation types in CMOS circuits

Dynamic power dissipation occurs in logic gates that are switching from one
state to another. Any internal and external capacitance associated with the
transistors of the gate must be charged or discharged during this process, which
consumes power. Static power dissipation is associated with inactive logic
gates. Dynamic power matters most during normal operation, particularly at high
operating frequencies, while static power matters most during standby,
particularly for battery-powered devices.
Dynamic Power Dissipation
Dynamic power, caused mainly by the current that charges and discharges
parasitic capacitance, consists of three components: switching power,
short-circuit power, and glitching power. Switching power in digital CMOS
circuits is dissipated as current is drawn from the power supply to charge the
capacitance of the output node. During this switching process the output node
voltage usually makes a complete transition from 0 to VDD, and one-half of the
energy drawn from the power supply is dissipated as heat in the pMOS
transistors. The other half is stored in the output capacitance and is
dissipated as heat in the conducting nMOS transistors when the output voltage
switches from VDD back to 0. The CMOS inverter shown in Figure 3.2 illustrates
this dynamic power dissipation during switching.
Figure 3.2 CMOS inverter for switching

The total load capacitance $C_{load}$ at the inverter output consists of the
drain diffusion capacitance of the inverter transistors, the total interconnect
capacitance, and the input gate oxide capacitance of the driven gates that are
connected to the output of the inverter. Switching power is the dominant
component of power dissipation in most digital CMOS circuits. The average
switching power dissipation of the inverter can be determined from the energy
needed to charge the output node to $V_{DD}$ and to discharge the total output
load capacitance to ground. The generalized expression for the switching power
dissipation of a CMOS logic gate can be written as:
$$P_{avg} = \alpha_T \cdot C_{load} \cdot V_{DD}^{2} \cdot f_{CLK} \qquad (3.1)$$
where $\alpha_T$ is the switching activity factor of the gate, $C_{load}$ is the
total load capacitance, $V_{DD}$ is the supply voltage, and $f_{CLK}$ is the
operating frequency. Equation (3.1) indicates that the supply voltage is the
dominant factor in switching power dissipation, because it is the only term
that enters quadratically. Thus, the most effective way to reduce switching
power dissipation is to reduce the supply voltage.
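As a small numerical illustration of Equation (3.1), the sketch below evaluates the expression in Python. The operating point (activity factor, load capacitance, supply voltage and frequency) is hypothetical and chosen only to make the scaling visible; it is not taken from the designs in this chapter.

```python
def switching_power(alpha_t, c_load, v_dd, f_clk):
    """Average switching power of a gate per Equation (3.1), in watts."""
    return alpha_t * c_load * v_dd ** 2 * f_clk


# Hypothetical operating point (assumed values, not measured data):
# 10% switching activity, 10 fF load, 1.0 V supply, 100 MHz clock.
baseline = switching_power(alpha_t=0.1, c_load=10e-15, v_dd=1.0, f_clk=100e6)

# Halving only the supply voltage, then halving only the clock frequency.
half_vdd = switching_power(alpha_t=0.1, c_load=10e-15, v_dd=0.5, f_clk=100e6)
half_freq = switching_power(alpha_t=0.1, c_load=10e-15, v_dd=1.0, f_clk=50e6)

print(f"baseline       : {baseline:.3e} W")
print(f"half V_DD      : {half_vdd:.3e} W ({half_vdd / baseline:.2f}x)")
print(f"half frequency : {half_freq:.3e} W ({half_freq / baseline:.2f}x)")
```

Under these assumed values, halving the supply voltage reduces the switching power to one quarter, while halving the frequency only reduces it to one half, which is the quadratic dependence that the text refers to.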
Static Power Dissipation
Static power is caused by leakage currents that flow while the gates are idle.
In theory, CMOS gates should not consume any power in this mode, because either
the pull-down or the pull-up network is turned off, which prevents static power
dissipation. In reality, however, some leakage current always passes through
the transistors, so CMOS gates do consume a certain amount of power. Although
the static power consumed by an individual logic gate is extremely small, the
overall effect becomes significant when tens of millions of gates are used, as
in today's integrated circuits. Furthermore, as transistor sizes shrink, the
doping level has to be increased, which causes leakage currents to increase.
Clock power can account for 60-70% of the total chip power, so reducing it is
essential. Clock gating is a key power reduction method that is used by many
designers and is typically implemented at the gate level by power synthesis
tools.
Clock gating works by delivering the clock signal selectively to sequential
(synchronous) circuits, such as those found in computer processors. It is
typically implemented with integrated clock gating cells. It prunes the clock
tree so that parts of the circuitry that are not in use are not clocked; the
flip-flops in those parts then do not switch state, and the power that this
switching would otherwise consume is saved. Because clock gating logic replaces
the multiplexers that would otherwise hold the register values, it can also
reduce die area.
Several methods have been developed to decrease dynamic power, of which clock
gating is the predominant one. Ordinarily, when a logic unit is clocked, its
underlying sequential elements receive the clock signal regardless of whether
or not they will toggle in the next cycle.
Clock enable signals are generally introduced by designers during the system
and clock design phases, where the interdependencies of the different functions
are well understood. Defining such signals at the gate level, on the other hand,
is very difficult, especially in control logic, because the interdependencies
between the states of different flip-flops depend on automatically synthesized
logic.
There is a large gap between the block disabling that is driven by the HDL
definitions and what can be achieved with knowledge of the flip-flops'
activities and how they are related to each other.
The clock gating method is designed to prevent unnecessary power consumption,
such as the power wasted by clocked components while the system is idle. Clock
gating means disabling the clock signal, especially for flip-flops, when the
input data would not change the stored data. It can be applied at the system
level, where an entire functional unit can be selectively put into sleep mode,
or at the combinational/sequential circuit level, where some parts of a block
are in sleep mode while the rest of the block is operating.
However, clock gating does not come for free. Extra logic and interconnect are
needed to generate the clock enable signals, and the resulting area and power
overhead must be taken into account. In the extreme case, the clock input of
each flip-flop can be disabled individually, which gives the finest possible
gating granularity but also leads to a high overhead. To reduce this overhead,
a group of several flip-flops usually shares one clock disabling circuit.
To decrease dynamic power efficiently, hardware designers need to understand a
variety of clock-gating transformations and have practical experience in
knowing when to apply them. The trade-off between power reduction and
verification cost is not always clear, so designers tend to be cautious and
leave power savings on the table. Power has become a primary consideration in
hardware design. Dynamic power can contribute up to 50% of the total power
dissipation, and clock gating is the most common RTL optimization for reducing
it.
Designers can choose from a wide range of clock gating techniques, but not all
of them are equally effective at reducing switching activity. Some
transformations are simple, while others are proprietary algorithms that are
closely guarded. Most clock gating is performed at the Register Transfer Level
(RTL). RTL clock-gating algorithms can be grouped into three categories:
system-level, sequential and combinational. System-level clock gating stops the
clock for a whole block, effectively disabling all of its functionality. In
contrast, combinational and sequential clock gating selectively suspend
clocking while the block continues to produce output. The main objective of
this work is to reduce power and to minimize delay in the circuit.
3.2 COMBINATIONAL CLOCK GATING
Combinational clock gating is a straightforward substitution in the RTL code.
It reduces power by disabling the clock on registers when their output does not
change. Opportunities for combinational clock gating can be found by searching
for conditional assignments in the code: when code such as
"if (cond) out <= in" is present, clock-gating logic is substituted, as shown
in Figure 3.3. Combinational clock gating is now a standard feature of RTL
compilers. Power-aware synthesis tools recognize the relevant RTL coding
patterns and build the suitable alternative. Hardware developers only need to
follow a few simple RTL coding guidelines to obtain the benefits of
combinational clock gating.
Figure 3.3 Combinational clock gating
Since combinational clock-gated flops retain a one-to-one state mapping with
the original RTL, combinational equivalence checking tools can be used for
functional verification. This makes verification easy to set up and thorough.
On the other hand, since switching activity is eliminated only when the data
does not change, the actual power savings are limited. Combinational clock
gating can decrease dynamic power by around 5-10 percent in typical designs.
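The substitution described in this section can be illustrated with a small behavioral model. The Python sketch below is not HDL and not the output of any synthesis tool; it is a cycle-level model, with made-up stimulus, that compares a register written in the "if (cond) out <= in" style against the same register with a gated clock. The function names are hypothetical.

```python
def run_enable_register(stimulus, reset_value=0):
    """Ungated style: the flop is clocked every cycle and a feedback mux
    keeps the old value when cond is 0 (the pattern `if (cond) out <= in`)."""
    out, trace, clock_edges = reset_value, [], 0
    for cond, data in stimulus:
        clock_edges += 1                 # the clock reaches the flop every cycle
        out = data if cond else out      # mux selects new data or the old value
        trace.append(out)
    return trace, clock_edges


def run_gated_register(stimulus, reset_value=0):
    """Gated style: the flop only receives a clock edge when cond is 1."""
    out, trace, clock_edges = reset_value, [], 0
    for cond, data in stimulus:
        if cond:                         # the gated clock pulses only when enabled
            clock_edges += 1
            out = data
        trace.append(out)
    return trace, clock_edges


# Made-up stimulus: one (cond, in) pair per clock cycle.
stimulus = [(1, 5), (0, 7), (0, 9), (1, 3), (0, 4), (0, 8)]

ungated_trace, ungated_edges = run_enable_register(stimulus)
gated_trace, gated_edges = run_gated_register(stimulus)

print("same outputs:", ungated_trace == gated_trace)
print("clock edges at the flop:", ungated_edges, "ungated vs", gated_edges, "gated")
```

Both versions produce the same output in every cycle, which is the one-to-one state mapping that makes combinational equivalence checking applicable, while the gated version delivers a clock edge to the flop only in the cycles where cond is 1.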
3.3 SEQUENTIAL CLOCK GATING
Sequential clock gating changes the micro-architecture of the RTL without
affecting the functionality of the design. Power is optimized by identifying
unused computations, data-dependent functions and don't-care cycles in the
original code. There are many types of sequential clock-gating transformations.
Opportunities for sequential clock gating are hard to identify and require
sequential analysis. One example of a sequential optimization is to turn off
subsequent stages of a pipeline based on a propagated valid condition. Because
of the extra logic involved, this transformation makes sense only if the data
path is several bits wide. Sequential clock gating is shown in Figure 3.4.
Figure 3.4 Sequential clock gating

Sequential clock gating is a multi-cycle optimization with numerous
implementation trade-offs and RTL modifications. As a result, it places a
higher demand on functional verification resources. On the other hand,
sequential clock gating can save considerable power, typically reducing
switching activity by 15-25%.
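The pipeline example mentioned above (turning off later stages based on a propagated valid condition) can be sketched as a cycle-level Python model. This is an illustration under stated assumptions, not the circuit of Figure 3.4: the two-stage structure, the "+ 1" stage-2 computation, the function name and the stimulus are all hypothetical.

```python
def run_pipeline(stimulus, gate_stage2):
    """Two-stage pipeline; each stimulus item is (valid, data) for one cycle.

    Stage 1 is always clocked. When gate_stage2 is True, the stage-2 data
    register is clocked only if stage 1 currently holds valid data.
    """
    s1_data, s1_valid = 0, 0
    s2_data, s2_valid = 0, 0
    s2_clock_edges = 0
    valid_outputs = []
    for valid_in, data_in in stimulus + [(0, 0)] * 2:   # two flush cycles
        # Next state is computed from the current state of all registers.
        if gate_stage2 and not s1_valid:
            next_s2_data = s2_data            # clock gated: register holds
        else:
            next_s2_data = s1_data + 1        # hypothetical stage-2 logic
            s2_clock_edges += 1
        next_s2_valid = s1_valid              # valid bit travels with the data
        s1_data, s1_valid = data_in, valid_in
        s2_data, s2_valid = next_s2_data, next_s2_valid
        if s2_valid:
            valid_outputs.append(s2_data)
    return valid_outputs, s2_clock_edges


# Made-up stimulus: only two of the five input cycles carry valid data.
stimulus = [(1, 10), (0, 99), (0, 98), (1, 20), (0, 97)]

ungated_out, ungated_edges = run_pipeline(stimulus, gate_stage2=False)
gated_out, gated_edges = run_pipeline(stimulus, gate_stage2=True)

print("valid outputs:", ungated_out, gated_out)
print("stage-2 clock edges:", ungated_edges, "ungated vs", gated_edges, "gated")
```

In this model the values observed while the output is valid are identical in both versions, but the contents of the stage-2 register differ in the invalid cycles. That difference in internal state is why sequential equivalence checking, rather than combinational equivalence checking, is needed to verify this kind of transformation.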
Because sequential optimizations alter the design state, combinational
equivalence checking tools cannot be used to verify them. Sequential
equivalence checking (SEC) does not have this limitation: SEC tools can
thoroughly check sequential changes to the RTL such as clock gating.
Sequential circuits are considered significant contributors to the power
dissipation of a system, since one of their inputs is the clock, which is the
only signal that switches all the time. In addition, the clock signal tends to
be heavily loaded. A clock network (often a clock tree) with clock buffers has
to be built to distribute the clock and control the clock skew, and all of this
adds to the capacitance of the clock net. Recent studies show that clock
signals in digital computers consume a large percentage (15-45 percent) of the
system power. The circuit power can therefore be decreased significantly by
decreasing the clock power dissipation.
Most clock power reduction efforts have focused on issues such as reduced
voltage swings, clock routing and buffer insertion. In many cases, however,
switching the clock causes a lot of unnecessary gate activity. Circuits with
controllable clocks are therefore being developed. In such circuits, other
clocks are derived from the master clock and, depending on certain conditions,
can be slowed down or stopped completely with respect to the master clock.
This scheme results in power savings for the following reasons:
• The load on the master clock is reduced and fewer buffers are required in
the clock tree, so the power dissipation of the clock tree can be reduced.
• In idle cycles, the flip-flop receiving the derived clock is not triggered,
so the corresponding dynamic power dissipation is saved.
• The excitation function of a flip-flop triggered by the derived clock can be
simplified, since it has a don't-care condition in the cycles when the derived
clock does not trigger the flip-flop.
3.4 SYSTEM LEVEL CLOCK GATING
Clock gating at the system level is designed into the original hardware
architecture and coded as part of the RTL functionality. For example, the sleep
modes of a cell phone may strategically disable the display, keyboard or radio
depending on the present operating mode of the device. System-level clock
gating shuts off entire RTL blocks. Because large sections of logic do not
switch for many cycles, it has the greatest potential to save power. On the
other hand, these changes are part of the function of the design. The enable
logic is part of an overall power management strategy and sometimes involves
software control. Verification of system-level power optimizations must
therefore be included in the system-level test plan.
Most hardware engineers know how to write RTL so that synthesis tools can
identify and automate combinational clock gating. Likewise, hardware architects
recognize and build in opportunities for system-level clock gating. Even with
these optimizations in place, significant dynamic power savings remain
available in the RTL for designers who understand the cost/reward trade-offs of
sequential clock gating.
3.5 CLOCK GATING EFFICIENCY
RTL is the best stage of the design process at which to optimize power. At
this stage of the design flow there is still enough flexibility to make changes
that significantly affect power, and sufficiently accurate data is available to
reflect the overall impact on power, timing and area. What is needed is a good
RTL metric that determines how well a design is clock gated and helps to
identify candidate clock gating optimizations within the design.
A typical metric used to evaluate clock gating is the number of registers in
the design that are clock gated. While this gives designers an indication of
the number of clock-gated registers in the overall design, it correlates poorly
with actual power savings, because dynamic power consumption depends on the
toggle rate. Clock-gating efficiency, in contrast, takes the toggle rate into
account, making it a more telling indicator of actual dynamic power
consumption.
The clock-gating efficiency is defined as the percentage of time that a
register is gated for a given stimulus or switching activity. The average
clock-gating efficiency can be calculated as the average of the efficiencies of
all registers under a representative switching activity. There are several
considerations in the implementation of clock gating. The enable signal should
remain stable while the clock is high and may switch only while the clock is in
its low phase. The gated clock should be turned on in time to guarantee correct
operation of the logic that follows it, and glitches on the gated clock should
be avoided.
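The difference between counting gated registers and measuring clock-gating efficiency can be made concrete with a short Python sketch. The register names and per-cycle enable traces below are hypothetical, and the function names are introduced only for this illustration; a real flow would take the traces from simulation activity data.

```python
def clock_gating_efficiency(enable_trace):
    """Fraction of cycles in which a register's clock is gated off (enable = 0)."""
    gated_cycles = sum(1 for enable in enable_trace if not enable)
    return gated_cycles / len(enable_trace)


def average_efficiency(enable_traces):
    """Average of the per-register efficiencies over all registers."""
    values = [clock_gating_efficiency(trace) for trace in enable_traces.values()]
    return sum(values) / len(values)


# Hypothetical enable traces (1 = clock passed, 0 = clock gated), one per register.
# All three registers have clock gating logic, so the simple "number of gated
# registers" metric would report 3 out of 3.
traces = {
    "reg_a": [1, 0, 0, 0],   # gated in 3 of 4 cycles
    "reg_b": [1, 1, 0, 0],   # gated in 2 of 4 cycles
    "reg_c": [1, 1, 1, 1],   # has a gating cell but is never actually gated
}

per_register = {name: clock_gating_efficiency(t) for name, t in traces.items()}
average = average_efficiency(traces)

print(per_register)
print(f"average clock-gating efficiency: {average:.1%}")
```

With these made-up traces every register counts as clock gated, yet the average efficiency is well below 100%, because one register's enable is always active. This is the gap between the two metrics that the text describes.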
3.6 SIMPLE SEQUENTIAL CIRCUITS
3.6.1 Latch Free Design
Depending on the edge on which the flip-flops are triggered, latch-free clock
gating uses a simple AND or OR gate. If the enable signal goes inactive in the
middle of a clock pulse, or if it toggles several times, the gated clock output
can either terminate prematurely or generate several clock pulses. This
restriction makes the latch-free clock gating style inappropriate for our
single-clock flip-flop based design.
Figure 3.5 Latch free clock gating
Latch-free clock gating is shown in Figure 3.5. It is one of the simplest clock
gating methods. It uses a simple AND or OR gate in the sequential circuit to
gate the clock signal: the AND gate is used for sequential circuits triggered
on the negative clock edge, and the OR gate for circuits triggered on the
positive clock edge. These gates must not change the waveform of the clock
signal; they should simply switch the clock signal on or off as required. This
type of clock gating may lead to setup or hold time violations. The issue is
that if the enable signal goes inactive in the middle of a clock pulse, the
gated clock output may terminate early or have pulses of unequal width.
Glitches may occur on the gated clock if the clock gating is not implemented
properly, for example when an AND gate is used for circuits operating on the
positive edge of the clock. In the latch-free style, a basic AND or OR gate
(depending on the edge on which the flip-flops are triggered) passes the clock
signal to the registers. Here, the output of an EX-OR gate between the d input
and the q output of the flip-flop is used as the enable signal for the clock
gate. When the output of the flop is the same as its input, which is detected
by EX-ORing the two, the clock to the flip-flop can be gated. In the case of an
AND clock gate, the EN signal (here the enable signal EN is represented by K1)
must be stable from the rising edge of the clock and throughout its high phase.
Otherwise, the EN signal may corrupt the clock signal seen by the register.
This cannot happen if EN changes only while the clock is low. In practice, for
positive edge-triggered registers, an OR gate with EN negated is used for
latch-free clock gating. If the EN computation finishes while the clock is
high, the correct behavior is kept free of glitches. This means that the
combinational delay of EN should be less than half a clock cycle.
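The glitch hazard of the AND-based latch-free style can be illustrated with a sampled-waveform model in Python. This is a sketch with made-up waveforms, not a timing-accurate simulation: each clock cycle is represented by four samples (two high, two low), and the signal names K1 (enable) and K2 (gated clock) follow the wire names used later in this chapter. The helper names are hypothetical.

```python
def and_gate_clock(clk, k1):
    """Latch-free AND gating: K2 = K1 & clk, evaluated sample by sample."""
    return [c & e for c, e in zip(clk, k1)]


def pulse_widths(wave):
    """Lengths, in samples, of each run of 1s in a sampled waveform."""
    widths, run = [], 0
    for sample in wave:
        if sample:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Three clock cycles, four samples each: high, high, low, low.
clk = [1, 1, 0, 0] * 3

# Enable changes only while the clock is low.
k1_safe = [1, 1, 0, 0,  0, 0, 1, 1,  1, 1, 1, 1]
# Enable falls in the middle of the first high phase and rises in the
# middle of the second high phase.
k1_unsafe = [1, 0, 0, 0,  0, 1, 1, 1,  1, 1, 1, 1]

safe_widths = pulse_widths(and_gate_clock(clk, k1_safe))
unsafe_widths = pulse_widths(and_gate_clock(clk, k1_unsafe))

print("pulse widths, enable changes while clk is low :", safe_widths)
print("pulse widths, enable changes while clk is high:", unsafe_widths)
```

When the enable changes only during the low phase, every gated clock pulse keeps the full width of two samples. When it changes during the high phase, the model produces pulses of width one, which correspond to the truncated or unequal-width pulses described above.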
3.6.2 Latch based Design
The latch-based clock gating style adds a level-sensitive latch to the design
to hold the enable signal from the active edge of the clock until its inactive
edge, so that the surrounding circuit no longer has to satisfy that requirement
itself. Since the latch captures the state of the enable signal and holds it
until the complete clock pulse has been generated, the enable signal only needs
to be stable around the rising edge of the clock, as in a traditional ungated
design.
The latch-based clock gating style, consisting of a latch and an AND gate, can
also avoid glitches. The EN signal is propagated to the input of the AND gate
from the falling edge of the clock onward, while the latch is transparent, and
the level-sensitive latch then holds the enable signal while the clock is high.
Unlike in OR-based clock gating, the combinational logic that computes EN can
use the full clock cycle. Latch-based clock gating is adopted for power
optimization since EN can be extracted from anywhere in the circuit.
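The effect of the latch can be shown with the same kind of sampled-waveform model. The Python sketch below assumes a latch that is transparent while the clock is low and holds while the clock is high, and it assumes the latch initially holds 1; the waveforms and helper names are hypothetical. K1, K2 and K3 follow the wire names used later in this chapter (enable, latched enable, gated clock).

```python
def latch_based_gated_clock(clk, k1, initial_k2=1):
    """Latch + AND gating: K2 follows K1 while clk is 0 and holds while
    clk is 1; the gated clock is K3 = K2 & clk."""
    k2, k3 = initial_k2, []
    for c, enable in zip(clk, k1):
        if c == 0:
            k2 = enable          # latch transparent during the low phase
        k3.append(c & k2)        # latch output gates the clock
    return k3


def pulse_widths(wave):
    """Lengths, in samples, of each run of 1s in a sampled waveform."""
    widths, run = [], 0
    for sample in wave:
        if sample:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Three clock cycles, four samples each: high, high, low, low.
clk = [1, 1, 0, 0] * 3
# Enable that changes in the middle of the high phase (would glitch a bare AND gate).
k1 = [1, 0, 0, 0,  0, 1, 1, 1,  1, 1, 1, 1]

plain_and = [c & e for c, e in zip(clk, k1)]
latched = latch_based_gated_clock(clk, k1)

print("bare AND gate pulse widths :", pulse_widths(plain_and))
print("latch-based pulse widths   :", pulse_widths(latched))
```

In this model the enable changes that arrive while the clock is high are ignored until the next low phase, so every pulse that reaches the flip-flop has the full width, whereas the bare AND gate passes truncated pulses for the same enable waveform.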
Figure 3.6 Latch based clock gating

Latch-based clock gating is shown in Figure 3.6. Latch-based design can save
area and power, but its theoretical efficiency after clock scheduling is no
better than that of register-based design. Under the traditional deterministic
timing model, the optimal clock period in both latch-based and register-based
designs is limited by the maximum mean delay of any cycle in the circuit.
However, when process variation is taken into account, latch-based design is
often dramatically more tolerant of variation, resulting in better performance
and/or allowing more aggressive clocking than the equivalent register-based
design. This effect cannot be observed using traditional deterministic timing
analysis; a probabilistic timing model is required to quantify it. An analysis
of several benchmark circuits demonstrates that manufacturing a latch-based
design results in 4 times fewer errors than the equivalent design using
registers.
Usually, one of two kinds of device is used to implement synchronization and
state storage in a sequential design: edge-triggered registers or
level-sensitive latches. An ideal edge-triggered element propagates the value
at its input to its output only at the instant in each clock cycle when the
clock input rises (or falls, depending on the implementation), and maintains
the output value at all other times. An ideal level-sensitive element
propagates the value at its input to its output whenever the clock input is
high (or low), and retains the output value at all other times.
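The two ideal storage elements described above can be contrasted with a sampled-waveform model in Python. The sketch assumes a positive-edge-triggered register and an active-high latch; the waveforms, the initial output value and the function names are made up for illustration.

```python
def edge_triggered(clk, d, q0=0):
    """Ideal positive-edge register: q takes d only on a 0 -> 1 clock transition."""
    q, previous_clk, out = q0, 0, []
    for c, value in zip(clk, d):
        if previous_clk == 0 and c == 1:
            q = value
        out.append(q)
        previous_clk = c
    return out


def level_sensitive(clk, d, q0=0):
    """Ideal active-high latch: q follows d while clk is 1 and holds while clk is 0."""
    q, out = q0, []
    for c, value in zip(clk, d):
        if c == 1:
            q = value
        out.append(q)
    return out


# Made-up sampled waveforms: the data input changes while the clock is high.
clk = [0, 1, 1, 0, 0, 1, 1, 0]
d   = [1, 1, 0, 1, 0, 0, 1, 1]

register_q = edge_triggered(clk, d)
latch_q = level_sensitive(clk, d)

print("register output:", register_q)
print("latch output   :", latch_q)
```

The register samples the input only at the two rising edges, while the latch output follows every input change during the high phase and holds the last value when the clock goes low.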
A latch is said to be transparent while its input passes directly to its
output, and closed otherwise. Latch-based design can deliver area and power
savings over register-based design at the cost of additional timing
verification. Latches are also often able to support a higher operating
frequency when driven by a single global clock. The transparent mode of a latch
allows a limited amount of delay balancing between adjacent combinational
paths, which can overcome the limit imposed by the single longest path;
register-based designs are always limited by that path. If clock skew
scheduling is considered, this performance advantage over register-based design
tends to disappear.
3.6.3 Pipeline Circuit
Clock gating works by recognizing groups of flip-flops that share a common
enable signal (which indicates that a new value should be clocked into the
flip-flops). This enable signal is ANDed with the clock to generate the gated
clock, which is fed to the clock ports of all the flip-flops that share the
enable signal. The sel signal encodes whether the latch retains its earlier
value or takes a new input. This sel signal is ANDed with the clock signal to
generate the gated clock for the latch. This transformation preserves the
functional correctness of the circuit and therefore does not increase the
verification burden. This simple transformation can decrease the dynamic power
of a synchronous circuit by 5-10%.
There are several considerations in implementing clock gating. First, the
enable signal should remain stable while the clock is high and may change only
while the clock is in its low phase. Second, to guarantee correct functioning
of the logic that follows the gated clock, the gated clock should be turned on
in time and glitches on it should be avoided. Third, the AND gate may introduce
additional clock skew. For high-performance designs with short clock cycle
times, this skew can be significant and needs careful consideration.
Figure 3.7 Block diagram used for pipeline design

Figure 3.7 shows the block diagram used for the pipeline design. The
granularity of clock gating is an important consideration for ASIC designers
implementing clock gating. In its simplest form, the enable logic for clock
gating is relatively easy to identify. The effect of clock gating can be
multiplied in a pipelined system: if the inputs to one pipeline stage remain
the same, then all subsequent pipeline stages may also be frozen. Figure 3.7
shows the same clock gating logic used to gate multiple pipeline stages. This
is a multi-cycle optimization with various implementation trade-offs, and it
can save significant power, typically decreasing switching activity by 15-25%.
Figure 3.8 C17 benchmark circuit
C17 is a small benchmark circuit consisting of 6 NAND gates. Although the
circuit is not large, it provides an interesting example that is easy to check
by hand. Here, the combinational logic block in Figure 3.7 is replaced by the
C17 benchmark circuit shown in Figure 3.8. C17 has two outputs: one is given as
the input of a d flip-flop and the other as the input of the d flip-flop of the
next stage, while the alternate output of the last d flip-flop is used as the
select (sel) input of the next stage; this arrangement repeats from stage to
stage. Since the same clock is supplied to every stage, it is ANDed with the
select (sel) input of each stage.
Figure 3.9 Pipeline design with C17 benchmark circuit
Figure 3.9 shows the pipeline design with the C17 benchmark circuit. This
circuit consumes more power and uses a larger number of LUTs; the corresponding
results are presented in Figure 3.12 in Section 3.7.
3.7 RESULTS AND DISCUSSIONS
3.7.1 Simulation Results
Simulations were carried out using the Xilinx Vivado 2015.2 design software.
For synthesis, a Xilinx Virtex-7 FPGA development board was used as the target
environment. Virtex-7 FPGAs are optimized for system performance and
integration at 28 nm and are designed to deliver high performance per watt, DSP
performance, and I/O bandwidth. The family is used in a range of applications,
from 10G to 100G networking to ASIC prototyping and portable radar.
Simulation Results for Latch free clock gating

Figure 3.10(a) shows the RTL schematic and power summary of Latch
free clock gating.
(a) RTL schematic and Power summary of Latch free clock gating
Figure 3.10(a) reports the number of LUTs (look-up tables, which are used to
realize logic functions on the FPGA) as well as the input and output buffers.
Power is reported in watts (W): for the latch-free design, the estimated
dynamic power is 0.271 W, the estimated static power is 0.245 W, and the
estimated total on-chip power is 0.515 W.
Figure 3.10(b) shows the simulation result of Latch free clock gating.
(b) Simulation result of Latch free clock gating
The simulation result for latch-free clock gating in Figure 3.10(b) is based on
the operation of the d flip-flop, the X-OR gate and the AND clock gate. The
wires K1 and K2 are given by K1 = in ^ q and K2 = K1 & clk.
Figure 3.10(c) shows the FPGA Floorplan Layout result of Latch free
clock gating.
(c) FPGA Floorplan Layout result of Latch free clock gating
Figure 3.10(d) gives the estimated resource utilization report of Latch
free clock gating.
(d) Estimated resource utilization report of Latch free clock gating

Figure 3.10 Simulation Results for Latch free clock gating
From Figure 3.10(d) it is inferred that the design uses 1 LUT and 1
flip-flop/latch, and that the reported I/O count is 1.
Simulation Results for Latch based clock gating

Figure 3.11(a) shows the RTL schematic and power summary of Latch
based clock gating.
(a) RTL schematic and power summary of Latch based clock gating

Figure 3.11(b) shows the simulation result for Latch based clock
gating. Here a level sensitive latch is added as a bit storage element.
(b) Simulation result of Latch based clock gating
The simulation result for latch-based clock gating in Figure 3.11(b) is based
on the operation of the d flip-flop, the X-OR gate, the latch and the AND clock
gate. The wires K1, K2 and K3 are given by K1 = in ^ q, K2 = K1 (the bit stored
in the latch) and K3 = K2 & clk.
Figure 3.11(c) shows the FPGA Floorplan Layout result of Latch based
clock gating.
(c) FPGA Floorplan Layout result of Latch based clock gating
Figure 3.11(d) shows the estimated resource utilization report of
Latch based clock gating.
(d) Estimated resource utilization report of Latch based clock gating

Figure 3.11 Simulation Results for Latch based clock gating
Simulation Results for Pipeline circuit

Figure 3.12(a) shows the RTL schematic and power summary of
Pipeline circuit.
(a) RTL schematic and power summary of Pipeline circuit

Figure 3.12(b) shows the simulation result for pipeline circuit which is
based on C17 circuit.
(b) Simulation result of Pipeline circuit
The simulation result for the pipeline circuit in Figure 3.12(b) is based on
the operation of the C17 circuit together with the d flip-flops. The C17
operation is M1=~(X1&X2), M2=~(X2&X3), M3=~(X4&M2),
M4=~(X5&M2), Y1=~(M3&M1), Y2=~(M3&M4).
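Because the circuit is small enough to check by hand, the six NAND equations above can also be evaluated exhaustively. The Python sketch below follows the equations exactly as they are written in this chapter, with the same signal names in lower case; the helper names nand and c17 and the truth_table dictionary are introduced here only for illustration.

```python
from itertools import product


def nand(a, b):
    """Two-input NAND on single-bit values (0 or 1)."""
    return 1 - (a & b)


def c17(x1, x2, x3, x4, x5):
    """C17 as given by the six NAND equations in the text."""
    m1 = nand(x1, x2)
    m2 = nand(x2, x3)
    m3 = nand(x4, m2)
    m4 = nand(x5, m2)
    y1 = nand(m3, m1)
    y2 = nand(m3, m4)
    return y1, y2


# Exhaustive truth table over all 2**5 input combinations.
truth_table = {bits: c17(*bits) for bits in product((0, 1), repeat=5)}

for bits, outputs in truth_table.items():
    print(bits, "->", outputs)
```

The resulting 32-row table can be compared against the simulated waveform to confirm that the combinational part of the pipeline behaves as expected before the flip-flops and gating logic are considered.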
Figure 3.12(c) shows the FPGA Floorplan Layout result of Pipeline circuit.
(c) FPGA Floorplan Layout result of Pipeline circuit
Figure 3.12(d) shows the estimated resource utilization report of Pipeline circuit.
(d) Estimated resource utilization report of Pipeline circuit

Figure 3.12 Simulation Results for Pipeline circuit
Simulation Results for the design without clock gating

Figure 3.13(a) shows the RTL schematic and power summary of the design
without clock gating.
(a) RTL schematic and power summary of the design without clock gating

Figure 3.13(b) shows the simulation result of the design without clock gating.
(b) Simulation result of the design without clock gating
The simulation result without clock gating in Figure 3.13(b) is based on the
operation of the d flip-flop.
Figure 3.13(c) shows the FPGA Floorplan Layout result of the design without
clock gating.
(c) FPGA Floorplan Layout result of the design without clock gating

Figure 3.13(d) shows the estimated resource utilization report of the design
without clock gating.
(d) Estimated resource utilization report of the design without clock gating

Figure 3.13 Simulation Results for the design without clock gating
Table 3.1 shows the comparison of power and number of LUTs for
Latch free clock gating, Latch based clock gating, pipeline circuit as well as
without clock gating.
Table 3.1 Comparison of Power and LUTs of various gating techniques
Gating technique            Dynamic     Static      Total on-chip   No. of
                            power (W)   power (W)   power (W)       LUTs
Latch free clock gating     0.271       0.245       0.515           1
Latch based clock gating    0.288       0.245       0.533           2
Pipelined circuit           1.286       0.254       1.54            7
Without clock gating        0.265       0.245       0.51            0
From Table 3.1, it is inferred that the pipeline circuit consumes more power
and uses more LUTs than the other designs. Since no extra logic is involved in
the design without clock gating, it consumes comparatively less power and area
than the designs with clock gating.
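The comparison in Table 3.1 can be made explicit by computing, for each design, the additional dynamic power and LUT count relative to the design without clock gating. The Python sketch below uses only the values reported in Table 3.1; the dictionary layout and variable names are chosen here for illustration.

```python
# Values copied from Table 3.1 (power in watts).
results = {
    "Latch free clock gating":  {"dynamic": 0.271, "static": 0.245, "total": 0.515, "luts": 1},
    "Latch based clock gating": {"dynamic": 0.288, "static": 0.245, "total": 0.533, "luts": 2},
    "Pipelined circuit":        {"dynamic": 1.286, "static": 0.254, "total": 1.54,  "luts": 7},
    "Without clock gating":     {"dynamic": 0.265, "static": 0.245, "total": 0.51,  "luts": 0},
}

baseline = results["Without clock gating"]

# Extra dynamic power in milliwatts and extra LUTs relative to the baseline.
extra_dynamic_mw = {
    name: round((row["dynamic"] - baseline["dynamic"]) * 1000)
    for name, row in results.items()
}
extra_luts = {name: row["luts"] - baseline["luts"] for name, row in results.items()}

for name in results:
    print(f"{name:26s} +{extra_dynamic_mw[name]:5d} mW dynamic, +{extra_luts[name]} LUTs")
```

For these small designs the gating logic adds a few milliwatts of dynamic power and one or two LUTs over the ungated flip-flop, which is consistent with the observation above that the design without clock gating has the lowest power and area among the reported results.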
