Register-Transfer Level - Wikipedia
In digital circuit design, register-transfer level (RTL) is a design abstraction which models a synchronous
digital circuit in terms of the flow of digital signals (data) between hardware registers, and the logical
operations performed on those signals.
Register-transfer-level abstraction is used in hardware description languages (HDLs) like Verilog and VHDL
to create high-level representations of a circuit, from which lower-level representations and ultimately actual
wiring can be derived. Design at the RTL level is typical practice in modern digital design.[1]
Unlike in software compiler design, where the register-transfer level is an intermediate representation at
the lowest level, RTL is the usual input that circuit designers work with. In fact, circuit synthesis
sometimes uses an intermediate language between the input register-transfer-level representation and the
target netlist. Unlike in a netlist, constructs such as cells, functions, and multi-bit registers are
available in such a language.[2] Examples include FIRRTL and RTLIL.
RTL description
A synchronous circuit consists of two kinds of elements: registers (sequential logic) and combinational
logic. Registers (usually implemented as D flip-flops) synchronize the circuit's operation to the edges of
the clock signal, and are the only elements in the circuit that have memory properties. Combinational logic
performs all the logical functions in the circuit and typically consists of logic gates.
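For example, in the following VHDL fragment an inverter connects the register's output Q back to its input
D, so the circuit changes state on every rising edge of the clock clk:
    -- Combinational logic: the inverter drives D with the complement of Q.
    D <= not Q;
    -- Register: Q captures D on each rising edge of clk.
    process(clk)
    begin
        if rising_edge(clk) then
            Q <= D;
        end if;
    end process;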
Using an EDA tool for synthesis, this description can usually be directly translated to an equivalent hardware
implementation file for an ASIC or an FPGA. The synthesis tool also performs logic optimization.
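The register-transfer behaviour of that fragment can also be traced by hand or in software. The following is a minimal cycle-level sketch in Python, not a substitute for an HDL simulator. It assumes Q starts at a known value (the VHDL fragment does not specify one), and the function name `simulate_toggle` is hypothetical.

```python
def simulate_toggle(num_cycles, q_init=False):
    """Cycle-level model of: D <= not Q; Q <= D on each rising clock edge."""
    q = q_init
    trace = []
    for _ in range(num_cycles):
        d = not q          # combinational logic: the inverter
        q = d              # register: Q captures D on the rising edge
        trace.append(int(q))
    return trace


# Q after each of six rising edges, assuming Q starts at 0.
print(simulate_toggle(6))  # [1, 0, 1, 0, 1, 0]
```

Because D depends only on the current Q, the whole circuit is described by one combinational assignment and one register update per clock edge, which is the essence of the RTL abstraction.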
At the register-transfer level, some types of circuits can be recognized. If there is a cyclic path of logic
from a register's output to its input (or from a set of registers' outputs to their inputs), the circuit is
called a state machine, or can be said to be sequential logic. If there are logic paths from one register to
another without a cycle, the circuit is called a pipeline.
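This distinction amounts to cycle detection on the graph of register-to-register logic paths. The sketch below illustrates the idea under the assumption that the design has already been reduced to a dictionary mapping each register to the registers its output feeds; the names `has_feedback` and `classify` and both example graphs are hypothetical.

```python
def has_feedback(paths, start):
    """Return True if `start` can reach itself through register-to-register paths."""
    stack = list(paths.get(start, []))
    seen = set()
    while stack:
        reg = stack.pop()
        if reg == start:
            return True
        if reg not in seen:
            seen.add(reg)
            stack.extend(paths.get(reg, []))
    return False


def classify(paths):
    """Label a design 'state machine' if any register lies on a cycle, else 'pipeline'."""
    regs = set(paths) | {dst for dsts in paths.values() for dst in dsts}
    if any(has_feedback(paths, reg) for reg in regs):
        return "state machine"
    return "pipeline"


toggle = {"Q": ["Q"]}                                  # Q feeds back to itself via the inverter
three_stage = {"R1": ["R2"], "R2": ["R3"], "R3": []}   # straight-through stages

print(classify(toggle))       # state machine
print(classify(three_stage))  # pipeline
```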
An RTL description is usually converted to a gate-level description of the circuit by a logic synthesis tool.
The synthesis results are then used by placement and routing tools to create a physical layout.
Logic simulation tools may use a design's RTL description to verify its correctness.
Motivation
More significant power reductions are possible when optimizations are made at levels of abstraction higher
than the circuit or gate level, such as the architectural and algorithmic levels.[3] This motivates the
development of new architectural-level power analysis tools. It does not imply that lower-level tools are
unimportant: each layer of tools provides a foundation upon which the next level can be built, and the
estimation techniques used at a lower level can be reused at a higher level with slight modifications.
Gate Equivalents[4]
This technique is based on the concept of gate equivalents. The complexity of a chip architecture can be
described approximately in terms of gate equivalents, where the gate-equivalent count specifies the average
number of reference gates required to implement a particular function. The total power required for that
function is estimated by multiplying the approximate number of gate equivalents by the average power
consumed per gate. The reference gate can be any gate, e.g. a 2-input NAND gate.
Steps:
1. Identify the functional blocks such as counters, decoders, multipliers, memories, etc.
2. Assign each block a complexity in terms of gate equivalents (GEs). The number of GEs for each unit
type is either taken directly as an input from the user or fed from a library.
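The power is then estimated as:
$$P = \sum_{i \in \{\text{fns}\}} GE_i \left(E_{typ} + C_L^i V_{dd}^2\right) f A_{int}^i$$
where the sum runs over the functional blocks, $GE_i$ is the gate-equivalent count of block $i$, $V_{dd}$
is the supply voltage, $f$ is the clock frequency, and $E_{typ}$ is the assumed average energy dissipated
by a gate equivalent when active. The activity factor, $A_{int}^i$, denotes the average percentage of gates
switching per clock cycle and is allowed to vary from function to function. The capacitive load, $C_L^i$,
is a combination of fan-out loading and wiring. An estimate of the average wire length can be used to
calculate the wiring capacitance. This is provided by the user and cross-checked by using a derivative of
Rent's Rule.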
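The estimate can be sketched numerically. The code below assumes the per-block form P = sum over blocks of GE_i * (E_typ + C_L_i * Vdd^2) * f * A_int_i, which is one commonly cited form of the gate-equivalent estimate. The `Block` class, the function name, and every number in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    gate_equivalents: float  # GE_i, complexity in reference gates
    load_cap: float          # C_L_i in farads (fan-out plus wiring)
    activity: float          # A_int_i, fraction of gates switching per cycle


def gate_equivalent_power(blocks, e_typ, vdd, clock_hz):
    """Sum GE_i * (E_typ + C_L_i * Vdd^2) * f * A_int_i over all blocks, in watts."""
    return sum(
        b.gate_equivalents * (e_typ + b.load_cap * vdd ** 2) * clock_hz * b.activity
        for b in blocks
    )


# Hypothetical blocks and technology numbers, for illustration only.
blocks = [
    Block("counter", gate_equivalents=100, load_cap=50e-15, activity=0.25),
    Block("multiplier", gate_equivalents=3000, load_cap=80e-15, activity=0.5),
]
total = gate_equivalent_power(blocks, e_typ=1e-12, vdd=5.0, clock_hz=10e6)
print(f"estimated power: {total * 1e3:.2f} mW")
```

Keeping the activity factor and load capacitance per block, rather than per chip, mirrors the statement that the activity factor may vary from function to function.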
Assumptions:
1. A single reference gate is taken as the basis for all the power estimates, without taking into
consideration different circuit styles, clocking strategies, or layout techniques.
2. The activity factors, i.e. the percentage of gates switching per clock cycle, are assumed to be
fixed regardless of the input patterns.
3. The typical gate switching energy is characterized assuming a completely random, uniform white
noise (UWN) distribution of the input data. This implies that the power estimate is the same
regardless of whether the circuit is idle or at maximum load, because the UWN model ignores how
different input distributions affect the power consumption of gates and modules.[5]
Class-Dependent Power Modeling: This approach improves slightly on the previous one by applying
customized estimation techniques to the different classes of functional blocks (logic, memory,
interconnect, and clock, hence the name), which increases the modelling accuracy. The power
estimation is done in a very similar manner to the class-independent case. The basic switching
energy is based on a three-input AND gate and is calculated from technology parameters (e.g.
gate width, tox, and metal width) provided by the user.
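For example, the power consumed by the bit lines of a memory array is modelled as:
$$P_{bitlines} = \frac{N_{col}}{2} \left(L_{col} C_{wire} + N_{row} C_{cell}\right) V_{dd} V_{swing}$$
where $C_{wire}$ denotes the bit line wiring capacitance per unit length, $C_{cell}$ denotes the loading
due to a single cell hanging off the bit line, $L_{col}$ is the length of a column, $V_{dd}$ is the supply
voltage, and $V_{swing}$ is the bit line voltage swing. The clock capacitance is based on the assumption
of an H-tree distribution network. Activity is modelled using a UWN model. As the equation shows, the
power consumption of each component is related to the number of columns ($N_{col}$) and rows
($N_{row}$) in the memory array.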
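A small numeric sketch shows how the array dimensions enter such a model. It assumes the bit-line expression has the form (N_col / 2) * (L_col * C_wire + N_row * C_cell) * Vdd * V_swing. That expression is a product of capacitance and two voltages, so the sketch treats it as the energy drawn per access and multiplies by a hypothetical access rate to obtain watts; that scaling is an assumption of the sketch, not something the text states. All function names and parameter values are hypothetical.

```python
def bitline_energy_per_access(n_col, n_row, l_col, c_wire, c_cell, vdd, v_swing):
    """Evaluate (N_col/2) * (L_col*C_wire + N_row*C_cell) * Vdd * V_swing, in joules."""
    column_cap = l_col * c_wire + n_row * c_cell  # capacitance of one bit line
    return (n_col / 2) * column_cap * vdd * v_swing


def bitline_power(access_hz, **array):
    """Assumed scaling: energy per access times accesses per second, in watts."""
    return access_hz * bitline_energy_per_access(**array)


# Hypothetical 256-row by 64-column array; every value is illustrative only.
array = dict(n_col=64, n_row=256, l_col=1e-3, c_wire=2e-10,
             c_cell=2e-15, vdd=5.0, v_swing=0.5)
print(bitline_energy_per_access(**array))
print(bitline_power(10e6, **array))
```

Doubling the number of columns doubles the result, while adding rows only increases the per-cell loading term, which is the dependence on array shape that the text describes.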
Disadvantages:
1. Circuit activity is not modelled accurately: a single overall activity factor, supplied by the
user and therefore not fully trustworthy, is assumed for the entire chip. In practice, activity
factors vary throughout the chip, so the estimate is prone to error. Even if the model gives a
correct estimate of the chip's total power consumption, the module-wise power distribution is
fairly inaccurate.
2. Even when the chosen activity factor gives the correct total power, the breakdown of power
into logic, clock, memory, etc. is less accurate. In this respect the approach offers little
improvement over CES, the gate-equivalent approach.
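The effect of a single chip-wide activity factor can be illustrated with a toy calculation. The sketch below assumes a deliberately simplified per-module model (gate equivalents times activity times a fixed energy per gate times clock frequency); the module names, sizes, and activity values are hypothetical.

```python
def module_power(gate_equivalents, activity, energy_per_gate=1e-12, clock_hz=10e6):
    """Simplified per-module power: GE * activity * energy per gate * frequency."""
    return gate_equivalents * activity * energy_per_gate * clock_hz


# Hypothetical modules: (gate equivalents, true activity factor).
modules = {"datapath": (4000, 0.40), "control": (1000, 0.10), "io": (5000, 0.05)}

true_power = {name: module_power(ge, a) for name, (ge, a) in modules.items()}

# One chip-wide activity factor, chosen so that the total comes out right.
total_ge = sum(ge for ge, _ in modules.values())
chip_activity = sum(ge * a for ge, a in modules.values()) / total_ge
flat_power = {name: module_power(ge, chip_activity) for name, (ge, _) in modules.items()}

for name in modules:
    print(f"{name:9s} true={true_power[name]:.4f} W  flat={flat_power[name]:.4f} W")
print(f"totals    true={sum(true_power.values()):.4f} W  flat={sum(flat_power.values()):.4f} W")
```

With these numbers the two totals agree, yet the flat factor underestimates the busy datapath and overestimates the mostly idle I/O block, which is the module-wise inaccuracy described above.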
The power factor approximation (PFA) method further customizes the power estimation of the various
functional blocks. In addition to having separate power models for logic, memory, and interconnect, it
individually characterizes an entire library of functional blocks such as multipliers and adders, instead
of using a single gate-equivalent model for "logic" blocks.
The power over the entire chip is approximated by the expression:
$$P = \sum_{i} K_i G_i f_i$$
where the sum runs over all functional elements, $K_i$ is the PFA proportionality constant that
characterizes the $i$th functional element, $G_i$ is the measure of hardware complexity, and $f_i$ denotes
the activation frequency.
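As a minimal sketch of how such a sum might be evaluated, the code below takes a list of functional elements, each with a constant K_i, a complexity G_i, and an activation frequency f_i, and returns both the total and the per-block terms. The function name and the adder entry are hypothetical; the multiplier constant of 15 fW/(bit^2·Hz) is the value quoted in the article's example, while the word length and rates are illustrative.

```python
def pfa_chip_power(elements):
    """Sum K_i * G_i * f_i over (name, K_i, G_i, f_i) tuples.

    Returns (total_watts, per_block_watts).
    """
    per_block = {name: k * g * f for name, k, g, f in elements}
    return sum(per_block.values()), per_block


elements = [
    # name,        K_i,     G_i,     f_i
    ("multiplier", 15e-15, 16 ** 2, 10e6),  # G = N^2 for a 16-bit multiplier
    ("adder",      5e-15,  16,      20e6),  # hypothetical constant, G = N
]
total, per_block = pfa_chip_power(elements)
print(per_block)
print(f"total: {total * 1e6:.1f} uW")
```

Returning the per-block terms alongside the total makes it easy to see which library element dominates the estimate.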
Example
For a multiplier, the hardware complexity $G_i$ is related to the square of the input word length, i.e.
$N^2$, where $N$ is the word length. The activation frequency, denoted by $f_{mult}$, is the rate at which
the algorithm performs multiplies, and the PFA constant, $K_{mult}$, is extracted empirically from past
multiplier designs and shown to be about 15 fW/(bit^2·Hz) for a 1.2 µm technology at 5 V. The resulting
power model for the multiplier on the basis of the above assumptions is:
$$P_{mult} = K_{mult} N^2 f_{mult}$$
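As a worked instance of this model, take the quoted constant of 15 fW/(bit^2·Hz) and a 16-bit word length. The multiply rate of 10 MHz is an assumed value, used here only to make the arithmetic concrete.

```latex
\begin{aligned}
P_{mult} &= K_{mult}\, N^{2}\, f_{mult} \\
         &= \left(15\times10^{-15}\ \tfrac{\text{W}}{\text{bit}^{2}\,\text{Hz}}\right)
            (16\ \text{bit})^{2}\,\left(10\times10^{6}\ \text{Hz}\right) \\
         &= 15\times10^{-15}\times 256\times 10^{7}\ \text{W} \\
         &= 3.84\times10^{-5}\ \text{W} \approx 38.4\ \mu\text{W}
\end{aligned}
```

Because the complexity term is quadratic in the word length, doubling N to 32 bits at the same multiply rate would quadruple the estimate.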
Disadvantages:
The model implicitly assumes that the inputs do not affect the multiplier activity. The PFA
constant is intended to capture the intrinsic internal activity associated with the multiply
operation, but because it is taken to be a constant, it cannot reflect input-dependent activity.
The estimation error (relative to switch-level simulation) has been measured for a 16x16 multiplier. When
the dynamic range of the inputs does not fully occupy the word length of the multiplier, the UWN model
becomes extremely inaccurate.[6] Granted, good designers attempt to maximize word length utilization.
Still, errors in the range of 50-100% are not uncommon. The figure clearly suggests a flaw in the UWN
model.
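The sensitivity to dynamic range can be illustrated by counting bit toggles on a data bus. The sketch below compares uniformly random 16-bit words with words confined to the low 4 bits. It assumes unsigned data (with two's-complement data the sign-extension bits would also toggle together), uses bit toggles only as a rough proxy for switching activity, and all names are hypothetical.

```python
import random


def average_toggle_rate(samples, word_length):
    """Fraction of bits that flip between consecutive words, averaged over the stream."""
    flips = sum(bin(a ^ b).count("1") for a, b in zip(samples, samples[1:]))
    return flips / (word_length * (len(samples) - 1))


rng = random.Random(0)  # fixed seed so the run is repeatable
WORD = 16

full_range = [rng.getrandbits(WORD) for _ in range(20000)]  # UWN-like data
narrow = [rng.getrandbits(4) for _ in range(20000)]         # only 4 of 16 bits in use

print(average_toggle_rate(full_range, WORD))  # close to 0.5
print(average_toggle_rate(narrow, WORD))      # close to 0.125
```

A power model calibrated on the first stream would assume roughly four times the bus activity actually present in the second, which is the kind of mismatch the UWN assumption hides.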
See also
Datapath
Electronic design automation (EDA)
Electronic system-level
Finite-state machine with datapath
Integrated circuit design
Synchronous circuit
Algorithmic state machine
Power estimation
Gate equivalent
Power optimization (EDA)
Gaussian noise
References
1. Frank Vahid (2010). Digital Design with RTL Design, Verilog and VHDL (https://books.google.com/books?id=-YayRpmjc20C&pg=PA247) (2nd ed.). John Wiley and Sons. p. 247. ISBN 978-0-470-53108-2.
2. Yosys Manual (https://yosys.readthedocs.io/_/downloads/en/latest/pdf/) (RTLIL)
3. "Power Estimation Techniques for Integrated Circuits" (http://www.eecg.toronto.edu/~najm/papers/iccad95-tutorial.pdf)
4. "Low-Power Architectural Design Methodologies" (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.4551&rep=rep1&type=pdf)
5. "Register-Transfer Level Estimation Techniques for Switching Activity and Power Consumption" (http://delivery.acm.org/10.1145/250000/244548/p158-raghunathan.pdf?ip=103.27.8.42&id=244548&acc=ACTIVE%20SERVICE&key=045416EF4DDA69D9%2EF8E7F338DF557316%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=504808115&CFTOKEN=79046804&__acm__=1429710434_0d9c0bce018bcd071c079ecb15be69e8)
6. "Power Macromodeling for High Level Power Estimation" (http://delivery.acm.org/10.1145/270000/266171/p365-gupta.pdf?ip=103.27.8.42&id=266171&acc=ACTIVE%20SERVICE&key=045416EF4DDA69D9%2EF8E7F338DF557316%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&CFID=504808115&CFTOKEN=79046804&__acm__=1429710436_686f8f2ffb085b129fe587723a6ee130)