Calypto - RTL Verification Without Testbenches
Increasingly, designers begin with high-level models to partition and verify system functionality.
Best-practice teams reuse these models within testbenches to verify the resulting RTL
designs. However, the cost and brittleness of these testbenches limit the design team's ability
to explore and optimize their RTL. Sequential equivalence checking technology reduces
testbench requirements, improving productivity and giving designers more opportunities to
optimize their RTL designs. This paper traces the development of a DES encryption design to
demonstrate the advantages of doing RTL verification without testbenches.
Verification-Limited Design
Relentlessly growing design complexity and time-to-market pressures are straining the SoC
development flow. Increasingly, designers start with a high-level model of their design to verify
overall functionality and algorithm correctness, as well as to evaluate architectural choices and
make early hardware-software partitioning decisions.
Often, a hardware task is modeled and verified at the untimed functional level using Matlab, C, or
SystemC descriptions. The verified functionality, often with only anecdotal interface and performance
specifications, is passed on to an RTL design engineer for implementation.
Best-practice development teams leverage their high-level models during RTL verification, incorporating
them as reference models in simulation testbenches.
Unfortunately, the cost of developing a good testbench frequently exceeds the cost of the RTL design it is
intended to verify! Because of this, testbenches are usually created only at the chip or major subsystem
level, with these consequences:
- Thorough verification is delayed until enough design blocks are available to assemble within an
available testbench.
- As a simulation-based approach, testbenches do not cover the entire design functionality, and
require complex and long-running regression suites.
- At the testbench level of integration, random sequences, limited controllability, and poor
observability make detecting and then diagnosing bugs more difficult.
- With many resources tied up in verification, block interface and performance changes, especially
those which impact the testbench, are discouraged.
The time and cost associated with verification prohibit a reasonable exploration of the RTL
implementation space. Often a poor architectural choice won't manifest itself until late in the design
flow, and with advanced manufacturing processes, the risk greatly increases that a poor architectural
choice will fail to achieve acceptable design closure during physical synthesis.
To demonstrate the advantages of sequential equivalence checking, this paper traces the development of
a DES encryption circuit, verifying RTL functionality using sequential equivalence checking instead of
waiting for testbenches.
Core DES
The Data Encryption Standard (DES) was originally adopted by the US National Institute of Standards
and Technology (NIST) in 1977. The core DES algorithm takes a single 64-bit data value and a 64-bit key
value and produces a 64-bit encrypted data result. Figure 1 shows an overview of the algorithm. After
initial permutations of data and key, 16 rounds of computation encrypt the data, and a final permutation
produces the encrypted result. The algorithm is symmetric, so the same key which encrypts the data can
be used to decrypt and recover the original data block.
[Figure 1: DES algorithm. data_in and key_in pass through the initial data and key permutations. Round computation i maps (li, ri, ci, di) to (li+1, ri+1, ci+1, di+1) using key shifts, a compression permutation, an expansion permutation, S-box substitution, and a P-box permutation; rounds 0 through 15 are chained, and the final permutation produces data_out.]
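The round structure in Figure 1 is a Feistel network, which is what makes the algorithm symmetric: running the same rounds with the subkeys in reverse order undoes the encryption. The C++ sketch below illustrates that property with a toy round function standing in for DES's expansion, S-box, and P-box steps (the property holds for any round function, so this is a structural illustration, not actual DES):

```cpp
#include <array>
#include <cstdint>

// Toy round function standing in for DES's expansion permutation,
// S-box substitution, and P-box permutation. The Feistel structure
// is invertible for ANY round function, which is the point shown here.
static uint32_t f(uint32_t half, uint32_t subkey) {
    uint32_t x = half ^ subkey;
    return ((x << 7) | (x >> 25)) + 0x9E3779B9u * (x | 1u);
}

// Sixteen Feistel rounds over 32-bit halves:
//   l[i+1] = r[i];  r[i+1] = l[i] ^ f(r[i], k[i])
// The halves are swapped once more at the end, so feeding the subkeys
// in reverse order through the same routine decrypts.
static uint64_t feistel(uint64_t block, const std::array<uint32_t, 16>& keys) {
    uint32_t l = static_cast<uint32_t>(block >> 32);
    uint32_t r = static_cast<uint32_t>(block);
    for (uint32_t k : keys) {
        uint32_t next_r = l ^ f(r, k);
        l = r;
        r = next_r;
    }
    return (static_cast<uint64_t>(r) << 32) | l;  // final swap
}
```

Encrypting with subkeys k0..k15 and then running the same routine with k15..k0 recovers the original block; real DES derives its 16 subkeys from key_in through the shifts and compression permutation shown in Figure 1.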
[Figure 2: the tail of the core des() C function and its SystemC wrapper.]

	...
	return data_out;
}

/** DES function wrapper. */
SC_MODULE(des_c0) {
	sc_out<sc_bv<64> >	data_out;
	sc_in<sc_bv<64> >	data_in, key_in;
	sc_in<bool>		decrypt_in;

	SC_CTOR(des_c0) {
		SC_METHOD(evaluate);
		sensitive << data_in << key_in << decrypt_in;
	}

	void evaluate() {
		data_out.write( des(data_in.read(),
			key_in.read(), decrypt_in.read()) );
	}
};
Integrated DES
The core DES algorithm is used within a larger module to encrypt entire message streams. Additional
modes of operation, such as electronic codebook (ECB) or cipher block chaining (CBC), are layered on
top of the algorithm to process full message streams. Because the original encryption algorithm is
becoming vulnerable to brute-force attacks, triple-DES combinations are recommended for better security.
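To make the layering concrete, the sketch below implements CBC chaining around a stand-in for the 64-bit block operation (a keyed XOR-and-rotate, purely illustrative and with no cryptographic strength, not DES): each plaintext block is XORed with the previous ciphertext block (the IV for the first block) before the block operation is applied.

```cpp
#include <cstdint>
#include <vector>

// Stand-in block operation: keyed XOR followed by a rotate. It is
// invertible, which is all the CBC chaining needs; it is NOT DES.
static uint64_t rotl64(uint64_t x, int s) { return (x << s) | (x >> (64 - s)); }
static uint64_t blk_enc(uint64_t b, uint64_t k) { return rotl64(b ^ k, 13); }
static uint64_t blk_dec(uint64_t c, uint64_t k) { return rotl64(c, 51) ^ k; }

// CBC encryption: XOR each plaintext block with the previous ciphertext
// block (the IV for the first block) before the block operation.
std::vector<uint64_t> cbc_encrypt(const std::vector<uint64_t>& pt,
                                  uint64_t key, uint64_t iv) {
    std::vector<uint64_t> ct;
    uint64_t prev = iv;
    for (uint64_t p : pt) {
        prev = blk_enc(p ^ prev, key);
        ct.push_back(prev);
    }
    return ct;
}

// CBC decryption undoes the chain: apply the inverse block operation,
// then XOR with the previous ciphertext block (or the IV).
std::vector<uint64_t> cbc_decrypt(const std::vector<uint64_t>& ct,
                                  uint64_t key, uint64_t iv) {
    std::vector<uint64_t> pt;
    uint64_t prev = iv;
    for (uint64_t c : ct) {
        pt.push_back(blk_dec(c, key) ^ prev);
        prev = c;
    }
    return pt;
}
```

Unlike ECB, the chaining hides repeated plaintext blocks: two identical input blocks produce different ciphertext blocks.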
Given the costs and time required for testbench development, it is likely that a complete testbench will
only be developed for the integrated message stream module. Any core DES implementation would be
verified within this larger context. Figure 3 shows an example testbench setup. The testbench contains a
reference model including the core C/SystemC DES model along with a high-level model of the message
encryption streamer. This is compared with the corresponding implementation modules, wrapped with
both input and output transactors to account for the differences in timing and interface protocols.
[Figure 3: Example testbench setup - stimulus generator, mode sequencer and mode assembler, with input and output transactors around the implementation modules.]
Because of the level of integration required for the testbench, any RTL implementation of the core DES
algorithm will not be thoroughly verified until all other blocks are available.
There are a variety of power and performance options possible for the DES RTL design. Because of the
time required to develop a testbench, implementation timing and interfaces are chosen early, yet
functionality is verified late. If functional verification requirements could be removed from the testbench
and shifted earlier in the design process, more optimal RTL implementations could be explored.
Unlike existing combinational tools, equivalence checkers with sequential and data abstraction
capabilities can verify designs with common functionality despite differences in:
- State representation
- Interface timing and protocols
- Resource scheduling and allocation
Sequential analysis capabilities and the ability to bridge abstraction gaps enable these equivalence
checkers to be used far earlier in the design process, resulting in:
- Early detection of system-level and RTL functional bugs, without the need for block-level
testbenches.
- Complete functional verification of the RTL with respect to a system-level or other RTL reference
model.
Pipelining - pipelines are often added to a design to meet throughput requirements. Pipeline
refinements include inserting or modifying the number of pipeline stages in data and control paths.
A common scenario might be adding a pipeline stage to a key datapath, increasing the
implementation latency by one.
Resource Scheduling - resources are allocated and scheduled to meet cost and performance
targets. A single-cycle computation in a design specification may become multi-cycled in the
implementation, changing the sharing and timing of required resources.
Register Retiming - register retiming is a common RTL optimization used to balance the amount
of logic between flip-flops. Although the state of the two RTL models is different, the interface
behavior of the two designs remains identical.
Interface Refinements - as designs are refined, block interfaces are changed from abstract data
types, such as integers, to bit-accurate busses. While preserving the core functionality, interface
protocols, timings, and data sizes may be changed, for example, from full parallel to byte-serial or
bit-serial interfaces.
State Recoding - state machine encodings may be changed to optimize implementation area,
timing, and/or dynamic power. A typical recoding might change from a binary-encoded machine
to a one-hot implementation.
Additional Operating Modes - additional modes of operation may be present in an RTL
implementation. Scan path, for example, is often added to an RTL implementation. High-level
behaviors can be verified by constraining the RTL so that the additional modes of operation are
disabled.
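To make the state-recoding case concrete, here is a hypothetical four-state machine (not from the DES design) implemented twice in C++, once with binary-encoded state and once one-hot. The state bits differ, but the input/output behavior is cycle-for-cycle identical, which is exactly the relation a sequential equivalence check establishes:

```cpp
#include <cstdint>

// The same four-state machine under two encodings: it advances one state
// per cycle while the input is high, wraps after S3, and asserts its
// output in S3.
struct BinaryFsm {
    uint8_t state = 0;  // 2-bit binary encoding: 0..3
    bool step(bool in) {
        if (in) state = (state + 1) & 3;
        return state == 3;
    }
};

struct OneHotFsm {
    uint8_t state = 0x1;  // one-hot encoding: bit i set in state Si
    bool step(bool in) {
        // Rotate the hot bit left within the low 4 bits.
        if (in) state = static_cast<uint8_t>(((state << 1) | (state >> 3)) & 0xF);
        return state == 0x8;
    }
};
```

Stepping both machines on the same input sequence yields identical outputs every cycle, even though the stored state values never match.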
[Figure 4: Iterative DES RTL implementation. data_in and key_in feed the initial permutations; a clocked round block (Round0-16) iterates the round computation; the inverse initial permutation produces data_out.]
Counterexample
When a functional difference is found, a counterexample is generated which demonstrates the difference.
Typically, the counterexample will be the shortest sequence from reset of the two designs to a state which
demonstrates an output difference. Figure 5 shows the two waveform sequences.
[Figure 5: Counterexample waveforms (c0 - spec.vcd vs. v2 - impl.vcd). Both designs receive data_in 4724 0021 0300 000F and key_in 0C92 2408 8E04 E828. The specification produces data_out FC2A 7711 7589 B82C, while the implementation produces 7C2A 7711 7589 B82C, exposing the difference in the first output word.]
slec> verify
Output-pair: spec.data_out (throughput=1, latency=0) and impl.data_out (throughput=16, latency=16) proven to be equivalent.
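The throughput and latency figures in the report describe how the tool aligns the two output streams before comparing them: the specification produces one result per transaction, while the iterative implementation produces one every 16 cycles, 16 cycles after the corresponding input. A small sketch of that alignment rule (illustrative only, not Calypto's actual algorithm):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical alignment rule mirroring the SLEC output-pair report:
// spec result k is compared against the implementation output observed
// at cycle latency + k * throughput.
bool aligned_equal(const std::vector<uint64_t>& spec_out,
                   const std::vector<uint64_t>& impl_trace,
                   std::size_t throughput, std::size_t latency) {
    for (std::size_t k = 0; k < spec_out.size(); ++k) {
        std::size_t cycle = latency + k * throughput;
        if (cycle >= impl_trace.size() || impl_trace[cycle] != spec_out[k])
            return false;
    }
    return true;
}
```

With throughput=16 and latency=16, spec result 0 is checked against implementation cycle 16, result 1 against cycle 32, and so on.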
Testbench Comparison
Without equivalence checking, functional verification must often be delayed until all RTL blocks are
available. Rerunning the sample message with the initial design yields:
DES V2 ECB Mode Message Simulation
Original message ->
simulation complete.
Diagnosis of the difference, if detected, is more difficult because the differences must be traced through
more layers of logic before the bug is reached. Additionally, the stimulus, which often is random, will
contain extraneous vectors which do not contribute to the design difference.
Assertions, such as OVL or formal property specifications, can be added to the designs to help observe
and localize errors. Unlike the high-level model, these assertions need to be checked and rewritten with
each implementation change. They are incomplete and introduce an additional error source.
Summary
Using SLEC, four different DES implementations were designed and proven functionally equivalent
without testbenches. The implementations have different throughputs and areas, as shown in Figure 9.
[Figure 9: implementation area and throughput of the four DES designs.]