
RTL Verification without Testbenches

Calypto Design Systems, Inc.


www.calypto.com

Copyright 2005 Calypto Design Systems, Inc.


Increasingly, designers begin with high-level models to partition and verify system functionality.
Best-practice teams reuse these models within testbenches to verify the resulting RTL
designs. However, the cost and brittleness of these testbenches limit the design team's ability
to explore and optimize their RTL. Sequential equivalence checking technology reduces
testbench requirements, improving productivity and giving designers more opportunities to
optimize their RTL designs. This paper traces the development of a DES encryption design to
demonstrate the advantages of doing RTL verification without testbenches.

Verification-Limited Design
Relentlessly growing design complexity and time to market pressures are putting a lot of stress on the
SoC development flow. Increasingly, designers are starting with a high-level model of their design to
verify overall functionality and algorithm correctness, as well as to evaluate architectural choices and
make early hardware-software partitioning decisions.
Often, a hardware task is modeled and verified at the untimed functional level using Matlab, C, or
SystemC descriptions. The verified functionality, often with only anecdotal interface and performance
specifications, is passed on to an RTL design engineer for implementation.
Best-practice development teams leverage their high-level models during RTL verification, incorporating
them as reference models in simulation testbenches.
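
To make the reference-model approach concrete, the sketch below shows a minimal self-checking monitor in SystemC that compares a DUT output against the high-level model's output on every clock edge. The module, port, and signal names (des_checker, dut_out, ref_out, out_valid) are illustrative assumptions, not part of any particular testbench.

#include <systemc.h>

// Minimal self-checking monitor sketch: compares the RTL (DUT) output against
// the high-level reference model output whenever a result is flagged as valid.
SC_MODULE(des_checker) {
    sc_in<bool>       clk;        // testbench clock
    sc_in<bool>       out_valid;  // strobe: a new result pair is available
    sc_in<sc_bv<64> > dut_out;    // output from the RTL implementation
    sc_in<sc_bv<64> > ref_out;    // output from the C/SystemC reference model

    void check() {
        if (out_valid.read() && dut_out.read() != ref_out.read()) {
            SC_REPORT_ERROR("des_checker", "DUT output differs from reference model");
        }
    }

    SC_CTOR(des_checker) {
        SC_METHOD(check);
        sensitive << clk.pos();   // sample on each rising clock edge
    }
};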
Unfortunately, the cost of developing a good testbench frequently exceeds the cost of the RTL design it is
intended to verify! Because of this, testbenches are usually created only at the chip or major subsystem
level, with these consequences:

- Thorough verification is delayed until enough design blocks are available to assemble within an available testbench.
- As a simulation-based approach, testbenches do not cover the entire design functionality, and require complex and long-running regression suites.
- At the testbench level of integration, random sequences, limited controllability, and poor observability make detecting and then diagnosing bugs more difficult.
- With many resources tied up in verification, block interface and performance changes, especially those which impact the testbench, are discouraged.

The time and cost associated with verification prohibit a reasonable exploration of the RTL
implementation space. Oftentimes, a poor architectural choice won't manifest itself until late in the design
flow, and with advanced manufacturing processes, the risk greatly increases that a poor architectural
choice will fail to achieve acceptable design closure during physical synthesis.

Sequential Equivalence Checking


Many of the disadvantages of testbenches can be avoided by using equivalence checking techniques to
verify individual block functionality early on in the design flow. RTL designs can be compared with
corresponding high-level models. While the RTL design and high-level model have matching
functionality, they typically differ in timing and interfaces.
Sequential equivalence checking technology, such as Calypto's SLEC, targets verifying
system-level to RTL and RTL to RTL functionality in the presence of timing and interface differences.
Existing combinational equivalence checkers, used for RTL to gate verification, cannot handle the
sequential differences.

To demonstrate the advantages of sequential equivalence checking, this paper traces the development of
a DES encryption circuit, verifying RTL functionality using sequential equivalence checking instead of
waiting for testbenches.

Core DES
The Data Encryption Standard (DES) was originally adopted by the US National Bureau of Standards
(now NIST) in 1977. The core DES algorithm takes a single 64-bit data value and a 64-bit key
value and produces a 64-bit encrypted data result. Figure 1 shows an overview of the algorithm. After
initial permutations of data and key, 16 rounds of computation encrypt the data, and a final permutation
produces the encrypted result. The algorithm is symmetric, so the same key which encrypts the data can
be used to decrypt and recover the original data block.
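
The heart of each round is the standard Feistel structure: the new left half is the old right half, and the new right half is the old left half XORed with a function of the right half and the round subkey. The C++ sketch below is illustrative only; the round function f is passed in as a parameter rather than reproducing the expansion permutation, key mixing, S-box substitution, and P-box permutation here.

#include <cstdint>
#include <functional>

// One Feistel round of DES, parameterized on the round function f so the
// sketch stays self-contained. In the full algorithm, f applies the expansion
// permutation, XOR with the 48-bit round subkey, the S-box substitution, and
// the P-box permutation.
void des_round(uint32_t &l, uint32_t &r, uint64_t subkey,
               const std::function<uint32_t(uint32_t, uint64_t)> &f) {
    uint32_t l_next = r;                   // l(i+1) = r(i)
    uint32_t r_next = l ^ f(r, subkey);    // r(i+1) = l(i) XOR f(r(i), k(i))
    l = l_next;
    r = r_next;
}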
[Figure 1 shows the DES dataflow: data_in and key_in pass through the initial data and key permutations; sixteen compute rounds (Round0 through Round15) each update the data halves (li, ri) and key halves (ci, di) using the key shift, compression permutation, expansion permutation, switchbox (S-box) substitution, and P-box permutation; the inverse initial data permutation then produces data_out.]

Figure 1 - Algorithm/Compute Block


At the system level, the core DES algorithm can be modeled in C. The model is untimed, and because
the algorithm uses various bit widths internally, SystemC datatypes are used to unambiguously specify
the arithmetic operations. For simulation, the design can be wrapped in a simple SystemC harness.
Figure 2 shows part of this design.


/** DES function. */
sc_bv<64> des(sc_bv<64> data_in, sc_bv<64> key_in, bool decrypt) {
    sc_bv<56> cd0;
    sc_bv<64> lr0, lr16, rl16, data_out;
    lr0  = des_ip(data_in);
    cd0  = des_ic(key_in);
    lr16 = des_compute(lr0, cd0, decrypt);
    rl16 = (lr16.range(31, 0), lr16.range(63, 32));
    data_out = des_iip(rl16);
    return data_out;
}

/** DES function wrapper. */
SC_MODULE(des_c0) {
    sc_out<sc_bv<64> > data_out;
    sc_in<sc_bv<64> >  data_in, key_in;
    sc_in<bool>        decrypt_in;

    SC_CTOR(des_c0) {
        SC_METHOD(evaluate);
        sensitive << data_in << key_in << decrypt_in;
    }

    void evaluate() {
        data_out.write(des(data_in.read(), key_in.read(), decrypt_in.read()));
    }
};

Figure 2 - Portion of C/SystemC DES Model


NIST publishes some test data which can be used to validate the algorithm.
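
As a sketch of how such a check might look, the harness below calls the untimed des() function from Figure 2 directly, encrypting a block and then decrypting it with the same key. The key and plaintext values shown are placeholders, not published NIST vectors, and sc_main and the checks are illustrative rather than the project's actual validation code.

#include <systemc.h>

// Declaration of the untimed des() model shown in Figure 2.
sc_bv<64> des(sc_bv<64> data_in, sc_bv<64> key_in, bool decrypt);

int sc_main(int, char*[]) {
    sc_bv<64> key, plain;
    key   = 0x0123456789abcdefULL;   // placeholder value, not a published NIST vector
    plain = 0x1122334455667788ULL;   // placeholder value

    sc_bv<64> cipher    = des(plain, key, false);   // encrypt
    sc_bv<64> recovered = des(cipher, key, true);   // decrypt with the same key

    // DES is symmetric: decrypting with the same key must recover the block.
    sc_assert(recovered == plain);
    cout << "cipher = " << cipher.to_string(SC_HEX) << endl;
    return 0;
}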

Integrated DES
The core DES algorithm is used within a larger module to encrypt entire message streams. Additional
modes of operation, such as electronic codebook (ECB) or cipher block chaining (CBC), are layered on top
of the algorithm to process full message streams. Because the original encryption algorithm is
becoming vulnerable to brute-force attacks, triple-DES combinations are recommended for better security.
Given the costs and time required for testbench development, it is likely that a complete testbench will
only be developed for the integrated message stream module. Any core DES implementation would be
verified within this larger context. Figure 3 shows an example testbench setup. The testbench contains a
reference model including the core C/SystemC DES model along with a high-level model of the message
encryption streamer. This is compared with the corresponding implementation modules, wrapped with
both input and output transactors to account for the differences in timing and interface protocols.
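
An input transactor of this kind bridges the untimed reference stream and the clocked implementation interface. The sketch below is hypothetical: the start pulse, the FIFO-based stimulus stream, and the 16-cycle gap between transactions are assumptions for illustration, not the actual DUT protocol.

#include <systemc.h>

// Hypothetical input transactor: pulls untimed (data, key) transactions from
// FIFOs written by the reference side of the testbench and drives them onto a
// clocked DUT interface, one transaction every 16 clocks.
SC_MODULE(des_input_transactor) {
    sc_in<bool>            clk;
    sc_out<sc_bv<64> >     data_in, key_in;        // DUT data and key inputs
    sc_out<bool>           start;                  // assumed one-cycle start pulse
    sc_fifo_in<sc_bv<64> > data_fifo, key_fifo;    // untimed stimulus stream

    void drive() {
        start.write(false);
        while (true) {
            // Block until the untimed side of the testbench produces a transaction.
            sc_bv<64> d = data_fifo.read();
            sc_bv<64> k = key_fifo.read();
            data_in.write(d);
            key_in.write(k);
            start.write(true);
            wait();                               // hold the start pulse for one clock
            start.write(false);
            for (int i = 0; i < 15; i++) wait();  // assumed 16-cycle spacing between blocks
        }
    }

    SC_CTOR(des_input_transactor) {
        SC_THREAD(drive);
        sensitive << clk.pos();
    }
};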


[Figure 3 shows the extended testbench around the DUT: a stimulus generator, mode sequencer, and mode assembler on the reference side, with input and output transactors wrapping the implementation modules.]

Figure 3 - Extended Testbench/DUT


Simulation of the high-level portion of the testbench gives the following results:
DES C0 ECB Mode Message Simulation
Original message  -> Calypto: Bridging System and RTL
Encrypted message -> f4966ffc92c8df0c 6b6c41c883c70700 615617aac9640b2e 0f434cdf068b53e1
Decrypted message -> Calypto: Bridging System and RTL
simulation complete.

Because of the level of integration required for the testbench, any RTL implementation of the core DES
algorithm will not be thoroughly verified until all other blocks are available.
There are a variety of power and performance options possible for the DES RTL design. Because of the
time required to develop a testbench, implementation timing and interfaces are chosen early, yet
functionality is verified late. If functional verification requirements could be removed from the testbench
and shifted earlier in the design process, better RTL implementations could be explored.

Testbench Not Required


Sequential equivalence checking, embodied in tools such as SLEC, enables functional verification of RTL
designs to be done without a testbench.
Two designs are compared for functional equivalence. Typically, the specification design is an abstract
system-level or register transfer level (RTL) model, while the implementation design is a more concrete
RTL design. Equivalence checkers provide:

- Functional verification without test vectors
- Corner case bug detection and counterexamples when differences are found


However, unlike existing combinational tools, equivalence checkers with sequential and data abstraction
capabilities can verify designs with common functionality despite differences in:

- State representation
- Interface timing and protocols
- Resource scheduling and allocation

Sequential analysis capabilities and the ability to bridge abstraction gaps enable these equivalence
checkers to be used far earlier in the design process, resulting in:

- Early detection of system-level and RTL functional bugs, without the need for block-level testbenches.
- Complete functional verification of the RTL with respect to a system-level or other RTL reference model.

Sequential Design Variations


Throughout the RTL design process, sequential equivalence checkers can verify all design refinements
which do not change the functional behavior of the design. Some common RTL design refinements
include:

- Pipelining - pipelines are often added to a design to meet throughput requirements. Pipeline refinements include inserting or modifying the number of pipeline stages in data and control paths. A common scenario might be adding a pipeline stage to a key datapath, increasing the implementation latency by one.
- Resource Scheduling - resources are allocated and scheduled to meet cost and performance targets. A single-cycle computation in a design specification may become multi-cycled in the implementation, changing the sharing and timing of required resources.
- Register Retiming - register retiming is a common RTL optimization used to balance the amount of logic between flip-flops. Although the state of the two RTL models is different, the interface behavior of the two designs remains identical.
- Interface Refinements - as designs are refined, block interfaces are changed from abstract data types, such as integers, to bit-accurate busses. While preserving the core functionality, interface protocols, timings, and data sizes may be changed, for example, from full parallel to byte-serial or bit-serial interfaces (a sketch of such a refinement follows this list).
- State Recoding - state machine encodings may be changed to optimize implementation area, timing, and/or dynamic power. A typical recoding might change from a binary-encoded machine to a one-hot implementation.
- Additional Operating Modes - additional modes of operation may be present in an RTL implementation. Scan path, for example, is often added to an RTL implementation. High-level behaviors can be verified by constraining the RTL so that the additional modes of operation are disabled.
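
As an example of the interface refinement case above, the sketch below collects eight bytes from a byte-serial input into the 64-bit word a parallel core expects. It is a hypothetical illustration; the module name, the valid handshake, and the MSB-first ordering are assumptions rather than details of the DES designs.

#include <systemc.h>

// Byte-serial front end: shifts eight incoming bytes (MSB-first) into a 64-bit
// word and pulses word_valid when the word is complete.
SC_MODULE(byte_to_word) {
    sc_in<bool>        clk, rst;
    sc_in<bool>        byte_valid;   // a new byte is presented this cycle
    sc_in<sc_bv<8> >   byte_in;      // byte-serial input
    sc_out<bool>       word_valid;   // pulses when a full 64-bit word is ready
    sc_out<sc_bv<64> > word_out;

    sc_bv<64>    shift_reg;          // assembled word
    unsigned int count;              // bytes collected so far

    void collect() {
        if (rst.read()) {
            count = 0;
            word_valid.write(false);
        } else {
            word_valid.write(false);
            if (byte_valid.read()) {
                sc_bv<8> b = byte_in.read();
                shift_reg = (shift_reg.range(55, 0), b);   // shift the byte in at the low end
                if (++count == 8) {
                    word_out.write(shift_reg);
                    word_valid.write(true);
                    count = 0;
                }
            }
        }
    }

    SC_CTOR(byte_to_word) : count(0) {
        SC_METHOD(collect);
        sensitive << clk.pos();      // synchronous behavior, sampled on the rising edge
    }
};

Paired with a matching serializer on the output side, such a wrapper changes only the interface timing, which is exactly the kind of difference a sequential equivalence checker can absorb.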

Serial DES Implementation


To implement the core DES algorithm, the first RTL design is a straightforward, minimal-hardware,
serial-compute approach, written in Verilog. Figure 4 shows a schematic and timing diagram for the
implementation. Each compute round takes one cycle.
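
The Verilog source is not reproduced here; as a rough illustration of the control structure only, the clocked SystemC sketch below shows the per-cycle round counter and the 16-cycle latency. The module and signal names are assumptions, and the datapath (permutations, S-boxes, key schedule) is reduced to comments.

#include <systemc.h>

// Control-only sketch of the serial implementation's timing: a start pulse
// loads the permuted inputs, one compute round executes per clock, and done
// pulses with the result 16 cycles after start.
SC_MODULE(des_serial_ctrl) {
    sc_in<bool>  clk, rst, start;
    sc_out<bool> done;

    sc_uint<5> round;   // 0..15 while computing, 16 when idle

    void step() {
        done.write(false);
        if (rst.read()) {
            round = 16;                   // idle
        } else if (start.read()) {
            round = 0;                    // datapath would load IP(data_in), IC(key_in) here
        } else if (round < 16) {
            // Datapath would compute the current round here: key shift, compression
            // and expansion permutations, S-box substitution, and P-box permutation.
            round = round + 1;
            if (round == 16) {
                done.write(true);         // data_out valid: output latency = 16 cycles
            }
        }
    }

    SC_CTOR(des_serial_ctrl) : round(16) {
        SC_METHOD(step);
        sensitive << clk.pos();
    }
};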


[Figure 4 shows the serial implementation: the initial permutations of data_in and key_in feed a single compute stage that iterates the rounds, followed by the inverse initial permutation producing data_out; the timing diagram shows clk, data_in, key_in, and data_out over one computation.]

Figure 4 - Serial DES Schematic & Timing Diagram


To functionally verify the RTL implementation against the system-level description, the reset state,
throughput, and output latency of the two designs must be specified. Since the system-level model is
untimed and stateless, there is no reset needed and the circuit can be considered combinational. The
RTL design accepts new inputs every 16 cycles and outputs a result every 16 cycles, so the throughput
and output latency are both 16 cycles. Running an equivalence check with this information results in the
following:
slec> verify
[ORC-SSA] Starting to perform sequential analysis.
[SEQ-NPF] Number of equivalence problems falsified = 1
[SIM-VMM] Simulation verified the falsification for map
'name_map<spec.xdif_out.q,impl.xdif_out.q>' in transaction 1.
[SIM-SVF] Split VCD files into file 'impl.vcd' and file 'spec.vcd'.
[CEG-CVT] Generating testbenches for counter-example...
==========================================================
**************** Verification summary ********************
==========================================================
                Proven  Bounded-proven  Falsified  Unknown
Output pairs         0               2          0        0
Flop pairs           0               2          1        0

Counterexample
When a functional difference is found, a counterexample is generated which demonstrates the difference.
Typically, the counterexample will be the shortest sequence from reset of the two designs to a state which
demonstrates an output difference. Figure 5 shows the two waveform sequences.


[Figure 5 shows the two counterexample waveforms. Both designs receive the same stimulus, data_in = 4724 0021 0300 000F and key_in = 0C92 2408 8E04 E828. The specification (c0 - spec.vcd) produces data_out = FC2A 7711 7589 B82C, while the implementation (v2 - impl.vcd) produces data_out = 7C2A 7711 7589 B82C after its 16-cycle latency.]

Figure 5 - Counterexample VCDs


There is only one bit difference in the output. Considering all the exclusive-ors and permutations inside
the compute modules, it's likely the bug is in the final output stage. The NIST specification numbers bit
locations starting at 1, counting left to right, while the Verilog implementation numbers from 0, counting
right to left. One of the permutations ended up off by one. Figure 6 shows the source of the bug.

Figure 6 - DES Design Correction
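
The root cause is a conversion between two bit-numbering conventions: NIST's tables count bits 1 through 64 from the leftmost bit, while a Verilog vector declared [63:0] counts from 0 at the rightmost bit. Converting correctly needs both a base change and a direction flip, and dropping either one produces exactly this kind of off-by-one. The helper below only illustrates that arithmetic; the function name is hypothetical.

#include <cassert>

// Convert a NIST bit position (1..64, counted from the leftmost/most
// significant bit) to a Verilog [63:0] index (counted from the rightmost/least
// significant bit). Both the "-1" base change and the direction reversal are
// folded into the single subtraction.
int nist_to_verilog_index(int nist_bit /* 1..64 */) {
    return 64 - nist_bit;
}

int main() {
    assert(nist_to_verilog_index(1)  == 63);   // NIST bit 1 is the leftmost bit
    assert(nist_to_verilog_index(64) == 0);    // NIST bit 64 is the rightmost bit
    return 0;
}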


Fixing the error and rerunning SLEC achieves full functional equivalence:


slec> verify
Output-pair: spec.data_out (throughput=1, latency=0) and impl.data_out (throughput=16, latency=16)
proven to be equivalent.

Testbench Comparison
Without equivalence checking, functional verification must often be delayed until all RTL blocks are
available. Rerunning the sample message with the initial design yields:
DES V2 ECB Mode Message Simulation
Original message  -> Calypto: Bridging System and RTL
Encrypted message -> f4966ffc92c8df0c eb6c41c883c70700 e15617aac9640b2e 0f434cdf068b53e1
Decrypted message -> Calypto:`Bridgin' System and RTL
simulation complete.

Diagnosis of the difference, if detected, is more difficult because the differences must be traced through
more layers of logic before the bug is reached. Additionally, the stimulus, which often is random, will
contain extraneous vectors which do not contribute to the design difference.
Assertions, such as OVL or formal property specifications, can be added to the designs to help observe
and localize errors. Unlike the high-level model, these assertions need to be checked and rewritten with
each implementation change. They are incomplete and introduce an additional error source.

Alternative Sequential Designs


In addition to the serial implementation, three other alternatives were implemented as shown in figures 7a,
7b, and 7c.

Figure 7a - Quarter Pipelined Design


Figure 7b - Fully Pipelined Design

Figure 7c - Double Pipelined Design


The first design, in figure 7a, inserts a pipeline stage every 4 compute rounds. The output latency remains
16 cycles. The compute hardware is replicated 4 times, but new encryption inputs (throughput) now occur
every 4 cycles.
The second design, in figure 7b, is a fully pipelined implementation. The compute rounds are fully
unrolled, but new encryption inputs occur every cycle.
The third design, in figure 7c, is super-pipelined, with a pipeline stage inserted in each compute round.
The output latency increases to 32 cycles. Registers are added, but since there is less work done each
cycle, either a higher clock frequency could be used, or the power could be reduced at the same clock
frequency.
In all cases, sequential equivalence checking demonstrates functional equivalence with the original
high-level design. Since the designs all start in the same reset state, throughput and output latency values are
the only setup changes needed between verification runs.

Summary
Using SLEC, four different DES implementations were designed and proven functionally equivalent
without testbenches. The implementations have different throughputs and areas as shown in figure 8.


[Figure 8 charts the implementation area of the four designs.]

Figure 8 - RTL Design Metrics


Sequential equivalence checking tools such as SLEC enable designers to verify their blocks earlier in the
design process without having to wait for testbenches or other blocks to be ready for verification.
Full functional verification can be completed without test vectors. With a testbench approach, even after
the initial testbenches are developed, stimulus generation must be continually analyzed and extended to
uncover as many bugs as possible.
When SLEC uncovers bugs, counterexamples are typically the shortest sequences possible to excite the
bugs and propagate them to the outputs. Conversely, random-stimulus testbenches have long, often
extraneous, sequences and require assertions to improve the chances of observing bugs.
Although this paper focused on block-level equivalence checking, sequential equivalence checkers can
also be used to improve functional testing at the integration level. At whatever level testbenches are
used, their functional testing burden is greatly reduced when they are combined with sequential
equivalence checkers.
Sequential equivalence checkers, such as Calypto's SLEC, put early functional verification back in the
hands of the designer. This capability enables designers to explore more RTL variations during design.
The flexibility to evaluate and productively verify sequential changes throughout the design process
improves the quality of the RTL signed off for subsequent physical implementation.

