Riscv Crypto Spec v0.9.0 Scalar

RISC-V Cryptographic Extension Proposals

Volume I: Scalar & Entropy Source Instructions


Editor: Ben Marshall
ben.marshall@bristol.ac.uk

Version 0.9.0 (March 4, 2021)


https://github.com/riscv/riscv-crypto/
git:master @ 19b36d744ffd2af3eb1c7d1a2b1c077af5e43f97

This work is licensed under a Creative Commons Attribution 4.0 International License

Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections):
Alexander Zeh, Andy Glew, Barry Spinney, Ben Marshall, Daniel Page, Derek Atkins, Ken Dockser,
Markku-Juhani O. Saarinen, Nathan Menhorn, Richard Newell, Claire Wolf

Contents
1 Introduction
1.1 Sail Specifications

2 Implementation Profiles

3 Scalar Cryptography Extension
3.1 Shared Bitmanip Extension Functionality
3.1.1 Rotations
3.1.2 Bit & Byte Permutations
3.1.3 Carry-less Multiply
3.1.4 Logic With Negate
3.1.5 Packing
3.1.6 Crossbar Permutation Instructions
3.2 Scalar AES Acceleration
3.2.1 RV32 Instructions
3.2.2 RV64 Instructions
3.3 Scalar SHA-256 / SHA-512 Acceleration
3.3.1 SHA-256 Instructions
3.3.2 SHA-512 Instructions
3.4 Scalar SM3 Acceleration
3.5 Scalar SM4 Acceleration
3.6 Instruction Latency Requirements

4 Entropy Source Extension
4.1 PollEntropy Instruction
4.2 Polling Mechanism with WFI
4.3 Entropy Source Requirements
4.3.1 Virtual Entropy Sources – Secret State Size Requirement
4.3.2 FIPS 140-3 / SP 800-90 IID Track
4.3.3 Common Criteria / AIS 31 PTG.2 Class
4.4 Security Controls (Tests)
4.5 GetNoise Test Interface

Appendix A Instruction Encodings
A.1 Decoding Strategy
A.2 Instruction Encodings Table: RV32
A.3 Instruction Encodings Table: RV64

Appendix B Entropy Source: Rationale and Discussion
B.1 Standards and Terminology
B.1.1 Entropy Source (ES)
B.1.2 Conditioning
B.1.3 Pseudorandom Number Generator (PRNG)
B.1.4 Deterministic Random Bit Generator (DRBG)
B.2 Specific Rationale and Considerations
B.3 Implementation Strategies
B.3.1 Noise Sources
B.3.2 Samplers and GetNoise
B.3.3 Continuous Health Tests
B.3.4 Non-cryptographic Conditioners
B.3.5 Cryptographic Conditioners
B.3.6 The Final Random: DRBGs
B.4 Quantum vs Classical Random

Appendix C Supplementary Materials, Code and RTL

Appendix D Supporting Sail Code

1 Introduction
This document describes the proposed scalar cryptography extension for RISC-V. All instructions proposed
here use the general purpose X registers, and obey the 2-read-1-write register access constraint.
These instructions are designed to be lightweight, and to be suitable for 32 and 64 bit base architectures,
from embedded, IoT class cores to large, application class cores which do not implement a vector unit.
This document also describes the architectural interface to an Entropy Source, which can be used to
generate cryptographic secrets. A companion document, “Volume II: Vector Instructions”, describes
instruction proposals which build on the RISC-V Vector Extension. In creating this proposal, we tried to
adhere to the following policies:
Policy #1: Where there is a choice between: 1) supporting diverse implementation strategies for an
algorithm or 2) supporting a single implementation style which is more performant / less expensive;
the crypto extension will pick the more constrained but performant option. This fits a common
pattern in other parts of the RISC-V specification, where recommended (but not required) instruction
sequences for performing particular tasks are given as an example, such that both hardware and
software implementers can optimise for only a single use-case.
Policy #2: The extension will be designed to support existing standardised cryptographic
constructs well. It will not try to support proposed standards, or cryptographic constructs which exist only
in academia. Cryptographic standards which are settled upon concurrently with, or after, the RISC-V
cryptographic extension standardisation will be dealt with by future additions to, or versions of, the
RISC-V cryptographic standard extension.
Policy #3: Historically, there has been some discussion [Lee+04] on how newly supported operations
in general purpose computing might enable new bases for cryptographic algorithms. The standard will
not try to anticipate new useful low level operations which may be useful as building blocks for future
cryptographic constructs.
Policy #4: Regarding side-channel countermeasures: Where relevant, proposed instructions must
aim to remove the possibility of any timing side-channels. For side-channels based on power or
electromagnetic (EM) measurements, the extension will not aim to support countermeasures which are
implemented above the ISA abstraction layer. Recommendations will be given where relevant on how
micro-architectures can implement instructions in a power/EM side-channel resistant way.

1.1 Sail Specifications


RISC-V maintains a formal model of the ISA specification², implemented in the Sail ISA specification
language [Sai]. Note that Sail refers to the specification language itself, and that there is a model of
RISC-V, written using Sail. It is not correct to refer to “the Sail model”. This is ambiguous, given there are
many models of different ISAs implemented using Sail. We refer to the Sail implementation of RISC-V
as “the RISC-V model”.
The Cryptography extension uses inline Sail code snippets from the actual model to give canonical
descriptions of instruction functionality. Each instruction is accompanied by its expression in Sail, and
includes calls to supporting functions which are too verbose to include directly in the specification. This
supporting code is listed in Appendix D.
The Sail Manual³ is recommended reading in order to best understand the code snippets.

² https://github.com/rems-project/sail-riscv
³ https://github.com/rems-project/sail/blob/sail2/manual.pdf

2 Implementation Profiles
All instructions in the scalar cryptography extension are grouped into functional sets and feature sets.
Functional sets are very fine grained, and are constructed around specific algorithms, standards
requirements, or small logical groupings of instructions. Feature sets are more coarse grained, and are what cores
are expected to implement. The letters used to construct Functional Set names are explained in Table 1.
The Feature Sets for instructions exclusive to the scalar cryptography extension are listed in Table 2.
All Bit-manipulation instructions used by the scalar cryptography extension (see Section 3.1) are always
included in the feature sets listed in Table 2. The exceptions are RV64-only instructions, which are not
included in RV32 based implementations.

Functional Set  Description
K               The default scalar cryptography extension, short for ZknZkr.
Zkg             Constant time carry-less multiply for Galois/Counter Mode.
Zkb             Bitmanip subset included in the scalar cryptography extension, minus those in Zkg.
Zkr             Entropy source for seeding random number generators.
Zkn             NIST algorithm suite. Short for ZkneZkndZknhZkgZkb.
Zkne            NIST AES Encryption Instructions.
Zknd            NIST AES Decryption Instructions.
Zknh            NIST SHA2 Hash function instructions.
Zks             ShangMi (SM) algorithm suite. Short for ZksedZkshZkgZkb.
Zksed           SM4 Instructions.
Zksh            SM3 Hash function instructions.

Table 1: Explanation of the feature strings used to refer to the functional sets.
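
The shorthand names in Table 1 nest: K stands for Zkn plus Zkr, and Zkn and Zks in turn stand for groups of finer-grained sets. The following Python sketch is only an illustration of that expansion. The dictionary contents are copied from Table 1, while the names SHORTHANDS and expand are hypothetical helpers and not part of any toolchain.

```python
# Shorthand functional-set names and what they stand for, copied from Table 1.
# SHORTHANDS and expand() are illustrative names, not part of the specification.
SHORTHANDS = {
    "K": ["Zkn", "Zkr"],
    "Zkn": ["Zkne", "Zknd", "Zknh", "Zkg", "Zkb"],
    "Zks": ["Zksed", "Zksh", "Zkg", "Zkb"],
}


def expand(name):
    """Recursively expand a shorthand into the set of leaf functional sets."""
    if name not in SHORTHANDS:
        return {name}
    leaves = set()
    for part in SHORTHANDS[name]:
        leaves |= expand(part)
    return leaves


if __name__ == "__main__":
    print(sorted(expand("K")))
    print(sorted(expand("Zks")))
```

Expanding "K" in this way yields the five Zkn members plus Zkr, which matches the description of -march=rv32ik given later in this section.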

Encryption and decryption instructions are separated into different functional groups because some
popular use cases (e.g. Galois/Counter Mode in TLS 1.3, among others) do not require decryption
functionality.
The NIST and ShangMi algorithm suites are separated because their usefulness is heavily dependent
on the countries a device is expected to operate in. NIST ciphers are a part of most standardised internet
protocols, while ShangMi ciphers are required for use in China.
Presence of the cryptography extension in any form is indicated by bit 10 of the MISA CSR, i.e. bit “K”.
“K” is used because “C” is taken, and stands for Kappa, the first letter of the ancient Greek word kruptós,
meaning “hidden”. Detection of fine-grained functionality uses the mechanisms defined by the tech-config
RISC-V Task Group. At the time of writing, these mechanisms are still being defined.
Some example GCC compiler strings:

• -march=rv32ik - Implement the 32-bit NIST feature set (Zkn), the entropy source feature set (Zkr),
and all of the 32-bit Bit-manipulation instructions used by the scalar cryptography extension listed
in Section 3.1.

• -march=rv64ik - As above, but implementing the 64-bit version of the NIST feature set, and
additional 64-bit instructions from the Bit-manipulation subset.

• -march=rv64i Zks Zkr - Implement the Entropy Source instructions, the 64-bit Bit-manipulation
instructions, and the XLEN independent ShangMi suite instructions.

• -march=rv64i Zkne Zknh - Implement only the 64-bit NIST AES Encryption and hash function
instructions.

                                     Feature Sets
Instructions         Functional Set  Zkn (RV32)  Zkn (RV64)  Zks (RV32)  Zks (RV64)  Zkr
aes32dsi             Zknd            ✓
aes32dsmi            Zknd            ✓
aes32esi             Zkne            ✓
aes32esmi            Zkne            ✓
aes64ds              Zknd                        ✓
aes64dsm             Zknd                        ✓
aes64es              Zkne                        ✓
aes64esm             Zkne                        ✓
aes64im              Zknd                        ✓
aes64ks1i            Zkne                        ✓
aes64ks2             Zkne                        ✓
sha256sig0           Zknh            ✓           ✓
sha256sig1           Zknh            ✓           ✓
sha256sum0           Zknh            ✓           ✓
sha256sum1           Zknh            ✓           ✓
sha512sig0h          Zknh            ✓
sha512sig0l          Zknh            ✓
sha512sig1h          Zknh            ✓
sha512sig1l          Zknh            ✓
sha512sum0r          Zknh            ✓
sha512sum1r          Zknh            ✓
sha512sig0           Zknh                        ✓
sha512sig1           Zknh                        ✓
sha512sum0           Zknh                        ✓
sha512sum1           Zknh                        ✓
sm3p0                Zksh                                    ✓           ✓
sm3p1                Zksh                                    ✓           ✓
sm4ed                Zksed                                   ✓           ✓
sm4ks                Zksed                                   ✓           ✓
pollentropy          Zkr                                                             ✓
getnoise             Zkr                                                             ✓
clmul, clmulh        Zkg             ✓           ✓           ✓           ✓
xperm.n, xperm.b     Zkb             ✓           ✓           ✓           ✓
ror, rol, rori       Zkb             ✓           ✓           ✓           ✓
roriw                Zkb                         ✓                       ✓
andn, orn, xnor      Zkb             ✓           ✓           ✓           ✓
pack, packu, packh   Zkb             ✓           ✓           ✓           ✓
packw, packuw        Zkb                         ✓                       ✓
rev.b, rev8 (grevi)  Zkb             ✓           ✓           ✓           ✓
rev8.w (grevi)       Zkb                         ✓                       ✓
zip (shfli)          Zkb             ✓           ✓           ✓           ✓
unzip (unshfli)      Zkb             ✓           ✓           ✓           ✓

Table 2: Feature sets for instructions in the scalar cryptography extension.

3 Scalar Cryptography Extension
As per the RISC-V Cryptographic Extensions Task Group charter: “The committee will also make ISA
extension proposals for lightweight scalar instructions for 32 and 64 bit machines that improve the
performance and reduce the code size required for software execution of common algorithms like AES and SHA
and lightweight algorithms like PRESENT and GOST.”

For context, some of these instructions have been developed based on academic work at the
University of Bristol as part of the XCrypto project [MPP19], and work by Paris Télécom on acceleration of
lightweight block ciphers [Teh+19].

3.1 Shared Bitmanip Extension Functionality


Many of the primitive operations used in symmetric key cryptography and cryptographic hash functions
are well supported by the RISC-V Bitmanip [Risb] extension⁴. We propose that the scalar cryptographic
extension reuse a subset of the instructions from the Bitmanip extension directly. Specifically, this would
mean that a core implementing either the scalar cryptographic extensions, or the Bitmanip extension, or
both, would be able to depend on the existence of these instructions.
The following subsections give the assembly syntax of instructions proposed for inclusion in the scalar
crypto extension, along with a set of use-cases for common algorithms or primitive operations. For
information on the semantics of the instructions, we refer directly to the Bitmanip draft specification.

3.1.1 Rotations
RISC-V Bitmanip/Crypto ISA
RV32, RV64:              RV64 only:
ror  rd, rs1, rs2        rorw  rd, rs1, rs2
rol  rd, rs1, rs2        rolw  rd, rs1, rs2
rori rd, rs1, imm        roriw rd, rs1, imm

See [Risa, Section 3.1.1] for details of these instructions. Standard bitwise rotation is a primitive
operation in many block ciphers and hash functions. It particularly features in the ARX (Add, Rotate,
Xor) class of block ciphers⁵ and stream ciphers.
Algorithms making use of 32-bit rotations: SHA256, AES (Shift Rows), ChaCha20, SM3.
Algorithms making use of 64-bit rotations: SHA512, SHA3.

3.1.2 Bit & Byte Permutations


RISC-V Bitmanip/Crypto ISA
RV32:
rev.b rd, rs1 // grevi rd, rs1, 7 - Reverse bits in bytes
rev8 rd, rs1 // grevi rd, rs1, 24 - Reverse bytes in 32-bit word

RV64:
rev.b rd, rs1 // grevi rd, rs1, 7 - Reverse bits in bytes
rev8 rd, rs1 // grevi rd, rs1, 56 - Reverse bytes in 64-bit word
rev8.w rd, rs1 // grevi rd, rs1, 24 - Reverse bytes in 32-bit words
⁴ At the time of writing, the Bitmanip extension is still undergoing standardisation. Please refer to the Bitmanip draft
specification [Risa] directly for the latest information, as it may be slightly ahead of what is described here.
⁵ https://www.cosic.esat.kuleuven.be/ecrypt/courses/albena11/slides/nicky_mouha_arx-slides.pdf

The scalar cryptography extension provides the instructions listed above for manipulating the bit and
byte endianness of data. They are all parameterisations of the Generalised Reverse with Immediate
(grevi) instruction. The scalar cryptography extension requires only the above instances of grevi be
implemented, which can be invoked via their pseudo-ops.
Reversing bytes in words is very common in cryptography when setting a standard endianness for
input and output data. Bit reversal within bytes is used for implementing the GHASH component of
Galois/Counter Mode (GCM) [Dwo07].
Cores which also implement the Bit-manipulation extension may also implement the complete grevi
instruction, with all immediate values supported. The full specification of the grevi instruction is available
in [Risa, Section 2.2.2].
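
As an informal illustration of how the rev.b, rev8 and rev8.w pseudo-ops relate to grevi, the following Python sketch models a generalised reverse as a sequence of swap stages, one per set bit of the immediate. It follows the usual description of grev in the Bitmanip draft as we understand it. The function name grev and its xlen parameter are illustrative, and the Bitmanip specification remains the authority on the exact semantics.

```python
def grev(x, shamt, xlen=32):
    """Generalised reverse sketch: for every set bit k of shamt, swap
    adjacent groups of k bits (k = 1, 2, 4, ...) across the whole word."""
    k = 1
    while k < xlen:
        if shamt & k:
            # Mask selecting the low k bits of every 2k-bit group.
            lo = 0
            for i in range(0, xlen, 2 * k):
                lo |= ((1 << k) - 1) << i
            x = ((x & lo) << k) | ((x >> k) & lo)
        k <<= 1
    return x & ((1 << xlen) - 1)


if __name__ == "__main__":
    print(hex(grev(0x11223344, 24)))            # rev8 on RV32
    print(hex(grev(0x01020304, 7)))             # rev.b on RV32
    print(hex(grev(0x0102030405060708, 56, 64)))  # rev8 on RV64
```

With an immediate of 7 only the 1, 2 and 4 bit stages run, which reverses bits inside each byte. With 24 (RV32) or 56 (RV64) only the byte and larger stages run, which reverses byte order.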
RISC-V Bitmanip/Crypto ISA
RV32:
zip rd, rs1 // shfli rd, rs1, 15 - Bit interleave
unzip rd, rs1 // unshfli rd, rs1, 15 - Bit de-interleave

The zip and unzip pseudo-ops are specific instances of the more general shfli and unshfli
instructions. The scalar cryptography extension requires only the above instances of [un]shfli be implemented,
which can be invoked via their pseudo-ops. Only RV32 implementations require these instructions.
They perform a bit-interleave (or de-interleave) operation, and are useful for implementing the 64-bit
rotations in the SHA3 [Nisc] algorithm on a 32-bit architecture⁶. On RV64, the relevant operations in
SHA3 can be done natively, so zip and unzip are not required.
Cores which also implement the Bit-manipulation extension may also implement the complete [un]shfli
instruction, with all immediate values supported. The full specification of the shfli instruction is
available in [Risa, Section 2.2.3].
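
The bit-interleave performed by zip on RV32 can be pictured with the following Python sketch. It assumes the mapping used by the Bitmanip draft as we read it: bit i of the low half moves to bit 2i, and bit i of the high half moves to bit 2i+1. The names zip32 and unzip32 are illustrative only, and the loop is written for clarity rather than speed.

```python
def zip32(x):
    """Bit interleave sketch for RV32: low-half bit i -> bit 2i,
    high-half bit i -> bit 2i + 1 (assumed mapping, see lead-in)."""
    r = 0
    for i in range(16):
        r |= ((x >> i) & 1) << (2 * i)
        r |= ((x >> (i + 16)) & 1) << (2 * i + 1)
    return r


def unzip32(x):
    """Inverse of zip32: even bits gather in the low half, odd bits in the high half."""
    r = 0
    for i in range(16):
        r |= ((x >> (2 * i)) & 1) << i
        r |= ((x >> (2 * i + 1)) & 1) << (i + 16)
    return r


if __name__ == "__main__":
    print(hex(zip32(0x0000FFFF)), hex(zip32(0xFFFF0000)))
    print(hex(unzip32(zip32(0x12345678))))
```

Keeping the even and odd bits of a 64-bit lane in two separate 32-bit registers is what lets a 64-bit rotation be expressed as two 32-bit rotations, which is the SHA3 use-case mentioned above.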

3.1.3 Carry-less Multiply


RISC-V Bitmanip/Crypto ISA
RV32, RV64:
clmul rd, rs1, rs2
clmulh rd, rs1, rs2

See [Risa, Section 2.6] for details of these instructions. As is mentioned there, the obvious cryptographic
use-case for carry-less multiply is Galois/Counter Mode (GCM) block cipher operations⁷. GCM is
recommended by NIST as a block cipher mode of operation [Dwo07], and is the only required mode for
the TLS 1.3 protocol.
See Section 3.6 for additional implementation requirements for these instructions, related to
data-independent execution latency.
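
For readers unfamiliar with carry-less multiplication, the following Python sketch shows the arithmetic: a shift-and-XOR multiply over GF(2), with clmul returning the low XLEN bits of the product and clmulh the high XLEN bits. The helper clmul_full is hypothetical. The loop branches on operand bits, so it illustrates the result only and says nothing about the constant-time behaviour required of hardware.

```python
def clmul_full(a, b):
    """Full carry-less product of a and b: like schoolbook multiplication,
    but partial products are combined with XOR instead of addition."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def clmul(rs1, rs2, xlen=32):
    """Low XLEN bits of the carry-less product."""
    return clmul_full(rs1, rs2) & ((1 << xlen) - 1)


def clmulh(rs1, rs2, xlen=32):
    """High XLEN bits of the carry-less product."""
    return clmul_full(rs1, rs2) >> xlen


if __name__ == "__main__":
    print(hex(clmul(3, 3)))  # (x + 1)^2 = x^2 + 1 over GF(2)
    print(hex(clmulh(0x80000000, 2)), hex(clmul(0x80000000, 2)))
```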

3.1.4 Logic With Negate


RISC-V Bitmanip/Crypto ISA
RV32, RV64:
andn rd, rs1, rs2
orn rd, rs1, rs2
xnor rd, rs1, rs2

See [Risa, Section 2.1.3] for details of these instructions. These instructions are useful inside hash
functions, block ciphers and for implementing software based side-channel countermeasures like masking.
⁶ It is also useful for the ASCON cipher, which is a candidate in the NIST Lightweight Cryptography competition.
⁷ https://en.wikipedia.org/wiki/Galois/Counter_Mode

The andn instruction is also useful for constant time word-select in systems without the ternary Bitmanip
cmov instruction.
Useful for: SHA3 Chi step, bit-sliced function implementations and software based power/EM side-
channel countermeasures based on masking.
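
The two uses of andn mentioned above can be sketched in Python as follows. The sketch assumes the Bitmanip semantics andn rd, rs1, rs2 = rs1 & ~rs2. The helpers select and chi_row are hypothetical names, and 32-bit lanes are used purely to keep the example small (SHA3 itself uses 64-bit lanes).

```python
MASK32 = 0xFFFFFFFF


def andn(rs1, rs2):
    """Model of andn: rs1 AND (NOT rs2), truncated to 32 bits."""
    return rs1 & ~rs2 & MASK32


def select(mask, a, b):
    """Branch-free word select: bits of a where mask is 1, bits of b where it is 0."""
    return (a & mask) | andn(b, mask)


def chi_row(lanes):
    """Keccak Chi on one row of five lanes: a[i] ^= (~a[i+1]) & a[i+2]."""
    return [lanes[i] ^ andn(lanes[(i + 2) % 5], lanes[(i + 1) % 5])
            for i in range(5)]


if __name__ == "__main__":
    print(hex(select(0xFFFF0000, 0xAAAAAAAA, 0x55555555)))
    print([hex(v) for v in chi_row([0, 0, MASK32, 0, 0])])
```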

3.1.5 Packing
RISC-V Bitmanip/Crypto ISA
RV32, RV64:              RV64:
pack  rd, rs1, rs2       packw  rd, rs1, rs2
packu rd, rs1, rs2       packuw rd, rs1, rs2
packh rd, rs1, rs2

See [Risa, Section 2.1.4] for details of these instructions. Some lightweight block ciphers (e.g. SPARX
[Din+16]) use sub-word data types in their primitives. The Bitmanip pack instructions are useful for
performing rotations on 16-bit data elements. They are also useful for re-arranging halfwords within
words, and generally getting data into the right shape prior to applying transforms. This is particularly
useful for cryptographic algorithms which pass inputs around as byte strings, but can operate on words
made out of those byte strings. This occurs for AES when loading blocks and keys (which may not be
word aligned) into registers to perform the round functions.
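
The following Python sketch illustrates the two uses described above: assembling a word from individual bytes, and rotating a 16-bit element by first duplicating it with pack. It assumes the Bitmanip draft semantics as we understand them (pack joins the low halves of rs1 and rs2, packu the high halves, packh the low bytes). The helpers word_from_bytes and rol16 are hypothetical.

```python
def pack(rs1, rs2, xlen=32):
    """Low half of rs1 in the low half of rd, low half of rs2 in the high half."""
    half = xlen // 2
    lo = (1 << half) - 1
    return ((rs2 & lo) << half) | (rs1 & lo)


def packu(rs1, rs2, xlen=32):
    """High half of rs1 in the low half of rd, high half of rs2 in the high half."""
    half = xlen // 2
    mask = (1 << xlen) - 1
    return (((rs2 & mask) >> half) << half) | ((rs1 & mask) >> half)


def packh(rs1, rs2):
    """Low byte of rs1 in bits 7:0, low byte of rs2 in bits 15:8, rest zero."""
    return ((rs2 & 0xFF) << 8) | (rs1 & 0xFF)


def word_from_bytes(b0, b1, b2, b3):
    """Assemble a little-endian 32-bit word from four separately loaded bytes."""
    return pack(packh(b0, b1), packh(b2, b3))


def rol16(x, n):
    """Rotate a 16-bit element left by n (0..16): duplicate it, then shift."""
    return (pack(x, x) >> (16 - n)) & 0xFFFF


if __name__ == "__main__":
    print(hex(word_from_bytes(0x11, 0x22, 0x33, 0x44)))
    print(hex(rol16(0x8001, 1)))
```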

3.1.6 Crossbar Permutation Instructions


RISC-V Bitmanip/Crypto ISA
RV32, RV64:
xperm.n rd, rs1, rs2
xperm.b rd, rs1, rs2

See [Risa, Section 2.2.4] for a complete description of this instruction.


The xperm.n instruction operates on nibbles. The rs1 register contains a vector of XLEN/4 4-bit
elements. The rs2 register contains a vector of XLEN/4 4-bit indexes. The result is each element in rs2
replaced by the indexed element in rs1, or zero if the index taken from rs2 is out of bounds.
The xperm.b instruction operates on bytes. The rs1 register contains a vector of XLEN/8 8-bit
elements. The rs2 register contains a vector of XLEN/8 8-bit indexes. The result is each element in rs2
replaced by the indexed element in rs1, or zero if the index taken from rs2 is out of bounds.
These instructions can be used to implement arbitrary bit permutations. For cryptography, they can
accelerate bit-sliced implementations, permutation layers of block ciphers, masking based countermeasures
and SBox operations.
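
The description above can be turned into a short Python model. This is a sketch of the behaviour as described in this section, not the Bitmanip reference code. The generic helper xperm and the PRESENT S-box usage are illustrative, with the S-box values taken from the published PRESENT specification.

```python
def xperm(rs1, rs2, width, xlen=32):
    """Crossbar permutation sketch: each width-bit element of rs2 is an index
    selecting an element of rs1; out-of-range indexes produce zero."""
    count = xlen // width
    elem_mask = (1 << width) - 1
    result = 0
    for i in range(count):
        index = (rs2 >> (i * width)) & elem_mask
        if index < count:
            element = (rs1 >> (index * width)) & elem_mask
            result |= element << (i * width)
    return result


def xperm_n(rs1, rs2, xlen=32):
    return xperm(rs1, rs2, 4, xlen)


def xperm_b(rs1, rs2, xlen=32):
    return xperm(rs1, rs2, 8, xlen)


# A 16-entry 4-bit S-box fits in one 64-bit register (PRESENT S-box shown).
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX_REG = sum(v << (4 * i) for i, v in enumerate(PRESENT_SBOX))

if __name__ == "__main__":
    print(hex(xperm_b(0x44332211, 0x00010203)))
    print(hex(xperm_n(SBOX_REG, 0x0123456789ABCDEF, 64)))
```

On RV64 a single xperm.n therefore applies a 4-bit S-box to all sixteen nibbles of a register at once, with no data-dependent memory access.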
Lightweight block ciphers using 4-bit SBoxes include PRESENT [Bog+07], Rectangle [Zha+15], GIFT [Ban+17],
Twine [Suz+12], Skinny, MANTIS [Bei+16], Midori [Ban+15].
National ciphers using 8-bit SBoxes include Camellia [Aok+00] (Japan), Aria [Kwo+03] (Korea), AES [Nisa]
(USA, Belgium), SM4 [Blo] (China) and Kuznyechik (Russia). All of these SBoxes can be implemented
efficiently, in constant time, using the xperm.b instruction⁸. Note that this technique is also suitable for
masking based side-channel countermeasures.

⁸ http://svn.clairexen.net/handicraft/2020/lut4perm/demo02.cc

3.2 Scalar AES Acceleration
This section details proposals for acceleration of the AES block cipher [Nisa] within a scalar RISC-V core,
obeying the two-read-one-write constraint on general purpose register file accesses. Supporting material,
including rationale and a design space exploration for these instructions can be found in [Mar+20].

3.2.1 RV32 Instructions


31:30  29:25  24:20  19:15  14:12  11:7   6:0      Instruction
bs     11011  rs2    rt     000    00000  0110011  aes32esmi
bs     11001  rs2    rt     000    00000  0110011  aes32esi
bs     11111  rs2    rt     000    00000  0110011  aes32dsmi
bs     11101  rs2    rt     000    00000  0110011  aes32dsi

RISC-V Crypto ISA


aes32esi rt, rs2, bs // Encrypt: SubBytes
aes32esmi rt, rs2, bs // Encrypt: SubBytes & MixColumns
aes32dsi rt, rs2, bs // Decrypt: SubBytes
aes32dsmi rt, rs2, bs // Decrypt: SubBytes & MixColumns

These instructions are a very lightweight proposal, derived from [Saa20a]. They are designed to
enable a partial T-Table based implementation of AES in hardware, where the SubBytes, ShiftRows
and MixColumns transformations are all rolled into a single instruction, with the per-byte results then
accumulated. The bs immediate operand is a 2-bit Byte Select, and indicates which byte of the input
word is operated on. Sail Model code for each instruction is found in Figure 1. Note that the instructions
source their destination register from bits 19:15 of the encoding, rather than the usual 11:7. This is
because the instructions are designed to be used such that the destination register is always the same as
rs1.
These instructions use the Equivalent Inverse Cipher construction [Nisa, Section 5.3.5]. This affects
the computation of the KeySchedule, as shown in [Nisa, Figure 15].

function clause execute (AES32 (bs, rs2, rt, op)) = {
  let rs1_val : xlenbits = X(rt);
  let rs2_val : xlenbits = X(rs2);
  let shamt   : bits(6)  = (0b0 @ bs @ 0b000); /* shamt = bs*8 */
  let si      : bits(8)  = (rs2_val >> shamt)[7..0]; /* SBox Input */
  let so      : bits(8)  = if aes_op_fwd(op) then aes_sbox_fwd(si)
                           else aes_sbox_inv(si);
  let mixed   : xlenbits =
    if aes_op_does_mix(op) then
      if aes_op_fwd(op) then aes_mixcolumn_byte_fwd(so)
      else aes_mixcolumn_byte_inv(so)
    else
      0x000000 @ so;
  let result  : xlenbits = rs1_val ^ (mixed << shamt) ^ (mixed >> (0b100000 - shamt));
  X(rt) = result;
  RETIRE_SUCCESS
}

Figure 1: Sail specification for the lightweight AES instructions targeting the RV32 base architecture.
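
For readers who prefer an executable form, the following Python sketch mirrors the encrypt path of Figure 1 (aes32esi and aes32esmi only). It is not the RISC-V model. The S-box is computed from its FIPS 197 definition rather than looked up, and the byte order assumed for aes_mixcolumn_byte_fwd (3·so, so, so, 2·so from most to least significant byte) is an assumption about the supporting code in Appendix D, which is not reproduced in this section.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sbox_fwd(x):
    """AES forward S-box: field inverse (x^254) followed by the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


def aes32_enc(rs1_val, rs2_val, bs, mix):
    """Sketch of aes32esi (mix=False) and aes32esmi (mix=True), after Figure 1."""
    shamt = 8 * bs
    so = sbox_fwd((rs2_val >> shamt) & 0xFF)
    if mix:
        # Assumed layout of aes_mixcolumn_byte_fwd: 3*so @ so @ so @ 2*so.
        mixed = (gf_mul(so, 3) << 24) | (so << 16) | (so << 8) | gf_mul(so, 2)
    else:
        mixed = so
    # Rotate the per-byte contribution left by shamt and accumulate into rs1.
    rotated = ((mixed << shamt) | (mixed >> (32 - shamt))) & 0xFFFFFFFF
    return rs1_val ^ rotated


if __name__ == "__main__":
    print(hex(aes32_enc(0, 0, 0, mix=True)))
```

Four such instructions, one per value of bs and each accumulating into the same register, would build up one output column of a round, which is how we read the "per-byte results then accumulated" remark above.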

3.2.2 RV64 Instructions
31:30  29:25  24:20       19:15  14:12  11:7  6:0      Instruction
00     11000  1 rcon<=10  rs1    001    rd    0010011  aes64ks1i
01     11111  rs2         rs1    000    rd    0110011  aes64ks2
00     11000  00000       rs1    001    rd    0010011  aes64im
00     11011  rs2         rs1    000    rd    0110011  aes64esm
00     11001  rs2         rs1    000    rd    0110011  aes64es
00     11111  rs2         rs1    000    rd    0110011  aes64dsm
00     11101  rs2         rs1    000    rd    0110011  aes64ds

RISC-V Crypto ISA


aes64ks1i rd, rs1, rcon // KeySchedule: SubBytes, Rotate, Round Const
aes64ks2 rd, rs1, rs2 // KeySchedule: XOR summation
aes64im rd, rs1 // KeySchedule: InvMixColumns for Decrypt
aes64esm rd, rs1, rs2 // Round: ShiftRows, SubBytes, MixColumns
aes64es rd, rs1, rs2 // Round: ShiftRows, SubBytes
aes64dsm rd, rs1, rs2 // Round: InvShiftRows, InvSubBytes, InvMixColumns
aes64ds rd, rs1, rs2 // Round: InvShiftRows, InvSubBytes

These instructions are for RV64 only. They implement the SubBytes, ShiftRows and MixColumns
transformations of AES. Each round instruction takes two 64-bit registers as input, representing the
128-bit state of the AES cipher, and outputs one 64-bit result, i.e. half of the next round state. The
byte mapping of input register values to AES state and output register values is shown in Figure 2a.
Pseudocode for the instructions is illustrated in Figure 3a.

• The aes64ks1i / aes64ks2 instructions are used in the encrypt KeySchedule. aes64ks1i
implements the rotation, SubBytes and Round Constant addition steps. aes64ks2 implements
the remaining xor operations.

• The aes64im instruction applies the inverse MixColumns transformation to two columns of the
state array, packed into a single 64-bit register. It is used to create the inverse cipher KeySchedule,
according to the equivalent inverse cipher construction in [Nisa, Page 23, Section 5.3.5].

• The aes64esm / aes64dsm instructions perform the (Inverse) SubBytes, ShiftRows and
MixColumns Transformations.

• The aes64es / aes64ds instructions perform the (Inverse) SubBytes and ShiftRows
Transformations. They are used for the last round only.

• Computing the next round state uses two instructions. The high or low 8 bytes of the next state
are selected by swapping the order of the source registers. The following code snippet shows one
round of the AES block encryption. t0 and t1 hold the current round state. t2 and t3 hold the
next round state.

    aes64esm t2, t0, t1   // ShiftRows, SubBytes, MixColumns bytes 0..7
    aes64esm t3, t1, t0   // ShiftRows, SubBytes, MixColumns bytes 8..15

This proposal requires 6 instructions per AES round: 2 ld instructions to load the round key, 2
xor to add the round key to the current state and 2 of the relevant AES encrypt/decrypt instructions

to perform the SubBytes, ShiftRows and MixColumns round functions. An un-rolled AES-128 block
encryption with an offline KeySchedule hence requires 69 instructions in total.
These instructions are amenable to macro-op fusion. The recommended sequences are:
    aes64esm rd1, rs1, rs2   // Different destination registers,
    aes64esm rd2, rs2, rs1   // identical source registers with swapped order.

This is similar to the recommended mulh, mul sequence in the M extension to compute a full 32 × 32 → 64
bit multiplication result [Wat+14, Section 7.1].
Unlike the 32-bit AES instructions, the 64-bit variants do not use the Equivalent Inverse Cipher
construction [Nisa, Section 5.3.5].

Figure 2: Mapping of AES state between input and output registers for the round instructions. Rout1
is given by aes64esm rd, rs1, rs2 , and Rout2 by aes64esm rd, rs2, rs1 . The [Inv]ShiftRows
blocks show how to select the relevant 8 bytes for further processing from the concatenation rs2 || rs1.

function crypto_aes64 (rd, rs1, rs2, enc, mix) = {
  let sr : bits(64) = match enc {
    true  => aes_rv64_shiftrows_fwd(X(rs2)[63..0], X(rs1)[63..0]), /* Encrypt */
    false => aes_rv64_shiftrows_inv(X(rs2)[63..0], X(rs1)[63..0])  /* Decrypt */
  };
  let wd : bits(64) = sr[63..0];
  let sb : bits(64) = match enc {
    true  => aes_apply_fwd_sbox_to_each_byte(wd), /* Encrypt */
    false => aes_apply_inv_sbox_to_each_byte(wd)  /* Decrypt */
  };
  X(rd) = match (mix, enc) {
    (true,  true ) => aes_mixcolumn_fwd(sb[63..32]) @ aes_mixcolumn_fwd(sb[31..0]),
    (true,  false) => aes_mixcolumn_inv(sb[63..32]) @ aes_mixcolumn_inv(sb[31..0]),
    (false, _    ) => sb
  };
  RETIRE_SUCCESS
}

function clause execute (AES64KS1I(rcon, rs1, rd)) = {
  let tmp1   : bits(32) = X(rs1)[63..32];
  let rc     : bits(32) = aes_decode_rcon(rcon);
  let tmp2   : bits(32) = if (rcon == 0xA) then tmp1 else ror32(tmp1, 8);
  let tmp3   : bits(32) = aes_sbox_fwd(tmp2[31..24]) @ aes_sbox_fwd(tmp2[23..16]) @
                          aes_sbox_fwd(tmp2[15..8])  @ aes_sbox_fwd(tmp2[7..0]);
  let result : bits(64) = (tmp3 ^ rc) @ (tmp3 ^ rc);
  X(rd) = EXTZ(result);
  RETIRE_SUCCESS
}

function clause execute (AES64KS2(rs2, rs1, rd)) = {
  let w0 : bits(32) = X(rs1)[63..32] ^ X(rs2)[31..0];
  let w1 : bits(32) = X(rs1)[63..32] ^ X(rs2)[31..0] ^ X(rs2)[63..32];
  X(rd) = w1 @ w0;
  RETIRE_SUCCESS
}

function clause execute (AES64IM(rs1, rd)) = {
  let w0 : bits(32) = aes_mixcolumn_inv(X(rs1)[31..0]);
  let w1 : bits(32) = aes_mixcolumn_inv(X(rs1)[63..32]);
  X(rd) = w1 @ w0;
  RETIRE_SUCCESS
}

Figure 3: Sail specification for the RV64 AES instructions.
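
The aes64ks2 clause in Figure 3 is small enough to restate directly. The Python sketch below mirrors it, and the comments give our reading of how it chains the key schedule XORs. That reading is an interpretation of the Sail code, not text from the specification, and the example values are arbitrary.

```python
M32 = 0xFFFFFFFF


def aes64ks2(rs1, rs2):
    """Sketch of AES64KS2 from Figure 3.

    rs1[63:32] carries the word to fold in (for example the aes64ks1i
    result, or the previously computed pair of round-key words), and rs2
    carries two words of the previous round key."""
    w0 = ((rs1 >> 32) ^ rs2) & M32   # w0 = rs1.hi ^ rs2.lo
    w1 = (w0 ^ (rs2 >> 32)) & M32    # w1 = rs1.hi ^ rs2.lo ^ rs2.hi
    return (w1 << 32) | w0


if __name__ == "__main__":
    print(hex(aes64ks2(0x1111111100000000, 0x3333333322222222)))
```

Because w1 is the running XOR of w0 and the next previous-key word, applying the instruction twice, the second time with the first result as rs1, produces four chained round-key words.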

3.3 Scalar SHA-256 / SHA-512 Acceleration
3.3.1 SHA-256 Instructions
31:30  29:25  24:20  19:15  14:12  11:7  6:0      Instruction
00     01000  00010  rs1    001    rd    0010011  sha256sig0
00     01000  00011  rs1    001    rd    0010011  sha256sig1
00     01000  00000  rs1    001    rd    0010011  sha256sum0
00     01000  00001  rs1    001    rd    0010011  sha256sum1

RISC-V Crypto ISA


sha256sig0 rd, rs1
sha256sig1 rd, rs1
sha256sum0 rd, rs1
sha256sum1 rd, rs1

The sha256.* instructions implement the four σ and Σ functions used in the SHA256 hash function
[Nisb, Section 4.1.2]. These operations are supported for both RV32 and RV64 targets. For RV32, the
entire XLEN source register is operated on. For RV64, the low 32-bits of the XLEN register are read and
operated on, with the result zero extended to XLEN bits. Though named for SHA256, the instructions
work for both the SHA-224 and SHA-256 parameterisations as described in [Nisb]. Sail Model code for
each instruction is found in Figure 4.

function crypto_sha256 (op, rd, rs1) = {
  let inb    : bits(32) = X(rs1)[31..0];
  let result : bits(32) = match op {
    OP_SHA256_SIG0 => ror32(inb,  7) ^ ror32(inb, 18) ^ (inb >>  3),
    OP_SHA256_SIG1 => ror32(inb, 17) ^ ror32(inb, 19) ^ (inb >> 10),
    OP_SHA256_SUM0 => ror32(inb,  2) ^ ror32(inb, 13) ^ ror32(inb, 22),
    OP_SHA256_SUM1 => ror32(inb,  6) ^ ror32(inb, 11) ^ ror32(inb, 25)
  };
  X(rd) = EXTZ(result);
  RETIRE_SUCCESS
}

Figure 4: Sail specification for the scalar RV32/RV64 SHA256 instructions.
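
A direct Python transcription of Figure 4 is given below as a sketch, together with one step of the SHA-256 message schedule to show where the two σ instructions would be used. The rotation and shift amounts are those of Figure 4. The helper schedule_word is hypothetical, and its recurrence is the one from the SHA-256 standard.

```python
M32 = 0xFFFFFFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & M32


def sha256sig0(rs1):
    x = rs1 & M32  # on RV64 only the low 32 bits are read
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3)


def sha256sig1(rs1):
    x = rs1 & M32
    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10)


def sha256sum0(rs1):
    x = rs1 & M32
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)


def sha256sum1(rs1):
    x = rs1 & M32
    return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25)


def schedule_word(w, t):
    """Message schedule step: W[t] = sig1(W[t-2]) + W[t-7] + sig0(W[t-15]) + W[t-16]."""
    return (sha256sig1(w[t - 2]) + w[t - 7] + sha256sig0(w[t - 15]) + w[t - 16]) & M32


if __name__ == "__main__":
    print(hex(sha256sig0(1)), hex(sha256sum0(1)))
```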

3.3.2 SHA-512 Instructions


31:30  29:25  24:20  19:15  14:12  11:7  6:0      Instruction
00     01000  00110  rs1    001    rd    0010011  sha512sig0
00     01000  00111  rs1    001    rd    0010011  sha512sig1
00     01000  00100  rs1    001    rd    0010011  sha512sum0
00     01000  00101  rs1    001    rd    0010011  sha512sum1

31:30  29:25  24:20  19:15  14:12  11:7  6:0      Instruction
01     01000  rs2    rs1    000    rd    0110011  sha512sum0r
01     01001  rs2    rs1    000    rd    0110011  sha512sum1r
01     01010  rs2    rs1    000    rd    0110011  sha512sig0l
01     01110  rs2    rs1    000    rd    0110011  sha512sig0h
01     01011  rs2    rs1    000    rd    0110011  sha512sig1l
01     01111  rs2    rs1    000    rd    0110011  sha512sig1h

RISC-V Crypto ISA


RV32: RV64:
sha512sum0r rd, rs1, rs2 sha512sig0 rd, rs1
sha512sum1r rd, rs1, rs2 sha512sig1 rd, rs1
sha512sig0l rd, rs1, rs2 sha512sum0 rd, rs1
sha512sig0h rd, rs1, rs2 sha512sum1 rd, rs1
sha512sig1l rd, rs1, rs2
sha512sig1h rd, rs1, rs2

The sha512.* instructions implement the four σ and Σ functions used in the SHA-512 hash function
[Nisb, Section 4.1.3].
The RV32 instructions work by concatenating the two 32-bit rs1 and rs2 registers into a 64-bit word.
The high or low 32 bits of the full 64-bit function result are then written to the destination register,
depending on the instruction.
For the sha512sum*r instructions, the operation is based purely on rotations; the high or low 32
bits of the result can be selected by swapping the input source registers to the instruction. For the
sha512sig*[l|h] instructions, which include shifts, the *l instruction writes the low 32 bits of the σ
transform, and the *h instruction writes the high 32 bits.
The RV64 instructions compute the entire σ and Σ functions based on a single input register, and
write the result to rd.
Though named for the SHA-512 parameterisation, the instructions can be used for all of the SHA-384,
SHA-512, SHA-512/224 and SHA-512/256 parameterisations as described in [Nisb].
Sail model code for the RV32 and RV64 instructions can be found in Figure 5 and Figure 6 respectively.
Note #3.1: The remaining two core functions in the SHA-256/512 algorithms are the Ch and Maj
functions:

• Ch(x,y,z) = z ^ (x & (y ^ z))

• Maj(x,y,z) = x ^ ((x ^ y) & (x ^ z))

As ternary functions, they are much too expensive in terms of opcode space to consider for inclusion
as dedicated instructions for such a specialist use-case.
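The two expressions in Note #3.1 are reduced-operation forms of the usual textbook definitions. The sketch below checks that equivalence exhaustively on single bits, which is sufficient because every operator involved is bitwise. The names `ch_ref`, `maj_ref` and `forms_agree` are introduced here for illustration only.

```python
from itertools import product

def ch(x, y, z):
    """Ch as written in Note #3.1: selects y where x is 1, z where x is 0."""
    return z ^ (x & (y ^ z))

def maj(x, y, z):
    """Maj as written in Note #3.1: bitwise majority vote of x, y, z."""
    return x ^ ((x ^ y) & (x ^ z))

def ch_ref(x, y, z, mask=0xFFFFFFFF):
    """Textbook form: (x AND y) XOR (NOT x AND z)."""
    return ((x & y) ^ (~x & z)) & mask

def maj_ref(x, y, z):
    """Textbook form: (x AND y) XOR (x AND z) XOR (y AND z)."""
    return (x & y) ^ (x & z) ^ (y & z)

def forms_agree():
    """Compare both forms on all eight single-bit input combinations."""
    return all(
        ch(x, y, z) == ch_ref(x, y, z, mask=1) and maj(x, y, z) == maj_ref(x, y, z)
        for x, y, z in product((0, 1), repeat=3)
    )
```

On a base ISA without dedicated instructions, each form costs three register-to-register logic operations for Ch and four for Maj.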

function crypto_sha512_rv32 (op, rd, rs1, rs2) = {
  let r1     : bits(32) = X(rs1)[31..0];
  let r2     : bits(32) = X(rs2)[31..0];
  let result : bits(32) = match op {
    OP_SHA512_SIG0L => (r1 >>  1) ^ (r1 >>  7) ^ (r1 >>  8) ^ (r2 << 31) ^ (r2 << 25) ^ (r2 << 24),
    OP_SHA512_SIG0H => (r1 >>  1) ^ (r1 >>  7) ^ (r1 >>  8) ^ (r2 << 31)              ^ (r2 << 24),
    OP_SHA512_SIG1L => (r1 <<  3) ^ (r1 >>  6) ^ (r1 >> 19) ^ (r2 >> 29) ^ (r2 << 26) ^ (r2 << 13),
    OP_SHA512_SIG1H => (r1 <<  3) ^ (r1 >>  6) ^ (r1 >> 19) ^ (r2 >> 29)              ^ (r2 << 13),
    OP_SHA512_SUM0R => (r1 << 25) ^ (r1 << 30) ^ (r1 >> 28) ^ (r2 >>  7) ^ (r2 >>  2) ^ (r2 <<  4),
    OP_SHA512_SUM1R => (r1 << 23) ^ (r1 >> 14) ^ (r1 >> 18) ^ (r2 >>  9) ^ (r2 << 18) ^ (r2 << 14)
  };
  X(rd) = EXTZ(result);
  RETIRE_SUCCESS
}

Figure 5: Sail specification for the scalar RV32 SHA512 instructions.

function crypto_sha512_rv64 (op, rd, rs1) = {
  let inb    : bits(64) = X(rs1)[63..0];
  let result : bits(64) = match op {
    OP_SHA512_SIG0 => ror64(inb,  1) ^ ror64(inb,  8) ^ (inb >> 7),
    OP_SHA512_SIG1 => ror64(inb, 19) ^ ror64(inb, 61) ^ (inb >> 6),
    OP_SHA512_SUM0 => ror64(inb, 28) ^ ror64(inb, 34) ^ ror64(inb, 39),
    OP_SHA512_SUM1 => ror64(inb, 14) ^ ror64(inb, 18) ^ ror64(inb, 41)
  };
  X(rd) = EXTZ(result);
  RETIRE_SUCCESS
}

Figure 6: Sail specification for the scalar RV64 SHA512 instructions.
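The relationship between the RV32 and RV64 forms can be illustrated with a small Python model covering σ0 and Σ0 (the σ1 and Σ1 cases follow the same pattern). This is a sketch under stated assumptions: the operand order used below (low word in rs1 for the *l form, high word in rs1 for the *h form, and swapped operands to obtain the high half of Σ0) is worked out from the expressions in Figure 5, and the `*_from_halves` helpers are names invented for this example.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def ror64(x, n):
    """Rotate a 64-bit value right by n bits (0 < n < 64)."""
    x &= MASK64
    return ((x >> n) | (x << (64 - n))) & MASK64

def sha512sig0(x):
    """RV64 form: the full 64-bit sigma0."""
    x &= MASK64
    return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7)

def sha512sum0(x):
    """RV64 form: the full 64-bit Sigma0."""
    return ror64(x, 28) ^ ror64(x, 34) ^ ror64(x, 39)

def sha512sig0l(r1, r2):
    r1, r2 = r1 & MASK32, r2 & MASK32
    return ((r1 >> 1) ^ (r1 >> 7) ^ (r1 >> 8) ^ (r2 << 31) ^ (r2 << 25) ^ (r2 << 24)) & MASK32

def sha512sig0h(r1, r2):
    r1, r2 = r1 & MASK32, r2 & MASK32
    return ((r1 >> 1) ^ (r1 >> 7) ^ (r1 >> 8) ^ (r2 << 31) ^ (r2 << 24)) & MASK32

def sha512sum0r(r1, r2):
    r1, r2 = r1 & MASK32, r2 & MASK32
    return ((r1 << 25) ^ (r1 << 30) ^ (r1 >> 28) ^ (r2 >> 7) ^ (r2 >> 2) ^ (r2 << 4)) & MASK32

def sig0_from_halves(lo, hi):
    """Rebuild the 64-bit sigma0 from two RV32 results."""
    return (sha512sig0h(hi, lo) << 32) | sha512sig0l(lo, hi)

def sum0_from_halves(lo, hi):
    """Rebuild the 64-bit Sigma0; the high half comes from swapping the operands."""
    return (sha512sum0r(hi, lo) << 32) | sha512sum0r(lo, hi)
```

On RV32, a full 64-bit σ or Σ evaluation therefore takes two instructions, one per result half.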

3.4 Scalar SM3 Acceleration
31:30  29:25  24:20  19:15  14:12  11:7  6:0
00     01000  01000  rs1    001    rd    0010011  sm3p0
00     01000  01001  rs1    001    rd    0010011  sm3p1

RISC-V Crypto ISA


RV32, RV64
sm3p1 rd, rs1
sm3p0 rd, rs1

These instructions are designed to accelerate the SM3 secure hash function [Ieta]. They are based on
work done in [Saa20b], and follow the same pattern as the scalar SHA2 instructions (Section 3.3).
The instructions implement versions of the P0 and P1 permutations, per the SM3 specification [Ieta].
Sail model code for each instruction is found in Figure 7.

function crypto_sm3 (op, rd, rs1) = {
  let r1     : bits(32) = X(rs1)[31..0];
  let result : bits(32) = match op {
    P0 => r1 ^ rol32(r1,  9) ^ rol32(r1, 17),
    P1 => r1 ^ rol32(r1, 15) ^ rol32(r1, 23)
  };
  X(rd) = EXTZ(result);
  RETIRE_SUCCESS
}

Figure 7: Sail specification for the scalar RV32/RV64 SM3 instructions.
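For readers who want to experiment with the two permutations, a minimal Python model of Figure 7 might look as follows. The helper `rol32` is an assumed name; the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def sm3p0(rs1):
    x = rs1 & MASK32  # only the low 32 bits of the source register are read
    return x ^ rol32(x, 9) ^ rol32(x, 17)

def sm3p1(rs1):
    x = rs1 & MASK32
    return x ^ rol32(x, 15) ^ rol32(x, 23)
```

Without these instructions, each permutation costs two rotations and two XORs when a rotate instruction is available, and more on a base ISA that must build each rotate from shifts.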

3.5 Scalar SM4 Acceleration
31:30  29:25  24:20  19:15  14:12  11:7   6:0
bs     11000  rs2    rt     000    00000  0110011  sm4ed
bs     11010  rs2    rt     000    00000  0110011  sm4ks

RISC-V Crypto ISA


RV32 / RV64:
sm4ed rt, rs2, bs
sm4ks rt, rs2, bs

This section proposes acceleration instructions for the SM4 block cipher [Blo; Ietb].
The instructions are taken from proposals found in [Saa20a]. They are very lightweight and require
only a single SBox instance. They are designed to give a very high performance improvement with
minimal area requirements, and resemble a T-table-style software implementation.

• sm4ed - Encrypt/Decrypt instruction. Applies the SBox and L transformations as part of the round
function.

• sm4ks - KeySchedule instruction. Applies the SBox and L' transformations as part of the KeySchedule.

Sail model code for each instruction is found in Figure 8. Note that the instructions source their
destination register from bits 19:15 of the encoding, rather than the usual 11:7. This is because the
instructions are designed to be used such that the destination register is always the same as rs1.

function clause execute (SM4ED (bs, rs2, rt)) = {
  let shamt  : bits(6)  = (0b0 @ bs @ 0b000); /* shamt = bs * 8 */
  let sbin   : bits(8)  = (X(rs2) >> shamt)[7..0];
  let x      : bits(32) = 0x000000 @ sm4_sbox(sbin);
  let y      : bits(32) = x ^ (x << 8) ^ (x << 2) ^
                          (x << 18) ^ ((x & 0x0000003F) << 26) ^
                          ((x & 0x000000C0) << 10);
  let z      : bits(32) = (y << shamt) ^ (y >> (0b100000 - shamt));
  let result : bits(32) = z ^ X(rt)[31..0];
  X(rt) = EXTZ(result);
  RETIRE_SUCCESS
}

function clause execute (SM4KS (bs, rs2, rt)) = {
  let shamt  : bits(6)  = (0b0 @ bs @ 0b000); /* shamt = bs * 8 */
  let sbin   : bits(8)  = (X(rs2) >> shamt)[7..0];
  let x      : bits(32) = 0x000000 @ sm4_sbox(sbin);
  let y      : bits(32) = x ^ ((x & 0x00000007) << 29) ^ ((x & 0x000000FE) << 7) ^
                          ((x & 0x00000001) << 23) ^ ((x & 0x000000F8) << 13);
  let z      : bits(32) = (y << shamt) ^ (y >> (0b100000 - shamt));
  let result : bits(32) = z ^ X(rt)[31..0];
  X(rt) = EXTZ(result);
  RETIRE_SUCCESS
}

Figure 8: Sail specification for the SM4 instructions.
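The data flow of sm4ed in Figure 8 (select one byte, substitute it, expand it, rotate it into position, then XOR into rt) can be sketched in Python as below. This is an illustration only: the SM4 S-box table is deliberately not reproduced, so the caller supplies it as the `sbox` argument, and the helper names `rol32` and `sm4ed_expand` are assumptions of this example.

```python
MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n bits; n = 0 returns x unchanged."""
    x &= MASK32
    n &= 31
    if n == 0:
        return x
    return ((x << n) | (x >> (32 - n))) & MASK32

def sm4ed_expand(x):
    """The expression assigned to y in the SM4ED clause of Figure 8."""
    return (x ^ (x << 8) ^ (x << 2) ^ (x << 18)
            ^ ((x & 0x3F) << 26) ^ ((x & 0xC0) << 10)) & MASK32

def sm4ed(rt, rs2, bs, sbox):
    """Model of sm4ed; `sbox` is a caller-supplied byte-to-byte function."""
    shamt = (bs & 0b11) * 8               # shamt = bs * 8
    sbin = (rs2 >> shamt) & 0xFF          # byte of rs2 selected by bs
    x = sbox(sbin) & 0xFF                 # substituted byte, zero-extended
    z = rol32(sm4ed_expand(x), shamt)     # rotate back into byte position
    return (z ^ rt) & MASK32              # accumulate into the old rt value
```

Calling the model four times with bs = 0, 1, 2, 3 and feeding each result back in as rt mirrors how a round would chain the instruction, which is why the destination doubles as a source.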

3.6 Instruction Latency Requirements
Any computation which operates on cryptographic secrets must not exhibit execution latency which is
correlated with the value of the secret. Integer multiply and carry-less multiply instructions are principally
affected by this requirement. Cores which implement the scalar cryptography extension must interpret
the following hint (footnote 9), which compares the register addresses of the two source operands, to
provide some guarantees of constant-time behaviour to cryptographic code:

• rs1 <= rs2 - The instruction must execute with a latency which is independent of the values of its
input operands.

• rs1 > rs2 - The instruction may execute with a latency which is independent of the values of its
input operands, but is not guaranteed to do so.

This hint is applied to the following instructions: mul, mulh, mulw, mulhu, mulhsu, clmul, clmulh. This
covers all of the integer multiply instructions in the M extension, and the carry-less multiply instructions
required by the scalar cryptography extension.
Cores which do not implement any data-dependent optimisations in their multipliers (e.g. early
termination, memoization) do not need to add any additional logic to interpret this hint. Note that this
hint must also be interpreted when fusing sequences of multiply instructions. For example, the sequence

mulh t0, a0, a1
mul  t1, a0, a1

may be fused into a single operation inside the core. However, because it meets the rs1 <= rs2 constraint,
the fused operation must still execute with a latency which is not dependent on the input operands.
When fusing instruction pairs which only partially meet the hint condition:

mulh t0, a0, a1 // a0 <= a1, latency must be independent of GPR[rs1], GPR[rs2]
mul  t1, a1, a0 // a1 > a0, not guaranteed to have data-independent latency.

cores do not need to abide by the data-independent latency constraint, as this is consistent with the
un-fused sequence. Note that the register address choices may result in different execution latencies,
so long as the numerical values contained in the sourced registers do not affect the execution latency.
For example, mul a1, a2, a3 (a1 = a2 * a3) may have a different latency profile from self-assignment
mul a2, a2, a3 (a2 *= a3) or squaring mul a1, a2, a2 (a1 = a2 * a2) - so long as the contents of
GPR[a1/a2/a3] do not affect the execution latency. None of these latency requirements apply to floating
point operations.
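A code generator or assembler macro could apply the hint mechanically. The sketch below assumes, as the a0/a1 examples above suggest, that the comparison is made on register numbers (for example a0 = x10, a1 = x11). The function names and the `COMMUTATIVE` set are hypothetical; mulhsu is left out of the set because its two operands are treated differently and cannot simply be swapped.

```python
# Multiply instructions from the hint list whose operands can be exchanged
# without changing the result (assumption of this sketch).
COMMUTATIVE = {"mul", "mulh", "mulw", "clmul", "clmulh"}

def hint_requires_constant_latency(rs1, rs2):
    """True when the encoding asks for data-independent latency (rs1 <= rs2)."""
    return rs1 <= rs2

def order_for_constant_latency(mnemonic, rs1, rs2):
    """Return the source register numbers ordered so that rs1 <= rs2 holds."""
    if mnemonic not in COMMUTATIVE:
        raise ValueError(f"{mnemonic} operands cannot be swapped safely")
    return (rs1, rs2) if rs1 <= rs2 else (rs2, rs1)
```

Squaring, where both sources are the same register, always satisfies rs1 <= rs2 and so always falls under the data-independent requirement.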

Footnote 9: We considered adding a CSR bit to turn on data-independent instruction latency guarantees,
but decided against this as it introduced new state.

4 Entropy Source Extension
The proposed RISC-V TRNG ISA is primarily an Entropy Source (ES) interface. A valid implementation
should satisfy properties that allow it to be used to seed standard and nonstandard cryptographic DRBGs
of virtually any state size and security level.
The purpose of this baseline specification is to guarantee that a simple, device-independent driver
component (e.g. in the Linux kernel, embedded firmware, or a cryptographic library) can use the ISA
instruction to generate truly random bits. See Appendix B for rationale and further discussion. This
section is also supported by [SNM20].

4.1 PollEntropy Instruction


The main ISA-level interface consists of a single pseudoinstruction, pollentropy, that returns a 32/64-bit
value in a CPU register. It is invoked in Machine Mode (which may be the only mode) as follows:
RISC-V Crypto ISA
RV32, RV64
pollentropy rd // Poll randomness. Encoding: csrrs rd, mentropy, x0

The pollentropy pseudoinstruction reads XLEN bits from the mentropy read-only machine-mode CSR
described in Table 3.

Bits    Name      Description
63:32   Set to 0  Upper bits are set to zero in RV64.
31:30   OPST      Status: BIST (00), ES16 (01), WAIT (10), DEAD (11).
29:24   reserved  For future use by the RISC-V specification.
23:16   custom    Reserved for custom and experimental use.
15:0    seed      16 bits of randomness, only when OPST=ES16.

Table 3: The mentropy CSR. It uses address 0xF15, a standard read-only machine-mode CSR.

The instruction is non-blocking and returns immediately, either with the two status bits mentropy[31:30]
= OPST set to ES16 (01), indicating successful polling, or with no entropy and one of three polling failure
statuses: BIST (00), WAIT (10), or DEAD (11), discussed below.
The sixteen bits of randomness in mentropy[15:0] = seed polled with ES16 status must be
cryptographically conditioned before they can be used as (up to 8 bits of) keying material. We suggest
that entropy output be post-processed in blocks of at least 256 bits, with a 128-bit resulting output block.
When OPST is not ES16, seed should be set to 0. An implementation may safely set reserved and
custom bits to zeros. A polling software interface should ignore their contents.
The status bits at mentropy[31:30] = OPST:

00 BIST indicates that Built-In Self-Test "on-demand" (BIST) statistical testing is being performed. In
typical implementations, BIST will last only a few milliseconds, up to a few hundred. If the system
returns temporarily to BIST from any other state, this signals a non-fatal (usually non-actionable)
self-test alarm. BIST is also used to signal test mode (getnoise).

01 ES16 indicates success; the low bits mentropy[15:0] will have 16 bits of randomness, which must
be guaranteed to have at least 8 bits of true entropy regardless of implementation. For example,
0x4000ABCD is a valid ES16 status output on RV32, with 0xABCD being the seed value.

[State diagram: panels NOISE_TEST = 0 and NOISE_TEST = 1; states BIST, WAIT, ES16, DEAD.]

Figure 9: Normally the operational state alternates between WAIT (no data) and ES16, which means
that 16 bits of randomness (seed) has been polled. BIST (Built-in Self-Test) only occurs after reset or to
signal a non-fatal self-test alarm (if reached after WAIT or ES16). DEAD is an unrecoverable error state.
In test mode (when GetNoise is active) WAIT and ES16 states are unavailable in PollEntropy.

10 WAIT means that a sufficient amount of entropy is not yet available. This is not an error condition
and may (in fact) be more frequent than ES16, since true entropy sources may not have very high
bandwidth. If polling in a loop, we suggest calling wfi (wait for interrupt) before the next poll.

11 DEAD is an unrecoverable self-test error. This may indicate a hardware fault, a security issue, or
(extremely rarely) a type-1 statistical false positive in the continuous testing procedures. Implemen-
tations do not need to implement DEAD as it may not require an end-user notification; an immediate
lock-down may be a more appropriate response in dedicated security devices.
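The field layout of Table 3 and the four status codes can be decoded as in the following sketch. The function name `decode_mentropy` and the choice to return `None` for the seed when the status is not ES16 are assumptions made for this example.

```python
OPST_NAMES = {0b00: "BIST", 0b01: "ES16", 0b10: "WAIT", 0b11: "DEAD"}

def decode_mentropy(value):
    """Split a value read from mentropy into (status name, seed or None)."""
    opst = (value >> 30) & 0b11           # mentropy[31:30]
    status = OPST_NAMES[opst]
    # The seed field mentropy[15:0] is only meaningful with ES16 status;
    # reserved and custom bits are ignored, as the text recommends.
    seed = (value & 0xFFFF) if status == "ES16" else None
    return status, seed
```

Applied to the example value from the text, `decode_mentropy(0x4000ABCD)` yields the ES16 status together with the seed 0xABCD.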

4.2 Polling Mechanism with WFI


Figure 9 illustrates operational state (OPST) transitions. The state is usually either WAIT or ES16. There
are no mandatory interrupts. However, the polling mechanism should be implemented in a way that
allows even generic non-interrupt drivers to benefit from interrupts.
We specifically recommend against busy-loop polling on this instruction, as it may have relatively
low bandwidth. Even though no specific interrupt sequence is specified, it is required that the wfi
(wait for interrupt) instruction is available. Cores which implement mentropy must not raise an Illegal
Instruction Exception when executing wfi unless required to do so by the Timeout Wait bit of the
mstatus register, as detailed in Section 3.1.6.5 of the Privileged ISA Manual [WA19]. The RISC-V ISA
allows wfi to be implemented as a nop. As a minimum requirement for portable drivers, a WAIT or
BIST from pollentropy should be followed by a wfi before another pollentropy instruction is
issued. There is no need to poll after a DEAD state.
To guarantee that no sensitive data is read twice and that different callers don’t get correlated output,
it is suggested that hardware implements “wipe-on-read” on the randomness pathway during each read
(successful poll). For the same reasons, only complete and fully processed randomness words shall be
made available via pollentropy (no half-conditioned buffers or even full buffers in WAIT state – even if
they are to be ignored by compliant callers).
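The polling discipline described in this section can be sketched as a small driver loop. Hardware access is abstracted away here: `read_mentropy` and `wfi` are caller-supplied callables standing in for the CSR read and the wfi instruction, and the `max_polls` bound is an assumption of this example rather than anything the specification requires.

```python
def poll_seed_words(read_mentropy, wfi, count, max_polls=100000):
    """Collect `count` 16-bit seed values, waiting after every WAIT or BIST."""
    seeds = []
    for _ in range(max_polls):
        value = read_mentropy()
        opst = (value >> 30) & 0b11
        if opst == 0b01:                  # ES16: a seed value is present
            seeds.append(value & 0xFFFF)
            if len(seeds) == count:
                return seeds
        elif opst == 0b11:                # DEAD: no point in polling again
            raise RuntimeError("entropy source reported DEAD")
        else:                             # BIST or WAIT: wfi before next poll
            wfi()
    raise TimeoutError("entropy source did not deliver enough seed values")
```

In a simulation, the two callables can be backed by a list of canned CSR values and a counter, which makes the wfi-after-WAIT behaviour easy to observe.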

4.3 Entropy Source Requirements
Output seed from pollentropy is not necessarily fully conditioned randomness due to hardware
limitations of smaller, low-powered implementations. However, minimum requirements are defined. A caller
should not use the output directly but poll twice the amount of required entropy, cryptographically
condition (hash) it, and use that to seed a cryptographic DRBG.
The expectation is that seed output passes typical randomness tests, but a pseudorandom conditioner
may cause weak entropy sources to pass such tests as well. The results of such tests should not be
confused with the security or robustness of an entropy source. Evaluation of entropy sources involves
an investigation of the stochastic model of the noise source, an analysis of the conditioning component,
continuous tests, etc. Entropy estimation (statistical testing) is just a part of this process.
The specification of entropy source requirements is complicated by the existence of two slightly
conflicting standards – NIST SP 800-90B [Tur+18] for FIPS 140-3 and AIS 31 [KS11] for Common Criteria
evaluations. RISC-V hardware vendors may design their entropy sources to meet either one of these
standards (as different types of evidence are required for each certification). It is also possible for
implementations to meet both criteria. Alternatively, for virtual entropy sources (DRBGs), the post-processed
output must meet the "256-bit security" requirements of Category 5 post-quantum cryptography.

4.3.1 Virtual Entropy Sources – Secret State Size Requirement


A pollentropy implementation can always output fully conditioned, perfectly distributed numbers.
However, it is required that if a DRBG is used as a source, it must have an internal state with at least 256
bits of secret entropy (Example: a CTR DRBG built from AES-128 is not sufficient). Any implementation
of pollentropy that limits the security strength shall not reduce it to less than 256 bits.
RISC-V requires drivers to implement at least 2-to-1 cryptographic post-processing in software with
the expectation that the final output from this post-processing should have "computationally
bounded full entropy". The computational bound is set so that a random-distinguishing attack should
require computational resources comparable to or greater than those required for key search on a block
cipher with a 256-bit key (e.g. AES-256). This is equivalent to a Category 5 classical or quantum adversary
[NIS16, Section 4.A.4 Security Strength Categories].
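A minimal sketch of the 2-to-1 software post-processing step is shown below. It assumes SHA-256 as the conditioning function and a big-endian packing of the 16-bit seed values; both choices, and the name `condition_seed_words`, are illustrative assumptions rather than requirements of this specification.

```python
import hashlib

def condition_seed_words(seed_words):
    """Condition at least sixteen 16-bit seed values (256 bits) into 128 bits."""
    if len(seed_words) < 16:
        raise ValueError("need at least 256 bits of seed input")
    # Pack each 16-bit seed value into two bytes (byte order is an assumption).
    raw = b"".join((w & 0xFFFF).to_bytes(2, "big") for w in seed_words)
    # Hash, then keep half as many bits as were polled (2-to-1 reduction).
    return hashlib.sha256(raw).digest()[:16]
```

The 16-byte result would then be used to seed or reseed a DRBG rather than being used directly as key material.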

4.3.2 FIPS 140-3 / SP 800-90 IID Track


For FIPS 140-3 certification, vendors should design their physical entropy sources so that they can be
submitted to the "IID track". This usually requires post-processing of noise in hardware (footnote 10).

§E1 Entropy Requirement. Each 16-bit output sample (seed) must have more than 8 bits of in-
dependent, unpredictable randomness. This minimum requirement is satisfied if (in a NIST SP
800-90B [Tur+18] assessment) 128 bits of output entropy can be obtained from each 256-bit (16×16-
bit) pollentropy output sequence via a vetted cryptographic conditioning algorithm (see Section
3.1.5.1.2 in [Tur+18]). This means that the actual SP 800-90B entropy assessment must yield
significantly more than 8 bits (vendors should aim at 16).
Driver developers may make this conservative assumption but are not prohibited from using more
than twice the number of seed bits relative to the desired resulting entropy.

§E2 SP 800-90B IID. The output must be close to Independent and Identically Distributed (IID),
meaning that the output distribution does not deteriorate over time and that output words convey
little information about each other. This requirement is satisfied if the construction of the physical
source and sampling mechanism suggests nothing against the IID assumption and the IID tests in
Section 5 of NIST SP 800-90B [Tur+18] are consistently passed.

Footnote 10: NIST SP 800-90B [Tur+18] and its FIPS 140-3 Implementation Guidance [NC20, Section D.K.]
are being revised at the time of writing. The type of appropriate post-processing for IID track depends
on the physical noise source. Some simple post-processing methods are currently considered to be part
of the sampling ("digitization") step of the process. Vetted or non-vetted conditioning may also be
required in hardware, forming a hardware-software conditioning chain.

Note that both requirements must be satisfied (§E1 may appear looser than §E2). FIPS 140-3 certifi-
cation of course imposes many additional requirements.

4.3.3 Common Criteria / AIS 31 PTG.2 Class


For alternative Common Criteria certification (or self-certification) vendors should target AIS 31 PTG.2
requirements [KS11, Sect. 4.3.]. Entropy sources (seed bits) are viewed as “internal random numbers” in
the context of AIS 31. Note that PTG.2 does not preclude other certification levels – especially PTG.3
when combined with appropriate post-processing and DRBG on the software side.
For validation purposes, the PTG.2 requirements may be mapped to security controls §T1-3 (Sect.
4.4) and the pollentropy interface as follows:

§P1 [PTG.2.1] Start-up tests map to §T1 and reset-triggered (on-demand) BIST tests.

§P2 [PTG.2.2] Continuous testing total failure maps to §T2 and the DEAD state.

§P3 [PTG.2.3] Online tests are continuous tests of §T2 – entropy output is prevented in the BIST state.

§P4 [PTG.2.4] Is related to the design of effective entropy source health tests, which we encourage.

§P5 [PTG.2.5] Raw random sequence may be checked via the GetNoise interface (Section 4.5).

§P6 [PTG.2.6] Test Procedure A [KS11, Sect 2.4.4.1] is part of the evaluation process, and we suggest
self-evaluation using these tests even if Common Criteria certification is not sought by a vendor.

§P7 [PTG.2.7] Average Shannon entropy of “internal random bits” exceeds 0.997.

Even though §E1, §E2, and post-processing imply that fewer than 16 of the seed bits may be designated
as internal random bits for §P7, we recommend that all 16 ES bits meet this requirement.
Note that in Common Criteria validation the SP 800-90B IID requirement of §E2 is not stated.
However, it may also be satisfied – and the H=0.997 level of §P7 / PTG.2.7 leaves relatively little "room"
for an entropy defect (mutual entropy). Also note that the SP 800-90B validation process is concerned
with min-entropy, not Shannon entropy, so these numbers are not directly comparable.
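The difference between the two entropy measures can be made concrete for a single biased bit. The sketch below uses the standard definitions; the function names and the example bias of 0.53 are chosen here for illustration and do not come from either standard.

```python
import math

def shannon_entropy(p):
    """Shannon entropy, in bits, of one bit that is 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def min_entropy(p):
    """Min-entropy, in bits, of the same bit: set by the likelier outcome."""
    return -math.log2(max(p, 1 - p))
```

For p = 0.53 the Shannon entropy is still slightly above 0.997 bits, while the min-entropy is only about 0.92 bits, which is why a source meeting the PTG.2.7 figure cannot be assumed to deliver the same number under an SP 800-90B assessment.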

4.4 Security Controls (Tests)


The primary purpose of a cryptographic entropy source is to produce secret keying material. In almost
all cases a hardware entropy source must implement appropriate security controls to guarantee
unpredictability, prevent leakage, detect attacks, and to deny adversarial control over the entropy output or
its generation mechanism. Security controls are not mandatory for RISC-V (in case of virtual entropy
sources) but are required for security certification.
Many of the security controls built into the device are called “health checks.” Health checks can
take the form of integrity checks, start-up tests, and on-demand tests. These tests can be implemented
in hardware or firmware; typically both. Several are mandated by standards such as NIST SP 800-90B
[NIS19]. The choice of appropriate health tests depends on the certification target, system architecture,
the threat model, entropy source type, and other factors.
Health checks are not intended for hardware diagnostics but for detecting security issues – hence
the default action should be aimed at damage control (prevent weak crypto keys from being generated).
Additional “debug” mechanisms may be implemented if necessary, but then the device must be outside
production use.

§T1 On-demand testing. A sequence of simple tests is invoked via resetting, rebooting, or powering-
up the hardware (not an ISA signal). The implementation will simply return BIST during the
initial start-up self-test period; in any case, the driver must wait for them to finish before starting
cryptographic operations. Upon failure the entropy source will enter a no-output DEAD state.

§T2 Continuous checks. If an error is detected in continuous tests or environmental sensors, the
entropy source will enter a no-output state. We define that a non-critical alarm is signaled if the
entropy source returns to BIST state from live (WAIT or ES16) states. Such a BIST alarm should be
latched until polled at least once. Critical failures will result in DEAD state immediately. A hardware-
based continuous testing mechanism must not make statistical information externally available, and
it must be zeroized periodically or upon demand via reset, power-up, or similar signal.

§T3 Fatal error states. Since the security of most cryptographic operations depends on the entropy
source, a system-wide “default deny” security policy approach is appropriate for most entropy source
failures. A hardware test failure should at least result in the DEAD state and possibly reset/halt. It’s
a show stopper: The entropy source (or its cryptographic client application) must not be allowed
to run if its secure operation can’t be guaranteed.

4.5 GetNoise Test Interface


The optional GetNoise interface allows access to "raw noise" and is mainly intended for manufacturer
tests and validation of security modules. It must not be used as a source of randomness or for other
production use. Its contents and behavior must be interpreted in the context of mvendorid, marchid,
and mimpid CSR identifiers, so getnoise is effectively "custom".
The interface consists of the mnoise machine-mode CSR, which (unlike mentropy) is read-write. We
define a pseudoinstruction for reading it:
RISC-V Crypto ISA
RV32, RV64
getnoise rd // Noise source test. Encoding: csrrs rd, mnoise, x0

The Crypto ISE defines the semantics of only a single bit, mnoise[31], which is named NOISE_TEST.
The only universal function is for enabling/disabling this interface. This is because the test interface
effectively disables pollentropy; this way a soft reset can also reset this feature.
The mnoise CSR uses address 0x7A9, indicating it is a standard read-write machine-mode CSR. This
places it adjacent to the debug/trace CSRs, indicating that it is not expected to be used in production.
When NOISE_TEST = 1 in getnoise and mnoise, pollentropy and mentropy must not return
anything via ES16; we recommend that it is in BIST state. When NOISE_TEST is again disabled, the
entropy source shall return from BIST via a zeroization and self-test mechanism (effectively a reset).
When not implemented (e.g. in virtual machines), getnoise can permanently read zero (0x00000000).
When available, but NOISE_TEST = 0, getnoise can return a nonzero constant such as 0x00000001.
The behavior of other input and output bits is left to the vendor. Although not used in production,
we recommend that the instruction is always non-blocking.

References
[AMD17] AMD. AMD Random Number Generator. AMD TechDocs. 2017. url: https://www.amd.
com/system/files/TechDocs/amd-random-number-generator.pdf.
[And20] Ross J. Anderson. Security engineering - a guide to building dependable distributed systems
(3. ed.) Wiley, 2020. isbn: 978-1-119-64278-7.
url: https://www.cl.cam.ac.uk/~rja14/book.html.
[Aok+00] Kazumaro Aoki, Tetsuya Ichikawa, Masayuki Kanda, Mitsuru Matsui, Shiho Moriai, Junko
Nakajima, and Toshio Tokita. “Camellia: A 128-bit block cipher suitable for multiple
platforms - design and analysis”. In: International Workshop on Selected Areas in Cryptography.
Springer. 2000, pp. 39–56.
[ARM17] ARM. ARM TrustZone True Random Number Generator: Technical Reference Manual. ARM
100976 0000 00 en (rev. r0p0). 2017. url: http://infocenter.arm.com/help/index.jsp?
topic=/com.arm.doc.100976_0000_00_en.
[Bak86] Per Bak. “The Devil’s Staircase”. In: Phys. Today 39.12 (1986), pp. 38–45. doi: 10.1063/1.
881047.
[Ban+15] Subhadeep Banik, Andrey Bogdanov, Takanori Isobe, Kyoji Shibutani, Harunaga Hiwatari,
Toru Akishita, and Francesco Regazzoni. “Midori: A block cipher for low energy”. In: Inter-
national Conference on the Theory and Application of Cryptology and Information Security.
Springer. 2015, pp. 411–436.
[Ban+17] Subhadeep Banik, Sumit Kumar Pandey, Thomas Peyrin, Yu Sasaki, Siang Meng Sim, and
Yosuke Todo. “GIFT: a small present”. In: International Conference on Cryptographic Hard-
ware and Embedded Systems. Springer. 2017, pp. 321–345.
[Bar+12] Romain Bardou, Riccardo Focardi, Yusuke Kawamoto, Lorenzo Simionato, Graham Steel,
and Joe-Kai Tsay. “Efficient Padding Oracle Attacks on Cryptographic Hardware”. In:
Advances in Cryptology - CRYPTO 2012 - 32nd Annual Cryptology Conference, Santa Barbara,
CA, USA, August 19-23, 2012. Proceedings. Ed. by Reihaneh Safavi-Naini and Ran Canetti.
Vol. 7417. Lecture Notes in Computer Science. Springer, 2012, pp. 608–625.
isbn: 978-3-642-32008-8. doi: 10.1007/978-3-642-32009-5_36.
[Bau+11] Mathieu Baudet, David Lubicz, Julien Micolod, and André Tassiaux. “On the Security of
Oscillator-Based Random Number Generators”. In: J. Cryptology 24.2 (2011), pp. 398–425.
doi: 10.1007/s00145-010-9089-3.
[BBS86] Lenore Blum, Manuel Blum, and Mike Shub. “A Simple Unpredictable Pseudo-Random Num-
ber Generator”. In: SIAM J. Comput. 15.2 (1986), pp. 364–383. doi: 10.1137/0215025.
[Bei+16] Christof Beierle, Jérémy Jean, Stefan Kölbl, Gregor Leander, Amir Moradi, Thomas Peyrin,
Yu Sasaki, Pascal Sasdrich, and Siang Meng Sim. “The SKINNY family of block ciphers and
its low-latency variant MANTIS”. In: Annual International Cryptology Conference. Springer.
2016, pp. 123–153.
[BK15] Elaine Barker and John Kelsey. Recommendation for Random Number Generation Using
Deterministic Random Bit Generators. NIST Special Publication SP 800-90A Revision 1.
2015. doi: 10.6028/NIST.SP.800-90Ar1.
[Blo] The SM4 Block Cipher Algorithm And Its Modes Of Operations. https://tools.ietf.org/
id/draft-crypto-sm4-00.html. Retrieved 26th March, 2020.
[Blu86] Manuel Blum. “Independent unbiased coin flips from a correlated biased source – A finite
state Markov chain”. In: Combinatorica 6.2 (1986), pp. 97–108. doi: 10.1007/BF02579167.

[Bog+07] Andrey Bogdanov, Lars R Knudsen, Gregor Leander, Christof Paar, Axel Poschmann, Matthew
JB Robshaw, Yannick Seurin, and Charlotte Vikkelsoe. “PRESENT: An ultra-lightweight
block cipher”. In: International workshop on cryptographic hardware and embedded systems.
Springer. 2007, pp. 450–466.
[Dav02] Robert B. Davies. Exclusive OR (XOR) and hardware random number generators. Author-
hosted manuscript. 2002. url: http://www.robertnz.net/pdf/xor2.pdf.
[Din+16] Daniel Dinu, Léo Perrin, Aleksei Udovenko, Vesselin Velichkov, Johann Großschädl, and Alex
Biryukov. “Design strategies for ARX with provable bounds: Sparx and LAX”. In: Interna-
tional Conference on the Theory and Application of Cryptology and Information Security.
Springer. 2016, pp. 484–513.
[Dwo07] Morris Dworkin. “NIST Special Publication 800-38D: Recommendation for Block Cipher
Modes of Operation: Galois/Counter Mode (GCM) and GMAC”. In: US National Institute
of Standards and Technology (2007).
url: http://csrc.nist.gov/publications/nistpubs/800-38D/SP-800-38D.pdf.
[Gro96] Lov K. Grover. “A Fast Quantum Mechanical Algorithm for Database Search”. In: Proceedings
of the Twenty-eighth Annual ACM Symposium on Theory of Computing. STOC ’96. ACM,
1996, pp. 212–219. doi: 10.1145/237814.237866. url: http://arxiv.org/pdf/quant-
ph/9605043.
[HKM12] Mike Hamburg, Paul Kocher, and Mark E. Marson. Analysis of Intel’s Ivy Bridge Digital
Random Number Generator. Technical Report, Cryptography Research (Prepared for Intel).
2012.
[HL98] Ali Hajimiri and Thomas H. Lee. “A general theory of phase noise in electrical oscillators”.
In: IEEE Journal of Solid-State Circuits 33.2 (1998), pp. 179–194. doi: 10.1109/4.658619.
[HLL99] Ali Hajimiri, Sotirios Limotyrakis, and Thomas H. Lee. “Jitter and phase noise in ring oscil-
lators”. In: IEEE Journal of Solid-State Circuits 34.6 (1999), pp. 790–804. doi: 10.1109/4.
766813. url: https://authors.library.caltech.edu/4916/1/HAJieeejssc99a.pdf.
[HSHC20] Darren Hurley-Smith and Julio César Hernández-Castro. “Quantum Leap and Crash: Search-
ing and Finding Bias in Quantum Random Number Generators”. In: ACM Transactions on
Privacy and Security 23.3 (2020), pp. 1–25. doi: 10.1145/3403643.
[Ieta] The SM3 Cryptographic Hash Function. https://tools.ietf.org/id/draft-oscca-cfrg-
sm3-02.html. Retrieved 12th March, 2020.
[Ietb] The SM4 Block Cipher Algorithm And Its Modes Of Operations. https://tools.ietf.org/
id/draft-crypto-sm4-00.html. Retrieved 26th March, 2020.
[ITU19] ITU. Quantum noise random number generator architecture. Recommendation ITU-T X.1702.
2019. url: https://www.itu.int/rec/T-REC-X.1702-201911-I/en.
[Jaq+20] Samuel Jaques, Michael Naehrig, Martin Roetteler, and Fernando Virdia. “Implementing
Grover Oracles for Quantum Key Search on AES and LowMC”. In: Advances in Cryptology -
EUROCRYPT 2020 - 39th Annual International Conference on the Theory and Applications
of Cryptographic Techniques, Zagreb, Croatia, May 10-14, 2020, Proceedings, Part II. Ed. by
Anne Canteaut and Yuval Ishai. Vol. 12106. Lecture Notes in Computer Science. Springer,
2020, pp. 280–310. isbn: 978-3-030-45723-5. doi: 10.1007/978-3-030-45724-2_10. url:
https://arxiv.org/pdf/1910.01700.pdf.
[JR58] Wilbur B. Davenport Jr. and William L. Root. An Introduction to the Theory of Random
Signals and Noise. McGraw-Hill, 1958, p. 401. url: https://ieeexplore.ieee.org/servlet/opac?bknumber=5265617.
[KS01] Wolfgang Killmann and Werner Schindler. A Proposal for: Functionality classes and evalua-
tion methodology for true (physical) random number generators. AIS 31, Version 3.1, English
Translation, BSI. 2001. url: https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_Functionality_classes_evaluation_methodology_for_true_RNG_e.html.
[KS11] Wolfgang Killmann and Werner Schindler. A Proposal for: Functionality classes for ran-
dom number generators. AIS 20 / AIS 31, Version 2.0, English Translation, BSI. 2011.
url: https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_Functionality_classes_for_random_number_generators_e.html.
[KSV13] Dusko Karaklajic, Jörn-Marc Schmidt, and Ingrid Verbauwhede. “Hardware Designer’s Guide
to Fault Attacks”. In: IEEE Trans. Very Large Scale Integr. Syst. 21.12 (2013), pp. 2295–
2306. doi: 10.1109/TVLSI.2012.2231707.
[Kwo+03] Daesung Kwon, Jaesung Kim, Sangwoo Park, Soo Hak Sung, Yaekwon Sohn, Jung Hwan
Song, Yongjin Yeom, E-Joong Yoon, Sangjin Lee, Jaewon Lee, et al. “New block cipher:
ARIA”. In: International Conference on Information Security and Cryptology. Springer. 2003,
pp. 432–445.
[Lac08] Patrick Lacharme. “Post-Processing Functions for a Biased Physical Random Number Gen-
erator”. In: Fast Software Encryption, 15th International Workshop, FSE 2008, Lausanne,
Switzerland, February 10-13, 2008, Revised Selected Papers. Ed. by Kaisa Nyberg. Vol. 5086.
Lecture Notes in Computer Science. Springer, 2008, pp. 334–342. isbn: 978-3-540-71038-7.
doi: 10.1007/978-3-540-71039-4_21.
[Lee+04] Ruby B Lee, ZJ Shi, Yiqun Lisa Yin, Ronald L Rivest, and Matthew JB Robshaw. “On
permutation operations in cipher design”. In: International Conference on Information Tech-
nology: Coding and Computing, 2004. Proceedings. ITCC 2004. Vol. 2. IEEE. 2004, pp. 569–
577.
[Lib+13] John S. Liberty, Adrian Barrera, David W. Boerstler, Thomas B. Chadwick, Scott R. Cot-
tier, H. Peter Hofstee, Julie A. Rosser, and Marty L. Tsai. “True hardware random number
generation implemented in the 32-nm SOI POWER7+ processor”. In: IBM J. Res. Dev. 57.6
(2013). doi: 10.1147/JRD.2013.2279599.
[M2̈0] Stephan Müller. Documentation and Analysis of the Linux Random Number Generator, Ver-
sion 3.6. Prepared for BSI by atsec information security GmbH. 2020. url: https://www.bsi.bund.de/SharedDocs/Downloads/EN/BSI/Publications/Studies/LinuxRNG/LinuxRNG_EN.pdf.
[Mar+20] Ben Marshall, G. Richard Newell, Dan Page, Markku-Juhani O. Saarinen, and Claire Wolf.
The design of scalar AES Instruction Set Extensions for RISC-V. Cryptology ePrint Archive,
Report 2020/930. https://eprint.iacr.org/2020/930. 2020.
[Mec18] John P. Mechalas. Intel Digital Random Number Generator (DRNG) Software Implementa-
tion Guide. Intel Technical Report, Version 2.1. 2018. url: https://software.intel.com/content/www/us/en/develop/articles/intel-digital-random-number-generator-drng-software-implementation-guide.html.
[MM09] A. Theodore Markettos and Simon W. Moore. “The Frequency Injection Attack on Ring-
Oscillator-Based True Random Number Generators”. In: Cryptographic Hardware and Embed-
ded Systems - CHES 2009, 11th International Workshop, Lausanne, Switzerland, September
6-9, 2009, Proceedings. Ed. by Christophe Clavier and Kris Gaj. Vol. 5747. Lecture Notes in
Computer Science. Springer, 2009, pp. 317–331. isbn: 978-3-642-04137-2. doi: 10.1007/978-3-642-04138-9_23.
[Mog+20] Daniel Moghimi, Berk Sunar, Thomas Eisenbarth, and Nadia Heninger. “TPM-FAIL: TPM
meets Timing and Lattice Attacks”. In: 29th USENIX Security Symposium (USENIX Security
20). USENIX Association, 2020, To appear. url: https://www.usenix.org/conference/
usenixsecurity20/presentation/moghimi-tpm.
[MPP19] Ben Marshall, Daniel Page, and Thinh Pham. XCrypto: a cryptographic ISE for RISC-V.
Tech. rep. 1.0.0. 2019. url: https://github.com/scarv/xcrypto.
[NC20] NIST and CCCS. Implementation Guidance for FIPS 140-3 and the Cryptographic Module
Validation Program. CMVP Update. 2020. url: https://csrc.nist.gov/CSRC/media/Projects/cryptographic-module-validation-program/documents/fips%20140-3/FIPS%20140-3%20IG.pdf.
[NCS20] NCSC. Quantum security technologies. White paper, Version 1.0. National Cyber Security
Centre (UK). 2020. url: https://www.ncsc.gov.uk/whitepaper/quantum-security-technologies.
[Neu51] John von Neumann. “Various Techniques Used in Connection with Random Digits”. In:
Monte Carlo Method. Ed. by A. S. Householder, G. E. Forsythe, and H. H. Germond. Vol. 12.
National Bureau of Standards Applied Mathematics Series. Washington, DC: US Government
Printing Office, 1951. Chap. 13, pp. 36–38. url: https://mcnp.lanl.gov/pdf_files/nbs_
vonneumann.pdf.
[Nisa] Advanced Encryption Standard (AES). National Institute of Standards and Technology (NIST)
Federal Information Processing Standard (FIPS) 197. 2001. url: https://www.nist.gov/
publications/advanced-encryption-standard-aes.
[Nisb] Secure Hash Standard (SHS). National Institute of Standards and Technology (NIST) Fed-
eral Information Processing Standard (FIPS) 180-4. 2015. url: https://csrc.nist.gov/
publications/detail/fips/180/4/final.
[Nisc] SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. National Insti-
tute of Standards and Technology (NIST) Federal Information Processing Standard (FIPS)
202. 2015. url: https://csrc.nist.gov/publications/detail/fips/202/final.
[NIS01] NIST. Advanced Encryption Standard (AES). Federal Information Processing Standards Pub-
lication FIPS 197. 2001. doi: 10.6028/NIST.FIPS.197.
[NIS15a] NIST. Secure Hash Standard (SHS). Federal Information Processing Standards Publication
FIPS 180-4. 2015. doi: 10.6028/NIST.FIPS.180-4.
[NIS15b] NIST. SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. Federal
Information Processing Standards Publication FIPS 202. 2015. doi: 10.6028/NIST.FIPS.
202.
[NIS16] NIST. Submission Requirements and Evaluation Criteria for the Post-Quantum Cryptography
Standardization Process. Official Call for Proposals, National Institute for Standards and
Technology. 2016. url: http://csrc.nist.gov/groups/ST/post-quantum-crypto/documents/call-for-proposals-final-dec-2016.pdf.
[NIS19] NIST. Security Requirements for Cryptographic Modules. Federal Information Processing
Standards Publication FIPS 140-3. 2019. doi: 10.6028/NIST.FIPS.140-3.
[NSA15] NSA/CSS. Commercial National Security Algorithm Suite. 2015. url: https://apps.nsa.
gov/iaarchive/programs/iad-initiatives/cnsa-suite.cfm.
[Ram20] Rambus. TRNG-IP-76 / EIP-76 Family of FIPS Approved True Random Generators. Com-
mercial Crypto IP. Formerly (2017) available from Inside Secure. 2020. url: https://www.rambus.com/security/crypto-accelerator-hardware-cores/basic-crypto-blocks/trng-ip-76/.
[Ric44] Stephen O. Rice. “Mathematical analysis of random noise (Parts I-II)”. In: The Bell System
Technical Journal 23.3 (1944), pp. 282–332. doi: 10.1002/j.1538-7305.1944.tb00874.x.
[Ric45] Stephen O. Rice. “Mathematical analysis of random noise (Parts III-IV)”. In: The Bell
System Technical Journal 24.1 (1945), pp. 46–156. doi: 10.1002/j.1538-7305.1945.tb00453.x.
[Risa] RISC-V Bit manipulation extension draft proposal. url: https://github.com/riscv/riscv-bitmanip/blob/master/bitmanip-draft.pdf.
[Risb] RISC-V Bit manipulation extension repository. url: https://github.com/riscv/riscv-
bitmanip.
[Risc] RISC-V Instruction Encoding Allocation Policy. url: https://docs.google.com/document/
d/1uC6QAyFmglGbO9kRR-X8LQWga6B3yBJR7-iw6ZXnfG8/edit#.
[Saa20a] Markku-Juhani O. Saarinen. Lightweight AES ISA. https://github.com/mjosaarinen/
lwaes_isa. Retrieved 24th January, 2020. Jan. 2020.
[Saa20b] Markku-Juhani O. Saarinen. Lightweight SHA ISA. https://github.com/mjosaarinen/
lwsha_isa. Retrieved 26th March, 2020. Mar. 2020.
[Sai] SAIL ISA Specification Language. url: https://github.com/rems-project/sail.
[Sal19] Jim Salter. How a months-old AMD microcode bug destroyed my weekend. Ars Technica. 2019.
url: https://arstechnica.com/gadgets/2019/10/how-a-months-old-amd-microcode-
bug-destroyed-my-weekend/.
[Sho94] Peter W. Shor. “Algorithms for quantum computation: Discrete logarithms and factoring”.
In: 35th Annual Symposium on Foundations of Computer Science, Santa Fe, New Mexico,
USA, 20-22 November 1994. IEEE, 1994, pp. 124–134. doi: 10.1109/SFCS.1994.365700.
url: https://arxiv.org/abs/quant-ph/9508027.
[SNM20] Markku-Juhani O. Saarinen, G. Richard Newell, and Ben Marshall. “Building a Modern
TRNG: An Entropy Source Interface for RISC-V”. In: 4th Workshop on Attacks and Solutions
in Hardware Security (ASHES’20), November 13, 2020, Virtual Event, USA. ACM, 2020. doi:
10.1145/3411504.3421212. url: https://eprint.iacr.org/2020/866.
[Suz+12] Tomoyasu Suzaki, Kazuhiko Minematsu, Sumio Morioka, and Eita Kobayashi. “TWINE: A
Lightweight Block Cipher for Multiple Platforms”. In: International Conference on Selected
Areas in Cryptography. Springer. 2012, pp. 339–354.
[Teh+19] Etienne Tehrani, Tarik Graba, Abdelmalek Si Merabet, Sylvain Guilley, and Jean-Luc Danger.
“Classification of Lightweight Block Ciphers for Specific Processor Accelerated Implementa-
tions”. In: 26th IEEE International Conference on Electronics Circuits and Systems. Nov.
2019.
[Tur+18] Meltem Sönmez Turan, Elaine Barker, John Kelsey, Kerry A. McKay, Mary L. Baish, and
Mike Boyle. Recommendation for the Entropy Sources Used for Random Bit Generation. NIST
Special Publication SP 800-90B. 2018. doi: 10.6028/NIST.SP.800-90B.
[Val+10] Boyan Valtchanov, Viktor Fischer, Alain Aubert, and Florent Bernard. “Characterization
of randomness sources in ring oscillator-based true random number generators in FPGAs”.
In: 13th IEEE International Symposium on Design and Diagnostics of Electronic Circuits
and Systems, DDECS 2010, Vienna, Austria, April 14-16, 2010. Ed. by Elena Gramatová,
Zdenek Kotásek, Andreas Steininger, Heinrich Theodor Vierhaus, and Horst Zimmermann.
IEEE Computer Society, 2010, pp. 48–53. isbn: 978-1-4244-6612-2. doi: 10.1109/DDECS.
2010.5491819. url: https://ieeexplore.ieee.org/xpl/conhome/5484099/proceeding.
[VD10] Michal Varchola and Milos Drutarovský. “New High Entropy Element for FPGA Based True
Random Number Generators”. In: Cryptographic Hardware and Embedded Systems, CHES
2010, 12th International Workshop, Santa Barbara, CA, USA, August 17-20, 2010. Proceed-
ings. Ed. by Stefan Mangard and François-Xavier Standaert. Vol. 6225. Lecture Notes in
Computer Science. Springer, 2010, pp. 351–365. isbn: 978-3-642-15030-2. doi: 10.1007/978-3-642-15031-9_24.
[WA19] Andrew Waterman and Krste Asanović, eds. The RISC-V Instruction Set Manual, Volume
II: Privileged Architecture. Document Version 20190608-Priv-MSU-Ratified. RISC-V Foun-
dation, 2019. url: https://riscv.org/specifications/.
[Wat+14] Andrew Waterman, Yunsup Lee, David Patterson, and Krste Asanovic. “The RISC-V in-
struction set manual”. In: Volume I: User-Level ISA, version 2 (2014).
[Zha+15] Wentao Zhang, Zhenzhen Bao, Dongdai Lin, Vincent Rijmen, Bohan Yang, and Ingrid Ver-
bauwhede. “RECTANGLE: a bit-slice lightweight block cipher suitable for multiple plat-
forms”. In: Science China Information Sciences 58.12 (2015), pp. 1–15.

A Instruction Encodings
This section describes the encodings for the Scalar Cryptography extension. It is developed in accordance
with the RISC-V Instruction Encodings Allocation Policy [Risc]. Listed below are the evaluation criteria
for the proposed encodings. Some of these are re-statements of the obvious, some are re-statements of
the Allocation Policy document, and some are added to acknowledge the nature of the instructions being
proposed.

1. The total number of encoding points used will be minimised.

2. The size of the decoder logic generally will be minimised.

3. All instructions will re-use existing encoding formats where appropriate. Where an instruction does
not exactly fit an existing format, the closest format will be used, with arguments as to why that
format is best.

4. The complexity of rejecting invalid instructions must be balanced with the complexity of identifying
valid instructions.

5. Where a particular design choice has been made, rationale will be provided and other rejected
options will be described. Where the design choice is somewhat arbitrary (at least to the designers'
knowledge), this too will be noted as an indication that someone might be able to do better.

6. Where instructions are exclusive to a particular base architecture (i.e. used on RV32 / RV64 only),
these encodings will be overlapped with other instructions which are exclusive to a different base
architecture. This allows the base architecture to act as an additional implicit bit in the encoding,
thus saving actual encoding space.

7. New encoding constraints such as relationships between register addresses will be sufficient but not
necessary to uniquely decode all of the instructions. This allows for additional saving of encoding
space if and only if the additional complexity and costs are deemed worthwhile.

A.1 Decoding Strategy


Scalar Cryptography instructions which write a single destination register and read two source registers
best fit in the OP major opcode, which contains other ALU-esque instructions which source two registers.
Scalar Cryptography instructions which source a single register (rs1) best fit into the OP-IMM major
opcode. Even instructions with only a single source register and no immediate are better placed in the
OP-IMM space, because the decoder will already have inferred that only a single register needs reading,
and can disregard the rs2 field.
The encodings for the Scalar Cryptography instructions are chosen based on two assumptions. First,
the instruction implementations share little or no logic with existing RISC-V instructions, and so can be
expected to be implemented independently. Second, it will be reasonable to implement all of the Scalar
Cryptography instructions as part of the same functional unit within the execution pipeline of a core.
Based on these two assumptions, we want to: very quickly identify that we are decoding any Scalar
Cryptography instruction, in order to select the correct functional unit to send it to; and easily identify
which Scalar Cryptography instruction we are decoding, so that it may be executed with
minimal effort by the functional unit.
Scalar Cryptography instructions all use func3=0b000 in the OP major opcode, or func3=0b001 in the
OP-IMM major opcode. Additionally, bit[28] is always set. This places them adjacent to the add/mul/sub
instructions in OP, or the slli/fsgnj.q instructions in OP-IMM but distinguished always by bit 28. Logic
for identifying each of these sub-field values already exists, so the cost of separating all Scalar Crypto
instructions from the rest of the ISA is of the order of three 2-input AND gates and a 2-input OR gate.

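// instr is the 32-bit instruction word; OP and OP_IMM are the 7-bit major opcode constants.
wire is_scalar_crypto = instr[28] && (
    (func3 == 3'b000 && opcode == OP    ) ||
    (func3 == 3'b001 && opcode == OP_IMM) );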
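
As an illustrative software model of the same predicate (a sketch, not part of the specification), the check can be written against a 32-bit instruction word. The helper `encode` and the two example words are hypothetical names introduced here; the field values are taken from the tables in Sections A.2 and A.3.

```python
OP = 0b0110011      # major opcode for register-register ALU instructions
OP_IMM = 0b0010011  # major opcode for register-immediate ALU instructions


def is_scalar_crypto(instr):
    """Return True if a 32-bit word lies in the Scalar Cryptography block."""
    opcode = instr & 0x7F        # bits 6:0
    func3 = (instr >> 12) & 0x7  # bits 14:12
    bit28 = (instr >> 28) & 0x1
    return bool(bit28) and (
        (func3 == 0b000 and opcode == OP)
        or (func3 == 0b001 and opcode == OP_IMM)
    )


def encode(func7, rs2, rs1, func3, rd, opcode):
    """Pack R-type style fields into a 32-bit instruction word (hypothetical helper)."""
    return (func7 << 25) | (rs2 << 20) | (rs1 << 15) | (func3 << 12) | (rd << 7) | opcode


# sha256sum0 x2, x1: func7=0001000, rs2 field=00000, func3=001, OP-IMM.
SHA256SUM0 = encode(0b0001000, 0b00000, 1, 0b001, 2, OP_IMM)
# add x3, x1, x2 from the base ISA: func7=0000000, so bit 28 is clear.
ADD = encode(0b0000000, 2, 1, 0b000, 3, OP)
```

The predicate only selects the functional unit; it says nothing about whether the remaining bits name a valid instruction.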

Individual instructions within the Scalar Cryptography extension are then interpreted based on the
func7 and rs2/shamt bitfields.
All Scalar Cryptography instructions with a single register operand can be distinguished with func7=0b0001000
in the OP-IMM major opcode. The rs2 field is then used to encode the exact kind of instruction. Although
this means that an instruction which does not read rs2 has a non-zero rs2 field (which might be inefficient
or cause unwanted register toggling in non-gated designs), it is more efficient in terms of opcode space to
re-use the rs2 field for encoding bits, and so leave additional func7 encodings free for more instructions
which use two source registers. The rs2 field overlaps with the shamt field in the OP-IMM major opcode,
meaning the decoder can treat the encoding bits as an immediate for the execution unit responsible for
cryptography instructions.
For single operand SHA instructions, bit 22 distinguishes SHA256 and SHA512, while bits 21:20
distinguish between sum0, sum1, sig0, and sig1. For the RV32-only SHA512 instructions, bits 26:25
distinguish between sig/sum and 0/1, while bit 27 distinguishes high/low variants of the sha512sig*
instructions. All of these bits were chosen arbitrarily. Having the operation bits for SHA512/SHA256
instructions at a different place in the instruction on RV32 is slightly inefficient and costs more gates, but
is much more efficient in terms of opcode space. All encodings taken up by sha512* instructions on RV32
are free for use by other instructions on RV64.
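
The single operand SHA decode described above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the check that bits 24:23 are zero is an assumption read off the encoding tables in Sections A.2 and A.3 (it keeps sm3p0/sm3p1 out of this decoder).

```python
OP_IMM = 0b0010011


def sha_single_operand_word(rs2_field, rs1=0, rd=0):
    """Assemble func7=0001000, func3=001, OP-IMM with the given rs2 field."""
    return ((0b0001000 << 25) | (rs2_field << 20) | (rs1 << 15)
            | (0b001 << 12) | (rd << 7) | OP_IMM)


def decode_sha_single_operand(instr):
    """Name the single operand SHA-2 instruction selected by bits 22:20.

    Assumes the caller has already matched func7, func3 and the major opcode.
    """
    if (instr >> 23) & 0b11:
        raise ValueError("bits 24:23 set: not a single operand SHA instruction")
    family = "sha512" if (instr >> 22) & 1 else "sha256"   # bit 22
    op = ("sum0", "sum1", "sig0", "sig1")[(instr >> 20) & 0b11]  # bits 21:20
    return family + op
```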
Bit 29 distinguishes between hash function instructions and block cipher instructions. Bit 25 distinguishes
between the SM4 and AES block ciphers. These bits were chosen arbitrarily.
All AES round instructions use bits 27:26 to distinguish between encrypt/decrypt and middle/final
round variants. The RV32 and RV64 AES round instructions overlap, as they are mutually exclusive given
the base ISA. This leaves some spare encoding space on RV64, when bits 31:30 are non-zero, due to
the bs immediate for the RV32 AES instructions. As single-source-register instructions, aes64ks1i and
aes64im live in the OP-IMM major opcode, adjacent to the hash function instructions but distinguished
from them by bit 29.
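
A minimal sketch of the AES round variant decode follows. The assignment of bit 27 to decrypt and bit 26 to the middle-round variant is not stated in the text; it is read off the encodings in Sections A.2 and A.3 and should be treated as an assumption. The function and helper names are hypothetical.

```python
OP = 0b0110011


def aes_round_word(func5, bs=0, rs2=0, rs1=0, rd=0):
    """Assemble an AES round word: bits 31:30 = bs, bits 29:25 = func5, OP opcode."""
    return (bs << 30) | (func5 << 25) | (rs2 << 20) | (rs1 << 15) | (rd << 7) | OP


def decode_aes_round(instr):
    """Return (direction, round kind) from bits 27:26 of an AES round instruction."""
    direction = "decrypt" if (instr >> 27) & 1 else "encrypt"
    round_kind = "middle" if (instr >> 26) & 1 else "final"
    return direction, round_kind
```

For example, the func5 value 11011 shared by aes32esmi and aes64esm decodes to an encrypt, middle-round operation.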
For the SM4 and RV32 AES instructions, we introduce a new destructive instruction format, where the
destination register is sourced from the rs1 field. We call this combined source/destination register rt.
This reflects the fact that these instructions are designed to always use their source/destination registers
in this way. This takes the number of encoding points used per RV32 AES / SM4 instruction from 131072
(2^17) to 4096 (2^12). We source rd from rs1 (rather than the reverse) because on the smaller cores we expect
these instructions to be used on, it is more important to be able to decode rs1 quickly than rd. The former rd
field (bits 11:7) is re-used as encoding space.
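
The destructive format and its effect on the encoding point count can be sketched as below. The function name `encode_rt_format` and the dictionary are hypothetical; the bit values are copied from the tables in Sections A.2 and A.3.

```python
OP = 0b0110011

# Bits 29:25 for each instruction using the rt (destructive) format.
RT_FORMAT_FUNC5 = {
    "sm4ed": 0b11000,
    "sm4ks": 0b11010,
    "aes32esmi": 0b11011,
    "aes32esi": 0b11001,
    "aes32dsmi": 0b11111,
    "aes32dsi": 0b11101,
}


def encode_rt_format(name, bs, rs2, rt):
    """Assemble bs | func5 | rs2 | rt | 000 | 00000 | OP."""
    if not (0 <= bs < 4 and 0 <= rs2 < 32 and 0 <= rt < 32):
        raise ValueError("field out of range")
    return (bs << 30) | (RT_FORMAT_FUNC5[name] << 25) | (rs2 << 20) | (rt << 15) | OP


# Free operand bits per instruction: bs (2) + rs2 (5) + rs1 (5) [+ rd (5)].
FREE_BITS_WITH_RD = 2 + 5 + 5 + 5   # separate rd field
FREE_BITS_RT = 2 + 5 + 5            # rd folded into rs1 as rt
```

Dropping the 5-bit rd field divides the encoding points per instruction by 32, which is the reduction from 131072 to 4096 quoted above.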
Table 4 shows the number of encoding points taken up by each instruction individually, and all
instructions for either the RV32 or RV64 base architecture. The entire 32-bit instruction opcode space
is 2^30 encoding points. The instructions exclusive to the Scalar Cryptography extension (i.e. ignoring
those borrowed from Bitmanip) take up 2.12 × 10^-2 % and 1.86 × 10^-2 % of the RV32 and RV64 encoding
spaces respectively.
The actual encodings for all RV32 and RV64 instructions are found in sections A.2 and A.3 respectively,
or alongside each instruction description.

Instruction    RV32      RV64      Notes
sm4ed          4096      4096      Down from 131072 when requiring rs1=rd.
sm4ks          4096      4096      Down from 131072 when requiring rs1=rd.
sm3p0          1024      1024
sm3p1          1024      1024
sha256sum0     1024      1024
sha256sum1     1024      1024
sha256sig0     1024      1024
sha256sig1     1024      1024
aes32esmi      4096      -         Down from 131072 when requiring rs1=rd.
aes32esi       4096      -         Down from 131072 when requiring rs1=rd.
aes32dsmi      4096      -         Down from 131072 when requiring rs1=rd.
aes32dsi       4096      -         Down from 131072 when requiring rs1=rd.
sha512sum0r    32768     -
sha512sum1r    32768     -
sha512sig0l    32768     -
sha512sig0h    32768     -
sha512sig1l    32768     -
sha512sig1h    32768     -
aes64ks1i      -         16384     Require rcon<=0xA
aes64ks2       -         32768
aes64im        -         1024
aes64esm       -         32768     Overlaps with aes32esmi.
aes64es        -         32768     Overlaps with aes32esi.
aes64dsm       -         32768     Overlaps with aes32dsmi.
aes64ds        -         32768     Overlaps with aes32dsi.
sha512sum0     -         1024
sha512sum1     -         1024
sha512sig0     -         1024
sha512sig1     -         1024
Totals         227328    199680

Table 4: Encoding point counts for each instruction. Note pollentropy and getnoise are not shown,
as they are pseudo-ops relating to CSR reads.
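
The totals and percentages quoted for Table 4 can be reproduced with a short calculation. This is a sketch for checking the arithmetic only; the grouping into common, RV32-only and RV64-only instructions is inferred from the instruction names and the column totals.

```python
COMMON = {
    "sm4ed": 4096, "sm4ks": 4096, "sm3p0": 1024, "sm3p1": 1024,
    "sha256sum0": 1024, "sha256sum1": 1024, "sha256sig0": 1024, "sha256sig1": 1024,
}
RV32_ONLY = {
    "aes32esmi": 4096, "aes32esi": 4096, "aes32dsmi": 4096, "aes32dsi": 4096,
    "sha512sum0r": 32768, "sha512sum1r": 32768, "sha512sig0l": 32768,
    "sha512sig0h": 32768, "sha512sig1l": 32768, "sha512sig1h": 32768,
}
RV64_ONLY = {
    "aes64ks1i": 16384, "aes64ks2": 32768, "aes64im": 1024,
    "aes64esm": 32768, "aes64es": 32768, "aes64dsm": 32768, "aes64ds": 32768,
    "sha512sum0": 1024, "sha512sum1": 1024, "sha512sig0": 1024, "sha512sig1": 1024,
}
OPCODE_SPACE = 2 ** 30  # encoding points in the 32-bit instruction space


def total(*tables):
    """Sum the encoding points over one or more instruction tables."""
    return sum(sum(table.values()) for table in tables)


def percent_of_space(points):
    """Express a number of encoding points as a percentage of the opcode space."""
    return 100.0 * points / OPCODE_SPACE


RV32_TOTAL = total(COMMON, RV32_ONLY)
RV64_TOTAL = total(COMMON, RV64_ONLY)
```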

A.2 Instruction Encodings Table: RV32
31:20 (CSR)   19:15  14:12  11:7   6:0      Instruction
111100010101  00000  010    rd     1110011  pollentropy
011110101001  00000  010    rd     1110011  getnoise

31:30  29:25  24:20  19:15  14:12  11:7   6:0      Instruction
00     01000  01000  rs1    001    rd     0010011  sm3p0
00     01000  01001  rs1    001    rd     0010011  sm3p1
00     01000  00000  rs1    001    rd     0010011  sha256sum0
00     01000  00001  rs1    001    rd     0010011  sha256sum1
00     01000  00010  rs1    001    rd     0010011  sha256sig0
00     01000  00011  rs1    001    rd     0010011  sha256sig1
01     01000  rs2    rs1    000    rd     0110011  sha512sum0r
01     01001  rs2    rs1    000    rd     0110011  sha512sum1r
01     01010  rs2    rs1    000    rd     0110011  sha512sig0l
01     01110  rs2    rs1    000    rd     0110011  sha512sig0h
01     01011  rs2    rs1    000    rd     0110011  sha512sig1l
01     01111  rs2    rs1    000    rd     0110011  sha512sig1h
bs     11000  rs2    rt     000    00000  0110011  sm4ed
bs     11010  rs2    rt     000    00000  0110011  sm4ks
bs     11011  rs2    rt     000    00000  0110011  aes32esmi
bs     11001  rs2    rt     000    00000  0110011  aes32esi
bs     11111  rs2    rt     000    00000  0110011  aes32dsmi
bs     11101  rs2    rt     000    00000  0110011  aes32dsi

A.3 Instruction Encodings Table: RV64
31:20 (CSR)   19:15  14:12  11:7   6:0      Instruction
111100010101  00000  010    rd     1110011  pollentropy
011110101001  00000  010    rd     1110011  getnoise

31:30  29:25  24:20       19:15  14:12  11:7   6:0      Instruction
bs     11000  rs2         rt     000    00000  0110011  sm4ed
bs     11010  rs2         rt     000    00000  0110011  sm4ks
00     01000  01000       rs1    001    rd     0010011  sm3p0
00     01000  01001       rs1    001    rd     0010011  sm3p1
00     01000  00000       rs1    001    rd     0010011  sha256sum0
00     01000  00001       rs1    001    rd     0010011  sha256sum1
00     01000  00010       rs1    001    rd     0010011  sha256sig0
00     01000  00011       rs1    001    rd     0010011  sha256sig1
00     01000  00100       rs1    001    rd     0010011  sha512sum0
00     01000  00101       rs1    001    rd     0010011  sha512sum1
00     01000  00110       rs1    001    rd     0010011  sha512sig0
00     01000  00111       rs1    001    rd     0010011  sha512sig1
00     11000  1 rcon<=10  rs1    001    rd     0010011  aes64ks1i
00     11000  00000       rs1    001    rd     0010011  aes64im
01     11111  rs2         rs1    000    rd     0110011  aes64ks2
00     11011  rs2         rs1    000    rd     0110011  aes64esm
00     11001  rs2         rs1    000    rd     0110011  aes64es
00     11111  rs2         rs1    000    rd     0110011  aes64dsm
00     11101  rs2         rs1    000    rd     0110011  aes64ds

B Entropy Source: Rationale and Discussion
The security of cryptographic systems is based on secret bits and keys. To prevent guessing, these bits
need to be random, so they come from True Random Number Generators (TRNGs).
As a fundamental security function, the generation of random numbers is governed by numerous
standards and technical requirements.

B.1 Standards and Terminology


A driving design goal for our architecture was for it to be easy to implement, yet compatible with current
versions of FIPS 140-3 [NIS19] and NIST SP 800-90B [Tur+18], significantly updated standards that are
only coming into use in 2020. Naturally, the architecture should also support other RNG frameworks
such as German AIS 20 / 31 [KS01; KS11] which is widely used in Common Criteria evaluations.
These standards set many of the technical requirements for the design, and we use their terminology
where possible. Note that FIPS 140-3 / SP 800-90B (our main target) imposes requirements on min-entropy,
while AIS-31 discusses Shannon entropy as well. The delineation of various components is illustrated in
Figure 10.
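
The difference between the two entropy measures can be illustrated numerically. The sketch below uses the standard definitions (Shannon entropy as the expected value of -log2 p, min-entropy as -log2 of the most likely outcome's probability); the function names are chosen here for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits: -sum(p * log2(p)) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def min_entropy(probs):
    """Min-entropy in bits: -log2 of the most likely outcome's probability."""
    return -math.log2(max(probs))


# A biased three-outcome source: min-entropy is the more conservative figure.
BIASED = [0.5, 0.25, 0.25]
```

For this distribution the Shannon entropy is 1.5 bits while the min-entropy is only 1 bit, since an attacker guessing the most likely outcome succeeds half the time.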

B.1.1 Entropy Source (ES)


Physical sources of true randomness are called Entropy Sources (ES) [Tur+18]. They are built by sampling
and processing data from a noise source (Section B.3.1). Since these are directly based on natural
phenomena and are subject to environmental conditions (which may be adversarial), they require features
and sensors that monitor the “health” and quality of those sources. See Section 4.4 for a discussion about
such security controls.

B.1.2 Conditioning
Raw physical randomness (noise) sources are rarely statistically perfect and some generate very large
amounts of bits, which need to be “debiased” and reduced to a smaller number of bits. This process is
called conditioning. A secure hash function is an example of a cryptographic conditioner. It is important
to note that even though hashing may make the output look more random, it does not increase its entropy
content.
Non-cryptographic conditioners and extractors such as von Neumann’s “debiased coin tossing” [Neu51]
are easier to implement efficiently but may reduce entropy content (in individual bits removed) more than
cryptographic hashes, which mix the input entropy very efficiently. However, non-cryptographic conditioners
are not based on computational hardness assumptions and are therefore inherently more future-proof. See
Section B.3.4 for a more detailed discussion.
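
As a minimal sketch of von Neumann's debiasing procedure: input bits are taken in non-overlapping pairs, a pair with two different bits emits its first bit, and a pair with two equal bits is discarded. The result is unbiased only under the assumption that the input bits are independent with a fixed bias; the function name is chosen here for illustration.

```python
def von_neumann_debias(bits):
    """Debias a list of 0/1 values using non-overlapping pairs.

    (0, 1) -> 0, (1, 0) -> 1, (0, 0) and (1, 1) are discarded.
    A trailing unpaired bit is ignored.
    """
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(first)
    return out
```

The output is never longer than half the input, which is the sense in which such an extractor removes bits rather than mixing them.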

B.1.3 Pseudorandom Number Generator (PRNG)


Pseudorandom Number Generators (PRNGs) use deterministic mathematical formulas to create a large
amount of random numbers from a smaller amount of “seed” randomness. PRNGs are divided into
cryptographic and non-cryptographic ones.
Non-cryptographic PRNGs, such as the linear-congruential generators found in many programming
libraries, may generate statistically satisfactory random numbers but must never be used for cryptographic
keying. This is because they are not designed to resist cryptanalysis; it is usually possible to take some
output and mathematically derive the “seed” or the internal state of the PRNG from it. This is a security
problem since knowledge of the state allows the attacker to compute future or past outputs.
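
The following sketch illustrates the state recovery problem for a deliberately simple case: a linear-congruential generator whose output is its full internal state. The constants are illustrative only and the function names are hypothetical; generators that truncate their output need more work to attack, but the principle is the same.

```python
# Illustrative LCG parameters; the multiplier is odd, so it is invertible mod 2**31.
A, C, M = 1103515245, 12345, 2 ** 31


def lcg_next(state):
    """Advance the generator by one step; the new state is also the output."""
    return (A * state + C) % M


def lcg_prev(state):
    """Step the generator backwards using the modular inverse of A."""
    return ((state - C) * pow(A, -1, M)) % M


def generate(seed, count):
    """Produce `count` successive outputs starting from `seed`."""
    outputs = []
    state = seed
    for _ in range(count):
        state = lcg_next(state)
        outputs.append(state)
    return outputs
```

An observer who sees a single output can compute every later output with `lcg_next` and every earlier one with `lcg_prev`, without ever learning the seed directly.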

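[Figure 10 (block diagram). Hardware: a Noise Source (“analog”: shot, quantum, ..) feeds the Entropy
Source (ES), which performs Sampling, Health Tests on the raw bits, and Conditioning, giving pre-processed
output with Entropy(X) > 8 bits. ISA: PollEntropy returns an ES16 seed when the source is OK, and
BIST / WAIT / DEAD when it is not. Software: Driver + DRBG(s), containing an Entropy Pool and a Crypto
DRBG that produces processed “full entropy” output, reached by the Application through a Library API or
Kernel syscall.]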

Figure 10: PollEntropy provides an Entropy Source (ES) only, not a stateful random number generator.
As a result, it can support arbitrary security levels. Cryptographic (AES, SHA-2/3) ISA Extension
instructions can be used to construct high-speed DRBGs that are seeded from the entropy source.

B.1.4 Deterministic Random Bit Generator (DRBG)


Cryptographic PRNGs are also known as Deterministic Random Bit Generators (DRBGs), a term used
by SP 800-90A [BK15]. A strong cryptographic algorithm such as AES [NIS01] or SHA-2/3 [NIS15b;
NIS15a] is used to produce random bits from a seed. The secret seed material is like a cryptographic key;
determining the seed from the DRBG output is as hard as breaking AES or a strong hash function. This
also illustrates that the seed/key needs to be long enough and come from a trusted Entropy Source. The
DRBG should still be frequently refreshed (reseeded) for forward and backward security.

B.2 Specific Rationale and Considerations


(Sect. 4.1) PollEntropy: An entropy source does not require a high-bandwidth interface; a single
DRBG source initialization only requires 512 bits (256 bits of entropy) and the DRBG state can be shared
by any number of callers. Once initiated, a DRBG requires new randomness only for the purposes of
forward security.
Without a polling-style mechanism, the entropy source could hang for thousands of cycles under some
circumstances. The wfi mechanism (at least potentially) allows energy-saving sleep on MCUs and
context switching on higher-end CPUs.
The reason for the particular OPST two-bit mechanism is to provide redundancy. The “fault” bit
combinations 11 (and 00) are more likely for electrical reasons if feature discovery fails and the entropy
source is actually not available (this has happened to AMD [Sal19]).
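
A driver-side polling loop can be sketched as follows. This is an illustration under stated assumptions, not the specified behaviour: the field layout (OPST in bits 31:30, seed in bits 15:0) and the status codes BIST=00, ES16=01, WAIT=10, DEAD=11 are assumed from Section 4.1, which is not reproduced here, and `poll` is a hypothetical callable standing in for a pollentropy read.

```python
# Assumed OPST status codes (bits 31:30 of the value returned by a poll).
BIST, ES16, WAIT, DEAD = 0b00, 0b01, 0b10, 0b11


class EntropySourceDead(RuntimeError):
    """Raised when the source reports an unrecoverable failure."""


def collect_seed_words(poll, count, max_polls=10000):
    """Poll until `count` 16-bit seed words have been collected.

    `poll` is a zero-argument callable returning a 32-bit status/seed word.
    """
    words = []
    polls = 0
    while len(words) < count:
        if polls >= max_polls:
            raise TimeoutError("entropy source did not deliver enough words")
        polls += 1
        value = poll()
        opst = (value >> 30) & 0b11
        if opst == ES16:
            words.append(value & 0xFFFF)
        elif opst == DEAD:
            raise EntropySourceDead("unrecoverable entropy source failure")
        # BIST or WAIT: nothing usable yet. A real driver might sleep (wfi)
        # or yield to another task here instead of spinning.
    return words
```

Because only one of the four two-bit patterns marks a valid seed, a bus that reads back all zeros or all ones is never mistaken for entropy.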
The 16-bit bandwidth was a compromise motivated by the desire to provide redundancy in the return
value, some protection against potential Power/EM leakage (further alleviated by the 2:1 cryptographic
conditioning discussed in Section 4.3), and the desire to have all of the bits “in the same place” on both
RV32 and RV64 architectures for programming convenience.

(Sect. 4.3) §E1, Entropy Requirement: Rather than attempting to mathematically define the
properties that the entropy source output must satisfy, we define that it should pass SP 800-90B evaluation
and certification when conditioned cryptographically (“perfectly”) in ratio 2:1. This is our “safety margin”
for non-cryptographic conditioners.
Note that the min-entropy assessment methodology in SP 800-90B [Tur+18] also has a safety margin
in its confidence intervals, and therefore there must be consistently more than 8 bits of entropy per
16-bit word. In practice, we recommend the distribution to be significantly closer to uniform to satisfy
possible additional use cases and AIS 20 / 31 [KS11] requirements (if those can’t be met with a software
conditioner).
Note that the usage of a vetted conditioner (such as SHA-2/3) was specified for technical reasons
related to SP 800-90B itself; non-vetted conditioners may offer similar security.
The 128-bit output block size was selected because that is the output size of the CBC-MAC conditioner
specified in [Tur+18] and also the smallest key size we expect to see in applications.
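
The 2:1 ratio and the 128-bit block size fit together as follows: sixteen 16-bit words (256 raw bits, each word carrying more than 8 bits of entropy) are compressed into one 128-bit block. The sketch below uses SHA-256 truncated to 128 bits purely as a stand-in; it is not the CBC-MAC conditioner referred to above, and it makes no claim of SP 800-90B conformance. The function name is hypothetical.

```python
import hashlib

WORDS_PER_BLOCK = 16    # 16 x 16 bits = 256 raw input bits
OUTPUT_BYTES = 16       # 128-bit conditioned output block


def condition_2to1(words):
    """Compress sixteen 16-bit seed words into one 128-bit block.

    Illustrative only: truncated SHA-256 stands in for a vetted conditioner.
    """
    if len(words) != WORDS_PER_BLOCK:
        raise ValueError("expected exactly 16 seed words")
    raw = b"".join(word.to_bytes(2, "big") for word in words)
    return hashlib.sha256(raw).digest()[:OUTPUT_BYTES]
```

A DRBG seed of 256 bits would then be assembled from two such blocks, consuming 512 raw bits in total.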

(Sect. 4.3) §E2, IID Requirement: IID is an optional requirement in SP 800-90B [Tur+18] but it
is needed to prevent information leakage between different entities that possibly share the same entropy
source. It also significantly simplifies certification and vendor-independent driver development. The
pollentropy instruction itself can be later expanded to support non-IID sources (e.g. via a different
immediate constant).

(Sect. 4.3) §E3, Secret State Requirement: DRBGs can be used to feed other (virtual) DRBGs
but that does not increase the absolute amount of entropy in the system. The entropy source must be
able to support current and future security standards and applications. The 256-bit requirement maps
to “Category 5” of NIST Post-Quantum Cryptography (4.A.5 “Security Strength Categories” in [NIS16])
and TOP SECRET schemes in Suite B and the newer U.S. Government CNSA Suite [NSA15].
Source anonymization. In some cases, an entropy source (or the circuit that interfaces it) may have a
uniquely identifiable hardware “signature.” This can be harmless or even useful in some applications (as
random sources may exhibit PUF-like features) but highly undesirable in others (anonymized virtualized
environments and enclaves). A DRBG masks such statistical features.

(Sect. 4.4) Security Controls: Our approach is informed by the experience of designing and im-
plementing cryptographic protocols. Some of the most devastating practical attacks against real-life
cryptosystems have used inconsequential-looking additional information, such as padding error messages
[Bar+12] or timing information [Mog+20]. In cryptography, such out-of-band information sources are
called “oracles.”
This also applies to the raw noise source. The raw source interface has been delegated to an optional
vendor-specific test interface. Importantly, the test interface and the main interface should not be
operational at the same time.

“The noise source state shall be protected from adversarial knowledge or influence to the great-
est extent possible. The methods used for this shall be documented, including a description of
the (conceptual) security boundary’s role in protecting the noise source from adversarial ob-
servation or influence.”

–Noise Source Requirements, NIST SP 800-90B [Tur+18].

The role of the RISC-V ISA implementation is to try to ensure that the hardware-software interface
minimizes avenues for adversarial information flow; all status information that is unnecessary in normal
operation should be eliminated. We specifically urge implementers not to create unnecessary information
flows (“status oracles”) via the custom bits, and not to allow the instruction to disable or affect the TRNG
output in any significant way. All information flows and interaction mechanisms must be considered from
an adversarial viewpoint and implemented only if they are truly necessary and their security impact can
be fully understood.
For example, the entropy polling interface may not be “constant time.” The polling mechanism can
be modeled as a rejection sampler; such a timing oracle can reveal information about the noise source
and the rejection criteria, but usually not the random output itself. If the polling time and the random
output are correlated, additional countermeasures are necessary.

(Sect. 4.4) §T1, On-demand testing: Interaction with hardware self-test mechanisms from the
software side should be minimal; the term “on-demand” does not mean that the end-user or application
program should be able to invoke them in the field (the term is a throwback to an age of discrete,
non-autonomous crypto devices with human operators.)

(Sect. 4.4) §T2, Continuous checks: Physical attacks can occur while the device is running. The
design should avoid guiding such active attacks by revealing detailed status information. Upon detection
of an attack the default action should be aimed at damage control – to prevent weak crypto keys from
being generated.
The statistical nature of some tests makes “type-1” false positives a possibility. There may also be
requirements for signaling of non-fatal alarms; AIS 31 specifies “noise alarms” that can go off with non-
negligible probability even if the device is functioning correctly; these can be signaled with BIST. There
rarely is anything that can or should be done about a non-fatal alarm condition in an operator-free,
autonomous system.
The state of statistical runtime health checks (such as counters) is potentially correlated with some
secret keying material, hence the zeroization requirement.

(Sect. 4.4) §T3, Fatal error states: These tests can complement other integrity and tamper resis-
tance mechanisms (See Chapter 18 of [And20] for examples).
Some hardware random generators are, by their physical construction, exposed to relatively non-
adversarial environmental and manufacturing issues. However, even such “innocent” failure modes may
indicate a fault attack [KSV13] and therefore should be addressed as a system integrity failure rather
than as a diagnostic issue.
Security architects should use permanent or hard-to-recover “security-fuse” lockdowns
only if the threshold of a test is such that the probability of a false positive is negligible over the entire
device lifetime.

B.3 Implementation Strategies


When considering implementation options and trade-offs one must look at the entire information flow
since each step is interconnected.

1. A Noise Source generates private, unpredictable signals from stable and well-understood physical
random events.

2. Sampling digitizes the noise signal into a raw stream of bits. This raw data also needs to be
protected by the design.

3. Continuous health tests ensure that the noise source and its environment meet their operational
parameters.

4. Non-cryptographic conditioners remove much of the bias and correlation in input noise: Output
entropy > 4 bits/byte.

5. Cryptographic conditioners produce nearly full entropy output, completely indistinguishable
from ideal random.

6. DRBG takes in ≥ 256 bits of seed entropy as keying material and uses a “one way” cryptographic
process to rapidly generate bits on demand (without revealing the seed/state).

Steps 1-4 (possibly 5) are considered to be part of the Entropy Source (ES) and provided by the
pollentropy instruction. Adding the software-side cryptographic steps 5-6 and control logic comple-
ments it into a True Random Number Generator (TRNG).
As a general rule, RISC-V specifies the ISA only. We provide some additional requirements so that
portable, vendor-independent middleware and kernel components can be created. The actual hardware
implementation and certification is left to vendors and circuit designers; the discussion in this section is
purely informational.
While we do not require entropy source implementations to be certified designs, we do expect that
they behave in a compatible manner and do not create unnecessary security risks to users. Self-evaluation
and testing following appropriate security standards is usually needed to achieve this. NIST has made
its SP 800-90B [Tur+18] min-entropy estimation package freely available (footnote 11), and similar free
tools are also available (footnote 12) for AIS 31 [KS11].

B.3.1 Noise Sources


The theory of random signals and electrical noise became well established in the post-World War II period
[Ric44; Ric45; JR58]. We will give some examples of common noise sources that can be implemented in
the processor itself (using standard cells).

Ring Oscillators. The most common entropy source type in production use today is based on “free
running” ring oscillators and their timing jitter. Here, an odd number of inverters is connected into a
loop from which noise source bits are sampled in relation to a reference clock [Bau+11]. The sampled bit
sequence may be expected to be relatively uncorrelated (close to IID) if the sample rate is suitably low
[KS11]. However, further processing is usually required.
Designs from AMD [AMD17], ARM [ARM17], and IBM [Lib+13] are examples of ring oscillator TRNGs intended
for high-security applications.
There are related metastability-based generator designs such as Transition Effect Ring Oscillator
(TERO) [VD10]. The differential/feedback Intel construction [HKM12] is slightly different but also falls
into the same general metastable oscillator-based category.
The main benefits of ring oscillators are: (1) They can be implemented with standard cell libraries
without external components – and even on FPGAs [Val+10], (2) there is an established theory for their
behavior [HL98; HLL99; Bau+11], and (3) ample precedent exists for testing and certifying them at the
highest security levels.
Ring oscillators also have well-known implementation pitfalls. Their output is sometimes highly
dependent on temperature, which must be taken into account in testing and modeling. If the ring oscillator
construction is parallelized, it is important that the number of stages and/or inverters in each chain is
coprime to avoid entropy reduction due to harmonic “Huyghens synchronization” [Bak86]. Such harmonics
can also be inserted maliciously in a frequency injection attack, which can have devastating results
[MM09]. Countermeasures are related to circuit design; environmental sensors, electrical filters, and usage
of a differential oscillator may help.

11 EntropyAssessment: https://github.com/usnistgov/SP800-90B_EntropyAssessment
12 (In German) AIS 31-Implementierung in JAVA: https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_testsuit_zip

Shot Noise. A category of random sources consisting of discrete events and modeled as a Poisson process
is called “shot noise.” There’s a long-established precedent of certifying them; the AIS 31 document
[KS11] itself offers reference designs based on noisy diodes. Shot noise sources are often more resistant
to temperature changes than ring oscillators. Some of these generators can also be fully implemented
with standard cells (The Rambus / Inside Secure generic TRNG IP [Ram20] is described as a Shot Noise
generator).

Other types of noise. It may be possible to certify more exotic noise sources and designs, although
their stochastic model needs to be equally well understood and their CPU interfaces must be secure. See
Section B.4 for a discussion of Quantum entropy sources.

B.3.2 Samplers and GetNoise


It is necessary to verify that the noise source and sampler output match their stochastic models.
This is usually done in a laboratory setting since NIST SP 800-90B [Tur+18] requires that the noise
source is protected in production devices. We are leaving access as a vendor-specific matter but we urge
vendors to protect the raw source and to make it unavailable to casual users.
Rationale: Samplers can generate vast amounts of data. NIST SP 800-90B [Tur+18] defines a concep-
tual interface GetNoise() for the raw output and also anticipates that the actual interfaces “will depend
on the entropy source deployed.”
Building data paths to make the raw noise available through the ISA would be problematic as it
is unclear how to “sample” possibly up to several gigabits of information per second in a way that is
appropriately representative of its properties.

“The vendor may use special methods (or devices, such as an oscilloscope) that require detailed
knowledge of the source to collect raw data. The testing laboratory is required [...] to present a
rationale why the data collections methods will not alter the statistical properties of the noise
source or explain how to account for any change in the source’s statistical characteristics [...]”

– FIPS 140 Implementation Guidance, 2020 [NC20]

B.3.3 Continuous Health Tests


If NIST SP 800-90B certification is required, the hardware should implement at least the health tests
defined in Section 4.4 of [Tur+18]: repetition count test and adaptive proportion test.
Health monitoring requires some state information related to the noise source to be maintained. The
tests should be designed in a way that a specific number of samples guarantees a state flush (no hung
states). We suggest flush size W ≤ 1024 to match with the NIST SP 800-90B required tests (See Section
4.4 in [Tur+18]). The state is also fully zeroized in a system reset.
Rationale: The two mandatory tests can be built with minimal circuitry. Full histograms are not
required, only simple counter registers: repetition count, window count, and sample count. Repetition
count is reset every time the output sample value changes; if the count reaches a certain cutoff limit,
a noise alarm (BIST) or failure (DEAD) is signaled. Window counter is used to save every W’th output
(typically W ∈ {512, 1024}). The frequency of this reference sample in the following window is counted;
cutoff values are defined in the standard. We see that the structure of the mandatory tests is such that,
if well implemented, no information is carried beyond a limit of W samples.
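
The counter structure described above can be sketched in a few lines of Python. This is a minimal software model for illustration only, not a hardware design: the names `HealthTests`, `rct_cut`, `apt_cut`, and `rct_cutoff` are hypothetical, the repetition count cutoff follows the formula C = 1 + ⌈−log2(α)/H⌉ from Section 4.4 of [Tur+18], and the adaptive proportion cutoff is simply passed in as a parameter since it must be taken from the standard for the assessed min-entropy.

```python
import math


def rct_cutoff(h_min: float, alpha_exp: int = 20) -> int:
    """Repetition count cutoff C = 1 + ceil(-log2(alpha) / H), alpha = 2**-alpha_exp."""
    return 1 + math.ceil(alpha_exp / h_min)


class HealthTests:
    """Model of the repetition count and adaptive proportion tests (assumed names)."""

    def __init__(self, rct_cut: int, apt_cut: int, window: int = 1024):
        self.rct_cut = rct_cut    # repetition count cutoff
        self.apt_cut = apt_cut    # adaptive proportion cutoff
        self.window = window      # window size W
        self.reset()

    def reset(self) -> None:
        """Zeroize all test state, as on a system reset."""
        self.last = None          # previous sample
        self.rep_count = 0        # repetition count
        self.ref = None           # reference sample of the current window
        self.win_count = 0        # occurrences of ref in the current window
        self.sample_count = 0     # samples consumed in the current window

    def update(self, sample: int) -> bool:
        """Feed one sample; return False if either test raises an alarm."""
        ok = True
        # Repetition count: restarts whenever the sample value changes.
        if sample == self.last:
            self.rep_count += 1
        else:
            self.last = sample
            self.rep_count = 1
        if self.rep_count >= self.rct_cut:
            ok = False
        # Adaptive proportion: the first sample of each window is the reference.
        if self.sample_count == 0:
            self.ref = sample
            self.win_count = 1
        elif sample == self.ref:
            self.win_count += 1
            if self.win_count >= self.apt_cut:
                ok = False
        # After W samples the window state is replaced, so nothing is carried further.
        self.sample_count += 1
        if self.sample_count == self.window:
            self.sample_count = 0
        return ok
```

How a `False` result maps to the BIST or DEAD states is left open here; the sketch only shows that three small counters and two stored samples are enough state for both tests.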

Section 4.5 of [Tur+18] explicitly permits additional developer-defined tests and several more were
defined in early versions of FIPS 140-1 before being “crossed out.” The choice of additional tests depends
on the nature and implementation of the physical source.
Especially if a non-cryptographic conditioner is used in hardware, it is possible that the AIS 31 [KS11]
online tests are implemented by driver software. They can also be implemented in hardware. For some
security profiles AIS 31 mandates that their tolerances are set in a way that the probability of an alarm
is at least 10^-6 yearly under “normal usage.” Such requirements are problematic in modern applications
since this alarm probability is too high for critical systems (footnote 13). There rarely is anything that can or should be
done about a non-fatal alarm condition in an operator-free, autonomous system. However, AIS 31 allows
the DRBG component to keep running despite a failure in its Entropy Source, so we suggest re-entering
temporary BIST state (Section 4.4) to signal a non-fatal statistical error if such (non-actionable) signaling
is necessary. Drivers and applications can react to this appropriately (or simply log it) but it will not
directly affect the availability of the TRNG. A permanent error condition should result in DEAD state.

B.3.4 Non-cryptographic Conditioners


As noted in Section B.1.2, physical randomness sources generally require a post-processing step called
conditioning to meet the desired quality requirements, which are outlined in Section 4.3.
The approach taken in this interface is to allow a combination of non-cryptographic and cryptographic
filtering to take place. The first stage (hardware) merely needs to be able to distill the entropy comfortably
above 4 bits per byte (Sect. 4.3, Entropy) and to guarantee that the samples are independent (Sect. 4.3,
IID).

• One may take a set of bits from a noise source and XOR them together to produce a less biased
(and more independent) bit. If the source model is well understood, such a construction lends itself
well to analysis and entropy estimation [Dav02].

• The von Neumann extractor [Neu51] looks at consecutive pairs of bits, rejects 00 and 11, and outputs
0 or 1 for 01 and 10, respectively. It will reduce the number of bits to less than 25% of original but
the output is provably unbiased (assuming independence).

• Blum’s extractor [Blu86] can be used on sources whose behavior resembles n-state Markov chains.
If its assumptions hold, it also removes dependencies, creating an IID source.

• Other linear and non-linear correctors such as those discussed by Dichtl and Lacharme [Lac08].
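
As an illustration of the simplest of these correctors, the von Neumann extractor described above can be written as a short Python sketch. The function names `von_neumann` and `expected_yield` are chosen for this example only. For independent input bits with P(1) = p, each pair is kept with probability 2p(1 − p), so the expected output is p(1 − p) bits per input bit, which is at most 25%.

```python
def von_neumann(bits):
    """Debias a sequence of 0/1 values: 01 -> 0, 10 -> 1, 00 and 11 are rejected."""
    out = []
    # Walk over non-overlapping pairs; a trailing unpaired bit is ignored.
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)  # the first bit of the pair is the output bit
    return out


def expected_yield(p: float) -> float:
    """Expected output bits per input bit for independent bits with P(1) = p."""
    return p * (1.0 - p)
```

For example, `von_neumann([0, 1, 1, 0, 0, 0, 1, 1])` keeps only the first two pairs and returns `[0, 1]`. The unbiasedness argument relies on independence of the input bits; correlated inputs need a different corrector.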

Note that the hardware may also implement a full cryptographic conditioner in the entropy source,
even though the software driver still needs a cryptographic conditioner too (Sect. 4.3).
Rationale: The main advantage of non-cryptographic filters is in their energy efficiency, relative sim-
plicity, and amenability to mathematical analysis. If well designed, they can be evaluated in conjunction
with a stochastic model of the noise source itself. They do not require computational hardness assump-
tions.
In some cases, an entropy source (and the circuit that implements it) may have a uniquely identifiable
hardware “signature.” This can be harmless or even useful in some applications (as random sources may
exhibit PUF-like features) but highly undesirable in others (anonymized virtualized environments and
enclaves).
Such virtualized environments are probably better off just using /dev/urandom of the host rather
than sharing the host’s hardware-backed Entropy Source to the guest environment. Also note the source
entropy requirement (Sect. 4.3, Secret State) when sharing such generators.
13 Currently (2020) about 10^10 secure elements are shipped yearly, many in critical applications and with TRNGs, according
to https://www.eurosmart.com.

B.3.5 Cryptographic Conditioners
Cryptographic conditioners are always required on the software side of the PollEntropy ISA boundary.
They may also be implemented on the hardware side if necessary. In any case, the PollEntropy output
must always be compressed 2:1 (or more) before being used as keying material or considered “full entropy.”
Examples of cryptographic conditioners include the random pool of the Linux operating system, secure
hash functions (SHA-2/3, SHAKE [NIS15b; NIS15a]), and the AES-based CBC-MAC construction of
SP 800-90B [Tur+18].
In some constructions, such as the Linux RNG and SHA-3/SHAKE [NIS15b] based generators, the
cryptographic conditioning and output (DRBG) generation are provided by the same component.
Rationale: For many low-power targets, constructions such as Intel’s [Mec18] and AMD’s [AMD17]
hardware AES CBC-MAC conditioners would be too complex and expensive to implement solely to serve
pollentropy. On the other hand, simpler non-cryptographic conditioners may be too wasteful on input
entropy if very high-quality random output is required – ARM TrustZone TRBG [ARM17] outputs only
10 Kbit/sec at 200 MHz. Hence a resource-saving compromise is made between hardware and software
generation that allows an implementation to use the RISC-V cryptographic ISA.
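
A minimal Python sketch of the software-side 2:1 compression is given below, assuming SHA-256 as the conditioner (one of the hash functions named above). The function `conditioned_seed` and its `read_entropy` argument are hypothetical: `read_entropy(n)` stands in for whatever driver routine collects `n` bytes of entropy source output, and is not part of any real API.

```python
import hashlib
from typing import Callable


def conditioned_seed(read_entropy: Callable[[int], bytes],
                     out_len: int = 32, ratio: int = 2) -> bytes:
    """Compress ratio * out_len bytes of entropy source output into out_len bytes."""
    if not 0 < out_len <= 32:
        raise ValueError("SHA-256 yields at most 32 output bytes")
    if ratio < 2:
        raise ValueError("input must be compressed 2:1 or more")
    raw = read_entropy(out_len * ratio)  # hypothetical driver callback
    if len(raw) != out_len * ratio:
        raise RuntimeError("entropy source returned too little data")
    return hashlib.sha256(raw).digest()[:out_len]
```

With the defaults, 64 bytes of source output are hashed into a 32-byte (256-bit) seed. A real driver would also handle the polling status codes and zeroize `raw` after use, which this sketch omits.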

B.3.6 The Final Random: DRBGs


All random bits reaching end users and applications must come from a cryptographic DRBG. These are
generally implemented by the driver component in software. The RISC-V AES and SHA instruction set
extensions should be used if available, since they offer additional security features such as timing attack
resistance.
Currently recommended DRBGs are defined in NIST SP 800-90A (Rev 1) [BK15]: CTR_DRBG, Hash_DRBG,
and HMAC_DRBG. Certification often requires known answer tests (KATs) for the symmetric components
and the DRBG as a whole. These are significantly easier to implement in software than in hardware. In
addition to the directly certifiable SP 800-90A DRBGs, a Linux-style random pool construction based on
ChaCha20 [M2̈0] can be used, or an appropriate construction based on SHAKE256 [NIS15b].
These are just recommendations; programmers can adjust the usage of the CPU Entropy Source to
meet future requirements.
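
To make the driver-side role of the DRBG concrete, the following Python sketch follows the update and generate steps of HMAC_DRBG from SP 800-90A [BK15] with SHA-256. It is a simplified illustration under stated assumptions, not a certifiable implementation: the class name `HmacDrbg` is hypothetical, and the nonce, personalization string, reseed counter, and known answer tests are all omitted.

```python
import hashlib
import hmac


class HmacDrbg:
    """Simplified HMAC_DRBG (SHA-256) sketch; see the assumptions in the text."""

    def __init__(self, seed: bytes):
        if len(seed) < 32:
            raise ValueError("need at least 256 bits of seed material")
        self.key = b"\x00" * 32   # K
        self.val = b"\x01" * 32   # V
        self._update(seed)

    @staticmethod
    def _hmac(key: bytes, data: bytes) -> bytes:
        return hmac.new(key, data, hashlib.sha256).digest()

    def _update(self, data: bytes = b"") -> None:
        # One-way state update; the second round runs only when data is provided.
        self.key = self._hmac(self.key, self.val + b"\x00" + data)
        self.val = self._hmac(self.key, self.val)
        if data:
            self.key = self._hmac(self.key, self.val + b"\x01" + data)
            self.val = self._hmac(self.key, self.val)

    def reseed(self, entropy: bytes) -> None:
        """Mix fresh conditioned entropy into the state."""
        self._update(entropy)

    def generate(self, n: int) -> bytes:
        """Return n output bytes without revealing the internal state."""
        out = b""
        while len(out) < n:
            self.val = self._hmac(self.key, self.val)
            out += self.val
        self._update()
        return out[:n]
```

The seed passed to the constructor is assumed to be conditioned entropy source output of at least 256 bits, matching the seed requirement stated in Section B.3. Two instances created from the same seed produce identical output, which is what makes known answer testing possible.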

B.4 Quantum vs Classical Random


“The NCSC believes that classical RNGs will continue to meet our needs for government and
military applications for the foreseeable future.”

– U.K. QRNG Guidance, March 2020 [NCS20].

A Quantum Random Number Generator (QRNG) is a TRNG whose source of randomness can be
unambiguously identified to be a specific quantum phenomenon such as quantum state superposition,
quantum state entanglement, Heisenberg uncertainty, quantum tunneling, spontaneous emission, or ra-
dioactive decay [ITU19].
Direct quantum entropy is theoretically the best possible kind of entropy. A typical TRNG based on
electronic noise is also largely based on quantum phenomena and is equally unpredictable - the difference
is that the relative amount of quantum and classical physics involved is difficult to quantify for a classical
TRNG.
QRNGs are designed in a way that allows the amount of quantum-origin entropy to be modeled and
estimated. This distinction is important in the security model used by QKD (Quantum Key Distribution)
security mechanisms which can be used to protect the physical layer (such as fiber optic cables) against
interception by using quantum mechanical effects directly.

This security model means that many of the available QRNG devices do not use cryptographic condi-
tioning and may fail cryptographic statistical requirements [HSHC20]. Many implementers may consider
them to be entropy sources instead.
Relatively little research has gone into QRNG implementation security, but many QRNG designs are
arguably more susceptible to leakage than classical generators (such as ring oscillators) as they tend to
employ external components and mixed materials.

Post-Quantum Cryptography. The classical/quantum origin of randomness is not important in NIST
Post-Quantum Cryptography (PQC) [NIS16]. Recall that cryptography aims to protect the confidentiality
and integrity of data itself and does not place any requirements on the physical communication channel
(like QKD). Classical good-quality TRNGs are perfectly suitable for generating the secret keys for PQC
protocols that are hard for quantum computers to break, but implementable on classical computers. What
matters in cryptography is that the secret keys have enough true randomness (entropy) and that they are
generated and stored securely.
Of course one must avoid DRBGs that are based on problems that are easily solvable with quantum
computers, such as factoring [Sho94] in the case of Blum-Blum-Shub generator [BBS86]. However, most
symmetric algorithms are less affected, as the best quantum attacks are still exponential in key size [Gro96].
As an example, the original Intel RNG [Mec18], whose output generation is based on AES-128, can be
attacked using Grover’s algorithm with approximately square-root effort [Jaq+20]. While even “64-bit”
quantum security is extremely difficult to break, many applications specify a higher security requirement.
NIST [NIS16] defines AES-128 to be “Category 1” equivalent post-quantum security, while AES-256 is
“Category 5” (highest). We avoid this possible future issue by exposing a more direct access to the
entropy source, which can derive its security from information-theoretic assumptions only.
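
The “square-root effort” can be made explicit with an idealized count of Grover iterations for a k-bit key. This is a rough estimate that ignores circuit depth and error-correction overheads:

```latex
T_{\mathrm{Grover}}(k) \approx \frac{\pi}{4}\, 2^{k/2}
\quad\Rightarrow\quad
T_{\mathrm{Grover}}(128) \approx 2^{64}, \qquad
T_{\mathrm{Grover}}(256) \approx 2^{128}
```

Under this estimate, AES-128 offers roughly the “64-bit” quantum security mentioned above, while AES-256 retains about 128 bits.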

C Supplementary Materials, Code and RTL
While this document contains the specifications for the RISC-V cryptography extensions, numerous
supplementary materials and example codes have also been developed. All of the materials related to the
RISC-V Cryptography extension live in a GitHub repository, located at
https://github.com/riscv/riscv-crypto/.

• doc/tex/ - Contains the LaTeX source code for this document.

• doc/supp/ - Contains supplementary information and recommendations for implementers of
software and hardware.

• benchmarks/ - Example software implementations.

• rtl/ - Example Verilog implementations of each instruction.

• sail/ - Formal model implementations in Sail.

D Supporting Sail Code
This section contains the supporting Sail code referenced by the instruction descriptions throughout the
specification. The Sail Manual (footnote 14) is recommended reading in order to best understand the
supporting code.

14 https://github.com/rems-project/sail/blob/sail2/manual.pdf

val ror32 : (bits(32), int) -> bits(32)
function ror32(x, y) = {
  (x >> to_bits(5, y)) | (x << to_bits(5, 32 - y))
}

val ror64 : (bits(64), int) -> bits(64)
function ror64(x, y) = {
  (x >> to_bits(6, y)) | (x << to_bits(6, 64 - y))
}

val rol32 : (bits(32), int) -> bits(32)
function rol32(x, y) = {
  (x << to_bits(5, y)) | (x >> to_bits(5, 32 - y))
}

/* Auxiliary function for performing GF multiplication */
val xt2 : bits(8) -> bits(8)
function xt2(x) = {
  (x << 1) ^ (match (bit_to_bool(x[7])) {
    false => 0x00,
    true  => 0x1B
  })
}

val xt3 : bits(8) -> bits(8)
function xt3(x) = {
  x ^ xt2(x)
}

/* Multiply 8-bit field element by 4-bit value for AES MixCols step */
val gfmul : (bits(8), bits(4)) -> bits(8)
function gfmul(x, y) = {
  (if bit_to_bool(y[0]) then x                 else 0x00) ^
  (if bit_to_bool(y[1]) then xt2(x)            else 0x00) ^
  (if bit_to_bool(y[2]) then xt2(xt2(x))       else 0x00) ^
  (if bit_to_bool(y[3]) then xt2(xt2(xt2(x)))  else 0x00)
}

/* 8-bit to 32-bit partial AES Mix Column - forwards */
val aes_mixcolumn_byte_fwd : bits(8) -> bits(32)
function aes_mixcolumn_byte_fwd(so) = {
  gfmul(so, 0x3) @ so @ so @ gfmul(so, 0x2)
}

/* 8-bit to 32-bit partial AES Mix Column - inverse */
val aes_mixcolumn_byte_inv : bits(8) -> bits(32)
function aes_mixcolumn_byte_inv(so) = {
  gfmul(so, 0xb) @ gfmul(so, 0xd) @ gfmul(so, 0x9) @ gfmul(so, 0xe)
}

/* 32-bit to 32-bit AES forward MixColumn */
val aes_mixcolumn_fwd : bits(32) -> bits(32)
function aes_mixcolumn_fwd(x) = {
  let s0 : bits(8) = x[ 7.. 0];
  let s1 : bits(8) = x[15.. 8];
  let s2 : bits(8) = x[23..16];
  let s3 : bits(8) = x[31..24];
  let b0 : bits(8) = xt2(s0) ^ xt3(s1) ^    (s2) ^    (s3);
  let b1 : bits(8) =    (s0) ^ xt2(s1) ^ xt3(s2) ^    (s3);
  let b2 : bits(8) =    (s0) ^    (s1) ^ xt2(s2) ^ xt3(s3);
  let b3 : bits(8) = xt3(s0) ^    (s1) ^    (s2) ^ xt2(s3);
  b3 @ b2 @ b1 @ b0 /* Return value */
}

/* 32-bit to 32-bit AES inverse MixColumn */
val aes_mixcolumn_inv : bits(32) -> bits(32)
function aes_mixcolumn_inv(x) = {
  let s0 : bits(8) = x[ 7.. 0];
  let s1 : bits(8) = x[15.. 8];
  let s2 : bits(8) = x[23..16];
  let s3 : bits(8) = x[31..24];
  let b0 : bits(8) = gfmul(s0, 0xE) ^ gfmul(s1, 0xB) ^ gfmul(s2, 0xD) ^ gfmul(s3, 0x9);
  let b1 : bits(8) = gfmul(s0, 0x9) ^ gfmul(s1, 0xE) ^ gfmul(s2, 0xB) ^ gfmul(s3, 0xD);
  let b2 : bits(8) = gfmul(s0, 0xD) ^ gfmul(s1, 0x9) ^ gfmul(s2, 0xE) ^ gfmul(s3, 0xB);
  let b3 : bits(8) = gfmul(s0, 0xB) ^ gfmul(s1, 0xD) ^ gfmul(s2, 0x9) ^ gfmul(s3, 0xE);
  b3 @ b2 @ b1 @ b0 /* Return value */
}
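
The GF(2^8) helpers and the two MixColumn functions above can be cross-checked informally with a small Python model. This is an illustrative re-implementation, not part of the Sail specification: the names `xt2` and `gfmul` mirror the Sail functions, while `mixcolumn`, `FWD`, and `INV` are introduced here only to express both directions with one routine. Byte 0 of the word is `x[7..0]`, as in the Sail code.

```python
def xt2(x: int) -> int:
    """Multiply a GF(2^8) element by 2, reducing with the AES polynomial (0x1B)."""
    return ((x << 1) ^ (0x1B if x & 0x80 else 0x00)) & 0xFF


def gfmul(x: int, y: int) -> int:
    """Multiply the 8-bit field element x by the 4-bit value y."""
    r = 0
    for i in range(4):
        if (y >> i) & 1:
            r ^= x
        x = xt2(x)
    return r


FWD = (0x2, 0x3, 0x1, 0x1)  # first row of the forward MixColumn matrix
INV = (0xE, 0xB, 0xD, 0x9)  # first row of the inverse MixColumn matrix


def mixcolumn(x: int, coeffs) -> int:
    """Apply the circulant matrix whose first row is coeffs to a 32-bit column."""
    s = [(x >> (8 * i)) & 0xFF for i in range(4)]
    out = 0
    for row in range(4):
        b = 0
        for col in range(4):
            b ^= gfmul(s[col], coeffs[(col - row) % 4])
        out |= b << (8 * row)
    return out
```

With the commonly used MixColumns test column (bytes db, 13, 53, 45), `mixcolumn(0x455313DB, FWD)` gives `0xBCA14D8E`, and applying `INV` to that result returns the original word.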

val aes_decode_rcon : bits(4) -> bits(32)
function aes_decode_rcon(r) = {
  match r {
    0x0 => 0x00000001,
    0x1 => 0x00000002,
    0x2 => 0x00000004,
    0x3 => 0x00000008,
    0x4 => 0x00000010,
    0x5 => 0x00000020,
    0x6 => 0x00000040,
    0x7 => 0x00000080,
    0x8 => 0x0000001b,
    0x9 => 0x00000036,
    0xA => 0x00000000,
    0xB => 0x00000000,
    0xC => 0x00000000,
    0xD => 0x00000000,
    0xE => 0x00000000,
    0xF => 0x00000000
  }
}

/* SM4 SBox - only one sbox for forwards and inverse */
let sm4_sbox_table : list(bits(8)) = [|
  0xD6, 0x90, 0xE9, 0xFE, 0xCC, 0xE1, 0x3D, 0xB7, 0x16, 0xB6, 0x14, 0xC2, 0x28,
  0xFB, 0x2C, 0x05, 0x2B, 0x67, 0x9A, 0x76, 0x2A, 0xBE, 0x04, 0xC3, 0xAA, 0x44,
  0x13, 0x26, 0x49, 0x86, 0x06, 0x99, 0x9C, 0x42, 0x50, 0xF4, 0x91, 0xEF, 0x98,
  0x7A, 0x33, 0x54, 0x0B, 0x43, 0xED, 0xCF, 0xAC, 0x62, 0xE4, 0xB3, 0x1C, 0xA9,
  0xC9, 0x08, 0xE8, 0x95, 0x80, 0xDF, 0x94, 0xFA, 0x75, 0x8F, 0x3F, 0xA6, 0x47,
  0x07, 0xA7, 0xFC, 0xF3, 0x73, 0x17, 0xBA, 0x83, 0x59, 0x3C, 0x19, 0xE6, 0x85,
  0x4F, 0xA8, 0x68, 0x6B, 0x81, 0xB2, 0x71, 0x64, 0xDA, 0x8B, 0xF8, 0xEB, 0x0F,
  0x4B, 0x70, 0x56, 0x9D, 0x35, 0x1E, 0x24, 0x0E, 0x5E, 0x63, 0x58, 0xD1, 0xA2,
  0x25, 0x22, 0x7C, 0x3B, 0x01, 0x21, 0x78, 0x87, 0xD4, 0x00, 0x46, 0x57, 0x9F,
  0xD3, 0x27, 0x52, 0x4C, 0x36, 0x02, 0xE7, 0xA0, 0xC4, 0xC8, 0x9E, 0xEA, 0xBF,
  0x8A, 0xD2, 0x40, 0xC7, 0x38, 0xB5, 0xA3, 0xF7, 0xF2, 0xCE, 0xF9, 0x61, 0x15,
  0xA1, 0xE0, 0xAE, 0x5D, 0xA4, 0x9B, 0x34, 0x1A, 0x55, 0xAD, 0x93, 0x32, 0x30,
  0xF5, 0x8C, 0xB1, 0xE3, 0x1D, 0xF6, 0xE2, 0x2E, 0x82, 0x66, 0xCA, 0x60, 0xC0,
  0x29, 0x23, 0xAB, 0x0D, 0x53, 0x4E, 0x6F, 0xD5, 0xDB, 0x37, 0x45, 0xDE, 0xFD,
  0x8E, 0x2F, 0x03, 0xFF, 0x6A, 0x72, 0x6D, 0x6C, 0x5B, 0x51, 0x8D, 0x1B, 0xAF,
  0x92, 0xBB, 0xDD, 0xBC, 0x7F, 0x11, 0xD9, 0x5C, 0x41, 0x1F, 0x10, 0x5A, 0xD8,
  0x0A, 0xC1, 0x31, 0x88, 0xA5, 0xCD, 0x7B, 0xBD, 0x2D, 0x74, 0xD0, 0x12, 0xB8,
  0xE5, 0xB4, 0xB0, 0x89, 0x69, 0x97, 0x4A, 0x0C, 0x96, 0x77, 0x7E, 0x65, 0xB9,
  0xF1, 0x09, 0xC5, 0x6E, 0xC6, 0x84, 0x18, 0xF0, 0x7D, 0xEC, 0x3A, 0xDC, 0x4D,
  0x20, 0x79, 0xEE, 0x5F, 0x3E, 0xD7, 0xCB, 0x39, 0x48
|]

/* Lookup function - takes an index and a list, and retrieves the
 * x'th element of that list.
 */
val sbox_lookup : (bits(8), list(bits(8))) -> bits(8)
function sbox_lookup(x, table) = {
  match (x, table) {
    (0x00, t0::tn) => t0,
    (   y, t0::tn) => sbox_lookup(x - 0x01, tn)
  }
}

/* Easy function to perform a forward AES SBox operation on 1 byte. */
val aes_sbox_fwd : bits(8) -> bits(8)
function aes_sbox_fwd(x) = {
  match (x) {
    0x00 => 0x63, 0x01 => 0x7c, 0x02 => 0x77, 0x03 => 0x7b, 0x04 => 0xf2,
    0x05 => 0x6b, 0x06 => 0x6f, 0x07 => 0xc5, 0x08 => 0x30, 0x09 => 0x01,
    0x0a => 0x67, 0x0b => 0x2b, 0x0c => 0xfe, 0x0d => 0xd7, 0x0e => 0xab,
    0x0f => 0x76, 0x10 => 0xca, 0x11 => 0x82, 0x12 => 0xc9, 0x13 => 0x7d,
    0x14 => 0xfa, 0x15 => 0x59, 0x16 => 0x47, 0x17 => 0xf0, 0x18 => 0xad,
    0x19 => 0xd4, 0x1a => 0xa2, 0x1b => 0xaf, 0x1c => 0x9c, 0x1d => 0xa4,
    0x1e => 0x72, 0x1f => 0xc0, 0x20 => 0xb7, 0x21 => 0xfd, 0x22 => 0x93,
    0x23 => 0x26, 0x24 => 0x36, 0x25 => 0x3f, 0x26 => 0xf7, 0x27 => 0xcc,
    0x28 => 0x34, 0x29 => 0xa5, 0x2a => 0xe5, 0x2b => 0xf1, 0x2c => 0x71,
    0x2d => 0xd8, 0x2e => 0x31, 0x2f => 0x15, 0x30 => 0x04, 0x31 => 0xc7,
    0x32 => 0x23, 0x33 => 0xc3, 0x34 => 0x18, 0x35 => 0x96, 0x36 => 0x05,
    0x37 => 0x9a, 0x38 => 0x07, 0x39 => 0x12, 0x3a => 0x80, 0x3b => 0xe2,
    0x3c => 0xeb, 0x3d => 0x27, 0x3e => 0xb2, 0x3f => 0x75, 0x40 => 0x09,
    0x41 => 0x83, 0x42 => 0x2c, 0x43 => 0x1a, 0x44 => 0x1b, 0x45 => 0x6e,
    0x46 => 0x5a, 0x47 => 0xa0, 0x48 => 0x52, 0x49 => 0x3b, 0x4a => 0xd6,
    0x4b => 0xb3, 0x4c => 0x29, 0x4d => 0xe3, 0x4e => 0x2f, 0x4f => 0x84,
    0x50 => 0x53, 0x51 => 0xd1, 0x52 => 0x00, 0x53 => 0xed, 0x54 => 0x20,
    0x55 => 0xfc, 0x56 => 0xb1, 0x57 => 0x5b, 0x58 => 0x6a, 0x59 => 0xcb,
    0x5a => 0xbe, 0x5b => 0x39, 0x5c => 0x4a, 0x5d => 0x4c, 0x5e => 0x58,
    0x5f => 0xcf, 0x60 => 0xd0, 0x61 => 0xef, 0x62 => 0xaa, 0x63 => 0xfb,
    0x64 => 0x43, 0x65 => 0x4d, 0x66 => 0x33, 0x67 => 0x85, 0x68 => 0x45,
    0x69 => 0xf9, 0x6a => 0x02, 0x6b => 0x7f, 0x6c => 0x50, 0x6d => 0x3c,
    0x6e => 0x9f, 0x6f => 0xa8, 0x70 => 0x51, 0x71 => 0xa3, 0x72 => 0x40,
    0x73 => 0x8f, 0x74 => 0x92, 0x75 => 0x9d, 0x76 => 0x38, 0x77 => 0xf5,
    0x78 => 0xbc, 0x79 => 0xb6, 0x7a => 0xda, 0x7b => 0x21, 0x7c => 0x10,
    0x7d => 0xff, 0x7e => 0xf3, 0x7f => 0xd2, 0x80 => 0xcd, 0x81 => 0x0c,
    0x82 => 0x13, 0x83 => 0xec, 0x84 => 0x5f, 0x85 => 0x97, 0x86 => 0x44,
    0x87 => 0x17, 0x88 => 0xc4, 0x89 => 0xa7, 0x8a => 0x7e, 0x8b => 0x3d,
    0x8c => 0x64, 0x8d => 0x5d, 0x8e => 0x19, 0x8f => 0x73, 0x90 => 0x60,
    0x91 => 0x81, 0x92 => 0x4f, 0x93 => 0xdc, 0x94 => 0x22, 0x95 => 0x2a,
    0x96 => 0x90, 0x97 => 0x88, 0x98 => 0x46, 0x99 => 0xee, 0x9a => 0xb8,
    0x9b => 0x14, 0x9c => 0xde, 0x9d => 0x5e, 0x9e => 0x0b, 0x9f => 0xdb,
    0xa0 => 0xe0, 0xa1 => 0x32, 0xa2 => 0x3a, 0xa3 => 0x0a, 0xa4 => 0x49,
    0xa5 => 0x06, 0xa6 => 0x24, 0xa7 => 0x5c, 0xa8 => 0xc2, 0xa9 => 0xd3,
    0xaa => 0xac, 0xab => 0x62, 0xac => 0x91, 0xad => 0x95, 0xae => 0xe4,
    0xaf => 0x79, 0xb0 => 0xe7, 0xb1 => 0xc8, 0xb2 => 0x37, 0xb3 => 0x6d,
    0xb4 => 0x8d, 0xb5 => 0xd5, 0xb6 => 0x4e, 0xb7 => 0xa9, 0xb8 => 0x6c,
    0xb9 => 0x56, 0xba => 0xf4, 0xbb => 0xea, 0xbc => 0x65, 0xbd => 0x7a,
    0xbe => 0xae, 0xbf => 0x08, 0xc0 => 0xba, 0xc1 => 0x78, 0xc2 => 0x25,
    0xc3 => 0x2e, 0xc4 => 0x1c, 0xc5 => 0xa6, 0xc6 => 0xb4, 0xc7 => 0xc6,
    0xc8 => 0xe8, 0xc9 => 0xdd, 0xca => 0x74, 0xcb => 0x1f, 0xcc => 0x4b,
    0xcd => 0xbd, 0xce => 0x8b, 0xcf => 0x8a, 0xd0 => 0x70, 0xd1 => 0x3e,
    0xd2 => 0xb5, 0xd3 => 0x66, 0xd4 => 0x48, 0xd5 => 0x03, 0xd6 => 0xf6,
    0xd7 => 0x0e, 0xd8 => 0x61, 0xd9 => 0x35, 0xda => 0x57, 0xdb => 0xb9,
    0xdc => 0x86, 0xdd => 0xc1, 0xde => 0x1d, 0xdf => 0x9e, 0xe0 => 0xe1,
    0xe1 => 0xf8, 0xe2 => 0x98, 0xe3 => 0x11, 0xe4 => 0x69, 0xe5 => 0xd9,
    0xe6 => 0x8e, 0xe7 => 0x94, 0xe8 => 0x9b, 0xe9 => 0x1e, 0xea => 0x87,
    0xeb => 0xe9, 0xec => 0xce, 0xed => 0x55, 0xee => 0x28, 0xef => 0xdf,
    0xf0 => 0x8c, 0xf1 => 0xa1, 0xf2 => 0x89, 0xf3 => 0x0d, 0xf4 => 0xbf,
    0xf5 => 0xe6, 0xf6 => 0x42, 0xf7 => 0x68, 0xf8 => 0x41, 0xf9 => 0x99,
    0xfa => 0x2d, 0xfb => 0x0f, 0xfc => 0xb0, 0xfd => 0x54, 0xfe => 0xbb,
    0xff => 0x16
  }
}
/* Easy function to perform an inverse AES SBox operation on 1 byte. */
val aes_sbox_inv : bits(8) -> bits(8)
function aes_sbox_inv(x) = {
  match (x) {
    0x00 => 0x52, 0x01 => 0x09, 0x02 => 0x6a, 0x03 => 0xd5, 0x04 => 0x30,
    0x05 => 0x36, 0x06 => 0xa5, 0x07 => 0x38, 0x08 => 0xbf, 0x09 => 0x40,
    0x0a => 0xa3, 0x0b => 0x9e, 0x0c => 0x81, 0x0d => 0xf3, 0x0e => 0xd7,
    0x0f => 0xfb, 0x10 => 0x7c, 0x11 => 0xe3, 0x12 => 0x39, 0x13 => 0x82,
    0x14 => 0x9b, 0x15 => 0x2f, 0x16 => 0xff, 0x17 => 0x87, 0x18 => 0x34,
    0x19 => 0x8e, 0x1a => 0x43, 0x1b => 0x44, 0x1c => 0xc4, 0x1d => 0xde,
    0x1e => 0xe9, 0x1f => 0xcb, 0x20 => 0x54, 0x21 => 0x7b, 0x22 => 0x94,
    0x23 => 0x32, 0x24 => 0xa6, 0x25 => 0xc2, 0x26 => 0x23, 0x27 => 0x3d,
    0x28 => 0xee, 0x29 => 0x4c, 0x2a => 0x95, 0x2b => 0x0b, 0x2c => 0x42,
    0x2d => 0xfa, 0x2e => 0xc3, 0x2f => 0x4e, 0x30 => 0x08, 0x31 => 0x2e,
    0x32 => 0xa1, 0x33 => 0x66, 0x34 => 0x28, 0x35 => 0xd9, 0x36 => 0x24,
    0x37 => 0xb2, 0x38 => 0x76, 0x39 => 0x5b, 0x3a => 0xa2, 0x3b => 0x49,
    0x3c => 0x6d, 0x3d => 0x8b, 0x3e => 0xd1, 0x3f => 0x25, 0x40 => 0x72,
    0x41 => 0xf8, 0x42 => 0xf6, 0x43 => 0x64, 0x44 => 0x86, 0x45 => 0x68,
    0x46 => 0x98, 0x47 => 0x16, 0x48 => 0xd4, 0x49 => 0xa4, 0x4a => 0x5c,
    0x4b => 0xcc, 0x4c => 0x5d, 0x4d => 0x65, 0x4e => 0xb6, 0x4f => 0x92,
    0x50 => 0x6c, 0x51 => 0x70, 0x52 => 0x48, 0x53 => 0x50, 0x54 => 0xfd,
    0x55 => 0xed, 0x56 => 0xb9, 0x57 => 0xda, 0x58 => 0x5e, 0x59 => 0x15,
    0x5a => 0x46, 0x5b => 0x57, 0x5c => 0xa7, 0x5d => 0x8d, 0x5e => 0x9d,
    0x5f => 0x84, 0x60 => 0x90, 0x61 => 0xd8, 0x62 => 0xab, 0x63 => 0x00,
    0x64 => 0x8c, 0x65 => 0xbc, 0x66 => 0xd3, 0x67 => 0x0a, 0x68 => 0xf7,
    0x69 => 0xe4, 0x6a => 0x58, 0x6b => 0x05, 0x6c => 0xb8, 0x6d => 0xb3,
    0x6e => 0x45, 0x6f => 0x06, 0x70 => 0xd0, 0x71 => 0x2c, 0x72 => 0x1e,
    0x73 => 0x8f, 0x74 => 0xca, 0x75 => 0x3f, 0x76 => 0x0f, 0x77 => 0x02,
    0x78 => 0xc1, 0x79 => 0xaf, 0x7a => 0xbd, 0x7b => 0x03, 0x7c => 0x01,
    0x7d => 0x13, 0x7e => 0x8a, 0x7f => 0x6b, 0x80 => 0x3a, 0x81 => 0x91,
    0x82 => 0x11, 0x83 => 0x41, 0x84 => 0x4f, 0x85 => 0x67, 0x86 => 0xdc,
    0x87 => 0xea, 0x88 => 0x97, 0x89 => 0xf2, 0x8a => 0xcf, 0x8b => 0xce,
    0x8c => 0xf0, 0x8d => 0xb4, 0x8e => 0xe6, 0x8f => 0x73, 0x90 => 0x96,
    0x91 => 0xac, 0x92 => 0x74, 0x93 => 0x22, 0x94 => 0xe7, 0x95 => 0xad,
    0x96 => 0x35, 0x97 => 0x85, 0x98 => 0xe2, 0x99 => 0xf9, 0x9a => 0x37,
    0x9b => 0xe8, 0x9c => 0x1c, 0x9d => 0x75, 0x9e => 0xdf, 0x9f => 0x6e,
    0xa0 => 0x47, 0xa1 => 0xf1, 0xa2 => 0x1a, 0xa3 => 0x71, 0xa4 => 0x1d,
    0xa5 => 0x29, 0xa6 => 0xc5, 0xa7 => 0x89, 0xa8 => 0x6f, 0xa9 => 0xb7,
    0xaa => 0x62, 0xab => 0x0e, 0xac => 0xaa, 0xad => 0x18, 0xae => 0xbe,
    0xaf => 0x1b, 0xb0 => 0xfc, 0xb1 => 0x56, 0xb2 => 0x3e, 0xb3 => 0x4b,
    0xb4 => 0xc6, 0xb5 => 0xd2, 0xb6 => 0x79, 0xb7 => 0x20, 0xb8 => 0x9a,
    0xb9 => 0xdb, 0xba => 0xc0, 0xbb => 0xfe, 0xbc => 0x78, 0xbd => 0xcd,
    0xbe => 0x5a, 0xbf => 0xf4, 0xc0 => 0x1f, 0xc1 => 0xdd, 0xc2 => 0xa8,
    0xc3 => 0x33, 0xc4 => 0x88, 0xc5 => 0x07, 0xc6 => 0xc7, 0xc7 => 0x31,
    0xc8 => 0xb1, 0xc9 => 0x12, 0xca => 0x10, 0xcb => 0x59, 0xcc => 0x27,
    0xcd => 0x80, 0xce => 0xec, 0xcf => 0x5f, 0xd0 => 0x60, 0xd1 => 0x51,
    0xd2 => 0x7f, 0xd3 => 0xa9, 0xd4 => 0x19, 0xd5 => 0xb5, 0xd6 => 0x4a,
    0xd7 => 0x0d, 0xd8 => 0x2d, 0xd9 => 0xe5, 0xda => 0x7a, 0xdb => 0x9f,
    0xdc => 0x93, 0xdd => 0xc9, 0xde => 0x9c, 0xdf => 0xef, 0xe0 => 0xa0,
    0xe1 => 0xe0, 0xe2 => 0x3b, 0xe3 => 0x4d, 0xe4 => 0xae, 0xe5 => 0x2a,
    0xe6 => 0xf5, 0xe7 => 0xb0, 0xe8 => 0xc8, 0xe9 => 0xeb, 0xea => 0xbb,
    0xeb => 0x3c, 0xec => 0x83, 0xed => 0x53, 0xee => 0x99, 0xef => 0x61,
    0xf0 => 0x17, 0xf1 => 0x2b, 0xf2 => 0x04, 0xf3 => 0x7e, 0xf4 => 0xba,
    0xf5 => 0x77, 0xf6 => 0xd6, 0xf7 => 0x26, 0xf8 => 0xe1, 0xf9 => 0x69,
    0xfa => 0x14, 0xfb => 0x63, 0xfc => 0x55, 0xfd => 0x21, 0xfe => 0x0c,
    0xff => 0x7d
  }
}

/* AES SubWord function used in the key expansion
 * - Applies the forward sbox to each byte in the input word.
 */
val aes_subword_fwd : bits(32) -> bits(32)
function aes_subword_fwd(x) = {
  aes_sbox_fwd(x[31..24]) @
  aes_sbox_fwd(x[23..16]) @
  aes_sbox_fwd(x[15.. 8]) @
  aes_sbox_fwd(x[ 7.. 0])
}

/* AES Inverse SubWord function.
 * - Applies the inverse sbox to each byte in the input word.
 */
val aes_subword_inv : bits(32) -> bits(32)
function aes_subword_inv(x) = {
  aes_sbox_inv(x[31..24]) @
  aes_sbox_inv(x[23..16]) @
  aes_sbox_inv(x[15.. 8]) @
  aes_sbox_inv(x[ 7.. 0])
}

/* Easy function to perform an SM4 SBox operation on 1 byte. */
val sm4_sbox : bits(8) -> bits(8)
function sm4_sbox(x) = {
  sbox_lookup(x, sm4_sbox_table)
}

val aes_get_column : (bits(128), nat) -> bits(32)
function aes_get_column(state, c) = {
  (state >> (to_bits(7, 32 * c)))[31..0]
}

/* 64-bit to 64-bit function which applies the AES forward sbox to each byte
 * in a 64-bit word.
 */
val aes_apply_fwd_sbox_to_each_byte : bits(64) -> bits(64)
function aes_apply_fwd_sbox_to_each_byte(x) = {
  aes_sbox_fwd(x[63..56]) @
  aes_sbox_fwd(x[55..48]) @
  aes_sbox_fwd(x[47..40]) @
  aes_sbox_fwd(x[39..32]) @
  aes_sbox_fwd(x[31..24]) @
  aes_sbox_fwd(x[23..16]) @
  aes_sbox_fwd(x[15.. 8]) @
  aes_sbox_fwd(x[ 7.. 0])
}

/* 64-bit to 64-bit function which applies the AES inverse sbox to each byte
 * in a 64-bit word.
 */
val aes_apply_inv_sbox_to_each_byte : bits(64) -> bits(64)
function aes_apply_inv_sbox_to_each_byte(x) = {
  aes_sbox_inv(x[63..56]) @
  aes_sbox_inv(x[55..48]) @
  aes_sbox_inv(x[47..40]) @
  aes_sbox_inv(x[39..32]) @
  aes_sbox_inv(x[31..24]) @
  aes_sbox_inv(x[23..16]) @
  aes_sbox_inv(x[15.. 8]) @
  aes_sbox_inv(x[ 7.. 0])
}

/*
 * AES full-round transformation functions.
 */

val getbyte : (bits(64), int) -> bits(8)
function getbyte(x, i) = {
  (x >> to_bits(6, i * 8))[7..0]
}

val aes_rv64_shiftrows_fwd : (bits(64), bits(64)) -> bits(64)
function aes_rv64_shiftrows_fwd(rs2, rs1) = {
  getbyte(rs1, 3) @
  getbyte(rs2, 6) @
  getbyte(rs2, 1) @
  getbyte(rs1, 4) @
  getbyte(rs2, 7) @
  getbyte(rs2, 2) @
  getbyte(rs1, 5) @
  getbyte(rs1, 0)
}

val aes_rv64_shiftrows_inv : (bits(64), bits(64)) -> bits(64)
function aes_rv64_shiftrows_inv(rs2, rs1) = {
  getbyte(rs2, 3) @
  getbyte(rs2, 6) @
  getbyte(rs1, 1) @
  getbyte(rs1, 4) @
  getbyte(rs1, 7) @
  getbyte(rs2, 2) @
  getbyte(rs2, 5) @
  getbyte(rs1, 0)
}

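The byte selections in aes_rv64_shiftrows_fwd and aes_rv64_shiftrows_inv can be checked with a small executable model. The Python sketch below is not part of the Sail specification. It assumes that rs1 holds state bytes 0..7 and rs2 holds state bytes 8..15, and that calling the same function with the operands swapped yields the high half of the result. The helper names concat, shift_rows_ref and shift_rows_via_rv64 are hypothetical and exist only for this illustration.

```python
def getbyte(x: int, i: int) -> int:
    """Byte i of a 64-bit value; byte 0 is bits 7..0 (mirrors the Sail getbyte)."""
    return (x >> (8 * i)) & 0xFF


def concat(byte_values) -> int:
    """Concatenate bytes most-significant first, like chaining Sail's @ operator."""
    out = 0
    for b in byte_values:
        out = (out << 8) | b
    return out


def aes_rv64_shiftrows_fwd(rs2: int, rs1: int) -> int:
    return concat(getbyte(r, i) for r, i in [
        (rs1, 3), (rs2, 6), (rs2, 1), (rs1, 4),
        (rs2, 7), (rs2, 2), (rs1, 5), (rs1, 0)])


def aes_rv64_shiftrows_inv(rs2: int, rs1: int) -> int:
    return concat(getbyte(r, i) for r, i in [
        (rs2, 3), (rs2, 6), (rs1, 1), (rs1, 4),
        (rs1, 7), (rs2, 2), (rs2, 5), (rs1, 0)])


def shift_rows_ref(state: bytes, inverse: bool = False) -> bytes:
    """Hypothetical reference: byte r + 4*c of the state is row r, column c,
    and row r is rotated by r columns (left for forward, right for inverse)."""
    sign = -1 if inverse else 1
    return bytes(state[r + 4 * ((c + sign * r) % 4)]
                 for c in range(4) for r in range(4))


def shift_rows_via_rv64(state: bytes, fn) -> bytes:
    """Assumed usage: rs1 = bytes 0..7, rs2 = bytes 8..15 for the low half;
    swapping the operands produces the high half."""
    lo = int.from_bytes(state[:8], "little")
    hi = int.from_bytes(state[8:], "little")
    return fn(hi, lo).to_bytes(8, "little") + fn(lo, hi).to_bytes(8, "little")


state = bytes(range(16))
shifted = shift_rows_via_rv64(state, aes_rv64_shiftrows_fwd)
print(list(shifted))  # [0, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12, 1, 6, 11]
```

Under these assumptions the model agrees with the byte-indexed reference in both directions, and applying the inverse model to the shifted state returns the original 16 bytes.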
/* 128-bit to 128-bit implementation of the forward AES ShiftRows transform.
 * Byte 0 of state is input column 0, bits  7..0.
 * Byte 5 of state is input column 1, bits 15..8.
 */
val aes_shift_rows_fwd : bits(128) -> bits(128)
function aes_shift_rows_fwd(x) = {
  let ic3 : bits(32) = aes_get_column(x, 3);
  let ic2 : bits(32) = aes_get_column(x, 2);
  let ic1 : bits(32) = aes_get_column(x, 1);
  let ic0 : bits(32) = aes_get_column(x, 0);
  let oc0 : bits(32) = ic0[31..24] @ ic1[23..16] @ ic2[15.. 8] @ ic3[ 7.. 0];
  let oc1 : bits(32) = ic1[31..24] @ ic2[23..16] @ ic3[15.. 8] @ ic0[ 7.. 0];
  let oc2 : bits(32) = ic2[31..24] @ ic3[23..16] @ ic0[15.. 8] @ ic1[ 7.. 0];
  let oc3 : bits(32) = ic3[31..24] @ ic0[23..16] @ ic1[15.. 8] @ ic2[ 7.. 0];
  (oc3 @ oc2 @ oc1 @ oc0) /* Return value */
}

/* 128-bit to 128-bit implementation of the inverse AES ShiftRows transform.
 * Byte 0 of state is input column 0, bits  7..0.
 * Byte 5 of state is input column 1, bits 15..8.
 */
val aes_shift_rows_inv : bits(128) -> bits(128)
function aes_shift_rows_inv(x) = {
  let ic3 : bits(32) = aes_get_column(x, 3); /* In column 3 */
  let ic2 : bits(32) = aes_get_column(x, 2);
  let ic1 : bits(32) = aes_get_column(x, 1);
  let ic0 : bits(32) = aes_get_column(x, 0);
  let oc0 : bits(32) = ic0[31..24] @ ic3[23..16] @ ic2[15.. 8] @ ic1[ 7.. 0];
  let oc1 : bits(32) = ic1[31..24] @ ic0[23..16] @ ic3[15.. 8] @ ic2[ 7.. 0];
  let oc2 : bits(32) = ic2[31..24] @ ic1[23..16] @ ic0[15.. 8] @ ic3[ 7.. 0];
  let oc3 : bits(32) = ic3[31..24] @ ic2[23..16] @ ic1[15.. 8] @ ic0[ 7.. 0];
  (oc3 @ oc2 @ oc1 @ oc0) /* Return value */
}

/* Applies the forward sub-bytes step of AES to a 128-bit vector
 * representation of its state.
 */
val aes_subbytes_fwd : bits(128) -> bits(128)
function aes_subbytes_fwd(x) = {
  let oc0 : bits(32) = aes_subword_fwd(aes_get_column(x, 0));
  let oc1 : bits(32) = aes_subword_fwd(aes_get_column(x, 1));
  let oc2 : bits(32) = aes_subword_fwd(aes_get_column(x, 2));
  let oc3 : bits(32) = aes_subword_fwd(aes_get_column(x, 3));
  (oc3 @ oc2 @ oc1 @ oc0) /* Return value */
}

/* Applies the inverse sub-bytes step of AES to a 128-bit vector
 * representation of its state.
 */
val aes_subbytes_inv : bits(128) -> bits(128)
function aes_subbytes_inv(x) = {
  let oc0 : bits(32) = aes_subword_inv(aes_get_column(x, 0));
  let oc1 : bits(32) = aes_subword_inv(aes_get_column(x, 1));
  let oc2 : bits(32) = aes_subword_inv(aes_get_column(x, 2));
  let oc3 : bits(32) = aes_subword_inv(aes_get_column(x, 3));
  (oc3 @ oc2 @ oc1 @ oc0) /* Return value */
}

/* Applies the forward MixColumns step of AES to a 128-bit vector
 * representation of its state.
 */
val aes_mixcolumns_fwd : bits(128) -> bits(128)
function aes_mixcolumns_fwd(x) = {
  let oc0 : bits(32) = aes_mixcolumn_fwd(aes_get_column(x, 0));
  let oc1 : bits(32) = aes_mixcolumn_fwd(aes_get_column(x, 1));
  let oc2 : bits(32) = aes_mixcolumn_fwd(aes_get_column(x, 2));
  let oc3 : bits(32) = aes_mixcolumn_fwd(aes_get_column(x, 3));
  (oc3 @ oc2 @ oc1 @ oc0) /* Return value */
}

/* Applies the inverse MixColumns step of AES to a 128-bit vector
 * representation of its state.
 */
val aes_mixcolumns_inv : bits(128) -> bits(128)
function aes_mixcolumns_inv(x) = {
  let oc0 : bits(32) = aes_mixcolumn_inv(aes_get_column(x, 0));
  let oc1 : bits(32) = aes_mixcolumn_inv(aes_get_column(x, 1));
  let oc2 : bits(32) = aes_mixcolumn_inv(aes_get_column(x, 2));
  let oc3 : bits(32) = aes_mixcolumn_inv(aes_get_column(x, 3));
  (oc3 @ oc2 @ oc1 @ oc0) /* Return value */
}
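
The lookup tables in aes_sbox_fwd and aes_sbox_inv can be cross-checked against the algebraic definition of the AES S-box: a multiplicative inverse in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, followed by an affine transform with the constant 0x63. The Python sketch below is an illustration only and is not how the Sail model computes the S-box, which uses the explicit tables. The names gf_mul, gf_inv, rotl8, SBOX_FWD and SBOX_INV are hypothetical.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """Inverse as a^254; zero maps to zero, matching the AES convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_fwd_entry(x: int) -> int:
    """Field inverse followed by the AES affine transform."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


# Forward table, then the inverse table obtained by swapping input and output.
SBOX_FWD = [sbox_fwd_entry(x) for x in range(256)]
SBOX_INV = [0] * 256
for x, y in enumerate(SBOX_FWD):
    SBOX_INV[y] = x

print(hex(SBOX_FWD[0xFF]), hex(SBOX_INV[0x00]))  # 0x16 0x52
```

The two printed values match the last entry of the forward table (0xff => 0x16) and the first entry of the inverse table (0x00 => 0x52) in the listing.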
