Implementation and Functional Verification of RISC-V Core For Secure IoT Applications


2021 International Conference on Microelectronics (ICM)

Implementation and Functional Verification of RISC-V Core for Secure IoT Applications
Abdelrahman Adel¹, Dina Saad¹, Mahmoud Abd El Mawgoed¹, Mohamed Sharshar¹, Zyad Ahmed¹, Hala Ibrahim², Hassan Mostafa¹,³
¹ Electronics and Electrical Communications Engineering Department, Cairo University, Egypt
² Siemens EDA, Cairo, Egypt
³ University of Science and Technology, Nanotechnology and Nanoelectronics Program, Zewail City of Science and Technology, October Gardens, 6th of October, Giza 12578, Egypt
2021 International Conference on Microelectronics (ICM) | 978-1-6654-0839-4/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICM52667.2021.9664926

Email: hmostafa@uwaterloo.ca

Abstract: The number of Internet of Things (IoT) devices is increasing rapidly, leading to a tremendous amount of data being sent wirelessly. This data is prone to eavesdropping and attack. The contribution of this work is the design of a System on Chip (SoC) with a processor based on the RISC-V instruction set architecture (ISA), a reduced instruction set computer (RISC) ISA. The system focuses on the security of data exchanged between IoT end nodes. For SoC verification, a Universal Verification Methodology (UVM) environment is used to cover most of the functionality and security aspects, in order to guarantee a sufficient level of trust in the implemented SoC.
Keywords: RISC-V, pipeline, UVM, verification, SoC, IoT.
Fig.1: System on Chip block diagram
I. INTRODUCTION
Nowadays, most data is sent wirelessly, and some of this data is confidential, so it needs to be secured during communication. This work proposes a solution that integrates a hardware security intellectual property (IP) block, a UART IP for communication with the external world, and a core based on the RISC-V ISA. The full system implements a system-on-chip that acts as an end node for IoT systems, as shown in fig.1. The main advantages of the RISC-V ISA are that it is open source, simple, stable, extensible, and modular [1][5]. Regarding the verification phase, a UVM environment was built to verify the functionality of the core. UVM is a set of class libraries developed using SystemVerilog's syntax and semantics. UVM's major goal is to create modular, reusable, and scalable testbench frameworks that can be used to validate numerous designs, each usually called the Design Under Test (DUT) [2].

II. BACKGROUND AND RELATED WORKS
When it comes to RISC-V SoC-based designs, there are many contributions directed towards this field, because the RISC-V ISA is open source and easy to work with. Most of these contributions are directed either towards design and optimization, as illustrated in [3] and [4], or towards verification solutions, as illustrated in [6]. However, the verification solutions illustrated in these works are limited to dynamic functional verification. In our work, we also illustrate the importance of static verification techniques such as source code linting, clock domain crossing (CDC) verification, and reset domain crossing (RDC) verification in modern SoC designs. This work also tries to achieve a balance between the design phase and the multiple phases of verification, so it is more of an A-to-Z workflow. Moreover, there are very few RISC-V based SoCs that target hardware security. Most proposed security solutions target software security with the aid of RISC-V based SoCs, such as [3]; these are still prone to bugs and do not target low-power applications. In our work, hardware security for IoT applications is the target. Thus, a hardware implementation of the ACORN algorithm (a lightweight cryptography algorithm) is used to save power. ACORN is one of the two finalists in the lightweight category of the Competition for Authenticated Encryption: Security, Applicability and Robustness (CAESAR).

III. RISC-V CORE DESIGN
A. Unpipelined RV32I
RISC-V's base ISA has six base instruction formats: R-type for register-register instructions, I-type for immediate and load instructions, S-type for store instructions, B-type for conditional branch instructions, U-type for long immediate instructions, and J-type for unconditional branch instructions.
There are some basic blocks that any processor should contain:
• Program counter (PC): holds the address of the next instruction to be executed.
• Instruction memory: holds the instructions to be executed, accessed by the address stored in the program counter.
• Control unit: acts as the mind of the processor, generating the control signals that steer the data path for each instruction.
• ALU: performs arithmetic and logical operations under the control signals generated by the control unit.
• Data memory: stores the temporary data used by the processor during its operation.
• Register file: contains all the general-purpose registers of the microprocessor, which hold data and addresses. It consists of thirty-two registers, each 32 bits wide.

B. M-extension (Multiply and divide extension)
One of the most important advantages that RISC-V offers is support for multiple extensions, chosen according to the required application. For our application, the multiplication/division extension (M-extension) is essential, as even simple code may require at least one multiplication/division instruction.
The first implementation describes the multiplication and division operations behaviorally so that they map onto the FPGA's digital signal processor (DSP) blocks, which results in combinational logic. With 32-bit operands for the multiplication and division instructions, the generated block consumes a large area and a lot of power, and has a long delay.
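Section III.B contrasts the combinational multiply/divide block with a sequential one built on restoring division and a radix-2 Booth multiplier. The following is a minimal software sketch of those two textbook algorithms, one loop iteration per clock cycle of a 32-cycle unit. It is not the authors' RTL: the function names, the unsigned-only divider, and the use of an unbounded Python integer as the accumulator are assumptions made for illustration.

```python
def to_signed(value, width=32):
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division: one quotient bit per iteration."""
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial          # subtraction succeeded, quotient bit is 1
            quotient |= 1 << i
        # Otherwise the partial remainder is "restored" (left unchanged).
    return quotient, remainder


def booth_multiply(multiplicand, multiplier, width=32):
    """Signed radix-2 Booth multiplication, one step per iteration."""
    m = to_signed(multiplicand, width)
    q = multiplier & ((1 << width) - 1)
    a = 0        # signed accumulator (assumption: wide enough never to overflow)
    q_prev = 0   # the extra bit to the right of q
    for _ in range(width):
        q0 = q & 1
        if (q0, q_prev) == (1, 0):
            a -= m
        elif (q0, q_prev) == (0, 1):
            a += m
        # Arithmetic right shift of the combined register {a, q, q_prev}.
        q_prev = q0
        q = (q >> 1) | ((a & 1) << (width - 1))
        a >>= 1
    return (a << width) | q   # signed double-width product
```

With a zero divisor the divider loop yields an all-ones quotient and leaves the dividend as the remainder, which is the result the RISC-V M-extension specifies for DIVU and REMU.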


Fig.2: five-stage pipeline

The second implementation uses sequential multiplier/divider algorithms. For division, the "restoring division" algorithm is chosen, while for multiplication a "radix-2 Booth multiplier" for signed multiplication is used.
After implementing both the combinational and the sequential versions, it was observed that the sequential implementation is advantageous over the combinational one with respect to area, power, and speed, as illustrated in Table III. However, this comes with a tradeoff: the core now has instructions that take more than one clock cycle, called "multi-cycle instructions". So, to maintain the order of the instructions, the processor's pipeline must be frozen for thirty-two clock cycles until the multiplication/division unit has finished calculating its result. This is done using a Multiplication and Division Stalling Unit (MDSU).

C. Pipelined RV32IM
Pipelining arranges the hardware elements of the CPU so that its overall performance is increased. It breaks the single instruction path into a number of distinct stages, allowing more than one instruction to execute simultaneously in the pipelined processor. In this work, the processor is designed to have a five-stage pipeline with fetch, decode, execute, memory, and write-back stages, in this sequence, as shown in fig.2. In a pipeline, some situations prevent the next instruction in the instruction stream from executing during its designated clock cycle. Such a situation is known as a hazard [7]. Hazards can be classified into three categories:
1) Structural hazards: occur due to resource contention. They were avoided by using separate instruction and data memories (Harvard architecture).
2) Data hazards: occur due to data dependency between instructions, where a value that has not been updated yet is required in another stage. Data hazards are solved by forwarding data and/or by stalling the core. Both solutions are implemented by different units embedded in the core. These units are:
• Forwarding unit at execute stage (third stage):
If one of the source register (Rs) addresses of the third stage is equal to the destination register (Rd) address of the fourth or the fifth stage, the instruction in stage three needs a value from stage four or stage five. So, this value should be forwarded.
• Forwarding unit at memory stage (fourth stage):
If one of the Rs addresses of a store instruction at the fourth stage is equal to the Rd address of a load instruction at the fifth stage, a value from stage five should be forwarded to stage four.
• Hazard detection unit at decode stage (second stage):
It is responsible for catching data dependencies between instructions where the needed data has not yet been calculated and so cannot be forwarded. This unit generates a stall signal to freeze the first two stages for one clock cycle until the value is calculated; the data is then forwarded using the forwarding unit.
3) Control hazards: occur due to branch instructions. In our implementation, the branch decision is taken at the memory stage (fourth stage). This delay in determining the correct instruction to be fetched allows three new instructions to enter the pipeline. A common solution for this is to always predict branches as not taken. If the branch is taken, a flush signal is asserted to reset the first three stages of the pipeline.

IV. SYSTEM ON CHIP
Most real-life applications are implemented as systems on chip rather than as stand-alone cores. Our SoC is designed to act as an end node in an IoT system that needs to transfer its data over a wired or a wireless communication channel while maintaining the security of the data throughout the communication process. For a simple end node, our RISC-V core acts as a controller that reads data from and writes data to the other IPs on the SoC. These IPs are the hardware security IP (a hardware implementation of the ACORN algorithm) and a UART IP.

A. SoC integration (MMIO)
The integration between the core and the IPs has been implemented by redirecting the core's load and store instructions to access the internal registers of the IPs using an address decoder. This solution is known as memory-mapped input-output (MMIO).

B. SoC integration issues and proper solutions
One of the most critical problems that need to be caught and fixed when designing modern SoCs is the clock domain crossing issue. In our SoC, we have three clock domains: one for the core, one for the ACORN security IP, and another for the UART IP. With multiple clock domains, if data is sent from one domain to another and the transmitting and receiving clocks are not integer multiples of each other, data from the first domain will sooner or later change within the setup or hold window of the second domain, leading to metastability.
The most suitable solution to avoid metastability is an asynchronous FIFO. An asynchronous FIFO is a memory block with read and write pointers for reading and writing data, in which the transmitting and receiving domains have different clocks. Two-flop synchronizers are used to synchronize the read pointer to the writing domain and the write pointer to the reading domain. These pointers should be Gray encoded (so that only one bit changes from one count to the next), because two-flop synchronizers cannot safely synchronize multi-bit buses: each bit carries a one-clock-cycle uncertainty.
Another problem is reset domain crossing (RDC). The term RDC refers to a design in which the source and destination parts (flops, latches, and clock gates) operate on different, independent resets. Metastability occurs when an asynchronous reset from one reset domain causes a transition too close to the clock edge of a flip-flop in another reset domain, or in a flip-flop without a reset, causing a non-deterministic flip-flop value that propagates throughout the design and results in functional failures. Because an asynchronous FIFO is used to eliminate metastability due to CDC, this problem has also been avoided: the pointers exchanged between the reading and writing domains are the only signals that cross reset domains, they are Gray encoded, and a synchronizer is added. This eliminates the metastability.

V. FUNCTIONAL VERIFICATION
Functional verification ensures that the implementation conforms to the specification from a functional perspective. Because of the rapid growth of both design size and complexity, functional verification has become one of the key bottlenecks in the design process. In this work, functional verification is done using static verification, which is discussed in the SoC section, and dynamic verification using the Universal Verification Methodology (UVM) [2].
As designs grow in size and complexity, it takes a long time to verify them completely with conventional test benches, which are not reusable.
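The forwarding and stall conditions described for data hazards in Section III.C can be written down as small combinational functions. The sketch below is an illustration under assumptions, not the paper's implementation: the mux encodings, the register-write qualifiers, the exclusion of register x0, and the priority of stage four over stage five are standard textbook choices that the text does not spell out.

```python
# Hypothetical encodings for the operand-select mux at the execute stage.
NO_FORWARD = 0
FWD_FROM_MEM = 1   # take the value from stage four (memory)
FWD_FROM_WB = 2    # take the value from stage five (write-back)


def forward_select(rs, mem_rd, mem_reg_write, wb_rd, wb_reg_write):
    """Choose the source of one execute-stage operand.

    rs is the source register address at stage three; mem_rd and wb_rd are
    the destination register addresses at stages four and five.
    """
    # Assumption: the younger result (stage four) wins when both stages match.
    if mem_reg_write and mem_rd != 0 and mem_rd == rs:
        return FWD_FROM_MEM
    if wb_reg_write and wb_rd != 0 and wb_rd == rs:
        return FWD_FROM_WB
    return NO_FORWARD


def load_use_stall(id_rs1, id_rs2, ex_rd, ex_is_load):
    """Hazard detection at decode: stall one cycle behind a dependent load.

    The loaded value does not exist yet while the load is in the execute
    stage, so it cannot be forwarded and the first two stages must freeze.
    """
    return bool(ex_is_load and ex_rd != 0 and ex_rd in (id_rs1, id_rs2))
```

A Load followed directly by a dependent Add is the case in which `load_use_stall` returns True for one cycle, after which `forward_select` supplies the loaded value.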

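The reason Section IV.B asks for Gray-encoded FIFO pointers can be checked with a few lines of code: in a Gray sequence, consecutive counts differ in exactly one bit, including the wrap-around of a power-of-two counter. The helper names below are hypothetical and the 4-bit pointer width is an arbitrary choice for the demonstration.

```python
def bin_to_gray(n):
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Convert a reflected Gray code back to the binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def changed_bits(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


POINTER_WIDTH = 4                 # assumed pointer width for the demo
DEPTH = 1 << POINTER_WIDTH

# Bits that toggle on each increment, for binary and for Gray pointers.
binary_toggles = [changed_bits(i, (i + 1) % DEPTH) for i in range(DEPTH)]
gray_toggles = [
    changed_bits(bin_to_gray(i), bin_to_gray((i + 1) % DEPTH))
    for i in range(DEPTH)
]
```

A binary pointer toggles up to four bits at once in this demo (for example 0111 to 1000), so a synchronizer sampling mid-transition could capture a value that was never written. The Gray pointer toggles one bit per step, so the sampled value is always either the old count or the new one.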

TABLE I. TRACEABILITY MATRIX
Columns: Test Features, UT_1, UT_2, UT_3, UT_4, UT_5, UT_6, UT_7, UT_8, UT_9, UT_10, UT_11
Rows (test features):
Instruction types variations: R-TYPE, U-TYPE, S-TYPE, J-TYPE, B-TYPE, I-TYPE
Hazards variations:
  Structural hazards
  Data hazards: Forwarding, Stall generation
  Control hazards: Flush generation, PC update
M-extension:
  Multiplication: Signed, Unsigned
  Division: Signed, Unsigned
Operand sign variation: Positive, Negative, Zero
Corner cases: Wrong opcode

The problem with these designs is that testing all possible combinations of their signals takes too long to simulate. UVM addresses this issue by constraining the stimulus to certain values and randomizing it. Accordingly, some advantages of UVM are:
• Reduction of coding effort and enabling a high level of code reuse
• Support of constrained random verification methodology
• Extensive stimulus generation
• Decrease of verification cycle

TABLE II. TEST FEATURES
Unit Test   Description
UT_1        Testing of each type's basic operation separately.
UT_2        Testing of the M-extension basic operation with a combination of different inputs and testing the data hazard (forwarding unit stage 3).
UT_3        Testing of the division by zero.
UT_4        Testing of the data hazard (forwarding unit stage 4).
UT_5        Testing of the data hazard (stall generation) for two dependent instructions as in Load -> Add.
UT_6        Testing of the data hazard with M-extension.
UT_7        Testing of two consecutive stalls (Load -> Add -> Load -> Add).
UT_8        Testing of the conditional jump and unconditional jump, checking the flush and the PC value updates.
UT_9        Testing of a combination of hazards, such as control hazards (B-type/J-type) with data hazards (stall), and checking FLUSH and the PC value updates.
UT_10       Testing of branch instructions with M-extension instructions.
UT_11       Fault injection with wrong opcodes (corner case).

A. UVM components and objects
• Driver: drives transactions from the sequencer to the DUT.
• Monitor: captures signals from the DUT and converts them from pin level to transaction level. The input monitor ensures that the transaction has been sent correctly to the DUT.
• Sequencer: controls the flow of transactions.
• Sequence: generates a transaction and randomizes it.
• C++ reference model: mimics the functionality of the DUT.
• Scoreboard: checks that everything has worked by comparing the output signals from the DUT and the reference model.

B. Reactive Agent
Stalls should prevent the core from fetching a new instruction. So, the randomization of the sequence items in the sequence needs to be stalled, and no new instructions should be driven to the DUT. A similar case occurs when MULDIV instructions enter the core: the core is frozen for multiple clock cycles. The randomization of the sequence items sent to the DUT is therefore controlled by a signal monitored from the DUT itself. A Reactive Agent should be used in this case instead of an Active Agent to control exactly when an instruction is sent to the DUT [8].
The reactive agent-based verification approach is used to verify a design that works as follows: Device-1 and Device-2 communicate with each other, where Device-2 generates a request (the stall signal) and Device-1 responds to the request (by sending a new random sequence item if no stall occurred, or by freezing the randomization process if one did). Device-2 is the DUT and Device-1 is the Reactive Agent. There are several approaches to implementing a reactive agent. In this work, the monitor-sequencer approach is used. The monitor serves as the sampler in this method: it samples the request and delivers it to the sequencer. After the stall has been sampled from the DUT, the sequence can stop the randomization process according to the value of the sampled stall signal.
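The stall-controlled stimulus flow described for the reactive agent can be mimicked in a few lines of plain software. This is only a behavioral sketch, not UVM code: the class name, the use of a 32-bit random word as a stand-in for a sequence item, and the per-cycle list of sampled stall values are all assumptions for illustration.

```python
import random


class ReactiveSequence:
    """Produces a new random item only when the sampled stall is low."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.current = None     # item currently presented to the DUT
        self.generated = 0      # number of randomizations performed

    def next_item(self, stall):
        # While the DUT asserts stall, freeze randomization and hold the item.
        if stall and self.current is not None:
            return self.current
        self.current = self.rng.getrandbits(32)  # stand-in for a random instruction
        self.generated += 1
        return self.current


def run(stall_trace, seed=0):
    """Feed a per-cycle list of sampled stall values; return driven items."""
    sequence = ReactiveSequence(seed)
    items = [sequence.next_item(stall) for stall in stall_trace]
    return items, sequence.generated


# One free cycle, a two-cycle stall, then one free cycle.
items, generated = run([0, 1, 1, 0])
```

In this trace the item driven in the first cycle is held for the two stalled cycles, and only two randomizations take place in total, which is the behavior the monitor-sequencer approach is meant to enforce.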
VI. THE BEAUTY OF VERIFICATION
At first, a test plan was implemented for direct testing to hit specific interesting cases, as illustrated in Table I and Table II, and then the UVM environment was implemented as shown in fig.3.
Because constrained random stimulus was applied to the DUT, unexpected bugs that were not considered in the test plan were revealed. Some of these bugs are illustrated in the following section.

Fig.3: UVM environment
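Two of the core bugs reported in this section, the shift amount and the BLT decision, are arithmetic in nature and can be reproduced with a small software model. The sketch below is a reconstruction under assumptions rather than the authors' RTL: the function names are hypothetical, the subtraction order is taken as first operand minus second operand, and the "buggy" variant only imitates the reported symptom.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def srl(a, b):
    """Shift right logical using only the five least-significant bits of b."""
    return (a & MASK32) >> (b & 0x1F)


def sra(a, b):
    """Shift right arithmetic using only the five least-significant bits of b."""
    return (to_signed32(a) >> (b & 0x1F)) & MASK32


def blt_taken_sign_only(rs1, rs2):
    """Imitates the reported bug: decide from the sign of the difference alone."""
    diff = (rs1 - rs2) & MASK32
    return bool(diff >> 31)


def blt_taken(rs1, rs2):
    """Signed less-than decided as sign flag XOR overflow flag."""
    diff = (rs1 - rs2) & MASK32
    n = diff >> 31                   # sign flag of the 32-bit result
    s1 = (rs1 >> 31) & 1
    s2 = (rs2 >> 31) & 1
    # Subtraction overflows when the operand signs differ and the result
    # sign differs from the sign of the first operand.
    v = (s1 ^ s2) & (s1 ^ n)
    return bool(n ^ v)
```

For example, comparing 0x80000000 (the most negative value) with 1 overflows the subtraction: the sign-only check reports "not less than", while the sign XOR overflow check takes the branch.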


A. Core Bugs
1) First Bug: the Shift Right Arithmetic (SRA) and Shift Right Logical (SRL) instructions were not implemented correctly (when compared with a RISC-V simulator). The core was shifting the first operand by the whole value of the second operand.
2) Second Bug: Branch Less Than (BLT). This instruction subtracts the first operand from the second operand, then checks the sign of the result to determine the branching decision. The bug was that when an overflow occurred in the subtraction, the sign bit of the result was not calculated correctly, which led to a wrong branching decision.
3) Third Bug: branch instructions with M-extension instructions. If the M-extension instruction is at the execute stage and the branch is at the memory stage, two conflicting control signals are asserted simultaneously. Firstly, the stall signal of the MDSU disables all five pipeline registers for thirty-two clock cycles, and the UVM sequence is blocked by the reactive agent, which causes the instruction at the decode stage to be duplicated in the fetch stage in the next cycle. Secondly, the FLUSH signal resets the first three pipeline registers. In this case, the MDSU stall signal should not be asserted.

B. Bugs Validation
1) The first bug was fixed by modifying the ALU to shift by the value of the least-significant five bits of the second operand.
2) The second bug was fixed by adding an overflow flag and XORing it with the sign flag to decide whether a branch will be taken.
3) The third bug was solved by giving the FLUSH signal a higher priority than the MDSU stall signal.

VII. QUESTA TOOLS
Nowadays, the complexity of SoC design is increasing rapidly. Accordingly, tools serving the digital field are indispensable. In this project, most of the tools used were Siemens EDA tools, such as:
Questa Sim: provides simulation, debug, and verification platforms. The tool was used in the design and verification phases [9].
Questa Lint: provides an easy way to perform extensive checks in the early stages of RTL development, to avoid bad coding techniques that may end up causing RTL synthesis issues [10].
Questa CDC/RDC: a very powerful tool that, using only the RTL (and a UPF power intent file), is used to identify chip-killing clock/reset-domain crossing issues rapidly [11].

VIII. RESULTS
For the previously mentioned core and multiplication/division extension implementations, Table III shows the results of the different implementations with respect to the speed of the core. After performing both direct testing and constrained random testing with the UVM environment on the core, the resulting bugs were fixed and validated. This was an acceptable exit criterion for the testing phase. As shown in fig.4, a snippet from the UVM report shows the results obtained when comparing the output of the C++ reference model with the output of the DUT. After the core was implemented, verified, and integrated with the hardware security and UART IPs, an assembly program using RISC-V instructions was written to test the system-on-chip functionality. The code sends the string "Mentor Graphics" from the core to the hardware security IP for encryption, and the IP sends it back to the core when encryption is finished. The encrypted string "#`Nÿ ra-_-/@&Ï" was then sent to the UART to be transmitted to the external world safely. This encrypted data was then sent back to the hardware security IP for decryption, and the decrypted data was sent back to the core to ensure that it had been decrypted successfully to "Mentor Graphics".

TABLE III. DIFFERENT IMPLEMENTATION RESULTS
Core Implementation: Un-Pipelined | Pipelined
MUL/DIV Implementation: Combinational | Sequential
Speed: 2.4 MHz | 12 MHz | 125 MHz

Fig.4: UVM results

IX. FUTURE WORK
The UVM environment was designed for testing the RISC-V core. It can be further improved to test the whole SoC.

X. ACKNOWLEDGMENT
This work was partially funded by ONE Lab at Zewail City of Science and Technology and Cairo University, Siemens EDA (Mentor Graphics), ASRT, NTRA, and ITAC.

XI. CONCLUSION
IoT security is a very important requirement. Hardware security has proven to be more robust than software security, because it is faster, is immune to software bugs, and runs on dedicated hardware, which provides more security for the IoT data. The huge increase in the complexity of SoCs has led EDA companies to develop powerful tools that deal with the issues that may arise during design and discover these issues in the early phases, saving time and money. In this work, an SoC was designed for IoT hardware security applications, and a generic UVM environment was designed to test our implementation of the RISC-V based core; it can test other implementations of RISC-V cores as well.

REFERENCES
[1] Patterson, David, and Andrew Waterman. The RISC-V Reader: An Open Architecture Atlas. Strawberry Canyon, 2017.
[2] Accellera Systems Initiative. "Universal Verification Methodology (UVM) 1.2 User's Guide." October 2015.
[3] Wu, Wenjuan, Dongchu Su, Bo Yuan, and Yong Li. "Intelligent Security Monitoring System Based on RISC-V SoC." Electronics 10, no. 11 (2021): 1366.
[4] Kumar, Vinay BY, Anupam Chattopadhyay, Jawad Haj-Yahya, and Avi Mendelson. "Itus: A secure risc-v system-on-chip." In 2019 32nd IEEE International System-on-Chip Conference (SOCC), pp. 418-423. IEEE, 2019.
[5] M. Bahnasawi, A., K. Ibrahim, A. Mohamed, M. Khalifa, A. Moustafa, K. Abelmonim, Y. Ismail, and H. Mostafa, "ASIC-Oriented Comparative Review of Hardware Security Algorithms for the Internet of Things Applications", IEEE International Conference on Microelectronics (ICM 2016), Cairo, Egypt, pp. 285-288, 2016.
[6] Oleksiak, Adrian, Sebastian Cieślak, Krzysztof Marcinek, and Witold A. Pleskacz. "Design and verification environment for risc-v processor cores." In 2019 MIXDES-26th International Conference "Mixed Design of Integrated Circuits and Systems", pp. 206-209. IEEE, 2019.
[7] Hennessy, J. L., and D. A. Patterson. "Computer Organization and Design RISC-V Edition: The Hardware Software Interface." (2017).
[8] A. Hussien, S. Mohamed, M. Soliman, Hagar Mostafa, K. Salah, M. Dessouky, and Hassan Mostafa, "Development of a Generic and a Reconfigurable UVM-Based Verification Environment for SoC Buses," IEEE International Conference on Microelectronics (ICM 2019), Cairo, Egypt, pp. 195-198, 2019.
[9] Questa® Sim User Guide version 2021.3, 2021.
[10] Questa® Lint User Guide version 2021.3, 2021.
[11] Questa® CDC User Guide version 2021.3, 2021.
