
PASC: Physically Authenticated Stable-Clocked SoC Platform on Low-Cost FPGAs
Aydin Aysu and Patrick Schaumont
Electrical and Computer Engineering Department
Virginia Tech
Blacksburg, VA, USA
e-mail: { aydinay, schaum }@vt.edu
Abstract— Generation of device-unique digital signatures using Physically Unclonable Functions (PUFs) has been an active area of research for the last decade. However, most PUFs are conceived and designed as stand-alone hardware modules. In contrast, this paper proposes a PUF architecture that is tightly integrated into the core of a system-on-chip (SoC), with the purpose of creating a physical SoC authentication mechanism. The proposed PUF is integrated into the custom instruction interface of the NIOS-II processor. Therefore, PUF challenges can be issued by instruction calls, which allows run-time authentication and enables the implementation of flexible post-processing mechanisms in software. The proposed PUF utilizes critical timing path violations of a custom instruction execution to generate digital signatures that are unique to individual chips due to random process variations. We implement PASC on a low-cost Altera DE0-Nano Development Board and validate the quality of the authentication keys on 15 boards.

Keywords—Physical Unclonable Functions; System-on-Chip Integration; HW/SW Co-design; FPGA

Figure 1. Critical timing Δt is the time it takes for a signal to travel from the output of register R1 through the Data Path into register R2. If the clock period τ < Δt, the critical timing is violated on this path.
I. INTRODUCTION
Physical Unclonable Functions (PUFs) utilize random process variations during the manufacturing of an Integrated Circuit (IC) to generate device-unique electronic signatures. The basic model of a PUF is a challenge/response mechanism, where the challenge is a distinct input and a method to trigger the PUF, and the response is the corresponding output. If the responses are device-unique and unclonable, the PUF can be used for applications such as bitstream protection [1] and tamper-resistant key storage [2].

The core idea of the proposed PUF is to use the critical timing violation frequency (1/Δt) of data paths; this quantity is expected to be device-unique [3]. Fig. 1 shows the generic structure of these PUFs. Initially, both R1 and R2 are set to a known state. At time 0, a value is launched from R1. After a device-dependent, finite amount of time (Δt), this value propagates through the datapath and is captured at R2. To find the exact value of Δt, we require a mechanism that can sweep the frequency of the clock input of R1 and R2 to determine when R2 no longer captures the expected value [3].

Fig. 2 shows the generic architecture of a PUF that avoids the need for clock sweeping. Instead of using one data path, we use several data paths that have different Δt values. If the critical timings of these paths are ordered as Δt1 < Δt2 < Δt3 < … < Δtn, we can use a stable clock frequency and check at which data path executions start to fail. The proposed PUF structure is built on this key idea: the sweeping mechanism is replaced with a custom instruction, executed by a processor, that causes critical timing violations at the nominal clock frequency.

Figure 2. Generic structure of a PUF that can utilize critical timing violations using a fixed clock source.

The major contributions of this work are:
• To propose a PUF architecture that can generate device-unique responses at run-time, using only the nominal clock frequency of the processor.
• To demonstrate a physically authenticated SoC platform (PASC) that can perform run-time authentication with the integrated PUF architecture. The PUF is integrated into the processor as a custom instruction. The extraction of the authentication key, including all the required post-processing, can be completely supported by software. This solution combines the uniqueness of hardware with the flexibility of software.

The rest of the paper is organized as follows: Section II gives an overview of the literature on PUFs on FPGAs and our motivation to design a new one. Section III demonstrates the principle of operation. Section IV presents our PUF, its integration into the SoC architecture, and some SW-based post-processing methods to generate authentication keys from raw PUF responses. Section V quantifies the uniqueness and reliability of the authentication keys. Section VI concludes the paper.

This research was supported in part through NSF Grant 0964680.

978-1-4799-2079-2/13/$31.00 ©2013 IEEE


TABLE I. OVERVIEW OF FPGA PUF CONSTRUCTIONS

Resource | Measured Variation | Technology Requirement | Reference
Memory | Power-up Values, Write Collisions | Bitstream Manipulation, Power Cycling, True Dual-Port SRAMs | [5], [6]
Delay Elements | Timing of Delay Paths | Strict Control over Placement and Routing | [7], [8], [9], [10]
Microprocessor | Critical Timing Violation Frequency | External High-Precision Clock Generator | [3]
Custom Instruction | Critical Timing Violation Path | Custom Instruction Execution | This paper

II. PREVIOUS WORK AND MOTIVATION
There have been many PUF proposals on many platforms, and the reader can refer to Maes et al. for a detailed survey on PUFs [4]. In this paper, we focus on portable FPGA-based PUF implementations, as the number of viable solutions is limited. These types of PUFs do not require strict control over the placement and routing tools and do not use bitstream manipulations.

A popular group of PUFs are memory-based constructions. These PUFs utilize transistor mismatches of SRAM cells. Maes et al. present a memory-based PUF on FPGAs [5]; this PUF reads out the initial power-up values of flip-flops. Another PUF, presented by Güneysu, utilizes memory-write collisions [6].

Another group of PUFs are delay-based PUFs. These PUFs are further divided into arbiter, ring-oscillator, and glitch PUFs. Arbiter-based PUFs on FPGAs are presented in [7]. The basic idea is to compare two paths: paths designed with the same intended delay will have slightly different delays when manufactured, and an arbiter circuit selects one of these paths according to a predetermined criterion (e.g., always select the faster one). Anderson proposes the glitch PUF as an alternative delay-based PUF [8], which utilizes the timing variation of an artificially generated glitch. Ring-oscillator (RO) based delay PUFs have been proposed in [9]. The output of a path is inverted and fed back to the input of the path, creating an RO. Process variations cause delay variations, so ROs oscillate at different frequencies.

Given the number of available PUFs, why do we need yet another one? Currently known PUF constructions are deeply entrenched in the FPGA fabric and require tight control over logic placement or power. In contrast, our solution avoids backend-specific constructions.

Table I gives an overview of FPGA PUF constructions. Memory-based PUFs [5], [6] require manipulation of the bitstream, power cycling, or true dual-port SRAMs. The uniqueness and reliability of Arbiter-based PUFs [7], Glitch-based PUFs [8], and RO-based PUFs [9] are drastically reduced if control over the placement and routing tools is limited (e.g., on Altera FPGAs). Most recently, a PUF that utilizes scan-chain registers of a Xilinx FPGA has been presented [10]. This PUF is a very high-cost implementation (7k LUTs, 58 BRAMs) because it utilizes the path delays of an AES implementation and performs several post-processing techniques in hardware.

Maiti presents a solution based on critical timing violations [3]. However, this PUF requires a sweeping mechanism that uses an external clock source to generate critical timing path violations, which is not applicable to low-cost devices that have a fixed and stable clock source.

Despite the number of available PUF constructions, there is also a lack of prototype test platforms for physical authentication on reconfigurable devices. In this paper, we propose PASC, a Physically Authenticated Stable-Clocked SoC platform with a novel PUF that uses a fixed clock source and offers seamless integration into the architecture of the test platform. The proposed PUF is integrated into the custom instruction interface of the NIOS-II processor. Therefore, PUF responses can be collected by instruction calls, which allows run-time authentication and enables the implementation of post-processing in software.

III. PRINCIPLE OF OPERATION
Fig. 3(a) shows the key idea of Maiti's clock-sweeping PUF [3]. In this figure, the X axis represents the input clock frequency, and the Y axis represents the proportion of correct executions. Due to process variations, instruction executions have a different critical timing violation frequency on each device. We can check the correctness of the return value of an instruction by comparing it with the golden value. Therefore, this work sweeps the frequency range of the clock input to find the frequency at which the processor fails to execute instructions. The key observation here is that it is not feasible to utilize the native instructions of a processor under a fixed clock input: the critical timing violation frequency varies widely from chip to chip, while on each chip the % of correct executions drops rapidly to 0%, so most chips show the same violation behavior at any given frequency. For example, if we select 100 MHz as the fixed frequency of the input clock in Fig. 3(a), Chip 3 and Chip 4 will have different % of correct executions, but Chip 1 and Chip 2 will both have 100% correct executions, and Chip 5 and Chip 6 will both have 0% correct executions.

Fig. 3(b) shows the expected behavior of the proposed stable-clocked PUF. We use a fixed input clock and a custom instruction. The custom instruction is designed so that the % of correct executions drops slowly from 100% to 0%. In contrast to the behavior in Fig. 3(a), the chips in Fig. 3(b) would have different % of correct executions at the fixed clock frequency of 100 MHz.

Figure 3. The principle of operation for the PUF construction in [3] (a) and the proposed one (b). In [3], the input clock frequency is swept and the chip identity is measured as a function of frequency, whereas the proposed construction uses a fixed clock input and the chip identity is measured as a function of the percentage of correct executions.

IV. PHYSICAL AUTHENTICATION
The proposed physical authentication solution is not an obvious combination of PUF and software, but rather a system-level, tight integration of both. In this section, we first discuss the PUF structure, followed by its system integration and how to process the raw PUF responses to generate the authentication keys.

A. Novel PUF
Fig. 4 shows the architecture of a proposed PUF block. The PUF block architecture consists of a serial delay-register chain. In our architecture, we configured LUT_0 to LUT_47 as buffers. The outputs of LUT_16 to LUT_47 are each captured in a register. These 32 registers generate a 32-bit output response that can be sent back to the processor using the custom instruction interface. The complete PUF architecture consists of 64 copies of the PUF block from Fig. 4.

Figure 4. Architecture of a proposed PUF block. 64 of these blocks are used to generate the authentication key.

The input of the PUF block is the 1-bit input dataa, which is sent from the processor using the custom instruction interface. The value of dataa sent from the processor is registered when the synchronization signal start is '1'. In one clock cycle, the values of these input registers are first fed into the delay-register chain and then stored inside the 32 output registers.

It takes a finite amount of time to read out the value of a register and transfer it to the delay-register chain ($r_0$), to propagate it through the chain up to the first output tap ($d_0$), and to write it into the first output register ($w_0$). $r_0$, $d_0$ and $w_0$ are device-unique random values. If the total time required to complete this path is longer than the operating clock period, a critical timing violation occurs. Equation (1) formulates this condition, where τ denotes the clock period. If (1) is satisfied, the values of some output registers might not get updated due to the timing violation.

$$ \tau < r_0 + d_0 + w_0 \qquad (1) $$

More generally, let $d_k$ ($k \ge 1$) denote the delay of the chain segment between output taps $k-1$ and $k$, and $w_k$ the time to write into output register $k$ ($k = 0, 1, \ldots, 31$). By choosing the number of LUTs used to build the chain carefully (the example in the figure assumes 48 LUTs, LUT_0 to LUT_47), we can ensure that some output register $n$ satisfies the conditions of (2) and (3).

$$ r_0 + \sum_{k=0}^{n} d_k + w_n \le \tau \qquad (2) $$

$$ \tau < r_0 + \sum_{k=0}^{n+1} d_k + w_{n+1} \qquad (3) $$

If (2) and (3) are satisfied, the value of the nth register is updated properly, whereas the value of the (n+1)th register is not, because of the critical timing violation. Since $r_0$, $d_k$ and $w_k$ are device-unique random values that depend on process variation, the value of $n$ is random.

Now we define how to formalize the challenge-response pairs and the rationale behind this construction. If the PUF is not activated by the processor, all registers are preset to '0'. The challenge is the dataa input, whose 1-bit value is set to '1'. The scan-register chain enables us to generate a timing violation, and the chain structure ensures that we utilize the random variations of $r_0$, $d_k$ and $w_k$. Our experiments showed that

$$ w_i < d_{i+1} + w_{i+1}, \qquad i = 0, 1, \ldots, 30 $$

which means that after a LUT inside the delay-register chain generates its output, the time it takes to write that output to its register is less than the time it takes to generate the next value of the chain and write it to the register at the next level. Therefore, the resulting 32-bit value is a string of '1's followed by a string of '0's (e.g., 11111111111000000000000000000000). Since the delay values are device-unique random values, the resulting value can be used as a PUF response. A done signal is also generated to enable the processor to capture the output.

Figure 5. Number of '1's at the output registers vs. operating frequency of the processor.

Fig. 5 shows the operation of the proposed PUF. The X axis represents the operating frequency of the chip, and the Y axis is the number of '1's observed at the output registers. If the frequency is low (the clock period τ is long), all output registers are set to '1', since there are no timing violations on the critical paths and all values can reach and be written into the 32 output registers. If the operating clock frequency is very high (the clock period τ is short), none of these values can reach and be written to the output registers, which stay at their preset value of '0'. If the clock frequency is somewhere in between (e.g., 100 MHz), we observe different outputs on different devices depending on the process variations. This shows that, using a fixed clock, we can still utilize critical timing violations as a PUF on an FPGA.

Figure 6. High-level PUF block diagram.

Fig. 6 shows the high-level block diagram of the complete PUF architecture. We used 64 of the proposed PUF blocks in our prototype platform. Their outputs are multiplexed to select the final 32-bit response, and the custom instruction interface input signal datab is used for the selection. Since there are 64 PUF blocks in our architecture, the total number of raw responses is 64. The number of PUF blocks can be scaled up or down depending on the number of responses that the designer wants to generate. The prototype easily supports this by using log2(N) bits of datab to multiplex the outputs of N blocks (6 bits for the 64 blocks used here).

B. System Integration
Fig. 7 shows the high-level architecture of PASC. The complete system consists of a NIOS-II processor, a 32-Kb SRAM, a JTAG interface for programming, an Avalon bus for communication, and the proposed timing-violation-based PUF for physical authentication. The proposed PUF is integrated into the custom instruction interface of the NIOS-II processor. The PUF can be invoked by an instruction from C code, as in (4):

response = violate_path_puf(dataa, datab);    (4)

When the function violate_path_puf is used in software, a custom instruction is issued by the NIOS-II processor. dataa and datab are the inputs, and response is the output of this function. dataa, datab, and response are 32-bit integers.
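To make the fixed-clock behavior and the datab-based block selection concrete, the following C sketch models the PUF array at a purely behavioral level. It is not the authors' hardware description. The names puf_block_t, puf_block_init, puf_block_response and puf_array_response are hypothetical, each output register is summarized by a single arrival time (read-out, chain propagation and register write combined), and the delay values produced by puf_block_init are synthetic, not measured. A register is assumed to capture the launched '1' only if its arrival time does not exceed the clock period τ; otherwise it keeps its preset '0'.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_BLOCKS 64 /* PUF blocks in the prototype */
#define NUM_TAPS   32 /* output registers per block (LUT_16 .. LUT_47) */

/* Hypothetical behavioral state of one PUF block.
 * arrival_ns[i]: time from the launch at the input register until the
 * '1' is written into output register i (read-out + chain propagation
 * + register write). Assumed strictly increasing in i. */
typedef struct {
    double arrival_ns[NUM_TAPS];
} puf_block_t;

/* Fill a block with synthetic arrival times: a base latency, then one
 * LUT delay per tap with uniform variation in [-spread/2, +spread/2].
 * Keeping spread_ns < 2 * lut_ns keeps the times strictly increasing.
 * Illustrative numbers only, not measured data. */
void puf_block_init(puf_block_t *blk, double base_ns, double lut_ns,
                    double spread_ns)
{
    double t = base_ns;
    for (int i = 0; i < NUM_TAPS; ++i) {
        double u = (double)rand() / RAND_MAX; /* uniform in [0, 1] */
        t += lut_ns + (u - 0.5) * spread_ns;
        blk->arrival_ns[i] = t;
    }
}

/* Response of one block at a fixed clock period tau_ns. Register i is
 * updated to '1' only if its path meets timing; otherwise it keeps its
 * preset '0'. Bit i of the result holds register i, so increasing
 * arrival times give a thermometer code such as 0x001FFFFF. */
uint32_t puf_block_response(const puf_block_t *blk, double tau_ns)
{
    uint32_t r = 0;
    for (int i = 0; i < NUM_TAPS; ++i)
        if (blk->arrival_ns[i] <= tau_ns)
            r |= (uint32_t)1u << i;
    return r;
}

/* Output multiplexer: datab selects which block's response is returned.
 * With 64 blocks only the low 6 bits of datab matter. */
uint32_t puf_array_response(const puf_block_t *blocks, uint32_t datab,
                            double tau_ns)
{
    return puf_block_response(&blocks[datab % NUM_BLOCKS], tau_ns);
}
```

In this model, 100 MHz operation corresponds to tau_ns = 10.0: blocks whose arrival times straddle 10 ns return different numbers of '1's, while a much longer period yields 0xFFFFFFFF and a much shorter one yields 0, mirroring the qualitative shape of Fig. 5.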

Figure 7. The high-level architecture of the PASC.
Figure 8. Timing diagram of the communication between processor and PUF using the custom instruction interface.
Figure 9. Block diagram of PUF HW and SW post-processing.

Fig. 8 shows the timing diagram of the custom instruction interface [11]. When the PUF instruction is issued by the processor, the start signal is set to '1' for one clock cycle. The two data inputs, dataa and datab, stay valid until the PUF generates its output values. When the PUF response is ready, the done signal is set to '1', and in the same clock cycle the processor reads out the value result and stores it in response. After issuing the PUF instruction, the processor stalls until done is raised.
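On the software side, this handshake is hidden behind a single blocking call such as (4). The C sketch below shows one possible wrapper; it is an illustration under stated assumptions, not the PASC source. On a Nios II target it uses GCC's Nios II built-in __builtin_custom_inii, which issues a custom instruction with two int operands and an int result. The opcode value PUF_CI_OPCODE, the wrapper body, and the helper puf_read_all are hypothetical. Off-target, a placeholder returns a fixed thermometer code so the sketch compiles and runs on a host; that value is not PUF data.

```c
#include <stdint.h>

#define PUF_NUM_BLOCKS 64
#define PUF_CHALLENGE  1u /* dataa: the 1-bit chain input set to '1' */

/* Hypothetical opcode; the real value is assigned when the custom
 * instruction is added to the Nios II system. */
#define PUF_CI_OPCODE 0

#if defined(__nios2__)
/* Issue the custom instruction. The CPU stalls until the PUF raises
 * done, and the PUF's result port becomes the return value. */
static inline uint32_t violate_path_puf(uint32_t dataa, uint32_t datab)
{
    return (uint32_t)__builtin_custom_inii(PUF_CI_OPCODE, (int)dataa,
                                           (int)datab);
}
#else
/* Host-only placeholder so the sketch builds off-target.
 * Returns a fixed thermometer code; NOT real PUF data. */
static inline uint32_t violate_path_puf(uint32_t dataa, uint32_t datab)
{
    (void)dataa;
    (void)datab;
    return 0x000FFFFFu;
}
#endif

/* Collect one raw response per PUF block; datab selects the block. */
void puf_read_all(uint32_t raw[PUF_NUM_BLOCKS])
{
    for (uint32_t block = 0; block < PUF_NUM_BLOCKS; ++block)
        raw[block] = violate_path_puf(PUF_CHALLENGE, block);
}
```

Calling puf_read_all several times yields the repeated samples per challenge that the Majority Voting step relies on.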
C. Generating Authentication Keys From Raw PUF Responses
After the PUF output is sent to the processor, we can use several post-processing techniques to increase the uniqueness, entropy, and reliability of the PUF, and to generate the final physical authentication key. Since the output of the PUF is transmitted to the processor through the custom instruction interface, we can apply all post-processing in software, which gives us flexible post-processing mechanisms. The purpose of this paper is not to demonstrate optimal post-processing techniques, but to show proof-of-concept post-processing operations that can be performed on PASC.

Fig. 9 shows the high-level block diagram of the software post-processing operations. We applied five simple techniques: Majority Voting, Mapping, Pair-wise Difference, Encoding, and the Block Cipher. First, we apply simple Majority Voting on raw PUF responses. The processor issues several PUF executions using the same challenge, and the results are stored in a table. Then, the most frequently occurring PUF response is selected. This filters out occasional noisy responses. The same process is repeated for all 64 challenges. The user can dynamically configure the number of issued votes to trade off reliability against execution time.

Mapping is the step where the digital responses are turned into arithmetic values. In our experiments, the PUF HW output values range from '0x0003FFFF' to '0x007FFFFF', and they are mapped to '0' through '+5' accordingly. Using software enables us to adjust the mapping for different technology nodes.

Following the mapping, we perform a pair-wise difference operation. Instead of using absolute responses, we subtract response pairs and use the resulting differences. This method reduces bias effects and correlations [12]. Since 64 PUF blocks generate 64 responses, we can gather (64×63)/2 = 2016 pair-wise differences.

Figure 10. The pair-wise difference histogram. All 30240 possible pair-wise differences have a Gaussian-like distribution centered at 0.

After the Pair-wise Difference operation, we apply a simple encoding scheme that maps arithmetic values to 2-bit encoded symbols to increase entropy. The default encoding map is '00', '11', and '01' for '-1', '0', and '+1', respectively. All other values are mapped to '10'.

Finally, we can use any block cipher implementation in software to further enhance the uniqueness of the generated PUF response. We decided to use the 128-bit Advanced Encryption Standard (AES) as the block cipher. The encoded data is used as the key of the block cipher, and a software challenge serves as the plaintext. This enables a space of $2^{128}$ challenge-response pairs. The effects of these techniques are discussed in the Results section. The flexibility of software enables multiple authentication keys for the same encoded data. We could also use a hash function instead of a block cipher, but in that case the authentication key would be fixed by the input encoded data.

V. RESULTS
The proposed SoC platform is implemented on the DE0-Nano Board, which uses a Cyclone IV FPGA. 15 boards are used for the experiments. The NIOS-II soft processor is configured on the SoC and clocked at a constant 100 MHz using internal PLLs.
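The software post-processing chain of Section IV-C, which runs on this NIOS-II, can be sketched in portable C as below. The function names are hypothetical, and several details are assumptions not fixed by the text: ties in Majority Voting go to the earliest sample, Mapping counts the '1's of the thermometer code (so '0x0003FFFF', with 18 ones, maps to 0 and '0x007FFFFF', with 23 ones, maps to +5) and clamps anything outside that window, and each pair-wise difference is taken as v[i] - v[j] for i < j. The AES step is omitted.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 64
#define NUM_PAIRS  (NUM_BLOCKS * (NUM_BLOCKS - 1) / 2) /* 2016 */

/* Majority Voting: most frequent value among n >= 1 samples of the same
 * challenge. Ties go to the earliest sample (assumption). O(n^2). */
uint32_t majority_vote(const uint32_t *samples, size_t n)
{
    uint32_t best = samples[0];
    size_t best_count = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t count = 0;
        for (size_t j = 0; j < n; ++j)
            if (samples[j] == samples[i])
                ++count;
        if (count > best_count) {
            best_count = count;
            best = samples[i];
        }
    }
    return best;
}

/* Number of set bits, written portably (no compiler built-ins). */
static int ones(uint32_t x)
{
    int c = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        ++c;
    }
    return c;
}

/* Mapping: 18 ones -> 0, ..., 23 ones -> +5; clamping is an assumption. */
int map_response(uint32_t r)
{
    int v = ones(r) - 18;
    if (v < 0) v = 0;
    if (v > 5) v = 5;
    return v;
}

/* Pair-wise Difference over all i < j (order is an assumption).
 * Returns the number of differences written: NUM_PAIRS for n = 64. */
size_t pairwise_differences(const int *v, size_t n, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = i + 1; j < n; ++j)
            out[k++] = v[i] - v[j];
    return k;
}

/* Default 2-bit Encoding: -1 -> 00, 0 -> 11, +1 -> 01, others -> 10. */
unsigned encode_difference(int d)
{
    switch (d) {
    case -1: return 0x0u;
    case 0:  return 0x3u;
    case 1:  return 0x1u;
    default: return 0x2u;
    }
}
```

Applied to the 64 mapped responses of one board, pairwise_differences produces the 2016 values discussed in Section IV-C, and encode_difference turns each of them into a 2-bit symbol.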
Fig. 10 shows the distribution of the responses after the Pair-wise Difference operation. Since there are 2016 pairs and 15 devices, the total number of differences generated with the proposed PUF architecture is 2016×15 = 30240. The results show that the distribution of the pair-wise differences has a Gaussian-like form.

Two metrics are used for the performance evaluation of PUFs: uniqueness and reliability [3]. Uniqueness (U) estimates how well the responses distinguish one chip from another. To measure it, the responses of different devices are compared using equation (5), where $R_i$ and $R_j$ are the n-bit responses to the same challenge on different chips i and j, the total number of chips is m, and the comparison metric is the Hamming Distance, denoted HD. The ideal value of uniqueness is 50%. The uniqueness of the authentication key generated by PASC is 38.2% without AES and 49.2% with AES.

$$ U = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \frac{HD(R_i, R_j)}{n} \times 100\% \qquad (5) $$

Reliability (R) estimates the reproducibility of the generated authentication keys. To measure it, the response is generated multiple times on the same chip, and the generated responses are compared using equation (6), where $R'_{i,l}$ is the l-th sample of the n-bit response $R_i$, the total number of samples is L, and the comparison metric is again HD. The ideal value of reliability is 100%. The reliability of the authentication keys generated by PASC is 99.92% without AES and 96.7% with AES. AES significantly reduces the reliability because it amplifies the noise: even a one-bit change at the AES input can result in a large HD at the output.

$$ R = 100\% - \frac{1}{L} \sum_{l=1}^{L} \frac{HD(R_i, R'_{i,l})}{n} \times 100\% \qquad (6) $$

The cost of one PUF block is 1 register to store input data, 32 registers to store output data, and 48 LUTs. The SoC platform uses 64 of these blocks, 128 LUTs to multiplex the outputs of the 64 PUF blocks, and 2 registers for synchronization with the processor. Therefore, the total hardware cost for authentication is 3072 LUTs and 2114 registers. The 64 PUF blocks can be mapped into 192 LABs; each LAB consists of 16 registers and 16 LUTs.

VI. CONCLUSION AND FUTURE WORK
In this paper, we propose PASC, a Physically Authenticated Stable-Clocked SoC Platform. PASC uses a novel PUF architecture that offers seamless integration into the SoC platform. The main advantage of the proposed PUF architecture compared to similar constructions [3] is its stable-clock operation. The proposed PUF architecture is integrated into the custom instruction interface of the NIOS-II processor. Therefore, PUF responses can be obtained through instruction calls, which allows run-time authentication and enables the implementation of flexible post-processing mechanisms in software.

The architecture of the PUF consists of a delay-register chain that allows for determining the critical timing violation path, which has static randomness due to process variations. PASC is implemented on a low-cost Cyclone IV FPGA, and 15 boards are successfully authenticated with the proposed method. The proposed PUF outperforms other timing-violation-based PUFs [3] in terms of uniqueness and bit-length of the generated signature. Since the PUF block is architecturally visible to the SoC, future work could implement security protocols that utilize the PUF block as a source of non-volatile key storage or random number generation.

PASC uses 64 PUF blocks to achieve a reasonable cost. However, the proposed PUF architecture is easily parallelizable with the proposed integration, and offers a highly scalable solution for applications that require larger or more keys.

REFERENCES
[1] Guajardo J., Kumar S.S., Schrijen G.J., Tuyls P.: Physical Unclonable Functions and Public-Key Crypto for FPGA IP Protection. In: 17th International Conference on Field Programmable Logic and Applications (FPL'07), pp. 189-195, Amsterdam, Holland (2007)
[2] Kursawe K., Sadeghi A., Schellekens D., Skoric B., Tuyls P.: Reconfigurable Physical Unclonable Functions - Enabling Technology for Tamper-Resistant Storage. In: 2nd International Workshop on Hardware-Oriented Security and Trust (HOST'09), pp. 22-29, Anaheim, CA, USA (2009)
[3] Maiti A., Schaumont P.: A Novel Microprocessor-Intrinsic Physical Unclonable Function. In: 22nd International Conference on Field Programmable Logic and Applications (FPL'12), pp. 380-387, Oslo, Norway (2012)
[4] Maes R., Verbauwhede I.: Physically Unclonable Functions: A Study on the State of the Art and Future Research Directions. In: Towards Hardware-Intrinsic Security, pp. 3-37 (2010)
[5] Maes R., Tuyls P., Verbauwhede I.: Intrinsic PUFs from Flip-Flops on Reconfigurable Devices. In: 3rd Benelux Workshop on Information and System Security (WISSec'08), pp. 14-31, Eindhoven, The Netherlands (2008)
[6] Güneysu T.: Using Data Contention in Dual-Ported Memories for Security Applications. Journal of Signal Processing Systems, vol. 67, pp. 15-29 (2012)
[7] Hori Y., Yoshida T., Katashita T., Satoh A.: Quantitative and Statistical Performance Evaluation of Arbiter Physical Unclonable Functions on FPGAs. In: International Conference on Reconfigurable Computing and FPGAs (ReConFig'10), pp. 298-303, Cancun, Quintana Roo, Mexico (2010)
[8] Anderson J.H.: A PUF Design for Secure FPGA-Based Embedded Systems. In: 15th Asia and South Pacific Design Automation Conference (ASP-DAC'10), pp. 1-6, Taipei, Taiwan (2010)
[9] Maiti A., Schaumont P.: Improving the Quality of a Physical Unclonable Function Using Configurable Ring Oscillators. In: 19th International Conference on Field Programmable Logic and Applications (FPL'09), pp. 703-707, Czech Republic (2009)
[10] Aarestad J., Ortiz P., Acharyya D., Plusquellic J.: HELP: A Hardware-Embedded Delay PUF. IEEE Design and Test, vol. 30, pp. 17-25 (2013)
[11] Altera Corporation: Nios II Custom Instruction User Guide, http://www.altera.com/literature/ug/ug_nios2_custom_instruction.pdf (January 2011)
[12] Ju J., Chakraborty R., Lamech C., Plusquellic J.: Stability Analysis of a Physical Unclonable Function Based on Metal Resistance Variations. In: 6th International Workshop on Hardware-Oriented Security and Trust (HOST'13), Austin, TX, USA (2013)
