🔐 A Physical Unclonable Function for any FPGA

Introduction
Theory of Operation
- Post-Processing
Top Entity
Evaluation
Summary
TODO

Introduction

A physical unclonable function (PUF) provides a digital fingerprint that can be used as unique identifier for security-related applications like authentication. The fpga_puf hardware module provides an unclonable 96-bit unique identifier (ID) that is defined by the target chip's semiconductor characteristics. It is implemented in a technology-independent way that does not use any device-specific macros, primitives or attributes so it can be implemented on any FPGA (verified using Intel Quartus Prime, Lattice Radiant and Xilinx Vivado).

🔑 The PUF ID is unique for the combination of bitstream and one specific FPGA. The same bitstream will lead to a different PUF ID on a different FPGA of same type. If the bitstream of a specific FPGA is changed (by changing design logic) the PUF ID of that specific FPGA will also change.

Key features

technology-, vendor- and platform-agnostic implementation
easy to use
tiny hardware footprint (less than 200 LUTs)
secure and reliable (🚧 work-in-progress)

📢 This is an ongoing research / proof-of-concept project. Feedback from the community is highly appreciated (see notes in section "TODO")!

Theory of Operation

Each bit of the 96-bit PUF ID is generated by an individual PUF cell. Each cell consists of an asynchronous element that is based on a simple ring oscillator. The oscillator is constructed from a single inverter that negates it's own output signal. This feedback loop is interrupted by a latch. If the latch is open (=transparent) the oscillator starts oscillating. If the latch is closed, it stores the last state of the oscillator (high or low). The latch also provides a reset to bring each cell into a defined state.

The frequency of each oscillator is defined by the mapping of the logic cell and the according routing. Both factors are constant for a specific bitstream. Furthermore, the frequency is also randomly (but constantly for a given setup) "tuned" by the chip's semiconductor characteristics. These are caused by tiny productions fluctuation (for example the capacitance / delay of a routing wire is affected by variation in oxide thickness). The frequency of each oscillator is considered to be fixed. Hence, sampling several periods within a fixed time window will always end in the same state (oscillator output is high or low). Temperature drift disturbs all oscillators in the (nearly) same way and has to be computationally eliminated/compensated by the post-processing.

Since fpga_puf does not use any device-specific attributes or primitives, there is a risk that the synthesis toolchain will remove the asynchronous PUF cells or collapse them into a single PUF cell ("optimize away"). Therefore, the design uses a shift register to control reset and the latch open/close phase of each cell individually and distributed over time. This concept is based on the neoTRNG True Random Number Generator and allows a platform-independent implementation.

Whenever a new sampling of the PUF ID is started, a 96+1 bit wide shift register. A single '1' is applied to the least significant bit that travels throughout the whole shift register chain during operation. The PUF cell's control signals (reset and latch control) are connected to this shift register. Each shift register bit i controls the reset of PUF cell i and the latch control of PUF cell i+1. Thus, a cell's reset is active for one clock cycle. In the next clock cycle the cell's latch is opened for exactly one clock cycle allowing the oscillator to run. In the next clock cycle the latch is closed again and has captured the oscillator's last state. By this scheme only one oscillator is active at a time reducing potential distortion by oscillator cross-talk (i.e. the oscillation of one osc. affect the oscillations of another osc.).

When the single one-bit reaches the most significant bit (bit 96) of the shift register the sampling process is completed and the latch states are sampled into a register that provides the obtained raw PUF ID.

The following figure is taken from the Intel Quartus Prime technology viewer showing a single PUF cell (cell #44). The oscillator as well as the latch are mapped to a single LUT3 element in the FPGA. The output of this asynchronous element is sampled by a simple register. Hence, a single PUF cell only requires one LUT and one FF.

(DATAA is the latch reset (high-active), DATAC is the latch control (open when set), DATAD is the feedback of the inverter)

Post-Processing

Some bits of the raw PUF ID might be quite noisy, which complicates defining a stable ID that does not change over time and over a broad range of operation conditions (e.g. temperature). A lot(!) of different approaches can be found in literatures. One promising approach is the use of error-correction codes to "stabilizes" an initially determined ID.

However, these concepts are out of (my) scope so I used a simple "averaging" concept for this example. My approach samples the raw PUF ID several times (for example 4096 times) and checks how often each bit of the ID is set across all sampled IDs. If an ID bit is set in more then half of the samples it is considered to be a static 1, otherwise it is considered to be a static 0. Since a few bits of the raw ID might be quite noisy, a hysteresis is used to eliminate those bits from the final PUF: if a bit is set or cleared more often than a certain threshold it is considered "stable" in the final ID. These lower and upper hysteresis threshold define an uncertain band between them. For the final ID all bits tha fall into this uncertainty band are masked to be always zero.

My post-processing concept can be found in sw/main.c. The source file provides more-detailed comments.

Top Entity

The top entity of the PUF IP module is fpga_puf (rtl/fpga_puf.vhd). It can be directly instantiated without the need for any special libraries.

entity fpga_puf is
  port (
    clk_i  : in  std_ulogic; -- global clock line
    rstn_i : in  std_ulogic; -- SYNC reset, low-active
    trig_i : in  std_ulogic; -- set high for one clock to trigger ID sampling
    busy_o : out std_ulogic; -- busy when set (sampling ID)
    id_o   : out std_ulogic_vector(95 downto 0) -- PUF ID (valid after sampling is done)
  );
end fpga_puf;

⚠️ Note that the PUF cannot be simulated due to it's combinatorial loops.

Evaluation

To evaluate this concept and the quality/reliability of the PUF IDs I am using the NEORV32 as processor platform. The fpga_puf IP module is added to the processor's "Custom Functions Subsystem (CFS)", which is a blank template for implementing custom application-specific co-processors.

The setup is synthesized for different FPGAs using different toolchains (tested with Intel Quartus Prime, Lattice Radiant and Xilinx Vivado). A specific bitstream generated and programmed into several FPGAs of the same type to check for chip-specific ID variations. On one chip the ID is generated several times to check if it is "stable over time" and thus, reliable.

Setup

The application-specific PUF-wrapping CFS can be found in rtl/neorv32_cfs.vhd. Make sure to use this CFS file instead of the default NEORV32 CFS source file when reproducing this setup.

The CFS uses four memory-mapped registers to interface the PUF ID module:

CSF register address (C access macro)	Access	Function
`NEORV32_CFS.REG[0]`	`r/w`	Control register
`NEORV32_CFS.REG[1]`	`r/-`	PUF ID bits 31:0
`NEORV32_CFS.REG[2]`	`r/-`	PUF ID bits 63:32
`NEORV32_CFS.REG[3]`	`r/-`	PUF ID bits 95:64

Bit 0 of the control register (PUF_CTRL_EN) control the synchronous reset signal of the fpga_puf module (rstn_i). Setting this bit will activate the module, clearing it will put the module into reset state.

Bit 1 of the control register (PUF_CTRL_SAMPLE) is used to start the sampling of the ID. Writing one to it will set the trigger signal (trig_i) high for one cycle. Reading this control register bit will return the busy state of the fpga_puf module (busy_o).

💾 The low-level hardware-accessing functions are implemented in sw/fpga_puf_neorv32_cfs.c. The header file sw/fpga_puf_neorv32_cfs.h provides the function prototypes, a PUF ID data type and also the NEORV32 CFS register mappings and bit definitions.

The PUF test program (sw/main.c) is used to sample the chip's PUF ID and transmit it via UART to a terminal program:

Physical Unclonable Functions <fpga_puf> Test
PUF implemented as NEORV32 Custom Functions Subsystem (CFS)

Press any key to start PUF test (8 runs with 4096 samples each).
Starting test...
Run 0 ID: 0x37c0480063021011988c0095
Run 1 ID: 0x37c0480063021011988c0095
Run 2 ID: 0x37c0480063021011988c0095
Run 3 ID: 0x37c0480063021011988c0095
Run 4 ID: 0x37c0480063021011988c0095
Run 5 ID: 0x37c0480063021011988c0095
Run 6 ID: 0x37c0480063021011988c0095
Run 7 ID: 0x37c0480063021011988c0095
Test completed.

💾 A pre-compiled NEORV32 executable of this test program is also available in this repository: sw/neorv32_exe.bin (compiled for a minimal rv32i NEORV32 CPU)

ID Results

So far I have tested the PUF module on three Lattice iCE40 UltraPlus FPGAs and four Intel Cyclone IV FPGAs (shout-out to @emb4fun - thank you for your help!) and just one Xilinx Artix-7 FPGA. The ID was generated 8 times on each chip to "check" if it is reproducible (drawback: just in a very short time window...). The specific FPGA types are shown in section Hardware Utilization.

FPGA	PUF ID ("Fingerprint Key")
Lattice iCE40 UltraPlus - 1	`0x37c0480063021011988c0095`
Lattice iCE40 UltraPlus - 2	`0x592063e50118040c2000112b`
Lattice iCE40 UltraPlus - 3	`0x941c82505112323600b0c221`
Intel Cyclone IV - 1	`0x9fbe33dc9021be3156cae31b`
Intel Cyclone IV - 2	`0xa726db495a2a8346b30f2000`
Intel Cyclone IV - 3	`0x413fe25557311be7f9f5edfb`
Intel Cyclone IV - 4	`0xca8ee8dc7945fa60f770e1fa`
Xilinx Artix-7 - 1	`0x6faaf93af77cf77fef91fe79`

So far, the IDs are unique for each tested FPGA!

Reliability

🚧 work in progress 🚧

A long-time test is used to sample and check raw IDs (not the pre-processed ones) for stability over time. Note that my setup is placed in pretty stable environment conditions.

The test generates an initital ID I right at the beginning of execution. In an endless loop two consecutive IDs A and B are sampled (a single "Run"). To evaluate the the "instability" the Hamming distance (number of bits that are not identical across two samples) is computed. This test computes the relative Hamming distance of the two consecutive samples A and B (= H(A,B)) and also the relative Hamming distance between the initial sample and sample A (= H(I,A)) from the current run. Furthermore, the maximum ob both distances is computed over all runs (H_max(A,B) and H_max(I,A)).

To identify noisy bits an "accumulated bit-change-mask" F is computed. This mask is computed for every run by XOR-ing the obtained samples A and B with the initial ID I. The mask from a run is OR-ed with the mask from the previous run to accumulate the noisy bits over time.

Cut-out of the test log:

...
Run 43373: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9423beb156cae31b, B=0xbfbe3bfcb021beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=3, H_max(A,B)=9 - H(I,A)=19, H_max(I,A)=23
Run 43374: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9435beb156eae31b, B=0xbfbe3bfcb431beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=2, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23
Run 43375: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156caeb1b, B=0xbfbe3bdcb421beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=20, H_max(I,A)=23
Run 43376: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9425beb156eaeb1b, B=0xbfbe3bfcb021beb156eae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=21, H_max(I,A)=23
Run 43377: I=0x9fae9dd83029bc7156cbe37b, A=0xbfbe3bfc9021beb156caeb1b, B=0xbfbe3bdc9425beb156cae31b - F=0x6038be24ac1e83c080214860 (33) - H(A,B)=4, H_max(A,B)=9 - H(I,A)=18, H_max(I,A)=23
...

This very first test was run for approx. half an hours making ~20 runs per second. It shows a maximal Hamming distance of 23 bits while 33 bits (F) tend to be noisy (compared to the initial ID I sampled once right at the beginning of the test). The number of noisy bits increases slowly over time, probably because of increasing chip temperature. It tends to increase slower over time, so there might be a saturation at some point. The temperature of the PUF cells is most important because that impacts the oscillator frequencies. The PUF cells heat up due to the continuous PUF operation.

The PUF from this test setup provides 96-33=63 bits that seem to be stable over the observed time. This also means that the usable key space is reduced (63-bit instead of 96-bit).

Hardware Utilization

Mapping results for the custom function subsystem (CFS), the fpga_puf module and a single PUF cell.

Lattice ice40 UltraPlus `iCE40UP5K-SG48I` @24MHz	Logic Cells	Logic Registers
neorv32_cfs_inst_true.neorv32_cfs_inst	171 (71)	232 (35)
-fpga_puf_inst	100 (4)	197 (101)
--fpga_puf_cell_inst[0].fpga_puf_cell_inst_i	1 (1)	1 (1)

Intel Cyclone IV `EP4CE22F17C6N` @100MHz	Logic Cells	Logic Registers
neorv32_cfs:\neorv32_cfs_inst_true:neorv32_cfs_inst	241 (43)	232 (35)
-fpga_puf:fpga_puf_inst	198 (102)	197 (101)
--fpga_puf_cell:\fpga_puf_cell_inst:0:fpga_puf_cell_inst_i	1 (1)	1 (1)

Xilinx Artix-7 `XC7A35TICSG324-1L` @100MHz	Logic Cells	Logic Registers
neorv32_cfs_inst_true.neorv32_cfs_inst (neorv32_cfs)	133	328
-fpga_puf_inst (fpga_puf)	133	293
--fpga_puf_cell_inst[0].fpga_puf_cell_inst_i (fpga_puf_cell)	1	2 (Latch + FF)

ℹ️ Xilinx Vivado can detect the PUF cell's latch and infers a FF primitive, which also provides a latch mode. This does not compromise the functionality of the PUF.

Summary

The PUF IDs are unique for each FPGA and can be successfully implemented on different FPGAs (Xilinx, Intel, Lattice).

The raw PUF ID is only partly reproducible, because some bits tend to be quite noisy (see results above). A better post-processing algorithm using error-correction codes should be able to compensate for that. This is work in progress.

The latest test showed that there is a certain number of bits that are quite noisy so they cannot be used to determine the PUF IF. Obviously, this reduces the maximum key space (for example only 63-bit of the 96-bit ID are usable). To circumvent this, the PUF can be made arbitrarily wide (for example providing a 256-bit raw ID). Of course this will also introduce additional unusable noisy bits, but we expect that the percentage of noisy bits within the PUF ID is a device-specific constant. Each additional PUF ID bit adds 2 FFs (one for the SREG and one for the PUFF cell) and 1 LUT. For each additional bit the ID sampling time is increased by one clock cycle.

📢 If you have any kind of ideas or feedback (for example how to improve reliability) feel free to open a new issue. I am also happy to get more data, so if you have ported the design on another FPGA you can open a pull request and add your results.

TODO

test more FPGAs
check stability in a controlled environment (e.g. temperature chamber)
evaluate more sophisticated post-processing algorithms
sample more data from more long-time test
- faulty bits over time (and temperature)
- hamming distance over time (and temperature)
- ...
to be continued...

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

stnolting / fpga_puf

Programming Languages

Labels

Projects that are alternatives of or similar to fpga puf