ECE 4514 Lecture 23
Chip Biometrics

5:00 What is a chip biometric?

Digital chips have biometrics, just like humans. Chip biometrics, or fingerprints, are unique features that distinguish one chip from another of the same type. Such a fingerprint has two important applications:

1/ It can authenticate the FPGA remotely. Provided that the chip biometric is unique to a chip, we can use it to distinguish one FPGA from another.

2/ It can serve as an embedded, chip-unique secret inside of the FPGA, provided that you can keep the fingerprint itself truly secret.

With humans (fingers or faces), application (1) (authentication) is most common. Application (2) is less common because the fingerprint itself is not truly secret.

We will build a biometric into our DE1SoC FPGA, such that we are able to uniquely distinguish each individual FPGA from the others. This is less trivial than it seems: if you are building digital systems, then a given bitstream should behave identically on every FPGA. Yet we aim for the opposite: a behavior that differs depending on the FPGA. The difference is caused by the uniqueness of the FPGA.

Along the way, we'll talk about a few related items that we will need to build such a fingerprint.

- Ring oscillators
- Controlled place-and-route in an FPGA fabric
- Physical Unclonable Functions

5:10 Physical Unclonable Functions

We will start with the fundamentals. A fingerprint is an instance of a more abstract concept, called a Physical Unclonable Function, which captures exactly what we want from a fingerprint. A Physical Unclonable Function (PUF) is a high-level representation of fingerprint 'behavior':

1/ It's a Function. The output of a PUF is a bitstring that represents an encoded version of the fingerprint. However, more generally, a PUF is a function with an input and an output.
   We can write:

       R = PUF(C)

       R is the response, a bitstring of m bits
       C is the challenge, a bitstring of n bits

   The terminology challenge/response stems from the fact that PUFs are often used in authentication applications.

2/ It's a Physical Function. Unlike a logical function such as y = x + 1, you cannot compute a PUF. The function response is defined by the physical realization of a component. Think about an FPGA and a bitstream: the function response of a PUF is defined by the physical transistors in the FPGA fabric; it is not a code embedded in the bitstream.

3/ It's Unclonable. It's hard to copy or duplicate. Of course, unclonability has to be evaluated within the proper context. Unclonable means that there is no straightforward physical process to determine R = PUF(C) for a given chip. The physical function PUF( ) must be determined by factors that are hard to control.

Unclonable does not mean uncopyable. One could record R for every C, and build a big lookup table that carefully reproduces a PUF. That, however, copies the PUF mathematically rather than physically. It can be prevented by giving the PUF an astronomically large number of (C,R) pairs.

5:15 Properties of the Ideal PUF

Assume that we have an ideal fingerprinting function

    R = PUF(C)

What properties would we like it to have (ideally)? Define the following quantities:

    R has m bits; there are 2^m responses
    C has n bits; there are 2^n challenges

The Hamming Weight of a bitstring is the number of '1' bits it contains.

    HW(00111) = 3
    HW(01000) = 1

The Hamming Distance between two bitstrings of equal length is the number of bit positions in which the bitstrings differ.

    HD(00111, 01000) = 4
    HD(00111, 10111) = 1

Note that HD(A, B) = HW(A xor B).

Here are the properties of the ideal PUF.

a/ UNIQUE: Given the same challenge C on two different PUFs, we expect the two responses R to be unrelated. This means that no two fingerprints should be identical.
   If PUF1 and PUF2 are implemented on two different chips, then

       PUF1(C) != PUF2(C)

   We can express this as follows: the expected Hamming Distance between two different PUFs responding to the same challenge C should be m/2; half of the response bits should be different.

       E[HD(PUF1(C), PUF2(C))] = m/2

   The HD between two different PUFs on different chips is called the inter-HD (inter-chip HD).

b/ NOISELESS: Given two different evaluations of the same PUF under the same challenge, the response should be identical. This is not obvious; remember that a PUF is not the result of a computation, but rather a physical function. A noiseless PUF is hard to achieve.

       E[HD(PUF1(C), PUF1'(C))] = 0

   where PUF1(C) and PUF1'(C) are two different evaluations of the same PUF.

   The HD between two responses from the same PUF is called the intra-HD (intra-chip HD).

c/ UNBIASED: Given a typical PUF response, every bit has to be zero or one with the same probability (50%), and all bits are independent.

       E[HW(PUF(C))] = m/2

   Again, this is not obvious. The response bits are determined by a physical effect. The requirement that they are unbiased and 'random' is not easy to achieve in a real PUF.

5:25 CMOS PUF

When we build a real PUF, we need to do it with CMOS technology. Hence, the first question to address is: where is the fingerprinting information in CMOS technology? From our earlier definition, we cannot simply 'build' a PUF. Instead, we have to make use of intrinsic properties of the CMOS components. Indeed, given a design in Verilog (mapped to CMOS standard cells), we expect a different response for every single INSTANCE of that design (i.e. every chip).
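To check whether different instances really respond differently, the Hamming-weight and Hamming-distance metrics defined for the ideal PUF can be computed directly on response bitstrings. The following is a minimal Python sketch. The function names and the three 8-bit responses are hypothetical and chosen only for illustration; they are not measurements from the lecture.

```python
def hamming_weight(bits: str) -> int:
    """Number of '1' characters in a bitstring such as '00111'."""
    return bits.count("1")


def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length bitstrings differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Examples taken from the definitions in the notes.
print(hamming_weight("00111"))             # 3
print(hamming_distance("00111", "01000"))  # 4

# Hypothetical responses (m = 8) to the same challenge C. Made-up values.
chip1_eval1 = "10110010"   # chip 1, first evaluation
chip1_eval2 = "10110011"   # chip 1, second evaluation (one noisy bit)
chip2_eval1 = "01100110"   # chip 2

inter_hd = hamming_distance(chip1_eval1, chip2_eval1)  # ideal: m/2
intra_hd = hamming_distance(chip1_eval1, chip1_eval2)  # ideal: 0
bias_hw = hamming_weight(chip1_eval1)                  # ideal: m/2
print(inter_hd, intra_hd, bias_hw)
```

With real data, these quantities would be averaged over many challenges and many chips, since the ideal-PUF properties are stated as expected values.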
The source of randomness in CMOS is the PROCESS MANUFACTURING VARIATION:

    PROCESS        ---->| Variations in process
    (eg oxide           | parameters
    thickness)          |
                        |
    LITHOGRAPHY    ---->| Variations in device
                        | dimensions
                        +---> Changes in      ---> Changes in
                        |     Transistor           Transistor
    AGING EFFECTS  ---->|     Parameters           Characteristics
    (eg oxide           |     (eg Vt)              (Delay, Bias)
    wear-out)           |
                        |
    ENVIRONMENT    ---->|
    (eg temperature)

The CMOS manufacturing process causes every single transistor to be a little bit different, EVEN IF we try to make an identical copy. These variations in process and lithography are the source of randomness that we will be using as a fingerprint.

Since we are building a fully digital PUF (R = PUF(C) is digital), we need to create digital circuits that change their behavior a little bit depending on the transistor characteristics. A canonical circuit that achieves this is the Ring Oscillator:

    enable --> and-gate --> odd-number-of-inverters ----+----> out
                  ^                                     |
                  |                                     |
                  +-------------------------------------+

When enable=1, this circuit has no stable state, but rather has a '1' chasing around the inverters. Interestingly, the oscillation frequency is not determined by an external clock, but rather by the propagation delay through the loop: one oscillation period takes two trips around the loop.

    fro = 1/2 * 1/(propagation delay of the logic in the loop)

For example:

    enable --> and-gate --> inv --> inv --> inv -----+------> out
                  ^                                  |
                  |                                  |
                  +----------------------------------+

Assume that tand = 2ns and tinv = 1ns, then

    fro = 1/2 * 1/((2 + 1 + 1 + 1) ns) = 100 MHz

However, due to process manufacturing variations, every single inverter may have a slightly different propagation delay. For example, if the middle inverter has tinv = 0.99ns, then

    fro = 1/2 * 1/(4.99 ns) = approximately 100,200,400 Hz

5:40 Ring Oscillator PUF

There are many different constructions of PUFs in CMOS technology, but we are going to discuss a specific one: the Ring Oscillator PUF. The idea is to build N ROs and compare them pairwise. The challenge C is the selected pair.
The response R is the outcome of the comparison, a single bit.

    RO1 RO2 RO3 RO4 RO5 RO6 .. RON

For example, say we select

    Challenge: RO1 and RO3
    Response:  If f(RO1) > f(RO3) then R = 1 else R = 0

How many useful comparisons can we make using N ROs? In other words, what is the 'entropy' (information content) of such an RO PUF? Note that,

    if f(RO1) < f(RO2) and f(RO2) < f(RO3) then f(RO1) < f(RO3)

Remember that we want every response R to be independent. So given some responses, it should be impossible to guess other responses. Clearly, this means that not all comparisons between ROs are useful. Practically, we take N-1 comparisons for N ring oscillators.

How do we measure the frequency of the ring oscillator? We use a counter incremented at the rate of the ring oscillator.

5:45 A Ring Oscillator in Verilog

A Ring Oscillator is an asynchronous structure; you need to design it at a low level, not using hardware inference but rather using hardware instantiation. In addition, we need to tell the synthesis tools specifically that the ring oscillator has to be implemented EXACTLY as designed. While, logically, a single inverter and three chained inverters are identical, from a PUF perspective they are not. Quartus supports the low-level modeling of logic with the predefined lut_input and lut_output modules.
    // Inverter mapped onto a single lookup table
    module inv_cell (input wire a,
                     output wire q);
       wire aw;
       lut_input  lut_a(a, aw);
       lut_output lut_q(~aw, q);
    endmodule // inv_cell

    // NAND gate mapped onto a single lookup table
    module nand_cell (input wire a,
                      input wire b,
                      output wire q);
       wire aw;
       wire bw;
       lut_input  lut_a(a, aw);
       lut_input  lut_b(b, bw);
       lut_output lut_q(~aw | ~bw, q);
    endmodule // nand_cell

    // Ring oscillator: one NAND stage (enable) plus STAGES-1 inverters
    module ro (input wire en,
               output wire q);
       parameter STAGES = 9;
       wire [0:(STAGES-2)] ni;
       wire                nandout /* synthesis keep = 1 */;
       wire [0:(STAGES-2)] no /* synthesis keep = 1 */;

       // the NAND closes the loop from the last inverter output
       nand_cell nc(en, no[(STAGES-2)], nandout);
       // array of STAGES-1 inverters
       inv_cell  ic[0:(STAGES-2)] (ni, no);

       // chain the inverters: NAND output feeds inverter 0,
       // inverter i feeds inverter i+1
       assign ni[0]              = nandout;
       assign ni[1:(STAGES-2)]   = no[0:(STAGES-3)];
       // the oscillator output is tapped before the last inverter
       assign q                  = no[(STAGES-3)];
    endmodule

    module rocell(input wire en, output wire q);
       ro myro(en, q);
    endmodule

It's worthwhile to break down the module ro precisely.

(drawing)

We can now instantiate 16 of these ring oscillators into a structure. This will give us a 15-bit PUF. Instead of making frequency measurements on the FPGA, we do something different: we run each RO for a given period, determined by an enable bit. At the same time, we count the reference clock at 50MHz. When we clear the enable bit, we get two numbers:

- K1: The number of clock periods of the RO
- K2: The number of clock periods of the reference

This enables us to determine the precise frequency of the RO:

    fro = K1 / K2 * 50 MHz

    // cmd[17]   : active-low clear of both counters
    // cmd[16]   : count enable of the 50MHz reference counter
    // cmd[15:0] : enable bits of the 16 ring oscillators
    module ro16(input wire clk,
                input wire [31:0] cmd,
                output reg [31:0] r50,
                output reg [31:0] osc);
       wire [0:15] ro_en;
       wire [0:15] ro_q;
       wire        ro_clk;

       // reference counter (K2), clocked at 50MHz
       always @ (posedge clk or negedge cmd[17])
         begin
            if (~cmd[17])
              r50 <= 32'b0;
            else
              r50 <= cmd[16] ? (r50 + 32'b1) : r50;
         end

       rocell myro[0:15] (ro_en, ro_q);
       assign ro_en  = cmd[15:0];
       // with a single RO enabled, the XOR of all outputs
       // toggles at the rate of that RO
       assign ro_clk = ^ro_q;

       // RO counter (K1), clocked by the selected RO
       always @ (posedge ro_clk or negedge cmd[17])
         begin
            if (~cmd[17])
              osc <= 32'b0;
            else
              osc <= (osc + 32'b1);
         end
    endmodule

(drawing)

5:55 A QSYS system for RO frequency measurement

Finally, we integrate these 16 ROs into a QSYS (Platform Designer) system with an interface to a JTAG-Avalon-Master.
This will help us to control and measure the ROs from the laptop (cfr. platformdemo2 in Lecture 19).

    module ro16_mm (input wire clk,
                    input wire reset,
                    input wire address,
                    input wire read,
                    output reg [31:0] readdata,
                    input wire write,
                    input wire [31:0] writedata,
                    output reg readdatavalid,
                    output wire waitrequest
                    );
       reg [31:0]  reg00;          // command register, drives ro16.cmd
       reg [31:0]  readdata_read;
       wire [31:0] r50;
       wire [31:0] osc;

       ro16 myro16(.clk(clk), .cmd(reg00), .r50(r50), .osc(osc));

       // a write to address 0 updates the command register
       always @ (posedge clk)
         begin
            if (reset)
              begin
                 reg00 <= 0;
              end
            else
              begin
                 reg00 <= (write & (address == 1'b0)) ? writedata : reg00;
              end
         end

       assign waitrequest = 1'b0;

       // address 0 reads the reference count, address 1 the RO count
       always @*
         begin
            readdata_read = 32'b0;
            if (read)
              begin
                 case (address)
                   1'b0: readdata_read = r50;
                   1'b1: readdata_read = osc;
                 endcase
              end
         end

       always @(posedge clk)
         begin
            readdatavalid <= read;
            readdata      <= readdata_read;
         end
    endmodule

(drawing)

The programming model of the design is as follows:

a/ select a given RO to measure
b/ turn it on for a specific time
c/ read the 50MHz count and the RO count

Measurement script:

    set jtag_master [lindex [get_service_paths master] 0]
    open_service master $jtag_master

    proc measure {jtag_master osc len} {
        # enable oscillator - keep counter clear
        master_write_32 $jtag_master 0x10 [expr (1 << $osc)]
        after 100
        # start measuring
        master_write_32 $jtag_master 0x10 [expr 0x30000 + (1 << $osc)]
        # wait
        after $len
        # stop measuring
        master_write_32 $jtag_master 0x10 0x20000
        # extract result and return it: {reference count, RO count}
        set r [master_read_32 $jtag_master 0x10 2]
        scan $r "%lx %lx" dec1 dec2
        return [list $dec1 $dec2]
    }

    # 5 measurement rounds over the 16 oscillators
    set t {}
    for {set i 0} {$i<5} {incr i} {
        for {set j 0} {$j<16} {incr j} {
            # write the measurement index to address 0x0
            master_write_32 $jtag_master 0x0 [expr $j + $i * 16]
            set m [measure $jtag_master $j 1000]
            # fro = K1 / K2 * 50MHz
            lappend t [expr [lindex $m 1] * 50000000 / [lindex $m 0]]
        }
    }

    # write CSV file
    set freq [open "freq.csv" w]
    for {set j 0} {$j<16} {incr j} {
        puts -nonewline $freq "OSC "
        puts -nonewline $freq [format "%2d" $j]
        for {set i 0} {$i<5} {incr i} {
            puts -nonewline $freq [format ", %11d" [lindex $t [expr $i * 16 + $j]]]
        }
        puts $freq " "
    }
    close $freq

6:00 Measurement demo

Steps to compile and run the demo:

1/ Download the repository

2/ Open chipbiometrics.qpf

3/ Open qsys/platformdesigner. Click Generate HDL ... Generate ... Finish

4/ Compile the design into a bitstream

5/ Download the bitstream

       quartus_pgm -m jtag -o "p;chipbiometrics.sof@2"

6/ Go to the measure subdirectory

7/ Open a Nios2 command shell and start system-console

       system-console

8/ In the lower-right pane of system console, paste the content of measure.src (it should be possible to do this directly from the command line, but this did not work in a stable manner for me)

9/ After the script completes, you should find a file freq.csv

This file contains the 16 frequencies (in Hz) of the on-chip oscillators, each measured 5 times. These values will be unique for every board. For example, for my board they are

    OSC  0   162945347   162971760   162962564   162994505   162993347
    OSC  1   179838750   179866156   179897952   179897244   179933949
    OSC  2   240545133   240662946   240601212   240665378   240675456
    OSC  3   213768999   213820565   213819042   213805373   213810417
    OSC  4   207916893   207977267   208004772   207997242   208011657
    OSC  5   183626711   183676345   183684079   183717380   183709903
    OSC  6   175299781   175387966   175374948   175404475   175406753
    OSC  7   200316138   200408572   200395518   200435972   200502588
    OSC  8   166853149   166883404   166914687   166945238   166924448
    OSC  9   219487059   219514424   219565248   219615929   219593669
    OSC 10   268784640   268847726   268910130   268921884   268958831
    OSC 11   215506044   215548012   215552379   215572410   215593732
    OSC 12   182252356   182281210   182338524   182351899   182350863
    OSC 13   196091759   196147519   196138847   196183126   196247944
    OSC 14   176320454   176398192   176432259   176424113   176443814
    OSC 15   227313689   227329137   227371986   227375590   227411109

In your case, you will see different values. This demonstrates that each instance of the same Verilog design still yields a different fingerprint.
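To make the link between these measurements and the PUF response concrete, the sketch below turns the 16 frequencies into a 15-bit response. It is a minimal Python illustration, not part of the demo repository. The choice to compare neighboring oscillators (OSC i against OSC i+1) is an assumption, since the notes only state that N-1 comparisons are used, and the function names are hypothetical. The frequencies are the first measurement column of the table above.

```python
F_REF_HZ = 50_000_000  # reference clock used by the measurement design


def ro_frequency(k1: int, k2: int, f_ref: int = F_REF_HZ) -> int:
    """fro = K1 / K2 * f_ref, with K1 the RO count and K2 the reference count."""
    return k1 * f_ref // k2


# First measurement of OSC 0..15, copied from the table in the notes.
FREQ_HZ = [
    162945347, 179838750, 240545133, 213768999,
    207916893, 183626711, 175299781, 200316138,
    166853149, 219487059, 268784640, 215506044,
    182252356, 196091759, 176320454, 227313689,
]


def response_bits(freqs):
    """Assumed pairing: bit i is 1 if f(RO i) > f(RO i+1), else 0."""
    return [1 if freqs[i] > freqs[i + 1] else 0 for i in range(len(freqs) - 1)]


bits = response_bits(FREQ_HZ)
print("".join(str(b) for b in bits))  # prints 001111010011010
```

Repeating this for each of the five measurement columns and comparing the resulting bitstrings gives a first impression of the intra-HD of this board; comparing against another board gives the inter-HD.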
6:10 Logic Lock

Even with the ring oscillators defined at such a low level (cfr. the Verilog above), there is no guarantee that their physical layout will be identical. On an FPGA, this can make a big difference. To get better control of placement, we can make use of a LogicLock region. The idea is that you can reserve a certain portion of the FPGA for exclusive use by a specific module in your design. In our case, because the ROs are small (9 stages), we can assign a single LAB (a group of 10 lookup tables) to a specific RO. Moreover, we can keep other logic out of the neighborhood, to avoid interference.

To work with LogicLock, you will need Quartus Standard Edition. I will demonstrate it using a full version of Quartus running on our design server.

(demo)

In your repository, there is an sof that contains the design created with LogicLock; it's called logiclock.sof. If you run the measurement script on this bitstream, you will find that the ring oscillators are much more similar in frequency:

    OSC  0   280070760   280105016   280047529   279987548   280055918
    OSC  1   274904217   274915057   274924431   274876501   274897887
    OSC  2   299808919   299873080   299753085   299764996   299737873
    OSC  3   284840709   284854982   284857827   284851841   284804856
    OSC  4   279369651   279414954   279434505   279415368   279357839
    OSC  5   276124847   276173302   276158381   276135752   276117447
    OSC  6   299749525   299755547   299775742   299764557   299762441
    OSC  7   283469136   283547045   283507591   283570958   283578976
    OSC  8   281172141   281268373   281224019   281227763   281228897
    OSC  9   280647381   280710316   280603922   280671321   280684410
    OSC 10   300013532   299938179   299974089   299865135   299926556
    OSC 11   285668785   285635934   285609536   285678988   285655984
    OSC 12   280022111   280047055   280036408   280043939   280066548
    OSC 13   282473155   282534453   282474736   282569063   282508667
    OSC 14   287659381   287672234   287678499   287665537   287673939
    OSC 15   295906256   295887224   295897430   295910918   295953139

This shows that some of the differences we saw before are not process manufacturing variations, but rather differences in routing from one ring oscillator to the next.

6:15 Conclusions

Chip biometrics (or PUFs) are an important technique with applications in authentication and key generation (information security).