ECE 4514 Lecture 23 Chip Biometrics

5:00 What is a chip biometric?

Digital chips have biometrics, just like humans. Chip biometrics—or fingerprints—are unique features that distinguish one chip from another of the same type. 

Such a fingerprint has two important applications:
  1/ It can authenticate the FPGA, remotely
     Provided that the chip biometric is unique to
     a chip, we can use it to distinguish one FPGA from
     another.
  2/ It can serve as an embedded, chip-unique secret inside
     of the FPGA, provided that you can keep the fingerprint
     itself truly secret.

With humans (fingers or faces), application (1) (authentication)
is most common. Application (2) is less common because the
fingerprint itself is not truly secret.

We will build a biometric into our DE1SoC FPGA, such that we
are able to uniquely distinguish each individual FPGA from
the others. This is less trivial than it seems:
if you are building digital systems, then a given bitstream
should behave identically on every FPGA. Yet, we aim for 
the opposite: we aim for a behavior that is different 
depending on the FPGA. The difference is caused by the 
uniqueness of the FPGA.

Along the way, we'll talk about a few other
related items that we will need to build such a fingerprint.

   - Ring oscillators
   - Controlled place-and-route in an FPGA fabric
   - Physical Unclonable Functions

5:10 Physical Unclonable Functions

We will start with the fundamentals. A fingerprint is an
instance of a more abstract concept, called a Physical Unclonable
Function, which reflects exactly what we want from a fingerprint.

A Physical Unclonable Function (PUF) is a high-level representation
of fingerprint 'behavior':

   1/ It's a Function. The output of a PUF is a bitstring that
      represents an encoded version of the fingerprint.
      However, more generally, a PUF is a function with an
      input and an output. We can write:

            R = PUF(C)

      R is the response, a bitstring of m bits
      C is the challenge, a bitstring of n bits

      The terminology challenge/response stems from the fact that
      PUFs are often used in authentication applications, where
      a verifier issues a challenge and checks the response.

   2/ It's a Physical Function. Unlike a logical function
      such as y = x + 1, you cannot compute a PUF. The
      function response is defined by the physical realization
      of a component.

      Think about an FPGA and a bitstream: the function response
      of a PUF is defined by the physical transistors in the
      FPGA fabric; it's not a code embedded in the bitstream.

   3/ It's Unclonable. It's hard to copy or duplicate.
      Of course, unclonability has to be evaluated within
      the proper context.

      Unclonable means that there is no straightforward
      physical process to determine

            R = PUF(C)

      for a given chip.
      The physical function PUF( ) must be determined 
      by factors that are hard to control.

      Of course, unclonable does not mean uncopyable.
      One could record R for every C, and build a big
      lookup table that carefully reproduces a PUF.
      However, that is a mathematical copy of the PUF,
      not a physical clone. It can be prevented by making
      the number of (C,R) pairs astronomically large.

5:15 Properties of the Ideal PUF

Assume that we have an ideal fingerprinting function

    R = PUF(C)

What properties would we like it to have (ideally)?

Define the following quantities:

    R has m bits; there are 2^m responses
    C has n bits; there are 2^n challenges

    The Hamming Weight of a bitstring is the number
    of '1' bits it contains.

    HW(00111) = 3
    HW(01000) = 1

    The Hamming Distance between two bitstrings 
    of equal length is the number of bits that 
    are different between the bitstrings. 

    HD(00111, 01000) = 4 
    HD(00111, 10111) = 1

    Note that HD(A, B) = HW(A xor B)
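
As a quick check of these definitions, here is a minimal Python
sketch. It treats a bitstring as a Python string of '0' and '1'
characters; the function names hw, hd and xor are chosen for
illustration only.

```python
def hw(bits: str) -> int:
    """Hamming weight: number of '1' characters in the bitstring."""
    return bits.count("1")


def hd(a: str, b: str) -> int:
    """Hamming distance: number of positions where a and b differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def xor(a: str, b: str) -> str:
    """Bitwise xor of two equal-length bitstrings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


# The examples from the notes
print(hw("00111"), hw("01000"))
print(hd("00111", "01000"), hd("00111", "10111"))
# HD(A, B) = HW(A xor B)
print(hd("00111", "01000") == hw(xor("00111", "01000")))
```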

Here are the properties of the ideal PUF.

a/ UNIQUE:

   Given the same challenge C applied to two different PUFs, we
   expect the two responses R to differ.

   This means that no two fingerprints should be identical.
   If PUF1 and PUF2 are implemented on two different chips, then 

   PUF1(C) != PUF2(C)

   We can express this as follows: the expected Hamming Distance
   between two different PUFs responding to the same challenge
   C should be m/2; half of the response bits should be different.

   E[HD(PUF1(C), PUF2(C))] = m/2

   The HD between two different PUFs on different chips
   is called the inter-HD (inter-chip HD)

 b/ NOISELESS:

    Given two different evaluations of the same PUF under the
    same challenge, the two responses should be identical.

    This is not obvious; remember that a PUF is not the result
    of a computation, but rather a physical function. A noiseless
    PUF is hard to achieve.

    E[HD(PUF1(C), PUF1'(C))] = 0

    where PUF1(C) and PUF1'(C) are two different evaluations of
    the PUF

    The HD between two responses from the same PUF
    is called the intra-HD (intra-chip HD)

 c/ UNBIASED:

    Given a typical PUF response, every bit has to be zero or one with
    the same probability (50%), and all bits are independent.

    E[HW(PUF(C))] = m/2

    Again, this is not obvious. The response bits are determined by
    a physical effect. The requirement that they are unbiased and
    'random' is not easy to achieve in a real PUF.
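
The three properties can be estimated from a set of recorded
responses. The sketch below is only an illustration: the 8-bit
responses are made up (they do not come from a real chip), and
the helper names mean_pairwise_hd and mean_hw are hypothetical.

```python
from itertools import combinations


def hd(a: str, b: str) -> int:
    """Hamming distance between two equal-length bitstrings."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hd(responses):
    """Average HD over all pairs of responses in the list."""
    pairs = list(combinations(responses, 2))
    return sum(hd(a, b) for a, b in pairs) / len(pairs)


def mean_hw(responses):
    """Average Hamming weight of the responses in the list."""
    return sum(r.count("1") for r in responses) / len(responses)


m = 8  # response length in bits

# Hypothetical data: same challenge, three different chips
chips = ["10110010", "01100111", "11010001"]
# Hypothetical data: same challenge, same chip, three evaluations
repeats = ["10110010", "10110010", "10110011"]

print("inter-HD:", mean_pairwise_hd(chips), "ideal:", m / 2)
print("intra-HD:", mean_pairwise_hd(repeats), "ideal:", 0)
print("mean HW :", mean_hw(chips), "ideal:", m / 2)
```

With only a handful of responses these averages are rough
estimates; a meaningful evaluation needs many chips and many
repeated measurements.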

5:25 CMOS PUF

When we build a real PUF, we need to do it with CMOS technology.
Hence, the first question to address is: where is the
fingerprinting information in CMOS technology?

From our earlier definition, we cannot simply 'build' a PUF. Instead,
we have to make use of intrinsic properties of the CMOS components.

Indeed, given a design in Verilog (mapped to CMOS standard
cells), we expect a different response for every single INSTANCE
of that design (i.e. every chip).

The source of randomness in CMOS are the PROCESS MANUFACTURING VARIATIONS:

 
  PROCESS                  ---->|
  Variations in process         |
  parameters                    |
  (eg oxide thickness)          |
                                |
                                |
  LITHOGRAPHY              ---->|
  Variations in device          |
  dimensions                    +---> Changes in  ---> Changes in
                                |     Transistor       Transistor
  AGING EFFECTS            ---->|     Parameters       Characteristics
  (eg oxide wear-out)           |     (eg Vt)          (Delay, Bias)
                                |
  ENVIRONMENT              ---->|
  (eg temperature)


The CMOS manufacturing process causes every single transistor to
be a little bit different, EVEN IF we try to make an identical copy.
These variations in process and lithography are the source of randomness
that we will be using as a fingerprint.

Since we are building a fully digital PUF (R=PUF(C) is digital), we need
to create digital circuits that change their behavior a little bit
depending on the transistor characteristics.

A canonical circuit that achieves this is the Ring Oscillator:


   enable --> and-gate --> odd-number-of-inverters ----+----> out
                |                                      |
                +---------------------------------------

When enable=1, this circuit has no stable state, but rather has
a '1' chasing around the inverters.

Interestingly, the oscillation frequency is not determined
by an external clock, but rather by the propagation delay
through the inverters.

      fro = 1/2 . 1/(propagation-delay-logic)

For example:

   enable --> and-gate --> inv --> inv --> inv -----+------> out
                 |                                  |
                 +----------------------------------+

Assume that tand = 2ns and tinv = 1ns, then

    fro = 1/2 * 1/(2ns + 1ns + 1ns + 1ns) = 100MHz

However, due to process manufacturing variations, every single
inverter may have a slightly different propagation delay.
For example, if the middle inverter has tinv = 0.99ns, then

   fro = 1/2 * 1/4.99ns = 100,200,400 Hz
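
The same arithmetic as a small Python sketch. The function name
ro_frequency_hz is hypothetical, and the model is the simple one
used above: one oscillation period equals two trips around the
loop.

```python
def ro_frequency_hz(delays_ns):
    """Oscillation frequency of a ring with the given gate delays (in ns).

    One period is two trips around the loop, so
    fro = 1/2 * 1/(sum of propagation delays).
    """
    loop_delay_s = sum(delays_ns) * 1e-9
    return 1.0 / (2.0 * loop_delay_s)


# AND gate of 2 ns followed by three inverters of 1 ns each
nominal = ro_frequency_hz([2.0, 1.0, 1.0, 1.0])
# Same ring, but the middle inverter is 10 ps faster
varied = ro_frequency_hz([2.0, 1.0, 0.99, 1.0])

print(f"nominal: {nominal:,.0f} Hz")
print(f"varied : {varied:,.0f} Hz")
print(f"shift  : {varied - nominal:,.0f} Hz")
```

A delay change of 10 ps in one gate shifts the frequency by about
200 kHz, which is easy to measure with a counter.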

5:40 Ring Oscillator PUF

There are many different constructions of PUF in CMOS technology,
but we are going to discuss a specific one: the Ring Oscillator
PUF.

The idea is to build N RO and compare them pairwise.
The challenge C is the selected pair.
The response R is a single bit: the outcome of the comparison.

  RO1 RO2 RO3 RO4 RO5 RO6 .. RON

For example say we select 

  Challenge: RO1 and RO3

  Response: If f(RO1) > f(RO3) then R = 1 else R = 0

How many useful comparisons can we make using N RO?
In other words, what is the 'entropy' (information content)
of such an RO PUF?

Note that,

  if f(RO1) < f(RO2)
  and f(RO2) < f(RO3)
  then f(RO1) < f(RO3)

Remember that we want every response R to be independent.
So given some responses, it should be impossible to
guess other responses. Clearly, this means that not
all comparisons between RO are useful.

Practically, we take N-1 comparisons for N ring oscillators.
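
To put a number on this, here is a small sketch based on a
standard counting argument that is not part of the lecture: N
oscillators with distinct frequencies can be ordered in N! ways,
so all pairwise comparisons together carry at most log2(N!) bits,
and only if every ordering is equally likely. The chain of
neighbor pairs below is one assumed way to pick N-1 comparisons;
the function names are hypothetical.

```python
import math


def ordering_entropy_bits(n: int) -> float:
    """Upper bound on the information in the frequency ordering of n ROs."""
    return math.log2(math.factorial(n))


def all_pairs(n: int) -> int:
    """Number of distinct pairwise comparisons among n ROs."""
    return n * (n - 1) // 2


def neighbor_pairs(n: int):
    """One possible choice of n-1 comparisons: (RO0,RO1), (RO1,RO2), ..."""
    return [(i, i + 1) for i in range(n - 1)]


n = 16
print("possible comparisons  :", all_pairs(n))
print("upper bound (bits)    :", round(ordering_entropy_bits(n), 2))
print("comparisons we use    :", len(neighbor_pairs(n)))
```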

How do we measure the frequency of the ring oscillator?
We use a counter incremented at the rate of the ring oscillator.

5:45 A Ring Oscillator in Verilog

A Ring Oscillator is an asynchronous structure; you need to
design it at a low level, not using hardware inference but rather
using hardware instantiation. In addition, we need to tell the
synthesis tools specifically that the ring oscillator has
to be implemented EXACTLY as designed.

While, logically, a single inverter and three chained inverters
are identical, from a PUF perspective, they are not.

Quartus supports the low-level modeling of logic with the
predefined lut_input and lut_output modules.

module inv_cell
  (input wire  a,
   output wire q);
   wire        aw;
   lut_input  lut_a(a, aw);
   lut_output lut_q(~aw, q);
endmodule // inv_cell

module nand_cell
  (input wire  a,
   input wire  b,
   output wire q);
   wire        aw;
   wire        bw;
   lut_input  lut_a(a, aw);
   lut_input  lut_b(b, bw);
   lut_output lut_q(~aw | ~bw, q);
endmodule // nand_cell

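module ro
  (input wire en,
   output wire q);
   // Number of inverting stages in the loop: one NAND gate plus
   // STAGES-1 inverters. STAGES must be odd for the loop to oscillate.
   parameter STAGES = 9;
   wire [0:(STAGES-2)] ni;  // inverter inputs
   // 'keep' prevents synthesis from simplifying the inverter chain
   wire                nandout /* synthesis keep = 1 */;
   wire [0:(STAGES-2)] no      /* synthesis keep = 1 */;
   
   // The NAND gate closes the loop; en = 0 forces nandout to 1
   nand_cell nc(en, no[(STAGES-2)], nandout);
   // Array of STAGES-1 inverters
   inv_cell  ic[0:(STAGES-2)] (ni, no);
   
   // The first inverter is driven by the NAND gate,
   // every other inverter by the output of its predecessor
   assign ni[0]            = nandout; 
   assign ni[1:(STAGES-2)] = no[0:(STAGES-3)];
   // The output is tapped from the second-to-last inverter
   assign q                = no[(STAGES-3)];
endmodule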

module rocell(input wire en,
        output wire q);

   ro myro(en, q);

endmodule

It's worthwhile to break down the module ro precisely.

(drawing)

We can now instantiate 16 of these ring oscillators into a
structure. This will give us a 15-bit PUF.

Instead of making frequency measurements on the FPGA,
we do something different: we run each
RO for a given period, determined by an enable bit.
At the same time, we count the periods of the 50MHz
reference clock. When we clear the enable bit, we get two numbers:
  - K1: The number of clock periods of the RO
  - K2: The number of clock periods of the reference

This enables us to determine the precise frequency of the RO:

    fro = K1 / K2 * 50 MHz
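
A minimal Python sketch of this conversion. The function name
and the example counts are hypothetical; only the formula and
the 50 MHz reference come from the notes.

```python
F_REF_HZ = 50_000_000  # reference clock of the board, 50 MHz


def ro_frequency_from_counts(k1: int, k2: int, f_ref_hz: int = F_REF_HZ) -> float:
    """fro = K1 / K2 * f_ref.

    k1: number of RO periods counted during the measurement window
    k2: number of reference clock periods counted in the same window
    """
    if k2 == 0:
        raise ValueError("reference counter is zero: no measurement window")
    return k1 * f_ref_hz / k2


# Hypothetical counts for a window of roughly one second
k1 = 200_000_000
k2 = 50_000_000
print(ro_frequency_from_counts(k1, k2))
```

Because both counters run over the same window, the exact length
of the window drops out of the result.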

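module ro16(input wire        clk,
            input wire [31:0] cmd,
            output reg [31:0] r50,
            output reg [31:0] osc);
   
   // cmd[15:0] : one enable bit per ring oscillator
   // cmd[16]   : count enable for the 50MHz reference counter
   // cmd[17]   : active-low asynchronous clear for both counters
   wire [0:15] ro_en;
   wire [0:15] ro_q;
   wire        ro_clk;
      
   // reference counter: counts 50MHz periods while cmd[16] is set
   always @ (posedge clk or negedge cmd[17]) begin
      if (~cmd[17])
         r50 <= 32'b0;
      else
         r50 <= cmd[16] ? (r50 + 32'b1) : r50;
   end
   
   rocell myro[0:15] (ro_en, ro_q);
   
   assign ro_en  = cmd[15:0];
   // xor of all RO outputs; with a single RO enabled,
   // ro_clk toggles at the rate of that RO
   assign ro_clk = ^ro_q;
   
   // RO counter: counts periods of the selected RO
   always @ (posedge ro_clk or negedge cmd[17]) begin
      if (~cmd[17])
         osc <= 32'b0;
      else
         osc <= (osc + 32'b1);
   end

endmodule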

(drawing)

5:55 A QSYS system for RO frequency measurement

Finally, we integrate these 16 RO into a QSYS (Platform Designer)
system with an interface to a JTAG-Avalon-Master. This will help
us to control and measure the RO from the laptop (cfr.
platformdemo2 in Lecture 19).

module ro16_mm (input wire        clk,
                input wire        reset,
                input wire        address,
                input wire        read,
                output reg [31:0] readdata,
                input wire        write,
                input wire [31:0] writedata,
                output reg        readdatavalid,
                output wire       waitrequest
                );
   
   reg [31:0]          reg00;
   reg [31:0]          readdata_read;
   wire [31:0]         r50;
   wire [31:0]         osc;
   
   ro16 myro16(.clk(clk),
         .cmd(reg00),
         .r50(r50),
         .osc(osc));
   
   always @ (posedge clk) begin
      if (reset) begin
         reg00 <= 0;
      end
      else begin
         reg00 <= (write & (address == 1'b0)) ? writedata : reg00;
      end
   end
   
   assign waitrequest = 1'b0;
   
   always @* begin
      readdata_read = 32'b0;
      if (read) begin
         case (address)
           1'b0: readdata_read = r50;
           1'b1: readdata_read = osc;
         endcase
      end
   end

   always @(posedge clk) begin
      readdatavalid <= read;
      readdata      <= readdata_read;
   end
   
endmodule

(drawing)

The programming model of the design is as follows:
  a/ select a given RO to measure
  b/ turn it on for a specific time
  c/ read the 50MHz count and the RO count

Measurement script:

set jtag_master [lindex [get_service_paths master] 0]
open_service master $jtag_master

proc measure {jtag_master osc len} {

  # enable oscillator - keep counter clear
  master_write_32 $jtag_master 0x10 [expr (1 << $osc)]

  after 100

  # start measuring
  master_write_32 $jtag_master 0x10 [expr 0x30000 + (1 << $osc)]

  # wait
  after $len

  # stop measuring
  master_write_32 $jtag_master 0x10 0x20000

  # extract result and return it
  set r [master_read_32 $jtag_master 0x10 2]
  scan $r "%lx %lx" dec1 dec2
  return [list $dec1 $dec2]
}

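set t {}
for {set i 0} {$i<5} {incr i} {
  for {set j 0} {$j<16} {incr j} {
    # write the running measurement index to the peripheral at address 0x0
    master_write_32 $jtag_master 0x0 [expr {$j + $i * 16}]
    set m [measure $jtag_master $j 1000]
    # fro = K1 / K2 * 50MHz, with K2 = [lindex $m 0] and K1 = [lindex $m 1]
    lappend t [expr {[lindex $m 1] * 50000000 / [lindex $m 0]}]
  }
}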

# write CSV file

set freq [open "freq.csv" w]
for {set j 0} {$j<16} {incr j} {
  puts -nonewline $freq "OSC "
  puts -nonewline $freq [format "%2d" $j]
  for {set i 0} {$i<5} {incr i} {
  puts -nonewline $freq [format ", %11d" [lindex $t [expr $i * 16 + $j]]]
  }
  puts $freq " "
}
close $freq

6:00 Measurement demo

Steps to compile and run the demo:

1/ Download the repository

2/ Open chipbiometrics.qpf

3/ Open qsys/platformdesigner
   Click Generate HDL ... Generate ... Finish

4/ Compile the design into a bitstream

5/ Download the bitstream

quartus_pgm -m jtag -o "p;chipbiometrics.sof@2"

6/ Go to the measure subdirectory

7/ Open a Nios2 command shell and start system-console

system-console

8/ In the lower-right pane of system console, paste the
content of measure.src

(it should be possible to do this directly from the
command line, but this did not work in a stable manner for me)

9/ After the script completes, you should find a file

freq.csv

This file contains the 16 frequencies of the on-chip oscillators,
each measured 5 times.

These values will be unique for every board.
For example, for my board they are

OSC  0  162945347 162971760 162962564 162994505 162993347
OSC  1  179838750 179866156 179897952 179897244 179933949
OSC  2  240545133 240662946 240601212 240665378 240675456
OSC  3  213768999 213820565 213819042 213805373 213810417
OSC  4  207916893 207977267 208004772 207997242 208011657
OSC  5  183626711 183676345 183684079 183717380 183709903
OSC  6  175299781 175387966 175374948 175404475 175406753
OSC  7  200316138 200408572 200395518 200435972 200502588
OSC  8  166853149 166883404 166914687 166945238 166924448
OSC  9  219487059 219514424 219565248 219615929 219593669
OSC 10  268784640 268847726 268910130 268921884 268958831
OSC 11  215506044 215548012 215552379 215572410 215593732
OSC 12  182252356 182281210 182338524 182351899 182350863
OSC 13  196091759 196147519 196138847 196183126 196247944
OSC 14  176320454 176398192 176432259 176424113 176443814
OSC 15  227313689 227329137 227371986 227375590 227411109

In your case, you will see different values.

This demonstrates that each instance of the same Verilog
design still yields a different fingerprint.
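
To connect these numbers back to R = PUF(C), the sketch below
turns the table into a 15-bit response by comparing neighboring
oscillators. It uses the first and fifth measurement of each
oscillator from the table above. The neighbor-pair scheme and
the function name are assumptions made for illustration, and the
sketch does not separate process variation from other causes of
frequency difference.

```python
# First and fifth measurement of OSC 0..15 from the table above (Hz)
first = [162945347, 179838750, 240545133, 213768999,
         207916893, 183626711, 175299781, 200316138,
         166853149, 219487059, 268784640, 215506044,
         182252356, 196091759, 176320454, 227313689]
fifth = [162993347, 179933949, 240675456, 213810417,
         208011657, 183709903, 175406753, 200502588,
         166924448, 219593669, 268958831, 215593732,
         182350863, 196247944, 176443814, 227411109]


def response_bits(freqs):
    """One bit per neighbor pair: '1' if f(RO i) > f(RO i+1), else '0'."""
    return "".join("1" if a > b else "0" for a, b in zip(freqs, freqs[1:]))


r1 = response_bits(first)
r5 = response_bits(fifth)
# intra-HD between two evaluations on the same board
intra_hd = sum(x != y for x, y in zip(r1, r5))

print("response (1st measurement):", r1)
print("response (5th measurement):", r5)
print("intra-HD:", intra_hd)
```

On this board the two measurements give the same response,
because the frequency differences between neighbors are far
larger than the measurement-to-measurement noise.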

6:10 Logic Lock

Even with the ring oscillators defined at such a low level
(cfr. Verilog), there is no guarantee that their physical layout
will be identical.

On an FPGA, this can make a big difference.
To get a better control of placement, we can make use
of a LogicLock region. The idea is that you can reserve a
certain portion of the FPGA for exclusive use of a specific
module in your design.

In our case, because the RO are still small (9 stages), we
can assign a single LAB (a group of 10 lookup tables) to a specific
RO. Moreover, we can keep other logic out of the neighborhood,
to avoid interference.

To work with LogicLock, you will need Quartus Standard Edition.
I will demonstrate it using a full version of Quartus running
on our design server.

(demo)

In your repository, there is an sof that contains the design
created with logiclock - it's called logiclock.sof.
If you run the measurement script on this bitstream,
you will find the ring oscillators are much more similar
in frequency:


OSC  0  280070760 280105016 280047529 279987548 280055918
OSC  1  274904217 274915057 274924431 274876501 274897887
OSC  2  299808919 299873080 299753085 299764996 299737873
OSC  3  284840709 284854982 284857827 284851841 284804856
OSC  4  279369651 279414954 279434505 279415368 279357839
OSC  5  276124847 276173302 276158381 276135752 276117447
OSC  6  299749525 299755547 299775742 299764557 299762441
OSC  7  283469136 283547045 283507591 283570958 283578976
OSC  8  281172141 281268373 281224019 281227763 281228897
OSC  9  280647381 280710316 280603922 280671321 280684410
OSC 10  300013532 299938179 299974089 299865135 299926556
OSC 11  285668785 285635934 285609536 285678988 285655984
OSC 12  280022111 280047055 280036408 280043939 280066548
OSC 13  282473155 282534453 282474736 282569063 282508667
OSC 14  287659381 287672234 287678499 287665537 287673939
OSC 15  295906256 295887224 295897430 295910918 295953139

This shows that some of the differences we saw before
are not due to process manufacturing variations, but rather
to differences in placement and routing from one ring
oscillator to the next.
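
The effect can be quantified with a short sketch that compares
the spread of the first measurement column in both tables. The
metric (max - min) / min is chosen here for illustration; it is
not a measure defined in the lecture.

```python
# First measurement of OSC 0..15 without placement constraints (Hz)
unconstrained = [162945347, 179838750, 240545133, 213768999,
                 207916893, 183626711, 175299781, 200316138,
                 166853149, 219487059, 268784640, 215506044,
                 182252356, 196091759, 176320454, 227313689]
# First measurement of OSC 0..15 with LogicLock regions (Hz)
logiclock = [280070760, 274904217, 299808919, 284840709,
             279369651, 276124847, 299749525, 283469136,
             281172141, 280647381, 300013532, 285668785,
             280022111, 282473155, 287659381, 295906256]


def relative_spread(freqs):
    """Spread between the fastest and slowest RO, relative to the slowest."""
    return (max(freqs) - min(freqs)) / min(freqs)


spread_unconstrained = relative_spread(unconstrained)
spread_logiclock = relative_spread(logiclock)

print(f"unconstrained: {100 * spread_unconstrained:.1f} %")
print(f"LogicLock    : {100 * spread_logiclock:.1f} %")
```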

6:15 Conclusions

Chip biometrics (or PUFs) are an important technique with
applications in authentication and key generation (information
security).