ECE 4514 Lecture 20 Designing with Multiple Clocks

5:00 CLOCK SKEW IN SYNCHRONOUS CIRCUITS

Consider the following setup:

     register1 --> logic --> register2
        clk1                   clk2

The registers are characterized by:
  - tsetup
  - tcq
  - thold (we previously assumed thold = 0)

In an ideal case, clk1 and clk2 are identical copies.
What relationship can we derive with respect to the proper operation of the circuit?

                 +------------+           +-----------+
                 |            |           |           |
   clk1  --------+            +-----------+           +-------

                 |---------------->|                        tcq + tlogic 

                 +------------+           +-----------+
                 |            |           |           |
   clk2  --------+            +-----------+           +-------

                                       |<-|                 tsetup
                 |->|                                 |->|  thold

We can derive two conditions:

(1) the logic should complete computation before the next edge of clk2

        T > tcq + tlogic + tsetup

(2) the logic should NOT change during the hold time of register2

        thold < tcq + tlogic

Due to imperfections in clock distribution, the clock signal may be slightly
offset for each flop.

The offset from clk1 to clk2 is called tskew.
With a positive skew (tskew > 0), clk2 shifts to the right (later than clk1).
With a negative skew (tskew < 0), clk2 shifts to the left (earlier than clk1).

POSITIVE SKEW:

                 +------------+           +-----------+
                 |            |           |           |
   clk1  --------+            +-----------+           +-------

                 |---------------->|                        tcq + tlogic 

                 |-->|                                      tskew

                     +------------+           +-----------+
                     |            |           |           |
   clk2      --------+            +-----------+           +-------

                                           |<-|                 tsetup
                     |->|                                 |->|  thold

This affects our two conditions as follows:

(1) the logic should complete computation before the next edge of clk2

        T + tskew > tcq + tlogic + tsetup

    or

        T > tcq + tlogic + tsetup - tskew

    This seems to be a good thing. The skew relaxes the clock period constraint,
    such that we can use a smaller clock period.

(2) the logic should NOT change during the hold time of register2

        thold + tskew < tcq + tlogic
 
    or

        thold < tcq + tlogic - tskew

    This is problematic. As the skew increases, at some point even a very small
    hold time (in the limit, 0) will no longer be met: this happens as soon as
    tskew reaches tcq + tlogic.

    This means that the skew is so large that register2 captures the data
    on the CURRENT clock edge rather than the next one.

    In the circuit, it appears as if the signal 'flies right through' a register.

NEGATIVE SKEW:

When the skew is negative (tskew < 0), the two conditions become:

        T > tcq + tlogic + tsetup + |tskew|  (1)
        thold < tcq + tlogic + |tskew|       (2)

     (1) -> the clock period will need to increase to cope with negative skew
     (2) -> safe. thold will not be violated.
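
As a numeric check of conditions (1) and (2), the simulation-only sketch below
evaluates the setup and hold slack for three skew values. The module name
skew_check and all delay numbers (in picoseconds) are assumed for illustration;
they do not describe a real device. As in the notes, the same tcq and tlogic
are used for both conditions.

```verilog
// Simulation-only sketch: evaluates conditions (1) and (2) for several skews.
// All delay values are assumed example numbers, in picoseconds.
module skew_check;
   integer T, tcq, tlogic, tsetup, thold;
   integer setup_slack, hold_slack;

   // setup slack = (T + tskew) - (tcq + tlogic + tsetup), must be > 0
   // hold slack  = (tcq + tlogic) - (thold + tskew),      must be > 0
   task check(input integer tskew);
      begin
         setup_slack = (T + tskew) - (tcq + tlogic + tsetup);
         hold_slack  = (tcq + tlogic) - (thold + tskew);
         $display("tskew = %0d: setup slack = %0d, hold slack = %0d",
                  tskew, setup_slack, hold_slack);
         if (setup_slack <= 0) $display("   -> condition (1) violated");
         if (hold_slack  <= 0) $display("   -> condition (2) violated");
      end
   endtask

   initial begin
      T = 600; tcq = 100; tlogic = 400; tsetup = 50; thold = 30;
      check(0);       // ideal clocks
      check(480);     // large positive skew
      check(-100);    // negative skew
   end
endmodule
```

With these numbers, zero skew meets both conditions (slack 50 and 470), the
+480 ps skew fails condition (2) (hold slack -10), and the -100 ps skew fails
condition (1) (setup slack -50).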

Therefore, controlling skew is important. Unfortunately, for realistic circuits,
clock skew can be positive as well as negative. For example:

        +---------------------------------------------+
        |                                             |
        V                                             |
    register1  -->  logic  --> register2 --> logic2 --+
       clk1                      clk2

Assume that, in this circuit, the clock is distributed from clk1 to clk2,
so that clk2 arrives later than clk1. Then the path from register 1 to
register 2 sees a positive skew, but the feedback path from register 2
back to register 1 sees a negative skew.

5:20 CLOCK DISTRIBUTION

Clock distribution is therefore an important aspect of chip design.
An FPGA or ASIC may use a specific clock network.
A common type of clock network is the H-tree:

        +      +      +      +
        |      |      |      |
        +---+--+      +---+--+
        |   |  |      |   |  |
        +   |  +      +   |  +
            |             |
            +------1------+
            |             |
        +   |  +      +   |  +
        |   |  |      |   |  |
        +---+--+      +---+--+
        |      |      |      |
        +      +      +      +

The distance from (1) to every leaf of the H-tree is the same. Therefore,
all leaves see (nearly) the same delay with respect to (1), and registers
placed in each other's neighbourhood will experience similar skew.

An alternative is a grid-like or pane-like structure, which aims to minimize
the overall delay at the expense of increased power:

                   1
         D  D  D  D  D  D  D  D
         |  |  |  |  |  |  |  |
       D +--+--+--+--+--+--+--+ D
         |  |  |  |  |  |  |  |
       D +--+--+--+--+--+--+--+ D
         |  |  |  |  |  |  |  |
     1 D +--+--+--+--+--+--+--+ D 1
         |  |  |  |  |  |  |  |
       D +--+--+--+--+--+--+--+ D
         |  |  |  |  |  |  |  |
       D +--+--+--+--+--+--+--+ D
         |  |  |  |  |  |  |  |
         D  D  D  D  D  D  D  D
                   1

5:25 CLOCK DOMAINS

Maintaining a single-clock strategy is not always practical.
Even though we advocated a single clock throughout the course
when we were doing RTL design, there are several applications
where the use of multiple clocks is useful:

   - passing data between systems that run at different clock frequencies
   - adjusting the clock frequency of each part of a system to the
     strict minimum (for low power)
   - gating clocks (turning them off) when subsystems are not used

When we say a 'clock domain', we refer to a region of a circuit
where all synchronous elements are attached to the same net (wire).

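If we are using multiple clock inputs, and each clock input drives
a different net, then we get a circuit with multiple clock domains
(two clock inputs give 2 clock domains).

Naturally, in a system with N clock domains, there are up to N.(N-1)
possible transitions from one domain to another (counting each direction
separately).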

First, let's consider what can go wrong when we go from one
clock domain to the other.

        register1 --> logic --> register2
          clk1                    clk2

                 +------------+           +-----------+
                 |            |           |           |
   clk1  --------+            +-----------+           +-------
                 |
                 |---|   TT
                     |
                     +-------+      +-------+      
                     |       |      |       |
   clk2  ------------+       +------+       +-------


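Here TT is the time from a rising edge of clk1 to the next rising edge of clk2.
For register2 to capture the data correctly, we need:

        TT > tcq + tlogic + tsetup

If the two clocks are unrelated, TT is different at every edge, and it is
quite possible to get a setup time violation.

If that happens, then register2 can experience metastability.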

5:35 METASTABILITY

A metastable flop is a flop in an invalid logic state - neither 1 nor 0

                      +----------------------------
                      |
     Din -------------+          data changes within

                   |/////////|   setup/hold region around clock edge

                        +-------------------
                        |
     Clk ---------------+

                                 may result in metastability

                                   +-------------
                         ???????????              metastable region
     Dout ---------------+

Electrically, a metastable flop is 'floating' between ground and vdd.
Eventually, the flop will converge to a stable 1 or a stable 0.
HOWEVER:
  1/ the time needed to recover from metastability is unknown
  2/ the eventual value (0 or 1) is unknown

In fact, metastable flops are sometimes used as a source of randomness
in random number generators!

Metastability is an adverse operating condition for digital circuits,
and should be avoided at all costs.

The classic solution to the problem of metastability is to use
a synchronizer circuit:

                                  (1)             (2)
     async_signal ---> register1 ----> register2 --> 
                          clk             clk


  - async_signal is asynchronous and may cause metastability on register1
    from time to time. register1 then has a full clock cycle to settle
    before register2 samples it, so register2 almost always has a valid bit.
  - As long as the metastability resolves within a clock cycle,
    the above circuit will convert the async_signal into a sync_signal
    with a latency of up to two clock cycles.

Sometimes additional registers are used (to cope with the rare case that
a metastable event extends over more than a clock cycle):
 
                                  (1)             (2)
     async_signal ---> register1 ----> register2 --> register3 -->
                          clk             clk           clk
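
The two- and three-register synchronizers above can be sketched as one
parameterized shift register. This is a minimal sketch, not a vendor-approved
cell: the module name, the port names and the STAGES parameter are assumed for
illustration, and it only handles a single-bit signal.

```verilog
// Sketch of an N-stage synchronizer for a single-bit asynchronous input.
// Module, port and parameter names are illustrative, not from the lecture.
module synchronizer
  #(parameter STAGES = 2)          // 2 = basic synchronizer, 3 = extra margin
   (                               // STAGES must be at least 2
    input  wire clk, reset,
    input  wire async_in,          // may change at any time
    output wire sync_out           // intended for use in the clk domain
   );

   reg [STAGES-1:0] sync_reg;      // sync_reg[0] plays the role of register1

   always @(posedge clk, posedge reset)
      if (reset)
         sync_reg <= {STAGES{1'b0}};
      else
         // shift by one: the first flop samples async_in and may go
         // metastable; each later flop gives it one more clock period
         // to settle before the value is used
         sync_reg <= {sync_reg[STAGES-2:0], async_in};

   assign sync_out = sync_reg[STAGES-1];
endmodule
```

Instantiating it with STAGES = 3 gives the three-register variant, at the cost
of one more clock cycle of latency.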

5:45 CLOCK DOMAIN CROSSING

When we design circuits with multiple clock regions, we need to use
techniques that help us cross clock domains.


Solution 1: synchronizers on every clock transition

     register1 ------> register2 ----> register3 ---> logic_region_2
        clk1              clk2            clk2

In this case, we minimize the chance of injecting metastable values
into the logic of region2, by inserting a synchronizer circuit
before every clock domain transition.

This is, however, a rather rough approach, since it is not clear
in which clock cycle exactly the value will arrive in region2;
there is an uncertainty of one clk2 period.

Solution 2: phase control of clocks

When we have several related clocks (e.g. a 1X and a 2X clock), it may
be possible to minimize the chance of a setup/hold violation
by carefully controlling the relative phase of the clocks.


                register1 -----> logic -----> register2
                   clk1                          clk2


                   clk1 --> DLL circuit --> clk2

                 +------------+           +-----------+
                 |            |           |           |
   clk1  --------+            +-----------+           +-------

                 +-----+      +----+      
                 |     |      |    |
   clk2  --------+     +------+    +-------


Since the phase of clk2 is precisely controlled with respect to clk1,
a/ the hold time of register2 can be met (logic changes after the rising
   edge of clk1)
b/ the setup time of register2 can be met (logic is stable before the rising
   edge of clk2)

However, this is an expensive solution.

Solution 3: using handshake circuits

Remember the two-way handshake logic:

                  +--------------------+
                  |                    |
    Req  ---------+                    +--------------------

                       +-----------------------+ 
                       |                       |
    Ack  --------------+                       +------------



        Slow clock domain            Fast clock domain    

                       | ----> data ---> |
            SLOW FSM   | ----> req ----> |  FAST FSM
                       | <---- ack ----- |

            1/ write data
            2/ req = 1                
                                        3/ read data
                                           ack = 1
            4/ req = 0                   
                                        5/ ack = 0       

We can build the transition from slow domain to fast domain using
synchronizer circuits:

For req, data:

  req/data ----> register1 ------> register2 ----> register3 --->
                  slowclk            fastclk        fastclk

For ack:

   ack --------> register1 ------> register2 ----> register3 -+->
                  fastclk           slowclk         slowclk   |
                                                              |
     chkack <---- register4 <---- register5 ------------------+

Challenge? the fast clock domain may remove ack too quickly, i.e.
           before the slow clock domain has had a chance to read it.

Solution? The chkack signal helps the Fast clock domain to detect when 
          the slow clock domain has received the ack signal
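
The five handshake steps can be sketched as two small FSMs, one per clock
domain. This is a sketch under stated assumptions, not the circuit drawn above:
module and port names and the 8-bit data width are invented for illustration;
req and ack each pass through a two-flop synchronizer; data is sampled directly
in the fast domain, which assumes the sender holds it stable while req = 1;
and the chkack path is not modeled. Instead, the receiver keeps ack high until
it sees req = 0, following steps 4/ and 5/.

```verilog
// Sketch of the handshake, slow sender to fast receiver.
// Names and the 8-bit data width are assumed for illustration.
module hs_sender
   (
    input  wire       clk, reset,   // slow clock domain
    input  wire       start,        // request to send data_in
    input  wire [7:0] data_in,
    input  wire       ack_async,    // ack as driven by the fast domain
    output reg  [7:0] data,
    output reg        req,
    output wire       busy
   );

   reg [1:0] ack_sync;              // two-flop synchronizer for ack
   wire      ack = ack_sync[1];

   always @(posedge clk, posedge reset)
      if (reset)
         ack_sync <= 2'b00;
      else
         ack_sync <= {ack_sync[0], ack_async};

   always @(posedge clk, posedge reset)
      if (reset) begin
         req  <= 1'b0;
         data <= 8'h00;
      end
      else if (!req && !ack && start) begin
         data <= data_in;           // 1/ write data
         req  <= 1'b1;              // 2/ req = 1
      end
      else if (req && ack)
         req  <= 1'b0;              // 4/ req = 0

   assign busy = req | ack;         // a transfer is still in progress
endmodule

module hs_receiver
   (
    input  wire       clk, reset,   // fast clock domain
    input  wire       req_async,    // req as driven by the slow domain
    input  wire [7:0] data,         // assumed stable while req = 1
    output reg        ack,
    output reg  [7:0] data_out,
    output reg        valid         // one-cycle pulse when data_out is updated
   );

   reg [1:0] req_sync;              // two-flop synchronizer for req
   wire      req = req_sync[1];

   always @(posedge clk, posedge reset)
      if (reset)
         req_sync <= 2'b00;
      else
         req_sync <= {req_sync[0], req_async};

   always @(posedge clk, posedge reset)
      if (reset) begin
         ack      <= 1'b0;
         data_out <= 8'h00;
         valid    <= 1'b0;
      end
      else begin
         valid <= 1'b0;
         if (req && !ack) begin
            data_out <= data;       // 3/ read data
            ack      <= 1'b1;       //    ack = 1
            valid    <= 1'b1;
         end
         else if (!req && ack)
            ack <= 1'b0;            // 5/ ack = 0
      end
endmodule
```

To use the pair, connect req and data of hs_sender to req_async and data of
hs_receiver, and ack of hs_receiver to ack_async of hs_sender. A new transfer
should only be started when busy = 0.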

6:05 CLOCK GATING

Clock gating creates a clock region where the clock can be switched off
(held at 0).

This is common in ASIC design and in complex SoCs. It is not common in FPGAs,
because FPGA chips have their own optimized clock nets, separate from
the 'compute' nets of the FPGA.

Basic idea of clock gating:

    basic clock ----+
                    +---|  
    clkenable1 ---------|AND ---> clk1
                    |
                    +---|  
    clkenable2 ---------|AND ---> clk2
                    |
                    +---|  
    clkenable3 ---------|AND ---> clk3

Each of these clocks should be treated as a different clock region.
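
One output of the gating circuit can be sketched in two ways. The first module
is the plain AND gate from the diagram. The second adds a latch on the enable,
a commonly used refinement that is not covered in these notes and is included
here as an assumption; ASIC flows normally instantiate a library clock-gating
cell rather than writing this in RTL. Module and signal names are illustrative.

```verilog
// Sketch of one gated-clock output, in two variants.
// Names are illustrative, not from the lecture.
module clock_gate_and
   (input wire clk, input wire clkenable, output wire gclk);
   // Direct AND gating: if clkenable changes while clk = 1,
   // gclk can show a glitch or a shortened pulse.
   assign gclk = clk & clkenable;
endmodule

module clock_gate_latch
   (input wire clk, input wire clkenable, output wire gclk);
   reg en_latched;
   // Level-sensitive latch, transparent while clk = 0.
   // The enable seen by the AND gate can only change while clk
   // is low, so a gclk pulse is never cut short.
   always @(clk or clkenable)
      if (!clk)
         en_latched <= clkenable;
   assign gclk = clk & en_latched;
endmodule
```

In both variants, gclk is a separate clock net and needs the same care as any
other clock domain.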

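In an FPGA, the above circuit is not a good idea, because the gated clocks
are created in the 'compute' logic rather than on the optimized clock nets.
It is better to use flops with clock-enable inputs: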

module d_ff_en_1seg
   (
    input wire clk, reset,
    input wire en,
    input wire d,
    output reg q
   );

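   // body: asynchronous reset, synchronous clock enable
   always @(posedge clk, posedge reset)
      if (reset)
         q <= 1'b0;      // reset clears the flop
      else if (en)
         q <= d;         // capture d only when en = 1, otherwise hold q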

endmodule
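
As a sketch of how a clock enable can replace a gated or slower clock, the
module below updates a register only once every DIV cycles of the single
system clock. The module name slow_update, the parameter DIV and the 8-bit
counter width are assumed for illustration; the last always block follows the
same enable-flop pattern as d_ff_en_1seg.

```verilog
// Sketch: update a register once every DIV clock cycles using a clock
// enable, instead of creating a slower or gated clock.
// Names and the default DIV value are illustrative.
module slow_update
  #(parameter DIV = 4)             // assumed example value, 1 <= DIV <= 256
   (
    input  wire clk, reset,
    input  wire d,
    output reg  q
   );

   reg [7:0] count;
   wire      en_tick = (count == DIV-1);   // high for 1 clk cycle out of DIV

   // free-running counter that produces the enable tick
   always @(posedge clk, posedge reset)
      if (reset)
         count <= 8'd0;
      else if (en_tick)
         count <= 8'd0;
      else
         count <= count + 8'd1;

   // enable flop: stays on the single clock net, only the enable differs
   always @(posedge clk, posedge reset)
      if (reset)
         q <= 1'b0;
      else if (en_tick)
         q <= d;
endmodule
```

All flops here share one clock net, so no additional clock domain is created.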

6:10 CONCLUSIONS

- CLOCK SKEW IN SYNCHRONOUS CIRCUITS
- CLOCK DOMAINS
- METASTABILITY
- CLOCK DOMAIN CROSSING
    Synchronizer
    Phase Control
    Handshake protocol
- CLOCK GATING