Lecture 08 - 02/14/19 - Verilog Expressions (with Application to
CORDIC)

5:00 Outline

Today we start discussing the use of Verilog operations to describe
datapaths. The following notes hit on multiple topics, and we will
cover them over the course of two lectures.

    1. Verilog operators 
    2. Signed computations 
    3. Precision
    4. Synthesis of expressions 
    5. Fixed point computations 
    6. Verilog lookup tables 
    7. CORDIC 
    8. Direct Digital Synthesis (DDS)

1. VERILOG Operators
=================================================================

The definition of the operators is given in the Verilog Standard: 
IEEE 1364

When synthesizing datapath, we can use most Verilog operators as
synthesizable targets

    Verilog operator Hardware

    Concatenation wiring

    Unary +, -              adder/subtractor (with constant 0 input)

    + -                     adder/subtractor

    *                       multiplier

    /                       divider (special module, expensive!)

    **                      exponentiator (only for constant base or exponent)

    %                       modulus module (special module)

    > < >= <=               comparator

    !                       logical negation (1-bit result) 
                            Logical operations apply to all
                            bits e.g.!a will be 1 when a
                            contains all 0

    && ||                   logical and or || 
                            (1-bit result) a && b

    == !=                   equality (comparator) 
                            (1-bit result) 
                            bit-by-bit matching

    === !==                 case equality (comparator) 
                            (1-bit result) bit-by-bit
                            matching

    ~                       bitwise negation (n-bit result)

    & | ^                   bitwise and or exor (n-bit result)

    ^~ or ~^                bitwise equivalence (n-bit result)

    & ~& | ~| ^ ~^          reduction and, nand, or, nor, xor, xnor 
                            written as, e.g. ~&a

    << >>                   logical shift right/left

    <<< >>>                 arithmetic shift right/left (provides sign extension)

    ? :                     multiplexer

(*) Treatment of operators with X and Z:

All of the above operations apply to 0, 1, X and Z. 
These operations can thus also produce X and Z.

Example: bitwise and

            0   1   x   z 
            ------------- 
          0 0   0   0   0 
          1 0   1   x   x 
          x 0   x   x   x 
          z 0   x   x   x

Keep in mind that in actual hardware, X does not exist.  The actual
implementation will be 0 or 1, but the simulation cannot tell.

This creates (sometimes) difficult simulation problems, and reveals
the limitation of simulation with 0, 1, X, and Z

    reg a; reg b; reg c;

    always @(*) 
        begin 
        a = 1'bx; 
        b = ~a; 
        c = a & b; 
        end

What is the value of c?

In any implementation, c would be 0 because it ANDs complementary
values. But the simulator will show X.

2. SIGNED computations
=================================================================

In the Verilog language, reg and wire are treated as unsigned.

    wire [1:0]  a;

    a           value
    00            0
    01            1
    10            2
    11            3

If we want them to work as SIGNED, we need to declare them as signed:

    wire signed [1:0] a;

    a           value
    00             0
    01             1
    10            -2
    11            -1

We can also write $signed(a) or $unsigned(a) to force a sign/unsign upon a.

    wire [1:0]  a;

    $signed(a)  value
    00             0
    01             1
    10            -2
    11            -1

Why do we care?

For many operators, the gate-level realization of an unsigned and a signed version is identical

    Example: a full adder behaves identically to signed and unsigned values

    wire [1:0]  a, b, c;

    a        b        a + b    unsigned           signed
    00       01       01       0 + 1 = 1           0 + 1 =  1
    10       01       11       2 + 1 = 3          -2 + 1 = -1

But for some operators, the gate-level realization of an unsigned and a signed version is NOT the same!

    Example: comparison

    wire [1:0]  a, b;

    a        b         $signed(a) < $signed(b)        a < b (unsigned)
    00       10        false (0)                      true (1)
    11       10        true (1)                       false (0)

    Example: arithmetic shift

    wire [2:0] a;

    a                 $signed(a) >> 1         a >> 1
    101               110                     010


3. PRECISION
=================================================================

    Verilog will evaluate expressions with a precision determined by
    the operands in the expression.

    See Table 5-22 in Verilog manual.

    Rule of thumb: Make sure that the result always fits in the source operand type.

        E.g. if you add a 8 bit and a 8-bit number, but you want a 9-bit result,
        then make one of the operands 9 bit:

                wire [7:0] a,b;
                wire [8:0] c;
                assign c = {0, a} + b;

        E.g., if you multiply a 16 bit and a 16-bit number, and you want a 32-bit
        result, then

    E.g., if a and b are 16 bits, then a + b will be computed using 16-bit precision in Verilog. If you want to capture the carry bit, you have to write:

        wire [15:0] a, b;
        wire [16:0] c;

        assign c = {0,a} + b;

4. SYNTHESIS OF EXPRESSIONS
=================================================================

The best method to learn about what happens from an expression is
to write a small Verilog example. Refer to the repository 
https://github.com/vt-ece4514-s19/expressions

(printout handed out in class)
- View RTL
- View Netlist

How do you find what expression is shown in the graph?
- Select a multiplexer and from the menu select 'View- Properties'
- Then check what module is driving input i from the multiplexer

- Notice the following in the synthesis result:
   * special modules for operators +, -, *, /, %
   * multiplexer and multiplexer tree for if-then-else
   * for-loops are unrolled
   * sharing of common subexpressions, e.g. for counter

                // addition
        4'd0: cnext = a + b;
        
                // unsigned comparison
        4'd1: cnext = (a === b);

                // signed comparison
        4'd2: cnext = ($signed(a) > $signed(b));

            // integer division (not smart)
        4'd3: cnext = a / b;
        
            // data multiplication        
        4'd4: cnext = a * b;
        
                // constant multiplication
        4'd5: cnext = a * 5'd11;

               // constant shift
        4'd6: cnext = a << 8'd2;
        
              // data shift
        4'd7: cnext = a << b[2:0];

              // priority mux
        4'd8: if (a == 8'd0) cnext = b;
              else if (a == 8'd1) cnext = ~b;
              else if (a == 8'd2) cnext = |b;
              else cnext = ^b;

               // multiplexer
        4'd9: cnext = (a < b) ? 8'd5 : 8'd23;
        
               // for loop illustrating shift register
        4'd10: for (n = 7; n>0; n=n-1)
                   cnext[n] = c[n-1];
        
               // two parallel adders
        4'd11: cnext = {a[7:4] + b[7:4], a[3:0] + b[3:0]};
        
               // updown counter, two adders
        4'd12: cnext = a[0] ? c + 8'd1 : c - 8'd1;
                      
               // updown counter, one adders
        4'd13: cnext = c + (a[0] ? 8'd1 : - 8'd1);
        
               // constant modulo
        4'd14: cnext = a % 8'd139;

              // arithmetic shift w sign extension
        4'd15: cnext = $signed(a) >>> 3'd5;

5. FIXED POINT computations
=================================================================

A fixed-point number <W, L> is a bit vector of W bits, from which L are
fractional bits. Signed integers are a subset of fixed-point numbers. A
signed integer can be thought of as a <32,0> number.

            Example: <10,4> data type

                                   W
            <--------------------------------------->
                                              L
                                     <-------------->
            +---+---+---+---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+---+---+---+
                                    | 
                                2^0 | 2^-1
                             2^1    |    2^-2
                        2^2         |        2^-3
                    2^3             |            2^-4
                2^4                 |
            2^5                     |
                                binary point


Fixed-point numbers <W,L> of length W smaller than or equal to 32 bits can
be implemented as integer data types. 

(a) Fractional Constant -> <W, L>

To map a fractional constant v into a <W,L> fixed point number implemented
as an integer k, we compute:

        k = (v << L);

For example, in a <32,4> data type, the constant 0.25 would be represented
as the integer

        k = (0.25 << 4) = 4

(b) Value of a fixed-point number <W, L>

Similarly, we can compute the value of a <W,L> fixed point number v
represented in an integer k as follows:

        v = (k / (1 << L));

For example, in a <32,8> data type, the value integer 15 represents the value

        v = 15 / 256 = 0.05859375
        

Operations using fixed-point numbers

The advantage of fixed-point numbers is that the operations on
fixed-point numbers are integer operations, and hence they are efficient
to implement. Fixed-point numbers do not use normalization logic or
exponent-computation logic such as is found in floating-point hardware.

However, fixed-point operations have to handle the binary point at
position L, mainly if an operation would affect it.

The rules are as follows.

1.  The addition of two fixed-point numbers yields a similar 
    fixed point number

            <W,L> = <W,L> + <W,L>

2.  The multiplication of two fixed-point numbers creates a 
    fixed-point number of twice the size

            <2W, 2L> = <W,L> * <W,L>

3.  The convert a fixed-point number <W,L1> into <W,L2>, 
    where L1 > L2, use a shift operation:

            <W,L2> = (<W,L1>) >> (L1 – L2)

4.  To apply rule 3 for multiplication, we can produce a <W,L> 
    result from <W,L> operands as follows:

            <W,L> = ( <W,L> * <W,L> ) >> L

Note that this formula assumes that no overflow will occur, or in other
words, that there are no significant bits in the result between the position
W+1 and 2W. This behavior corresponds to what we are used to for integer
arithmetic.

Example

Let’s compute 0.25 * 0.35 + 1.15 using <32,8> arithmetic.

#define FRACBITS 8

int a = (int) (0.25 * (1 << FRACBITS));
int b = (int) (0.35 * (1 << FRACBITS));
int c = (int) (1.15 * (1 << FRACBITS));
int tmp1, tmp2;

tmp1 = (a * b) >> FRACBITS;
tmp2 = tmp1 + c;

This code produces the following integer values.

a       64
b       89
c       294
tmp1    22
tmp2    316

The resulting value, tmp2, corresponds to the value 316/(1 << FRACBITS) = 1.234375.

Note that the floating-point result of the expression (0.25 * 0.35 + 1.15)
is 1.2375. There is some precision loss using fixed-point computation because we do not keep all fractional bits required to store the result 
of 0.25 * 0.35,

This precision loss is called the quantization error. In fixed-point
computations, we try to minimize the quantization error. At the same time,
we try to avoid overflow. For example, the fixed-point data type <W, W-1>
cannot represent values bigger than 0.99... or smaller than -1.

6. VERILOG lookup table
=================================================================

Example of a combinational lookup table:

module cordicconst(input wire         clk,
                   input wire           reset,
                   input wire [3:0]   address,
                   output wire [17:0] data);

   reg [17:0]                   q;
   
   always @(*)
     begin
    q = 18'h0;
    case (address)
      4'h0: q = 18'h6487;
      4'h1: q = 18'h3b58;
      4'h2: q = 18'h1f5b;
      4'h3: q = 18'hfea;
      4'h4: q = 18'h7fd;
      4'h5: q = 18'h3ff;
      4'h6: q = 18'h1ff;
      4'h7: q = 18'hff;
      4'h8: q = 18'h7f;
      4'h9: q = 18'h3f;
      4'ha: q = 18'h1f;
      4'hb: q = 18'hf;
      4'hc: q = 18'h7;
      4'hd: q = 18'h3;
      4'he: q = 18'h1;
      4'hf: q = 18'h0;
      default: q = 18'h0;
    endcase
     end // always @ (*)

   assign data = q;
   
endmodule

7. CORDIC = Coordinate Rotation Digital Computer
=================================================================

CORDIC is an algorithm to compute trigonometric functions using low-complexity hardware. Similar to a bit-serial addition, it computes one bit of the result at a time.

In a nutshell, given an angle theta, the CORDIC algorithm evaluates sin(theta) and cosine(theta) using successive approximations.

The algorithm was first used in airborne digital computation in the '50s, but it has
since found widespread use in robotics and communications.

Assume an angle alpha, 0 < alpha < pi/2

Then the CORDIC algorithm will rotate the vector (1, 0) to (x1, y1) such as x1 = cos(alpha) and y1 = sin(alpha).

More general, the CORDIC algorithm will rotate (x0,y0) to (x1, y1) over an angle alpha:


        x1 = x0.cos(alpha) - y0.sin(alpha)
        y1 = x0.sin(alpha) + y0.cos(alpha)

(*) The CORDIC idea

The idea of CORDIC is to decompose the rotation by alpha in a series of rotations
by alpha0, alpha1, alpha2, ... such that the sum of all these alpha's is close
to the target alpha.

To get the iterative formulas, divide them by cos(alpha) to obtain:

        x1 = (x0 - y0.tan(alpha)) . (1/cos(alpha))
        y1 = (x0.tan(alpha) + y0) . (1/cos(alpha))

The term 1/cos(alpha) can further be written as

    1/cos(alpha) = sqrt(1 + tan(alpha)*tan(alpha)) = k

Now, imagine that we choose alpha0, alpha1, .. such that

        alpha0 = arctan(1)
        alpha1 = arctan(1/2)
        alpha2 = arctan(1/4)
        ..

And assume that we can write alpha as:

    alpha = alpha0 +/- alpha1 +/- alpha2 +/- alpha3 +/- ...

where the signs +/- are chosen such that we approximate, step by step,
the resulting value alpha

    (x1, y1) = rotate(x0, y0, alpha0)
    (x2, y2) = rotate(x1, y1, alpha1)
    ...

because of our choice of alphai, this becomes:

    x1 = (x0 - y0).k0
    y1 = (x0 + y0).k0

    x2 = (x1 - (+/- 1/2) y1).k1
    y2 = ((+/- 1/2) x1 + y1).k1

    x3 = (x2 - (+/- 1/4) y2).k2
    y3 = ((+/- 1/4) x2 + y2).k2

    ...

Each time we choose the sign to get closer to the target alpha.
Also, the k0, k1, k2 can be precomputed as a single factor.

    K = k0.k1.k2. .. = product(sqrt(1 + tan(alphai)*tan(alphai)))

The main observation is that the algorithmic kernel is extremely simple:

    x(i+1) = x(i) +/- (y(i) >> i)
    y(i+1) = +/- (x(i) >> i) + y(i)

We collect all the k'1 together and initialize the
algorithm as (k.x0, k.y0)

How to control the sign used in the iterations?

To decide on the +/- sign, we use an angle 'accumulator'
that decides, for each iteration of the algorithm, in what
direction to rotate:

    accumulator = 0

    if (target > accumulator)
        accumulator = accumulator + arctan(2^-i)
    else
        accumulator = accumulator + arctan(2^-i)

The factors arctan(2^i) are fixed and are stored in a lookup table.

(*) Computing sinus and cosine

To compute sine and cosine, we initialize the algorithm as:

    x0 = k
    y0 = 0

And then do n iterations. With each iteration, the precision
increases with a factor arctan(2^-i). Roughly speaking, you get
one bit per iteration.

Algorithm in C

fixed cordicsine(fixed inangle) {
  fixed X, Y, TargetAngle, CurrAngle;
  unsigned Step;

  X=FIXED(AG_CONST);  /* AG_CONST * cos(0) */
  Y=0;                /* AG_CONST * sin(0) */

  TargetAngle = angleadj(inangle);
  CurrAngle=0;
  for(Step=0; Step < 16; Step++) {
    fixed NewX;
    if (TargetAngle > CurrAngle) {
      NewX       =  X - (Y >> Step);
      Y          = (X >> Step) + Y;
      X          = NewX;
      CurrAngle += Angles[Step];
    } else {
      NewX       = X + (Y >> Step);
      Y          = -(X >> Step) + Y;
      X          = NewX;
      CurrAngle -= Angles[Step];
    }
  }

  if (quadrant(inangle) < 2)
    return  Y;
  else
    return -Y;
}

(*) How to deal with arbitrary angles: 0->2PI?

The CORDIC algorithm is defined in the first quadrant.

    if (0 < alpha <= PI/2)
        then sine = cordic(alpha)

The other quadrants are obtained by symmetry arguments:

    if (PI/2 < beta <= PI)
        then beta' = PI - beta is in the first quadrant
        and  sin(beta') == sin(beta)
        therefore sine = cordic(PI - beta)

    if (PI < beta <= 3PI/2)
        then beta' = beta - PI is in the first quadrant
        and  sin(beta') == -sin(beta)
        therefore sine = -cordic(beta - PI)

    if (3PI/2 < beta <= 2PI)
        then beta' = 2PI - beta is in the first quadrant
        and  sin(beta') == -sin(beta)
        therefore sine = -cordic(2PI - beta)

So we can capture this in two C functions:

fixed quadrant(fixed inangle) 
  // input: inangle in the range 0 .. 2PI
  // output: 0-3 for quadrant 0-3

and

fixed angleadj(fixed inangle) 
  // input angle is 0 .. 3pi
  // output angle in first quadrant such that
  // abs(sin(inangle)) = sin(outangle)
  // if inangle >2pi, subtract 2pi.
  // This brings inangle in the range 0 - pi and keeps the same quardrant

8. DIRECT DIGITAL SYNTHESIS
=================================================================

The idea of direct digital synthesis (DDS) is to compute waveforms at real time to generate them.

DDS contains a phase accumulator, a phase-to-output converter, and a DAC.

        phase accumulator  --> phase-to-output-converter --> DAC (analog waveform)

To generate a sine-wave, we will create a CORDIC-based DDS.

Parameters:
  - 16 bit resolution for DAC samples  FIX<16,14>
  - 16 bit resolution for phase        FIX<18,15>
  - CORDIC with 16-bit resolution      FIX<18,15>

C Reference Implementation

#include <stdio.h>
#include <math.h>

#define AG_CONST        1/1.6467602578655
#define FIXED(X)        ((long int)((X) * 32768.0))
#define FLOAT(X)        ((X) / 32768.0)
#define PI2             FIXED(atan(1.0) * 2.0)
  
typedef long int        fixed; /* <18,15> fixed-point */

static const fixed Angles[]={
                 FIXED(0.7853981633974483L),
                 FIXED(0.4636476090008061L),
                 FIXED(0.2449786631268641L),
                 FIXED(0.1243549945467614L),
                 FIXED(0.0624188099959574L),
                 FIXED(0.0312398334302683L),
                 FIXED(0.0156237286204768L),
                 FIXED(0.0078123410601011L),
                 FIXED(0.0039062301319670L),
                 FIXED(0.0019531225164788L),
                 FIXED(0.0009765621895593L),
                 FIXED(0.0004882812111949L),
                 FIXED(0.0002441406201494L),
                 FIXED(0.0001220703118937L),
                 FIXED(0.0000610351561742L),
                 FIXED(0.0000305175781155L),
                 FIXED(0.0000152587890613L),
                 FIXED(0.0000076293945311L),
                 FIXED(0.0000038146972656L)
};

//-----------------------------------------------------
void showtable() {
  unsigned Step;

  printf("Angles Table\n");
  for (Step = 0; Step < 15; Step++)
    printf("16'h%x\n", Angles[Step]);
}

//-----------------------------------------------------
fixed quadrant(fixed inangle) {
  // output: 0-3 for quadrant 0-3
  // we assume step-angle is smaller than pi
  // so that there are always at least two samples per period (alias-free)
  // hence the max input angle can be (2pi + pi)
  unsigned q = 0;

  // if inangle >2pi, subtract 2pi.
  // This brings inangle in the range 0 - pi and keeps the same quardrant
  if (inangle > 4*PI2) 
    inangle = inangle - 4*PI2;

  if (inangle > 3*PI2)
    return 3;
  else if (inangle > 2*PI2)
    return 2;
  else if (inangle > PI2)
    return 1;

  return 0;
}

//-----------------------------------------------------
fixed angleadj(fixed inangle) {

  // input angle is 0 .. 3pi
  // output angle in first quadrant such that
  // abs(sin(inangle)) = sin(outangle)

  // if inangle >2pi, subtract 2pi.
  // This brings inangle in the range 0 - pi and keeps the same quardrant
  if (inangle > 4*PI2) 
    inangle = inangle - 4*PI2;

  if (inangle > 3*PI2)
    return (4*PI2 - inangle);
  else if (inangle > 2*PI2)
    return (inangle - 2*PI2);
  else if (inangle > PI2)
    return (2*PI2 - inangle);

  return inangle;

}

//-----------------------------------------------------
fixed accumulator(fixed inangle, fixed inangleadd) {

  inangle = inangle + inangleadd;

  if (inangle > 4*PI2)
    inangle = inangle - 4*PI2;

  return inangle;

}

//-----------------------------------------------------
fixed cordicsine(fixed inangle) {

  fixed X, Y, TargetAngle, CurrAngle;
  unsigned Step;

  X=FIXED(AG_CONST);  /* AG_CONST * cos(0) */
  Y=0;                /* AG_CONST * sin(0) */
  
  TargetAngle = angleadj(inangle);
  CurrAngle=0;
  for(Step=0; Step < 16; Step++) {
    fixed NewX;
    if (TargetAngle > CurrAngle) {
      NewX       =  X - (Y >> Step);
      Y          = (X >> Step) + Y;
      X          = NewX;
      CurrAngle += Angles[Step];
    } else {
      NewX       = X + (Y >> Step);
      Y          = -(X >> Step) + Y;
      X          = NewX;
      CurrAngle -= Angles[Step];
    }
  }

  if (quadrant(inangle) < 2) 
    return  Y;
  else
    return -Y;

}

//-----------------------------------------------------
int main(void) {

  fixed angle, angleadd;
  fixed sine;
  unsigned i;

  double fsine;

  printf("2pi   %8x\n", 4*PI2);
  printf("3pi/2 %8x\n", 3*PI2);
  printf(" pi   %8x\n", 2*PI2);
  printf(" pi/2 %8x\n",   PI2);
  
  angleadd = PI2/11;

  printf(" inc  %8x\n",   angleadd);
  
  angle = angleadd;
  for (i=0; i<100; i++) {  
    sine = cordicsine(angle);

    if (0)
      printf("a %5x s %10x ( sin(%8.5f) = %8.5f ) sin %8.5f err %18.15f\n",
         angle,
         sine,
         FLOAT(angle),
         FLOAT(sine),
         sin(FLOAT(angle)),
         FLOAT(sine) - sin(FLOAT(angle)));
    else
      printf("%x\n", sine);
    
    angle = accumulator(angle, angleadd);    
  }

}