Lecture 09 - 02/14/19 - The CORDIC Algorithm
CORDIC)

5:00 Outline

Continuing on from last week, we will now discuss the design and
implementation of the CORDIC algorithm, and its use in Direct Digital
Synthesis.

(last week -
    1. Verilog operators 
    2. Signed computations 
    3. Precision
    4. Synthesis of expressions 
)    
    5. Fixed point computations 
    6. Verilog lookup tables 
    7. CORDIC 
    8. Direct Digital Synthesis (DDS)

5. FIXED POINT computations
=================================================================

A fixed-point number <W, L> is a bit vector of W bits, from which L are
fractional bits. Signed integers are a subset of fixed-point numbers. A
signed integer can be thought of as a <32,0> number.

            Example: <10,4> data type

                                   W
            <--------------------------------------->
                                              L
                                     <-------------->
            +---+---+---+---+---+---+---+---+---+---+
            |   |   |   |   |   |   |   |   |   |   |
            +---+---+---+---+---+---+---+---+---+---+
                                    | 
                                2^0 | 2^-1
                             2^1    |    2^-2
                        2^2         |        2^-3
                    2^3             |            2^-4
                2^4                 |
            2^5                     |
                                binary point


Fixed-point numbers <W,L> of length W smaller than or equal to 32 bits can
be implemented as integer data types. 

(a) Fractional Constant -> <W, L>

To map a fractional constant v into a <W,L> fixed point number implemented
as an integer k, we compute:

        k = (v << L);

For example, in a <32,4> data type, the constant 0.25 would be represented
as the integer

        k = (0.25 << 4) = 4

(b) Value of a fixed-point number <W, L>

Similarly, we can compute the value of a <W,L> fixed point number v
represented in an integer k as follows:

        v = (k / (1 << L));

For example, in a <32,8> data type, the value integer 15 represents the value

        v = 15 / 256 = 0.05859375
        

Operations using fixed-point numbers

The advantage of fixed-point numbers is that the operations on
fixed-point numbers are integer operations, and hence they are efficient
to implement. Fixed-point numbers do not use normalization logic or
exponent-computation logic such as is found in floating-point hardware.

However, fixed-point operations have to handle the binary point at
position L, mainly if an operation would affect it.

The rules are as follows.

1.  The addition of two fixed-point numbers yields a similar 
    fixed point number

            <W,L> = <W,L> + <W,L>

2.  The multiplication of two fixed-point numbers creates a 
    fixed-point number of twice the size

            <2W, 2L> = <W,L> * <W,L>

3.  The convert a fixed-point number <W,L1> into <W,L2>, 
    where L1 > L2, use a shift operation:

            <W,L2> = (<W,L1>) >> (L1 – L2)

4.  To apply rule 3 for multiplication, we can produce a <W,L> 
    result from <W,L> operands as follows:

            <W,L> = ( <W,L> * <W,L> ) >> L

Note that this formula assumes that no overflow will occur, or in other
words, that there are no significant bits in the result between the position
W+1 and 2W. This behavior corresponds to what we are used to for integer
arithmetic.

Example

Let’s compute 0.25 * 0.35 + 1.15 using <32,8> arithmetic.

#define FRACBITS 8

int a = (int) (0.25 * (1 << FRACBITS));
int b = (int) (0.35 * (1 << FRACBITS));
int c = (int) (1.15 * (1 << FRACBITS));
int tmp1, tmp2;

tmp1 = (a * b) >> FRACBITS;
tmp2 = tmp1 + c;

This code produces the following integer values.

a       64
b       89
c       294
tmp1    22
tmp2    316

The resulting value, tmp2, corresponds to the value 316/(1 << FRACBITS) = 1.234375.

Note that the floating-point result of the expression (0.25 * 0.35 + 1.15)
is 1.2375. There is some precision loss using fixed-point computation because we do not keep all fractional bits required to store the result 
of 0.25 * 0.35,

This precision loss is called the quantization error. In fixed-point
computations, we try to minimize the quantization error. At the same time,
we try to avoid overflow. For example, the fixed-point data type <W, W-1>
cannot represent values bigger than 0.99... or smaller than -1.

<Handout - quadratic expression>


6. VERILOG lookup table
=================================================================

Example of a combinational lookup table:

module cordicconst(input wire         clk,
                   input wire           reset,
                   input wire [3:0]   address,
                   output wire [17:0] data);

   reg [17:0]                   q;
   
   always @(*)
     begin
    q = 18'h0;
    case (address)
      4'h0: q = 18'h6487;
      4'h1: q = 18'h3b58;
      4'h2: q = 18'h1f5b;
      4'h3: q = 18'hfea;
      4'h4: q = 18'h7fd;
      4'h5: q = 18'h3ff;
      4'h6: q = 18'h1ff;
      4'h7: q = 18'hff;
      4'h8: q = 18'h7f;
      4'h9: q = 18'h3f;
      4'ha: q = 18'h1f;
      4'hb: q = 18'hf;
      4'hc: q = 18'h7;
      4'hd: q = 18'h3;
      4'he: q = 18'h1;
      4'hf: q = 18'h0;
      default: q = 18'h0;
    endcase
     end // always @ (*)

   assign data = q;
   
endmodule

7. CORDIC = Coordinate Rotation Digital Computer
=================================================================

CORDIC is an algorithm to compute trigonometric functions using low-complexity hardware. Similar to a bit-serial addition, it computes one bit of the result at a time.

In a nutshell, given an angle theta, the CORDIC algorithm evaluates sin(theta) and cosine(theta) using successive approximations.

The algorithm was first used in airborne digital computation in the '50s, but it has
since found widespread use in robotics and communications.

Assume an angle alpha, 0 < alpha < pi/2

Then the CORDIC algorithm will rotate the vector (1, 0) to (x1, y1) such as x1 = cos(alpha) and y1 = sin(alpha).

More general, the CORDIC algorithm will rotate (x0,y0) to (x1, y1) over an angle alpha:


        x1 = x0.cos(alpha) - y0.sin(alpha)
        y1 = x0.sin(alpha) + y0.cos(alpha)

(*) The CORDIC idea

The idea of CORDIC is to decompose the rotation by alpha in a series of rotations
by alpha0, alpha1, alpha2, ... such that the sum of all these alpha's is close
to the target alpha.

To get the iterative formulas, divide them by cos(alpha) to obtain:

        x1 = (x0 - y0.tan(alpha)) . (1/cos(alpha))
        y1 = (x0.tan(alpha) + y0) . (1/cos(alpha))

The term 1/cos(alpha) can further be written as

    1/cos(alpha) = sqrt(1 + tan(alpha)*tan(alpha)) = k

Now, imagine that we choose alpha0, alpha1, .. such that

        alpha0 = arctan(1)
        alpha1 = arctan(1/2)
        alpha2 = arctan(1/4)
        ..

And assume that we can write alpha as:

    alpha = alpha0 +/- alpha1 +/- alpha2 +/- alpha3 +/- ...

where the signs +/- are chosen such that we approximate, step by step,
the resulting value alpha

    (x1, y1) = rotate(x0, y0, alpha0)
    (x2, y2) = rotate(x1, y1, alpha1)
    ...

because of our choice of alphai, this becomes:

    x1 = (x0 - y0).k0
    y1 = (x0 + y0).k0

    x2 = (x1 - (+/- 1/2) y1).k1
    y2 = ((+/- 1/2) x1 + y1).k1

    x3 = (x2 - (+/- 1/4) y2).k2
    y3 = ((+/- 1/4) x2 + y2).k2

    ...

Each time we choose the sign to get closer to the target alpha.
Also, the k0, k1, k2 can be precomputed as a single factor.

    K = k0.k1.k2. .. = product(sqrt(1 + tan(alphai)*tan(alphai)))

The main observation is that the algorithmic kernel is extremely simple:

    x(i+1) = x(i) +/- (y(i) >> i)
    y(i+1) = +/- (x(i) >> i) + y(i)

We collect all the k'1 together and initialize the
algorithm as (k.x0, k.y0)

How to control the sign used in the iterations?

To decide on the +/- sign, we use an angle 'accumulator'
that decides, for each iteration of the algorithm, in what
direction to rotate:

    accumulator = 0

    if (target > accumulator)
        accumulator = accumulator + arctan(2^-i)
    else
        accumulator = accumulator + arctan(2^-i)

The factors arctan(2^i) are fixed and are stored in a lookup table.

(*) Computing sinus and cosine

To compute sine and cosine, we initialize the algorithm as:

    x0 = k
    y0 = 0

And then do n iterations. With each iteration, the precision
increases with a factor arctan(2^-i). Roughly speaking, you get
one bit per iteration.

Algorithm in C

fixed cordicsine(fixed inangle) {
  fixed X, Y, TargetAngle, CurrAngle;
  unsigned Step;

  X=FIXED(AG_CONST);  /* AG_CONST * cos(0) */
  Y=0;                /* AG_CONST * sin(0) */

  TargetAngle = angleadj(inangle);
  CurrAngle=0;
  for(Step=0; Step < 16; Step++) {
    fixed NewX;
    if (TargetAngle > CurrAngle) {
      NewX       =  X - (Y >> Step);
      Y          = (X >> Step) + Y;
      X          = NewX;
      CurrAngle += Angles[Step];
    } else {
      NewX       = X + (Y >> Step);
      Y          = -(X >> Step) + Y;
      X          = NewX;
      CurrAngle -= Angles[Step];
    }
  }

  if (quadrant(inangle) < 2)
    return  Y;
  else
    return -Y;
}

(*) How to deal with arbitrary angles: 0->2PI?

The CORDIC algorithm is defined in the first quadrant.

    if (0 < alpha <= PI/2)
        then sine = cordic(alpha)

The other quadrants are obtained by symmetry arguments:

    if (PI/2 < beta <= PI)
        then beta' = PI - beta is in the first quadrant
        and  sin(beta') == sin(beta)
        therefore sine = cordic(PI - beta)

    if (PI < beta <= 3PI/2)
        then beta' = beta - PI is in the first quadrant
        and  sin(beta') == -sin(beta)
        therefore sine = -cordic(beta - PI)

    if (3PI/2 < beta <= 2PI)
        then beta' = 2PI - beta is in the first quadrant
        and  sin(beta') == -sin(beta)
        therefore sine = -cordic(2PI - beta)

So we can capture this in two C functions:

fixed quadrant(fixed inangle) 
  // input: inangle in the range 0 .. 2PI
  // output: 0-3 for quadrant 0-3

and

fixed angleadj(fixed inangle) 
  // input angle is 0 .. 3pi
  // output angle in first quadrant such that
  // abs(sin(inangle)) = sin(outangle)
  // if inangle >2pi, subtract 2pi.
  // This brings inangle in the range 0 - pi and keeps the same quardrant

8. DIRECT DIGITAL SYNTHESIS
=================================================================

The idea of direct digital synthesis (DDS) is to compute waveforms at real time to generate them.

DDS contains a phase accumulator, a phase-to-output converter, and a DAC.

        phase accumulator  --> phase-to-output-converter --> DAC (analog waveform)

To generate a sine-wave, we will create a CORDIC-based DDS.

Parameters:
  - 16 bit resolution for DAC samples  FIX<16,14>
  - 16 bit resolution for phase        FIX<18,15>
  - CORDIC with 16-bit resolution      FIX<18,15>


C Reference Implementation

#include <stdio.h>
#include <math.h>

#define AG_CONST        1/1.6467602578655
#define FIXED(X)        ((long int)((X) * 32768.0))
#define FLOAT(X)        ((X) / 32768.0)
#define PI2             FIXED(atan(1.0) * 2.0)
  
typedef long int        fixed; /* <18,15> fixed-point */

static const fixed Angles[]={
                 FIXED(0.7853981633974483L),
                 FIXED(0.4636476090008061L),
                 FIXED(0.2449786631268641L),
                 FIXED(0.1243549945467614L),
                 FIXED(0.0624188099959574L),
                 FIXED(0.0312398334302683L),
                 FIXED(0.0156237286204768L),
                 FIXED(0.0078123410601011L),
                 FIXED(0.0039062301319670L),
                 FIXED(0.0019531225164788L),
                 FIXED(0.0009765621895593L),
                 FIXED(0.0004882812111949L),
                 FIXED(0.0002441406201494L),
                 FIXED(0.0001220703118937L),
                 FIXED(0.0000610351561742L),
                 FIXED(0.0000305175781155L),
                 FIXED(0.0000152587890613L),
                 FIXED(0.0000076293945311L),
                 FIXED(0.0000038146972656L)
};

//-----------------------------------------------------
void showtable() {
  unsigned Step;

  printf("Angles Table\n");
  for (Step = 0; Step < 15; Step++)
    printf("16'h%x\n", Angles[Step]);
}

//-----------------------------------------------------
fixed quadrant(fixed inangle) {
  // output: 0-3 for quadrant 0-3
  // we assume step-angle is smaller than pi
  // so that there are always at least two samples per period (alias-free)
  // hence the max input angle can be (2pi + pi)
  unsigned q = 0;

  // if inangle >2pi, subtract 2pi.
  // This brings inangle in the range 0 - pi and keeps the same quardrant
  if (inangle > 4*PI2) 
    inangle = inangle - 4*PI2;

  if (inangle > 3*PI2)
    return 3;
  else if (inangle > 2*PI2)
    return 2;
  else if (inangle > PI2)
    return 1;

  return 0;
}

//-----------------------------------------------------
fixed angleadj(fixed inangle) {

  // input angle is 0 .. 3pi
  // output angle in first quadrant such that
  // abs(sin(inangle)) = sin(outangle)

  // if inangle >2pi, subtract 2pi.
  // This brings inangle in the range 0 - pi and keeps the same quardrant
  if (inangle > 4*PI2) 
    inangle = inangle - 4*PI2;

  if (inangle > 3*PI2)
    return (4*PI2 - inangle);
  else if (inangle > 2*PI2)
    return (inangle - 2*PI2);
  else if (inangle > PI2)
    return (2*PI2 - inangle);

  return inangle;

}

//-----------------------------------------------------
fixed accumulator(fixed inangle, fixed inangleadd) {

  inangle = inangle + inangleadd;

  if (inangle > 4*PI2)
    inangle = inangle - 4*PI2;

  return inangle;

}

//-----------------------------------------------------
fixed cordicsine(fixed inangle) {

  fixed X, Y, TargetAngle, CurrAngle;
  unsigned Step;

  X=FIXED(AG_CONST);  /* AG_CONST * cos(0) */
  Y=0;                /* AG_CONST * sin(0) */
  
  TargetAngle = angleadj(inangle);
  CurrAngle=0;
  for(Step=0; Step < 16; Step++) {
    fixed NewX;
    if (TargetAngle > CurrAngle) {
      NewX       =  X - (Y >> Step);
      Y          = (X >> Step) + Y;
      X          = NewX;
      CurrAngle += Angles[Step];
    } else {
      NewX       = X + (Y >> Step);
      Y          = -(X >> Step) + Y;
      X          = NewX;
      CurrAngle -= Angles[Step];
    }
  }

  if (quadrant(inangle) < 2) 
    return  Y;
  else
    return -Y;

}

//-----------------------------------------------------
int main(void) {

  fixed angle, angleadd;
  fixed sine;
  unsigned i;

  double fsine;

  printf("2pi   %8x\n", 4*PI2);
  printf("3pi/2 %8x\n", 3*PI2);
  printf(" pi   %8x\n", 2*PI2);
  printf(" pi/2 %8x\n",   PI2);
  
  angleadd = PI2/11;

  printf(" inc  %8x\n",   angleadd);
  
  angle = angleadd;
  for (i=0; i<100; i++) {  
    sine = cordicsine(angle);

    if (0)
      printf("a %5x s %10x ( sin(%8.5f) = %8.5f ) sin %8.5f err %18.15f\n",
         angle,
         sine,
         FLOAT(angle),
         FLOAT(sine),
         sin(FLOAT(angle)),
         FLOAT(sine) - sin(FLOAT(angle)));
    else
      printf("%x\n", sine);
    
    angle = accumulator(angle, angleadd);    
  }

}

CONCLUSIONS
==========================================================

In homework 4 you have to design a CORDIC verilog implementation
starting from a reference design in C.

The assignment is simulation-only (Modelsim), but it has to
match the results from the C implementation.

In homework 5, we will implement this module on the DE1-SoC.
Hence, the module created in homework 4 needs to be synthesizable.