Lecture 08 - 02/14/19 - Verilog Expressions (with Application to CORDIC) 5:00 Outline Today we start discussing the use of Verilog operations to describe datapaths. The following notes hit on multiple topics, and we will cover them over the course of two lectures. 1. Verilog operators 2. Signed computations 3. Precision 4. Synthesis of expressions 5. Fixed point computations 6. Verilog lookup tables 7. CORDIC 8. Direct Digital Synthesis (DDS) 1. VERILOG Operators ================================================================= The definition of the operators is given in the Verilog Standard: IEEE 1364 When synthesizing datapath, we can use most Verilog operators as synthesizable targets Verilog operator Hardware Concatenation wiring Unary +, - adder/subtractor (with constant 0 input) + - adder/subtractor * multiplier / divider (special module, expensive!) ** exponentiator (only for constant base or exponent) % modulus module (special module) > < >= <= comparator ! logical negation (1-bit result) Logical operations apply to all bits e.g.!a will be 1 when a contains all 0 && || logical and or || (1-bit result) a && b == != equality (comparator) (1-bit result) bit-by-bit matching === !== case equality (comparator) (1-bit result) bit-by-bit matching ~ bitwise negation (n-bit result) & | ^ bitwise and or exor (n-bit result) ^~ or ~^ bitwise equivalence (n-bit result) & ~& | ~| ^ ~^ reduction and, nand, or, nor, xor, xnor written as, e.g. ~&a << >> logical shift right/left <<< >>> arithmetic shift right/left (provides sign extension) ? : multiplexer (*) Treatment of operators with X and Z: All of the above operations apply to 0, 1, X and Z. These operations can thus also produce X and Z. Example: bitwise and 0 1 x z ------------- 0 0 0 0 0 1 0 1 x x x 0 x x x z 0 x x x Keep in mind that in actual hardware, X does not exist. The actual implementation will be 0 or 1, but the simulation cannot tell. This creates (sometimes) difficult simulation problems, and reveals the limitation of simulation with 0, 1, X, and Z reg a; reg b; reg c; always @(*) begin a = 1'bx; b = ~a; c = a & b; end What is the value of c? In any implementation, c would be 0 because it ANDs complementary values. But the simulator will show X. 2. SIGNED computations ================================================================= In the Verilog language, reg and wire are treated as unsigned. wire [1:0] a; a value 00 0 01 1 10 2 11 3 If we want them to work as SIGNED, we need to declare them as signed: wire signed [1:0] a; a value 00 0 01 1 10 -2 11 -1 We can also write $signed(a) or $unsigned(a) to force a sign/unsign upon a. wire [1:0] a; $signed(a) value 00 0 01 1 10 -2 11 -1 Why do we care? For many operators, the gate-level realization of an unsigned and a signed version is identical Example: a full adder behaves identically to signed and unsigned values wire [1:0] a, b, c; a b a + b unsigned signed 00 01 01 0 + 1 = 1 0 + 1 = 1 10 01 11 2 + 1 = 3 -2 + 1 = -1 But for some operators, the gate-level realization of an unsigned and a signed version is NOT the same! Example: comparison wire [1:0] a, b; a b $signed(a) < $signed(b) a < b (unsigned) 00 10 false (0) true (1) 11 10 true (1) false (0) Example: arithmetic shift wire [2:0] a; a $signed(a) >> 1 a >> 1 101 110 010 3. PRECISION ================================================================= Verilog will evaluate expressions with a precision determined by the operands in the expression. See Table 5-22 in Verilog manual. Rule of thumb: Make sure that the result always fits in the source operand type. E.g. if you add a 8 bit and a 8-bit number, but you want a 9-bit result, then make one of the operands 9 bit: wire [7:0] a,b; wire [8:0] c; assign c = {0, a} + b; E.g., if you multiply a 16 bit and a 16-bit number, and you want a 32-bit result, then E.g., if a and b are 16 bits, then a + b will be computed using 16-bit precision in Verilog. If you want to capture the carry bit, you have to write: wire [15:0] a, b; wire [16:0] c; assign c = {0,a} + b; 4. SYNTHESIS OF EXPRESSIONS ================================================================= The best method to learn about what happens from an expression is to write a small Verilog example. Refer to the repository https://github.com/vt-ece4514-s19/expressions (printout handed out in class) - View RTL - View Netlist How do you find what expression is shown in the graph? - Select a multiplexer and from the menu select 'View- Properties' - Then check what module is driving input i from the multiplexer - Notice the following in the synthesis result: * special modules for operators +, -, *, /, % * multiplexer and multiplexer tree for if-then-else * for-loops are unrolled * sharing of common subexpressions, e.g. for counter // addition 4'd0: cnext = a + b; // unsigned comparison 4'd1: cnext = (a === b); // signed comparison 4'd2: cnext = ($signed(a) > $signed(b)); // integer division (not smart) 4'd3: cnext = a / b; // data multiplication 4'd4: cnext = a * b; // constant multiplication 4'd5: cnext = a * 5'd11; // constant shift 4'd6: cnext = a << 8'd2; // data shift 4'd7: cnext = a << b[2:0]; // priority mux 4'd8: if (a == 8'd0) cnext = b; else if (a == 8'd1) cnext = ~b; else if (a == 8'd2) cnext = |b; else cnext = ^b; // multiplexer 4'd9: cnext = (a < b) ? 8'd5 : 8'd23; // for loop illustrating shift register 4'd10: for (n = 7; n>0; n=n-1) cnext[n] = c[n-1]; // two parallel adders 4'd11: cnext = {a[7:4] + b[7:4], a[3:0] + b[3:0]}; // updown counter, two adders 4'd12: cnext = a[0] ? c + 8'd1 : c - 8'd1; // updown counter, one adders 4'd13: cnext = c + (a[0] ? 8'd1 : - 8'd1); // constant modulo 4'd14: cnext = a % 8'd139; // arithmetic shift w sign extension 4'd15: cnext = $signed(a) >>> 3'd5; 5. FIXED POINT computations ================================================================= A fixed-point number is a bit vector of W bits, from which L are fractional bits. Signed integers are a subset of fixed-point numbers. A signed integer can be thought of as a <32,0> number. Example: <10,4> data type W <---------------------------------------> L <--------------> +---+---+---+---+---+---+---+---+---+---+ | | | | | | | | | | | +---+---+---+---+---+---+---+---+---+---+ | 2^0 | 2^-1 2^1 | 2^-2 2^2 | 2^-3 2^3 | 2^-4 2^4 | 2^5 | binary point Fixed-point numbers of length W smaller than or equal to 32 bits can be implemented as integer data types. (a) Fractional Constant -> To map a fractional constant v into a fixed point number implemented as an integer k, we compute: k = (v << L); For example, in a <32,4> data type, the constant 0.25 would be represented as the integer k = (0.25 << 4) = 4 (b) Value of a fixed-point number Similarly, we can compute the value of a fixed point number v represented in an integer k as follows: v = (k / (1 << L)); For example, in a <32,8> data type, the value integer 15 represents the value v = 15 / 256 = 0.05859375 Operations using fixed-point numbers The advantage of fixed-point numbers is that the operations on fixed-point numbers are integer operations, and hence they are efficient to implement. Fixed-point numbers do not use normalization logic or exponent-computation logic such as is found in floating-point hardware. However, fixed-point operations have to handle the binary point at position L, mainly if an operation would affect it. The rules are as follows. 1. The addition of two fixed-point numbers yields a similar fixed point number = + 2. The multiplication of two fixed-point numbers creates a fixed-point number of twice the size <2W, 2L> = * 3. The convert a fixed-point number into , where L1 > L2, use a shift operation: = () >> (L1 – L2) 4. To apply rule 3 for multiplication, we can produce a result from operands as follows: = ( * ) >> L Note that this formula assumes that no overflow will occur, or in other words, that there are no significant bits in the result between the position W+1 and 2W. This behavior corresponds to what we are used to for integer arithmetic. Example Let’s compute 0.25 * 0.35 + 1.15 using <32,8> arithmetic. #define FRACBITS 8 int a = (int) (0.25 * (1 << FRACBITS)); int b = (int) (0.35 * (1 << FRACBITS)); int c = (int) (1.15 * (1 << FRACBITS)); int tmp1, tmp2; tmp1 = (a * b) >> FRACBITS; tmp2 = tmp1 + c; This code produces the following integer values. a 64 b 89 c 294 tmp1 22 tmp2 316 The resulting value, tmp2, corresponds to the value 316/(1 << FRACBITS) = 1.234375. Note that the floating-point result of the expression (0.25 * 0.35 + 1.15) is 1.2375. There is some precision loss using fixed-point computation because we do not keep all fractional bits required to store the result of 0.25 * 0.35, This precision loss is called the quantization error. In fixed-point computations, we try to minimize the quantization error. At the same time, we try to avoid overflow. For example, the fixed-point data type cannot represent values bigger than 0.99... or smaller than -1. 6. VERILOG lookup table ================================================================= Example of a combinational lookup table: module cordicconst(input wire clk, input wire reset, input wire [3:0] address, output wire [17:0] data); reg [17:0] q; always @(*) begin q = 18'h0; case (address) 4'h0: q = 18'h6487; 4'h1: q = 18'h3b58; 4'h2: q = 18'h1f5b; 4'h3: q = 18'hfea; 4'h4: q = 18'h7fd; 4'h5: q = 18'h3ff; 4'h6: q = 18'h1ff; 4'h7: q = 18'hff; 4'h8: q = 18'h7f; 4'h9: q = 18'h3f; 4'ha: q = 18'h1f; 4'hb: q = 18'hf; 4'hc: q = 18'h7; 4'hd: q = 18'h3; 4'he: q = 18'h1; 4'hf: q = 18'h0; default: q = 18'h0; endcase end // always @ (*) assign data = q; endmodule 7. CORDIC = Coordinate Rotation Digital Computer ================================================================= CORDIC is an algorithm to compute trigonometric functions using low-complexity hardware. Similar to a bit-serial addition, it computes one bit of the result at a time. In a nutshell, given an angle theta, the CORDIC algorithm evaluates sin(theta) and cosine(theta) using successive approximations. The algorithm was first used in airborne digital computation in the '50s, but it has since found widespread use in robotics and communications. Assume an angle alpha, 0 < alpha < pi/2 Then the CORDIC algorithm will rotate the vector (1, 0) to (x1, y1) such as x1 = cos(alpha) and y1 = sin(alpha). More general, the CORDIC algorithm will rotate (x0,y0) to (x1, y1) over an angle alpha: x1 = x0.cos(alpha) - y0.sin(alpha) y1 = x0.sin(alpha) + y0.cos(alpha) (*) The CORDIC idea The idea of CORDIC is to decompose the rotation by alpha in a series of rotations by alpha0, alpha1, alpha2, ... such that the sum of all these alpha's is close to the target alpha. To get the iterative formulas, divide them by cos(alpha) to obtain: x1 = (x0 - y0.tan(alpha)) . (1/cos(alpha)) y1 = (x0.tan(alpha) + y0) . (1/cos(alpha)) The term 1/cos(alpha) can further be written as 1/cos(alpha) = sqrt(1 + tan(alpha)*tan(alpha)) = k Now, imagine that we choose alpha0, alpha1, .. such that alpha0 = arctan(1) alpha1 = arctan(1/2) alpha2 = arctan(1/4) .. And assume that we can write alpha as: alpha = alpha0 +/- alpha1 +/- alpha2 +/- alpha3 +/- ... where the signs +/- are chosen such that we approximate, step by step, the resulting value alpha (x1, y1) = rotate(x0, y0, alpha0) (x2, y2) = rotate(x1, y1, alpha1) ... because of our choice of alphai, this becomes: x1 = (x0 - y0).k0 y1 = (x0 + y0).k0 x2 = (x1 - (+/- 1/2) y1).k1 y2 = ((+/- 1/2) x1 + y1).k1 x3 = (x2 - (+/- 1/4) y2).k2 y3 = ((+/- 1/4) x2 + y2).k2 ... Each time we choose the sign to get closer to the target alpha. Also, the k0, k1, k2 can be precomputed as a single factor. K = k0.k1.k2. .. = product(sqrt(1 + tan(alphai)*tan(alphai))) The main observation is that the algorithmic kernel is extremely simple: x(i+1) = x(i) +/- (y(i) >> i) y(i+1) = +/- (x(i) >> i) + y(i) We collect all the k'1 together and initialize the algorithm as (k.x0, k.y0) How to control the sign used in the iterations? To decide on the +/- sign, we use an angle 'accumulator' that decides, for each iteration of the algorithm, in what direction to rotate: accumulator = 0 if (target > accumulator) accumulator = accumulator + arctan(2^-i) else accumulator = accumulator + arctan(2^-i) The factors arctan(2^i) are fixed and are stored in a lookup table. (*) Computing sinus and cosine To compute sine and cosine, we initialize the algorithm as: x0 = k y0 = 0 And then do n iterations. With each iteration, the precision increases with a factor arctan(2^-i). Roughly speaking, you get one bit per iteration. Algorithm in C fixed cordicsine(fixed inangle) { fixed X, Y, TargetAngle, CurrAngle; unsigned Step; X=FIXED(AG_CONST); /* AG_CONST * cos(0) */ Y=0; /* AG_CONST * sin(0) */ TargetAngle = angleadj(inangle); CurrAngle=0; for(Step=0; Step < 16; Step++) { fixed NewX; if (TargetAngle > CurrAngle) { NewX = X - (Y >> Step); Y = (X >> Step) + Y; X = NewX; CurrAngle += Angles[Step]; } else { NewX = X + (Y >> Step); Y = -(X >> Step) + Y; X = NewX; CurrAngle -= Angles[Step]; } } if (quadrant(inangle) < 2) return Y; else return -Y; } (*) How to deal with arbitrary angles: 0->2PI? The CORDIC algorithm is defined in the first quadrant. if (0 < alpha <= PI/2) then sine = cordic(alpha) The other quadrants are obtained by symmetry arguments: if (PI/2 < beta <= PI) then beta' = PI - beta is in the first quadrant and sin(beta') == sin(beta) therefore sine = cordic(PI - beta) if (PI < beta <= 3PI/2) then beta' = beta - PI is in the first quadrant and sin(beta') == -sin(beta) therefore sine = -cordic(beta - PI) if (3PI/2 < beta <= 2PI) then beta' = 2PI - beta is in the first quadrant and sin(beta') == -sin(beta) therefore sine = -cordic(2PI - beta) So we can capture this in two C functions: fixed quadrant(fixed inangle) // input: inangle in the range 0 .. 2PI // output: 0-3 for quadrant 0-3 and fixed angleadj(fixed inangle) // input angle is 0 .. 3pi // output angle in first quadrant such that // abs(sin(inangle)) = sin(outangle) // if inangle >2pi, subtract 2pi. // This brings inangle in the range 0 - pi and keeps the same quardrant 8. DIRECT DIGITAL SYNTHESIS ================================================================= The idea of direct digital synthesis (DDS) is to compute waveforms at real time to generate them. DDS contains a phase accumulator, a phase-to-output converter, and a DAC. phase accumulator --> phase-to-output-converter --> DAC (analog waveform) To generate a sine-wave, we will create a CORDIC-based DDS. Parameters: - 16 bit resolution for DAC samples FIX<16,14> - 16 bit resolution for phase FIX<18,15> - CORDIC with 16-bit resolution FIX<18,15> C Reference Implementation #include #include #define AG_CONST 1/1.6467602578655 #define FIXED(X) ((long int)((X) * 32768.0)) #define FLOAT(X) ((X) / 32768.0) #define PI2 FIXED(atan(1.0) * 2.0) typedef long int fixed; /* <18,15> fixed-point */ static const fixed Angles[]={ FIXED(0.7853981633974483L), FIXED(0.4636476090008061L), FIXED(0.2449786631268641L), FIXED(0.1243549945467614L), FIXED(0.0624188099959574L), FIXED(0.0312398334302683L), FIXED(0.0156237286204768L), FIXED(0.0078123410601011L), FIXED(0.0039062301319670L), FIXED(0.0019531225164788L), FIXED(0.0009765621895593L), FIXED(0.0004882812111949L), FIXED(0.0002441406201494L), FIXED(0.0001220703118937L), FIXED(0.0000610351561742L), FIXED(0.0000305175781155L), FIXED(0.0000152587890613L), FIXED(0.0000076293945311L), FIXED(0.0000038146972656L) }; //----------------------------------------------------- void showtable() { unsigned Step; printf("Angles Table\n"); for (Step = 0; Step < 15; Step++) printf("16'h%x\n", Angles[Step]); } //----------------------------------------------------- fixed quadrant(fixed inangle) { // output: 0-3 for quadrant 0-3 // we assume step-angle is smaller than pi // so that there are always at least two samples per period (alias-free) // hence the max input angle can be (2pi + pi) unsigned q = 0; // if inangle >2pi, subtract 2pi. // This brings inangle in the range 0 - pi and keeps the same quardrant if (inangle > 4*PI2) inangle = inangle - 4*PI2; if (inangle > 3*PI2) return 3; else if (inangle > 2*PI2) return 2; else if (inangle > PI2) return 1; return 0; } //----------------------------------------------------- fixed angleadj(fixed inangle) { // input angle is 0 .. 3pi // output angle in first quadrant such that // abs(sin(inangle)) = sin(outangle) // if inangle >2pi, subtract 2pi. // This brings inangle in the range 0 - pi and keeps the same quardrant if (inangle > 4*PI2) inangle = inangle - 4*PI2; if (inangle > 3*PI2) return (4*PI2 - inangle); else if (inangle > 2*PI2) return (inangle - 2*PI2); else if (inangle > PI2) return (2*PI2 - inangle); return inangle; } //----------------------------------------------------- fixed accumulator(fixed inangle, fixed inangleadd) { inangle = inangle + inangleadd; if (inangle > 4*PI2) inangle = inangle - 4*PI2; return inangle; } //----------------------------------------------------- fixed cordicsine(fixed inangle) { fixed X, Y, TargetAngle, CurrAngle; unsigned Step; X=FIXED(AG_CONST); /* AG_CONST * cos(0) */ Y=0; /* AG_CONST * sin(0) */ TargetAngle = angleadj(inangle); CurrAngle=0; for(Step=0; Step < 16; Step++) { fixed NewX; if (TargetAngle > CurrAngle) { NewX = X - (Y >> Step); Y = (X >> Step) + Y; X = NewX; CurrAngle += Angles[Step]; } else { NewX = X + (Y >> Step); Y = -(X >> Step) + Y; X = NewX; CurrAngle -= Angles[Step]; } } if (quadrant(inangle) < 2) return Y; else return -Y; } //----------------------------------------------------- int main(void) { fixed angle, angleadd; fixed sine; unsigned i; double fsine; printf("2pi %8x\n", 4*PI2); printf("3pi/2 %8x\n", 3*PI2); printf(" pi %8x\n", 2*PI2); printf(" pi/2 %8x\n", PI2); angleadd = PI2/11; printf(" inc %8x\n", angleadd); angle = angleadd; for (i=0; i<100; i++) { sine = cordicsine(angle); if (0) printf("a %5x s %10x ( sin(%8.5f) = %8.5f ) sin %8.5f err %18.15f\n", angle, sine, FLOAT(angle), FLOAT(sine), sin(FLOAT(angle)), FLOAT(sine) - sin(FLOAT(angle))); else printf("%x\n", sine); angle = accumulator(angle, angleadd); } }