Attention

This document was last updated Nov 25 24 at 21:59

A Coprocessor for the Bresenham Line-Drawing Algorithm

Important

The purpose of this lab is as follows.

To develop a hardware implementation of an algorithm specified as a C function.
The lab assignment covers the following tasks:
- Writing RTL code for the C function with simular functionality
- Design a hardware interface according to the specification provided
- Performing RTL simulation to verify the correctness of the design
- Synthesizing the RTL to 45nm standard cells
- Making sure that the design meets the clock period constraint
- Build a layout for the hardware implementation with pin allocation as specified
- Perform post-layout gate-level simulation to demonstrate that the design is still correct
- Summarizing the results in a written report

Attention

The due date of this lab is 11 November

The Bresenham Line Drawing Algorithm

In 1962, Jack Elton Bresenham developed a line drawing algorithm to compute which pixels of a raster scan display should be turned on to draw a straight line. The algorithm stands out in that it achieves this task using only integer arithmetic, and addition and substraction. The problem is illustrated in the following figure. Suppose we are drawing a line on a discrete grid from pixel (0,0) to pixel (tx, ty).

The line drawing algorithm needs to decide which pixels should be turned on (grey in the figure). The true mathematical line cuts across fractional pixels, making it challenging to decide when to take, in this example, a vertical step. The solution of Bresenham is to keep an error accumulator that computes how far the center of each pixel remains to the true mathematical line, and to make sure that each pixel that is turned on is never more than half a pixel away from the line.

The Bresenham algorithm is still very popular today in vector-generation algorithms, where you have to create a scalable figure based on a set of vertices. There is a family of ‘Bresenham’ algorithms that are used to draw complex curves and shapes, all of them using simple integer arithmetic.

Because the integer arithmetic used by the Bresenham algorithm, a very efficient hardware implementation is possible. The topic of this lab is to study how a C function can be transformed into a hardware module. In the final project, where you will design a coprocessor for an SoC, this design skill will come in quite handy.

The Bresenham Reference Implementation

The following is a C function that implements the line drawing algorithm. If you are interested in the logic behind the line drawing, refer for example to the Wikipedia article on Bresenham’s Algorithm

 void plotLine(int x1, int y1, int x2, int y2) {
  int dx = abs(x2 - x1);
  int dy = abs(y2 - y1);
  int sx = x1 < x2 ? 1 : -1;
  int sy = y1 < y2 ? 1 : -1;
  int err = dx - dy;

  while (1) {

    printf("%d, %d\n", x1, y1);

    if (x1 == x2 && y1 == y2) break;

    int e2 = 2 * err;

    if (e2 > -dy) {
      err -= dy;
      x1 += sx;
    }

    if (e2 < dx) {
      err += dx;
      y1 += sy;
    }
  }
}

Note the following features of the algorithm.

The algorithm decides on a direction of the line according to each quadrant. This results in the value of dx and dy being +1 or -1 to encode the general step direction of a line. For example, in the second quadrant, dx will be negative and dy will be positive.
The algorithm uses an error accumulator err that keeps track of how far the mathematical line is from the center of a pixel. The error is encoded as an integer value, since the decision on which pixel to turn on next is decided solely by the sign of err. This is the true brilliance of Bresenham’s Algorithm: by relying only on the sign of err, the algorithm can scale up err to a value proportional to the error, rather than the error itself.
The algorithm decides between taking a step in only the x-direction, only the y-direction, or in both directions. However, by running small examples, you will see that any set of pixel coordinates will always take a step in either x or else in y, depending on the octant of the line. Only a perfect 45 degree line will cause continuous stepping in both x and y directions, simultaneously.
The algorithm requires signed arithmetic for dx, dy, sx, sy, and err. The pixel coordinates x1, y1, x2, y2 can be unsigned.

For example, the following testbench generates a (slightly rotated) hexagon. We will use this testbench as the reference.

 int main() {
  plotLine(787, 590, 647, 812);
  plotLine(647, 812, 335, 748);
  plotLine(335, 748, 237, 434);
  plotLine(237, 434, 377, 212);
  plotLine(377, 212, 689, 276);
  plotLine(689, 276, 787, 590);
  return 0;
}

The Bresenham Hardware Interface

Your task is to develop a hardware implementation of the reference implementation of the line drawing algorithm in C. The hardware implementation must be programmable, and must use the following interface.

Name	Direction	Function
clk	Input	Clock
resetn	Input	Reset, negative-asserted
D	Input 10 bit	Data input to load coordinate values
ldx1	Input	When asserted, D contains the x1 coordinate
ldy1	Input	When asserted, D contains the y1 coordinate
ldx2	Input	When asserted, D contains the x2 coordinate
ldy2	Input	When asserted, D contains the y2 coordinate
start	Input	When asserted, line drawing starts
busy	Output	When asserted, line drawing in progress
Xout	Output 10 bit	Data output containing the output pixel’s X
Yout	Output 10 bit	Data output containing the output pixel’s Y
validxy	Output	When asserted, (Xout, Yout) is valid

In addition, the implementation must meet the following performance requirements.

Generate one pixel per clock cycle
Use 45nm standard cell technology
Run at a 3ns clock period (that’s 333MHz!)

Getting Started

Accept the lab2 assignment to obtain the starter files at Github Classroom. This results in a repository with the following subdirectories:

C contains a reference implementation in C
rtl contains the template outline of the module you have to design
sim contains a testbench to simulate and verify the RTL
syn contains a synthesis script
constraints containts the I/O timing constraints
layout contains a layout script
glsim containts the post-layout simulation testbench
chip contains the chip I/O assignment

You will mainly develop code in rtl and in chip.

Step 1: Run the C program

A good first step is to run the C program and play around with the algorithm. Go into the C directory and compile the program. Then, run it. Also, study the algorithm and make sure you understand how the pixel computation loop works.

cd C
gcc vplot.c
./a.out

787, 590
786, 591
786, 592
...

Step 2: Develop the RTL design

You have to design RTL code for the C reference implementation following the interface provided. Give this some careful thought.

module vplot (
   input  logic        clk,
   input  logic        resetn,
   input  logic [9:0]  D,
   input  logic        ldx1,
   input  logic        ldx2,
   input  logic        ldy1,
   input  logic        ldy2,
   input  logic        start,
   output logic [9:0]  Xout,
   output logic [9:0]  Yout,
   output logic        validxy,
   output logic        busy
);

Think of mapping the C function to hardware as two concurrent activities: implementing the control flow of the C program, and implementing the computations done by the C program. The control flow would be implemented using a finite state machine, while the computations would be implemented by a datapath.
Think of mapping the state of the C function to registers. Important state variables are the starting and ending coordinates, the coordinates of the current location, the error, the direction of the line (sx and sy). By creating registers for these variables and thinking about their update in terms of the C’s program execution, you will systematically develop the hardware design.

Step 3: Perform RTL simulation

You will verify your RTL design using a provided testbench. The following shows the testbench output when there are errors in the RTL. Obviously, your RTL design is only complete when you have no more errors. Make use of the graphical debugger to debug your HDL code. You can start the XCelium GUI by using make simg instead of make sim.

cd sim
make

xcelium> source /opt/cadence/XCELIUM2209/tools/xcelium/files/xmsimrc
xcelium> run
ERROR Expected ( 786,  592) Received ( 785,  591)
ERROR Expected ( 785,  593) Received ( 784,  591)
ERROR Expected ( 784,  594) Received ( 784,  592)
ERROR Expected ( 784,  595) Received ( 784,  593)
ERROR Expected ( 783,  596) Received ( 783,  594)
ERROR Expected ( 783,  597) Received ( 782,  594)
ERROR Expected ( 782,  598) Received ( 782,  595)
ERROR Expected ( 781,  599) Received ( 782,  596)
ERROR Expected ( 781,  600) Received ( 782,  597)
ERROR Expected ( 780,  601) Received ( 781,  598)
ERROR Expected ( 779,  602) Received ( 780,  598)
ERROR Expected ( 779,  603) Received ( 780,  599)
ERROR Expected ( 778,  604) Received ( 780,  600)
ERROR Expected ( 778,  605) Received ( 780,  601)
ERROR Expected ( 777,  606) Received ( 779,  602)
ERROR Expected ( 776,  607) Received ( 778,  602)
ERROR Expected ( 776,  608) Received ( 778,  603)
ERROR Expected ( 775,  609) Received ( 778,  604)
ERROR Expected ( 774,  610) Received ( 778,  605)
ERROR Expected ( 774,  611) Received ( 777,  606)
ERROR Expected ( 773,  612) Received ( 776,  606)
ERROR Expected ( 772,  613) Received ( 775,  606)
ERROR Expected ( 772,  614) Received ( 775,  607)
ERROR Expected ( 771,  615) Received ( 775,  608)
ERROR Expected ( 771,  616) Received ( 774,  609)
ERROR Expected ( 770,  617) Received ( 773,  609)
ERROR Expected ( 769,  618) Received ( 773,  610)
ERROR Expected ( 769,  619) Received ( 773,  611)
ERROR Expected ( 768,  620) Received ( 773,  612)
ERROR Expected ( 767,  621) Received ( 772,  613)
ERROR Expected ( 767,  622) Received ( 771,  613)
ERROR Expected ( 766,  623) Received ( 771,  614)
ERROR Expected ( 766,  624) Received ( 771,  615)
Simulation stopped via $stop(1) at time 425 NS + 0
./vplot_tb.sv:129       $stop;

Step 3: Perform RTL synthesis

Once you have a working RTL simulation, you can complete RTL synthesis.

cd syn
make

BASENAME=vplot \
CLOCKPERIOD=3 \
TIMINGPATH=/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/timing/ \
TIMINGLIB=slow_vdd1v0_basicCells.lib \
genus -f genus_script.tcl
TMPDIR is being set to /tmp/genus_temp_31271_arc-schaumont-class-vm_pschaumont_2HG8MG
Cadence Genus(TM) Synthesis Solution.
...

When synthesis completes, make sure to check the reports for timing and area. You should have no negative slack (for setup timing), and you should have no latches in the netlist. It’s a little more tricky to check your design for latches. The quickest method may be to grep the gate level netlist and look for cells of the ‘TLAT’ type (which are latches).

cd reports
cat vplot_report_timing.rpt
cd ../outputs
grep TLAT vplot_netlist.v

Step 4: Build the chip outline

Before you can implement the layout, you have to build the chip input/output outline. Create a chip/chip.io file that assigns pins as follows along the chip perimeter.

Along the top, arrange all control signals as well as clk and resetn.
Along the left side, arrange the pins of the D input
Along the right side, arrange the pins of the X and Y output

After you have created the chip I/O, create the layout.

cd layout
make syn
make layout

Inside of the innovus tool, you can open up the layout to make a screen shot.

gui_show

If you have a complete design, you should be able to produce something as follows:

Step 5: Perform post-layout gate-level simulation

Your final step is to perform a gate-level timing-accurate simulation of the design post-layout. The design must still be functionall correct.

cd glsim
make

Important

In this lab, steps 3-5 are a minor effort compared to step 1-2. Make sure that you think carefully through your hardware design. Apply the rules of purely synchronous single-clock design, and clearly partition datapath from control (FSM). Also, clearly partition combinational logic (always @* or always_comb) from sequential logic (always @(posedge clk) or always_ff). Your attention to detail in step 2 will pay off in later steps.

Writing your report

Your report should contain the following

A System Verilog Listing of your design
A statement that your design is functionally correct post-synthesis as well as post-layout
A comprehensive performance evaluation, including area, cell counts, post-synthesis timing analysis, post-layout chip area, post-layout timing analysis
A chip layout

Please format your output results and do not cut and paste from the tool output.

Attention

The following are important report requirements.

Use a typesetting tool such as latex or word. Do not submit handwritten scanned reports.
Use a screen capture tool to collect graphics such as layout information. Do not use your smartphone to take a picture from the screen.
Be clear and complete in your report. I am not looking for the correct solution; I am looking to understand your solution. Explain the steps that lead up to a result. The report does not have a minimum length nor a maximum length, as long as it is clear what you did and your answers to questions are complete.
Make sure to update, commit and push the results of your lab to your repository. You do not have to turn in anything on Canvas. All results will be communicated through your github repository.