Attention

This document was last updated Nov 25 24 at 21:59

A Coprocessor for the Bresenham Line-Drawing Algorithm

Important

The purpose of this lab is as follows.

  • To develop a hardware implementation of an algorithm specified as a C function.

  • The lab assignment covers the following tasks:

    • Writing RTL code for the C function with simular functionality

    • Design a hardware interface according to the specification provided

    • Performing RTL simulation to verify the correctness of the design

    • Synthesizing the RTL to 45nm standard cells

    • Making sure that the design meets the clock period constraint

    • Build a layout for the hardware implementation with pin allocation as specified

    • Perform post-layout gate-level simulation to demonstrate that the design is still correct

    • Summarizing the results in a written report

Attention

The due date of this lab is 11 November

The Bresenham Line Drawing Algorithm

In 1962, Jack Elton Bresenham developed a line drawing algorithm to compute which pixels of a raster scan display should be turned on to draw a straight line. The algorithm stands out in that it achieves this task using only integer arithmetic, and addition and substraction. The problem is illustrated in the following figure. Suppose we are drawing a line on a discrete grid from pixel (0,0) to pixel (tx, ty).

The line drawing algorithm needs to decide which pixels should be turned on (grey in the figure). The true mathematical line cuts across fractional pixels, making it challenging to decide when to take, in this example, a vertical step. The solution of Bresenham is to keep an error accumulator that computes how far the center of each pixel remains to the true mathematical line, and to make sure that each pixel that is turned on is never more than half a pixel away from the line.

_images/bresenpixels.png

The Bresenham algorithm is still very popular today in vector-generation algorithms, where you have to create a scalable figure based on a set of vertices. There is a family of ‘Bresenham’ algorithms that are used to draw complex curves and shapes, all of them using simple integer arithmetic.

Because the integer arithmetic used by the Bresenham algorithm, a very efficient hardware implementation is possible. The topic of this lab is to study how a C function can be transformed into a hardware module. In the final project, where you will design a coprocessor for an SoC, this design skill will come in quite handy.

The Bresenham Reference Implementation

The following is a C function that implements the line drawing algorithm. If you are interested in the logic behind the line drawing, refer for example to the Wikipedia article on Bresenham’s Algorithm

 void plotLine(int x1, int y1, int x2, int y2) {
  int dx = abs(x2 - x1);
  int dy = abs(y2 - y1);
  int sx = x1 < x2 ? 1 : -1;
  int sy = y1 < y2 ? 1 : -1;
  int err = dx - dy;

  while (1) {

    printf("%d, %d\n", x1, y1);

    if (x1 == x2 && y1 == y2) break;

    int e2 = 2 * err;

    if (e2 > -dy) {
      err -= dy;
      x1 += sx;
    }

    if (e2 < dx) {
      err += dx;
      y1 += sy;
    }
  }
}

Note the following features of the algorithm.

  • The algorithm decides on a direction of the line according to each quadrant. This results in the value of dx and dy being +1 or -1 to encode the general step direction of a line. For example, in the second quadrant, dx will be negative and dy will be positive.

  • The algorithm uses an error accumulator err that keeps track of how far the mathematical line is from the center of a pixel. The error is encoded as an integer value, since the decision on which pixel to turn on next is decided solely by the sign of err. This is the true brilliance of Bresenham’s Algorithm: by relying only on the sign of err, the algorithm can scale up err to a value proportional to the error, rather than the error itself.

  • The algorithm decides between taking a step in only the x-direction, only the y-direction, or in both directions. However, by running small examples, you will see that any set of pixel coordinates will always take a step in either x or else in y, depending on the octant of the line. Only a perfect 45 degree line will cause continuous stepping in both x and y directions, simultaneously.

  • The algorithm requires signed arithmetic for dx, dy, sx, sy, and err. The pixel coordinates x1, y1, x2, y2 can be unsigned.

For example, the following testbench generates a (slightly rotated) hexagon. We will use this testbench as the reference.

 int main() {
  plotLine(787, 590, 647, 812);
  plotLine(647, 812, 335, 748);
  plotLine(335, 748, 237, 434);
  plotLine(237, 434, 377, 212);
  plotLine(377, 212, 689, 276);
  plotLine(689, 276, 787, 590);
  return 0;
}
_images/bresengon.png

The Bresenham Hardware Interface

Your task is to develop a hardware implementation of the reference implementation of the line drawing algorithm in C. The hardware implementation must be programmable, and must use the following interface.

Name

Direction

Function

clk

Input

Clock

resetn

Input

Reset, negative-asserted

D

Input 10 bit

Data input to load coordinate values

ldx1

Input

When asserted, D contains the x1 coordinate

ldy1

Input

When asserted, D contains the y1 coordinate

ldx2

Input

When asserted, D contains the x2 coordinate

ldy2

Input

When asserted, D contains the y2 coordinate

start

Input

When asserted, line drawing starts

busy

Output

When asserted, line drawing in progress

Xout

Output 10 bit

Data output containing the output pixel’s X

Yout

Output 10 bit

Data output containing the output pixel’s Y

validxy

Output

When asserted, (Xout, Yout) is valid

In addition, the implementation must meet the following performance requirements.

  • Generate one pixel per clock cycle

  • Use 45nm standard cell technology

  • Run at a 3ns clock period (that’s 333MHz!)

Getting Started

Accept the lab2 assignment to obtain the starter files at Github Classroom. This results in a repository with the following subdirectories:

  • C contains a reference implementation in C

  • rtl contains the template outline of the module you have to design

  • sim contains a testbench to simulate and verify the RTL

  • syn contains a synthesis script

  • constraints containts the I/O timing constraints

  • layout contains a layout script

  • glsim containts the post-layout simulation testbench

  • chip contains the chip I/O assignment

You will mainly develop code in rtl and in chip.

Step 1: Run the C program

A good first step is to run the C program and play around with the algorithm. Go into the C directory and compile the program. Then, run it. Also, study the algorithm and make sure you understand how the pixel computation loop works.

cd C
gcc vplot.c
./a.out

787, 590
786, 591
786, 592
...

Step 2: Develop the RTL design

You have to design RTL code for the C reference implementation following the interface provided. Give this some careful thought.

module vplot (
   input  logic        clk,
   input  logic        resetn,
   input  logic [9:0]  D,
   input  logic        ldx1,
   input  logic        ldx2,
   input  logic        ldy1,
   input  logic        ldy2,
   input  logic        start,
   output logic [9:0]  Xout,
   output logic [9:0]  Yout,
   output logic        validxy,
   output logic        busy
);
  • Think of mapping the C function to hardware as two concurrent activities: implementing the control flow of the C program, and implementing the computations done by the C program. The control flow would be implemented using a finite state machine, while the computations would be implemented by a datapath.

  • Think of mapping the state of the C function to registers. Important state variables are the starting and ending coordinates, the coordinates of the current location, the error, the direction of the line (sx and sy). By creating registers for these variables and thinking about their update in terms of the C’s program execution, you will systematically develop the hardware design.

Step 3: Perform RTL simulation

You will verify your RTL design using a provided testbench. The following shows the testbench output when there are errors in the RTL. Obviously, your RTL design is only complete when you have no more errors. Make use of the graphical debugger to debug your HDL code. You can start the XCelium GUI by using make simg instead of make sim.

cd sim
make

xcelium> source /opt/cadence/XCELIUM2209/tools/xcelium/files/xmsimrc
xcelium> run
ERROR Expected ( 786,  592) Received ( 785,  591)
ERROR Expected ( 785,  593) Received ( 784,  591)
ERROR Expected ( 784,  594) Received ( 784,  592)
ERROR Expected ( 784,  595) Received ( 784,  593)
ERROR Expected ( 783,  596) Received ( 783,  594)
ERROR Expected ( 783,  597) Received ( 782,  594)
ERROR Expected ( 782,  598) Received ( 782,  595)
ERROR Expected ( 781,  599) Received ( 782,  596)
ERROR Expected ( 781,  600) Received ( 782,  597)
ERROR Expected ( 780,  601) Received ( 781,  598)
ERROR Expected ( 779,  602) Received ( 780,  598)
ERROR Expected ( 779,  603) Received ( 780,  599)
ERROR Expected ( 778,  604) Received ( 780,  600)
ERROR Expected ( 778,  605) Received ( 780,  601)
ERROR Expected ( 777,  606) Received ( 779,  602)
ERROR Expected ( 776,  607) Received ( 778,  602)
ERROR Expected ( 776,  608) Received ( 778,  603)
ERROR Expected ( 775,  609) Received ( 778,  604)
ERROR Expected ( 774,  610) Received ( 778,  605)
ERROR Expected ( 774,  611) Received ( 777,  606)
ERROR Expected ( 773,  612) Received ( 776,  606)
ERROR Expected ( 772,  613) Received ( 775,  606)
ERROR Expected ( 772,  614) Received ( 775,  607)
ERROR Expected ( 771,  615) Received ( 775,  608)
ERROR Expected ( 771,  616) Received ( 774,  609)
ERROR Expected ( 770,  617) Received ( 773,  609)
ERROR Expected ( 769,  618) Received ( 773,  610)
ERROR Expected ( 769,  619) Received ( 773,  611)
ERROR Expected ( 768,  620) Received ( 773,  612)
ERROR Expected ( 767,  621) Received ( 772,  613)
ERROR Expected ( 767,  622) Received ( 771,  613)
ERROR Expected ( 766,  623) Received ( 771,  614)
ERROR Expected ( 766,  624) Received ( 771,  615)
Simulation stopped via $stop(1) at time 425 NS + 0
./vplot_tb.sv:129       $stop;

Step 3: Perform RTL synthesis

Once you have a working RTL simulation, you can complete RTL synthesis.

cd syn
make

BASENAME=vplot \
CLOCKPERIOD=3 \
TIMINGPATH=/opt/cadence/libraries/gsclib045_all_v4.7/gsclib045/timing/ \
TIMINGLIB=slow_vdd1v0_basicCells.lib \
genus -f genus_script.tcl
TMPDIR is being set to /tmp/genus_temp_31271_arc-schaumont-class-vm_pschaumont_2HG8MG
Cadence Genus(TM) Synthesis Solution.
...

When synthesis completes, make sure to check the reports for timing and area. You should have no negative slack (for setup timing), and you should have no latches in the netlist. It’s a little more tricky to check your design for latches. The quickest method may be to grep the gate level netlist and look for cells of the ‘TLAT’ type (which are latches).

cd reports
cat vplot_report_timing.rpt
cd ../outputs
grep TLAT vplot_netlist.v

Step 4: Build the chip outline

Before you can implement the layout, you have to build the chip input/output outline. Create a chip/chip.io file that assigns pins as follows along the chip perimeter.

  • Along the top, arrange all control signals as well as clk and resetn.

  • Along the left side, arrange the pins of the D input

  • Along the right side, arrange the pins of the X and Y output

After you have created the chip I/O, create the layout.

cd layout
make syn
make layout

Inside of the innovus tool, you can open up the layout to make a screen shot.

gui_show

If you have a complete design, you should be able to produce something as follows:

_images/bresenlayout.png

Step 5: Perform post-layout gate-level simulation

Your final step is to perform a gate-level timing-accurate simulation of the design post-layout. The design must still be functionall correct.

cd glsim
make

Important

In this lab, steps 3-5 are a minor effort compared to step 1-2. Make sure that you think carefully through your hardware design. Apply the rules of purely synchronous single-clock design, and clearly partition datapath from control (FSM). Also, clearly partition combinational logic (always @* or always_comb) from sequential logic (always @(posedge clk) or always_ff). Your attention to detail in step 2 will pay off in later steps.

Writing your report

Your report should contain the following

  • A System Verilog Listing of your design

  • A statement that your design is functionally correct post-synthesis as well as post-layout

  • A comprehensive performance evaluation, including area, cell counts, post-synthesis timing analysis, post-layout chip area, post-layout timing analysis

  • A chip layout

Please format your output results and do not cut and paste from the tool output.

Attention

The following are important report requirements.

  • Use a typesetting tool such as latex or word. Do not submit handwritten scanned reports.

  • Use a screen capture tool to collect graphics such as layout information. Do not use your smartphone to take a picture from the screen.

  • Be clear and complete in your report. I am not looking for the correct solution; I am looking to understand your solution. Explain the steps that lead up to a result. The report does not have a minimum length nor a maximum length, as long as it is clear what you did and your answers to questions are complete.

  • Make sure to update, commit and push the results of your lab to your repository. You do not have to turn in anything on Canvas. All results will be communicated through your github repository.