Introduction

The purpose of this homework is to design and implement an interpolation filter. The following paragraph will make much more sense after Lecture 14 on 10/17. For now, just look at it as a C function that you have to implement in hardware.

The interpolation filter reads in 32-bit samples and has 4 taps. The sinc(x) interpolation coefficients are quantized as fix<16,14> numbers. In other words, the coefficients are scaled up by a factor of (1 << 14) = 16384 so that the multiplications can be performed using integer arithmetic. The interpolation filter increases the sample rate four-fold: for every input sample, four output samples are produced. The output is produced as 32-bit samples which, due to the scaling of the coefficients, are (1 << 14) times larger than the actual value.
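The scaling described above can be sketched in a few lines of C. This is only an illustration, assuming that fix<16,14> means 14 fractional bits (as the (1 << 14) factor suggests); the helper names to_fix14 and scaled_product are hypothetical and do not appear in the starter files.

```c
#include <stdint.h>

/* Number of fractional bits assumed for the fix<16,14> coefficient format. */
#define FRAC_BITS 14

/* Hypothetical helper: quantize a real-valued coefficient by scaling it with
 * (1 << 14) and rounding to the nearest integer. */
static int32_t to_fix14(double x) {
    double scaled = x * (double)(1 << FRAC_BITS);
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Hypothetical helper: multiply an integer sample by a quantized coefficient.
 * The product is (1 << 14) times larger than the real-valued product. */
static int32_t scaled_product(int32_t sample, int32_t coef) {
    return sample * coef;
}
```

For instance, sinc(0.25) is roughly 0.900316, and to_fix14(0.900316) gives 14751, which agrees with one of the coefficients in the reference table. Dividing a scaled product by (1 << FRAC_BITS) recovers the unscaled magnitude.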

The reference implementation of the interpolation filter is as follows.

void interpolate_sw(int sample, int result[4]) {
  static int tap[4] = {0, 0, 0, 0};
  int coef[16] = {    0,  -2107,  -3477, -2950, 
                      0,   4917,  10430, 14751, 
                  16383,  14751,  10430,  4917, 
                      0,  -2950,  -3477, -2107 };

  result[0] = tap[0] * coef[0] + tap[1] * coef[4] + tap[2] * coef[8] + tap[3] * coef[12];
  result[1] = tap[0] * coef[1] + tap[1] * coef[5] + tap[2] * coef[9] + tap[3] * coef[13];
  result[2] = tap[0] * coef[2] + tap[1] * coef[6] + tap[2] * coef[10] + tap[3] * coef[14];
  result[3] = tap[0] * coef[3] + tap[1] * coef[7] + tap[2] * coef[11] + tap[3] * coef[15];

  tap[3] = tap[2];
  tap[2] = tap[1];
  tap[1] = tap[0];
  tap[0] = sample;
}

Our objective is to produce a bit-accurate version of this filter in hardware. ‘Bit-accurate’ means that the hardware computes exactly the same results as the reference software. You will integrate the hardware as a custom-instruction design in the Nios-II, test it on the board, and compare its performance with software by a timing measurement.

The homework has three questions. You will need to turn in a PDF writeup (containing mostly screenshots) as well as a set of design files. The submission requirements are stated with each question, and they are summarized at the end.

This homework will be graded at 30 points, as it is slightly more complex than previous homework.

Getting Started

Start by accepting the homework 5 assignment from GitHub Classroom. Then, clone the resulting repository to your computer.

git clone https://github.com/vt-ece4530-f19/homework-5-USERNAME

This provides the following set of starter files.

.
├── reference
│   └── main.c                # Reference implementation in C
├── verilog
│   ├── interpolatecitb.v     # Verilog testbench
│   └── interpolateci.v       # Custom-instruction module (to be completed)
├── stimuli
│   ├── stimgen.c             # Stimuli generator
│   ├── testvectors.bin       # Stimuli for Verilog testbench
│   └── testvectors.txt       # Stimuli (readable version)
├── benchmark                 # Nios C program to benchmark your custom instruction
└── example-nios-ci           # Nios based SoC that will hold the custom instruction

A good starting point is the ‘reference’ directory. This directory contains one file. You can compile it in gcc or MSVC++, and you don’t need an FPGA to run the program.

gcc main.c -o main

The program has two running modes. If you run the program standalone, it will produce a text-mode graph, similar to the following. The graph stops after 256 random samples have been interpolated. The plot bars marked with x are true samples which have been provided at the input. The output points marked with . are interpolated values, and you can see there are three interpolated samples for every true sample. Furthermore, the interpolated samples form a ‘smooth’ curve, thanks to the sinc(x) interpolation function.

Besides random samples, the test program injects zero values for sample #27 to #32 of every group of 32 samples.

# ./main

                                        ---------------------------------x
                                        |                           .
                                        |                  .
                                        |         .
                                        -----x
                                        |  .
                                        |      .
                                        |                .
                                        ----------------------------x
                                        |                                    .
                                        |                                       .
                                        |                                     .
                                        ----------------------------x
                                        |         .
                                .       |
                .                       |
          x------------------------------
        .                               |
                .                       |
                               .        |
                                        -------x
                                        |             .
                                        |                .
                                        |               .
                                        ---------------x
                                        |               .
                                        |               .
                                        |                .
                                        ------------------x
                                        |             .
                                        |          .
                                        |          .

When you run main with an argument (any argument), it will produce a table of values instead. You can use this program to generate additional stimuli besides the ones you received in the stimuli/ directory. Each line shows (a) the output phase 0 to 3, with phase 0 a true output sample, (b) the input value, which changes only once every four phases, (c) the output value, scaled up by a factor of (1 << 14), and (d) the downscaled, truncated output value.

Given the input in the ‘in’ column, the custom instruction that you will design must produce exactly the same 32-bit output as shown in the ‘out’ column.

0 in      103 out        0 scaled        0
1 in      103 out        0 scaled        0
2 in      103 out        0 scaled        0
3 in      103 out        0 scaled        0
0 in      198 out        0 scaled        0
1 in      198 out  -217021 scaled      -13
2 in      198 out  -358131 scaled      -13
3 in      198 out  -303850 scaled      -13
0 in     -151 out        0 scaled        0
1 in     -151 out    89265 scaled        5
2 in     -151 out   385844 scaled        5
3 in     -151 out   935253 scaled        5
0 in     -141 out  1687449 scaled      103
1 in     -141 out  2811076 scaled      171
2 in     -141 out  3664457 scaled      171
3 in     -141 out  3872599 scaled      171
0 in     -175 out  3243834 scaled      198
1 in     -175 out  2171468 scaled      132
2 in     -175 out   622336 scaled      132
3 in     -175 out -1054906 scaled      132
0 in       -1 out -2473833 scaled     -151
1 in       -1 out -3136073 scaled     -191
2 in       -1 out -3125531 scaled     -191
3 in       -1 out -2723294 scaled     -191
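The ‘in’ and ‘out’ columns of the table can be cross-checked with a small driver around the reference function. The sketch below copies interpolate_sw from the Introduction; the helper print_phase_table is hypothetical, its print format only approximates the table, and it omits the ‘scaled’ column.

```c
#include <stdio.h>

/* Reference filter, copied from the listing in the Introduction. */
void interpolate_sw(int sample, int result[4]) {
  static int tap[4] = {0, 0, 0, 0};
  static const int coef[16] = {    0,  -2107,  -3477, -2950,
                                   0,   4917,  10430, 14751,
                               16383,  14751,  10430,  4917,
                                   0,  -2950,  -3477, -2107 };

  for (int n = 0; n < 4; n++)
    result[n] = tap[0] * coef[n]     + tap[1] * coef[4 + n] +
                tap[2] * coef[8 + n] + tap[3] * coef[12 + n];

  tap[3] = tap[2];
  tap[2] = tap[1];
  tap[1] = tap[0];
  tap[0] = sample;
}

/* Hypothetical helper: feed `count` samples through the reference filter and
 * print one row per output phase (phase, input, scaled-up output). */
void print_phase_table(const int *samples, int count) {
  int result[4];
  for (int i = 0; i < count; i++) {
    interpolate_sw(samples[i], result);
    for (int phase = 0; phase < 4; phase++)
      printf("%d in %8d out %8d\n", phase, samples[i], result[phase]);
  }
}
```

Calling print_phase_table with the inputs 103, 198, -151, -141 should reproduce the ‘out’ values of the first sixteen rows above. Note that the first sample only reaches the phase 0 output three calls after it was shifted in, because the center coefficient 16383 sits on tap[2].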

Question 1 (15 points): Verilog Design of the Interpolation Filter

Your first step is to design the interpolation filter, based on the reference C implementation. For this purpose, you have to fill out the file verilog/interpolateci.v.

You are free to design your own custom instruction-set, but I recommend the following overall approach. The following set of four custom-instruction calls will execute the four phases of the interpolation filter.

n   dataa        datab        result   Purpose
0   Don’t care   Don’t care   Phase0   Computes the phase 0 output
1   Don’t care   Don’t care   Phase1   Computes the phase 1 output
2   Don’t care   Don’t care   Phase2   Computes the phase 2 output
3   Sample       Don’t care   Phase3   Computes the phase 3 output and shifts a new sample into the taps


Referring back to the original C reference, you can see that each of these four instructions covers part of the original reference function.

  // When the n input is 0, compute result[0]
  result[0] = tap[0] * coef[0] + tap[1] * coef[4] + tap[2] * coef[8] + tap[3] * coef[12];

  // When the n input is 1, compute result[1]
  result[1] = tap[0] * coef[1] + tap[1] * coef[5] + tap[2] * coef[9] + tap[3] * coef[13];

  // When the n input is 2, compute result[2]
  result[2] = tap[0] * coef[2] + tap[1] * coef[6] + tap[2] * coef[10] + tap[3] * coef[14];

  // When the n input is 3, compute result[3] and shift a new sample into the taps
  result[3] = tap[0] * coef[3] + tap[1] * coef[7] + tap[2] * coef[11] + tap[3] * coef[15];
  tap[3] = tap[2];
  tap[2] = tap[1];
  tap[1] = tap[0];
  tap[0] = sample;
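Before writing Verilog, it can help to model the recommended instruction set in C and check it against the reference. The following is a minimal sketch under that assumption; interpolate_ci_model and interpolate_hw_model are hypothetical names, and the model ignores datab, as the recommended instruction set does.

```c
#include <assert.h>

/* State and coefficients of the modeled custom instruction. */
static int tap[4] = {0, 0, 0, 0};
static const int coef[16] = {    0,  -2107,  -3477, -2950,
                                 0,   4917,  10430, 14751,
                             16383,  14751,  10430,  4917,
                                 0,  -2950,  -3477, -2107 };

/* Hypothetical software model of one custom-instruction call.
 * n selects the phase; dataa is only used when n == 3. */
int interpolate_ci_model(int n, int dataa) {
  assert(n >= 0 && n < 4);
  int result = tap[0] * coef[n]     + tap[1] * coef[4 + n] +
               tap[2] * coef[8 + n] + tap[3] * coef[12 + n];
  if (n == 3) {            /* phase 3 also shifts the delay line */
    tap[3] = tap[2];
    tap[2] = tap[1];
    tap[1] = tap[0];
    tap[0] = dataa;
  }
  return result;
}

/* Four calls in phase order replace one call to interpolate_sw. */
void interpolate_hw_model(int sample, int result[4]) {
  result[0] = interpolate_ci_model(0, 0);
  result[1] = interpolate_ci_model(1, 0);
  result[2] = interpolate_ci_model(2, 0);
  result[3] = interpolate_ci_model(3, sample);
}
```

The ordering matters in this model: the phase 3 call must come last, because it is the only one that changes the taps.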

There are several implementation strategies that you can use. You can use any solution that produces a working design. Be creative!

  • One strategy is to implement a custom instruction with four data multipliers, one for each tap of the delay line.

  • Another strategy is to implement a custom instruction with a single data multiplier, and multiplex the multiplier among the taps.
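The difference between the two strategies can be illustrated with a C model. This is a sketch, not a hardware description: mac_four_multipliers stands for four products formed side by side, while mac_single_multiplier stands for one multiplier reused over four steps with an accumulator. Both function names are hypothetical.

```c
/* Coefficient table from the reference implementation. */
static const int COEF[16] = {    0,  -2107,  -3477, -2950,
                                 0,   4917,  10430, 14751,
                             16383,  14751,  10430,  4917,
                                 0,  -2950,  -3477, -2107 };

/* Model of the four-multiplier strategy: one product per tap, then a sum. */
int mac_four_multipliers(const int tap[4], int n) {
  int p0 = tap[0] * COEF[n];
  int p1 = tap[1] * COEF[4 + n];
  int p2 = tap[2] * COEF[8 + n];
  int p3 = tap[3] * COEF[12 + n];
  return p0 + p1 + p2 + p3;
}

/* Model of the single-multiplier strategy: each loop iteration stands for
 * one step in which the shared multiplier serves a different tap. */
int mac_single_multiplier(const int tap[4], int n) {
  int acc = 0;
  for (int step = 0; step < 4; step++) {
    int a = tap[step];             /* operand mux: select the tap         */
    int b = COEF[4 * step + n];    /* coefficient index for this tap/phase */
    acc += a * b;                  /* the one shared multiplier            */
  }
  return acc;
}
```

Both functions return the same value for every phase, so the choice affects area and latency rather than the result. In hardware, the single-multiplier version would presumably need a small controller to sequence the four steps.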

HINT 1

Keep in mind that the multiplication is a signed multiplication. The coefficients as well as the sample inputs are 32-bit two’s complement values. To obtain the same result as the C reference, you will need a multiplier of 32-bit precision.
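To see what 32-bit precision means here, consider the sketch below. It assumes the reference is compiled with a 32-bit int, and it models a signed multiplier that forms the full 64-bit product and keeps only the low 32 bits. The name mul32_low is hypothetical, and the final conversion back to a signed type relies on the usual two's complement wrap-around (implementation-defined in C, but what gcc does).

```c
#include <stdint.h>

/* Hypothetical model of a signed 32x32 multiplier that keeps the low
 * 32 bits of the 64-bit product. */
int32_t mul32_low(int32_t a, int32_t b) {
  int64_t full = (int64_t)a * (int64_t)b;   /* sign-extended operands     */
  return (int32_t)(uint32_t)full;           /* truncate to the low 32 bits */
}
```

For products that fit in 32 bits, this equals the int multiplication in the C reference, for example 103 * 16383 = 1687449. Treating an operand as unsigned (zero-extending instead of sign-extending) would change the result for negative samples or coefficients once the products are widened or accumulated.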

HINT 2

You need a lookup table to store the coefficients. The easiest way to do this is to use a Verilog case statement that uses the coefficient index as the selector and the coefficient value as the output.

To test whether your Verilog works, you can use the testbench interpolatecitb.v. If you have not changed the custom-instruction design, you can run interpolatecitb directly. It runs a set of 16 input samples through the interpolation filter hardware and checks the 64 output values that are produced. Every output value has to be correct for the testbench to pass.
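As a rough C analogue of such a case-based lookup table, the coefficient table can be written as a switch on the index. The function name coef_lookup and the default value of 0 for out-of-range indices are assumptions made for this sketch; a Verilog case statement would follow the same index-to-value mapping.

```c
/* Hypothetical C analogue of a case-statement lookup table:
 * the index selects one of the 16 reference coefficients. */
int coef_lookup(int index) {
  switch (index) {
    case  0: return      0;
    case  1: return  -2107;
    case  2: return  -3477;
    case  3: return  -2950;
    case  4: return      0;
    case  5: return   4917;
    case  6: return  10430;
    case  7: return  14751;
    case  8: return  16383;
    case  9: return  14751;
    case 10: return  10430;
    case 11: return   4917;
    case 12: return      0;
    case 13: return  -2950;
    case 14: return  -3477;
    case 15: return  -2107;
    default: return      0;   /* assumed value for unused indices */
  }
}
```

With the indexing of the reference function, tap i in phase n reads coef_lookup(4 * i + n).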

Here is the sample output for a correct implementation of interpolateci.v.

$ vlib work

$ vlog *v
Start time: 18:59:23 on Oct 16,2019
vlog interpolateci.v interpolatecitb.v interpolatecoef.v
Model Technology ModelSim - Intel FPGA Edition vlog 10.5b Compiler 2016.10 Oct  5 2016
-- Compiling module interpolateci
-- Compiling module interpolatecitb
-- Compiling module interpolatecoef

Top level modules:
        interpolatecitb
End time: 18:59:23 on Oct 16,2019, Elapsed time: 0:00:00
Errors: 0, Warnings: 0

$ vsim -c -do "run -all" work.interpolatecitb
Reading C:/intelFPGA_lite/18.1/modelsim_ase/tcl/vsim/pref.tcl

# 10.5b

# vsim -c -do "run -all" work.interpolatecitb
# Start time: 18:59:46 on Oct 16,2019
# Loading work.interpolatecitb
# Loading work.interpolateci
# Loading work.interpolatecoef
# run -all
# OK 00000000
# OK 00000000
# OK 00000000
# OK 00000000
# OK 00000000
... (some lines skipped)
# OK ffe37f6e
# OK ffd380b2
# OK ffdc2f44
# OK fff41dac
# Complete with           0 errors
#
# ** Note: $finish    : interpolatecitb.v(87)
#    Time: 10430 ps  Iteration: 1  Instance: /interpolatecitb
# End time: 18:59:47 on Oct 16,2019, Elapsed time: 0:00:01
# Errors: 0, Warnings: 0

What to turn in for Question 1

Commit all Verilog that you wrote for interpolateci to your repository. We will test your Verilog by running it on the testbench. If you modify the custom-instruction design (e.g. new instructions), make sure to provide a modified testbench.

Question 2 (10 points): Integrate your custom instruction in Nios II

As the second step, you will integrate the verified custom instruction into a Nios II. Go to the example-nios-ci subdirectory, open exampleniosci.qpf in Quartus and open platformniosci.qsys in the Platform Designer.

You have to replace the existing bitmerge instruction with your newly designed interpolateci. Then, you have to generate Verilog code for this platform, synthesize the design, and download it to the board.

First, start by adding your custom instruction. In the IP catalog, click ‘New Component’, and follow the steps to wire up the interpolate custom instruction.

  1. Under ‘Component Type’, give the new custom instruction the name interpolateci

  2. Under ‘Files’, add all Verilog that you wrote (except for the testbench!). If you have more than one file, verify that the correct top-level file is marked. Click on ‘Analyze Synthesis Files’ to verify that your custom instruction design is synthesizable.

  3. Under ‘Signals and Interfaces’, make sure that the design is wired up correctly. Select ‘Signals’ from the menu and make sure that the custom instruction is connected to a custom instruction slave. Finally, select the ‘Interfaces’ tab and click ‘Remove Interfaces with no signals’. If you look back at the ‘Signals’ tab and the ‘Signals and Interfaces’ tab, you will see the Figures below. If that all checks out, click Finish to save the new instruction.

  4. Finally, instantiate the custom instruction in the SoC and connect it to the Nios-II Core. In addition, add an interval timer module, controlled by the Nios-II. Refer to the earlier example example-nios-timer if you don’t know how to connect it. Go through all steps for hardware synthesis and bitstream download.

[Figure: nios-cislave]

[Figure: nios-cislave2]

What to turn in for Question 2

Provide a PDF that lists the following information, extracted from your synthesis reports. In addition, try to determine, as accurately as possible, how many of these resources are used by your custom instruction. A quick (but rough) way to do this is to remove the custom instruction and resynthesize the design.

                                       Overall Design   My Custom Instruction
Number of block memory modules (RAM)
Number of ALMs
Number of registers
Number of DSP modules

Question 3 (5 points): Performance evaluation of your custom instruction in Nios II

The final step is to measure the performance of your custom instruction and compare it with the performance of a software implementation. This follows the steps that you have done before with Platform Designer. You can refer to Homework 4, the quick reference guide, or Lecture 11.

  1. Make sure you download the bitstream with nios2-configure-sof.
  2. Create a BSP with nios2-bsp-editor. Connect the timer for timestamping, and select a small size for libraries.
  3. Generate the BSP files with nios2-bsp-generate-files.
  4. Use nios2-app-generate-makefile and make to compile the benchmark program available in the benchmark directory at the root of the repository. You may need to modify the program if you modified the custom-instruction design.

What to turn in for Question 3

Provide a PDF that shows a screenshot of the benchmark program output. Personalize your screenshot: your GitHub user ID or account ID must be visible. The output will contain text similar to the following (with ??? replaced by your actual hardware timing).

$ nios2-terminal.exe
nios2-terminal: connected to hardware target using JTAG UART on cable
nios2-terminal: "DE-SoC [USB-1]", device 2, instance 0
nios2-terminal: (Use the IDE stop button or Ctrl-C to terminate)

Timer Frequency 50000000
Comparison Errors (1K samples) 0
Software Timing (1K samples)  11344102
Hardware Timing (1K samples)  ????????
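To interpret these numbers, the timing values can be converted to time and compared. The sketch below assumes that the reported timings are timer ticks and that the Timer Frequency is in Hz; the helper names ticks_to_microseconds and speedup are hypothetical and are not part of the benchmark program.

```c
#include <stdint.h>

/* Hypothetical helper: convert a tick count to microseconds, given the
 * timer frequency in Hz (integer division truncates). */
uint64_t ticks_to_microseconds(uint64_t ticks, uint64_t timer_hz) {
  return ticks * 1000000ULL / timer_hz;
}

/* Hypothetical helper: ratio of software time to hardware time. */
double speedup(uint64_t sw_ticks, uint64_t hw_ticks) {
  return (double)sw_ticks / (double)hw_ticks;
}
```

Under these assumptions, the software figure in the sample output (11344102 ticks at 50000000 Hz) corresponds to about 0.227 s for 1K samples. The speedup follows once you have measured your own hardware timing.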

What to turn in

What to turn in for Question 1

Commit all Verilog that you wrote for interpolateci to your repository. We will test your Verilog by running it on the testbench. If you modify the custom-instruction design (e.g. new instructions), make sure to provide a modified testbench.

What to turn in for Question 2

Provide a PDF that lists the following information, extracted from your synthesis reports. In addition, try to determine, as accurately as possible, how many of these resources are used by your custom instruction. A quick (but rough) way to do this is to remove the custom instruction and resynthesize the design.

                                       Overall Design   My Custom Instruction
Number of block memory modules (RAM)
Number of ALMs
Number of registers
Number of DSP modules


What to turn in for Question 3

Provide a PDF that shows a screenshot of the benchmark program output. Personalize your screenshot: your GitHub user ID or account ID must be visible. The output will contain text similar to the following (with ??? replaced by your actual hardware timing).

$ nios2-terminal.exe
nios2-terminal: connected to hardware target using JTAG UART on cable
nios2-terminal: "DE-SoC [USB-1]", device 2, instance 0
nios2-terminal: (Use the IDE stop button or Ctrl-C to terminate)

Timer Frequency 50000000
Comparison Errors (1K samples) 0
Software Timing (1K samples)  11344102
Hardware Timing (1K samples)  ????????

Finally, make sure you push your repository back to GitHub. Add all files that you have created, commit them, and push them to your repository on GitHub.

# clean your implementation

cd example-nios-ci
quartus_sh --clean exampleniosci
cd ..

# push the result on github

git add *
git commit -m 'my homework 5 solution'
git push

Design Rubric

  • Question 1: 15 Points Total
    • Design interpolateci simulates correctly with interpolatecitb: 10 points
    • Design source for interpolateci is hierarchical (contains submodules), uses proper comments, uses proper FSM design style: 3 points
    • Compact Design credit: designs using 0 or 1 data multiplier = 2 points, designs using 4 data multipliers or fewer = 1 point, designs using 16 data multipliers = 0 points.
  • Question 2: 10 Points Total
    • Design synthesizes into an sof that is functionally correct: 6 points.
    • PDF has a table with correct number of block memory modules: 1 point. (‘correct’ means: consistent with your synthesis reports)
    • PDF has a table with correct number of ALMs: 1 point. (‘correct’ means: consistent with your synthesis reports)
    • PDF has a table with correct number of registers: 1 point. (‘correct’ means: consistent with your synthesis reports)
    • PDF has a table with correct number of DSP modules: 1 point. (‘correct’ means: consistent with your synthesis reports)
  • Question 3: 5 Points Total
    • PDF has screen shot of benchmark program output: 4 points
    • PDF screen shot is identifiable to submitter: 1 point

Good Luck!