Homework 6 - Hardware I/O and Performance Evaluation using Cyclone V HPS
- Introduction
- Downloading and Compiling the Design
- Question: Determine the overhead of FPGA fabric access
- Report Contents
- What to turn in
- Design Rubric
Introduction
The purpose of this homework is to introduce the Hard Processor System on your DE1-SoC board. Before you can start with this assignment, you have to complete the following preparatory tasks.
- Download the Linux image for your DE1-SoC board
- Copy the image onto an SD card, and boot Linux on your board
- Configure the network on your DE1-SoC, so that you can access the system over the network
The steps to achieve this are explained in Lecture 16 and Lecture 17. If you have not yet set up your board to run Linux, do that first before continuing with this homework.
In this homework, you will receive a complete hardware implementation of an HPS-based system that drives three PIO ports on the FPGA board. The PIO ports are connected to the HEX display, the red LEDs and the push buttons (KEY buttons). Your task will be to (a) synthesize the hardware, (b) download the resulting bitstream to the HPS and configure the FPGA, (c) write a small performance evaluation program and (d) analyze the output of this program.
You have to write a report as part of the homework. The assignment writeup uses the label REPORT to indicate questions that have to be answered in the report. The questions are also summarized at the end of the assignment writeup.
The command line in this assignment uses an environment identical to SoC Command Shell.
Downloading and Compiling the Design
- Accept the homework assignment from GitHub Classroom at https://classroom.github.com/a/RqQB4KAT
- This gives you a repository that you can clone to your laptop
git clone https://github.com/vt-ece4530-f19/homework-6-USERID
- Inspect the repository and identify the following directories
Directory | Purpose |
---|---|
example-hps-pio | Hardware reference implementation |
software-test | Sample software program |
software-q1 | Software application for Question 1 |
software_rbfloader | Utility program to download bitstreams on FPGA-SoC |
conversion | Script to convert bitstreams into Raw Bitstream Format (rbf) |
- Go to the `example-hps-pio` directory. Use Quartus and Platform Designer to open the `examplehpspio` design and the `platformhps` platform.
- REPORT: Determine the topology of the design and make a diagram of its main components. The diagram does not need to be overly detailed; buses can be drawn as wires, and processors, memories, peripherals and so on can be drawn as blocks. However, make sure that the overall system architecture is apparent from the drawing.
- The platform contains three PIO ports. For each PIO port, find out its width and which element of the DE1-SoC board it connects to.
- REPORT: Add a table in your report similar to the following.
PIO port | Width | Input or Output | Purpose |
---|---|---|---|
PIO_0 | number of bits | input or output | what board peripheral is driven by it |
PIO_1 | | | |
PIO_2 | | | |
- You do not have to compile the hardware into a bitstream. You can use the following prebuilt bitstream: examplehpspio.sof.
- Next, convert the bitstream from sof format to raw bitstream format (rbf). Go to the conversion subdirectory and type `make`. When the script completes, you will find two files: `examplehpspio.rbf` and `hps_0.h`. The former is a version of the bitstream suitable for download from within the ARM processor on the FPGA. The latter is an include file used by software applications. It is similar in purpose to the `system.h` file of Nios systems; it describes the main hardware components as seen from the software.
cd conversion
make
- Next, compile the software that is used to download the bitstream to the board.
cd software_rbfloader
make
- Next, compile the software application that will be used to test the hardware.
cd software_test
make
- You are now ready to run the sample application. Copy all of the following files to the board.
  - The hardware bitstream `examplehpspio.rbf`
  - The bitstream loader `hps_config_fpga`
  - The software test application `piotest`
Here is the command I used to do this in one go. In your case, you may have to use something slightly different, compatible with the network connection available on your DE1-SoC board.
$ scp -P50444 software_test/piotest \
conversion/examplehpspio.rbf \
software_rbfloader/hps_config_fpga \
root@172.29.39.72:/home/root
- Now, log on to the DE1-SoC board.
- On the DE1-SoC, download the bitstream as follows.
root@socfpga:~# ./hps_config_fpga examplehpspio.rbf
- On the DE1-SoC, run the program as follows.
root@socfpga:~# ./piotest
You should see the HEX display counting and a counter on the red LEDs. By pressing the leftmost key (KEY3), you can reverse the direction of the counter. By pressing the rightmost key (KEY0), you can make it increment again. If all that is working fine, you are now ready for the main question.
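User-space programs on the HPS typically reach PIO ports like these through a memory-mapped window, combining a bridge offset with a base address taken from `hps_0.h`. The following is a minimal sketch of that address arithmetic, not the code of `piotest` itself. The helper names `pio_window_offset` and `pio_pointer` are hypothetical, and the constant values are assumptions that mirror the usual Cyclone V definitions; check them against your own `hps.h` and generated `hps_0.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, mirroring the usual Cyclone V definitions. Verify them
 * against hps.h and the generated hps_0.h before relying on them. */
#define HW_REGS_BASE        0xFC000000u  /* assumed start of the mapped window */
#define HW_REGS_SPAN        0x04000000u  /* assumed window size (64 MiB) */
#define HW_REGS_MASK        (HW_REGS_SPAN - 1u)
#define ALT_LWFPGASLVS_OFST 0xFF200000u  /* assumed lightweight bridge base */

/* Byte offset of a peripheral inside the mapped window.
 * pio_base is the peripheral's base address as listed in hps_0.h. */
static uint32_t pio_window_offset(uint32_t pio_base) {
    return (ALT_LWFPGASLVS_OFST + pio_base) & HW_REGS_MASK;
}

/* Pointer to a 32-bit PIO register, given the address returned by mmap().
 * The cast to uint8_t* makes the addition count bytes rather than words. */
static volatile uint32_t *pio_pointer(void *virtual_base, uint32_t pio_base) {
    assert(virtual_base != NULL);
    return (volatile uint32_t *)((uint8_t *)virtual_base
                                 + pio_window_offset(pio_base));
}
```

In a real program, `virtual_base` would typically come from an `mmap()` of `/dev/mem` at offset `HW_REGS_BASE` with length `HW_REGS_SPAN`. A write such as `*pio_pointer(virtual_base, base) = value;` then travels over the lightweight HPS-to-FPGA bridge to the PIO port.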
Question: Determine the overhead of FPGA fabric access
- Investigate the software application in `software_q1`. It contains a loop that writes 1000 values to the HEX display, as well as 1000 values to the main memory. The key section of that program is the following:
volatile unsigned long *h2p_lw_hex_addr=NULL;
// This function writes to the HEX displays using a PIO configured in the FPGA
void printhex(unsigned j) {
*h2p_lw_hex_addr = j;
}
// This function is identical to the previous one, but writes to a global variable in memory
volatile unsigned memhex;
void printhexmem(unsigned j) {
memhex = j;
}
#define MEASUREMENTS 1000
int main(int argc, char **argv) {
...
// initialize pointer to PIO port
h2p_lw_hex_addr = virtual_base +
( ( unsigned long )( ALT_LWFPGASLVS_OFST + PIO_1_BASE ) & ( unsigned long)( HW_REGS_MASK ) );
// main measurement loop
for (i = 0; i<MEASUREMENTS; i++) {
printhex(i);
printhexmem(i);
}
...
}
- The question is the following: determine, as accurately as possible, the performance difference between a call to `printhex` and a call to `printhexmem`.
- You need to refer to Lecture 17 and the example https://github.com/vt-ece4530-f19/example-hps-hello to see how to measure time.
- Keep in mind the key points of Lecture 5: you have to handle the accuracy errors (overhead of timekeeping) as well as the precision errors (variations during measurement).
- You have to make the comparison using two types of measurement: first as a CPU cycle count (`PERF_COUNT_HW_CPU_CYCLES`), then as a CPU instruction count (`PERF_COUNT_HW_INSTRUCTIONS`). Again, refer to Lecture 17 for a discussion on performance measurement using the 64-bit timer on the ARM.
- Refer to the assembly listing `hw6q1.lst`, which will be generated after you build the software in `software_q1`. The assembly listing will show you exactly what instructions go into `printhex` and `printhexmem`.
- REPORT: In your report, present the analysis of your performance comparison. Note that I am looking for a narrative, not a short list of numbers. Your analysis should contain at least the following elements.
  - Cycle count for `printhex` and cycle count for `printhexmem`
  - Causes of the possible cycle count difference
  - Instruction count for `printhex` and `printhexmem`
  - Causes of the possible instruction count difference
  - Impact of compiler optimization (`-O3` flag added to the `CFLAGS` macro in the Makefile)
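The accuracy and precision concerns above can be handled by measuring an empty region to estimate the timekeeping overhead, and by repeating each measurement to see its spread. The following is a minimal sketch of one way to do that with the Linux `perf_event_open` system call. It is not the method of the course example, and all helper names (`open_hw_counter`, `count_once`, `sample_median`, `sample_min`, `net_cost`) are hypothetical. No `exclude_*` flags are set, on the assumption that some PMUs reject mode filtering.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open one hardware counter (for example PERF_COUNT_HW_CPU_CYCLES) for the
 * calling thread on any CPU. Returns a file descriptor, or -1 on failure. */
static int open_hw_counter(uint64_t config) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;  /* start stopped; enabled explicitly per measurement */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Count events across one call of fn(arg). Passing fn == NULL measures the
 * empty region, which estimates the timekeeping overhead (accuracy error). */
static uint64_t count_once(int fd, void (*fn)(unsigned), unsigned arg) {
    uint64_t value = 0;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    if (fn) fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) return 0;
    return value;
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts the samples in place and returns the median (the lower middle
 * element when n is even). The median is robust against outliers caused
 * by interrupts or cache misses (precision error). */
static uint64_t sample_median(uint64_t *samples, size_t n) {
    assert(n > 0);
    qsort(samples, n, sizeof(samples[0]), cmp_u64);
    return samples[(n - 1) / 2];
}

/* Smallest sample: the best case, least disturbed by the system. */
static uint64_t sample_min(const uint64_t *samples, size_t n) {
    assert(n > 0);
    uint64_t m = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < m) m = samples[i];
    return m;
}

/* Raw count minus measurement overhead, clamped at zero. */
static uint64_t net_cost(uint64_t raw, uint64_t overhead) {
    return raw > overhead ? raw - overhead : 0;
}
```

A possible use: open a counter with `PERF_COUNT_HW_CPU_CYCLES`, collect many samples of `count_once(fd, NULL, 0)` to estimate the overhead, then collect samples of `count_once(fd, printhex, i)` and `count_once(fd, printhexmem, i)`, and compare `net_cost` of the medians. Because the empty region contains no call, the call and return cost stays attributed to the function being measured. The same code works for `PERF_COUNT_HW_INSTRUCTIONS`.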
Report Contents
- REPORT: Determine the topology of the design and make a diagram of its main components. The diagram does not need to be overly detailed; buses can be drawn as wires, and processors, memories, peripherals and so on can be drawn as blocks. However, make sure that the overall system architecture is apparent from the drawing.
- REPORT: Add a table in your report similar to the following.
PIO port | Width | Input or Output | Purpose |
---|---|---|---|
PIO_0 | number of bits | input or output | what board peripheral is driven by it |
PIO_1 | | | |
PIO_2 | | | |
- REPORT: In your report, present the analysis of your performance comparison. Note that I am looking for a narrative, not a short list of numbers. Your analysis should contain at least the following elements.
  - Cycle count for `printhex` and cycle count for `printhexmem`
  - Causes of the possible cycle count difference
  - Instruction count for `printhex` and `printhexmem`
  - Causes of the possible instruction count difference
  - Impact of compiler optimization (`-O3` flag added to the `CFLAGS` macro in the Makefile)
What to turn in
Write your report as a PDF file, and add it to the root of your repository.
In addition, push your modifications to `software_q1` to the repository.
# add your report
git add myreport.pdf
# clean your implementation
cd example-hps-pio
quartus_sh --clean examplehpspio
cd ..
# push the result on github
git add *
git commit -m 'my homework 6 solution'
git push
Design Rubric
- Topology Question: 4 points
  - Drawing contains all modules (2 points)
  - Modules are properly labeled (1 point)
  - System architecture shows off-chip elements (memories, switches, etc.) (1 point)
- PIO Table Question: 3 points
  - Width, I/O designation and Purpose are correct (1 point per row)
- Performance Analysis Question: 13 points
  - Accuracy of cycle measurement is correctly estimated (2 points)
  - Precision of cycle measurement is correctly estimated (2 points)
  - Cycle count difference between `printhex` and `printhexmem` is correct (2 points)
  - Cycle count difference is correctly analyzed by reference to the assembly listing and the architecture of the design (2 points)
  - Instruction count difference between `printhex` and `printhexmem` is correct (2 points)
  - Impact of compiler optimization on cycle count is analyzed (2 points)
  - Impact of compiler optimization on instruction count is analyzed (1 point)