Lecture 17 - FPGA SoC 2
Introduction
Today we continue the study of the FPGA-SoC platform. The key points of the previous lecture were the following.
- There are two 800 MHz Cortex-A9 (ARMv7) cores integrated into the Cyclone V FPGA. These cores, together with a large collection of peripherals, are referred to as the Hard Processor System (HPS). The HPS is fully self-contained and can boot its own operating system.
- The FPGA fabric is attached to the HPS through a set of bus bridges plus an FPGA controller module. The bus bridges enable custom hardware modules to use the same memory-mapped model that we have discussed with MSP-430 and Nios-II, with one important difference.
- The difference is that the operating system changes how memory-mapped hardware components are accessed. The OS isolates the hardware from the (application) software in two respects. First, the applications execute in their own virtual memory space. The memory addresses referenced by the application do not correspond to the physical memory addresses applied to the hardware. Second, the applications do not have full privilege over the hardware. The applications have to access the hardware through the Linux kernel.
- There are two solutions to access hardware from within the application. The first is to go through the intermediary of the kernel by means of a device driver. The device driver provides a generic access API (open, close, read, write) that is mapped to the specific hardware module at hand. The second solution is to map a section of physical memory directly into the virtual memory space of the application.
Today we will discuss an example of the device driver mechanism. We will use a device driver to access an accelerometer chip attached to the HPS.
I assume that you are able to boot the Linux kernel on your DE1-SoC, and that you have installed the ARM cross-compiler tools. Refer to Lecture 16 for information on installing Linux, and refer to the Software Installation Guidelines.
Looking around in Linux on DE1-SoC
After logging into the board, it is helpful to look around to assess the capabilities you have available. First, let’s check the memory available.
Memory Resources
The free command lists the memory usage in kilobytes. We see that, after booting, about 18 MB of memory is in use, while almost a full gigabyte is available.
root@socfpga:~# free
total used free shared buffers
Mem: 1031824 18860 1012964 0 1180
-/+ buffers: 17680 1014144
Swap: 0 0 0
1 GB is a lot; it holds about 256M integers. However, it is not infinite, so we can write a program that consumes all of the available memory. The free command shows that there is no swap configured on this system, which means that 1 GB is a hard limit. The following program allocates a large array, fills it with data and then waits for you to stop it (with ctrl-c).
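#include <stdio.h>
#include <stdlib.h>

#define BOUND 250000000

int a[BOUND];   // 250M integers of 4 bytes each: about 1 GB

int main() {
    unsigned i;
    printf("Hello, World\n");
    // writing to each element forces the kernel to back it with physical memory
    for (i=0; i<BOUND; i++)
        a[i] = i;
    // wait here until stopped with ctrl-c
    while (1) ;
    return 0;
}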
If you compile, download and run the program, you’ll see that the free memory gradually decreases until almost everything is consumed. This doesn’t happen instantaneously: the kernel’s virtual memory system allocates physical memory on behalf of the program, and as long as the program does not reference a virtual memory address, the kernel will not allocate physical memory for it. Note how the program is started (with an ampersand, to run it in the background) and how it is aborted (first brought to the foreground using fg and then killed with ctrl-c).
root@socfpga:~# ./hello&
[1] 481
Hello, World
root@socfpga:~# free
total used free shared buffers
Mem: 1031824 356708 675116 0 1328
-/+ buffers: 355380 676444
Swap: 0 0 0
root@socfpga:~# free
total used free shared buffers
Mem: 1031824 877508 154316 0 1328
-/+ buffers: 876180 155644
Swap: 0 0 0
root@socfpga:~# free
total used free shared buffers
Mem: 1031824 997168 34656 0 1328
-/+ buffers: 995840 35984
Swap: 0 0 0
root@socfpga:~# fg
./hello
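The same gradual growth can be observed from inside a program. The following is a minimal sketch, not part of the lecture code, that assumes a Linux system exposing /proc/self/statm; the helper names resident_kib and touch_growth_kib are made up for illustration. It allocates a buffer and compares the resident memory of the process before and after writing to one byte in every page.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size of the calling process in KiB, or -1 on error.
 * Assumes /proc/self/statm exists (second field = resident pages). */
static long resident_kib(void) {
    long total_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;
    if (fscanf(f, "%ld %ld", &total_pages, &resident_pages) != 2) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

/* Allocate nbytes, write one byte in every page, and return how much the
 * resident memory grew (in KiB) between the allocation and the last write. */
static long touch_growth_kib(size_t nbytes) {
    long page = sysconf(_SC_PAGESIZE);
    char *buf = malloc(nbytes);
    if (!buf) return -1;
    long before = resident_kib();     /* allocated, but not yet referenced */
    volatile char *p = buf;           /* volatile: the writes must happen  */
    for (size_t i = 0; i < nbytes; i += (size_t) page)
        p[i] = 1;                     /* first reference to each page      */
    long after = resident_kib();
    free(buf);
    return after - before;
}
```

Calling touch_growth_kib with a buffer of a few megabytes should report a growth close to the buffer size, since physical pages are only assigned when they are first referenced.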
If the integer array becomes too large, the kernel will abort the program. For example, if you change the constant BOUND to 280000000, you will see the following message when running the program.
root@socfpga:~# ./hello
Killed
This illustrates a first difference with the previous bare-metal (non-virtual-memory) approach. If you build a program without virtual memory support, the linker knows at link time whether the program is too big to fit in memory. When building for a Linux system, neither the compiler nor the linker knows how much physical memory will be available. Therefore, the linker will not report an error if you allocate an array that is too large to fit in memory. Instead, the kernel kills the program at runtime once it has used up all available memory, as the Killed message above shows.
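The two values of BOUND can be checked against the total reported by free with a little arithmetic. The sketch below assumes 4-byte integers (as on the ARM target used here) and takes the total of 1031824 KB from the free output above; the names array_kib and fits_in_memory are made up for illustration.

```c
#include <stdint.h>

/* Total memory reported by free on the board, in KiB (from the output above). */
#define TOTAL_KIB 1031824ULL

/* Size in KiB of an array of n 32-bit integers (rounded down). */
static uint64_t array_kib(uint64_t n) {
    return n * sizeof(int32_t) / 1024;
}

/* 1 if an array of n integers is no larger than the total memory, 0 otherwise.
 * This is an optimistic bound: the kernel and other processes need memory too. */
static int fits_in_memory(uint64_t n) {
    return array_kib(n) <= TOTAL_KIB;
}
```

With BOUND set to 250000000 the array needs 976562 KB, just below the total. With 280000000 it needs 1093750 KB, which exceeds the total, so the program cannot complete its loop.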
Hardware Resources
From the command line prompt, you may also learn about the hardware resources in the system. /proc is a virtual directory tree that contains lots of information on the runtime state of the Linux system. The first entry of interest is the cpuinfo file, which contains a list of the processors and their capabilities. There are several other *info files that document the status of the memory subsystem.
root@socfpga:~# cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 0 (v7l)
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
processor : 1
model name : ARMv7 Processor rev 0 (v7l)
Features : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x3
CPU part : 0xc09
CPU revision : 0
Hardware : Altera SOCFPGA
Revision : 0000
Serial : 0000000000000000
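A program can extract the same information by scanning the text of /proc/cpuinfo. The following is a minimal sketch that counts the processor entries in a buffer holding such text; reading the file into the buffer is left to the caller, and the function name count_processors is made up for illustration.

```c
#include <string.h>

/* Count the entries in a /proc/cpuinfo-style text by counting the lines
 * that begin with the key "processor". */
static int count_processors(const char *text) {
    int count = 0;
    const char *line = text;
    while (line && *line) {
        if (strncmp(line, "processor", 9) == 0)
            count++;
        line = strchr(line, '\n');   /* find the end of this line        */
        if (line) line++;            /* step to the start of the next one */
    }
    return count;
}
```

For the output shown above, the function would return 2, one for each Cortex-A9 core.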
Another relevant data structure is the /proc/devices file, which lists the character and block devices available in the system. The difference between a character device and a block device lies in the manner in which the software interacts with each of them. A character device is used for data streams. The software can read or write an unbounded stream of characters to a character device without the need to jump backward or forward in the stream. A block device is used for storage that needs random access. The software can read it sequentially or else jump backward and forward in the device. For the purpose of this lecture, an important device is the i2c device driver, which is of the character type.
root@socfpga:~# cat /proc/devices
Character devices:
1 mem
2 pty
3 ttyp
4 /dev/vc/0
4 tty
4 ttyS
5 /dev/tty
5 /dev/console
5 /dev/ptmx
7 vcs
10 misc
13 input
89 i2c
90 mtd
128 ptm
136 pts
153 spi
180 usb
189 usb_device
252 ptp
253 pps
254 fpga
Block devices:
1 ramdisk
259 blkext
8 sd
31 mtdblock
65 sd
66 sd
67 sd
68 sd
69 sd
70 sd
71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
179 mmc
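The character or block type of a device node can also be queried from C with the stat system call. The following is a small sketch; the function name device_kind and its return codes are made up for illustration.

```c
#include <sys/stat.h>

/* Classify a path: 'c' for a character device, 'b' for a block device,
 * '-' for anything else, and '?' if the path cannot be examined. */
static char device_kind(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return '?';
    if (S_ISCHR(st.st_mode)) return 'c';
    if (S_ISBLK(st.st_mode)) return 'b';
    return '-';
}
```

On the board, one would expect device_kind("/dev/i2c-0") to report a character device and device_kind("/dev/mmcblk0") to report a block device, in line with the i2c and mmc entries listed above.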
Software Activity
To find what processes are running on the OS, use busybox ps. This reveals that there is a web server running on the board (lighttpd). To connect, you can point your browser to board_ip_address/index.html, where board_ip_address is the IP address used by the board. A dynamically updated view may be obtained using top.
root@socfpga:~# ps
PID USER VSZ STAT COMMAND
1 root 1316 S init [5]
2 root 0 SW [kthreadd]
3 root 0 SW [ksoftirqd/0]
4 root 0 SW [kworker/0:0]
5 root 0 SW< [kworker/0:0H]
7 root 0 SW [migration/0]
8 root 0 SW [rcu_bh]
9 root 0 SW [rcu_sched]
10 root 0 SW [migration/1]
11 root 0 SW [ksoftirqd/1]
12 root 0 SW [kworker/1:0]
13 root 0 SW< [kworker/1:0H]
14 root 0 SW< [khelper]
15 root 0 SW [kdevtmpfs]
16 root 0 SW< [netns]
17 root 0 SW< [writeback]
18 root 0 SW< [bioset]
19 root 0 SW< [kblockd]
20 root 0 SW [khubd]
21 root 0 SW< [rpciod]
22 root 0 SW [kworker/1:1]
23 root 0 SW [khungtaskd]
24 root 0 SW [kswapd0]
25 root 0 SW [fsnotify_mark]
26 root 0 SW< [nfsiod]
27 root 0 SW [kworker/u4:1]
32 root 0 SW< [ff705000.spi]
35 root 0 SW< [fff01000.spi]
40 root 0 SW< [kpsmoused]
41 root 0 SW [kworker/0:1]
42 root 0 SW< [dw-mci-card]
43 root 0 SW [mmcqd/0]
44 root 0 SW< [dwc2]
45 root 0 SW< [deferwq]
46 root 0 SW [kjournald]
123 daemon 1460 S /sbin/portmap
142 root 3592 S /usr/sbin/sshd
146 root 1660 S /sbin/syslogd -n -O /var/log/messages
149 root 1660 S /sbin/klogd -n
153 root 1964 S /usr/sbin/lighttpd -f /etc/lighttpd.conf
157 root 10628 S /www/pages/cgi-bin/scroll_server
172 root 1564 S /sbin/getty 115200 ttyS0
173 root 1564 S /sbin/getty 38400 tty1
189 root 5948 R {sshd} sshd: root@pts/0
191 root 3640 S sshd: root@notty
192 root 2612 S -sh
194 root 3324 S /usr/libexec/sftp-server
240 root 0 SW [kworker/u4:2]
248 root 1948 R ps
busybox is an application that combines a large number of traditional Linux utilities in a single binary.
root@socfpga:~# busybox
BusyBox v1.20.2 (2013-09-27 23:27:54 CDT) multi-call binary.
Copyright (C) 1998-2011 Erik Andersen, Rob Landley, Denys Vlasenko
and others. Licensed under GPLv2.
See source distribution for full notice.
Usage: busybox [function] [arguments]...
or: busybox --list
or: function [arguments]...
BusyBox is a multi-call binary that combines many common Unix
utilities into a single executable. Most people will create a
link to busybox for each function they wish to use and BusyBox
will act like whatever it was invoked as.
Currently defined functions:
[, [[, ar, ash, awk, basename, bunzip2, bzcat, cat, chattr, chgrp, chmod, chown, chroot, chvt, clear,
cmp, cp, cpio, cut, date, dc, dd, deallocvt, df, diff, dirname, dmesg, dnsdomainname, dpkg-deb, du,
dumpkmap, dumpleases, echo, egrep, env, expr, false, fbset, fdisk, fgrep, find, flock, free, fsck,
fsck.minix, fuser, grep, groups, gunzip, gzip, halt, head, hexdump, hostname, hwclock, id, ifconfig,
ifdown, ifup, insmod, ip, kill, killall, klogd, less, ln, loadfont, loadkmap, logger, logname, logread,
losetup, ls, lsmod, md5sum, microcom, mkdir, mkfifo, mkfs.minix, mknod, mkswap, mktemp, modprobe, more,
mount, mv, nc, netstat, nohup, nslookup, od, openvt, patch, pidof, ping, ping6, pivot_root, poweroff,
printf, ps, pwd, rdate, readlink, realpath, reboot, renice, reset, rm, rmdir, rmmod, route, run-parts,
sed, seq, setconsole, sh, sleep, sort, start-stop-daemon, strings, stty, swapoff, swapon, switch_root,
sync, sysctl, syslogd, tail, tar, tee, telnet, test, tftp, time, top, touch, tr, traceroute, true, tty,
udhcpc, udhcpd, umount, uname, uniq, unzip, uptime, users, usleep, vi, watch, wc, wget, which, who,
whoami, xargs, yes, zcat
Finally, a useful diagnostic tool for Linux activity is the message log, collected in /var/log/messages. If the kernel runs into specific issues or exceptions, or if the hardware experiences faults, then the message log will often contain a relevant warning.
I2C device driver
The linux kernel on the DE1-SoC contains an I2C driver to access the I2C peripheral in the HPS. The DE1-SoC contains an ADXL345 accelerometer chip, attached to the I2C bus. If an application on the HPS needs to use the ADXL345 accelerometer chip, then it needs to use the I2C device driver.
The device driver is visible in the Linux file system as a dedicated file in the /dev directory. There are two I2C peripherals in the HPS, and each has its own device file, /dev/i2c-0 and /dev/i2c-1 respectively.
root@socfpga:~# ls /dev
bus ptyp3 tty18 tty43 ttyS1
console ptyp4 tty19 tty44 ttyp0
cpu_dma_latency ptyp5 tty2 tty45 ttyp1
fpga0 ptyp6 tty20 tty46 ttyp2
full ptyp7 tty21 tty47 ttyp3
i2c-0 ptyp8 tty22 tty48 ttyp4
i2c-1 ptyp9 tty23 tty49 ttyp5
initctl ptypa tty24 tty5 ttyp6
input ptypb tty25 tty50 ttyp7
kmem ptypc tty26 tty51 ttyp8
kmsg ptypd tty27 tty52 ttyp9
log ptype tty28 tty53 ttypa
mem ptypf tty29 tty54 ttypb
mmcblk0 ram0 tty3 tty55 ttypc
mmcblk0p1 ram1 tty30 tty56 ttypd
mmcblk0p2 random tty31 tty57 ttype
mmcblk0p3 spidev1.0 tty32 tty58 ttypf
mtab tty tty33 tty59 urandom
network_latency tty0 tty34 tty6 vcs
network_throughput tty1 tty35 tty60 vcs1
null tty10 tty36 tty61 vcsa
psaux tty11 tty37 tty62 vcsa1
ptmx tty12 tty38 tty63 watchdog
ptp0 tty13 tty39 tty7 zero
pts tty14 tty4 tty8
ptyp0 tty15 tty40 tty9
ptyp1 tty16 tty41 ttyLCD0
ptyp2 tty17 tty42 ttyS0
The Linux distribution also provides a set of utilities that can access the I2C port directly from the command line. They are i2cdetect, i2cdump, i2cget and i2cset.
First, let’s recall the basic concepts of the I2C protocol. On an I2C bus there are master and slave devices. Each slave device has a 7-bit slave address that must be used for every read or write transfer. A read or write transfer can be single-byte or multi-byte. Each byte is acknowledged by the receiver after it is transferred.
Figure: Detailed signaling on an I2C bus
Figure: I2C write operation (master to slave)
Figure: I2C read operation (slave to master)
Most chips supporting an I2C interface, including the ADXL345, define a layer of indexed registers on top of the I2C transfer protocol. The registers on the chip can be logically addressed using an extra byte in each I2C transfer.
- To write to chip register N, the I2C protocol performs a write to the I2C slave of two bytes: the value N, followed by the value to write in register N.
- To read from chip register N, the I2C protocol performs a write to the I2C slave of a byte with value N, followed by a read from the I2C slave of a byte.
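The two rules above can be written down as byte sequences. The following sketch only builds the bytes; it does not talk to any hardware. It uses the 7-bit address 0x53 of the ADXL345 that appears later in this lecture, and the function names are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* First byte of a transfer: the 7-bit slave address followed by the
 * R/W bit (0 = write, 1 = read). */
static uint8_t i2c_address_byte(uint8_t addr7, int is_read) {
    return (uint8_t) ((addr7 << 1) | (is_read ? 1 : 0));
}

/* Payload of a register write: the register index N, then the value. */
static size_t reg_write_payload(uint8_t reg, uint8_t value, uint8_t out[2]) {
    out[0] = reg;
    out[1] = value;
    return 2;
}

/* Payload of the write phase of a register read: only the register index.
 * The register value arrives in the read transfer that follows. */
static size_t reg_read_payload(uint8_t reg, uint8_t out[1]) {
    out[0] = reg;
    return 1;
}
```

For slave address 0x53, the address byte is 0xA6 for a write and 0xA7 for a read. When using the Linux device driver, the address byte and the acknowledge handling are generated by the driver; the application only supplies the payload bytes.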
The following tools are available on the board to directly access the I2C interface. The manual pages should be consulted for command line usage.
Tool | Purpose |
---|---|
i2cdetect | Find chips on an I2C bus |
i2cdump | Scan chip registers on an I2C bus |
i2cset | Write bytes on an I2C bus |
i2cget | Read bytes from an I2C bus |
The following are some examples. Notice the warning message.
# list the I2C devices
i2cdetect -l
i2c-0 i2c Synopsys DesignWare I2C adapter I2C adapter
i2c-1 i2c Synopsys DesignWare I2C adapter I2C adapter
# Read the device ID (register index 0) from I2C bus i2c-0,
# I2C slave address 0x53
# First, set the register index
root@socfpga:~# i2cset -y 0 0x53 0x0
# Then, read the byte
root@socfpga:~# i2cget -y 0 0x53
0xe5
# Read 6 bytes from I2C bus i2c-0, I2C slave address 0x53,
# starting at register index 0x32
root@socfpga:~# i2cdump -r 0x32-0x37 0 0x53
No size specified (using byte-data access)
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-0, address 0x53, mode byte
Probe range limited to 0x32-0x37.
Continue? [Y/n] y
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
30: 00 00 00 00 00 00 ......
In the following, we will use the I2C device driver to access the ADXL345 accelerometer on the board.
Example 2: Accelerometer
The ADXL345 accelerometer chip is integrated on the DE1-SoC on the I2C port driven by the HPS.
The challenge for the software running on Linux is that the I2C peripheral is controlled through memory-mapped registers, which an application cannot access directly. Instead, the OS provides a device driver that accesses the I2C peripheral on behalf of the application.
The hardware is visible in Linux through a series of devices, accessible in the /dev directory of the file system. There are two I2C devices in the Linux system running on the DE1-SoC:
root@socfpga:~# ls /dev/i2c*
/dev/i2c-0 /dev/i2c-1
These devices can be opened as files and read or written as byte streams, using the open, close, write, read and ioctl system calls of the Linux OS.
Once the slave address has been selected, the I2C slave addressing is handled by the I2C device driver. The user does not need to deal with per-transfer device addressing or with I2C master/slave handshakes. The user only handles the byte streams to and from the ADXL345.
Figure: Accessing the ADXL345 registers through I2C device driver
To access the ADXL345, one opens the device as a file, and sets the device to control a slave of a selected address. In this case, we use the slave address for the ADXL345 (0x53).
// Fragment from main(); requires <fcntl.h>, <stdio.h>, <stdlib.h>, <stdint.h>,
// <stdbool.h>, <sys/ioctl.h> and <linux/i2c-dev.h>.
int file;
const char *filename = "/dev/i2c-0";
uint8_t id;
bool bSuccess;
const int mg_per_digi = 4;
uint16_t szXYZ[3];
int cnt=0, max_cnt=0;

// open bus
if ((file = open(filename, O_RDWR)) < 0) {
    perror("Failed to open the i2c bus of gsensor");
    exit(1);
}

// init
// gsensor i2c address: 101_0011
int addr = 0b01010011;
if (ioctl(file, I2C_SLAVE, addr) < 0) {
    printf("Failed to acquire bus access and/or talk to slave.\n");
    exit(1);
}
Figure: I2C address of the ADXL345 (from the datasheet)
Once the file handle of the I2C device is opened, one can simply read and write bytes to it. The device driver will take care of formatting the data according to the proper I2C format. The ADXL345 is accessed using single-byte read/write and multi-byte read operations.
bool ADXL345_REG_WRITE(int file, uint8_t address, uint8_t value){
    bool bSuccess = false;
    uint8_t szValue[2];

    // write to define register
    szValue[0] = address;
    szValue[1] = value;
    if (write(file, &szValue, sizeof(szValue)) == sizeof(szValue)){
        bSuccess = true;
    }
    return bSuccess;
}

bool ADXL345_REG_READ(int file, uint8_t address,uint8_t *value){
    bool bSuccess = false;
    uint8_t Value;

    // write to define register
    if (write(file, &address, sizeof(address)) == sizeof(address)){
        // read back value
        if (read(file, &Value, sizeof(Value)) == sizeof(Value)){
            *value = Value;
            bSuccess = true;
        }
    }
    return bSuccess;
}

bool ADXL345_REG_MULTI_READ(int file, uint8_t readaddr,uint8_t readdata[], uint8_t len){
    bool bSuccess = false;

    // write to define register
    if (write(file, &readaddr, sizeof(readaddr)) == sizeof(readaddr)){
        // read back value
        if (read(file, readdata, len) == len){
            bSuccess = true;
        }
    }
    return bSuccess;
}
Figure: I2C access of the ADXL345 (from the datasheet). The read boxes contain data bytes provided as input to the device driver; the device driver handles the I2C protocol.
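The read helpers above issue a write followed by a separate read. The Linux i2c-dev interface also offers the I2C_RDWR ioctl, which submits both messages as one combined transaction. The following is a sketch of that alternative, not the method used by the lecture code. The function names build_reg_read and adxl345_reg_read_combined are made up for illustration, and the code has not been run on the board.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/i2c.h>
#include <linux/i2c-dev.h>

/* Fill two messages: write the register index, then read len bytes back. */
static void build_reg_read(struct i2c_msg msgs[2], uint16_t addr,
                           uint8_t *reg, uint8_t *data, uint16_t len) {
    msgs[0].addr  = addr;
    msgs[0].flags = 0;          /* write */
    msgs[0].len   = 1;
    msgs[0].buf   = reg;
    msgs[1].addr  = addr;
    msgs[1].flags = I2C_M_RD;   /* read */
    msgs[1].len   = len;
    msgs[1].buf   = data;
}

/* Submit both messages in one ioctl; returns 1 on success, 0 on failure. */
static int adxl345_reg_read_combined(int file, uint16_t addr, uint8_t reg,
                                     uint8_t *data, uint16_t len) {
    struct i2c_msg msgs[2];
    struct i2c_rdwr_ioctl_data xfer;

    build_reg_read(msgs, addr, &reg, data, len);
    xfer.msgs  = msgs;
    xfer.nmsgs = 2;
    /* On success, I2C_RDWR returns the number of messages transferred. */
    return ioctl(file, I2C_RDWR, &xfer) == 2;
}
```

With this interface the slave address travels with each message, so a preceding I2C_SLAVE ioctl is not needed for the transfer itself.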
The ADXL345 registers of interest can be looked up in the ADXL345 datasheet. For example, consider the reading of XYZ acceleration data, which is a multi-byte read operation. In the C application, this is programmed as follows:
bool ADXL345_XYZ_Read(int file, uint16_t szData16[3]){
    bool bPass;
    uint8_t szData8[6];

    bPass = ADXL345_REG_MULTI_READ(file, 0x32, (uint8_t *)&szData8, sizeof(szData8));
    if (bPass){
        // each axis is stored as two bytes, least significant byte first
        szData16[0] = (szData8[1] << 8) | szData8[0];
        szData16[1] = (szData8[3] << 8) | szData8[2];
        szData16[2] = (szData8[5] << 8) | szData8[4];
    }
    return bPass;
}
Hence, this is a multi-read starting at ADXL register address 0x32. In total, 6 bytes are read from the I2C slave. From the ADXL345 register map, shown in the data sheet, we can find that these registers contain the acceleration data for the X, Y and Z direction.
Figure: Register map of the ADXL345. The acceleration registers are located starting at address 0x32.
Similarly, there is an ADXL345 initialization function, which enables the chip and configures it using I2C programming:
bool ADXL345_Init(int file){
    bool bSuccess;

    // +- 2g range, 10 bits
    bSuccess = ADXL345_REG_WRITE(file,
        ADXL345_REG_DATA_FORMAT, XL345_RANGE_2G | XL345_FULL_RESOLUTION);

    // Output Data Rate: 50Hz
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file,
            ADXL345_REG_BW_RATE, XL345_RATE_50); // 50 HZ
    }

    // INT_Enable: Data Ready
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file,
            ADXL345_REG_INT_ENALBE, XL345_DATAREADY);
    }

    // stop measure
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file,
            ADXL345_REG_POWER_CTL, XL345_STANDBY);
    }

    // start measure
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file,
            ADXL345_REG_POWER_CTL, XL345_MEASURE);
    }
    return bSuccess;
}
The overall application can be compiled and run just as the hello world application.
cd hps_gsensor
make
# the following commands will appear:
arm-linux-gnueabihf-gcc \
-g -Wall -Dsoc_cv_av \
-Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/soc_cv_av \
-Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/ \
-c main.c -o main.o
arm-linux-gnueabihf-gcc \
-g -Wall -Dsoc_cv_av \
-Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/soc_cv_av \
-Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/ \
-c ADXL345.c -o ADXL345.o
arm-linux-gnueabihf-gcc \
-g -Wall \
main.o ADXL345.o -o gsensor
# copy the executable to the board:
scp -P 50444 gsensor root@172.29.42.57:/home/root
# In a terminal connected to HPS, run the application
./gsensor
# the following output will appear:
root@socfpga:~# ./gsensor
===== gsensor test =====
id=E5h
[1]X=-16 mg, Y=-56 mg, Z=1008 mg
[2]X=-16 mg, Y=-68 mg, Z=1060 mg
[3]X=-8 mg, Y=-60 mg, Z=1052 mg
[4]X=-16 mg, Y=-64 mg, Z=1056 mg
[5]X=-12 mg, Y=-68 mg, Z=1064 mg
[6]X=-16 mg, Y=-64 mg, Z=1060 mg
...
We can also access the accelerometer using the i2c-tools utilities discussed above. Let’s recap the I2C programming sequence performed by the gsensor application and formulate that sequence using the i2c-tools package.
- Set the resolution and data format to +-2G (write 0x8 to ADXL345 register 0x31)
- Set the conversion rate to 50Hz (write 0x9 to ADXL345 register 0x2C)
- Set the power control register to measure (write 0x8 to register 0x2D)
- Read the conversion result (read 6 bytes from register 0x32 and following)
root@socfpga:~# i2cset -y 0 0x53 0x31 0x8
root@socfpga:~# i2cset -y 0 0x53 0x2c 0x9
root@socfpga:~# i2cset -y 0 0x53 0x2d 0x8
root@socfpga:~# i2cdump -r 0x32-0x37 0 0x53
No size specified (using byte-data access)
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-0, address 0x53, mode byte
Probe range limited to 0x32-0x37.
Continue? [Y/n] y
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef
30: fe ff ed ff 09 01 ?.?.??
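The six bytes in the dump can be converted to acceleration values by hand or with a few lines of C. The sketch below assumes the byte order used in ADXL345_XYZ_Read (least significant byte first) and the scale factor of 4 mg per count given by mg_per_digi in the application; the function names are made up for illustration.

```c
#include <stdint.h>

/* Scale factor used by the gsensor application (mg_per_digi). */
#define MG_PER_LSB 4

/* Combine a low/high byte pair into a signed 16-bit sample
 * and scale it to milli-g. */
static int sample_to_mg(uint8_t lo, uint8_t hi) {
    int16_t raw = (int16_t) (uint16_t) ((hi << 8) | lo);
    return raw * MG_PER_LSB;
}

/* Decode the six data bytes (X0, X1, Y0, Y1, Z0, Z1) into milli-g. */
static void decode_xyz_mg(const uint8_t bytes[6], int mg[3]) {
    for (int axis = 0; axis < 3; axis++)
        mg[axis] = sample_to_mg(bytes[2 * axis], bytes[2 * axis + 1]);
}
```

For the dump above (fe ff ed ff 09 01), the raw samples are -2, -19 and 265, which gives X=-8 mg, Y=-76 mg and Z=1060 mg. This is in the same range as the output of the gsensor application.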
Measuring Performance
Another important aspect of low-level interfacing and hardware/software codesign is the measurement of time. The ARM provides a low-level performance measurement infrastructure in the form of a 64-bit continuous timer. This timer can be accessed through the Linux performance monitoring API, perf_event_open.
The performance counter infrastructure is accessed through a device driver that is reached by means of a system call. The system call returns a file descriptor, which can be used to read the current timestamp. The following code creates functions to initialize the counter infrastructure. The functions carry constructor and destructor attributes, so that they are called when the program starts and when the program completes, respectively. The cpucycles() function can be called to read the timestamp counter.
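#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int fddev = -1;

__attribute__((constructor)) static void init(void) {
    // static storage, so all fields that are not set below are zero
    static struct perf_event_attr attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    // pid = 0, cpu = -1: count for the calling process on any CPU
    fddev = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

__attribute__((destructor)) static void fini(void) {
    close(fddev);
}

static inline long long cpucycles(void) {
    long long result = 0;
    // compare as a signed value so that a read error (-1) is detected
    if (read(fddev, &result, sizeof(result)) != (ssize_t) sizeof(result)) return 0;
    return result;
}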
Time measurement now proceeds as before, by calling cpucycles. However, the Linux processing environment adds a significant level of overhead and noise. Hence, both the accuracy and the precision of the time measurement suffer. The following is an example program.
int main() {
    unsigned i, k;
    volatile unsigned j;
    long long time_start = 0;
    long long time_end = 0;

    // back-to-back calls measure the cost of reading the counter itself
    printf("time overhead = ");
    for (k=0; k<10; k++) {
        time_start = cpucycles();
        time_end = cpucycles();
        printf("%lld ", time_end - time_start);
    }

    // measure a small loop, ten times
    printf("\ntime delta = ");
    for (k=0; k<10; k++) {
        time_start = cpucycles();
        j=0;
        for (i=0; i<100000; i++)
            j += i;
        time_end = cpucycles();
        printf("%lld ", time_end - time_start);
    }
    printf("\n");
    return 0;
}
It can be compiled and copied to the board as follows:
$ make
arm-linux-gnueabihf-gcc -g -Wall -Dsoc_cv_av -I/ip/altera/hps/altera_hps/hwlib/include/soc_cv_av -I/ip/altera/hps/altera_hps/hwlib/include/ -c main.c -o main.o
arm-linux-gnueabihf-gcc -g -Wall main.o -o cyclecount
arm-linux-gnueabihf-objdump -D cyclecount > cyclecount.lst
# replace IP_ADDRESS_OF_BOARD below with the IP address of your board
$ scp cyclecount root@IP_ADDRESS_OF_BOARD:/home/root
root@IP_ADDRESS_OF_BOARD's password:
cyclecount 100% 10KB 716.8KB/s 00:00
The output of this program on the board is as follows:
time overhead = 2769 2104 1473 1442 1453 1466 1464 1414 1432 1403
time delta = 932513 901704 901538 901444 901446 901466 901486 901479 921746 901588
There are several causes for the variation, including caching, scheduling within the Linux kernel, and overhead from hardware events. The overhead is significant (about 1400 cycles) and the noise is a few percent (e.g. 20000 cycles out of 900000 cycles). Therefore, when measuring performance in an environment like this, we repeat the measurement ten times so that we capture the steady state. Furthermore, we take the median of the set of collected measurements, so that we are likely to measure a ‘typical’ case.
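Taking the median of a set of measurements requires only a sort. The following is a minimal sketch using the standard qsort function; the function names cmp_ll and median_ll are made up for illustration.

```c
#include <stdlib.h>

/* qsort comparison function for long long values. */
static int cmp_ll(const void *a, const void *b) {
    long long x = *(const long long *) a;
    long long y = *(const long long *) b;
    return (x > y) - (x < y);
}

/* Median of n measurements (n > 0). Sorts the array in place. For an even
 * count, returns the mean of the two middle values, rounded down. */
static long long median_ll(long long *v, size_t n) {
    qsort(v, n, sizeof(v[0]), cmp_ll);
    if (n % 2 == 1)
        return v[n / 2];
    return (v[n / 2 - 1] + v[n / 2]) / 2;
}
```

Applied to the ten time delta values above, the median is 901512 cycles. The two outliers (932513 and 921746) have no influence on this result, whereas they would pull up the arithmetic mean.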
Instead of counting cycles, we can measure various other features:
- PERF_COUNT_HW_INSTRUCTIONS (Retired instructions)
time overhead = 885 876 837 812 815 807 813 814 812 814
time delta = 1200896 1200847 1200838 1200838 1200833 1205092 1200839 1200836 1200838 1200836
- PERF_COUNT_HW_CACHE_MISSES (Last level cache misses)
time overhead = 4 1 0 0 0 0 0 0 0 0
time delta = 2 0 0 0 0 108 0 0 0 0
Various other performance counters are described in the perf_event_open manpage.
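Switching to another counter only requires a different value for the config field. The following sketch generalizes the init function shown earlier by taking the event as a parameter; the function names hw_event_attr and open_hw_counter are made up for illustration, and whether a given event is supported depends on the kernel and the processor.

```c
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Fill in the attributes for one hardware event, for example
 * PERF_COUNT_HW_INSTRUCTIONS or PERF_COUNT_HW_CACHE_MISSES. */
static void hw_event_attr(struct perf_event_attr *attr,
                          unsigned long long config) {
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;
    attr->size = sizeof(*attr);
    attr->config = config;
}

/* Open a counter for the calling process on any CPU.
 * Returns a file descriptor, or -1 if the event cannot be opened. */
static int open_hw_counter(unsigned long long config) {
    struct perf_event_attr attr;
    hw_event_attr(&attr, config);
    return (int) syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}
```

The returned file descriptor can be read in the same way as in cpucycles(), which yields the current value of the selected counter instead of the cycle count.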