Introduction

Today we continue the study of the FPGA-SoC platform. The key points of the previous lecture were the following.

  • There are two 800 MHz Cortex-A9 (ARMv7) cores integrated into the Cyclone V FPGA. These cores, together with a large collection of peripherals, are referred to as the Hard Processor System (HPS). The HPS is fully self-contained and can boot its own operating system.

  • The FPGA fabric is attached to the HPS through a set of bus bridges plus an FPGA controller module. The bus bridges will enable custom hardware modules to use the same memory-mapped model that we have discussed with MSP-430 and Nios-II, with one important difference.

  • The difference is that the operating system changes how software reaches memory-mapped hardware components. The OS isolates the hardware from the (application) software in two respects. First, the applications execute in their own virtual memory space. The memory addresses referenced by the application do not correspond to the physical memory addresses applied to the hardware. Second, the applications do not have full privilege over the hardware. The applications have to access the hardware through the Linux kernel.

  • There are two solutions to access hardware from within the application. The first is to go through the intermediary of the kernel by means of a device driver. The device driver provides a generic access API (open, close, read, write) that is mapped to the specific hardware module at hand. The second solution is to map a section of physical memory directly into the virtual memory space of the application.
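The second solution can be sketched as follows. This is a minimal sketch and not code from this lecture: it assumes that the kernel exposes physical memory through /dev/mem and that the program runs as root, and the helper names page_base, page_offset and map_physical are made up for illustration. Because mmap requires a page-aligned file offset, the physical address is split into a page base and an offset within that page.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round a physical address down to the start of its page.
   page_size must be a nonzero power of two. */
uint64_t page_base(uint64_t phys, uint64_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return phys & ~(page_size - 1);
}

/* Distance of the address from the start of its page. */
uint64_t page_offset(uint64_t phys, uint64_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return phys & (page_size - 1);
}

/* Map len bytes of physical memory starting at phys into the virtual
   address space of the caller. Returns NULL on failure.
   Assumption: /dev/mem is available and the caller has root privileges. */
volatile void *map_physical(uint64_t phys, size_t len) {
    uint64_t psize = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t base = page_base(phys, psize);
    uint64_t off = page_offset(phys, psize);

    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len + off, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, (off_t)base);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    return (volatile uint8_t *)p + off;
}
```

The returned pointer is a virtual address; loads and stores through it reach the physical location that was mapped. The sketch does not keep the information needed to unmap the region again, which a complete program would want to do.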

Today we will discuss an example of the device driver mechanism. We will use a device driver to access an accelerometer chip attached to the HPS.

I assume that you are able to boot the Linux kernel on your DE1-SoC, and that you have installed the ARM cross-compiler tools. Refer to Lecture 16 for information on installing Linux, and refer to the Software Installation Guidelines.

Looking around in Linux on DE1-SoC

After logging into the board, it is helpful to look around to assess the capabilities you have available. First, let’s check the memory available.

Memory Resources

The free command lists the memory usage in kilobytes. We see that, after booting, about 18 MB of memory is in use, while almost a full gigabyte is available.

root@socfpga:~# free
             total         used         free       shared      buffers
Mem:       1031824        18860      1012964            0         1180
-/+ buffers:              17680      1014144
Swap:            0            0            0

1 GB is a lot; it holds 256M 32-bit integers. However, it is not infinite, so we can write a program that consumes all of the available memory. The free command shows that there is no swap configured on this system, which means that 1 GB is a hard limit. The following program allocates a large array, fills it with data and then waits for you to stop it (with ctrl-c).

#include <stdio.h>
#include <stdlib.h>

#define BOUND 250000000

int a[BOUND];

int main() {
   unsigned i;
   printf("Hello, World\n");
   for (i=0; i<BOUND; i++)
   	a[i] = i;
   while (1) ;
   return 0;
}

If you compile, download and run the program, you’ll see that the free memory reported by free gradually shrinks until almost everything is consumed. This doesn’t happen instantaneously; the kernel virtual memory system allocates physical memory on behalf of the program, and as long as the program does not reference a virtual memory address, the virtual memory management will not allocate physical memory for it. Note how the program is started (with an ampersand, to run it in the background) and how it is aborted (first brought to the foreground using fg and then killed with ctrl-c).

root@socfpga:~# ./hello&
[1] 481
Hello, World
root@socfpga:~# free
             total         used         free       shared      buffers
Mem:       1031824       356708       675116            0         1328
-/+ buffers:             355380       676444
Swap:            0            0            0
root@socfpga:~# free
             total         used         free       shared      buffers
Mem:       1031824       877508       154316            0         1328
-/+ buffers:             876180       155644
Swap:            0            0            0
root@socfpga:~# free
             total         used         free       shared      buffers
Mem:       1031824       997168        34656            0         1328
-/+ buffers:             995840        35984
Swap:            0            0            0
root@socfpga:~# fg
./hello

If the integer array becomes too large, the kernel will abort the program. For example, if you change the constant BOUND to 280000000, you will see the following message when running the program.

root@socfpga:~# ./hello
Killed
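The arithmetic behind these two experiments can be checked with a short sketch. The helper names are illustrative; the sketch assumes a 4-byte int, as on the ARM toolchain used here, and takes the total of 1031824 kilobytes from the free output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Total memory reported by free on the board, in kilobytes (1024 bytes). */
#define TOTAL_KB 1031824ULL

/* Size of an array of n_elems elements of elem_size bytes each,
   in kilobytes, rounded up. */
uint64_t array_kb(uint64_t n_elems, uint64_t elem_size) {
    assert(elem_size > 0);
    uint64_t bytes = n_elems * elem_size;
    return (bytes + 1023) / 1024;
}

/* True if the array alone already exceeds the physical memory. */
bool exceeds_memory(uint64_t n_elems, uint64_t elem_size) {
    return array_kb(n_elems, elem_size) > TOTAL_KB;
}
```

With BOUND set to 250000000 the array needs 976563 kilobytes, which stays below the total of 1031824 kilobytes. With BOUND set to 280000000 it needs 1093750 kilobytes, which is more than the board has, so the kernel has to stop the program.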

This illustrates a first difference with the previous bare-metal (non-virtual-memory) approach. If you build a program without virtual memory support, the linker knows at link time whether the program is too big to fit in memory. When building for a Linux system, the toolchain does not know how much real memory will be available. Therefore, the linker will not report an error if you allocate an array that is too large to fit in memory. Instead, the kernel will kill the program at runtime once it has used up all available memory.

Hardware Resources

From the command line prompt, you may also learn about the hardware resources in the system. The /proc directory is a virtual directory tree that contains lots of information on the runtime state of the Linux system. One example is the cpuinfo file, which contains a list of the processors and their capabilities. There are several other *info files that document the status of the memory subsystem.

root@socfpga:~# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 0 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0

processor       : 1
model name      : ARMv7 Processor rev 0 (v7l)
Features        : swp half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0

Hardware        : Altera SOCFPGA
Revision        : 0000
Serial          : 0000000000000000
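The same information is available to a program, because /proc/cpuinfo reads like an ordinary text file. The sketch below counts the processor entries in such a listing. The function name count_processors is hypothetical, and the function takes an already opened stream so that it works on any text in this format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines that start with "processor" in a cpuinfo-style stream.
   Each core contributes one such line. */
int count_processors(FILE *f) {
    char line[256];
    int count = 0;
    int at_line_start = 1;  /* false while inside an over-long line */

    assert(f != NULL);
    while (fgets(line, sizeof line, f) != NULL) {
        if (at_line_start && strncmp(line, "processor", 9) == 0)
            count++;
        /* fgets stops early on long lines; the next chunk is then
           a continuation and must not be treated as a new line */
        at_line_start = (strchr(line, '\n') != NULL);
    }
    return count;
}
```

A program on the board would open /proc/cpuinfo with fopen and pass the stream to this function. For the listing above the result is 2, one for each Cortex-A9 core.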

Another relevant source of information is the /proc/devices file, which lists the character and block devices available in the system. The difference between a character device and a block device lies in the manner in which the software interacts with each of them. A character device is used for data streams. The software can read or write an unbounded stream of characters to a character device without the need to jump backward or forward in the stream. A block device is used for files that need random access. The software can open a file and read it sequentially or else jump back and forward in the device. For the purpose of this lecture, an important device is the i2c device driver, which is of the character type.

root@socfpga:~# cat /proc/devices
Character devices:
  1 mem
  2 pty
  3 ttyp
  4 /dev/vc/0
  4 tty
  4 ttyS
  5 /dev/tty
  5 /dev/console
  5 /dev/ptmx
  7 vcs
 10 misc
 13 input
 89 i2c
 90 mtd
128 ptm
136 pts
153 spi
180 usb
189 usb_device
252 ptp
253 pps
254 fpga

Block devices:
  1 ramdisk
259 blkext
  8 sd
 31 mtdblock
 65 sd
 66 sd
 67 sd
 68 sd
 69 sd
 70 sd
 71 sd
128 sd
129 sd
130 sd
131 sd
132 sd
133 sd
134 sd
135 sd
179 mmc

Software Activity

To find out which processes are running on the OS, use ps (on this board, the busybox version). This reveals that there is a webserver running on the board (lighttpd). To connect, you can point your browser to board_ip_address/index.html, where board_ip_address is the IP address used by the board. A dynamically updated view may be obtained using top.

root@socfpga:~# ps
  PID USER       VSZ STAT COMMAND
    1 root      1316 S    init [5]
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
    4 root         0 SW   [kworker/0:0]
    5 root         0 SW<  [kworker/0:0H]
    7 root         0 SW   [migration/0]
    8 root         0 SW   [rcu_bh]
    9 root         0 SW   [rcu_sched]
   10 root         0 SW   [migration/1]
   11 root         0 SW   [ksoftirqd/1]
   12 root         0 SW   [kworker/1:0]
   13 root         0 SW<  [kworker/1:0H]
   14 root         0 SW<  [khelper]
   15 root         0 SW   [kdevtmpfs]
   16 root         0 SW<  [netns]
   17 root         0 SW<  [writeback]
   18 root         0 SW<  [bioset]
   19 root         0 SW<  [kblockd]
   20 root         0 SW   [khubd]
   21 root         0 SW<  [rpciod]
   22 root         0 SW   [kworker/1:1]
   23 root         0 SW   [khungtaskd]
   24 root         0 SW   [kswapd0]
   25 root         0 SW   [fsnotify_mark]
   26 root         0 SW<  [nfsiod]
   27 root         0 SW   [kworker/u4:1]
   32 root         0 SW<  [ff705000.spi]
   35 root         0 SW<  [fff01000.spi]
   40 root         0 SW<  [kpsmoused]
   41 root         0 SW   [kworker/0:1]
   42 root         0 SW<  [dw-mci-card]
   43 root         0 SW   [mmcqd/0]
   44 root         0 SW<  [dwc2]
   45 root         0 SW<  [deferwq]
   46 root         0 SW   [kjournald]
  123 daemon    1460 S    /sbin/portmap
  142 root      3592 S    /usr/sbin/sshd
  146 root      1660 S    /sbin/syslogd -n -O /var/log/messages
  149 root      1660 S    /sbin/klogd -n
  153 root      1964 S    /usr/sbin/lighttpd -f /etc/lighttpd.conf
  157 root     10628 S    /www/pages/cgi-bin/scroll_server
  172 root      1564 S    /sbin/getty 115200 ttyS0
  173 root      1564 S    /sbin/getty 38400 tty1
  189 root      5948 R    {sshd} sshd: root@pts/0
  191 root      3640 S    sshd: root@notty
  192 root      2612 S    -sh
  194 root      3324 S    /usr/libexec/sftp-server
  240 root         0 SW   [kworker/u4:2]
  248 root      1948 R    ps

busybox is an application that combines a large number of traditional Linux utilities in a single binary.

root@socfpga:~# busybox
BusyBox v1.20.2 (2013-09-27 23:27:54 CDT) multi-call binary.
Copyright (C) 1998-2011 Erik Andersen, Rob Landley, Denys Vlasenko
and others. Licensed under GPLv2.
See source distribution for full notice.

Usage: busybox [function] [arguments]...
   or: busybox --list
   or: function [arguments]...

        BusyBox is a multi-call binary that combines many common Unix
        utilities into a single executable.  Most people will create a
        link to busybox for each function they wish to use and BusyBox
        will act like whatever it was invoked as.

Currently defined functions:
        [, [[, ar, ash, awk, basename, bunzip2, bzcat, cat, chattr, chgrp, chmod, chown, chroot, chvt, clear,
        cmp, cp, cpio, cut, date, dc, dd, deallocvt, df, diff, dirname, dmesg, dnsdomainname, dpkg-deb, du,
        dumpkmap, dumpleases, echo, egrep, env, expr, false, fbset, fdisk, fgrep, find, flock, free, fsck,
        fsck.minix, fuser, grep, groups, gunzip, gzip, halt, head, hexdump, hostname, hwclock, id, ifconfig,
        ifdown, ifup, insmod, ip, kill, killall, klogd, less, ln, loadfont, loadkmap, logger, logname, logread,
        losetup, ls, lsmod, md5sum, microcom, mkdir, mkfifo, mkfs.minix, mknod, mkswap, mktemp, modprobe, more,
        mount, mv, nc, netstat, nohup, nslookup, od, openvt, patch, pidof, ping, ping6, pivot_root, poweroff,
        printf, ps, pwd, rdate, readlink, realpath, reboot, renice, reset, rm, rmdir, rmmod, route, run-parts,
        sed, seq, setconsole, sh, sleep, sort, start-stop-daemon, strings, stty, swapoff, swapon, switch_root,
        sync, sysctl, syslogd, tail, tar, tee, telnet, test, tftp, time, top, touch, tr, traceroute, true, tty,
        udhcpc, udhcpd, umount, uname, uniq, unzip, uptime, users, usleep, vi, watch, wc, wget, which, who,
        whoami, xargs, yes, zcat

Finally, a useful diagnostic tool for Linux activity is the message log, collected in /var/log/messages. If the kernel runs into specific issues or exceptions, or if the hardware experiences faults, then the message log will often contain a relevant warning.

I2C device driver

The Linux kernel on the DE1-SoC contains an I2C driver to access the I2C peripheral in the HPS. The DE1-SoC contains an ADXL345 accelerometer chip, attached to the I2C bus. If an application on the HPS needs to use the ADXL345 accelerometer chip, then it needs to use the I2C device driver.

The device driver is visible in the Linux file system as a dedicated file in the /dev directory. There are two I2C peripherals in the HPS, and each has its own device file, /dev/i2c-0 and /dev/i2c-1 respectively.

root@socfpga:~# ls /dev
bus                 ptyp3               tty18               tty43               ttyS1
console             ptyp4               tty19               tty44               ttyp0
cpu_dma_latency     ptyp5               tty2                tty45               ttyp1
fpga0               ptyp6               tty20               tty46               ttyp2
full                ptyp7               tty21               tty47               ttyp3
i2c-0               ptyp8               tty22               tty48               ttyp4
i2c-1               ptyp9               tty23               tty49               ttyp5
initctl             ptypa               tty24               tty5                ttyp6
input               ptypb               tty25               tty50               ttyp7
kmem                ptypc               tty26               tty51               ttyp8
kmsg                ptypd               tty27               tty52               ttyp9
log                 ptype               tty28               tty53               ttypa
mem                 ptypf               tty29               tty54               ttypb
mmcblk0             ram0                tty3                tty55               ttypc
mmcblk0p1           ram1                tty30               tty56               ttypd
mmcblk0p2           random              tty31               tty57               ttype
mmcblk0p3           spidev1.0           tty32               tty58               ttypf
mtab                tty                 tty33               tty59               urandom
network_latency     tty0                tty34               tty6                vcs
network_throughput  tty1                tty35               tty60               vcs1
null                tty10               tty36               tty61               vcsa
psaux               tty11               tty37               tty62               vcsa1
ptmx                tty12               tty38               tty63               watchdog
ptp0                tty13               tty39               tty7                zero
pts                 tty14               tty4                tty8
ptyp0               tty15               tty40               tty9
ptyp1               tty16               tty41               ttyLCD0
ptyp2               tty17               tty42               ttyS0

The Linux distribution also provides a set of utilities that can access the I2C port directly from the command line. They are i2cdetect, i2cdump, i2cget and i2cset.

First, let’s recall the basic concepts of the I2C protocol. On an I2C bus there are master and slave devices. Each slave device has a 7-bit slave address that is sent at the start of every read or write transfer. A read or write transfer can be single-byte or multi-byte. Each byte is acknowledged by the receiver.

Figure: Detailed signaling on an I2C bus

hps-i2c-signal

Figure: I2C write operation (master to slave)

hps-i2c-write

Figure: I2C read operation (slave to master)

hps-i2c-read

Most chips supporting an I2C interface, including the ADXL345, define a layer of indexed registers on top of the I2C transfer protocol. The registers on the chip can be logically addressed using an extra byte in each I2C transfer.

  • To write to chip register N, the I2C protocol performs a write to the I2C slave of two bytes: the value N, followed by the value to write in register N.

  • To read from chip register N, the I2C protocol performs a write to the I2C slave of a byte with value N, followed by a read from the I2C slave of a byte.
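The two rules above can be made concrete with a small sketch that builds the bytes involved. The helper names are hypothetical. The sketch relies on the standard I2C convention that the first byte of a transfer carries the 7-bit slave address in its upper bits and the read/write flag in bit 0 (1 for read, 0 for write). With the Linux driver this address byte is generated for you, so the application only supplies the payload bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* First byte on the bus: 7-bit slave address plus the R/W flag in bit 0. */
uint8_t i2c_address_byte(uint8_t addr7, bool read) {
    assert(addr7 < 0x80);
    return (uint8_t)((addr7 << 1) | (read ? 1 : 0));
}

/* Payload of a register write: register index N, then the new value.
   Returns the number of payload bytes. */
size_t reg_write_payload(uint8_t reg, uint8_t value, uint8_t out[2]) {
    out[0] = reg;
    out[1] = value;
    return 2;
}

/* Payload of the write phase of a register read: just the index N.
   The value is then fetched with a separate one-byte read. */
size_t reg_read_payload(uint8_t reg, uint8_t out[1]) {
    out[0] = reg;
    return 1;
}
```

For the slave address 0x53 used later in this lecture, the address byte is 0xA6 for a write transfer and 0xA7 for a read transfer.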

The following tools are available on the board to directly access the I2C interface. The manual pages should be consulted for command line usage.

Tool        Purpose
i2cdetect   Find chips on an I2C bus
i2cdump     Scan chip registers on an I2C bus
i2cset      Write bytes on an I2C bus
i2cget      Read bytes from an I2C bus

The following are some examples. Notice the warning message.

# list the I2C devices
root@socfpga:~# i2cdetect -l
i2c-0   i2c             Synopsys DesignWare I2C adapter         I2C adapter
i2c-1   i2c             Synopsys DesignWare I2C adapter         I2C adapter

# Read the device ID (register index 0) from I2C bus i2c-0, 
# I2C slave address 0x53
# First, set the register index 
root@socfpga:~# i2cset -y 0 0x53 0x0
# Then, read the byte
root@socfpga:~# i2cget -y 0 0x53
0xe5

# Read 6 bytes from I2C bus i2c-0, I2C slave address 0x53, 
# starting at register index 0x32
root@socfpga:~# i2cdump -r 0x32-0x37 0 0x53
No size specified (using byte-data access)
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-0, address 0x53, mode byte
Probe range limited to 0x32-0x37.
Continue? [Y/n] y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
30:       00 00 00 00 00 00                              ......

In the following, we will use the I2C device driver to access the ADXL345 accelerometer on the board.

Example 2: Accelerometer

The ADXL345 accelerometer chip is integrated on the DE1-SoC on the I2C port driven by the HPS.

The challenge for software running on Linux is that operating the I2C peripheral requires control of its memory-mapped registers, which an application cannot access directly. Instead, the OS provides a device driver that accesses the I2C peripheral on behalf of the application.

The hardware is visible in Linux through a series of devices, accessible in the /dev directory of the file system. There are two I2C devices in the Linux system running on the DE1-SoC:

root@socfpga:~# ls /dev/i2c*
/dev/i2c-0  /dev/i2c-1

These devices can be opened as files and read/written by byte streams, using open, close, write, read and ioctl system library calls on the Linux OS.

The I2C slave addressing of the device is integrated into the I2C device driver. The user does not need to deal with I2C device addressing or with I2C master/slave handshakes. The user only handles the byte streams to and from the ADXL345.

Figure: Accessing the ADXL345 registers through I2C device driver

hps-i2c-hier

To access the ADXL345, one opens the device as a file, and sets the device to control a slave of a selected address. In this case, we use the slave address for the ADXL345 (0x53).

  int file;
  const char *filename = "/dev/i2c-0";
  uint8_t id;
  bool bSuccess;
  const int mg_per_digi = 4;
  uint16_t szXYZ[3];
  int cnt=0, max_cnt=0;

  // open bus
  if ((file = open(filename, O_RDWR)) < 0) {
    perror("Failed to open the i2c bus of gsensor");
    exit(1);
  }

  // init
  // gsensor 7-bit i2c address: 101_0011 (0x53)
  int addr = 0x53;
  if (ioctl(file, I2C_SLAVE, addr) < 0) {
    printf("Failed to acquire bus access and/or talk to slave.\n");
    exit(1);
  }

Figure: I2C address of the ADXL345 (from the datasheet)

hps-i2c

Once the file handle of the I2C device is opened, one can simply read and write bytes to it. The device driver will take care of formatting the data according to the proper I2C format. The ADXL345 is accessed using single-byte read/write and multi-byte read operations.

bool ADXL345_REG_WRITE(int file, uint8_t address, uint8_t value){
  bool bSuccess = false;
  uint8_t szValue[2];

  // payload: register index, followed by the value to store in it
  szValue[0] = address;
  szValue[1] = value;
  if (write(file, szValue, sizeof(szValue)) == sizeof(szValue)){
    bSuccess = true;
  }

  return bSuccess;
}

bool ADXL345_REG_READ(int file, uint8_t address,uint8_t *value){
  bool bSuccess = false;
  uint8_t Value;

  // write the register index to select the register
  if (write(file, &address, sizeof(address)) == sizeof(address)){

    // read back value
    if (read(file, &Value, sizeof(Value)) == sizeof(Value)){
      *value = Value;
      bSuccess = true;
    }
  }

  return bSuccess;
}

bool ADXL345_REG_MULTI_READ(int file, uint8_t readaddr,uint8_t readdata[], uint8_t len){
  bool bSuccess = false;

  // write the index of the first register to read
  if (write(file, &readaddr, sizeof(readaddr)) == sizeof(readaddr)){
    // read back value
    if (read(file, readdata, len) == len){
      bSuccess = true;
    }
  }
  return bSuccess;
}

Figure: I2C access of the ADXL345 (from the datasheet). The red boxes contain data bytes provided as input to the device driver; the device driver handles the I2C protocol.

hps-i2c-access

The ADXL345 registers of interest can be found in the ADXL345 datasheet. For example, consider the reading of XYZ acceleration data, which is a multi-byte read operation. In the C application, this is programmed as follows:

bool ADXL345_XYZ_Read(int file, uint16_t szData16[3]){
    bool bPass;
    uint8_t szData8[6];
    bPass = ADXL345_REG_MULTI_READ(file, 0x32, (uint8_t *)&szData8, sizeof(szData8));
    if (bPass){
        szData16[0] = (szData8[1] << 8) | szData8[0];
        szData16[1] = (szData8[3] << 8) | szData8[2];
        szData16[2] = (szData8[5] << 8) | szData8[4];
    }

    return bPass;
}

Hence, this is a multi-read starting at ADXL register address 0x32. In total, 6 bytes are read from the I2C slave. From the ADXL345 register map, shown in the data sheet, we can find that these registers contain the acceleration data for the X, Y and Z direction.

Figure: Register map of the ADXL345. The acceleration registers are located starting at address 0x32.

hps-i2c-adxl

Similarly, there is an ADXL345 initialization function, which enables the chip and configures it using I2C programming:

bool ADXL345_Init(int file){
    bool bSuccess;

    // +- 2g range, 10 bits
    bSuccess = ADXL345_REG_WRITE(file, 
      ADXL345_REG_DATA_FORMAT, XL345_RANGE_2G | XL345_FULL_RESOLUTION);

    //Output Data Rate: 50Hz
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file, 
          ADXL345_REG_BW_RATE, XL345_RATE_50); // 50 HZ
    }

    //INT_Enable: Data Ready
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file, 
          ADXL345_REG_INT_ENALBE, XL345_DATAREADY);
    }

    // stop measure
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file, 
          ADXL345_REG_POWER_CTL, XL345_STANDBY);
    }

    // start measure
    if (bSuccess){
        bSuccess = ADXL345_REG_WRITE(file, 
          ADXL345_REG_POWER_CTL, XL345_MEASURE);
    }

    return bSuccess;
}
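The listing relies on symbolic names from the project's ADXL345 header, which is not reproduced here. The following sketch shows plausible definitions. The values are assumptions taken from the ADXL345 datasheet register map, and the spelling of the names simply follows the listing above. Three of the values can be cross-checked against the i2cset commands at the end of this section, which write 0x8 to register 0x31, 0x9 to register 0x2C and 0x8 to register 0x2D.

```c
#include <assert.h>
#include <stdint.h>

/* Register indices (assumed, from the ADXL345 register map) */
#define ADXL345_REG_BW_RATE      0x2C
#define ADXL345_REG_POWER_CTL    0x2D
#define ADXL345_REG_INT_ENALBE   0x2E  /* name spelled as in the listing */
#define ADXL345_REG_DATA_FORMAT  0x31

/* Field values (assumed) */
#define XL345_RANGE_2G           0x00  /* DATA_FORMAT: range bits = 00 */
#define XL345_FULL_RESOLUTION    0x08  /* DATA_FORMAT: FULL_RES bit */
#define XL345_RATE_50            0x09  /* BW_RATE: 50 Hz output data rate */
#define XL345_STANDBY            0x00  /* POWER_CTL: Measure bit cleared */
#define XL345_MEASURE            0x08  /* POWER_CTL: Measure bit set */
#define XL345_DATAREADY          0x80  /* INT_ENABLE: DATA_READY bit */

/* Compile-time cross-check against the value used with i2cset. */
static_assert((XL345_RANGE_2G | XL345_FULL_RESOLUTION) == 0x08,
              "DATA_FORMAT value should match the i2cset recap");

/* Value written to the DATA_FORMAT register by ADXL345_Init. */
uint8_t adxl345_data_format_value(void) {
    return (uint8_t)(XL345_RANGE_2G | XL345_FULL_RESOLUTION);
}
```

If the header that ships with the project defines different values, that header takes precedence over this sketch.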

The overall application can be compiled and run just as the hello world application.

cd hps_gsensor
make

# the following commands will appear:
arm-linux-gnueabihf-gcc \
    -g -Wall  -Dsoc_cv_av \
    -Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/soc_cv_av   \
    -Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/ \
     -c main.c -o main.o
arm-linux-gnueabihf-gcc \
     -g -Wall  -Dsoc_cv_av \
     -Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/soc_cv_av \
     -Ic:/intelFPGA_lite/18.1/embedded/ip/altera/hps/altera_hps/hwlib/include/  \
      -c ADXL345.c -o ADXL345.o
arm-linux-gnueabihf-gcc \
      -g -Wall \
       main.o ADXL345.o -o gsensor

# copy the executable to the board:
scp -P 50444 gsensor root@172.29.42.57:/home/root

# In a terminal connected to HPS, run the application
./gsensor

# the following output will appear:
root@socfpga:~# ./gsensor
===== gsensor test =====
id=E5h
[1]X=-16 mg, Y=-56 mg, Z=1008 mg
[2]X=-16 mg, Y=-68 mg, Z=1060 mg
[3]X=-8 mg, Y=-60 mg, Z=1052 mg
[4]X=-16 mg, Y=-64 mg, Z=1056 mg
[5]X=-12 mg, Y=-68 mg, Z=1064 mg
[6]X=-16 mg, Y=-64 mg, Z=1060 mg
...

We can also access the accelerometer using the i2c-tools discussed above. Let’s recap the I2C programming sequence done by the gsensor application and formulate that sequence using the i2c-tools package.

  • Set the resolution and data format to +-2G (write 0x8 to ADXL345 register 0x31)
  • Set the conversion rate to 50Hz (write 0x9 to ADXL345 register 0x2C)
  • Set the power control register to measure (write 0x8 to register 0x2D)
  • Read the conversion result (read 6 bytes from register 0x32 and following)

root@socfpga:~# i2cset -y 0 0x53  0x31 0x8
root@socfpga:~# i2cset -y 0 0x53  0x2c 0x9
root@socfpga:~# i2cset -y 0 0x53  0x2d 0x8
root@socfpga:~# i2cdump -r 0x32-0x37 0 0x53
No size specified (using byte-data access)
WARNING! This program can confuse your I2C bus, cause data loss and worse!
I will probe file /dev/i2c-0, address 0x53, mode byte
Probe range limited to 0x32-0x37.
Continue? [Y/n] y
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f    0123456789abcdef
30:       fe ff ed ff 09 01                              ?.?.??
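The six bytes in this dump can be converted to acceleration values by hand. Each axis is a 16-bit two's complement number stored low byte first, and the application scales it by 4 mg per count (the mg_per_digi constant in the code above). The sketch below, with hypothetical helper names, performs that conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Combine low and high byte into a signed 16-bit sample. */
int32_t adxl345_sample(uint8_t lo, uint8_t hi) {
    int32_t v = ((int32_t)hi << 8) | lo;
    if (v & 0x8000)
        v -= 0x10000;  /* sign-extend the two's complement value */
    return v;
}

/* Scale a raw sample to milli-g. */
int32_t adxl345_to_mg(int32_t sample, int32_t mg_per_count) {
    assert(mg_per_count > 0);
    return sample * mg_per_count;
}
```

For the dump above (fe ff ed ff 09 01) this gives X = -2 counts = -8 mg, Y = -19 counts = -76 mg and Z = 265 counts = 1060 mg, which is in line with the gsensor output shown earlier for a board at rest.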

Measuring Performance

Another important aspect of low-level interfacing and hardware/software codesign is the measurement of time. The ARM processor provides a low-level performance measurement infrastructure in the form of a 64-bit continuous timer. This timer can be accessed through the Linux Performance Monitoring API, perf_event_open.

The performance counter infrastructure is accessed through a system call that returns a file descriptor, which can be used to read the current timestamp. The following code creates functions to initialize the counter infrastructure. The functions have constructor and destructor attributes, so that they are called when the program first starts and when the program completes, respectively. The cpucycles() function can be called to read the timestamp counter.

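#include <linux/perf_event.h>   // struct perf_event_attr, PERF_* constants
#include <sys/syscall.h>        // __NR_perf_event_open
#include <unistd.h>             // syscall, read, close

static int fddev = -1;
__attribute__((constructor)) static void init(void) {
        static struct perf_event_attr attr;   // static storage: zero-initialized
        attr.type = PERF_TYPE_HARDWARE;
        attr.size = sizeof(attr);
        attr.config = PERF_COUNT_HW_CPU_CYCLES;
        // pid = 0 (this process), cpu = -1 (any cpu), group_fd = -1, flags = 0
        fddev = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}

__attribute__((destructor)) static void fini(void) {
        close(fddev);
}

static inline long long cpucycles(void) {
        long long result = 0;
        // the cast avoids an unsigned comparison when read() returns -1
        if (read(fddev, &result, sizeof(result)) < (ssize_t)sizeof(result)) return 0;
        return result;
}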

Time measurement now proceeds as before, by calling cpucycles. However, the Linux processing environment adds a significant level of overhead and noise. Hence, both the accuracy and the precision of the time measurement suffer. The following is an example program.

int main() {
  unsigned i, k;
  volatile unsigned j;
  long long time_start = 0;
  long long time_end   = 0;
  
  printf("time overhead = ");
  for (k=0; k<10; k++) {
    time_start = cpucycles();
    time_end   = cpucycles();
    printf("%lld ", time_end - time_start);
  }
  printf("\ntime delta = ");
  for (k=0; k<10; k++) {
    time_start = cpucycles();
    j=0;
    for (i=0; i<100000; i++)
      j += i;
    time_end   = cpucycles();
    printf("%lld ", time_end - time_start);
  }
  printf("\n");
  
  return 0;
}

It can be compiled and copied to the board as follows:

$ make
arm-linux-gnueabihf-gcc -g -Wall -Dsoc_cv_av -I/ip/altera/hps/altera_hps/hwlib/include/soc_cv_av -I/ip/altera/hps/altera_hps/hwlib/include/ -c main.c -o main.o
arm-linux-gnueabihf-gcc -g -Wall main.o -o cyclecount
arm-linux-gnueabihf-objdump -D cyclecount > cyclecount.lst

# replace IP_ADDRESS_OF_BOARD below with the IP address of your board
$ scp cyclecount root@IP_ADDRESS_OF_BOARD:/home/root
root@IP_ADDRESS_OF_BOARD's password:
cyclecount                                                                     100%   10KB 716.8KB/s   00:00

The output of this program on the board is as follows:

time overhead = 2769 2104 1473 1442 1453 1466 1464 1414 1432 1403
time delta = 932513 901704 901538 901444 901446 901466 901486 901479 921746 901588

There are several causes for the variation, including caching, scheduling within the Linux kernel, and overhead from hardware events. The overhead is significant (about 1400 cycles) and the noise is a couple of percent (e.g., 20000 cycles out of 900000 cycles). Therefore, when measuring performance in an environment like this, we will measure the steady state by repeating the measurement ten times. Furthermore, we will take the median of the set of collected measurements, so that we are likely to measure a ‘typical’ case.
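The median can be computed with a few lines of C. This is a sketch with a hypothetical helper name, not the measurement harness used in the course. For an even number of samples it takes the midpoint of the two middle values, rounded down.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SAMPLES 64

/* Comparison function for qsort on long long values. */
static int cmp_ll(const void *a, const void *b) {
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (1 <= n <= MAX_SAMPLES); the input is not modified. */
long long median_ll(const long long *samples, size_t n) {
    long long sorted[MAX_SAMPLES];

    assert(n >= 1 && n <= MAX_SAMPLES);
    memcpy(sorted, samples, n * sizeof sorted[0]);
    qsort(sorted, n, sizeof sorted[0], cmp_ll);

    if (n % 2 == 1)
        return sorted[n / 2];

    /* even count: midpoint of the two middle values */
    long long lo = sorted[n / 2 - 1];
    long long hi = sorted[n / 2];
    return lo + (hi - lo) / 2;
}
```

Applied to the ten samples printed above, this yields 1458 cycles for the measurement overhead and 901512 cycles for the loop. Both values ignore the outliers at the start of each run.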

Instead of counting cycles, we can count various other events:

  • PERF_COUNT_HW_INSTRUCTIONS (Retired instructions)

time overhead = 885 876 837 812 815 807 813 814 812 814
time delta = 1200896 1200847 1200838 1200838 1200833 1205092 1200839 1200836 1200838 1200836

  • PERF_COUNT_HW_CACHE_MISSES (Last level cache misses)

time overhead = 4 1 0 0 0 0 0 0 0 0
time delta = 2 0 0 0 0 108 0 0 0 0

Various other performance counters are described in the perf_event_open manpage.