Introduction

We start with the last platform that we will study in the course: the hard ARM processor on the Cyclone V FPGA. The ‘hard’ terminology means that there is an ARM processor implemented directly in the silicon of the FPGA. Unlike the Nios-II, which is configured in lookup table logic in the FPGA fabric, this ARM is always there. The Quartus documentation calls this ARM the ‘HPS’, the Hard Processor System.

The HPS includes many more features besides the ARM processors. In fact, it includes a complete set of peripherals, on-chip memories, and memory controllers. The HPS can operate by itself, and can boot an embedded Linux operating system, without configuring the FPGA. Naturally, such complexity does not come for free. The following image shows the relative size of a Nios-II processor system with 64 Kbyte of memory and the hard processor system in the Cyclone V FPGA of the DE1-SoC kit. While the distances in this figure do not represent precise physical sizes, you still get the notion that the HPS takes up roughly one fourth of the FPGA.

In this lecture, we will discuss the basic features of the HPS, the benefits of using Linux to develop applications, and the effect of running Linux on the memory hierarchy. In particular, we will discuss how this impacts the operation of memory-mapped hardware. We’ll discuss the example of an accelerometer integrated onto the DE1-SoC board, and addressed from within a C application running on Linux.

Figure: Hard Processor System in FPGA fabric with Nios-II with CRC

hps-fabric

The ARM Hard Processor System

The ARM Hard Processor System is documented in the Cyclone V Volume 3 Device Handbook: Hard Processor System.

The following figure, taken from the Hard Processor System documentation, shows a block diagram of the HPS. The three blue areas correspond to the FPGA fabric (top), the processor interconnect (left), and the dual-ARM processors themselves (right).

The bus interconnect system is sophisticated. It consists of several different layers (L3, L2) with the ARM Cortex A9 cores at the center. The ARM runs much faster than the FPGA fabric. By default, the clock speed of the ARMs is 800 MHz, 16 times faster than the default 50 MHz clock rate of the DE1-SoC FPGA.

The MPU subsystem contains a dual-core Cortex A9 processor. Each A9 has an 8-stage out-of-order speculative pipeline that achieves 2.5 DMips/MHz/Core. Compare that to the Nios-II processor, which has 0.9 DMips/MHz at the most complex configuration (Nios-II/f), and only 0.1 DMips/MHz at the most economical configuration (Nios-II/e). The A9 core also supports Neon SIMD extensions, which enable datapath processing of 128 bits in parallel. The A9 core also has a floating point hardware accelerator.

Each processor core has a 32 Kbyte instruction cache and a 32 Kbyte data cache. This is the L1 (level 1) cache, and it’s not shown on the figure. The figure shows a cache coherency controller (SCU and ACP; Snoop Control Unit and Accelerator Coherency Port). Requests that leave the L1 caches go to a shared L2 (level 2) cache of 512 Kbyte.

Figure: Hard Processor System Block Diagram

hps-arch

The interconnect beyond the level 2 cache consists of the Level 3 interconnect switches. These switches connect masters and slaves in such a manner that multiple master-slave connections can co-exist at the same time.

The HPS integrates four different memory controllers. In addition, there is a 64 KByte on-chip RAM and a 64 KByte Boot ROM.

  • The SDRAM controller enables access to off-chip DDR3 dynamic RAM memory. The DE1-SoC board has 1 GByte of off-chip memory accessible through this SDRAM controller.
  • The HPS has a NAND Flash controller for off-chip flash memory access (not used on DE1-SoC).
  • The HPS has an SPI Flash controller for off-chip serial flash memory access (not used on DE1-SoC).
  • The HPS has a Secure Digital Memory Card (SD/MMC) controller. The DE1-SoC board can use this to store the file system of an embedded Linux OS.

The HPS integrates a slew of peripherals:

  • Several system control modules for clock and reset
  • Debug support
  • Timers (periodic and watchdog)
  • Direct Memory Access Engine
  • Ethernet Interface
  • USB Interface
  • UART
  • CAN bus Controller
  • SPI master and slave
  • GPIO

The interface between the HPS and the FPGA fabric deserves a separate discussion.

The FPGA Manager enables the configuration of the FPGA from within the ARM processor. That is, the FPGA Manager can serve the same role as the USB Blaster port on the DE1-SoC board. This feature allows an FPGA to be remotely configured.

The dual high-speed AXI buses enable high-speed data transfer between the FPGA hardware and the ARM HPS. There are two bridges, one that allows the FPGA to be a master, and the other that allows the ARM to be a master. Thus, it is possible for the FPGA hardware to directly access the memory space of the Hard Processor System. The data width of these buses is 64 bit by default, but can be reconfigured to 128 bit.

A second, lightweight bus, consisting of a 32-bit AXI bus from the ARM (master) to the FPGA fabric (slave), is used for lower-performance (and simpler) bus transactions.

Additionally, the FPGA fabric has six master interfaces directly to the SDRAM controller, with a configurable data width up to 256 bit per transfer.

Figure: Hard Processor System Interface to FPGA fabric

hps-fpga

Running Software Applications on HPS

Next, we look into software development for the HPS.

System Connectivity

The following figure illustrates three relevant interfaces for the HPS to run software applications.

  • The SD/MMC interface fits an SD/MMC memory card with a Linux kernel and a root file system. It is used to boot an embedded Linux system on the board.
  • The Ethernet port is used for network connectivity. Once Linux has booted and the network interface is properly configured, one can log in remotely to the HPS system using ssh.
  • The UART-USB is used as a console connection for the Linux OS. Once the system has booted, one can log in with a console terminal.

Figure: System integration of Hard Processor System

hps-connect

Bare Metal Software versus Linux Application

When we ran applications on the MSP-430 or the Nios-II, we ran so-called ‘bare-metal’ applications. These applications run a single C program that can directly access the entire 64K address space (for the MSP-430) or the entire 4G address space (for the Nios-II). The following lists some of the key differences between running applications on a bare-metal system such as MSP-430 and Nios-II, and on an OS-driven system such as HPS.

                | MSP-430                  | Nios-II                      | ARM A9 (HPS)
----------------|--------------------------|------------------------------|-------------------------------------------------------------
Processor       | Softcore MSP430F1x       | Softcore Nios-II             | Hardcore ARM v7
Word/Address    | 16 bit / 64K             | 32 bit / 4G                  | 32 bit / 4G
Speed           | 50 MHz                   | 50 MHz                       | 800 MHz
Memory          | no cache, on-chip memory | no cache, on-chip memory     | 2-level cache (L1=32K/32K, L2=512K), 64K on-chip, 1G off-chip
Software        | bare-metal               | bare-metal                   | Linux
Connectivity    | Debug UART + User UART   | JTAG UART                    | USB UART or Ethernet
Compiler        | msp430-elf-gcc           | nios2-elf-gcc                | arm-linux-gnueabihf-gcc
Loader          | Download Debug UART      | Download JTAG nios2-download | Add file in root file system and execute from command line
HW/SW Interface | Peripheral Bus           | Avalon Bus                   | HPS-to-FPGA Bridge


On the HPS system we will run an embedded OS configured on the SD card. When the HPS boots, it will load a Linux kernel from the SD card into SDRAM and boot Linux. In addition, there is a root file system on the SD card that holds user data files and binaries (the images of user applications).

Once Linux runs, it allows multiple applications to co-exist as processes. Each of these processes operates as an independent thread of control. Every application has at least one process, but not all processes are user applications. About 50 processes co-exist on a freshly booted HPS. You can list them from the HPS command line after login:

root@socfpga:~# ps
  PID USER       VSZ STAT COMMAND
    1 root      1316 S    init [5]
    2 root         0 SW   [kthreadd]
    3 root         0 SW   [ksoftirqd/0]
    4 root         0 SW   [kworker/0:0]
    5 root         0 SW<  [kworker/0:0H]
    6 root         0 SW   [kworker/u4:0]
    7 root         0 SW   [migration/0]
    8 root         0 SW   [rcu_bh]
    9 root         0 SW   [rcu_sched]
   10 root         0 SW   [migration/1]
   ...
   46 root         0 SW   [kjournald]
  123 daemon    1460 S    /sbin/portmap
  142 root      3592 S    /usr/sbin/sshd
  146 root      1660 S    /sbin/syslogd -n -O /var/log/messages
  149 root      1660 S    /sbin/klogd -n
  153 root      1964 S    /usr/sbin/lighttpd -f /etc/lighttpd.conf
  157 root     10628 S    /www/pages/cgi-bin/scroll_server
  173 root      1564 S    /sbin/getty 38400 tty1
  219 root      2552 S    -sh
  224 root      1948 R    ps

The Linux kernel takes care of multiplexing the hardware resources between the processes (including user applications). These hardware resources do not only include the processor but also memory and hardware peripherals.

The Linux kernel uses two mechanisms to enable multiple applications to co-exist without interfering with each other. The first is Virtual Memory, which allows each user process to live in its own virtual memory space, independent from the real memory space. The second is a differentiation of privilege between the kernel, with its low-level hardware access mechanisms, and the user application.

The consequence of this design is that applications cannot directly access the hardware because they don’t see the hardware in their memory space, and because they don’t have the privilege to access the hardware.

There are two solutions to this problem which will be applicable to hardware/software codesign.

  1. Use a device driver for a particular hardware peripheral. A device driver builds a common software abstraction (such as a file or a stream) which enables an application to systematically access the hardware by ‘opening’, ‘writing’, ‘reading’ and ‘closing’ the peripheral.

  2. Use memory-mapping of virtual memory to real memory. An application can establish a mapping from its virtual address space to the real address space, such that hardware peripherals become directly visible. Setting up such a mapping requires sufficient privileges, after which the user application can access that hardware directly.

Configuring the DE1-SoC to run the Linux OS

The guide Using Linux on the DE1-SoC gives a good overview on the steps to install and run Linux on your DE1-SoC kit. For this course, please install the Linux Console image found on the Terasic website.

Step 1: Setting up the DE1-SoC kit to run Linux

  • Get hold of a MicroSD card to write a Linux image. A 4GB card is adequate.

  • Follow the steps in Section 2 (2.1 - 2.4) of the Using Linux on the DE1-SoC manual. Use the Linux Console Image from the Terasic website.

  • Plug the Micro SD card in the card socket. Also, connect a USB mini-B cable to the UART-USB outlet, and plug the other end into your laptop.

hps-de1soc

  • Power up the board and start a new MobaXterm session. If MobaXterm was already running, quit and restart it, so that the program picks up the UART-USB peripheral.

  • Start a new serial session in MobaXterm. MobaXterm should find a USB Serial Port if the USB-UART is properly connected and the board is powered on. Set the baudrate to 115200.

hps-mobaserial

  • You will be greeted with the login prompt for the Linux system. Use root (no password) to get a command line prompt.

Poky 8.0 (Yocto Project 1.3 Reference Distro) 1.3
 ttyS0

socfpga login: root
root@socfpga:~#

  • Press the HPS Reset Button (shown on the DE1-SoC photo above) to do a hard-reset on the board while leaving the terminal connected. You will see the boot log of the Linux system starting up on HPS.

U-Boot 2013.01.01 (Oct 24 2013 - 17:40:22)

CPU   : Altera SOCFPGA Platform
BOARD : Altera SOCFPGA Cyclone V Board
DRAM:  1 GiB
MMC:   ALTERA DWMMC: 0
In:    serial
Out:   serial
Err:   serial
Net:   mii0
Warning: failed to set MAC address

Hit any key to stop autoboot:  0
reading u-boot.scr
** Unable to read file u-boot.scr **
Optional boot script not found. Continuing to boot normally
reading zImage
3809104 bytes read in 1283 ms (2.8 MiB/s)
reading socfpga.dtb
17119 bytes read in 13 ms (1.3 MiB/s)
fpgaintf
ffd08028: 00000000    ....
fpga2sdram
ffc25080: 00000000    ....
axibridge
ffd0501c: 00000000    ....
## Flattened Device Tree blob at 00000100
   Booting using the fdt blob at 0x00000100
   Loading Device Tree to 03ff8000, end 03fff2de ... OK

Starting kernel ...

... (cutting many lines here)

Starting syslogd/klogd: done
Starting Lighttpd Web Server: lighttpd.
Starting blinking LED server
Stopping Bootlog daemon: bootlogd.

Poky 8.0 (Yocto Project 1.3 Reference Distro) 1.3
 ttyS0

socfpga login:

  • If you’ve made it to this point, you have booted Linux on the HPS. Good job!

Step 2: Configuring the Network (router method)

The next step is to configure the networking stack of your board so that you can log in through the Ethernet port rather than making use of the console. The first method is to use a (wireless) router such as the one you’re using for your home network.

Log in to the DE1-SoC and navigate to the /etc/network directory. This directory contains the file interfaces, which you can use to configure the IP address for your board.

Here is an example of that file for the board I am using. I rely on a static IP address because it makes the board address easy to remember, and the board will only be connected to your local LAN. Add the following configuration to /etc/network/interfaces, replacing the current eth0 configuration.

auto eth0
iface eth0 inet static
  address   192.168.10.50
  netmask   255.255.255.0
  network   192.168.10.0
  broadcast 192.168.10.255
  gateway   192.168.10.1
  dns-nameservers 198.82.247.98 198.82.247.66

Once you have configured the network, reboot the board and try to reach it from your laptop. First, see if you can ping it:

ping 192.168.10.50
PING 192.168.10.50 (192.168.10.50) 56(84) bytes of data.
64 bytes from 192.168.10.50: icmp_seq=1 ttl=64 time=2.04 ms
64 bytes from 192.168.10.50: icmp_seq=2 ttl=64 time=1.01 ms
64 bytes from 192.168.10.50: icmp_seq=3 ttl=64 time=1.08 ms
...

Next, see if you can log in to it using a secure shell connection:

ssh root@192.168.10.50

# you should see the following prompt:
root@socfpga:~#

You can now access the board directly through the network address 192.168.10.50.

Step 2-bis: Configuring the Network (Raspberry PI method)

The second method uses a Raspberry PI as a wireless router. The PI connects to the local wireless LAN, and then forwards traffic for the DE1-SoC board to the Ethernet connection.

Raspberry Ethernet Configuration

Add or change the following eth0 configuration in /etc/network/interfaces on your Raspberry PI. This IP configuration puts the PI Ethernet port on the same network as the DE1-SoC board. Thus, from the PI you can use the command ssh root@192.168.10.50 to connect to the HPS Linux system on the DE1-SoC.

# when used as router port
auto eth0
iface eth0 inet static
   address 192.168.10.1
   netmask 255.255.255.0
   network 192.168.10.0
   broadcast 192.168.10.255

Raspberry PI Wireless Configuration

To configure the PI to connect to a wireless network automatically, an easy method is to set up the wireless network configuration data in /etc/wpa_supplicant/wpa_supplicant.conf. The following is the configuration on the board I am using (redacted for sensitive data). This configuration connects the PI to a local wireless network with a plain pre-shared key, as well as to the eduroam network.

country=GB
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
  ssid="THE_SSID_OF_YOUR_NETWORK_GOES_HERE"
  psk="THE_NETWORK_SECURITY_KEY_GOES_HERE"
}

network={
  ssid="eduroam"
  scan_ssid=1
  key_mgmt=WPA-EAP
  eap=PEAP
  identity="YOUR_EMAIL_GOES_HERE"
  password=THE_HASH_OF_YOUR_EDUROAM_PASSWORD_GOES_HERE
  phase2="MSCHAPV2"
}

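The password for the eduroam configuration is a hash of the eduroam password: the MD4 digest of the password encoded as UTF-16LE. wpa_supplicant expects such a hash to be written with a hash: prefix, followed by the 32 hexadecimal digits. The digest is computed as follows: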

echo -n password_here | iconv -t utf16le | openssl md4

Raspberry PI auto-emailing

It is useful for the PI to tell you what IP address it has picked up from the wireless network, so that you can log in to the PI over the network.

Do this as follows. First, install mailer packages on the PI.

sudo apt-get install ssmtp mailutils

Next, enable SSH so that you can log in remotely. Use sudo raspi-config and select ‘connections’ to enable SSH login.

Third, configure the mailer in /etc/ssmtp/ssmtp.conf. Note that this configuration stores your email credentials in a configuration file. Consider using a throw-away email address in place of your official work address.

# The place where the mail goes. The actual machine name is required no
# MX records are consulted. Commonly mailhosts are named mail.domain.com
mailhub=smtp.gmail.com:587

# The full hostname
hostname=raspberrypi
AuthUser=YOUREMAILADDRESS
AuthPass=YOUREMAILPASSWORD
FromLineOverride=YES
UseSTARTTLS=YES

Finally, create a script /home/pi/mailip.sh that emails IP configuration data.

#!/bin/sh

/sbin/ifconfig | /usr/bin/mail -s "Raspberry Hello" your-email@vt.edu

Then auto-run the script 60 seconds after each reboot of the Raspberry PI by adding the following line to your crontab (crontab -e).

@reboot sleep 60 && /home/pi/mailip.sh

When you power-cycle your Raspberry PI, you will now receive an email with its newest IP address. This is very useful in networks with dynamic IP address assignment.

Raspberry PI port forwarding

The final step on the Raspberry PI is to use port forwarding. This allows you to send your bitstreams and ARM executables directly to the DE1-SoC without having to log in to the PI.

We will forward port 50444 on the PI to port 22 on the DE1-SoC. Port 22 is commonly used for SSH connections.

Copy the following commands into a text file (or script) on the PI:

iptables --flush
iptables --table nat --flush
iptables --delete-chain
iptables --table nat --delete-chain
iptables --table nat --append POSTROUTING --out-interface wlan0 -j MASQUERADE
iptables --append FORWARD --in-interface wlan0 -j ACCEPT
echo 1 > /proc/sys/net/ipv4/ip_forward

# for DE1SOC (socfpga)
iptables -t nat -A PREROUTING -p tcp -i wlan0 --dport 50444 -j DNAT --to 192.168.10.50:22
iptables -t nat -A POSTROUTING -p tcp --dport 22 -j MASQUERADE

To enable forwarding, simply source this text file on the PI (while logged in as root). Afterwards, you can test the connection from your laptop with a command such as the following:

# Use SCP to copy hello executable to board
scp -P 50444 hello root@PI_IP_ADDRESS_HERE:/home/root

Compiling software

The software development environment for HPS applications is the ARM DS-5 software package, which is integrated in the SoC EDS. To compile programs, make sure that the compiler can be found on the command line. You can either run a SoC EDS Command Shell from the start menu, or you can add the path to the C compiler to your Cygwin PATH variable.

export PATH=$PATH:/cygdrive/c/Program\ Files/DS-5\ v5.25.0/sw/gcc/bin

All the examples for this lecture are in the example-hps-hello repository. In a Cygwin shell, download these examples using git.

git clone https://github.com/vt-ece4530-f19/example-hps-hello

Example 1: Hello World!

We start by running the most popular application to illustrate applications on the DE1-SoC HPS.

First, compile the application.

cd helloworld
make
# you will see the following commands:
# arm-linux-gnueabihf-gcc -g -Wall main.c -o hello
# arm-linux-gnueabihf-objdump -D hello >hello.lst

Once you have a binary, you can inspect it using the usual tools. The makefile already uses arm-linux-gnueabihf-objdump to create an assembly listing. Open hello.lst and look for the main function to find the following.

000083cc <main>:
    83cc:       b580            push    {r7, lr}
    83ce:       af00            add     r7, sp, #0
    83d0:       f248 4034       movw    r0, #33844      ; 0x8434
    83d4:       f2c0 0000       movt    r0, #0
    83d8:       f7ff ef72       blx     82c0 <_init+0x20>
    83dc:       2300            movs    r3, #0
    83de:       4618            mov     r0, r3
    83e0:       bd80            pop     {r7, pc}
    83e2:       bf00            nop

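Notice that most of these instructions are only 16 bits wide. ARM processors can store the most common instructions in a compressed 16-bit format called Thumb. The Thumb-2 extension used here mixes in 32-bit encodings (such as movw, movt and blx in this listing) where a 16-bit encoding does not suffice. The 16-bit instructions save program memory space.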

Once the program is compiled, you can download it to the board.

scp -P 50444 hello root@ip_address:/home/root/hello

Next, on the board terminal, you can run the application.

./hello
# prints 'Hello, World'