Lecture 1 - Introduction

Introduction
The C program
The Linker Description File
Downloading the example C program
Compiling the C program
The Assembly Output
Conclusions

Introduction

The hardware/software codesign course consists of three parts, each of them discussing an embedded computing platform in detail. An embedded computing platform, in this context, consists at a minimum of a processor with instruction and data memory. In addition to this processor, you may also include additional hardware modules that perform specialized operations for the software. This concept is very common in microcontrollers, which use hardware peripheral modules to facilitate interfacing with the real world.

Each of the three platforms we will discuss has a different level of complexity and performance. The first one is the MSP-430, a 16-bit microcontroller. The second one is the NIOS-II, a 32-bit microcontroller. The last one is an ARM-A9, a 32-bit embedded processor.

The advantage of studying a small and simple implementation is that it’s possible to fully understand the hardware and software in all its gory details. This relates especially to the hardware/software interface, the place where flexibility (software) and specialization (hardware) meet.

In this lecture, we will study how a small C program is compiled for the MSP-430. The objective of compiling a C program is to convert that program into a list of instructions for the MSP-430 microcontroller. We will compile the C program for the MSP-430 on a standard laptop using a cross-compiler, a compiler that generates instructions for a processor other than the one it is running on. In this case, we will make use of msp430-gcc-opensource, a compiler that generates MSP-430 instructions but that runs on an X86 processor.

The C program

As with most embedded software, this C program does not use standard input/output. Instead, it makes use of a specialized input/output mechanism, implemented by assignments to special variables.

We focus on the variables P1OUT and P1DIR, and leave the WDTCTL assignment aside for now.
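P1OUT and P1DIR are used to interact with a parallel I/O port. P1DIR = 0xFF sets the direction of all eight port pins to output, while an assignment to P1OUT writes a value to the output port. In a tight loop, the bits of the output port are set and reset one at a time. Note that the inner loop exits as soon as k reaches 0x80, so only bits 0 through 6 are toggled.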

#include "omsp_system.h"

int main(void) {
  register unsigned int k = 1;

  P1DIR = 0xFF;              // Initialize for output
  WDTCTL = WDTPW | WDTHOLD;  // Disable watchdog timer
  eint();

  while (1) {
    k = 0x1;
    do {
      P1OUT |= k;
      P1OUT &= ~k;
      k <<= 1;
    } while (k != 0x80);
  }

  dint();

  return 0;
}
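
One way to see what the special variables do is to model them on a host machine. The following is a minimal sketch, not the contents of omsp_system.h: the array sim_mem, the macro SIM_REG8, and the names SIM_P1OUT and SIM_P1DIR are hypothetical stand-ins. The address 0x21 for P1OUT matches the address 33 that appears in the assembly listing later in this lecture; the address 0x22 for P1DIR is an assumption made for illustration only.

```c
#include <stdint.h>

/* Simulated byte-addressed memory; on the real MSP-430 these bytes
   would be hardware registers, not RAM. */
static volatile uint8_t sim_mem[256];

/* Hypothetical helper: treat one byte of sim_mem as an 8-bit register. */
#define SIM_REG8(addr) (sim_mem[(addr)])

#define SIM_P1OUT SIM_REG8(0x21) /* 0x21 = 33, as seen in the listing */
#define SIM_P1DIR SIM_REG8(0x22) /* assumed address, illustration only */

/* Run one pass of the inner do-while loop on the simulated port and
   return a mask of every bit that was set at some point. */
uint8_t pulse_pass(void) {
    unsigned int k = 0x1;
    uint8_t pulsed = 0;

    SIM_P1DIR = 0xFF;          /* all pins configured as outputs */
    do {
        SIM_P1OUT |= k;        /* set bit k */
        pulsed |= SIM_P1OUT;   /* record which bit is currently high */
        SIM_P1OUT &= ~k;       /* reset bit k */
        k <<= 1;
    } while (k != 0x80);

    return pulsed;
}
```

Calling pulse_pass() returns 0x7F: the loop pulses bits 0 through 6 and stops before bit 7, because the condition k != 0x80 fails once k reaches 0x80.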

An interesting question, from the perspective of hardware/software codesign, is to determine the speed of the while (1) loop. That is, to find out how quickly each bit is set and reset. The C program alone cannot answer that question. The C program only expresses the functionality of the application, but it ignores the timing characteristics.

The timing of the program depends on two factors.

First, it depends on the clock frequency of the MSP-430. The higher the clock rate, the faster instructions execute on the microcontroller. Therefore, the higher the clock rate, the faster the loop of the C program will spin.
Second, it depends on the number of instructions the MSP-430 executes for one iteration of the while loop. The C program does not reveal how many instructions are used, as that is decided by the compiler.

To determine the second factor, we have to look at the assembly output of the loop. The next step, then, is to cross-compile this program for the MSP-430.

The Linker Description File

When a cross-compiler converts a C program into instructions, it has to place each instruction in program memory, and it has to choose a proper address range for the different parts of the overall program. Indeed, in the world of embedded computing, the amount of memory, and the kind of memory (RAM, Flash, ..) is highly variable from one design to the next. For example, while one MSP-430 based microcontroller may have 16KByte of RAM, another may have 32KByte of RAM.

A Linker Description File is a file, often in textual format, that explains to the compiler how the memory is organized. For example, the following shows (part of) the linker description file for the example C program. The MEMORY section says that there is an SFR area containing 16 bytes and starting at address 0. There is also a RAM area containing 16 Kbytes and starting at address 512. Finally, there is a ROM area with (32 bytes less than) 24 Kbytes starting at address 40,960. Note that we used decimal notation in the text, while the linker description file uses hexadecimal.

MEMORY {
  SFR              : ORIGIN = 0x0000, LENGTH = 0x0010
  RAM (wx)         : ORIGIN = 0x0200, LENGTH = 0x4000
  ROM (rx)         : ORIGIN = 0xA000, LENGTH = 0x6000-0x20
  ...
}
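
To make the address ranges concrete, the three regions shown above can be written down as a small table and queried. This is only an illustrative sketch: the type names, the enum, and the function region_of are hypothetical, and the table covers only the three regions visible in the excerpt (the elided part of the MEMORY block is not modeled).

```c
#include <stddef.h>
#include <stdint.h>

/* Identifiers for the regions visible in the excerpt above. */
typedef enum { REGION_NONE, REGION_SFR, REGION_RAM, REGION_ROM } region_id_t;

typedef struct {
    region_id_t id;
    uint32_t origin;  /* first address of the region */
    uint32_t length;  /* size of the region in bytes */
} mem_region_t;

/* ORIGIN and LENGTH values copied from the MEMORY block. */
static const mem_region_t regions[] = {
    { REGION_SFR, 0x0000, 0x0010 },
    { REGION_RAM, 0x0200, 0x4000 },
    { REGION_ROM, 0xA000, 0x6000 - 0x20 },
};

/* Return the region that contains addr, or REGION_NONE if the address
   is outside all three regions listed here. */
region_id_t region_of(uint32_t addr) {
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        if (addr >= regions[i].origin &&
            addr - regions[i].origin < regions[i].length) {
            return regions[i].id;
        }
    }
    return REGION_NONE;
}
```

With these values, RAM covers addresses 0x0200 through 0x41FF and ROM covers 0xA000 through 0xFFDF, so an instruction address such as 0xA138 falls in ROM.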

A Linker Description File also contains another piece of information: it explains to the compiler what memory is to be used for what kind of program element. Indeed, the compiler is oblivious to how you would like to use RAM or ROM; it’s up to the programmer to explain that, e.g., the RAM area is to be used for the stack, and that the ROM area is to be used for instructions.

At the macro-level, the output of a compiler is organized in sections, and the different types of outputs fit into different sections. The most important section names commonly generated by a compiler are listed below.

Name	Purpose
`.text`	Instructions (opcodes)
`.data`	Initialized global variables
`.bss`	Uninitialized global variables
`.rodata`	Constants (read-only)
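
As a rough illustration of the table, the following sketch annotates each program element with the section it would typically end up in. The variable and function names are made up for this example, and the actual placement depends on the compiler and its options, so the comments describe the usual case rather than a guarantee.

```c
#include <stdint.h>

uint16_t step = 3;          /* initialized global: typically .data */
uint16_t total;             /* uninitialized global: typically .bss (starts at 0) */
const uint16_t limit = 10;  /* read-only constant: typically .rodata */

/* The instructions of this function typically go to .text. */
uint16_t accumulate(void) {
    if (total + step <= limit) {
        total += step;      /* modifies RAM (.bss), reads .data and .rodata */
    }
    return total;
}
```

Successive calls to accumulate() return 3, 6, 9 and then stay at 9, since adding another step would exceed limit.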

The Linker Description File then maps each section in the compiler output to a memory range. The following example shows how the text section is mapped into ROM; the bss section is mapped into RAM in the same way (not shown). There are many details in this syntax that we will ignore; the most important is to spot the notation .text : { ... } > ROM.

  .text :
  {
    . = ALIGN(2);
    PROVIDE (_start = .);
    KEEP (*(SORT(.crt_*)))
    *(.lowtext .text .stub .text.* .gnu.linkonce.t.* .text:*)
    KEEP (*(.text.*personality*))
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
    *(.interp .hash .dynsym .dynstr .gnu.version*)
    PROVIDE (__etext = .);
    PROVIDE (_etext = .);
    PROVIDE (etext = .);
    . = ALIGN(2);
    KEEP (*(.init))
    KEEP (*(.fini))
    KEEP (*(.tm_clone_table))
  } > ROM

Downloading the example C program

This example follows the same format as the other examples in this course. Examples are stored in a public repository in the GitHub organization for this course (vt-ece4530-f19). Before you can experiment with the examples, you first have to download them. To download the example, clone the git repository as follows.

git clone https://github.com/vt-ece4530-f19/example-portcnt

If you then inspect the files available in that directory, you will find the following:

$ ls example-portcnt
linker.msp430.x  main.c  makefile  omsp_system.h

We already discussed the C program (main.c) and the linker description file (linker.msp430.x). The compilation commands are stored in the makefile. If you installed the software tools for this course in a non-standard directory, you may have to manually adjust the makefile.

Compiling the C program

We are now ready to compile the program. Type make and you should see (a) compilation of the C program into an object file; (b) linking of the object file and (c) generation of a listing file at the level of assembly.

We break down the specific commands for each step. The compilation command converts a C program into an object file, a file that holds the opcodes for a program. The compilation step starts from a C program, main.c, and creates an object file, portcnt.o. Several flags provide specific guidelines for the compilation: -Wall enables warnings, -mmcu selects the specific type of microcontroller used, and -c ends the compilation when the object file is created. The object file, portcnt.o, is not a complete program. It only contains the code for the functions found in main.c; it does not contain the runtime environment (initialization code), an implementation of the library functions used, or functions stored in another C program.

/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-gcc  \
                                       -Wall  \
                                       -mmcu=msp430c1111 \
                                       -c \
                                       main.c -o portcnt.o

The second command is the linking command. This command takes the object file portcnt.o and creates the ELF file portcnt.elf. ELF stands for Executable and Linkable Format. This is a complete executable program that holds all the information needed to create an image of the program in MSP-430 program memory. The linker uses the linker description file to determine the address ranges to use for the different sections of the program (text, data, bss and so on). Like the previous command, various flags are used to configure the linker.

/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-gcc \
                           -mmcu=msp430c1111 \
                           -T linker.msp430.x \
                           portcnt.o \
                           -o portcnt.elf

The following two commands invoke two utility programs. msp430-elf-size reports the size of the sections of the executable in MSP-430 memory. In the output shown below, portcnt uses 128 bytes for instructions (text), 4 bytes for initialized global variables (data), and 4 bytes for uninitialized global variables (bss). The total memory needs are 136 bytes, which is 88 in hexadecimal notation.

The command immediately after that demonstrates the conversion of the ELF file into an image file, a replica of the program memory when the ELF program is loaded in MSP-430 memory.

/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-size portcnt.elf
   text    data     bss     dec     hex filename
    128       4       4     136      88 portcnt.elf
/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-objcopy -O ihex portcnt.elf portcnt.a43
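
The columns of the size report are related by simple arithmetic, which the following sketch spells out. The function names are hypothetical. The split into ROM and RAM usage assumes the usual arrangement in which the initial values of .data are stored in ROM and copied to RAM at startup; that arrangement is common but is an assumption here, not something the report itself states.

```c
#include <stdint.h>

/* "dec" column of the size report: sum of all three sections. */
uint32_t total_bytes(uint32_t text, uint32_t data, uint32_t bss) {
    return text + data + bss;
}

/* Bytes occupied in ROM, assuming .data initial values live in ROM. */
uint32_t rom_bytes(uint32_t text, uint32_t data, uint32_t bss) {
    (void)bss;              /* .bss needs no stored initial values */
    return text + data;
}

/* Bytes occupied in RAM at run time (stack not included). */
uint32_t ram_bytes(uint32_t text, uint32_t data, uint32_t bss) {
    (void)text;             /* instructions stay in ROM */
    return data + bss;
}
```

For the report above (text 128, data 4, bss 4), total_bytes gives 136, which is 0x88, matching the dec and hex columns. Under the stated assumption, the program would occupy 132 bytes of ROM and 8 bytes of RAM.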

The final command demonstrates how to generate a list of assembly instructions from the ELF file. objdump is a general-purpose tool to examine executable files and object files. The flag -d disassembles the instructions in the executable, the -S flag interleaves C source code with the assembly output, and -t dumps the symbol table for the executable. The result, portcnt.lst, is the file that we will investigate next.

/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-objdump \
                            -dSt \
                            portcnt.elf \
                            >portcnt.lst

The Assembly Output

It’s worthwhile to load portcnt.lst in an editor and analyze its contents. We will focus the analysis on one particular part of the C program, namely the innermost do-while loop inside of the main function. Let’s recall our objective. We want to determine how fast the while loop can execute, and for that we have to know the number of instructions that the processor requires to execute the loop.

    do {
      P1OUT |= k;
      P1OUT &= ~k;
      k <<= 1;
    } while (k != 0x80);

The following listing shows the implementation of this do-while loop. (In a later lecture, we will discuss some techniques to identify the structure of a C program in assembly code.) We can learn many things from studying this assembly code.

First, the assignments to P1OUT are implemented as byte-move instructions (mov.b) to address 33 (0x21). Thus, input/output ports are programmed as absolute memory locations.
Next, the assembly code as shown is not efficient. Notice, for example, that the address of P1OUT is loaded into register r12 four times in a single iteration. This is typical of un-optimized compiler output. Luckily, the compiler is able to produce more efficient code (that avoids such redundancies), and we will discuss later how to do this.
The port P1OUT is read as well as written during the loop. This is because a C expression such as P1OUT |= k requires reading P1OUT, then or-ing it with k, then writing it back to P1OUT.
The compiler is limited to instructions supported by the processor. For example, the MSP-430 microcontroller has no dedicated shift-left instruction. Therefore, a C statement such as k <<= 1 has to be implemented using other instructions. In this case, the compiler adds the variable to itself, effectively multiplying it by 2 (and therefore shifting it left by one position).

0000a138 <.L2>:
    a138:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a13c:	6d 4c       	mov.b	@r12,	r13	;
    a13e:	4e 4a       	mov.b	r10,	r14	;
    a140:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a144:	4d de       	bis.b	r14,	r13	;
    a146:	3d f0 ff 00 	and	#255,	r13	;#0x00ff
    a14a:	cc 4d 00 00 	mov.b	r13,	0(r12)	;
    a14e:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a152:	6d 4c       	mov.b	@r12,	r13	;
    a154:	4c 4a       	mov.b	r10,	r12	;
    a156:	7c e3       	xor.b	#-1,	r12	;r3 As==11
    a158:	4e 4c       	mov.b	r12,	r14	;
    a15a:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a15e:	4d fe       	and.b	r14,	r13	;
    a160:	3d f0 ff 00 	and	#255,	r13	;#0x00ff
    a164:	cc 4d 00 00 	mov.b	r13,	0(r12)	;
    a168:	0c 4a       	mov	r10,	r12	;
    a16a:	0c 5a       	add	r10,	r12	;
    a16c:	0a 4c       	mov	r12,	r10	;
    a16e:	3a 90 80 00 	cmp	#128,	r10	;#0x0080
    a172:	e2 23       	jnz	$-58     	;abs 0xa138
    a174:	e0 3f       	jmp	$-62     	;abs 0xa136
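
The read-modify-write pattern and the shift-by-addition noted above can be mirrored in plain C. This is a sketch for illustration, not compiler output: the variable port is a hypothetical stand-in for P1OUT, the function name is made up, and the comments only indicate which group of instructions in the listing each step roughly corresponds to.

```c
#include <stdint.h>

static volatile uint8_t port;  /* hypothetical stand-in for P1OUT */

/* One iteration of the loop body, written out step by step.
   Returns the next value of k. */
uint16_t loop_body_expanded(uint16_t k) {
    uint8_t tmp;

    tmp = port;             /* read the port        (mov.b @r12, r13)    */
    tmp |= (uint8_t)k;      /* or in the bit        (bis.b)              */
    port = tmp;             /* write the port back  (mov.b r13, 0(r12))  */

    tmp = port;             /* read the port again                       */
    tmp &= (uint8_t)~k;     /* clear the bit        (xor.b #-1, and.b)   */
    port = tmp;             /* write the port back                       */

    k = (uint16_t)(k + k);  /* k <<= 1 done as an addition (add r10, r12) */
    return k;
}
```

For a 16-bit k, adding k to itself gives the same result as k << 1, including the case where the top bit is shifted out.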

Overall, we see that the inner loop takes 21 instructions (addresses a138 to a172; the jmp at a174 closes the outer while (1) loop). Even if each instruction took only a single cycle, the loop would take at least 21 cycles per iteration. In practice, the round-trip time will be higher, since many MSP-430 instructions take multiple cycles to execute. We will discuss this in one of the coming lectures.
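
The two timing factors, clock frequency and instruction count, combine into a simple estimate. The sketch below is a back-of-the-envelope calculation under stated assumptions: the function name is hypothetical, the clock frequency passed in is whatever the platform actually uses (the 1 MHz in the usage note is only an example value), and a uniform number of cycles per instruction is a simplification that the real MSP-430 does not follow.

```c
#include <stdint.h>

/* Estimate how many inner-loop iterations complete per second.
   clock_hz:         processor clock frequency in Hz
   instr_per_iter:   instructions executed per iteration (21 in the listing)
   cycles_per_instr: assumed average cycles per instruction (simplification)
   Returns 0 if the inputs would cause a division by zero. */
uint32_t iterations_per_second(uint32_t clock_hz,
                               uint32_t instr_per_iter,
                               uint32_t cycles_per_instr) {
    uint32_t cycles_per_iter = instr_per_iter * cycles_per_instr;
    if (cycles_per_iter == 0) {
        return 0;
    }
    return clock_hz / cycles_per_iter;
}
```

As an example, with an assumed 1 MHz clock and one cycle per instruction, iterations_per_second(1000000, 21, 1) gives 47619 iterations per second, which is an upper bound. If the average were three cycles per instruction (again an assumed figure), the same call with 3 as the last argument gives 15873.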

Conclusions

This ends our very brief encounter with the MSP-430 microcontroller. We started from a C program and translated the program to assembly in order to investigate how much time a loop in C would take to execute. In hardware/software codesign, we care greatly about the low-level interactions between hardware and software. To identify performance bottlenecks of software, we have to understand in detail how the hardware operates. And to design hardware acceleration for slow software programs, we have to understand how the software and hardware interact.

In the coming lectures we will embark on a tour of the MSP-430 that includes a study of the MSP-430 architecture and its interaction mechanisms with hardware peripherals. We will design hardware acceleration modules to speed up the C software. The simplicity of the MSP-430 will help us to see a level of detail that would be unrealistic for a complex architecture (such as an X86).