Lecture 2 - The MSP-430 core

Introduction
- Roadmap of the coming lectures
MSP-430 Core

Introduction

Last lecture, we studied the MSP-430 cross-compiler using a small example C program. The cross-compiler transforms the program into instructions for the MSP-430 core. A single compiled file is an object file and several object files can be linked together into an executable file. Along the process of creating the executable file, we made several important observations.

Unlike a compiled program for an operating system such as Windows or Linux, a program for a small microcontroller must be designed in consideration of the memory organization for that microcontroller. In particular, we have to know the type and amount of memory, and we have to know where in the address space these memories are located. The compiler uses a special file, the linker description file, that explains what memory types are available, and that explains how compiler sections (data and instructions) should be mapped into that memory.
Predicting the execution time of a C program is not straightforward. A compiler will make a correct translation of a C program into an assembly program. However, that doesn’t mean that this assembly program is optimal or efficient. When we have to answer precise questions about the execution time of a program, we have to study the program at the level of machine instructions. In a simple architecture such as the MSP-430, this is often sufficient to determine the cycle count of the program. In complex architectures (such as out-of-order and superscalar processors), even knowing the precise sequence of instructions is not enough.

Studying software at the level of assembly, for a program where you only wrote the C code, may appear to be a complicated task. In practice it is easier than you might think, even for processors that are unfamiliar to you. The trick is to match up the control structure of the C program with that of the assembly listing generated from it. In C, the control structure is defined by constructs such as for loops and while loops. In assembly, it is defined by conditional and unconditional jumps such as jmp and jnz. Let’s illustrate that with an example. Here is the C program once again.

int main(void) {
  register unsigned int k = 1;

  P1DIR = 0xFF;              // initialize for output
  WDTCTL = WDTPW | WDTHOLD;  // Disable watchdog timer
  eint();

  while (1) {
    k = 0x1;
    do {
      P1OUT |= k;
      P1OUT &= ~k;
      k <<= 1;
    } while (k != 0x80);
  }

  dint();

  return 0;
}

And here is the assembly listing generated out of the program using a cross-compiler. Only the main function is shown; the complete listing portcnt.lst is much longer!

0000a11c <main>:
    a11c:	0a 12       	push	r10		;
    a11e:	5a 43       	mov.b	#1,	r10	;r3 As==01
    a120:	7c 40 22 00 	mov.b	#34,	r12	;#0x0022
    a124:	fc 43 00 00 	mov.b	#-1,	0(r12)	;r3 As==11
    a128:	3c 40 20 01 	mov	#288,	r12	;#0x0120
    a12c:	bc 40 80 5a 	mov	#23168,	0(r12)	;#0x5a80
    a130:	00 00 
    a132:	32 d2       	eint			
    a134:	03 43       	nop			

0000a136 <.L3>:
    a136:	5a 43       	mov.b	#1,	r10	;r3 As==01

0000a138 <.L2>:
    a138:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a13c:	6d 4c       	mov.b	@r12,	r13	;
    a13e:	4e 4a       	mov.b	r10,	r14	;
    a140:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a144:	4d de       	bis.b	r14,	r13	;
    a146:	3d f0 ff 00 	and	#255,	r13	;#0x00ff
    a14a:	cc 4d 00 00 	mov.b	r13,	0(r12)	;
    a14e:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a152:	6d 4c       	mov.b	@r12,	r13	;
    a154:	4c 4a       	mov.b	r10,	r12	;
    a156:	7c e3       	xor.b	#-1,	r12	;r3 As==11
    a158:	4e 4c       	mov.b	r12,	r14	;
    a15a:	7c 40 21 00 	mov.b	#33,	r12	;#0x0021
    a15e:	4d fe       	and.b	r14,	r13	;
    a160:	3d f0 ff 00 	and	#255,	r13	;#0x00ff
    a164:	cc 4d 00 00 	mov.b	r13,	0(r12)	;
    a168:	0c 4a       	mov	r10,	r12	;
    a16a:	0c 5a       	add	r10,	r12	;
    a16c:	0a 4c       	mov	r12,	r10	;
    a16e:	3a 90 80 00 	cmp	#128,	r10	;#0x0080
    a172:	e2 23       	jnz	$-58     	;abs 0xa138
    a174:	e0 3f       	jmp	$-62     	;abs 0xa136

So how do we decompose the assembly program? Let’s create an abbreviated version of this program that only includes the labels (symbols such as <.L2>) and the jump instructions (jmp and jnz, for example). The following structure appears:

0000a11c <main>:
    // code block 1
0000a136 <.L3>:
    // code block 2
0000a138 <.L2>:
    // code block 3
    a172:	e2 23       	jnz	$-58     	;abs 0xa138
    a174:	e0 3f       	jmp	$-62     	;abs 0xa136

Now it’s easy to match the main program in C to the assembly. Code block 1 holds all code from the start of main up to the start of the while (1) loop. Code block 2 holds the code from the beginning of while (1) up to the start of the do .. while block, which is the single assignment k = 0x1. Code block 3 holds the body of the do .. while loop. The two jumps at the end of the listing close the two loops: jnz branches back to <.L2> as long as k != 0x80 (the do .. while loop), and jmp branches back to <.L3> unconditionally (the while (1) loop). The dint() call and the return statement after while (1) can never be reached, and the listing of main ends at the jmp without any code for them.

Roadmap of the coming lectures

In this course we will start with an in-depth study of hardware/software of the MSP430. Over the next seven lectures, we will cover everything needed to design a hardware coprocessor for the MSP-430, including the following topics.

Lecture 2: The MSP430 architecture and its implementation in Verilog. Simulation of the MSP430 in Verilog.
Lecture 3: The embedded software design flow of the MSP-430. The linking process. Compiler optimization.
Lecture 4: The hardware synthesis flow of the MSP-430 for the DE-1 SoC kit.
Lecture 5: Performance evaluation using timers.
Lecture 6: Memory-mapped custom hardware for the MSP-430.
Lecture 7: Hardware/Software Communication. Using memory-mapped interfaces to call hardware modules from software.
Lecture 8: Hardware Acceleration. Overhead factors in hardware acceleration.

MSP-430 Core

The MSP-430 is a 16-bit microcontroller, used by Texas Instruments for a range of low-power microcontrollers. We will study an open-source implementation of the MSP-430 called openmsp430. The Verilog implementation, along with documentation for the design, can be found on https://opencores.org/projects/openmsp430.

We will briefly discuss how to simulate this design in Verilog, and then discuss the most important characteristics of the core. To simulate the Verilog implementation of the MSP-430, you have to complete the software installation instructions for this course.

Open a Cygwin shell and clone the example repository for this lecture:

$ git clone https://github.com/vt-ece4530-f19/example-openmsp430

Then, run the simulation of the example C program on the MSP-430. This requires both the Modelsim simulator and the MSP-430 C compiler to be available on the Cygwin search path. You can test this with the following commands.

$ which msp430-elf-gcc.exe
/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-gcc.exe 
$ which vsim
/cygdrive/c/intelFPGA_lite/18.1/modelsim_ase/win32aloem/vsim

In case you see ‘no vsim in …’ or a similar message, it means that your search path is not set correctly. Edit your ~/.bashrc in Cygwin to set the correct path (if you don’t know how to do this, post a question on Piazza).

To run the Verilog simulation on the command line, do the following:

$ cd openmsp430/core/sim/rtl_sim/run
$ ./run_c portcnt

This command starts a simulation script that completes the following steps.

Compile the portcnt example in ../src-c/portcnt into an executable
Convert the executable into a memory image for the MSP-430
Run the MSP-430 RTL simulation in Modelsim using the testbench in ../src-c/portcnt/portcnt.v

The simulation runs 10,000 clock cycles and then aborts:

# run -all
#  ===============================================
# |                 START SIMULATION              |
#  ===============================================
#  ===============================================
# |               SIMULATION FAILED               |
# |              (simulation Timeout)             |
#  ===============================================
#
# DMA REPORT: Total Accesses:           0 Total RD:           0 Total WR:           0
#             Total Errors:             0 Error RD:           0 Error WR:           0
#
# SIMULATION SEED:           0
#
# ** Note: $finish    : ../../../bench/verilog/tb_openMSP430.v(718)
#    Time: 500 us  Iteration: 0  Instance: /tb_openMSP430
# End time: 13:50:10 on Aug 29,2019, Elapsed time: 0:00:04
# Errors: 0, Warnings: 0

We can, however, run the same simulation in the Modelsim GUI and observe the waveform coming out of the chip. This works as follows:

Run the command line simulation as above, to compile the Verilog
Open Modelsim and change the working directory to openmsp430/core/sim/rtl_sim/run
Start the simulation using the testbench tb_openMSP430
Add selected signals to the Waveform Viewer. For example, add the dut/mem_backbone/pmem_.. signals to see the program memory accesses.
Run the simulation. Each clock cycle takes 50ns.

Homework 1 will be an assignment in this Verilog simulation environment.

OpenMSP430 System Architecture

The following two documents are helpful to understand the MSP-430. Both are posted under Technical Documentation.

The OpenMSP430 User Guide describes the Verilog implementation of OpenMSP430.
The MSP430x1x Family Users Guide, Chapter 3, describes the instruction set of the MSP-430, as well as the basic principles of the memory organization of the core.

The OpenMSP430 is based on the MSP430x1x series of microcontrollers from Texas Instruments. While the basic Verilog implementation of the OpenMSP430 does not have the same set of peripherals or low-level power-control features, it does provide binary compatibility with the official Texas Instruments design. That means that a program, compiled for MSP430x1x, will also run on the OpenMSP430.

Figure: System Architecture of the MSP430x1x (Texas Instruments)

The system architecture of the MSP430x1x shows a 16-bit core with on-chip memory and a set of peripherals, all connected to a 16-bit address bus with 16-bit data words. There are two important implications of the 16-bit nature of the design.

The 16-bit address bus means that the MSP-430, in its standard configuration, can address 64K byte locations (2^16 = 65,536 addresses).
The 16-bit data bus means that the MSP-430, in its standard configuration, can transport 16 bits of data with a single bus operation.

The figure in the openMSP430 manual is slightly different, and it reflects the structure of the Verilog code of the core.

In particular, there are three different bus structures: the program memory interface, the data memory interface, and the peripheral bus. All of these are driven from the same openMSP430 module, so that there can be only one transfer happening at a time: a program memory read, a data memory read/write, or a peripheral bus read/write. For our study, the peripheral bus is the most important one, as it integrates external hardware modules with the OpenMSP430.

The communication between external hardware modules and peripherals, and the software, is done through read/write operations on the MSP-430 core. Hence, the MSP-430 is the central exchange for this system. It reads data items and writes data items. The memory bus unifies multiple hardware concepts (such as fetching instructions from program memory, reading/writing variables from data memory, and accessing hardware peripherals) into a single logical concept.

The Serial Debug Interface (SD) sits between the execution unit and the bus. The Serial Debug Interface is used to download programs into the program memory (ROM or RAM in the figure) of the OpenMSP430, as well as to inspect processor registers and to control the execution of instructions. The Serial Debug Interface, when active, is in full control of the core. When the Serial Debug Interface is inactive, the OpenMSP430 core starts fetching instructions from program memory and executing them.

Figure: Design Structure of the OpenMSP430

MSP-430 Memory Map

Figure: MSP430x1x Memory Map

The memory map from the MSP430 documentation shows byte-addresses. There are roughly four portions in the memory map: peripherals, RAM, flash/ROM, and the interrupt vectors at the top of the flash/ROM area.

The lower address ranges, from 0x0 to 0x1FF, are used for peripheral modules (as well as a few system registers).
Above that range is a RAM area for data variable storage. This includes global variables as well as the stack. The RAM area extends upward from 0x200 for as much RAM as is available.
At the top of the memory is a flash/ROM area for program storage. The upper region of the memory area holds space for interrupt vectors (including the reset vector that tells where program execution should start after reset). The ROM area extends downwards as deep as there is ROM/Flash available.

The RAM and ROM boundaries are flexible, and are defined by the amount of memory available. Hence, it is perfectly possible that there is a gap of empty, unused locations between the RAM and the ROM. Reading from such a gap returns an undefined (unknown, unpredictable) value, while writing to it has no effect.

It is useful to briefly refer to the linker description file that we introduced last lecture. The MEMORY section of the linker description file is, in effect, a textual version of the memory map. Hence, you can also determine the size of the gap between RAM and ROM for the following linker description file:

MEMORY {
  SFR              : ORIGIN = 0x0000, LENGTH = 0x0010
  RAM (wx)         : ORIGIN = 0x0200, LENGTH = 0x4000
  ROM (rx)         : ORIGIN = 0xA000, LENGTH = 0x6000-0x20
  ...
}

MSP-430 Instruction Set

The MSP-430 has 16 registers, but several of them have a special purpose.

Register	Function
`R0`	Program Counter
`R1`	Stack Pointer
`R2`	Status Register or Constant Generator
`R3`	Constant Generator
`R4` .. `R15`	General Purpose

The advantage of having the program counter and the stack pointer mapped into the register file is that it simplifies the instruction set. The MSP-430 is an excellent example of this principle. The core implements only 27 instructions, which can be split into three groups (dual-operand, single-operand, and jump). These instructions can be used with 7 addressing modes, so that their operands can come from a wide variety of sources.

Addressing Mode	Example	Effect
Register	`MOV R5, R6`	`R5` written to `R6`
Index	`MOV 2(R5), 6(R6)`	`mem[R5+2]` written to `mem[R6+6]`
Symbolic	`MOV SRC, TGT`	`mem[SRC]` written to `mem[TGT]`; addresses encoded relative to `PC`
Absolute	`MOV &SRC, &TGT`	`mem[SRC]` written to `mem[TGT]`; addresses encoded as absolute values
Indirect	`MOV @R10, R11`	`mem[R10]` written to `R11`
Indirect Autoincrement	`MOV @R10+, R11`	`mem[R10]` written to `R11`; `R10` += 2
Immediate	`MOV #45, 0(R5)`	`45` written to `mem[R5]`

For the description of individual instructions, refer to the MSP430x1x Family Users Guide, Chapter 3.

MSP-430 Instruction Timing

As we are extensively analyzing the instruction timing of MSP-430 programs, it is useful to know how long instructions take to execute. Each instruction on the MSP-430 takes a fixed number of clock cycles, and that number depends on the addressing modes used by the instruction.

Single-operand instructions are described in Table 3-15 of Chapter 3. Dual-operand instructions are described in Table 3-16 of Chapter 3. Let’s look at an example.

Figure: MSP-430 partial instruction timing for Format-I (dual-operand) instructions

Let’s say there is an instruction

   XOR @R5, 8(R6)

This instruction uses indirect addressing for the source operand, and indexed addressing for the destination operand. According to the table, the number of cycles for such an instruction is 5. This will be true for any dual-operand instruction of the form ins @R5, 8(R6). The table also shows that the instruction has length two. This means that the instruction requires two 16-bit words to be encoded (32 bits): one word for the instruction itself and one for the index 8.

Another example. The first three instructions of the main program of portcnt.lst are as follows. Determine their instruction timing using Tables 3-15 and 3-16 in the manual.

0000a11c <main>:
    a11c: 0a 12         push  r10   ;
    a11e: 5a 43         mov.b #1, r10 ;r3 As==01
    a120: 7c 40 22 00   mov.b #34,  r12 ;#0x0022

Conclusions

We introduced several aspects of the MSP430 architecture, and its implementation in Verilog. Over the next few lectures, we will study the MSP430 hardware and software in further detail, and investigate how to integrate custom hardware modules onto the peripheral bus.