Introduction

In this lecture we summarize what we’ve learned about the MSP-430, and we identify the benefits and challenges of hardware software codesign using this microcontroller. We will further compare the features of the MSP-430 to those of the more advanced processors that will be the topic after the midterm. Finally, we’ll enumerate important points to remember using a study guide.

An MSP-430 summary

Instruction Set

Register   Function
R0         Program Counter (PC)
R1         Stack Pointer (SP)
R2         Status Register (SR) or Constant Generator
R3         Constant Generator
R4 .. R15  General Purpose


Mode                     Syntax  Description
Register                 Rn      Register contents are the operand
Indexed                  X(Rn)   (Rn + X) points to the operand. X is stored following the opcode
Symbolic                 ADDR    (PC + X) points to the operand. X is stored following the opcode. Used for MSP-430 extended addressing mode (MSP-430X).
Absolute                 &ADDR   Operand at address ADDR. ADDR is stored following the opcode
Indirect                 @Rn     Operand at address Rn
Indirect autoincrement   @Rn+    Operand at address Rn. Rn is incremented afterwards by 1 (byte instructions) or 2 (word instructions)
Immediate                #N      Operand is the value N. N is stored following the opcode


The microcontroller we have studied has a 16-bit processor with a small instruction set. The microcontroller excels in orthogonality: accomplishing a lot with only a limited set of features that can be combined in numerous ways. The microcontroller has 16 registers, 27 instructions and 7 addressing modes. Several of the registers have a special purpose.

These special-purpose registers allow the instruction set to be simplified. On top of the 27 basic instructions, there are 24 emulated instructions. Each emulated instruction is expressed in terms of one of the 27 basic instructions.

  • Example 1: R1 serves as the stack pointer and R0 serves as the program counter. A return instruction can therefore be expressed as a move instruction with the indirect-autoincrement addressing mode. In the MSP430x1x Family User's Guide, RET is listed as emulated by mov @SP+, PC (which could in fact be written as mov @R1+, R0!).

Figure: MSP-430 return instruction


  • Example 2: The set-carry instruction (SETC) uses a bit-set to write a 1-bit into the carry flag of the status register (R2); the constant 1 is supplied by the constant generator (R3). SETC is emulated as BIS #1, SR. It thus combines a bit-set instruction (BIS) with two special registers: register 2 as the status register and register 3 as a constant generator.

Figure: MSP-430 setc instruction
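The following minimal host-side C sketch makes the two examples concrete by showing what the emulated instructions do to the processor state. The struct msp430_state and the function names are hypothetical and exist only for this illustration. The model covers the register and memory effects described above. It does not model flags other than carry, timing, or the instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, minimal model of MSP-430 state: 16 registers and a
 * 64 KB byte-addressable, little-endian memory. */
enum { PC = 0, SP = 1, SR = 2 };   /* R0, R1, R2 */
#define SR_C 0x0001u               /* carry flag: bit 0 of SR */

typedef struct {
    uint16_t r[16];
    uint8_t mem[0x10000];
} msp430_state;

/* Read a 16-bit word; word addresses are even. */
static uint16_t read_word(const msp430_state *s, uint16_t addr) {
    assert((addr & 1u) == 0);
    return (uint16_t)(s->mem[addr] | (s->mem[addr + 1u] << 8));
}

/* RET, emulated as mov @SP+, PC (that is, mov @R1+, R0):
 * fetch the return address at SP, post-increment SP by 2, load it into PC. */
static void emulate_ret(msp430_state *s) {
    uint16_t target = read_word(s, s->r[SP]);
    s->r[SP] = (uint16_t)(s->r[SP] + 2u);
    s->r[PC] = target;
}

/* SETC, emulated as bis #1, SR: OR the constant 1 (produced by the
 * constant generator R3) into the status register R2. */
static void emulate_setc(msp430_state *s) {
    s->r[SR] = (uint16_t)(s->r[SR] | SR_C);
}
```

Neither operation needs new hardware. Both reuse the datapath of an ordinary MOV or BIS, which is exactly why they can be emulated.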


The list of basic instructions and emulated instructions can be found in Table 3-17 of the MSP430x1x Family User's Guide. The instructions marked with a cross are the emulated versions. This table shows, for example, that RLA R10 (rotate left arithmetically) is an emulated instruction, while RRA R10 (rotate right arithmetically) is NOT an emulated instruction.

Here is a good midterm question: Explain why RLA can be emulated with ADD, and explain why this trick does not work with RRA.
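A quick host-side experiment can help with this question. The sketch below models RLA, ADD dst,dst and RRA on 16-bit words. The function names are hypothetical. The test exhaustively checks that RLA and ADD agree on every 16-bit value. It models results only, not status flags, and it deliberately leaves the argument about RRA to you.

```c
#include <assert.h>
#include <stdint.h>

/* RLA dst: arithmetic shift left by one bit (16-bit word). */
static uint16_t rla(uint16_t x) { return (uint16_t)(x << 1); }

/* ADD dst, dst: add the operand to itself. */
static uint16_t add_self(uint16_t x) { return (uint16_t)(x + x); }

/* RRA dst: arithmetic shift right by one bit; the MSB (sign bit) is kept. */
static uint16_t rra(uint16_t x) {
    return (uint16_t)((x >> 1) | (x & 0x8000u));
}
```

For example, rra(0x8002) yields 0xC001. Try to find an instruction with the same operand on both sides that produces this result.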

Memory Map

Figure: MSP-430 Memory Map

The MSP-430 memory map offers a unified view of program, data and peripherals. The MSP-430 memory is a byte-addressable region of 64 KB. The MSP-430 uses a little-endian organization, meaning that the less significant byte of a word is stored at the lower memory address. The bottom 512 addresses (0x0000 to 0x01FF) are used for ‘special function registers’ and peripherals. Above that region, there is read/write storage for data (RAM), including the global variables and the stack. The stack starts from the top of RAM and grows down. The top region of memory is reserved for program memory (ROM), with the interrupt vector table located at the topmost memory locations. Between program memory and data memory there may be a gap, if the ROM, RAM and peripheral regions do not fully cover the 64 KB memory range.
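The little-endian convention can be captured in two small C helpers. This is a sketch over a plain byte array that stands in for MSP-430 memory. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian word store: the less significant byte goes to the lower
 * address. Word addresses are even on the MSP-430. */
static void store_word_le(uint8_t *mem, uint16_t addr, uint16_t value) {
    assert((addr & 1u) == 0);
    mem[addr]      = (uint8_t)(value & 0xFFu);  /* low byte  -> addr     */
    mem[addr + 1u] = (uint8_t)(value >> 8);     /* high byte -> addr + 1 */
}

/* Little-endian word load: reassemble the two bytes. */
static uint16_t load_word_le(const uint8_t *mem, uint16_t addr) {
    assert((addr & 1u) == 0);
    return (uint16_t)(mem[addr] | (mem[addr + 1u] << 8));
}
```

For example, storing 0x1234 at address 2 places 0x34 at address 2 and 0x12 at address 3.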

For a given size of RAM (R bytes) and ROM (S bytes), the default memory boundaries are as follows.

  • The peripheral region extends from 0x0 to 0x1FF
  • The RAM region extends from 0x200 to 0x200 + R - 1
  • The ROM region extends from (0x10000 - S) to (0xFFFF - 0x20) = 0xFFDF
  • The interrupt vector table extends from 0xFFE0 to 0xFFFF

Of course, the values R and S have to be chosen such that (0x200 + R) <= (0x10000 - S), so that RAM and ROM do not overlap. The openMSP430 environment provides additional flexibility in the size of the interrupt vector table and the size of the peripheral memory region.
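The default boundaries can be computed in a few lines of C. The following sketch treats every bound as an inclusive byte address, so the last RAM byte is 0x200 + R - 1 and the last ROM byte is 0xFFDF. It also rejects values of R and S for which RAM and ROM would overlap. The type and function names are hypothetical, and the sizes in the usage note are arbitrary examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive byte-address bounds of the default MSP-430 regions. */
typedef struct {
    uint32_t per_lo, per_hi;  /* special function registers and peripherals */
    uint32_t ram_lo, ram_hi;  /* data memory: globals and stack */
    uint32_t rom_lo, rom_hi;  /* program memory */
    uint32_t vec_lo, vec_hi;  /* interrupt vector table */
} memory_map;

/* R = RAM size, S = ROM size (bytes, including the 0x20-byte vector table).
 * RAM and ROM must not overlap: (0x200 + R) <= (0x10000 - S). */
static bool map_is_valid(uint32_t R, uint32_t S) {
    return R > 0 && S > 0x20 && S <= 0x10000 && 0x200 + R <= 0x10000 - S;
}

static memory_map make_map(uint32_t R, uint32_t S) {
    assert(map_is_valid(R, S));
    memory_map m;
    m.per_lo = 0x0000;      m.per_hi = 0x01FF;
    m.ram_lo = 0x0200;      m.ram_hi = 0x0200 + R - 1;
    m.rom_lo = 0x10000 - S; m.rom_hi = 0xFFFF - 0x20;   /* 0xFFDF */
    m.vec_lo = 0xFFE0;      m.vec_hi = 0xFFFF;
    return m;
}
```

For example, take 2 KB of RAM (R = 0x800) and 16 KB of ROM (S = 0x4000). RAM spans 0x0200 to 0x09FF and ROM spans 0xC000 to 0xFFDF, which leaves an unused gap between them.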

The physical layout of the memory map is encoded in the linker description file, which tells the toolchain how to place code and data in MSP-430 memory. Look for a structure that starts with MEMORY. Here is an example that uses the symbols R and S to represent the RAM and ROM size, respectively. The linker description file must reflect the physical memory layout. For example, the toolchain uses the linker description file to determine the value of the reset interrupt vector, which points to the entry point of the program. The toolchain also uses the linker description file to decide on the correct initial value of the stack pointer.

MEMORY {
  SFR              : ORIGIN = 0x0000, LENGTH = 0x0010
  RAM (wx)         : ORIGIN = 0x0200, LENGTH = R               /* R = RAM size in bytes */
  ROM (rx)         : ORIGIN = 0x10000 - S, LENGTH = S - 0x20   /* S = ROM size in bytes, incl. vectors */
  VECT1            : ORIGIN = 0xFFE0, LENGTH = 0x0002
  VECT2            : ORIGIN = 0xFFE2, LENGTH = 0x0002
  VECT3            : ORIGIN = 0xFFE4, LENGTH = 0x0002
  VECT4            : ORIGIN = 0xFFE6, LENGTH = 0x0002
  VECT5            : ORIGIN = 0xFFE8, LENGTH = 0x0002
  VECT6            : ORIGIN = 0xFFEA, LENGTH = 0x0002
  VECT7            : ORIGIN = 0xFFEC, LENGTH = 0x0002
  VECT8            : ORIGIN = 0xFFEE, LENGTH = 0x0002
  VECT9            : ORIGIN = 0xFFF0, LENGTH = 0x0002
  VECT10           : ORIGIN = 0xFFF2, LENGTH = 0x0002
  VECT11           : ORIGIN = 0xFFF4, LENGTH = 0x0002
  VECT12           : ORIGIN = 0xFFF6, LENGTH = 0x0002
  VECT13           : ORIGIN = 0xFFF8, LENGTH = 0x0002
  VECT14           : ORIGIN = 0xFFFA, LENGTH = 0x0002
  VECT15           : ORIGIN = 0xFFFC, LENGTH = 0x0002
  RESETVEC         : ORIGIN = 0xFFFE, LENGTH = 0x0002
}

While memory locations are addressed as bytes, the internal word length of the processor is 16 bits rather than 8 bits. Instructions with a memory operand therefore come in two formats: word format and byte format. A word-format instruction reads or writes two consecutive bytes from memory, while a byte-format instruction reads or writes a single byte. Physically, the memory is organized as a 16-bit memory, and the MSP-430 accesses it over a 16-bit data bus. This affects byte operations as well; refer to the ‘Read and Write Bus Cycle’ section below.

When a byte instruction has a 16-bit register as its destination, the upper byte of the register is cleared. When a byte instruction has a memory location as its destination, the memory is still 16 bits wide, so byte write-enable signals are used to ensure that only the addressed byte of the word is written.
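The following C sketch models both cases on the host. Memory is represented as an array of 16-bit words. A byte address selects a word (address >> 1) and a byte lane (address bit 0, with even addresses holding the low byte). A 2-bit write enable, one bit per lane, limits the update to that lane. The function names are hypothetical. This is a behavioral model, not the openMSP430 RTL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* mov.b to a register: the byte is zero-extended, clearing the upper byte. */
static void mov_b_to_register(uint16_t *reg, uint8_t value) {
    *reg = value;
}

/* mov.b to memory organized as 16-bit words. The byte address selects a
 * word (addr >> 1) and a byte lane (addr & 1, even = low byte); a 2-bit
 * write enable, one bit per lane, restricts the update to that lane. */
static void mov_b_to_memory(uint16_t *words, size_t nwords,
                            uint16_t addr, uint8_t value) {
    size_t index = addr >> 1;
    assert(index < nwords);
    unsigned we = (addr & 1u) ? 0x2u : 0x1u;                   /* lane strobe */
    uint16_t data = (uint16_t)((addr & 1u) ? value << 8 : value);
    uint16_t mask = (uint16_t)(((we & 0x1u) ? 0x00FFu : 0u) |
                               ((we & 0x2u) ? 0xFF00u : 0u));
    words[index] = (uint16_t)((words[index] & ~mask) | (data & mask));
}
```

For example, writing 0xAB to byte address 0 of a word holding 0x1111 yields 0x11AB. The other byte lane is left untouched.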

The following test program performs byte reads from memory and byte writes to memory. The program reads the characters of a string, adds them up, and then writes the result back to every character of the string.

#include "omsp_system.h"

int main(void) {
  volatile unsigned char b[5] = "01234";  // 5 bytes, no terminating NUL; volatile forces every access to memory

  register unsigned k;

  WDTCTL = WDTPW | WDTHOLD;  // Disable watchdog timer

  while (1) {
    k = b[0];
    k += b[1];
    k += b[2];
    k += b[3];
    k += b[4];

    b[0] = k;
    b[1] = k;
    b[2] = k;
    b[3] = k;
    b[4] = k;
  }

  dint();  // not reached: the while-loop never exits

  return 0;
}

The assembly listing of the while-loop demonstrates these byte operations. The array b is mapped to the stack. Each element is read into register r12 with a byte instruction that uses indexed addressing relative to r1 (the stack pointer), and the sum is accumulated in r10. Next, the sum is copied back to b using similar byte-write operations.

0000a042 <.L2>:
    a042:       5c 41 01 00     mov.b   1(r1),  r12     ;
    a046:       0a 4c           mov     r12,    r10     ;
    a048:       5c 41 02 00     mov.b   2(r1),  r12     ;
    a04c:       0a 5c           add     r12,    r10     ;
    a04e:       5c 41 03 00     mov.b   3(r1),  r12     ;
    a052:       0a 5c           add     r12,    r10     ;
    a054:       5c 41 04 00     mov.b   4(r1),  r12     ;
    a058:       0a 5c           add     r12,    r10     ;
    a05a:       5c 41 05 00     mov.b   5(r1),  r12     ;
    a05e:       0a 5c           add     r12,    r10     ;
    a060:       4c 4a           mov.b   r10,    r12     ;
    a062:       c1 4c 01 00     mov.b   r12,    1(r1)   ;
    a066:       4c 4a           mov.b   r10,    r12     ;
    a068:       c1 4c 02 00     mov.b   r12,    2(r1)   ;
    a06c:       4c 4a           mov.b   r10,    r12     ;
    a06e:       c1 4c 03 00     mov.b   r12,    3(r1)   ;
    a072:       4c 4a           mov.b   r10,    r12     ;
    a074:       c1 4c 04 00     mov.b   r12,    4(r1)   ;
    a078:       4c 4a           mov.b   r10,    r12     ;
    a07a:       c1 4c 05 00     mov.b   r12,    5(r1)   ;
    a07e:       30 40 42 a0     br      #0xa042         ;

When you simulate this program in ModelSim, you can trace the values of r12. While b is being read, they successively read 0030, 0031, 0032, and so on, which are the ASCII codes of '0', '1', '2'. Next, you’ll find that the resulting sum, 0xfa, is copied back to the memory holding b. The copy is a sequence of byte writes to alternating low-byte and high-byte memory locations.
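The values in this trace can be reproduced on the host. The sketch below mimics one pass of the loop, and the function name is hypothetical. Byte reads are zero-extended into a 16-bit accumulator, as with mov.b into r12. Byte writes keep only the low byte of the sum.

```c
#include <assert.h>
#include <stdint.h>

/* One pass of the while-loop: sum five bytes in a 16-bit register (like
 * r10), then write the low byte of the sum back to every element. */
static uint16_t loop_iteration(uint8_t b[5]) {
    uint16_t k = 0;
    for (int i = 0; i < 5; i++)
        k = (uint16_t)(k + b[i]);   /* byte read zero-extends into a word */
    for (int i = 0; i < 5; i++)
        b[i] = (uint8_t)k;          /* byte write keeps the low byte only */
    return k;
}
```

Starting from "01234", the first pass computes 0x30 + 0x31 + 0x32 + 0x33 + 0x34 = 0x00fa and writes 0xfa to all five bytes. The second pass computes 5 × 0xfa = 0x04e2 and writes back 0xe2.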

Read and Write Bus Cycle

Signal    Direction  Width  Function
PER_EN    output     1      Active high. Indicates an active read or write access
PER_ADDR  output     14     Peripheral word address
PER_DOUT  input      16     Peripheral to MSP430 data bus
PER_DIN   output     16     MSP430 to Peripheral data bus
PER_WE    output     2      Indicates an active write on a byte of a word


We studied the bus cycle of the peripheral bus and concluded that the MSP-430 bus uses a simple, straightforward interconnect mechanism. Every bus access completes in a single cycle. A bus cycle either reads from memory (or a memory-mapped peripheral) or writes to memory (or a memory-mapped peripheral). The two write strobes in PER_WE, one per byte, restrict a word-level bus transfer to a byte transfer.

Using the bus signals, we write bus decoders for memory-mapped registers. These registers look like memory locations to the software, but they are hardware registers in their own right. Memory-mapped registers were the first true indication of the hardware-software codesign nature of the class. Using an abstraction common to a software program (memory read/write), we are able to directly influence hardware! What a powerful notion that really is. In Lecture 6 we discussed the design of memory-mapped registers in depth.
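A memory-mapped register decoder can be sketched behaviorally in C before writing any Verilog. The model below uses the signal names from the peripheral bus table, and each call represents one single-cycle bus access. The following are assumptions made for this sketch: the register address 0x0110, the struct and function names, and the convention that the peripheral drives 0 on its data output when it is not being read. That convention allows the outputs of several peripherals to be OR-ed together. Clock-level timing is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One single-cycle access on the peripheral bus (names from the table). */
typedef struct {
    bool     per_en;    /* an access is active */
    uint16_t per_addr;  /* 14-bit word address */
    uint16_t per_din;   /* write data, MSP430 -> peripheral */
    uint8_t  per_we;    /* byte write strobes; 0 means a read */
} bus_cycle;

/* Hypothetical peripheral: one 16-bit register at byte address 0x0110. */
#define REG_BYTE_ADDR 0x0110u

typedef struct { uint16_t reg; } demo_peripheral;

/* Returns the value this peripheral drives on PER_DOUT for this cycle.
 * Assumption: when not selected, or on a write, it drives 0. */
static uint16_t peripheral_cycle(demo_peripheral *p, bus_cycle c) {
    assert(c.per_addr < (1u << 14));
    bool selected = c.per_en && c.per_addr == (REG_BYTE_ADDR >> 1);
    if (!selected)
        return 0;
    if (c.per_we & 0x1u)   /* low byte strobe */
        p->reg = (uint16_t)((p->reg & 0xFF00u) | (c.per_din & 0x00FFu));
    if (c.per_we & 0x2u)   /* high byte strobe */
        p->reg = (uint16_t)((p->reg & 0x00FFu) | (c.per_din & 0xFF00u));
    return c.per_we ? 0 : p->reg;
}
```

The address compare and the two strobe tests map directly onto the decoder logic of a Verilog memory-mapped register.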

Interrupt Processing

We briefly discussed interrupt processing on the MSP-430.

There are 16 interrupt sources by default, numbered from 0 to 15. Interrupts 0-13 are standard maskable interrupts. Interrupt 14 is a non-maskable interrupt, and interrupt 15 is the reset pin.

Interrupt Source                                       Vector Address   Priority
Power-up, external reset, watchdog, flash password     0FFFEh           15, highest
NMI, oscillator fault, flash memory access violation   0FFFCh           14
device-specific                                        0FFFAh           13
device-specific                                        0FFF8h           12
device-specific                                        0FFF6h           11
Watchdog timer                                         0FFF4h           10
device-specific                                        0FFF2h           9
device-specific                                        0FFF0h           8
device-specific                                        0FFEEh           7
device-specific                                        0FFECh           6
device-specific                                        0FFEAh           5
device-specific                                        0FFE8h           4
device-specific                                        0FFE6h           3
device-specific                                        0FFE4h           2
device-specific                                        0FFE2h           1
device-specific                                        0FFE0h           0, lowest
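Each vector is one 16-bit word and the table starts at 0xFFE0, so the vector address follows directly from the interrupt number, and vice versa. A small sketch with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Vector address of interrupt n (0 = lowest priority, 15 = reset). */
static uint16_t vector_address(unsigned n) {
    assert(n <= 15);
    return (uint16_t)(0xFFE0u + 2u * n);
}

/* Inverse mapping: which interrupt number owns a given vector address. */
static unsigned vector_number(uint16_t addr) {
    assert(addr >= 0xFFE0u && (addr & 1u) == 0);
    return (addr - 0xFFE0u) / 2u;
}
```

For example, vector_number(0xFFF4) returns 10, which matches the watchdog timer entry in the table.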


The interrupt latency is 6 cycles, meaning that it takes 6 cycles from the moment the MSP-430 hardware starts processing the interrupt until the execution of the first instruction of the interrupt service routine. Here is the sequence of events.

  1. The hardware asserts an interrupt request line, IRQ. There is one interrupt request line for each of the 14 maskable interrupts.

  2. If interrupts are enabled, the MSP-430 completes the current instruction, and pushes the PC (pointing to the next instruction) as well as the status register on the stack.

  3. If multiple interrupts are pending, the highest-priority interrupt is selected. The status register is cleared, which clears the GIE (general interrupt enable) bit and thereby disables further maskable interrupts.

  4. The content of the interrupt vector corresponding to the interrupt is loaded into the PC and the program continues processing with the first instruction of the interrupt service routine.

  5. The interrupt service routine must ensure that the interrupt flag is cleared, which implies (at the level of the hardware) that the corresponding IRQ line is de-asserted.

Exiting from an interrupt service routine takes 5 clock cycles.

  1. The interrupt service routine terminates by executing RETI, which pops the status register from the stack, thereby restoring the GIE bit and re-enabling interrupts.

  2. RETI then pops the PC from the stack, and execution continues from that point.
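The entry and exit sequences can be summarized as a behavioral sketch in C. The cpu struct and the function names are hypothetical. The model covers maskable interrupts only. It ignores timing (the 6-cycle entry and 5-cycle exit) and device-specific details, such as which SR bits survive the clear. GIE is bit 3 of the status register.

```c
#include <assert.h>
#include <stdint.h>

#define SR_GIE 0x0008u   /* general interrupt enable: bit 3 of SR */

typedef struct {
    uint16_t pc, sp, sr;
    uint8_t  mem[0x10000];   /* little-endian byte memory */
} cpu;

static void push(cpu *c, uint16_t v) {   /* stack grows down */
    c->sp = (uint16_t)(c->sp - 2u);
    c->mem[c->sp] = (uint8_t)v;
    c->mem[c->sp + 1u] = (uint8_t)(v >> 8);
}

static uint16_t pop(cpu *c) {
    uint16_t v = (uint16_t)(c->mem[c->sp] | (c->mem[c->sp + 1u] << 8));
    c->sp = (uint16_t)(c->sp + 2u);
    return v;
}

/* Accept a maskable interrupt whose vector lives at vector_addr. */
static void interrupt_entry(cpu *c, uint16_t vector_addr) {
    assert(c->sr & SR_GIE);   /* maskable interrupts need GIE set */
    push(c, c->pc);           /* PC of the next instruction */
    push(c, c->sr);
    c->sr = 0;                /* clears GIE: no further maskable interrupts */
    c->pc = (uint16_t)(c->mem[vector_addr] | (c->mem[vector_addr + 1u] << 8));
}

/* RETI: restore SR (and with it GIE) first, then the PC. */
static void reti(cpu *c) {
    c->sr = pop(c);
    c->pc = pop(c);
}
```

RETI pops in the reverse order of the pushes on entry. Restoring SR first is what re-enables interrupts.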

Interrupt service routines require special handling by the compiler for two reasons. First, they must end with a RETI instruction rather than a RET instruction. Second, the address of the interrupt service routine has to be stored in the corresponding interrupt vector.

For example, here is the ISR that we made in Lecture 6 for the memory-mapped register example, followed by the disassembled ISR. Note that the compiler numbers interrupts from 1, rather than 0. Interrupt 2 is the second-lowest priority interrupt.

void __attribute__ ((interrupt(2))) demo2regisr (void) {
  de1soc_hexlo(SEC);
}
/*
Disassembly of section __interrupt_vector_2:

0000ffe2 <__interrupt_vector_2>:
    ffe2:       28 fc           interrupt service routine at 0xfc28

Disassembly of section .text:

0000fc28 <demo2regisr>:
    fc28:       0f 12           push    r15             ;
    fc2a:       0e 12           push    r14             ;
    fc2c:       0d 12           push    r13             ;
    fc2e:       0c 12           push    r12             ;
    fc30:       0b 12           push    r11             ;
    fc32:       0a 12           push    r10             ;
    fc34:       09 12           push    r9              ;
    fc36:       08 12           push    r8              ;
    fc38:       07 12           push    r7              ;
    fc3a:       06 12           push    r6              ;
    fc3c:       05 12           push    r5              ;
    fc3e:       04 12           push    r4              ;
    fc40:       5c 42 16 01     mov.b   &0x0116,r12     ;0x0116
    fc44:       b0 12 98 fc     call    #-872           ;#0xfc98
    fc48:       34 41           pop     r4              ;
    fc4a:       35 41           pop     r5              ;
    fc4c:       36 41           pop     r6              ;
    fc4e:       37 41           pop     r7              ;
    fc50:       38 41           pop     r8              ;
    fc52:       39 41           pop     r9              ;
    fc54:       3a 41           pop     r10             ;
    fc56:       3b 41           pop     r11             ;
    fc58:       3c 41           pop     r12             ;
    fc5a:       3d 41           pop     r13             ;
    fc5c:       3e 41           pop     r14             ;
    fc5e:       3f 41           pop     r15             ;
    fc60:       00 13           reti
*/

Function Call Semantics

The MSP430 Application Binary Interface (ABI) describes the calling conventions used by MSP-430 compilers.

Arguments are passed through registers R12, R13, R14 and R15. In some cases, the compiler also uses R8, R9, R10 and R11 for additional arguments. Return values come back through R12, R13, R14 and R15. If the arguments need more storage than these registers provide, the compiler spills the remaining arguments onto the stack.
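You can see this convention at work by compiling a function that has more 16-bit arguments than there are argument registers and reading the listing. For example, you could use msp430-elf-gcc -O0 -S, assuming that is the toolchain in use. The function sum5 is hypothetical. Its comments state what the rule above predicts, so treat them as expectations to check against your own listing, not as a guarantee. On the MSP-430, unsigned is 16 bits wide.

```c
#include <assert.h>

/* Hypothetical test function: five 16-bit arguments, one more than the
 * four argument registers. Expected placement under the rule above:
 *   a -> R12, b -> R13, c -> R14, d -> R15, e -> stack
 * and the return value comes back in R12. */
unsigned sum5(unsigned a, unsigned b, unsigned c, unsigned d, unsigned e) {
    return a + b + c + d + e;
}
```

In the listing for sum5, check whether e is read from the caller's part of the stack, while a through d arrive in registers.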

The best way to study how the ABI calling conventions work is using small examples. It’s a good idea to keep the level of compiler optimization to the minimum, so that the assembly code still reflects the C code (for example, function inlining and expression strength reduction can significantly alter the code).

Here is an example. The function myadd takes four arguments: two input arguments and two output arguments (pointers).

unsigned  myadd(unsigned  a, unsigned b,
                unsigned *e, unsigned *f) {
  *e = a + b;
  *f = a - b;
  return 1;
}

We now look at the assembly code and try to explain the purpose of each line in the listing.

0000fc22 <myadd>:
    ; make room for 8 bytes (four 16-bit words) on the stack
    fc22:       31 82           sub     #8,     r1      ;r2 As==11

    ;       +-----------------+ 
    ;       |         a       |  6(r1)
    ;       +-----------------+
    ;       |         b       |  4(r1)
    ;       +-----------------+
    ;       |        &e       |  2(r1)
    ;       +-----------------+ 
    ;       |        &f       |  0(r1)
    ;       +-----------------+

    fc24:       81 4c 06 00     mov     r12,    6(r1)   ;
    fc28:       81 4d 04 00     mov     r13,    4(r1)   ;
    fc2c:       81 4e 02 00     mov     r14,    2(r1)   ;
    fc30:       81 4f 00 00     mov     r15,    0(r1)   ;

    ; a + b -> r13
    fc34:       1d 41 06 00     mov     6(r1),  r13     ;
    fc38:       1d 51 04 00     add     4(r1),  r13     ;

    ; r13 -> *e
    fc3c:       1c 41 02 00     mov     2(r1),  r12     ;
    fc40:       8c 4d 00 00     mov     r13,    0(r12)  ;

    ; a - b -> r13
    fc44:       1d 41 06 00     mov     6(r1),  r13     ;
    fc48:       1d 81 04 00     sub     4(r1),  r13     ;

    ; r13 -> *f
    fc4c:       2c 41           mov     @r1,    r12     ;
    fc4e:       8c 4d 00 00     mov     r13,    0(r12)  ;

    ; return argument = 1
    fc52:       5c 43           mov.b   #1,     r12     ;r3 As==01

    ; destroy stack frame and exit
    fc54:       31 52           add     #8,     r1      ;r2 As==11
    fc56:       30 41           ret
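The annotations ;r2 As==11 and ;r3 As==01 in this listing reveal the constant generator at work. The immediates #8 and #1 need no extra instruction word, because R2 and R3 produce them from the addressing-mode bits. The sketch below decodes the constant-generator combinations as documented in the MSP430 family user's guides: R2 supplies 4 or 8, and R3 supplies 0, 1, 2 or -1 (0xFFFF). The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decode a source operand that may use the constant generator.
 * reg is the source register number, as is the 2-bit As field.
 * Returns true and sets *value if (reg, as) produces a constant. */
static bool cg_constant(unsigned reg, unsigned as, uint16_t *value) {
    assert(as <= 3);
    if (reg == 3) {                   /* CG2: every As value is a constant */
        static const uint16_t cg2[4] = {0x0000, 0x0001, 0x0002, 0xFFFF};
        *value = cg2[as];
        return true;
    }
    if (reg == 2 && as >= 2) {        /* CG1: As=00 is SR, As=01 is absolute */
        *value = (as == 2) ? 4 : 8;
        return true;
    }
    return false;
}
```

For example, sub #8, r1 encodes its source as R2 with As=11, which is why it fits in a single word (31 82).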

MSP-430 vs Nios vs ARM

With this lecture, we’ve reached the end of our discussion of the MSP-430. We have demonstrated how a custom hardware module can be integrated into a software function, resulting in an overall speedup. The speedup is due to the parallelism offered by hardware. Computations can be mapped onto parallel gates, while the same computations would take many clock cycles when executed sequentially on an MSP-430. This touches upon the essential aspect of hardware/software codesign: we trade a flexible but sequential software function for a fixed but specialized hardware implementation of that function. This trade-off is at the core of so many electronic devices that we take for granted: cell phones, laptops, wireless routers, televisions, cars (yes, even those).

Processor Improvements

Now, you may wonder why we would spend time on Nios-II and ARM when we have already touched upon the core of hardware/software codesign. The reason is that Nios-II and ARM are much more representative of typical embedded processor cores than the MSP-430. Each of these cores has its own purpose and objectives, but silicon integration favors devices that deliver the best performance, and Nios-II and ARM are a step further in that direction.

Over the next lectures, we will discuss Nios-II and ARM, and we will discover how these cores achieve a higher performance, and what that means for hardware/software codesign. Here are a few basic differences.

  • Pipelining: The MSP-430 is essentially a multi-cycle processor. Each instruction takes several clock cycles to execute, and the execution time of each instruction is essentially determined by the addressing modes of its operands. The Nios-II and ARM cores are pipelined processors and strive for an execution rate of one instruction per cycle. The Nios-II uses up to 6 pipeline stages, while an ARM A9 uses an out-of-order 8-stage pipeline. The price of this additional performance is that the instruction cycle is considerably more complicated at the hardware level. Associating instructions with clock cycles becomes, in effect, much harder.

  • Word Length: The MSP-430 is a 16-bit processor, while Nios-II and ARM are 32-bit processors. A wider datapath means more processing power, provided that there is enough number crunching to keep these 32 bits well utilized. A processor that churns through bits faster is a challenge for the hardware, too. The hardware interface must scale with the processor datapath to keep pace. Hence, hardware coprocessors for Nios-II and ARM will be bulkier.

  • Memory Space: The memory space of the MSP-430 is tiny, just a few kilobytes for most practical systems. In contrast, the ARM A9 on the DE1-SoC board has 1 GB of off-chip DDR RAM. That is enough to run a sizable operating system as well as highly complex software applications. When such an operating system comes into play, the hardware-software interface becomes much more challenging as well. An operating system may come with virtual memory, and with the ability to seamlessly switch the processor from one software task to another. How can a coprocessor keep track of all of this?

  • Bus: The MSP-430 bus is a simple bus that completes every input/output transfer in a single bus cycle. In the Nios-II and ARM subsystems, buses become more sophisticated, primarily to provide an efficient communication pathway between the high-performance processor and memory. The AMBA bus family, including the advanced AXI bus, enables very sophisticated data transfers. But again, this poses a challenge for the hardware designer, who now has to develop a memory-mapped interface that can handle such advanced bus transactions.

  • Processor Functional Units: The MSP-430 excels in simplicity, but as a result it also lacks much basic functionality. For example, the MSP-430 has no native integer multiply and divide instructions, and it has no support for floating-point computations. In contrast, the Nios-II and ARM do have built-in multiply/divide instructions, and may even have dedicated floating-point hardware. This is yet another source of the performance gain of these systems over the MSP-430. Not only are they faster (in clock speed), they also do more work per clock cycle. Whereas the MSP-430 has to rely on ‘compiler helper functions’, the Nios-II and ARM can execute these operations directly in hardware.

Hence, there are multiple solid reasons to investigate the Nios-II and ARM in the context of hardware/software codesign. The interaction with software becomes more challenging compared to the MSP-430, but the rewards are numerous, too.

On the limit of performance

Then, what is the limit of performance in silicon? We can take two views on this question, depending on whether you go by the technological limit or else by the architectural limit.

The technological limit reflects how hard you can drive the silicon, and it is commonly understood that the power density of computer chips is one of the big challenges in driving integration forward. IEEE Design and Test Magazine recently published a special issue on Dark Silicon. Dark Silicon is the notion that the power density of modern multi-processor ICs is so high that we can no longer turn on all cores at once. An important cause of this limit is that silicon technology no longer scales down power as quickly as feature size, a limitation called the end of Dennard scaling. As a result, the power density of chips increases, and we have to turn off transistors to prevent them from overheating.

Figure: End of Dennard Scaling [Shafique 17]


The architectural limit reflects how efficient we are at making silicon transistors do useful things. And you can make a good guess at it, if you know the performance and the complexity of a modern processor. For example, an Intel i7 920 core (2007, 45 nm technology, 263 square mm die) contains 731 million transistors. That’s a whole lot, enough to build 816,000 32-bit adders (assuming that a 1-bit full adder requires 28 transistors). Now if you take these 816,000 adders and you drive them at the same speed as the Intel core (2.93GHz), you get an engine that runs at 2,390,000 Giga-operations per second. Of course this is far beyond the technological limit, and could never be practically built. But still, it reflects how good silicon intrinsically is at processing.

In one square millimeter of 45 nm CMOS silicon, we can run 2,390,000 / 263 = 9,090 Giga-operations per second. Compare this number to what processors typically achieve. The Intel i7 920 core mentioned earlier delivers roughly 28 operations per clock cycle, or about 82.3 Giga-operations per second at 2.93 GHz. Per square millimeter, this i7 delivers 82.3 / 263 = 0.313 Giga-operations per second. Hence, the ratio between the intrinsic silicon performance and the performance of a typical processor is 29,040. While this is a purely theoretical number, it demonstrates just how powerful silicon technology is at doing digital logic.
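These back-of-the-envelope numbers can be recomputed in a few lines of C. The sketch uses the figures quoted in the text: 731 million transistors, 28 transistors per full adder, 32-bit adders, 2.93 GHz, 263 square mm and 28 operations per cycle. Clock frequency and die area cancel in the final ratio, which reduces to the number of adders divided by the operations per cycle. With exactly 28 operations per cycle, the ratio comes out near 29,100 rather than 29,040. The small difference is due to the 82.3 Giga-operations per second used in the text, which corresponds to slightly more than 28 operations per cycle. The names are illustrative.

```c
#include <assert.h>

/* Figures quoted in the text. */
#define TRANSISTORS      731e6   /* Intel i7 920 */
#define DIE_MM2          263.0   /* die area, square mm */
#define CLOCK_HZ         2.93e9
#define T_PER_FULL_ADDER 28.0    /* transistors per 1-bit full adder */
#define ADDER_BITS       32.0
#define OPS_PER_CYCLE    28.0    /* i7 920 operations per clock cycle */

typedef struct {
    double adders;            /* 32-bit adders that fit in the transistor budget */
    double silicon_gops_mm2;  /* all adders busy every cycle, Gops/s per mm^2 */
    double cpu_gops_mm2;      /* the processor itself, Gops/s per mm^2 */
    double ratio;             /* silicon / processor */
} estimate;

static estimate compute_estimate(void) {
    estimate e;
    e.adders = TRANSISTORS / (T_PER_FULL_ADDER * ADDER_BITS);
    e.silicon_gops_mm2 = e.adders * CLOCK_HZ / 1e9 / DIE_MM2;
    e.cpu_gops_mm2 = OPS_PER_CYCLE * CLOCK_HZ / 1e9 / DIE_MM2;
    /* Clock and area cancel: ratio == adders / OPS_PER_CYCLE. */
    e.ratio = e.silicon_gops_mm2 / e.cpu_gops_mm2;
    return e;
}
```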

Of course, there are many proposals to improve the efficiency of silicon. Increasing computational parallelism is one strategy. Lowering the supply voltage is another important one. Modern processors are designed to operate at a reduced supply voltage, close to the threshold voltage of the transistors. This fundamentally changes the characteristics of the transistors, which no longer behave as ideal switches. However, the net gain is a significant increase in computational efficiency (operations per second per watt). The following figure, taken from the same special IEEE Design and Test issue, illustrates the performance improvements obtained in an Intel processor by lowering the supply voltage.

Figure: Near Threshold Computing [De 17]


Study Guide

The midterm covers Lectures 1 to 8, inclusive. The lecture materials include the posted notes and the examples discussed in class, whether through handouts or through code distributed as git repositories. You do not have to memorize manuals, but you should be familiar with the technical aspects of MSP430 hardware and software as discussed in class. For example, you should have no problem understanding the assembly listings we discussed in class, or the relation between an MSP430 linker description file and the physical memory organization of an MSP430. The questions on the midterm are design-oriented, which means that they test your knowledge in the context of short design problems. Hence, insight into the course content is much more important than memorizing it.

  • Lecture 1 - Introduction
    • Format and meaning of assembly listings
    • Connection between C code and corresponding assembly code
    • Sections in a compiler output
    • Format and meaning of linker description file
    • Code Example: portcnt
  • Lecture 2 - MSP-430
  • Lecture 3 - MSP430 Software Flow
    • C software toolflow for MSP430, file formats from C to binary
    • Data types in C and their length in MSP430 GCC
    • volatile storage specifier
    • Compiler sections: text, data, bss
    • Compiler command line (compiler, linker, loader, listing generator)
    • Code Example: tightloop
  • Lecture 4 - MSP430 Hardware Flow
    • System architecture of MSP430: memories, core, peripherals and buses (PMEM, DMEM, PER).
    • Mapping of MSP430 onto an FPGA board.
    • The msp430de package, memory configuration
    • Hardware abstraction library
    • Code Example: msp430de
  • Lecture 5 - Performance Evaluation
    • Resolution, range, accuracy and precision of a timing measurement
    • The TimerA module and its use as 16-bit Timestamp counter
    • Overhead on timer measurement
    • Detecting timer wrap-around
    • Accuracy and precision of TimeStamp counter
    • Dealing with target uncertainty
    • Code Example: pref430
  • Lecture 6 - The Memory-mapped Register
    • Physical implementation of the MSP430 peripheral bus: address, datain, dataout, enable, write-strobes.
    • Address decoding for memory-mapped registers (read and write)
    • Verilog implementation and integration of memory-mapped register
    • Adding interrupts to memory-mapped peripherals
    • Code Example: memmap430
  • Lecture 7 - Function Call Semantics
    • The MSP430 Application Binary Interface
    • Implementation of a function call with argument passing
    • Compiler utility (or helper) functions: why they are needed and what they achieve
  • Lecture 8 - MSP-430 Coprocessor
    • Design integration of a memory-mapped coprocessor onto the MSP430 peripheral bus
    • The premise of hardware acceleration: ideal acceleration versus practically achievable acceleration
    • Speedup evaluation for a practical example
    • Code Example: mul64