Lecture 7 - Function Call Semantics
- Introduction
- The data semantics of function calls on MSP430
- The control semantics of function calls on MSP430
- The semantics of a hardware function call
- Conclusions
Introduction
To a programmer, the easiest way to work with a coprocessor is a synchronous model, which makes the coprocessor look like a function call. Let’s say we have a coprocessor which takes two input arguments and produces one result, all of them integers. You could model the execution of the coprocessor as a function call my_func. The function call takes two arguments and hides how the arguments are routed from the software to the hardware, how the coprocessor is run, and how the result is retrieved.
int main(void) {
    ...
    my_func(&out, in1, in2);   /* => build my_func as a hardware module */
    ...
}
If we break down such a function call into smaller steps, it’s clear that there are two aspects to the design of such a function call: a data aspect and a control aspect.
- The data aspect is concerned with moving data. The arguments of the hardware function call have to be moved from software to hardware, typically by writing into memory-mapped registers. Furthermore, the results from the hardware function have to be retrieved back from hardware into software, by reading from memory-mapped registers.
- The control aspect is concerned with ensuring that the sequence of operations between software and hardware is maintained. Indeed, from a system perspective, the function my_func is called at a specific point in the execution of main. On the other hand, a hardware module that implements this function call runs fully in parallel with the processor (which runs the main function). There has to be a control signal from the processor to the hardware coprocessor such that the coprocessor starts to work at the right time. Furthermore, there has to be a status signal from the coprocessor back to the processor to indicate that the coprocessor has completed its operation.
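The two aspects can be made concrete with a minimal C sketch. Everything about the coprocessor here is an assumption for illustration: the register names (copro_in1, copro_in2, copro_out, copro_start, copro_done), the fact that it adds its inputs, and the copro_model_step function that stands in for the hardware so that the sketch runs on a host PC. On a real target, the registers would be volatile accesses to fixed memory-mapped addresses, and the hardware would make progress by itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coprocessor registers, modelled as plain variables so the
 * sketch runs on a host PC instead of real memory-mapped hardware. */
static volatile uint16_t copro_in1, copro_in2; /* data: arguments    */
static volatile uint16_t copro_out;            /* data: result       */
static volatile uint16_t copro_start;          /* control: CPU -> HW */
static volatile uint16_t copro_done;           /* status:  HW -> CPU */

/* Stand-in for the hardware (assumption: this coprocessor adds). */
static void copro_model_step(void) {
    if (copro_start) {
        copro_out = (uint16_t)(copro_in1 + copro_in2);
        copro_start = 0;
        copro_done = 1;
    }
}

void my_func(uint16_t *out, uint16_t in1, uint16_t in2) {
    assert(out != NULL);
    copro_in1 = in1;          /* data aspect: move arguments to hardware  */
    copro_in2 = in2;
    copro_done = 0;           /* model only: clear the status flag        */
    copro_start = 1;          /* control aspect: tell hardware to start   */
    while (!copro_done) {     /* control aspect: wait for completion      */
        copro_model_step();   /* real hardware would progress by itself   */
    }
    *out = copro_out;         /* data aspect: retrieve the result         */
}
```

From the caller's point of view, my_func(&out, 35, 60) behaves like an ordinary synchronous function call: it returns only after the result is available in out.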
In this lecture, we will study the semantics of a software function call. That is, we will study how software programmers solve the data aspect and the control aspect of a function call when software is calling software. This experiment will give us the following insight.
- First, we will learn how a compiler implements function calls. As a result, we will also better understand the overhead associated with a function call. The overhead is the additional work that a processor must perform in order to provide the comfort of a function call to the programmer.
- Next, when we understand how a software function call works, we will have a basis to build a hardware function call using memory-mapped registers. Next lecture, we will build a data semantics and a control semantics for hardware coprocessors, such that we can call them from within a C program.
Now, let’s take a closer look at the implementation of software function calls on the MSP430.
The data semantics of function calls on MSP430
We’ll start by studying a (non-optimized) function call on MSP430.
The MSP430 Application Binary Interface contains important details regarding the calling conventions. The default register organization is shown in the following table. Registers R4-R10 are callee saved, which means that a function must preserve their values across a function call. Registers R12-R15 are used to pass arguments into the function and to retrieve return values. As each register is 16 bits wide, up to 64 bits can be passed in registers in one function call. This could be four 16-bit variables (unsigned), two 32-bit variables (unsigned long), or one 64-bit variable (unsigned long long).
Register | Alias | Callee Saved | Role |
---|---|---|---|
R0 | PC | | Program Counter |
R1 | SP | yes | Stack Pointer |
R2 | SR | | Status Register |
R3 | CG | | Constant |
R4 | | yes | |
R5 | | yes | |
R6 | | yes | |
R7 | | yes | |
R8 | | yes | |
R9 | | yes | |
R10 | | yes | |
R11 | | no | |
R12 | | no | arg1 & return1 |
R13 | | no | arg2 & return2 |
R14 | | no | arg3 & return3 |
R15 | | no | arg4 & return4 |
The registers R12 to R15 are allocated in sequence. For example, a function that uses two integer arguments will use R12 for the first argument, and R13 for the second. If the function takes two long arguments, then R13:R12 is used for the first argument, and R15:R14 is used for the second argument. The lowest-numbered register will hold the least significant word. When more than four registers are needed for arguments, the compiler will build a data structure on the stack, called a stack frame.
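The word order within a register pair can be illustrated with a small host-side C model. The regs array and the pass_long and read_long helpers are hypothetical names standing in for the register file; they only mirror the rule stated above and are not something the compiler generates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 16 MSP430 registers, R0..R15. */
static uint16_t regs[16];

/* Place a 32-bit value in the pair R(lo+1):R(lo). The lowest-numbered
 * register receives the least significant word. */
void pass_long(unsigned lo, uint32_t value) {
    assert(lo < 15);
    regs[lo]     = (uint16_t)(value & 0xFFFFu);
    regs[lo + 1] = (uint16_t)(value >> 16);
}

/* Read back a 32-bit value from the pair R(lo+1):R(lo). */
uint32_t read_long(unsigned lo) {
    assert(lo < 15);
    return ((uint32_t)regs[lo + 1] << 16) | regs[lo];
}
```

For a call with the two long arguments 0x12345678 and 0x9ABCDEF0, pass_long(12, ...) and pass_long(14, ...) leave R12 = 0x5678, R13 = 0x1234, R14 = 0xDEF0 and R15 = 0x9ABC in this model.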
Consider the following function which implements a 32-bit multiply from two 16-bit arguments. We’ll compile this program in two different ways - with and without optimization - and we’ll compare the resulting assembly code.
unsigned long mymul(unsigned a, unsigned b) {
    unsigned long r;
    r = (unsigned long) a * b;
    return r;
}

unsigned a, b;
unsigned long r;

int main(void) {
    a = 35;
    b = 60;
    r = mymul(a, b);
    return 0;
}
Here is the compilation result with the -Os optimization flag (optimization for size). All examples so far have used this flag:
/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-gcc \
-I../hal \
-Wall \
-Os \
-mmcu=msp430c1111 \
-Ic:/ti/msp430-gcc/include \
-c main.c \
-o main.o
The input arguments of this function are stored in R12 (a) and R13 (b). Since the MSP430 does not have a multiply instruction, the compiler replaces the multiplication with a call to __mspabi_mpyl, a function that computes a 32-bit multiplication (see Helper Function API, Chapter 6 in MSP430 Application Binary Interface).
int32 __mspabi_mpyl(int32 x, int32 y);
This internal helper function __mspabi_mpyl takes two 32-bit integers as arguments. Therefore, the two 16-bit arguments in R12 and R13 have to be zero-extended into two 32-bit arguments in R13:R12 and R15:R14. This explains the format of the assembly code: b is moved from R13 to R14, and the upper words R15 and R13 are cleared. Finally, the return value of __mspabi_mpyl is a 32-bit integer, which is the same as the return value of mymul. Therefore, the function can return directly after the call to __mspabi_mpyl, since the return value is already stored in R13:R12 as expected.
0000fc3a <mymul>:
fc3a: 0e 4d mov r13, r14 ;
fc3c: 0f 43 clr r15 ;
fc3e: 0d 43 clr r13 ;
fc40: b0 12 60 fc call #-928 ;#0xfc60
fc44: 30 41 ret
0000fc60 <__mspabi_mpyl>:
fc60: 0a 12 push r10 ;
...
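The register shuffling in the optimized mymul amounts to a zero-extension of both arguments followed by a 32-bit multiply. The C sketch below mirrors that on a host PC. The shift-and-add routine mpyl_model is only a hypothetical stand-in for __mspabi_mpyl, chosen because it needs no multiply instruction; the actual library routine may well be implemented differently. Unsigned types are used here so that wrap-around is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __mspabi_mpyl: 32-bit shift-and-add multiply.
 * The result is the product modulo 2^32. */
static uint32_t mpyl_model(uint32_t x, uint32_t y) {
    uint32_t acc = 0;
    while (y != 0) {
        if (y & 1u) {
            acc += x;      /* add the shifted multiplicand for each set bit */
        }
        x <<= 1;
        y >>= 1;
    }
    return acc;
}

/* Mirrors the optimized mymul: zero-extend both arguments, then multiply. */
uint32_t mymul_model(uint16_t a, uint16_t b) {
    uint32_t x = a;        /* R13:R12 = 0:a  (clr r13)               */
    uint32_t y = b;        /* R15:R14 = 0:b  (mov r13, r14; clr r15) */
    uint32_t r = mpyl_model(x, y);
    assert((r >> 16) <= 0xFFFEu);  /* largest product is 0xFFFE0001 */
    return r;              /* returned in R13:R12 */
}
```

With the values used in main, mymul_model(35, 60) yields 2100.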
If we compile the program without the -Os flag, the assembly is more complicated. Without optimization, the compiler falls back to a not-so-smart-but-always-correct strategy of code generation that will always build a stack frame. Hence, studying the assembly code for mymul compiled without optimization helps us to understand how a stack frame works.
/cygdrive/c/ti/msp430-gcc/bin/msp430-elf-gcc \
-I../hal \
-Wall \
-mmcu=msp430c1111 \
-Ic:/ti/msp430-gcc/include \
-c main.c \
-o main.o
Take a moment to look over the code below. While the code is not optimal, every instruction can be logically explained.
The stack frame is a data structure that organizes every local variable, including the arguments, on the stack. The push r10 at the beginning and the pop r10 at the end are an example of saving and restoring a callee-saved register. The sub #8, r1 at the beginning and the add #8, r1 at the end reserve room on the stack, and release it again, by moving the stack pointer. 8 bytes is enough room to store two 16-bit arguments (a and b) and a 32-bit return value (r).
The use of the stack frame does not change how the compiler passes arguments. Hence, the function mymul still has to be called with R12 holding a and R13 holding b, and it still returns its result in R13:R12. That creates additional move instructions between the stack frame and the registers.
0000fc3a <mymul>:
fc3a: 0a 12 push r10 ;
fc3c: 31 82 sub #8, r1 ;r2 As==11
fc3e: 81 4c 02 00 mov r12, 2(r1) ;
fc42: 81 4d 00 00 mov r13, 0(r1) ;
fc46: 1a 41 02 00 mov 2(r1), r10 ;
fc4a: 0c 4a mov r10, r12 ;
fc4c: 0d 43 clr r13 ;
fc4e: 2a 41 mov @r1, r10 ;
fc50: 0e 4a mov r10, r14 ;
fc52: 0f 43 clr r15 ;
fc54: b0 12 92 fc call #-878 ;#0xfc92
fc58: 81 4c 04 00 mov r12, 4(r1) ;
fc5c: 81 4d 06 00 mov r13, 6(r1) ;
fc60: 1c 41 04 00 mov 4(r1), r12 ;
fc64: 1d 41 06 00 mov 6(r1), r13 ;
fc68: 31 52 add #8, r1 ;r2 As==11
fc6a: 3a 41 pop r10 ;
fc6c: 30 41 ret
0000fc92 <__mspabi_mpyl>:
fc92: 0a 12 push r10 ;
...
Here is the stack frame for the function mymul.
Address | Content | Offset |
---|---|---|
A | Return Address | |
A-2 | r10 (callee saved) | |
A-4 | r, high word | 6(r1) |
A-6 | r, low word | 4(r1) |
A-8 | argument a | 2(r1) |
A-10 | argument b | 0(r1), @r1 |
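One way to read this table is as a C struct laid over the stack, starting at the address held in r1. The sketch below is a host-side model, not compiler output: the struct name mymul_frame, its field names, and the function mymul_frame_model are made up for illustration, and the layout relies on uint16_t fields being packed without padding, which holds on common compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the stack frame of mymul, lowest address first. */
struct mymul_frame {
    uint16_t b;         /* 0(r1)  argument b         */
    uint16_t a;         /* 2(r1)  argument a         */
    uint16_t r_lo;      /* 4(r1)  r, low word        */
    uint16_t r_hi;      /* 6(r1)  r, high word       */
    uint16_t saved_r10; /*        r10 (callee saved) */
    uint16_t ret_addr;  /*        return address     */
};

/* Follows the unoptimized listing step by step. */
uint32_t mymul_frame_model(uint16_t r12, uint16_t r13) {
    struct mymul_frame f = {0};
    assert(offsetof(struct mymul_frame, r_hi) == 6); /* layout assumption */
    f.a = r12;                                /* mov r12, 2(r1)     */
    f.b = r13;                                /* mov r13, 0(r1)     */
    uint32_t prod = (uint32_t)f.a * f.b;      /* call __mspabi_mpyl */
    f.r_lo = (uint16_t)(prod & 0xFFFFu);      /* mov r12, 4(r1)     */
    f.r_hi = (uint16_t)(prod >> 16);          /* mov r13, 6(r1)     */
    return ((uint32_t)f.r_hi << 16) | f.r_lo; /* result in R13:R12  */
}
```

The field offsets 0, 2, 4 and 6 match the Offset column of the table, and the struct covers the 12 bytes from A-10 up to and including the return address at A.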
In general, when a function is called with more arguments than would fit in registers, the stack is also used to pass the remaining arguments.
The stack frame offers two advantages compared to using only registers. First, it enables a function to use an arbitrary number of arguments. Second, it supports recursion, since every call gets its own frame.
The control semantics of function calls on MSP430
Consider next how the main function will call mymul. First, it prepares the function arguments. Next, it calls the function: the call instruction pushes the return address (the PC of the next instruction) on the stack, and jumps to the function. When mymul executes the ret instruction, the PC is retrieved from the stack, and control returns to the instruction after the function call. Hence, the control semantics of a function call is organized using the stack as well.
unsigned a, b;
unsigned long r;

int main(void) {
    a = 35;
    b = 60;
    r = mymul(a, b);
    return 0;
}
/* assembly generated without -Os flag:
*
* 0000fc6e <main>:
* fc6e: b2 40 23 00 mov #35, &0x0206 ;#0x0023
* fc72: 06 02
* fc74: b2 40 3c 00 mov #60, &0x0200 ;#0x003c
* fc78: 00 02
* fc7a: 1c 42 06 02 mov &0x0206,r12 ;0x0206
* fc7e: 1d 42 00 02 mov &0x0200,r13 ;0x0200
* fc82: b0 12 3a fc call #-966 ;#0xfc3a
* fc86: 82 4c 02 02 mov r12, &0x0202 ;
* fc8a: 82 4d 04 02 mov r13, &0x0204 ;
* fc8e: 4c 43 clr.b r12 ;
* fc90: 30 41 ret
*/
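The effect of call and ret on the PC and the stack can be mimicked in a few lines of C. This is a host-side sketch with made-up names (cpu_model, cpu_init, do_call, do_ret) and a tiny word array in place of RAM; it models only the PC and stack pointer, nothing else of the processor.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16u

/* Hypothetical, minimal CPU state: program counter, stack pointer and a
 * small stack. sp is a byte offset into the stack and grows downward. */
typedef struct {
    uint16_t pc;
    uint16_t sp;
    uint16_t stack[STACK_WORDS];
} cpu_model;

void cpu_init(cpu_model *c, uint16_t pc) {
    c->pc = pc;
    c->sp = 2u * STACK_WORDS;   /* empty stack: sp just past the top word */
}

/* call: push the address of the next instruction, then jump. */
void do_call(cpu_model *c, uint16_t target, uint16_t insn_bytes) {
    assert(c->sp >= 2u);                    /* room for one more word */
    c->sp -= 2u;
    c->stack[c->sp / 2u] = (uint16_t)(c->pc + insn_bytes);
    c->pc = target;
}

/* ret: pop the return address back into the PC. */
void do_ret(cpu_model *c) {
    assert(c->sp < 2u * STACK_WORDS);       /* stack must not be empty */
    c->pc = c->stack[c->sp / 2u];
    c->sp += 2u;
}
```

Using the addresses from the listing: starting with pc = 0xfc82 (the 4-byte call in main), do_call(&c, 0xfc3a, 4) leaves pc at 0xfc3a with 0xfc86 on the stack, and do_ret(&c) then restores pc to 0xfc86, the mov that stores the low word of the result.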
The semantics of a hardware function call
Does the stack frame make sense when we call a coprocessor? Not at all! A coprocessor does not use the stack to pass arguments from the main function to the ‘hardware function’. Instead, the coprocessor uses memory-mapped registers. So we will have to revise how to pass arguments to the coprocessor.
Furthermore, we will have to devise our own ‘control semantics’, since the call instruction only works when we call a software function.
/* original design
 *
 * unsigned long mymul(unsigned a, unsigned b) {
 *     unsigned long r;
 *     r = (unsigned long) a * b;
 *     return r;
 * }
 *
 */
unsigned long mymul_in_hw(unsigned a, unsigned b) {
    MEMORY_MAPPED_A = a;
    MEMORY_MAPPED_B = b;
    /* do the hardware magic here .. */
    return MEMORY_MAPPED_RLONG;
}
In the next lecture, we will look at a detailed implementation of a hardware multiplication coprocessor using this concept, and we will compare its performance to the performance of the software implementation.
Conclusions
We studied the function call semantics of the MSP430 ABI. Every function call carries overhead, which comes from moving data arguments and from passing control from one function to the next. Without optimization, the compiler generates quite a bit of preparation code around each function call. Luckily, the compiler is able to remove most of this overhead when you use the appropriate optimization flag.