Real-Time Input Output 

The purpose of this lecture is as follows.

To discuss the mechanisms by which signal processing is done in real time.
To compare two different methods of implementing input/output (from the ADC and to the DAC).
To discuss essential tools for performance assessment of real-time DSP programs.

Basics of Real-Time Operation 

Real-time signal processing belongs to a category of problems known as hard real-time problems. If you want to generate a signal with a sampling frequency of, say, 32 kHz, then you have to guarantee that a new sample can be sent to the digital-to-analog converter every $\frac{1}{32\,\mathrm{kHz}} = 31.25 \mu s$. If the sample comes even one microsecond too late, then the signal loses its fixed sample frequency of 32 kHz, resulting in severe signal distortion.

Furthermore, because many real-time digital-signal-processing (DSP) programs transform and process signals as they are received, these DSP programs typically are not developed to ‘compute ahead’: real-time signals are not processed in batch, but rather as they appear, in real time. If an analog-to-digital converter is configured to sample a signal at 32 kHz, then it will deliver a new input to the DSP program every $31.25 \mu s$. Your DSP program has to complete all the computations needed for one single sample within $31.25 \mu s$. If it can complete the computations sooner, that’s fine; your processor could take a rest, switch into sleep mode and save power. However, the computations must complete within $31.25 \mu s$, since otherwise (a) the output signal is not produced at the correct sample frequency and (b) the C program misses the next input sample.

_images/realtimedsp.jpg — Real-time DSP operation requires the time to compute on a new input sample to be smaller than the sample period.

The previous figure is a sketch of the operations taking place in a real-time DSP application. When a new sample is received, the processor works for $T_w$ to compute an output from the input, and then produces an output sample and goes to sleep (or idle mode). After $T_{slack}$, the new input is received, and the processing loop starts over. Obviously, the sample period $T_s$ must be longer than the working period $T_w$, and the slack $T_{slack} = T_s - T_w$ must be positive. The load of the processor (expressed as a relative number between 0 and 100%) is $\frac{T_w}{T_s}$.
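The relation between $T_s$, $T_w$, $T_{slack}$ and the load can be checked with a few lines of C. The following is a minimal sketch; the type `rt_budget_t` and the function `rt_budget` are hypothetical names introduced here for illustration, and all times are expressed in nanoseconds. The sample rate is assumed to be nonzero.

```c
#include <stdint.h>

/* Timing figures for one sample period, all in nanoseconds. */
typedef struct {
    uint32_t ts_ns;     /* sample period T_s */
    int32_t  slack_ns;  /* T_slack = T_s - T_w; negative means overload */
    uint32_t load_pct;  /* processor load T_w / T_s, in percent */
} rt_budget_t;

/* Compute T_s, T_slack and the load for a sample rate fs_hz (nonzero)
 * and a working time tw_ns. */
rt_budget_t rt_budget(uint32_t fs_hz, uint32_t tw_ns) {
    rt_budget_t b;
    b.ts_ns    = 1000000000u / fs_hz;
    b.slack_ns = (int32_t)b.ts_ns - (int32_t)tw_ns;
    b.load_pct = (uint32_t)(((uint64_t)tw_ns * 100u) / b.ts_ns);
    return b;
}
```

For a 32 kHz sample rate and a working time of 20 microseconds, this gives a sample period of 31250 ns, a slack of 11250 ns and a load of 64%. A working time of 40 microseconds yields a negative slack and a load above 100%, which means that the real-time constraint is violated.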

In the following, we will break down $T_w$ into the time needed for input/output and the time needed for computations, and we will discuss several mechanisms through which this real-time operation can be constructed.

Signal Processing Hardware 

Since we will study the real-time input/output characteristics for signal processing on the lab kit, let’s first consider the digital architecture of this kit. In the following, we will abstract out the analog components such as anti-alias filters and reconstruction filters, and only consider the digital components.

The BOOSTXL-AUDIO board contains a microphone (with pre-amplifier), a 14-bit DAC, and a loudspeaker (with amplifier). The MSP-EXP432P401R kit contains the MSP432P401R microcontroller, which integrates a 14-bit ADC, an ARM Cortex-M4 processor, and a slew of peripherals.

In this system, real-time operation implies that the following activities complete within one sample period $T_s$.

An input sample needs to be captured from the microphone (or another input pin) and converted in the 14-bit ADC.
The input sample must be processed on the ARM using a suitable DSP algorithm. Eventually, the ARM program will compute the value for an output sample.
The output sample needs to be transmitted to the 14-bit DAC on the BOOSTXL-AUDIO board, where it is converted into an output voltage for the loudspeaker.

We will discuss two different mechanisms to achieve real-time operation. We will look for the strong and weak points of each scheme and evaluate how to implement these schemes on the lab kit.

Basic A/D operation 

The operation of the 14-bit ADC on the lab kit is as follows.

Sample Trigger: The conversion starts with the sample-and-hold module grabbing the voltage at the ADC input. There are multiple input channels available, and through an analog multiplexer, the programmer can choose which pin voltage to sample.
Conversion: After the sample trigger, the ADC conversion starts. As a successive-approximation ADC, it takes one clock cycle to determine each bit of output resolution. When the conversion completes, the ADC output is stored in an output buffer.
End of Conversion: At the end of conversion, the ADC sets a flag to indicate that a new ADC output is ready. Also, the EOC flag can be used to interrupt the processor.

The ADC has many operation modes that differ in how the ADC is triggered and how the conversion and end-of-conversion phases are handled. The MSP432 Technical Reference Manual Chapter 22 specifies those in great detail. For this lecture on real-time input-output, it is sufficient to understand that (1) the ADC has to be triggered to initiate a conversion, and (2) it takes several clock cycles before the conversion completes.
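To see why a successive-approximation conversion needs one step per output bit, the conversion phase can be simulated in software. The following is a simplified sketch, not a model of the actual MSP432 ADC: the function `sar_convert` and its millivolt arguments are assumptions made for illustration, and a real converter spends additional cycles on sampling and synchronization.

```c
#include <stdint.h>

#define ADC_BITS 14

/* Simulate a successive-approximation conversion of vin_mv against the
 * reference vref_mv (both in millivolts). Returns the 14-bit output code
 * and stores the number of decision steps in *cycles. */
uint16_t sar_convert(uint32_t vin_mv, uint32_t vref_mv, unsigned *cycles) {
    uint16_t code = 0;
    *cycles = 0;
    for (int bit = ADC_BITS - 1; bit >= 0; bit--) {
        /* Tentatively set the next bit, from MSB down to LSB. */
        uint16_t trial = (uint16_t)(code | (1u << bit));
        /* Keep the bit if vin >= vref * trial / 2^14. The comparison is
         * done without division to avoid rounding errors. */
        if (((uint64_t)vin_mv << ADC_BITS) >= (uint64_t)vref_mv * trial) {
            code = trial;
        }
        (*cycles)++;   /* one decision per output bit */
    }
    return code;
}
```

With a 3300 mV reference, an input of 1650 mV converts to code 8192 (half of full scale) after 14 decision steps, and an input equal to the reference saturates at code 16383.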

Basic D/A operation 

A 14-bit DAC is located on the BOOSTXL-AUDIO board and connected to the MSP432 microcontroller through a serial interface. Sending a new 14-bit sample to the chip involves the following steps.

1. Assert SYNC: The DAC chip is notified that a new sample is coming through asserting the SYNC pin. This pin is driven through a GPIO pin on the MSP432 microcontroller.

2. SPI Transfer: The DAC chip receives two bytes (high byte, low byte) over the serial SPI connection, which takes 16 SPI clock cycles. The SPI is controlled through an SPI peripheral on the MSP432 microcontroller.

3. De-assert SYNC: The sample is latched to the DAC output as an analog voltage by de-asserting the SYNC pin.

The datasheet of the DAC8311 chip describes the details of this protocol. For this lecture on real-time input-output, it is sufficient to understand that updating an output sample on the DAC takes several operations: asserting a GPIO pin, transmitting two bytes over SPI, and de-asserting the same GPIO pin.
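The three steps above can be sketched in C as follows. This is an illustration under stated assumptions, not the code of the xlaudio library: the frame layout (two mode bits set to zero for normal operation, followed by 14 data bits) should be checked against the DAC8311 datasheet, and the `dac_port_t` structure with its three function pointers is a hypothetical stand-in for the GPIO and SPI drivers.

```c
#include <stdint.h>

/* Build the 16-bit frame. Assumed layout: two mode bits (00 = normal
 * operation) in the top positions, followed by the 14 data bits. */
uint16_t dac_frame(uint16_t sample) {
    return (uint16_t)(sample & 0x3FFFu);
}

uint8_t hibyte(uint16_t w) { return (uint8_t)(w >> 8); }
uint8_t lobyte(uint16_t w) { return (uint8_t)(w & 0xFFu); }

/* Hypothetical hardware hooks, to be provided by the platform code. */
typedef struct {
    void (*sync_assert)(void);       /* drive the SYNC GPIO to its active level */
    void (*sync_deassert)(void);     /* release the SYNC GPIO */
    void (*spi_send)(uint8_t byte);  /* blocking transfer of one SPI byte */
} dac_port_t;

/* One DAC update: assert SYNC, send high byte, send low byte, de-assert. */
void dac_write(const dac_port_t *port, uint16_t sample) {
    uint16_t frame = dac_frame(sample);
    port->sync_assert();
    port->spi_send(hibyte(frame));
    port->spi_send(lobyte(frame));
    port->sync_deassert();
}
```

Passing the hardware access as function pointers keeps the framing logic testable on a desktop machine. On the target, the three hooks would wrap the GPIO and SPI peripheral accesses.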

Timing Constraints for Software 

We next define two terms that are useful to describe the execution time of software: meeting the timing constraints and constant execution-time.

For a given C program, multiple factors affect its execution time: the number of instructions in the program and the clock frequency of the processor running the program are two obvious ones. Some factors are harder to capture, such as the effect of the internal processor architecture (the cache, the pipeline, and so on). There may also be unpredictable factors that are defined by the data values processed by the C program. For example, a sorting program will have an execution time that depends on how many numbers have to be sorted and, for many algorithms, on their initial order.

In real-time applications, we are greatly concerned with the execution time of the software. At the very least, we wish that the software meets the timing constraints, which means that the program must finish computing the output before it’s due. From the earlier discussion, we know that the output must be ready before the sample period $T_s$ has elapsed.

When a software program always takes the same amount of time to execute, we say that the program runs in constant execution time. It may not be possible to tell from the C source code exactly how long that constant execution time will be, as this depends on the implementation details of the architecture. As a rule of thumb, if your C program does not make any data-dependent decisions, the resulting execution on a stand-alone microcontroller will be constant time.
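The rule of thumb can be illustrated by writing the same search in two ways. In the sketch below (all names are hypothetical), `first_above` leaves its loop as soon as it finds a sample above the threshold, so its iteration count depends on the data. `first_above_fixed` always visits every sample. Both functions report the number of loop iterations through `iters` as a stand-in for execution time.

```c
#include <stdint.h>
#include <stddef.h>

#define NSAMPLES 8

/* Data-dependent: the loop exits at the first sample above thr, so the
 * number of iterations varies with the input data. */
size_t first_above(const int16_t x[NSAMPLES], int16_t thr, unsigned *iters) {
    size_t i;
    *iters = 0;
    for (i = 0; i < NSAMPLES; i++) {
        (*iters)++;
        if (x[i] > thr) break;
    }
    return i;   /* NSAMPLES if no sample exceeds thr */
}

/* Fixed work: every call visits all NSAMPLES samples and records the
 * first hit without leaving the loop early. */
size_t first_above_fixed(const int16_t x[NSAMPLES], int16_t thr, unsigned *iters) {
    size_t found = NSAMPLES;
    *iters = 0;
    for (size_t i = 0; i < NSAMPLES; i++) {
        (*iters)++;
        int hit = (x[i] > thr) & (found == NSAMPLES);
        found = hit ? i : found;
    }
    return found;
}
```

Both functions return the same index, but only the second one performs the same number of iterations for every input. Whether the selection inside the loop compiles to branch-free code depends on the compiler and its optimization settings, so a timing measurement on the target remains the final check.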

Polled Input/Output 

We are now ready to discuss the first implementation of real-time DSP operation. This design uses polled input/output. It is driven entirely by software. The following shows pseudocode (meaning: not actual code) for this processing:

while (1) {

   trigger_adc_conversion();
   while (adc_conversion_running)
     /* wait */ ;
   insample = read_adc_output();

   outsample = processSample(insample);

   assert_dac8311_sync();
   send_spi(hibyte(outsample));
   send_spi(lobyte(outsample));
   deassert_dac8311_sync();

}

Real-time operation at a fixed sample rate is achieved as long as every function call in this pseudocode takes a constant number of cycles to complete. The first few lines (3-6) are the ADC conversion, next is the user-defined processing, and the last few lines (10-13) are the DAC conversion. If the user-defined processSample function is constant-time, then the overall program will also be constant-time, i.e., each iteration through the while loop takes the same number of clock cycles.

The sampling frequency of the DSP application is determined by the execution time of the while-loop body. Without knowing the contents of processSample, we cannot know the sample period that can be achieved. On the other hand, if processSample always executes the same number of operations (no data-dependent if-statements or for-loops), then the code will be constant time, so we will obtain a constant sample rate.

For DSP, polled IO is not very useful because the sample rate depends on the code complexity. Nevertheless, polled IO represents a minimal overhead, as all processor cycles are devoted to computing or reading the ADC/writing the DAC.

Measuring D/A Conversion 

We will look at an example of a polled operation and measure the sampling frequency.

#include "xlaudio.h"

static uint16_t v = 0;

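// Called once per sample; ignores the input and emits a sawtooth
// that wraps at the 14-bit range (0..16383).
uint16_t processSample(uint16_t x) {
    // Pulse the debug pin so that each call is visible on the scope.
    xlaudio_debugpinhigh();
    xlaudio_debugpinlow();
    v = (v + 1) % 16384;
    return v;
}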

int main(void) {
    WDT_A_hold(WDT_A_BASE);

    xlaudio_init_poll(XLAUDIO_MIC_IN, processSample);
    xlaudio_run();

    return 1;
}

_images/polledexample.png — Output sample rate for a polled DSP program.

Using a scope, we monitor the DAC MOSI data (blue trace) and the debug pin (orange trace). The pulse pattern repeats at about 52.4 kHz, and this represents the maximum sample rate achievable on this setup. The compiler optimization was set at level 2 (global) for the program as well as for the xlaudio library. Note that the duty cycle of the debug pin is small, which indicates that only a minor portion of the clock cycles within a sample period is spent inside the processSample function. The blue pulse bursts just behind each orange pulse are data bits flowing from the microcontroller to the board over SPI.

The SYNC pulses are 19 microseconds apart. That corresponds to roughly 910 cycles at 48 MHz. It is useful to break down these 910 cycles in more detail.

The debug pin is high for only a minor part of the 910 cycles (roughly 1.1 microseconds out of a 19 microsecond period). That means that a majority of the 910 cycles are spent outside of the processSample function, on housekeeping functions.
The housekeeping functions include the interrupt latency, the interrupt service routine overhead of the ADC, and the logic that transmits a sample to the DAC.
The transmission of a sample to the DAC consumes a non-negligible portion of the 19 microsecond period. Closer study of the DAC SYNC pulse (not shown in the graph) shows that it is asserted for about 5.8 microseconds, which is about 278 cycles at 48 MHz. This time can never be hidden in hardware, because the DAC SYNC pulse is asserted by a GPIO pin, and is controlled by the software of the xlaudio library.

Now, compare that to a typical hardware sharing factor. If we had a DSP program running at an 8 kHz sample rate with a 48 MHz processor clock, we would expect to have 6000 processor clock cycles available for the processing of each sample. What this example shows is that about 750 cycles are used for basic ADC/DAC management for every sample. Hence, the actual cycle budget available per sample is less than 6000 cycles; it is closer to 5250 clock cycles. At high sample rates, this overhead may even become a dominant factor. For example, at a 32 kHz sample rate, we have a budget of 1500 cycles, but we lose half of those cycles to data-moving overhead, and we can use only 750 of them for actual computations.
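The cycle budget arithmetic above is easy to reproduce for other sample rates. The following sketch uses two hypothetical helper functions; the 750-cycle overhead is the figure quoted above for this particular kit and is passed in as a parameter.

```c
#include <stdint.h>

/* Clock cycles available per sample at clock f_clk_hz and sample rate fs_hz. */
uint32_t cycles_per_sample(uint32_t f_clk_hz, uint32_t fs_hz) {
    return f_clk_hz / fs_hz;
}

/* Cycles left for computation after subtracting the per-sample I/O
 * overhead. Returns 0 if the overhead alone exceeds the budget. */
uint32_t compute_budget(uint32_t f_clk_hz, uint32_t fs_hz, uint32_t io_overhead) {
    uint32_t total = cycles_per_sample(f_clk_hz, fs_hz);
    return (total > io_overhead) ? (total - io_overhead) : 0u;
}
```

At 48 MHz, `compute_budget(48000000, 8000, 750)` returns 5250 and `compute_budget(48000000, 32000, 750)` returns 750, matching the numbers in the text. At 96 kHz the total budget of 500 cycles is smaller than the overhead, so the function returns 0.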

More advanced DSP hardware can of course avoid many of those 750 cycles; the specific data discussed in this example is applicable to the BOOSTXL-AUDIO kit in combination with the MSP-EXP432P401R.

Interrupt Driven Input/Output 

In the second form of real time I/O, we make use of an external timer module to start ADC conversions at precise intervals. Each time an ADC conversion completes, it generates an interrupt. The interrupt service routine that services the A/D will read the A/D output and complete the user processing. Finally, a second timer is used to transmit the output sample to the DAC at the proper rate.

The pseudocode of this operation looks as follows. The initialization function configures a periodic timer and the ADC. The ADC ISR reads the sample, and forwards the result to the DAC. The run function enables ADC conversions and puts the program into a low-power (sleep) mode. Each time a hardware interrupt occurs, the processor wakes up, processes the ISR, and goes back to sleep.

initialize() {
  set periodic ADCtimer to desired sample frequency;
  configure ADC conversion to initiate when periodic ADCtimer rolls over;
  enable ADC interrupt;
}

ADC_ISR() {
  insample = read_adc_output();
  outsample = processSample(insample);
  assert_dac8311_sync();
  send_spi(hibyte(outsample));
  send_spi(lobyte(outsample));
  deassert_dac8311_sync();
}

run() {
  enable ADC conversions;
  while (1)
    go to low power mode();
}

The main advantage of using a timer is that the sample rate can be guaranteed, as long as the sample processing can complete within a sample period. The output samples will also be at a fixed sample rate, assuming that processSample is a constant-time function.
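Setting the periodic timer to the desired sample frequency comes down to choosing a tick count per sample period. The sketch below shows that calculation under stated assumptions: the function names are hypothetical, the timer is assumed to count at `f_timer_hz`, and `max_ticks` is the longest period the timer supports (for example 65536 for a 16-bit counter). Because the tick count is an integer, the sample rate actually obtained may differ slightly from the requested one.

```c
#include <stdint.h>

/* Timer ticks per sample period, rounded to the nearest integer.
 * Returns 0 if fs_hz is 0 or the period does not fit the timer. */
uint32_t timer_period_ticks(uint32_t f_timer_hz, uint32_t fs_hz, uint32_t max_ticks) {
    if (fs_hz == 0u) return 0u;
    uint32_t ticks = (uint32_t)(((uint64_t)f_timer_hz + fs_hz / 2u) / fs_hz);
    if (ticks == 0u || ticks > max_ticks) return 0u;
    return ticks;
}

/* Sample rate actually obtained for a given tick count, in millihertz.
 * ticks must be nonzero. */
uint64_t actual_rate_millihz(uint32_t f_timer_hz, uint32_t ticks) {
    return ((uint64_t)f_timer_hz * 1000u) / ticks;
}
```

With a 48 MHz timer clock, a 32 kHz sample rate maps to exactly 1500 ticks. A 44.1 kHz request rounds to 1088 ticks, which corresponds to an actual rate of roughly 44.118 kHz.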

Missing an interrupt 

We should be cautious to ensure that the sample processing time isn’t too long. When that happens, the ADC conversion would complete when the processing of the previous sample has not finished. The processor would ignore the resulting interrupt until the current processing is complete.

When the processing completes, the next sample is picked up and processed. Eventually, of course, we will run out of time and lose a sample. If the sample period is 125 microseconds but we use 130 microseconds of processing, then we can effectively only process 125 samples out of every 130 samples given. The effect of missing an interrupt is that the sample rate slowly degrades when the processing time per sample exceeds the sample period.
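The gradual loss of samples can be reproduced with a small simulation. The sketch below uses a simplified model that is an assumption made for illustration, not a description of the MSP432 interrupt hardware: at most one sample can wait while the processor is busy, and a sample that arrives while another one is already waiting counts as lost. The names `overrun_stats_t` and `simulate_overrun` are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t processed;  /* samples whose processing was started */
    uint32_t lost;       /* samples dropped because one was already waiting */
    uint32_t pending;    /* 1 if a sample is still waiting at the end */
} overrun_stats_t;

/* Simulate n_samples arrivals spaced ts_us apart, with each sample
 * needing tw_us of processing time. */
overrun_stats_t simulate_overrun(uint32_t ts_us, uint32_t tw_us, uint32_t n_samples) {
    overrun_stats_t s = {0u, 0u, 0u};
    uint64_t busy_until = 0;   /* time at which the processor becomes free */
    for (uint32_t k = 0; k < n_samples; k++) {
        uint64_t t = (uint64_t)k * ts_us;   /* arrival time of sample k */
        /* A waiting sample is served as soon as the processor is free. */
        if (s.pending && busy_until <= t) {
            busy_until += tw_us;
            s.pending = 0u;
            s.processed++;
        }
        if (busy_until <= t) {        /* processor idle: serve at once */
            busy_until = t + tw_us;
            s.processed++;
        } else if (s.pending) {       /* one sample already waits: loss */
            s.lost++;
        } else {                      /* processor busy: sample waits */
            s.pending = 1u;
        }
    }
    return s;
}
```

Calling `simulate_overrun(125, 130, 2600)` loses roughly one sample in every 26, so the fraction of samples processed is close to 125/130. With a processing time of 100 microseconds, no samples are lost.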

This also provides an easy mechanism to check the real-time performance of your program. You can measure the frequency of the DAC SYNC pulse, which should always correspond to the sample rate specified in your program. If the DAC SYNC pulse frequency is lower, then your program is losing samples and your DSP program is not real-time.

Performance Measurement 

Finally, we will discuss an easy method for performance measurement. A hardware timer is well suited to measure the execution time of a software function. The pseudocode below sketches such a measurement. We have assumed that a generic timer readout function, timer_value(), exists. This function returns the value of a free-running counter that increments from 0 to MAXUINT, then wraps around to 0 and increments again. In the return expression, the first term (a2 - a1) is the execution time of mistery_function() plus the cost of one timer readout, while the second term (a1 - a0) removes the overhead of the timer_value() function call.

On complex processors (which includes, in fact, the ARM Cortex-M4), many factors can affect computation time, including cache, memory bus conflicts, interrupts on the processor, etc. For such cases, one would perform the measurement multiple times, say 10 times, and only keep the median result.

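#include <stdint.h>

uint32_t timer_value(void);   // generic timer readout, assumed to exist

void mistery_function(void) {
   // ...
}

uint32_t measure_function(void) {
   uint32_t a0 = timer_value();   // two back-to-back readouts: (a1 - a0)
   uint32_t a1 = timer_value();   // is the cost of one timer_value() call
   mistery_function();
   uint32_t a2 = timer_value();

   // Unsigned subtraction stays correct across one counter wrap-around.
   return (a2 - a1) - (a1 - a0);
}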
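Two details of this measurement scheme can be made concrete: the subtraction of counter readouts across a wrap-around, and the selection of the median from repeated measurements. The helpers below are a minimal sketch with hypothetical names; they assume a 32-bit counter and at most one wrap-around between two readouts.

```c
#include <stdint.h>
#include <stddef.h>

/* Elapsed ticks between two readouts of a free-running 32-bit counter.
 * Unsigned arithmetic wraps modulo 2^32, so the result stays correct if
 * the counter wrapped around once between start and stop. */
uint32_t elapsed_ticks(uint32_t start, uint32_t stop) {
    return stop - start;
}

/* Median of n measurements (n >= 1). The array is sorted in place with
 * an insertion sort; for an even n, the upper middle element is returned. */
uint32_t median_u32(uint32_t *v, size_t n) {
    for (size_t i = 1; i < n; i++) {
        uint32_t key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
    return v[n / 2];
}
```

For example, the readouts 0xFFFFFFF0 and 0x00000010 are 32 ticks apart even though the counter wrapped in between. For the measurements 50, 48, 120, 49 and 51, the median is 50, so the outlier of 120 (perhaps caused by an interrupt during the measurement) does not distort the result.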

Measuring Processing Clock Cycles 

In the XLAUDIO library, there is a function xlaudio_measurePerfSample which implements the performance measurement mechanism discussed earlier. You call this function with your sample processing function as the argument, and the function xlaudio_measurePerfSample returns the median number of cycles needed for processing zero-valued samples.

#include <stdio.h>
#include "xlaudio.h"

uint16_t processSample(uint16_t x) {
    return x*x;
}

int main(void) {
    WDT_A_hold(WDT_A_BASE);

    uint32_t c = xlaudio_measurePerfSample(processSample);
    // c is a uint32_t, so print it as an unsigned long
    printf("Perf processSampleDirectFolded: %lu\n", (unsigned long) c);

    return 1;
}

The output of this program (the printf function) will appear in the Console window of CCS.

Conclusions 

We discussed two different techniques to achieve real-time input/output operation for digital signal processing. Polling-based input/output ensures that no slack is left on the processor but makes it hard to control the sample rate. Interrupt-based input/output guarantees the sample rate, as long as the processor is not overloaded.

For the coming few labs, we will rely on interrupt-based input/output. As the signal processing becomes more challenging, we will introduce a third real-time I/O mechanism in the future (DMA-based input/output). However, we will leave polling-based input/output for the academic oddity it is in Real Time DSP.