.. ECE 4703

.. _lecture 5:

Real-Time Input Output
======================

.. contents::

The purpose of this lecture is as follows.

* To discuss the mechanisms by which signal processing achieves real-time operation on a processor
* To compare two different methods of implementing input/output (from ADC to DAC)
* To discuss essential tools for performance assessment of real-time DSP programs

Basics of Real-Time Operation
-----------------------------

Real-time signal processing fits in a category of problems known as hard real-time problems. If you want to generate a signal with a sampling frequency of, say, 32 kHz, then you have to guarantee that every :math:`\frac{1}{32\,\mathrm{kHz}} = 31.25\,\mu s` a new sample can be sent to the digital-to-analog converter. If the sample comes even one microsecond too late, the signal loses its fixed sample frequency of 32 kHz, resulting in severe signal distortion.

Furthermore, because many real-time digital signal processing (DSP) programs transform and process signals as they are received, these DSP programs are typically not developed to 'compute ahead': real-time signals are not processed in batch, but rather as they appear, in real time. If an analog-to-digital converter is configured to sample a signal at 32 kHz, then it will deliver a new input to the DSP program every :math:`31.25\,\mu s`. Your DSP program has to complete all the computations needed for one single sample within :math:`31.25\,\mu s`. If it can complete the computations sooner, that's fine; your processor could take a rest, switch into sleep mode, and save power. However, the computations must complete within :math:`31.25\,\mu s`, since otherwise (a) the output signal is not produced at the correct sample frequency and (b) the C program misses the next input sample.

.. figure:: images/realtimedsp.jpg
   :figwidth: 600px
   :align: center

   Real-time DSP operation requires the time to compute on a new input sample to be smaller than the sample period.

The previous figure is a sketch of the operations taking place in a real-time DSP application. When a new sample is received, the processor works for :math:`T_w` to compute an output from the input, then produces an output sample and goes to sleep (or idle mode). After :math:`T_{slack}`, the next input is received, and the processing loop starts over. Obviously, the sample period :math:`T_s = T_w + T_{slack}` must be longer than the working period :math:`T_w`, so that the slack :math:`T_{slack}` is positive. The load of the processor (expressed as a relative number between 0 and 100%) is :math:`\frac{T_w}{T_s}`. In the following, we will break :math:`T_w` down into the time needed for input/output and the time needed for computations, and we will discuss several mechanisms through which this real-time operation can be constructed.
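To make these quantities concrete, consider a sample rate of 32 kHz with a working time of :math:`T_w = 20\,\mu s` per sample (the :math:`20\,\mu s` figure is an assumed value for illustration):

.. math::

   T_s = \frac{1}{32\,\mathrm{kHz}} = 31.25\,\mu s, \qquad
   T_{slack} = T_s - T_w = 11.25\,\mu s, \qquad
   \mathrm{load} = \frac{T_w}{T_s} = \frac{20}{31.25} = 64\%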
Signal Processing Hardware
^^^^^^^^^^^^^^^^^^^^^^^^^^

Since we will study the real-time input/output characteristics of signal processing on the lab kit, let's first consider the digital architecture of this kit. In the following, we abstract out the analog components, such as anti-alias filters and reconstruction filters, and consider only the digital components. The BOOSTXL-AUDIO board contains a microphone (with pre-amplifier), a 14-bit DAC, and a loudspeaker (with amplifier). The MSP-EXP432P401R kit contains the MSP432P401R microcontroller, which integrates a 14-bit ADC, an ARM Cortex-M4 processor, and a slew of peripherals.

.. figure:: images/system4703.jpg
   :figwidth: 400px
   :align: center

In this system, real-time operation implies that the following activities complete within one sample period :math:`T_s`.

* An input sample must be captured from the microphone (or another input pin) and converted by the 14-bit ADC.
* The input sample must be processed on the ARM using a suitable DSP algorithm. Eventually, the ARM program computes the value of an output sample.
* The output sample must be transmitted to the 14-bit DAC on the BOOSTXL-AUDIO board, where it is converted into an output voltage for the loudspeaker.

We will discuss two different mechanisms to achieve real-time operation. We will look at the strong and weak points of each scheme and evaluate how to implement these schemes on the lab kit.

Basic A/D operation
^^^^^^^^^^^^^^^^^^^

The operation of the 14-bit ADC on the lab kit is as follows.

1. **Sample Trigger**: The conversion starts with the sample-and-hold module grabbing the voltage at the ADC input. There are multiple input channels available, and through an analog multiplexer, the programmer can select which pin voltage to sample.
2. **Conversion**: After the sample trigger, the ADC conversion starts. In a successive-approximation ADC, it takes one clock cycle to determine each bit of output resolution. When the conversion completes, the ADC output is stored in an output buffer.
3. **End of Conversion**: At the end of conversion, the ADC sets a flag to indicate that a new ADC output is ready. The EOC flag can also be used to interrupt the processor.

The ADC has many operation modes that differ in how the ADC is triggered and how the conversion and end-of-conversion phases are handled. For this lecture on real-time input/output, it is sufficient to understand that (1) the ADC has to be triggered to initiate a conversion, and (2) it takes several clock cycles before the conversion completes.

Basic D/A operation
^^^^^^^^^^^^^^^^^^^

A 14-bit DAC is located on the BOOSTXL-AUDIO board and connected to the MSP432 microcontroller through a serial interface. Sending a new 14-bit sample to the chip involves the following steps, sketched in code after this list.

1. **Assert SYNC**: The DAC chip is notified that a new sample is coming by asserting the SYNC pin. This pin is driven through a GPIO pin on the MSP432 microcontroller.
2. **SPI Transfer**: The DAC chip receives two bytes (high byte, low byte) over the serial SPI connection, which takes 16 SPI clock cycles. The SPI is controlled through an SPI peripheral on the MSP432 microcontroller.
3. **De-assert SYNC**: The sample is latched to the DAC output as an analog voltage by de-asserting the SYNC pin.

For this lecture on real-time input/output, it is sufficient to understand that updating an output sample on the DAC takes several operations: asserting a GPIO pin, transmitting two bytes over SPI, and de-asserting the same GPIO pin.
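The sketch below illustrates this three-step sequence. The helper functions are hypothetical stand-ins for the lab library's GPIO and SPI driver calls, and the frame layout (two control bits set to 00 for normal operation, followed by the 14 data bits) is our reading of the DAC8311 input format.

.. code:: c

   #include <stdint.h>

   /* Hypothetical stand-ins for the GPIO and SPI drivers; on real
      hardware these would toggle the SYNC pin and write to the SPI
      peripheral of the MSP432. */
   static void dac_sync_assert(void)   { /* drive SYNC active  */ }
   static void dac_sync_deassert(void) { /* release SYNC       */ }
   static void spi_send(uint8_t b)     { (void) b; /* shift out one byte */ }

   /* Send one 14-bit sample to the DAC. */
   void dac_write(uint16_t sample) {
       /* 16-bit frame: two control bits (00) + 14 data bits. */
       uint16_t frame = sample & 0x3FFF;

       dac_sync_assert();            /* step 1: announce a new sample   */
       spi_send(frame >> 8);         /* step 2: high byte first ...     */
       spi_send(frame & 0xFF);       /*         ... then the low byte   */
       dac_sync_deassert();          /* step 3: latch the analog output */
   }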
Interrupt Driven Input/Output
-----------------------------

In all of our experiments so far, we used a real-time input/output mechanism called interrupt-driven input/output. A hardware timer module starts ADC conversions at precise intervals. Each time an ADC conversion completes, it generates an interrupt. The interrupt service routine (ISR) that services the A/D reads the A/D output and completes the user processing. Finally, a second timer is used to transmit the output sample to the DAC at the proper rate; this makes sure that variable software latency in the DSP processing is removed.

The pseudocode of this operation looks as follows. The initialization function configures a periodic timer and the ADC. The ADC ISR reads the sample, processes it, and forwards the result to the DAC. The run function enables ADC conversions and puts the program into a low-power (sleep) mode. Each time a hardware interrupt occurs, the processor wakes up, processes the ISR, and goes back to sleep.

.. code:: c
   :number-lines: 1

   initialize() {
       set periodic ADCtimer to desired sample frequency;
       configure ADC conversion to initiate when periodic ADCtimer rolls over;
       enable ADC interrupt;
   }

   ADC_ISR() {
       insample  = read_adc_output();
       outsample = processSample(insample);
       assert_dac8311_sync();
       send_spi(hibyte(outsample));
       send_spi(lobyte(outsample));
       deassert_dac8311_sync();
   }

   run() {
       enable ADC conversions;
       while (1) go to low power mode();
   }

The main advantage of using a timer is that the sample rate can be guaranteed, as long as the sample processing can complete within a sample period. The output samples will also appear at a fixed sample rate, assuming that processSample is a constant-time function.

Missing an interrupt
^^^^^^^^^^^^^^^^^^^^

We should be careful to ensure that the sample processing time does not become too long. When that happens, an ADC conversion completes while the processing of the previous sample has not yet finished. The processor ignores the resulting interrupt until the current processing is complete; only then is the next sample picked up and processed. Eventually, of course, we run out of time and lose a sample. If the sample period is 125 microseconds but we use 130 microseconds of processing per sample, then we can effectively process only 125 out of every 130 samples given.

The effect of missing interrupts is that the sample rate slowly degrades when the processing time per sample exceeds the sample period. This also provides an easy mechanism to check the real-time performance of your program: measure the frequency of the DAC SYNC pulse, which should always correspond to the sample rate specified in your program. If the DAC SYNC pulse frequency is lower, then your program is losing samples and your DSP program is not real-time.

DMA-driven Input/Output
-----------------------

The interrupt-driven input/output system processes one sample at a time. However, many forms of computing work better when they operate on multiple samples at once. Think of vector processing (multiple data items processed using an identical operation), cache memory access (reading a *cache line* of memory items in a burst), processor pipelining (achieving parallelism over sequential instructions), and many others. For this scenario, real-time processing uses *block-based processing*, in which samples are processed per block (of 8, 16, 32, .. samples) rather than per individual sample.

In a block-based processing system, a conversion from sample-based processing to block-based processing, and vice versa, is needed. We make use of a special peripheral called a Direct Memory Access (DMA) unit. A DMA module performs data movement on behalf of the processor. We can program it with a source address, a destination address, and a number of words to copy. When started, the DMA will then automatically copy the block of data. The DMA operation is fairly complex, and in this lecture we will give only a once-over-lightly discussion.

.. figure:: images/dmaio.png
   :figwidth: 600px
   :align: center

   DMA-driven I/O fills up a complete buffer with ADC samples before turning the buffer over to the processor.

The processor can then compute on an entire buffer of samples. The block diagram below shows the sequence of operations happening under DMA-driven I/O. A periodic timer starts A/D conversions at a specified interval. When an A/D conversion finishes, the DMA module is triggered and the sample is forwarded through a DMA channel to a buffer in main memory. The DMA trigger is a hardware signal; no software is involved in storing a sample value in memory.

The DMA makes use of a *ping* and a *pong* buffer, with the idea that the ARM is only allowed read access to the buffer that is currently not being filled. Thus, when DMA Channel 1 fills the Ping buffer, the ARM reads the Pong buffer, and when DMA Channel 2 fills the Pong buffer, the ARM reads the Ping buffer. The switching between buffers is controlled through a DMA interrupt service routine and is done behind the scenes; a sketch of this switching logic follows the figure below. Finally, because access to the DAC8311 is more complex than to the ADC14 (i.e., it requires a combination of GPIO and SPI), no DMA transfers are used to produce output samples.

.. figure:: images/dmapingpong.png
   :figwidth: 600px
   :align: center
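The following minimal C sketch shows the idea behind the buffer-switching ISR. All names in it (``dma_complete_isr``, ``user_callback``, the ``completed_is_ping`` flag) are illustrative assumptions; the actual lab library hides this logic inside its DMA driver.

.. code:: c

   #include <stdint.h>

   #define BUFLEN 32

   static uint16_t ping[BUFLEN];   /* filled by DMA channel 1  */
   static uint16_t pong[BUFLEN];   /* filled by DMA channel 2  */
   static uint16_t out[BUFLEN];    /* output block for the DAC */

   /* User-supplied block processing: pass-through as a placeholder. */
   static void user_callback(uint16_t x[], uint16_t y[], uint16_t n) {
       uint16_t i;
       for (i = 0; i < n; i++)
           y[i] = x[i];
   }

   /* Hypothetical DMA completion ISR: runs when one buffer is full.
      The DMA hardware has already moved on to filling the other buffer,
      so the completed buffer is safe for the ARM to read. */
   static int completed_is_ping = 1;

   void dma_complete_isr(void) {
       if (completed_is_ping)
           user_callback(ping, out, BUFLEN);
       else
           user_callback(pong, out, BUFLEN);
       completed_is_ping = !completed_is_ping;
       /* Processing must finish before the DMA fills the next buffer,
          or the samples in the completed buffer will be overwritten. */
   }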
API for DMA
^^^^^^^^^^^

Here is a complete example of a DMA-driven input/output program. The initialization function call includes an additional parameter: the size of the Ping/Pong buffer.

.. code:: c
   :number-lines: 1

   #include "xlaudio.h"

   void processBuffer(uint16_t x[32], uint16_t y[32]) {
       uint16_t i;
       // copy all 32 input samples to the output buffer
       for (i=0; i<32; i++) {
           y[i] = x[i];
       }
   }

   int main(void) {
       WDT_A_hold(WDT_A_BASE);
       xlaudio_init_dma(FS_32000_HZ, XLAUDIO_J1_2_IN, BUFLEN_32, processBuffer);
       xlaudio_run();
       return 1;
   }

Missing an interrupt
^^^^^^^^^^^^^^^^^^^^

A real-time input/output violation is harder to measure in a block-based processing system, because the rate of the output signal will always meet the required output sample rate. The effect of overloading the processor in a block-based scheme is a timeout in the buffer-switching process. That is, by the time the processing finally completes its computations on a buffer (Ping or Pong), it is already too late: the ADC/DAC have already started to overwrite that buffer with the next block. The end result is severe distortion in the output signal.

Conclusions
-----------

We discussed two different techniques to achieve real-time input/output operation for digital signal processing. For the coming few labs, we will rely mostly on DMA-based input/output.