Lecture 32: Software Performance Evaluation Performance measurement - how long does a given piece of code take to execute? - the easiest is to measure this with a timer A variant of the Timer32Expired() function; the timerlap function: unsigned lapcount; unsigned TimerLap() { unsigned lap; unsigned snap = ReadTimer23(); lap = snap - lapcount; lapcount = snap; return lap; } - This function is called as follows: Timer23Lap(); // some code lap = Timer23Lap(); lap will be similar to the number of cycles elapsed - What functions would I want to measure? - the time it takes to execute a function - the time it takes to iterate a cycle executive In principle, cycle executive must always be faster than some bound V, eg. 1 million cycles. If the iteration time drops below that, deadline guarantees can no longer be given. - the time it takes to execute and ISR - Overhead is worthwhile to consider TimerLap(); lap = TimerLap(); will return some small amount of cycles (a correction factor). Basic Setup: Timer32 configuration Cyclic executive //------------------------------------------------- Measurement loop 1: while (1) { for (i=0; i<16; i++) { thisA[i] = rand(); } for (i=0; i<10; i++) { TimerLap(); c = sumArray(thisA); cycles[i] = TimerLap(); } for (i=0; i<10; i++) { printf("%4d ", cycles[i]); } printf("\n"); } - Why do we measure 10 times? To account for variance (eg in multitasking) (this includes non-constant-time functions and architectural effects such as cache) In this case, all measurements are equally long - Measurement result: Measure with function: 355 cycles Measure without function: 46 cycles Effective cycles = 309 cycles //------------------------------------------------- Measurement loop 2: // set at 5KHz void T32_INT2_IRQHandler() { GPIO_toggleOutputOnPin(GPIO_PORT_P2, GPIO_PIN6); Timer32_clearInterruptFlag(TIMER32_1_BASE); } while (1) { for (i=0; i<16; i++) { thisA[i] = rand(); } for (i=0; i<10; i++) { TimerLap(); c = sumArray(thisA); cycles[i] = TimerLap(); } for (i=0; i<10; i++) { printf("%4d ", cycles[i]); } printf("\n"); } - Measurement result: Variable time: 355 425 424 423 355 355 424 423 424 355 Depending on when the ISR is called, the execution time increases. It seems reasonable to assume: 355 cycles = no ISR call 425 cycles = 1 ISR call and from there we can compute that the ISR takes about 60 cycles - If you increase the ISR rate, eg to 20KHz, you will see an even bigger load 628 629 698 698 699 699 629 632 632 631 Assuming that the base load is still 355 cycles, we now see 628 - 355 = 273 cycles 699 - 355 = 344 cycles which seems to indicate that there are multiple ISR calls per execution of sumArray //------------------------------------------------- Measurement loop 3: How do we measure a cyclic executive? What we can measure is the cycle executive iteration time. That gives us an idea how quick the cycle executive spins. TimerLap(); while (1) { k += TimerLap(); spin++; for (i=0; i<16; i++) { thisA[i] = rand(); } for (i=0; i<10; i++) { c = sumArray(thisA); } if (spin > 1000) { printf("%4d\n", k/1000); spin = 0; k = 0; TimerLap(); } } Note that, once more, we are iterating many times in the cyclic executive and then extract the average time. We find, for this executive: 4105 cycles So that means that the cyclic executive spins at 3MHz / 4105 = 730.8 Hz. If we remove all the work, and measure the empty cyclic executive we find: 62 cycles. So the empty cyclic executive spins at 3MHz/62 = 48.3KHz //------------------------------------------------- Measurement loop 4: What if there is also an ISR (at known rate)? Say, we have a 5 KHz ISR: void T32_INT2_IRQHandler() { GPIO_toggleOutputOnPin(GPIO_PORT_P2, GPIO_PIN6); Timer32_clearInterruptFlag(TIMER32_1_BASE); } while (1) { k += TimerLap(); spin++; for (i=0; i<16; i++) { thisA[i] = rand(); } for (i=0; i<10; i++) { c = sumArray(thisA); } if (spin > 1000) { printf("%4d\n", k/1000); spin = 0; k = 0; TimerLap(); } } We find: 4636 cycles (while it was 4105 cycles without the ISR). Can we compute the length of the ISR using measurements only? The iteration time is: for spin = 1000 we find N cycles per iteration N = ISR.k + B where B = base load of the processor, used for computations in the cyclic executive only k = number of times the ISR is called during N ISR = number of cycles for one ISR call Now let's disable the ISR and do the same measurement. The iteration time jumps to: 4119 cycles So we have B = 4119 cycles 4636 - 4119 = 517 cycles on ISR We know that the ISR rate is 5KHz, or (at 3MHz), every 600 cycles. The number of ISR calls k is ceil(N / 600) ceil(4636 / 600) = 8 calls Since N - B = ISR.k = (4636 - 4119) = 517 It follows that ISR ~ 64 cycles This is close to what we measured earlier. These examples are all on a repository msp432-sw-perf