Lecture 19 - Debugging and Profiling
Introduction
This week we’re discussing several important and useful techniques that will help you in the codesign challenge. Today, we’ll cover debugging and profiling. On Thursday, we’ll cover basic ideas in performance optimization and the identification of performance bottlenecks.
Configuring and running example-nios-sdram
The following figure illustrates the example platform that we will use in this lecture and the next few. It has the following features.
- Nios 2/e processor at 50MHz system clock.
- On-chip 64 KByte RAM holding the text segment of the program.
- SDRAM controller to access an off-chip 64 MByte memory (16-bit wide), running at a 100 MHz clock. The SDRAM holds the static and runtime data of the C program, including rwdata, rodata, heap, stack, and bss.
- Three PIO ports to drive the HEX displays, red LEDs and keys on the DE1-SoC kit, respectively.
- Two 32-bit interval timers for system-timing and performance timing.
- A PLL to create three different clock regions: 50MHz system clock, 100 MHz SDRAM controller clock, and 100MHz skew-adjusted SDRAM clock.
Figure: Nios 2 SDRAM Test Platform
To compile and set up the example on your DE1-SoC kit, proceed as follows.
- Download the repository on your laptop.
git clone https://github.com/vt-ece4530-f19/example-nios-sdram
- Compile the bitstream. Optionally, you can load the design in Quartus by opening exampleniossdram.qpf. You can also load the platform in Platform Designer by opening platformniossdram. This enables you to configure design components, add new components, adjust the memory map, and so forth. To compile the bitstream from the command line (Nios II Command Shell), use the following.
quartus_sh --flow compile exampleniossdram
- Once you have the bitstream (sof), download it to the DE1-SoC board.
nios2-configure-sof -d 2 exampleniossdram.sof
- Next, create a board support package. Run the Nios2 BSP editor with nios2-bsp-editor. Open platformniossdram.sopcinfo and select the following settings:
  - sys_clk_timer is connected to timer_0
  - timestamp_timer is connected to timer_1
- Under Advanced/hal/linker, turn off enable_alt_load. This prevents code and data from being relocated at startup.
The alt_load facility (which we don’t want to use) works as follows. When enable_alt_load is selected, the loader first places all code segments in on-chip RAM, and then copies the data segments (rwdata, rodata) over to the SDRAM. That makes sense if the program is stored in non-volatile memory and the loader is part of the processor boot sequence. In our case, we use neither non-volatile memory nor a bootloader; we use nios2-download. Since the SDRAM is much bigger than the on-chip RAM, we want the loader to copy the data segments directly into SDRAM.
Figure: Disable alt_load facility
- Exit the BSP editor, and compile the BSP code.
cd software
nios2-bsp-generate-files \
--settings=hal_bsp/settings.bsp \
--bsp-dir=hal_bsp
cd hal_bsp
make
- Compile a sample application (there are several examples in the software subdirectory). The spigot algorithm computes digits of pi. It can easily be grown into a program with high computational and storage complexity by increasing n, the number of digits of pi desired.
cd spigot
nios2-app-generate-makefile \
--bsp-dir ../hal_bsp \
--elf-name main.elf \
--src-files spigot.c
make
- To run the application, open a Nios 2 terminal. Next, download the application to the Nios.
# in a first Nios 2 Command Shell
nios2-terminal
# in a second Nios 2 Command Shell
nios2-download main.elf --go
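Only fragments of spigot.c appear in this lecture, in the gdb listings of the next section. The following is a minimal host-side sketch of a complete base-10000 pi spigot that is consistent with those fragments; the division step, the output format, and the function name spigot_pi are our assumptions, not the repository’s code. Each outer iteration evaluates one level of pi = 2 + (1/3)(2 + (2/5)(2 + (3/7)(2 + ...))): multiply the array by i, divide it by 2i+1, and add 2.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch (not the repository's spigot.c): compute pi with a
   base-10000 spigot. pi[1] holds the integer part, and pi[2..n-1] hold four
   decimal digits each. Writes "3.1415..." into buf.
   Returns 0 on success, -1 if n is too small, buf is too short, or malloc fails.
   With 32-bit int, keep n at most a few thousand (n = 5000 is fine). */
int spigot_pi(int n, char *buf, size_t buflen) {
    unsigned short *pi;
    div_t d;
    int i, j, t;
    size_t pos;

    /* "3." + 4 digits for each of pi[2..n-2] + terminating NUL */
    if (n < 4 || buflen < (size_t)(4 * (n - 3) + 3))
        return -1;
    pi = malloc((size_t)n * sizeof *pi);
    if (pi == NULL)
        return -1;
    memset(pi, 0, (size_t)n * sizeof *pi);

    pi[1] = 4;  /* start value: 4 is the fixed point of x = 2 + x/2 */
    /* about 3.322 terms per decimal digit, 4 digits per array element */
    for (i = (int)(3.322 * 4 * n); i > 0; --i) {
        /* pi = pi * i, propagating carries from the least significant element */
        t = 0;
        for (j = n - 1; j >= 0; --j) {
            t += pi[j] * i;
            pi[j] = (unsigned short)(t % 10000);
            t /= 10000;
        }
        /* pi = pi / (2i+1), long division from the most significant element */
        d.rem = 0;
        for (j = 0; j < n; ++j) {
            d = div(pi[j] + d.rem * 10000, 2 * i + 1);
            pi[j] = (unsigned short)d.quot;
        }
        pi[1] += 2;  /* pi = pi + 2 */
    }

    /* the last element carries the accumulated rounding error; do not print it */
    pos = (size_t)snprintf(buf, buflen, "%d.", pi[1]);
    for (i = 2; i < n - 1; ++i)
        pos += (size_t)snprintf(buf + pos, buflen - pos, "%04d", pi[i]);

    free(pi);
    return 0;
}
```

For example, spigot_pi(10, buf, sizeof buf) with a 64-byte buffer fills buf with "3." followed by 28 digits of pi. The work grows roughly with n squared (about 13n outer iterations, each sweeping all n elements twice), which is why increasing n quickly produces a heavy workload.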
Debugging
Software contains bugs. To find these bugs, we wish to observe software closely during operation. In hardware/software codesign, we have used (and will use) a wide array of techniques to observe the software in action. Let’s enumerate our arsenal of debugging tools.
- One of the most basic debugging techniques uses blinking LEDs and/or
numerical HEX displays. Many of the design examples provided for your board, including the example we’ll discuss today, use a ‘heartbeat’ LED.
On your DE1-SoC board, such a heartbeat LED shows that the board is powered, that the clock is correctly configured, and that a bitstream was loaded on the FPGA. As an example, the following heartbeat circuit blinks two LEDs at a rate of approximately 3 Hz (the 50 MHz clock divided by 2^24). In addition, the two LEDs encode the state of a ‘locked’ signal: they blink in unison when it is asserted (locked == 1'b1) and alternate when it is deasserted (locked == 1'b0).
reg [23:0] heartbeat;
// heartbeat indicator
always @(posedge CLOCK_50, negedge KEY[0])
  if (KEY[0] == 1'b0)
    heartbeat <= 24'b0;
  else
    heartbeat <= heartbeat + 1'b1;
assign LEDR[9] = heartbeat[23];
assign LEDR[8] = ~heartbeat[23] ^ locked;
- A second, very common debugging technique uses printf to print internal data structures and events during program execution. On an embedded board, printf needs to be implemented through a UART. In our Nios-based designs, we used a JTAG-UART.
- A third debugging technique is to use a software debugger. A software debugger is a tool that provides direct control over the execution of a program. It enables a designer to load, start and stop a program, to observe intermediate values, to watch for certain events, and to stop the program at a precise point during its execution. In contrast to the printf technique, using a debugger does not require instrumenting the program with extra printf calls. On the downside, a software debugger removes the real-time characteristics of the program. For example, real-time interrupt service routines are hard to handle in a debugger, since stopping the program also means that real-time interrupts are no longer served. We will discuss debugging with GNU gdb in further detail.
- A fourth debugging technique is to use simulation. This provides even greater visibility into the system, at the cost of requiring a simulation model of the system under development. On the other hand, with a simulation model, time itself is simulated, so we can closely monitor real-time behavior. Building a simulation model for a complex system (such as the DE1-SoC) is hard. Imagine simulating a design for the DE1-SoC board in ModelSim. We need Verilog to simulate the design on the FPGA. In addition, we need Verilog models for any of the chips on the board that we want to include in the simulation. And we need to design a testbench to exercise the simulation.
- A fifth technique, which can be used in parallel with a software debugger, is to use a hardware logic analyzer. The logic analyzer provides detailed information about the state of hardware signals. In FPGA design, the logic analyzer can be compiled into the design under test (DUT). In contrast to a software debugger, a hardware logic analyzer captures hardware activity in full detail, at the granularity of a single clock cycle. For example, a logic analyzer is well suited to study the internal hardware details of a coprocessor.
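The LED encoding of the heartbeat example can be checked with a small C model. This is our own host-side sketch (the function names are hypothetical, not part of the example design); it reproduces the two assign statements and the blink-rate arithmetic, assuming the 50 MHz CLOCK_50 input.

```c
#include <stdint.h>

/* Hypothetical software model of the heartbeat Verilog.
   Returns {LEDR[9], LEDR[8]} as a 2-bit value. */
static unsigned heartbeat_leds(uint32_t heartbeat, unsigned locked) {
    unsigned bit23 = (heartbeat >> 23) & 1u;        /* heartbeat[23]           */
    unsigned led9 = bit23;                          /* LEDR[9] = heartbeat[23] */
    unsigned led8 = (bit23 ^ 1u) ^ (locked & 1u);   /* ~heartbeat[23] ^ locked */
    return (led9 << 1) | led8;
}

/* A free-running 24-bit counter wraps every 2^24 cycles, and bit 23
   goes through one full on/off period per wrap. */
static double heartbeat_blink_hz(double clock_hz) {
    return clock_hz / 16777216.0;   /* 2^24 */
}
```

With locked == 1 both LEDs always agree (00 or 11); with locked == 0 they always differ (01 or 10). At 50 MHz, heartbeat_blink_hz returns 50e6 / 2^24, about 2.98 Hz.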
Debugging on a host
We are interested in source-level debugging, which allows you
to step through each line of your program as you run it.
To prepare a program for source-level debugging, we have to
compile debug information into the program.
The debug information contains the (address) location of every function and every variable in memory. The debug information associates each
line number in the source code with a location in the text
segment.
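The mapping from text-segment addresses back to source lines can be pictured as a lookup in a table sorted by address: the line for a program counter is the last row whose start address is at or below it. A minimal sketch follows, assuming a simplified table of (start address, line) rows; the rows are modeled on the x86 gdb session for spigot.c later in this section (the 0x4011ea row for line 16 is our assumption), and a real DWARF line table holds many more rows and columns. The names line_entry and line_for_pc are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of a simplified line table: the first address of a source line. */
struct line_entry {
    uintptr_t addr;
    int line;
};

/* Illustrative rows, sorted by address. */
static const struct line_entry spigot_lines[] = {
    { 0x4011ea, 16 },  /* int main( ) {          */
    { 0x4011f8, 17 },  /* int n = 50;            */
    { 0x401200, 18 },  /* pi = malloc(...);      */
    { 0x401212, 22 },  /* memset(pi, 0, ...);    */
};

/* Binary search for the last row whose address is <= pc.
   Returns the source line, or -1 if pc lies before the first row. */
static int line_for_pc(const struct line_entry *tab, size_t n, uintptr_t pc) {
    size_t lo = 0, hi = n;   /* find the first row with addr > pc */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? -1 : tab[lo - 1].line;
}
```

For instance, the breakpoint address 0x4011f8 maps to line 17, and any address from 0x401200 up to 0x401211 maps to line 18.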
In GCC, you can add debug information to your program with the -g flag. In the following examples, we compile the spigot application on the host (an x86 processor), using GCC for Cygwin. You may need to install GCC (and GDB, the debugger) for Cygwin if you don’t have them yet.
gcc -g spigot.c -o spigot.exe
After the application is compiled, you can load it in the GNU debugger, gdb. The debugger allows you to list source code, step through the program, set breakpoints, inspect memory and variables, and much more.
The following shows typical commands.
gdb spigot.exe
- The list command shows source code. Functions can be referenced by name.
(gdb) list main
11 for (i=2; i<n-1; ++i)
12 printf("%04d", pi[i]);
13 printf("\n");
14 }
15
16 int main( ) {
17 int n = 5000; /* number of pi digits */
18 unsigned short *pi = (unsigned short*) malloc(n * sizeof(unsigned short));
19 div_t d;
20 int i, j, t;
- The b command sets a breakpoint at a line number or a function name. When the execution of a program reaches a breakpoint, execution is halted and control is returned to the debugger. You can list the active breakpoints using the info b command. Breakpoints can be deleted with the delete command.
(gdb) b main
Breakpoint 1 at 0x4011f8: file spigot.c, line 17.
(gdb) b 22
Breakpoint 2 at 0x401212: file spigot.c, line 22.
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y 0x004011f8 in main at spigot.c:17
2 breakpoint keep y 0x00401212 in main at spigot.c:22
(gdb) delete 2
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y 0x004011f8 in main at spigot.c:17
- The r command runs a program. You can add arguments (as on the command line). The s command steps through a program and into a function. The n command steps through a program and over a function. The c command continues execution after a breakpoint was hit.
(gdb) r
Starting program: /home/ece4530f19/example-nios-sdram/software/spigot/spigot.exe
Thread 1 "a" hit Breakpoint 1, main () at spigot.c:17
17 int n = 50; /* number of pi digits */
(gdb) s
18 unsigned short *pi = (unsigned short*) malloc(n * sizeof(unsigned short));
(gdb) c
Continuing.
3.14159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294
[Inferior 1 (process 972) exited normally]
- During execution, you can print variables and expressions with the p command. You can dump the contents of memory with the x command.
(gdb) p j
$9 = 49
(gdb) p &j
$10 = (int *) 0x62cbe8
(gdb) x/4b 0x62cbe8
(gdb) x/4w &j
0x62cbe8: 49 664 -2147174212 6474800
- gdb has a text-mode user interface (TUI) that shows the source code (or assembly code) while you step through the program. You can exit the TUI using Ctrl-x Ctrl-a.
(gdb) layout split
┌──spigot.c─────────────────────────────────────────────────────────────────────────────────┐
│11 for (i=2; i<n-1; ++i) │
│12 printf("%04d", pi[i]); │
│13 printf("\n"); │
│14 } │
│15 │
│16 int main( ) { │
B+>│17 int n = 50; /* number of pi digits */ │
│18 unsigned short *pi = (unsigned short*) malloc(n * sizeof(unsigned short)); │
│19 div_t d; │
│20 int i, j, t; │
│21 │
│22 memset(pi, 0, n*sizeof(unsigned short)); │
│23 pi[1]=4; │
│24 │
┌───────────────────────────────────────────────────────────────────────────────────────────┐
│0x4011ea <main> push %ebp │
│0x4011eb <main+1> mov %esp,%ebp │
│0x4011ed <main+3> and $0xfffffff0,%esp │
│0x4011f0 <main+6> sub $0x40,%esp │
│0x4011f3 <main+9> call 0x4013bc <__main> │
B+>│0x4011f8 <main+14> movl $0x32,0x30(%esp) │
│0x401200 <main+22> mov 0x30(%esp),%eax │
│0x401204 <main+26> add %eax,%eax │
│0x401206 <main+28> mov %eax,(%esp) │
│0x401209 <main+31> call 0x4013cc <malloc> │
│0x40120e <main+36> mov %eax,0x2c(%esp) │
│0x401212 <main+40> mov 0x30(%esp),%eax │
│0x401216 <main+44> add %eax,%eax │
│0x401218 <main+46> mov %eax,0x8(%esp) │
└───────────────────────────────────────────────────────────────────────────────────────────┘
native Thread 7068.0x3d18 In: main L17 PC: 0x4011f8
(gdb)
- There are many more features in gdb, extensively documented online. The following summary is the bare minimum.
Command | Purpose |
---|---|
file <exe> | Selects the file for debugging |
b <function> | Sets a breakpoint at a function |
b <line> | Sets a breakpoint at a line number |
info b | Lists all breakpoints |
r | Starts the program |
c | Continues the program after a breakpoint was hit |
p <expression> | Prints the value of a variable or expression |
help <topic> | Displays help on a topic |
layout <mode> | Enables the text-mode UI, with mode one of split, src, asm, regs |
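The x/4b and x/4w commands in the host session above read raw memory. A minimal C equivalent of x/4b &j copies the bytes of an object into an array. On a little-endian machine, such as the x86 host (and the Nios II), the least significant byte comes first, so an int holding 49 appears as the bytes 49 0 0 0. The helper names dump_bytes and is_little_endian are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy the first n bytes of an object into out[], like gdb's "x/<n>b &obj".
   Returns the number of bytes copied (never more than the object's size). */
static size_t dump_bytes(const void *obj, size_t size, unsigned char *out, size_t n) {
    size_t count = n < size ? n : size;
    memcpy(out, obj, count);   /* byte-wise view of the object's representation */
    return count;
}

/* Returns 1 on a little-endian machine, 0 otherwise. */
static int is_little_endian(void) {
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```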
Debugging on a target
You can use gdb on your DE1-SoC board as well. In that case, the debugger application (gdb) runs on your PC while the program runs on a Nios on your DE1-SoC. This is called remote debugging. The JTAG connection provides an efficient and fast connection into the Nios processor, and can start, stop and examine it remotely. However, the software environment has to be properly set up.
On the host, you have to run an application called nios2-gdb-server. This program connects to the Nios processor on the board. It also opens a network port and listens for a gdb client to connect on that port.
nios2-gdb-server --tcpport auto --tcppersist
Using cable "DE-SoC [USB-1]", device 2, instance 0x00
Processor is already paused
Listening on port 52464 for connection from GDB:
Next, run nios2-elf-gdb in the directory where you compiled the software application for the Nios.
nios2-elf-gdb main.elf
You then have to connect the debugger to the target through the nios2-gdb-server application. This is done using the target command of gdb:
(gdb) target remote:52464
Remote debugging using :52464
0x00000000 in ?? ()
The port 52464 corresponds to the port announced by the gdb server. In the window where you run nios2-gdb-server, you will see the connection being made.
$ nios2-gdb-server --tcpport auto --tcppersist
Using cable "DE-SoC [USB-1]", device 2, instance 0x00
Processor is already paused
Listening on port 52464 for connection from GDB: accepted
Before you can run the program, you have to download the executable to the target. That is done using the load command. In case you started nios2-elf-gdb without an argument, you can select the executable with the file command. It is worthwhile to take a close look at the output of the debugger, as it tells you exactly where it is copying the various sections of the code. For example, we see that rodata and rwdata are copied into the SDRAM (with starting address 0x0), while the other parts are copied into the on-chip memory (with starting address 0x04000000).
(gdb) load
Loading section .rodata, size 0x308 lma 0x0
Loading section .rwdata, size 0x1aec lma 0x308
Loading section .entry, size 0x20 lma 0x4000000
Loading section .exceptions, size 0x210 lma 0x4000020
Loading section .text, size 0xfb50 lma 0x4000230
Start address 0x4000230, load size 72564
Transfer rate: 185 KB/sec, 392 bytes/write.
(gdb)
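The load addresses above can be checked against the platform’s memory map. The sketch below classifies an address, assuming a 64 MByte SDRAM at 0x0 and a 64 KByte on-chip RAM at 0x04000000 as inferred from this load output; in a real BSP these values come from the generated system.h, and the macro and function names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical memory map, inferred from the "load" output. */
#define SDRAM_BASE   0x00000000u
#define SDRAM_SPAN   (64u * 1024u * 1024u)   /* 0x04000000 */
#define ONCHIP_BASE  0x04000000u
#define ONCHIP_SPAN  (64u * 1024u)           /* 0x00010000 */

enum region { REGION_NONE, REGION_SDRAM, REGION_ONCHIP };

/* Classify an address, e.g. an lma from "load" or a value from "p &i".
   The unsigned subtraction also rejects addresses below the base. */
static enum region region_of(uint32_t addr) {
    if (addr - SDRAM_BASE < SDRAM_SPAN)
        return REGION_SDRAM;
    if (addr - ONCHIP_BASE < ONCHIP_SPAN)
        return REGION_ONCHIP;
    return REGION_NONE;
}
```

Two details follow from this map. The .text section ends at 0x4000230 + 0xfb50 = 0x400fd80, just inside the 64 KByte on-chip RAM. The address &i = 0x3ffffcc, printed later in this session, lies 52 bytes below the end of the SDRAM, which is consistent with a stack placed at the top of the SDRAM and growing downward.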
You can now use the same debugging commands as under host debugging. For example:
(gdb) list
23 pi[1]=4;
24
25 for (i=(int)(3.322*4*n); i>0; --i) {
26
27 t = 0;
28 for (j=n-1; j>=0; --j) {
29 t += pi[j] * i;
30 pi[j] = t % 10000;
31 t /= 10000;
32 }
(gdb) b 29
Breakpoint 2 at 0x40003d4: file spigot.c, line 29.
(gdb) c
Continuing.
Breakpoint 1, main () at spigot.c:17
17 int n = 50; /* number of pi digits */
(gdb) c
Continuing.
Breakpoint 2, main () at spigot.c:29
29 t += pi[j] * i;
(gdb) p i
$1 = 664
(gdb) p &i
$2 = (int *) 0x3ffffcc
(gdb) quit
When you exit the debugging session, the nios2-gdb-server connection is broken. Because of the --tcppersist command line option given earlier, the server immediately listens for a new connection:
$ nios2-gdb-server --tcpport auto --tcppersist
Using cable "DE-SoC [USB-1]", device 2, instance 0x00
Processor is already paused
Listening on port 52464 for connection from GDB: accepted
Exiting due to 'k' command from GDB
Leaving target processor paused
Processor is already paused
Listening on port 52464 for connection from GDB:
Performance Profiling
To understand where software spends its time, we have to use a profiler. You may recall from our earlier discussion on hardware acceleration (Lecture 8) that the innermost loop of a program is the most influential to the overall performance. This is because it contains the most frequently executed code.
A profiler is a tool that helps you find the ‘innermost loop’. Depending on the profiler, it can tell you how many times each function was executed, how many instructions were executed, how many times each variable was read and written, and so on.
We will look at one specific example, the GNU gprof profiler. It is a basic profiler that helps you understand how many times each function was executed, and how much each function’s execution time contributed to the program’s overall execution time (as a percentage).
gprof requires an instrumentation step in your program. The instrumentation consists of two things:
- First, every function is instrumented, at its entry, with a call to a profiler function called mcount. This function keeps track of how many times each function is executed, as well as the sequence of function calls. In other words, gprof remembers which function calls which function in your program. gprof then uses this to construct a ‘call graph’, a graph that represents the set of functions called by each parent function.
- Next, during profiling, gprof periodically samples the program counter (using a system timer interrupt). This sampling frequency is not very high, from 100 Hz to perhaps a few kHz. This way, gprof can determine, statistically, where the program spends most of its time. By combining the sampled program counter with debug information, gprof can determine which function is executing. This yields the self time of a function.
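The call-graph half of this instrumentation can be pictured as a table of arcs, each counting the calls from one caller to one callee. Below is a minimal sketch, assuming functions are identified by small integer ids; gprof’s real mcount instead identifies an arc by two program-counter values (the call site and the called function) and keeps the arcs in a hashed table, so record_arc, arc_count and MAX_ARCS are hypothetical names.

```c
#include <stddef.h>

#define MAX_ARCS 64

/* One call-graph edge and the number of times it was traversed. */
struct arc {
    int caller, callee;
    unsigned long count;
};

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Record one call from caller to callee; returns 0, or -1 if the table is full. */
static int record_arc(int caller, int callee) {
    size_t k;
    for (k = 0; k < n_arcs; ++k) {
        if (arcs[k].caller == caller && arcs[k].callee == callee) {
            arcs[k].count++;
            return 0;
        }
    }
    if (n_arcs == MAX_ARCS)
        return -1;
    arcs[n_arcs].caller = caller;
    arcs[n_arcs].callee = callee;
    arcs[n_arcs].count = 1;
    n_arcs++;
    return 0;
}

/* Number of recorded calls from caller to callee (0 if none). */
static unsigned long arc_count(int caller, int callee) {
    size_t k;
    for (k = 0; k < n_arcs; ++k)
        if (arcs[k].caller == caller && arcs[k].callee == callee)
            return arcs[k].count;
    return 0;
}
```

In the mod179.c call graph shown later, an entry such as 15931000/15931000 on the edge from prod to modulo is exactly this kind of arc count.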
The profiling data, consisting of the sampled program counter and the call graph, can then be combined into call graph timing information, which includes the self time of a function as well as the time spent in all of the callees of that function.
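The sampling half can be sketched as a histogram: each sampled program counter is attributed to the function whose address range contains it, and self time is the sample count times the sampling period. This is a minimal sketch, assuming function address ranges taken from the symbol table; the struct and function names (func_range, attribute_samples, self_seconds) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Address range of one function and the PC samples that fell inside it. */
struct func_range {
    const char *name;
    uint32_t start, end;   /* [start, end) */
    unsigned samples;
};

/* Attribute each sampled PC to the function that contains it.
   Returns the number of samples that fell outside every range. */
static unsigned attribute_samples(struct func_range *f, size_t nf,
                                  const uint32_t *pcs, size_t npcs) {
    unsigned unattributed = 0;
    size_t s, k;
    for (k = 0; k < nf; ++k)
        f[k].samples = 0;
    for (s = 0; s < npcs; ++s) {
        for (k = 0; k < nf; ++k) {
            if (pcs[s] >= f[k].start && pcs[s] < f[k].end) {
                f[k].samples++;
                break;
            }
        }
        if (k == nf)
            unattributed++;
    }
    return unattributed;
}

/* Self time = number of samples times the sampling period. */
static double self_seconds(const struct func_range *f, double period_s) {
    return f->samples * period_s;
}
```

With a 0.01 s sampling period, as in the host profile below, a function that collects 11 samples is reported with 0.11 self seconds.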
We’ll illustrate this with the following example, which computes a modulo-179 operation through a convoluted set of function calls.
#include <stdio.h>

unsigned modk(unsigned x, unsigned k) {
  return (x & ((1 << k) - 1));   // x mod 2^k
}

unsigned divk(unsigned x, unsigned k) {
  return (x >> k);               // x / 2^k
}

unsigned modulo(unsigned x) {
  unsigned r, q, k, a, m, z;
  m = 0xB3;                      // 179
  k = 8;
  a = (1 << k) - m;              // 77, since 256 = 179 + 77
  r = modk(x, k);
  q = divk(x, k);
  do {
    do {
      r = r + modk(q * a, k);
      q = divk(q * a, k);
    } while (q != 0);
    q = divk(r, k);
    r = modk(r, k);
  } while (q != 0);
  z = (r >= m) ? r - m : r;
  return z;
}

unsigned prod(unsigned a) {
  unsigned i, p = 1;
  for (i=0; i<a; i++)
    p = modulo(p * i);
  return i;
}

int main(void) {
  unsigned i, j;
  unsigned extended_profile = 1;
  if (extended_profile) {
    for (i=0; i<1000; i++)
      for (j = 1; j< 179; j++)
        prod(j);
  } else {
    for (j = 1; j< 179; j++)
      printf("%u %u\n", j, prod(j));
  }
  return 0;
}
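The folding trick in modulo() rests on one identity: 2^8 = 256 = 179 + 77, so a value q·256 + r is congruent to q·77 + r modulo 179, and folding the high part repeatedly shrinks the value. The sketch below is our own single-loop restatement of that idea (fold_mod179 and count_mismatches are hypothetical names, not part of mod179.c), together with a brute-force check against the % operator.

```c
#include <stdint.h>

/* Our own restatement of the folding idea in modulo(), not the lecture's code.
   Invariant: x is congruent to r + q*256 (mod 179), with r < 256. */
static uint32_t fold_mod179(uint32_t x) {
    const uint32_t m = 179, k = 8, a = (1u << k) - m;  /* a = 77 */
    uint32_t r = x & 0xFFu;        /* low 8 bits */
    uint32_t q = x >> k;           /* high bits  */
    while (q != 0) {
        uint32_t t = q * a;        /* q*256 is congruent to q*77 */
        r += t & 0xFFu;
        q = (t >> k) + (r >> k);   /* fold the carry out of r back into q */
        r &= 0xFFu;
    }
    return (r >= m) ? r - m : r;   /* r < 256 < 2*179: one subtraction suffices */
}

/* Compare fold_mod179 with % for x in [0, limit); returns the number of mismatches. */
static unsigned long count_mismatches(uint32_t limit) {
    unsigned long bad = 0;
    uint32_t x;
    for (x = 0; x < limit; ++x)
        if (fold_mod179(x) != x % 179u)
            bad++;
    return bad;
}
```

The call counts in the profiles below also follow from the code. main calls prod(j) for j = 1..178, 1000 times over, giving 1000 × (178 × 179 / 2) = 15,931,000 calls to modulo. Because prod starts its loop at i = 0, p becomes 0 in the first iteration, so every call is modulo(0), which calls modk and divk three times each: 3 × 15,931,000 = 47,793,000.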
Profiling on a host
To enable profiling, we have to compile the program with debug information enabled, as well as with profiling enabled. The -pg flag of gcc enables profiling, while the -g flag enables debug information.
gcc -pg -g mod179.c -o mod179
Next, you can run the program. As a side effect of the profiling instrumentation, the program writes the profiling data to a file gmon.out when it exits.
$ ./mod179.exe
$ ls
gmon.out mod179.c mod179.exe
To post-process the profiling data, use gprof. It displays two tables. The first (the flat profile) is a sorted list of call data per function. The second is a sorted list of call graph edges with profile data. Here is the first table.
$ gprof mod179.exe
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
40.79 0.31 0.31 __fentry__
30.26 0.54 0.23 _mcount_private
14.47 0.65 0.11 15931000 0.01 0.01 modulo
6.58 0.70 0.05 47793000 0.00 0.00 modk
5.26 0.74 0.04 47793000 0.00 0.00 divk
1.32 0.75 0.01 178000 0.06 1.18 prod
1.32 0.76 0.01 main
The table indicates that, apart from profiling overhead, most of the execution time goes into the modulo function. The two higher-ranked functions, _mcount_private and __fentry__, are part of the profiling framework and represent overhead. As you may suspect, profiling adds execution time because of the instrumented mcount calls. Luckily, this overhead is detected and reported by gprof.
Further, the difference between self and total is that self relates only to the function itself, while total includes the function and all its descendants. For example, the main function is called only once, but it accounts for nearly all of the execution time. Most of that time is spent in functions called by main. Thus, the self time of main is negligible, while the total time of main covers almost the entire program (apart from the profiling overhead).
The second table produced by gprof is shown next. This table allows you to keep track of the paths. It shows precisely how you end up in each lower-level function by listing all of the functions called along that path. The table is ranked according to the contribution of the call subgraph to the total execution time. In addition, for each ranked function, the table lists the parents in the call graph, so that you know who is responsible for the function calls.
Call graph (explanation follows)
granularity: each sample hit covers 4 byte(s) for 1.32% of 0.76 seconds
index % time self children called name
<spontaneous>
[1] 40.8 0.31 0.00 __fentry__ [1]
-----------------------------------------------
<spontaneous>
[2] 30.3 0.23 0.00 _mcount_private [2]
-----------------------------------------------
<spontaneous>
[3] 28.9 0.01 0.21 main [3]
0.01 0.20 178000/178000 prod [4]
-----------------------------------------------
0.01 0.20 178000/178000 main [3]
[4] 27.6 0.01 0.20 178000 prod [4]
0.11 0.09 15931000/15931000 modulo [5]
-----------------------------------------------
0.11 0.09 15931000/15931000 prod [4]
[5] 26.3 0.11 0.09 15931000 modulo [5]
0.05 0.00 47793000/47793000 modk [6]
0.04 0.00 47793000/47793000 divk [7]
-----------------------------------------------
0.05 0.00 47793000/47793000 modulo [5]
[6] 6.6 0.05 0.00 47793000 modk [6]
-----------------------------------------------
0.04 0.00 47793000/47793000 modulo [5]
[7] 5.3 0.04 0.00 47793000 divk [7]
-----------------------------------------------
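The total time of a node in this table is its self time plus the total time of its callees. A minimal sketch of that propagation follows, assuming a tree-shaped call graph (one caller per function, no recursion). gprof splits a callee’s time among multiple callers in proportion to their call counts, which reduces to a plain sum here because every function in mod179.c has a single caller. The self times are taken from the flat profile above; the struct and node names are hypothetical.

```c
#include <stddef.h>

/* A call-tree node: self time from PC sampling plus up to 3 callees (NULL-terminated). */
struct node {
    const char *name;
    double self_s;
    const struct node *callees[4];
};

/* Self seconds from the host flat profile. */
static const struct node modk_node   = { "modk",   0.05, { NULL } };
static const struct node divk_node   = { "divk",   0.04, { NULL } };
static const struct node modulo_node = { "modulo", 0.11, { &modk_node, &divk_node, NULL } };
static const struct node prod_node   = { "prod",   0.01, { &modulo_node, NULL } };
static const struct node main_node   = { "main",   0.01, { &prod_node, NULL } };

/* Total (inclusive) time = self time + total time of every callee. */
static double total_seconds(const struct node *n) {
    double t = n->self_s;
    size_t k;
    for (k = 0; k < 4 && n->callees[k] != NULL; ++k)
        t += total_seconds(n->callees[k]);
    return t;
}
```

total_seconds(&main_node) yields 0.22 s, and 0.22 / 0.76 ≈ 28.9%, matching main’s entry; likewise 0.21 s for prod (27.6%) and 0.20 s for modulo (26.3%). The remaining time belongs to the <spontaneous> profiling entries __fentry__ and _mcount_private.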
gprof is not perfect and has some caveats. The most important of these is that the timing data is collected statistically, by sampling the PC. If the profiled program is too short, you will end up with significant sampling error. Make sure that you profile a sufficiently long run to minimize this error.
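The gprof manual gives a rule of thumb for this error: a time value that corresponds to n sampling periods has an expected error of about sqrt(n) periods, i.e. a relative error of about 1/sqrt(n). Solving 1/sqrt(n) <= e for n gives the run time a function needs for a target relative error e. A minimal sketch of that arithmetic (the function names are hypothetical):

```c
/* Number of samples that fall in a function that ran for `seconds`. */
static double samples_in(double seconds, double period_s) {
    return seconds / period_s;
}

/* Minimum run time of a function for a target relative error (0.1 = 10%):
   1/sqrt(n) <= rel_err  <=>  n >= 1/rel_err^2  <=>  seconds >= period/rel_err^2. */
static double min_seconds_for_error(double rel_err, double period_s) {
    return period_s / (rel_err * rel_err);
}
```

In the host profile above, modk’s 0.05 s is only 5 samples, so its expected error is about √5 ≈ 2.2 samples, roughly 45%. For a 10% error with the host’s 0.01 s sampling period, a function needs about 1 s of run time; with the 1 ms period of the Nios profile, 0.1 s suffices.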
Further reading: gprof: a call graph execution profiler.
Profiling on a target
You can also use gprof on the Nios II. In that case, you have to prepare a board support package that supports profiling. After running the program, you have to transfer the profiling information from the target to the host.
Figure: Enable profiling in BSP Editor
This creates a board support package that has profiling enabled. In addition, when you regenerate the application, it is compiled with the profiling flags as well (note -pg in the compile command below).
$ nios2-app-generate-makefile --bsp-dir ../hal_bsp --elf-name main.elf --src-files mod179.c
$ make
...
nios2-elf-gcc -xc -MP -MMD -c -I../hal_bsp//HAL/inc -I../hal_bsp/ -I../hal_bsp//drivers/inc -pipe -D__hal__ -DALT_PROVIDE_GMON -DALT_NO_INSTRUCTION_EMULATION -DALT_SINGLE_THREADED -O0 -g -Wall -mno-hw-div -mno-hw-mul -mno-hw-mulx -pg -mgpopt=global -o obj/default/mod179.o mod179.c
...
[main build complete]
You can run the program as usual, but you have to add the --write-gmon flag to nios2-download. Open a nios2-terminal and run the program as follows. Note that nios2-download runs the program until it exits and then uploads the gprof data into gmon.out.
$ nios2-download main.elf --write-gmon gmon.out --go
Using cable "DE-SoC [USB-1]", device 2, instance 0x00
Processor is already paused
Initializing CPU cache (if present)
OK
Downloaded 74KB in 0.1s
Verified OK
Running target program until exit
Uploaded GMON data: 6K in 0.0s
Leaving target processor paused
You can now consult the profiling data using nios2-elf-gprof. Here are the top entries of each table. You can verify that a noticeable amount of time is spent in compiler-intrinsic functions such as __mulsi3, __umoddi3, and so forth. This is because we are using a Nios II/e, which has no hardware multiplier or divider (note the -mno-hw-mul and -mno-hw-div flags in the compile command above). Also, the Nios program executes significantly slower than the same program on the host (a 50 MHz CPU that needs several clock cycles per instruction, versus a 2 GHz CPU that can complete one or more instructions per cycle). Therefore, the tables below show the profile with the extended_profile flag set to 0.
$ nios2-elf-gprof.exe main.elf
Flat profile:
Each sample counts as 0.001 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
25.28 0.27 0.27 15931 0.02 0.04 modulo
18.53 0.46 0.20 47793 0.00 0.00 modk
14.31 0.62 0.15 47793 0.00 0.00 divk
10.73 0.73 0.11 alt_get_errno
7.58 0.81 0.08 178 0.45 4.10 prod
6.85 0.88 0.07 4 18.18 18.18 altera_avalon_jtag_uart_close
3.25 0.92 0.03 51362 0.00 0.00 __mulsi3
2.45 0.94 0.03 356 0.07 0.27 ___vfprintf_internal_r
1.63 0.96 0.02 834 0.02 0.03 __udivdi3
1.58 0.98 0.02 834 0.02 0.03 __umoddi3
1.13 0.99 0.01 534 0.02 0.02 __sfvwrite_r
...
Call graph (explanation follows)
granularity: each sample hit covers 32 byte(s) for 0.09% of 1.06 seconds
index % time self children called name
0.00 0.93 1/1 _start [2]
[1] 87.7 0.00 0.93 1 alt_main [1]
0.01 0.84 1/1 main [3]
0.00 0.06 1/1 exit [15]
0.00 0.02 1/4 close [12]
0.00 0.00 1/1 alt_io_redirect [46]
0.00 0.00 1/1 atexit [73]
0.00 0.00 1/1 _do_ctors [121]
0.00 0.00 1/1 __register_exitproc [117]
0.00 0.00 1/1 alt_irq_init [68]
0.00 0.00 1/1 alt_sys_init [70]
-----------------------------------------------
<spontaneous>
[2] 87.7 0.00 0.93 _start [2]
0.00 0.93 1/1 alt_main [1]
0.00 0.00 1/1 alt_load [69]
-----------------------------------------------
0.01 0.84 1/1 alt_main [1]
[3] 80.7 0.01 0.84 1 main [3]
0.08 0.65 178/178 prod [4]
0.00 0.11 178/178 printf [9]
-----------------------------------------------
0.08 0.65 178/178 main [3]
[4] 68.8 0.08 0.65 178 prod [4]
0.27 0.37 15931/15931 modulo [5]
0.01 0.00 15931/51362 __mulsi3 [19]
Conclusions
We have discussed two different tools, gdb and gprof, and their use on a host processor and on a target processor. Both tools are a significant help to software developers. gdb helps to track down bugs. gprof helps to identify where a program spends its cycles. When moving debugging and profiling from a host to a target, you have to take extra steps to prepare the target. The debugging user interface and the profile analysis steps always execute on the host.