GDB


 * This article was originally written from a 32-bit PowerPC architecture perspective and register information will vary across architectures. 64-bit Power offset information will be added at a later date.

Compile Your Binary with Debugging Symbols
Runtime debugging with GDB is difficult, though not impossible, if the debugging symbols are not embedded in your program. (Be aware that there are times when adding -g will make a program work where it was failing before.)

Compile with the -g flag to get debugging symbols embedded in the application binary:

«user@host»:~/dir§ gcc -g test.c

Now GDB, objdump, nm, and all of the other binary investigation tools can gather extended (readable) symbol information from the binary.

A tutorial on how to locate problem points when you can't use the debugging symbols will be covered later. The basic idea is to compile a version of the library with symbols, note the offsets, and compare the offsets in the debug version with those in the non-symbol version. For this to work you need an exact copy of the source, the same compiler, and the same compiler options.

Attaching GDB to a running process
Often when attempting to debug a threading problem, a hang will not reproduce when the program is invoked from within GDB. In such a case you'll have to attach GDB to a running program that has already hung. To do so, find the application's PID using ps -afx, then use the following GDB invocation:

«user@host»:~/dir§ gdb
GNU gdb Red Hat Linux (6.3.0.0-0.31rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu".
(gdb) attach 25912

Manual re-construction of a backtrace
As a series of cascading function calls gets progressively deeper, the stack increases in size. When a function branches to another function, the return address is placed in the "link register", lr. The called function gets its own stack space and stores the link register on the stack in a place called the "LR Save Word". This is the address to which it is supposed to branch when it is ready to return control to its calling function. Each function call does this, which builds a call stack as each progressively deeper function is called. A backtrace is the series of return addresses for a stack of function calls.

Sometimes a backtrace gets corrupted in GDB because GDB can't figure out how to construct it properly. This will require a manual backtrace reconstruction. We do this by manually rebuilding the stack, frame by frame. Don't worry, it isn't too bad, just time consuming.

You have to have the ABI handy for your particular architecture to know how the stack-frame is constructed.


 * The x86ManualBacktrace tutorial shows how to manually construct a backtrace on x86 based upon the i386 32-bit ABI


 * This tutorial is based upon the ppc 32-bit ELF ABI.

Corrupted gdb Backtrace
Take the following corrupted backtrace as an example:

(gdb) bt
#0  0x10002b2c in __pthread_sigsuspend (set=0x4) at pt-sigsuspend.c:54
#1  0x10001c2c in __pthread_wait_for_restart_signal (self=0x4)
    at pthread.c:1216
#2  0x100008c8 in pthread_join (thread_id=16386, thread_return=0xffffe070)
    at restart.h:34
#3  0x10000428 in finish () at test.c:35
#4  0x100001ec in __do_global_dtors_aux ()
#5  0x10056c78 in _fini ()
#6  0x10007238 in __libc_csu_fini () at elf-init.c:81
#7  0x10007238 in __libc_csu_fini () at elf-init.c:81
#8  0x10007238 in __libc_csu_fini () at elf-init.c:81
#9  0x10007238 in __libc_csu_fini () at elf-init.c:81
Previous frame inner to this frame (corrupt stack?)

Notice how frames #6, #7, #8 and #9 have the same Program Counter (0x10007238)? This indicates that GDB got confused at that point. It may or may not indicate real stack corruption, so we'll have to investigate the stack to find out. Sometimes the corruption can be quite extensive.
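Spotting the repeated Program Counter by eye is easy in a ten-frame backtrace, but less so in a long one. The following is a minimal sketch, not part of the original tutorial, that flags frames whose PC equals the PC of the frame before it. The function name repeated_frames and the regular expression are illustrative assumptions, and the check is only a hint: genuine recursion can also repeat a PC.

```python
import re

# Frame lines copied from the corrupted backtrace above (frames #5 to #9).
BACKTRACE = """\
#5  0x10056c78 in _fini ()
#6  0x10007238 in __libc_csu_fini () at elf-init.c:81
#7  0x10007238 in __libc_csu_fini () at elf-init.c:81
#8  0x10007238 in __libc_csu_fini () at elf-init.c:81
#9  0x10007238 in __libc_csu_fini () at elf-init.c:81
"""

# Matches "#N  0xADDR in function"; continuation lines do not match.
FRAME_RE = re.compile(r"^#(\d+)\s+(0x[0-9a-fA-F]+)\s+in\s+(\S+)")


def repeated_frames(bt_text):
    """Return the frame numbers whose PC equals the previous frame's PC."""
    suspects = []
    prev_pc = None
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line)
        if not match:
            continue  # e.g. "    at restart.h:34" or GDB's warning line
        number, pc = int(match.group(1)), int(match.group(2), 16)
        if pc == prev_pc:
            suspects.append(number)
        prev_pc = pc
    return suspects


print(repeated_frames(BACKTRACE))  # prints [7, 8, 9]
```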


 * The following how-to will show how to manually reconstruct the back-trace.

Register relevance to backtraces
A good place to start investigation is with the current state of the registers:

(gdb) info reg
r0             0xb2             178
r1             0xffffdeb0       4294958768
r2             0x100c5400       269243392
r3             0x4              4
r4             0x8              8
r5             0xffffdef0       4294958832
r6             0x8              8
r7             0x10080000       268959744
r8             0xffffffc0       4294967232
r9             0x0              0
r10            0x0              0
r11            0x1f             31
r12            0x44000428       1140851752
r13            0x10082814       268970004
r14            0x0              0
r15            0x0              0
r16            0x0              0
r17            0x0              0
r18            0x0              0
r19            0x0              0
r20            0x10080000       268959744
r21            0x1000032c       268436268
r22            0x10007254       268464724
r23            0x100071d4       268464596
r24            0x0              0
r25            0x1              1
r26            0xffffe334       4294959924
r27            0xffffe070       4294959216
r28            0x0              0
r29            0x4002           16386
r30            0x10080000       268959744
r31            0xffffdef0       4294958832
pc             0x10002b2c       268446508
cr             0x34000428       872416296
lr             0x10001c2c       268442668
ctr            0x100071d4       268464596
xer            0x20000000       536870912

On Power hardware there are three really important registers that will help us rebuild the stack and capture our backtrace: "general purpose register 1", i.e. r1; the "program counter register", i.e. pc; and the "link register", i.e. lr. The "count register", i.e. ctr, can be useful as well, because it is often used for branching through function pointers.


 * The "program counter" register, pc, is denoted by the ppc32 ELF ABI as the register that holds the pointer to the current instruction (or the instruction that a hang or crash waits on).
 * The ppc32 ELF ABI denotes the "link register", lr, as a register that is volatile across each function call. A branch-and-link instruction such as bl loads the lr with the address that the called function should return to when it is done. The lr isn't the most reliable indicator on its own, because a function is free to overwrite it once it has been saved, and after a call returns the lr still holds that call's return address. The ABI indicates that a called function which makes calls of its own must save the lr into the stack frame's LR save location in its prologue, so that it knows which function it is supposed to branch back to when it returns.
 * Per the ppc32 ELF ABI, "general purpose register 1", r1, always holds the current stack-frame pointer and it is always valid. The first word at the stack-frame pointer is always the BC (Back Chain) pointer to the previously allocated stack frame. In ppc32 the second word at the stack-frame pointer address (stack-frame pointer + 0x4) is always the LR Save area. It is where the address found in the link register when a function is entered is required to be stored.
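The register roles above can be summarised in a few lines. This is a minimal sketch, assuming the register dump has been pasted in as text; parse_info_reg is a hypothetical helper name, not a GDB facility. It pulls out r1, pc and lr and derives where the back chain and the LR save word of the current frame live.

```python
# A few lines copied from the "info reg" output above.
INFO_REG = """\
r1            0xffffdeb0       4294958768
pc            0x10002b2c       268446508
lr            0x10001c2c       268442668
ctr           0x100071d4       268464596
"""


def parse_info_reg(text):
    """Map register names to integer values from 'info reg' style text."""
    regs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect: name, hex value, decimal value.
        if len(fields) >= 2 and fields[1].startswith("0x"):
            regs[fields[0]] = int(fields[1], 16)
    return regs


regs = parse_info_reg(INFO_REG)

# ppc32 frame layout as described above: back chain in the first word,
# LR save word in the second word (stack-frame pointer + 0x4).
back_chain_addr = regs["r1"]
lr_save_addr = regs["r1"] + 4

print(hex(back_chain_addr))  # prints 0xffffdeb0
print(hex(lr_save_addr))     # prints 0xffffdeb4
```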

We know that the pc holds the address of the current instruction, 0x10002b2c, so we keep that as the first entry when we rebuild our backtrace.

Examining the Stack-Frame in memory
Next, we'll begin to reconstruct the backtrace by locating all of the stack-frame pointers. Let's take a look at the memory comprising the stack by investigating the first stack-frame pointer, pointed to by r1. I cheat and exclude parts of the stack that are irrelevant.

(gdb) x /200w 0xffffdeb0
0xffffdeb0:    0xffffdf50      0x10001c1c      0x00000000      0x00000000
0xffffdec0:    0x00000000      0x00000000      0x00000000      0x00000000
0xffffded0:    0x00000000      0x00000000      0x00000000      0x00000000
0xffffdee0:    0x00000000      0x00000000      0x00000000      0x00000000
0xffffdef0:    0xffffdf00      0x00000000      0x100e0000      0x00000000
0xffffdf00:    0xffffdf20      0x00000000      0x00000000      0x00000000
0xffffdf10:    0x00000000      0x00000000      0x000003e0      0x1007d29c
0xffffdf20:    0xffffdf40      0x00000000      0x00000000      0x00000000
0xffffdf30:    0xffffdf50      0x00000000      0x00000000      0xffffdf50
0xffffdf40:    0xffffdf50      0x00004002      0x100c1ba0      0x1007dc64
0xffffdf50:    0xffffe030      0x100008c8      0x100be000      0xffffdf70
0xffffdf60:    0xffffe000      0x10001c1c      0x00000000      0x00000000
0xffffdf70:    0x00000000      0x00000000      0x00000000      0x00000000
0xffffdf80:    0x00000000      0x00000000      0x00000000      0x00000000
0xffffdf90:    0x00000000      0x00000000      0x00000000      0x00000000
0xffffdfa0:    0x00000000      0x00000000      0x00000000      0x00000000
0xffffdfb0:    0x00000003      0x00000004      0x100bec40      0x100bec28
0xffffdfc0:    0xffffdfd0      0x00000001      0x1007db44      0x100be000
0xffffdfd0:    0xffffe000      0x10004dd0      0x00003362      0x44000422
0xffffdfe0:    0x00000000      0x00003362      0x00000000      0x80000000
0xffffdff0:    0x100be000      0x00000000      0x100be000      0x00000094
0xffffe000:    0x1007dc64      0x1000045c      0x00000000      0x100071d4
0xffffe010:    0x100be000      0x00000002      0x00000000      0x00000000
0xffffe020:    0x00000000      0x00000000      0x10080000      0xffffe030
0xffffe030:    0xffffe060      0x10000428      0x00000000      0x1007d29c
0xffffe040:    0xffffe070      0x10012e14      0x00000000      0x00000000
0xffffe050:    0x00000023      0x00000000      0x00000000      0x10080000
0xffffe060:    0xffffe080      0x100001ec      0xffffe304      0x00000000
0xffffe070:    0x00000000      0xffffe418      0x00000000      0xffffffff
0xffffe080:    0xffffe0a0      0x10056c78      0x00000000      0x1007d094
0xffffe090:    0xffffe0a0      0x100066d8      0xffffe304      0x00000000
0xffffe0a0:    0xffffe0c0      0x10007238      0x00000000      0xffffe0b0
0xffffe0b0:    0x00000000      0x00000000      0x00000000      0x10080000
0xffffe0c0:    0xffffe0e0      0x100080c8      0x00000000      0xffffe1b2
0xffffe0d0:    0xffffe0e0      0xffffe418      0x00000000      0x10080000
0xffffe0e0:    0xffffe2f0      0x10006d80      0x00000000      0x00000000
0xffffe0f0:    0x00000000      0x00000000      0x00000000      0x00000000
...
...
0xffffe250:    0x00000000      0x00000000      0x00000000      0x00000000
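Reading words out of a dump like this by hand is error-prone, so it can help to load the dump into a lookup table first. The following is a minimal sketch that assumes the x /w output has been saved as text; parse_words is a hypothetical helper name. Each row contributes four consecutive 32-bit words starting at the row's address.

```python
# Two rows copied from the memory dump above.
DUMP = """\
0xffffdeb0:    0xffffdf50      0x10001c1c      0x00000000      0x00000000
...
0xffffdf50:    0xffffe030      0x100008c8      0x100be000      0xffffdf70
"""


def parse_words(dump_text):
    """Map every word address in 'x /Nw' style output to its 32-bit value."""
    memory = {}
    for line in dump_text.splitlines():
        if ":" not in line:
            continue  # skips elision markers such as "..."
        addr_text, _, rest = line.partition(":")
        try:
            addr = int(addr_text.strip(), 16)
        except ValueError:
            continue  # not a dump row
        for index, word in enumerate(rest.split()):
            memory[addr + 4 * index] = int(word, 16)
    return memory


memory = parse_words(DUMP)
print(hex(memory[0xffffdeb0]))  # back chain of the current frame
print(hex(memory[0xffffdeb4]))  # LR save word of the current frame
```

With the dump in a dictionary, the back chain of any frame is memory[frame] and its LR save word is memory[frame + 4].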


 * In the output above, 0xffffdeb0, taken from r1, is the current (as of the hang or crash) stack-frame pointer, which goes with the instruction in the pc register. The word stored at the stack-frame pointer is the back-chain pointer, which holds the address of the previous stack frame. So 0xffffdf50 is the address of the previous stack frame.


 * As mentioned earlier, the second word of a stack frame is the LR Save Word. So address 0xffffdeb4 (0xffffdeb0 + 0x4), which holds 0x10001c1c, is the LR Save Word for the current stack frame; it is the instruction address to which we blr (branch to link register) when returning to the calling function. We use this address to determine which function the current function was called from.

Gathering the saved Program Counters
We can compose the following stack-frame table by following the backchain pointer and recording the LR Save Word for each stack-frame, e.g.


 * NOTE: When the backchain pointer for a stack-frame is 0x00000000 we know we've reached the start of the program.

stack frame ptr   backchain ptr   LR save word
0xffffdeb0:       0xffffdf50      0x10001c1c
0xffffdf50:       0xffffe030      0x100008c8
0xffffe030:       0xffffe060      0x10000428
0xffffe060:       0xffffe080      0x100001ec
0xffffe080:       0xffffe0a0      0x10056c78
0xffffe0a0:       0xffffe0c0      0x10007238
0xffffe0c0:       0xffffe0e0      0x100080c8
0xffffe0e0:       0xffffe2f0      0x10006d80
0xffffe250:       0x00000000      0x00000000
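The table can also be produced mechanically by following the back chain. Below is a minimal sketch, assuming the relevant words have already been copied into a dictionary keyed by address; walk_back_chain is a hypothetical name. It stops on a zero back chain, on an address missing from the dictionary, or on a back chain that does not point to a higher address, since the stack grows downward. Because the dump in this article is elided after 0xffffe0f0, the walk here ends at the frame whose back chain is 0xffffe2f0.

```python
# Back chain and LR save word of each frame, copied from the dump above.
MEMORY = {
    0xffffdeb0: 0xffffdf50, 0xffffdeb4: 0x10001c1c,
    0xffffdf50: 0xffffe030, 0xffffdf54: 0x100008c8,
    0xffffe030: 0xffffe060, 0xffffe034: 0x10000428,
    0xffffe060: 0xffffe080, 0xffffe064: 0x100001ec,
    0xffffe080: 0xffffe0a0, 0xffffe084: 0x10056c78,
    0xffffe0a0: 0xffffe0c0, 0xffffe0a4: 0x10007238,
    0xffffe0c0: 0xffffe0e0, 0xffffe0c4: 0x100080c8,
    0xffffe0e0: 0xffffe2f0, 0xffffe0e4: 0x10006d80,
}


def walk_back_chain(memory, frame_ptr, max_frames=64):
    """Return (frame pointer, back chain, LR save word) rows, innermost first."""
    rows = []
    while frame_ptr in memory and len(rows) < max_frames:
        back_chain = memory[frame_ptr]
        lr_save = memory.get(frame_ptr + 4)
        rows.append((frame_ptr, back_chain, lr_save))
        # A zero back chain marks the outermost frame; a back chain that
        # does not move to a higher address suggests a damaged stack.
        if back_chain == 0 or back_chain <= frame_ptr:
            break
        frame_ptr = back_chain
    return rows


for frame, chain, lr in walk_back_chain(MEMORY, 0xffffdeb0):
    print(f"{frame:#010x}:  {chain:#010x}  {lr:#010x}")
```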

Use objdump to attach symbols to addresses
The next thing to do is to use another terminal to objdump the disassembly and symbol information from the binary so that we can see which functions the instruction pointers stored in the LR Save Words reside in.

«user@host»:~/dir§ objdump -tD a.out > a.dis

For the backtrace we're really only interested in the program counters (the values in LR save word) so we construct a backtrace table using the value in the pc as the first address:


#0  0x10002b2c
#1  0x10001c1c
#2  0x100008c8
#3  0x10000428
#4  0x100001ec
#5  0x10056c78
#6  0x10007238
#7  0x100080c8
#8  0x10006d80

Investigate objdump disassembly for program counters
Now, start looking up the addresses in the disassembly file. Remember, if you don't see symbol names you either didn't build with the -g option or you didn't ask objdump for the symbol information.

So looking for address 0x10002b2c gives the following, such that we know that 0x10002b2c resides in function __pthread_sigsuspend:

10002b20 <__pthread_sigsuspend>:
10002b20:      38 00 00 b2     li      r0,178
10002b24:      38 80 00 08     li      r4,8
10002b28:      44 00 00 02     sc
10002b2c:      7c 00 00 26     mfcr    r0
10002b30:      4e 80 00 20     blr
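Searching the disassembly file by hand works, but the lookup is easy to script. The sketch below is an illustration rather than part of the original tutorial: it collects the function labels from objdump-style disassembly text and maps an address to the nearest label at or below it. The helper names load_labels and resolve are hypothetical. Because function sizes are not consulted, an address past the end of the last instruction of a function would still be attributed to that function.

```python
import bisect
import re

# Label lines look like "10002b20 <__pthread_sigsuspend>:".
LABEL_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:\s*$")

# A small excerpt in the same shape as the a.dis file from this article.
DISASSEMBLY = """\
10001be4 <__pthread_wait_for_restart_signal>:
10001be4:      94 21 ff 60     stwu    r1,-160(r1)
10002b20 <__pthread_sigsuspend>:
10002b20:      38 00 00 b2     li      r0,178
10002b2c:      7c 00 00 26     mfcr    r0
"""


def load_labels(dis_text):
    """Return a sorted list of (start address, function name) pairs."""
    labels = []
    for line in dis_text.splitlines():
        match = LABEL_RE.match(line)
        if match:
            labels.append((int(match.group(1), 16), match.group(2)))
    labels.sort()
    return labels


def resolve(labels, addr):
    """Return (function name, offset) for addr, or None if below all labels."""
    starts = [start for start, _ in labels]
    index = bisect.bisect_right(starts, addr) - 1
    if index < 0:
        return None
    start, name = labels[index]
    return name, addr - start


labels = load_labels(DISASSEMBLY)
for pc in (0x10002b2c, 0x10001c1c):
    name, offset = resolve(labels, pc)
    print(f"{pc:#010x} in {name}+{offset:#x}")
```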

We'll do one more example since the first instruction address is usually a bit different than the rest because it is usually the instruction that caused the crash, and not a function return address like the remainder of the address pointers will be. Look at the next instruction in the list, 0x10001c1c.

10001be4 <__pthread_wait_for_restart_signal>:
10001be4:      94 21 ff 60     stwu    r1,-160(r1)
10001be8:      7c 08 02 a6     mflr    r0
10001bec:      93 e1 00 9c     stw     r31,156(r1)
10001bf0:      3b e1 00 10     addi    r31,r1,16
10001bf4:      93 c1 00 98     stw     r30,152(r1)
10001bf8:      38 80 00 00     li      r4,0
10001bfc:      7f e5 fb 78     mr      r5,r31
10001c00:      38 60 00 02     li      r3,2
10001c04:      3f c0 10 08     lis     r30,4104
10001c08:      90 01 00 a4     stw     r0,164(r1)
10001c0c:      48 00 60 e5     bl      10007cf0 <__sigprocmask>
10001c10:      80 9e a8 24     lwz     r4,-22492(r30)
10001c14:      7f e3 fb 78     mr      r3,r31
10001c18:      48 00 63 0d     bl      10007f24
10001c1c:      38 00 00 00     li      r0,0
10001c20:      90 02 8c 24     stw     r0,-29660(r2)
10001c24:      7f e3 fb 78     mr      r3,r31
10001c28:      48 00 0e f9     bl      10002b20 <__pthread_sigsuspend>
10001c2c:      81 3e a8 24     lwz     r9,-22492(r30)
10001c30:      80 02 8c 24     lwz     r0,-29660(r2)
10001c34:      7f 80 48 00     cmpw    cr7,r0,r9
10001c38:      40 9e ff ec     bne+    cr7,10001c24 <__pthread_wait_for_restart_signal+0x40>
10001c3c:      7c 00 04 ac     sync
10001c40:      80 01 00 a4     lwz     r0,164(r1)
10001c44:      83 c1 00 98     lwz     r30,152(r1)
10001c48:      83 e1 00 9c     lwz     r31,156(r1)
10001c4c:      7c 08 03 a6     mtlr    r0
10001c50:      38 21 00 a0     addi    r1,r1,160
10001c54:      4e 80 00 20     blr

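This one is interesting because we know we are in __pthread_wait_for_restart_signal, but the saved LR (0x10001c1c) points before the bl to __pthread_sigsuspend, which is the call we just made. The most likely explanation is that __pthread_sigsuspend is a leaf function: its disassembly contains no stwu or mflr, so it never writes the lr to the stack, and the LR save word at 0xffffdeb4 still holds the return address of the earlier bl at 0x10001c18. The live lr register, 0x10001c2c, holds the return address for the current call, and it matches frame #1 of GDB's own backtrace. Both addresses lie inside __pthread_wait_for_restart_signal, so the function name we record is the same either way. Continue to trace each instruction address in our backtrace and rebuild it until you get the following: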

Apply symbols to rebuilt backtrace

#0  0x10002b2c in __pthread_sigsuspend
#1  0x10001c1c in __pthread_wait_for_restart_signal
#2  0x100008c8 in pthread_join
#3  0x10000428 in finish
#4  0x100001ec in __do_global_dtors_aux
#5  0x10056c78 in _fini
#6  0x10007238 in __libc_csu_fini
#7  0x100080c8 in exit
#8  0x10006d80 in __libc_start_main
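One way to sanity-check a rebuilt backtrace is to confirm that every return address sits immediately after a branch-and-link instruction, as 0x10001c1c and 0x10001c2c do in the disassembly of __pthread_wait_for_restart_signal. The sketch below decodes the PowerPC I-form branch (primary opcode 18), which is the encoding behind the bl lines in that disassembly. The names decode_branch and is_return_address are hypothetical, and calls made through bctrl or blrl use a different encoding that this sketch does not cover.

```python
def decode_branch(addr, word):
    """Decode an I-form branch at addr; return (target, links) or None."""
    if word >> 26 != 18:
        return None  # not b/bl/ba/bla
    displacement = word & 0x03FFFFFC      # LI field with two zero bits appended
    if displacement & 0x02000000:         # sign-extend the 26-bit value
        displacement -= 0x04000000
    absolute = bool(word & 0x2)           # AA bit
    links = bool(word & 0x1)              # LK bit: set for bl and bla
    target = displacement if absolute else addr + displacement
    return target & 0xFFFFFFFF, links


# (address: instruction word) pairs copied from the disassembly above.
CODE = {
    0x10001c0c: 0x480060e5,  # bl 10007cf0 <__sigprocmask>
    0x10001c18: 0x4800630d,  # bl 10007f24
    0x10001c1c: 0x38000000,  # li r0,0
    0x10001c28: 0x48000ef9,  # bl 10002b20 <__pthread_sigsuspend>
}


def is_return_address(addr, code):
    """True if the instruction just before addr is a linking branch."""
    decoded = decode_branch(addr - 4, code.get(addr - 4, 0))
    return decoded is not None and decoded[1]


target, _ = decode_branch(0x10001c28, CODE[0x10001c28])
print(hex(target))                           # prints 0x10002b20
print(is_return_address(0x10001c1c, CODE))   # prints True
print(is_return_address(0x10001c2c, CODE))   # prints True
```

An address that fails this check is not necessarily wrong, but it deserves a second look in the disassembly before it goes into the backtrace.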

Congratulations, you've successfully reconstructed a backtrace.

The End

Credits

 * The original content of this tutorial was provided by Ryan S. Arnold, aka RandomTask, from his engineering journal.