Generate Stack Traces on Crash Portably in C++
There are three basic steps to getting this done: trapping signals, getting the stack frames, and demangling the C++ symbols.
Trapping Signals
When a program crashes, the operating system sends it a signal, giving it one last chance to wrap up what it is doing before the OS terminates the process. Your program can intercept the signal and run code. You could, for example, try to recover from the error. In this case all we want to do is print out a stack trace.
Trapping the signal is pretty straightforward. But first a little housekeeping. We’re going to create this system as a trivial C++ class that only has a constructor:
class KxStackTrace { public: KxStackTrace(); };
We will instantiate this class as a singleton in global scope, or you could instantiate it in main(). The constructor will register the signal handlers, so all that matters is that a single instance is created sometime before you need it.
To start handling signals you only need to register a signal handler using the signal() function:
#include <stdio.h>
#include <signal.h>

// The handler is defined further down.
void abortHandler( int signum );

KxStackTrace::KxStackTrace()
{
    signal( SIGABRT, abortHandler );
    signal( SIGSEGV, abortHandler );
    signal( SIGILL,  abortHandler );
    signal( SIGFPE,  abortHandler );
}
The signal() function is part of the standard C library and is supported on Windows, Linux and OS X. When a given signal is generated, the handler function passed as the second argument is called. You can trap almost any operating system signal. In this case we are trapping the most important signals that are generated when the OS detects an execution problem.
SIGABRT is generated when the program calls the abort() function, such as when an assert() triggers.
SIGSEGV and SIGBUS are generated when the program makes an illegal memory access, such as reading unaligned memory, dereferencing a NULL pointer, or reading memory out of bounds. SIGBUS is not supported under Windows.
SIGILL is generated when the program tries to execute a malformed instruction. This happens when the execution pointer starts reading non-program data, or when a pointer to a function is corrupted.
SIGFPE is generated by an erroneous arithmetic operation. Despite the name, the most common cause is integer division by zero; floating point division by zero and overflow only raise it when floating point exceptions have been enabled.
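To see the registration mechanism in isolation, here is a minimal sketch that installs a handler with signal(), triggers it with raise(), and then restores the previous handler. The names recordingHandler and handlerRunsOnRaise are made up for this illustration and are not part of the KxStackTrace code.

```cpp
#include <signal.h>

// Records the most recent signal seen by the handler. A volatile
// sig_atomic_t is the kind of object the C standard guarantees can be
// safely written from inside a signal handler.
static volatile sig_atomic_t g_lastSignal = 0;

// Hypothetical handler: it only remembers which signal arrived.
static void recordingHandler( int signum )
{
    g_lastSignal = signum;
}

// Installs recordingHandler for signum, raises the signal, restores the
// previous handler, and reports whether the handler actually ran.
bool handlerRunsOnRaise( int signum )
{
    g_lastSignal = 0;

    void (*previous)( int ) = signal( signum, recordingHandler );
    if ( previous == SIG_ERR )
        return false;

    raise( signum );            // delivered synchronously to this thread
    signal( signum, previous ); // put the old handler back

    return g_lastSignal == signum;
}
```

Calling handlerRunsOnRaise( SIGTERM ) should return true on any platform that implements signal(). A real crash handler differs in that the signal comes from the OS rather than from raise(), and the handler normally does not return.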
To get a list of signals supported by your operating system run the following command:
$ kill -l
 1) SIGHUP     2) SIGINT     3) SIGQUIT    4) SIGILL
 5) SIGTRAP    6) SIGABRT    7) SIGEMT     8) SIGFPE
 9) SIGKILL   10) SIGBUS    11) SIGSEGV   12) SIGSYS
13) SIGPIPE   14) SIGALRM   15) SIGTERM   16) SIGURG
17) SIGSTOP   18) SIGTSTP   19) SIGCONT   20) SIGCHLD
21) SIGTTIN   22) SIGTTOU   23) SIGIO     24) SIGXCPU
25) SIGXFSZ   26) SIGVTALRM 27) SIGPROF   28) SIGWINCH
29) SIGPWR    30) SIGUSR1
Now we implement the signal handler function itself:
#include <stdlib.h> // exit()

void abortHandler( int signum )
{
    // associate each signal with a signal name string.
    const char* name = NULL;
    switch( signum )
    {
    case SIGABRT: name = "SIGABRT"; break;
    case SIGSEGV: name = "SIGSEGV"; break;
#ifdef SIGBUS // not defined on Windows
    case SIGBUS:  name = "SIGBUS";  break;
#endif
    case SIGILL:  name = "SIGILL";  break;
    case SIGFPE:  name = "SIGFPE";  break;
    }

    // Notify the user which signal was caught. We use fprintf, because this is
    // the most basic output function. Once you get a crash, it is possible that
    // more complex output systems like streams and the like may be corrupted.
    // So we make the most basic call possible to the lowest level, most
    // standard print function.
    if ( name )
        fprintf( stderr, "Caught signal %d (%s)\n", signum, name );
    else
        fprintf( stderr, "Caught signal %d\n", signum );

    // Dump a stack trace.
    // This is the function we will be implementing next.
    printStackTrace();

    // If you caught one of the above signals, it is likely you just
    // want to quit your program right now.
    exit( signum );
}
Using sigaction() on POSIX Systems
The example above will work on both Windows and POSIX systems. On any reasonably modern POSIX platform you can use the newer sigaction() function. This does the same thing but provides additional information about the signal, and a little more control over how each signal is handled.
KxStackTrace::KxStackTrace()
{
    struct sigaction sa;
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = abortHandler;
    sigemptyset( &sa.sa_mask );

    sigaction( SIGABRT, &sa, NULL );
    sigaction( SIGSEGV, &sa, NULL );
    sigaction( SIGBUS,  &sa, NULL );
    sigaction( SIGILL,  &sa, NULL );
    sigaction( SIGFPE,  &sa, NULL );
    sigaction( SIGPIPE, &sa, NULL );
}
The handler call will need to have some additional arguments:
void abortHandler( int signum, siginfo_t* si, void* unused ) { }
We won’t use the extra information in our code here, but with sigaction() you can find out things like the process id and user id of the sending process (e.g. if a kill signal was sent from another process). You can also get the address that caused the fault, in the case of a crash. It’s nice to know you have options.
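As a rough illustration of that extra information, the sketch below installs a three-argument handler with SA_SIGINFO and records two siginfo_t fields. It assumes a POSIX system; infoHandler and captureSignalInfo are names invented for the example, and only si_signo and si_code are actually read here.

```cpp
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t g_signo = 0;
static volatile sig_atomic_t g_code  = 0;

// Hypothetical SA_SIGINFO handler. For SIGSEGV, SIGBUS, SIGILL and SIGFPE,
// si->si_addr holds the faulting address; for signals sent with kill(),
// si->si_pid and si->si_uid identify the sender.
static void infoHandler( int signum, siginfo_t* si, void* context )
{
    (void) signum;
    (void) context;
    g_signo = si->si_signo;
    g_code  = si->si_code;
}

// Installs infoHandler for signum, raises the signal once, restores the old
// action, and reports whether siginfo_t carried the expected signal number.
bool captureSignalInfo( int signum )
{
    struct sigaction sa;
    struct sigaction old;
    memset( &sa, 0, sizeof( sa ));
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = infoHandler;
    sigemptyset( &sa.sa_mask );

    if ( sigaction( signum, &sa, &old ) != 0 )
        return false;

    g_signo = 0;
    raise( signum );
    sigaction( signum, &old, NULL );

    return g_signo == signum;
}
```

Calling captureSignalInfo( SIGUSR1 ) exercises the whole path without crashing anything. In a crash handler, si_addr is usually the most useful field to print next to the stack trace.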
Printing a Basic Stack Trace
Linux and OS X
On Linux and other POSIX-like systems such as OS X, you will want to use backtrace() and backtrace_symbols(). The strings returned by backtrace_symbols() live in a single buffer allocated by malloc(). To be clean you will want to free it before you exit.
backtrace() returns the raw stack trace information as an array of return addresses, one per stack frame. If the length is zero then it is likely the stack was corrupt, and you are done. If you get some stack frames then you can turn them into a somewhat readable form using backtrace_symbols():
#include <stdio.h>
#include <stdlib.h>
#include <execinfo.h>

static inline void printStackTrace( FILE *out = stderr, unsigned int max_frames = 63 )
{
    fprintf( out, "stack trace:\n" );

    // storage array for stack trace address data
    // (a variable-length array is a GCC/Clang extension in C++)
    void* addrlist[max_frames+1];

    // retrieve current stack addresses
    int addrlen = backtrace( addrlist, sizeof( addrlist ) / sizeof( void* ));

    if ( addrlen == 0 )
    {
        fprintf( out, "  <empty, possibly corrupt>\n" );
        return;
    }

    // create readable strings for each frame.
    char** symbollist = backtrace_symbols( addrlist, addrlen );
    if ( symbollist == NULL )
        return; // out of memory

    // print the stack trace, skipping the first four frames.
    for ( int i = 4; i < addrlen; i++ )
        fprintf( out, "%s\n", symbollist[i] );

    free( symbollist );
}
Note that we don’t print the first four stack frames. These are not interesting because they will be the functions called by the stack trace code itself. The program function that generated the signal will be at offset 4.
On Linux the output looks something like this:
Caught signal 6 (SIGABRT)
stack trace:
/lib/libc.so.6(abort+0x180) [0x7f96afdf9f70]
/lib/libc.so.6(__assert_fail+0xf1) [0x7f96afdf02b1]
./bin/linux-x86_64/debug/kxf_test() [0x4359fd]
./bin/linux-x86_64/debug/kxf_test(_Z8testPathv+0x29b4) [0x4156e8]
./bin/linux-x86_64/debug/kxf_test(_Z6doTestPKcPFvvE+0xc8) [0x424c89]
./bin/linux-x86_64/debug/kxf_test(main+0xe4) [0x424de2]
/lib/libc.so.6(__libc_start_main+0xfd) [0x7f96afde3c4d]
./bin/linux-x86_64/debug/kxf_test() [0x412c79]
Note that the C++ symbols are still mangled and unreadable. The output for OS X will be somewhat different, but similarly unreadable. We will need to demangle the symbols. More on that later.
In the meantime, if you have a program that dumps mangled stack traces, you can cut them and paste them into my online gcc C++ demangler to get demangled symbols.
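Because backtrace_symbols() allocates its result with malloc(), it can be a poor fit for a handler that runs after the heap has been corrupted or while the crashing thread was inside malloc(). The execinfo API also offers backtrace_symbols_fd(), which writes the same lines straight to a file descriptor without allocating. A minimal sketch, assuming a platform that provides <execinfo.h> (glibc or OS X); dumpBacktraceToFd is a hypothetical helper name:

```cpp
#include <execinfo.h>
#include <unistd.h> // STDERR_FILENO

// Hypothetical helper: writes the current call stack to the file descriptor
// fd, one frame per line, and returns the number of frames captured.
int dumpBacktraceToFd( int fd )
{
    void* addrlist[64];

    // retrieve current stack addresses
    int addrlen = backtrace( addrlist, 64 );

    // backtrace_symbols_fd() formats the frames like backtrace_symbols(),
    // but writes them directly to fd instead of returning malloc()-ed strings.
    if ( addrlen > 0 )
        backtrace_symbols_fd( addrlist, addrlen, fd );

    return addrlen;
}
```

A handler would call dumpBacktraceToFd( STDERR_FILENO ). The trade-off is that the lines cannot be post-processed, so symbols stay mangled. The glibc documentation also notes that the first call to backtrace() may itself load a support library, so calling it once at startup is a reasonable precaution.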
Windows
Getting and printing the stack trace on Windows is a nightmare. To start with, the functions needed to dump the stack are not available by default, but must be loaded from dbghelp.dll or imagehlp.dll. The stack dumping function itself has changed from StackWalk to StackWalk64. Finding the symbols themselves is complex.
I use a prebuilt stack dumping class called StackWalker, modified somewhat to remove stuff I don’t need.
You could use StackWalker unmodified, and derive from that class to change the output. I just simplified the code to make the changes I need. If you use StackWalker, then dumping a stack trace becomes pretty easy:
#include "StackWalker.h"

static inline void printStackTrace( FILE *out = stderr )
{
    fprintf( out, "stack trace:\n" );
    StackWalker sw;
    sw.ShowCallstack();
}
Unlike the POSIX version, this one will print your C++ stack symbols demangled by default, which is nice. It also prints the source file name and line number, which the POSIX version doesn’t do.
Caught signal 22 (SIGABRT)
stack trace:
f:\dd\vctools\crt_bld\self_x86\crt\src\abort.c (74): abort
f:\dd\vctools\crt_bld\self_x86\crt\src\assert.c (336): _wassert
c:\users\rafael\desktop\milk2\kxf\src\win32\kxf_pathos.cpp (740): KxfPath::listFiles
c:\users\rafael\desktop\milk2\kxf\src\kxf_test\test_path.cpp (213): testPath
c:\users\rafael\desktop\milk2\kxf\src\kxf_test\kxf_test.cpp (25): doTest
c:\users\rafael\desktop\milk2\kxf\src\kxf_test\kxf_test.cpp (55): main
f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c (278): __tmainCRTStartup
f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c (189): mainCRTStartup
75A2339A (kernel32): (filename not available): BaseThreadInitThunk
77679EF2 (ntdll): (filename not available): RtlInitializeExceptionChain
77679EC5 (ntdll): (filename not available): RtlInitializeExceptionChain
Demangling GCC C++ Symbols
We’re almost done. We just need to make the POSIX output more readable. Demangling gcc symbols is easy:
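#include <cxxabi.h>
#include <stdlib.h>
...
int status;
size_t unmangledNameSize = 1024;
char* unmangledName = (char*) malloc( unmangledNameSize );
char* ret = abi::__cxa_demangle( mangledName, unmangledName, &unmangledNameSize, &status );
if ( status == 0 )
    unmangledName = ret; // __cxa_demangle() may have realloc()-ed the buffer
...
free( unmangledName );
You can allocate the buffer for the unmangled name yourself, but it must come from malloc(), because __cxa_demangle() will realloc() it if the name does not fit. Alternatively, pass NULL and let it allocate one for you. Either way, free() the buffer when you are done. The entire function is below. It has a bunch of uninteresting string parsing code that parses out the mangled names, passes them to __cxa_demangle(), and prints them out. It does it slightly differently for Linux and OS X.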
#include <stdio.h>
#include <stdlib.h>
#include <execinfo.h>
#include <errno.h>
#include <cxxabi.h>

static inline void printStackTrace( FILE *out = stderr, unsigned int max_frames = 63 )
{
    fprintf( out, "stack trace:\n" );

    // storage array for stack trace address data
    void* addrlist[max_frames+1];

    // retrieve current stack addresses
    int addrlen = backtrace( addrlist, sizeof( addrlist ) / sizeof( void* ));

    if ( addrlen == 0 )
    {
        fprintf( out, "  <empty, possibly corrupt>\n" );
        return;
    }

    // resolve addresses into strings containing "filename(function+address)"
    // on Linux, or "## program address function + offset" on OS X.
    // this array must be free()-ed
    char** symbollist = backtrace_symbols( addrlist, addrlen );

    // __cxa_demangle() may realloc() this buffer, so it must come from malloc()
    size_t funcnamesize = 1024;
    char* funcname = (char*) malloc( funcnamesize );

    // iterate over the returned symbol lines. skip the first four, they
    // belong to this function and the signal handling code.
    for ( int i = 4; i < addrlen; i++ )
    {
        char* begin_name   = NULL;
        char* begin_offset = NULL;
        char* end_offset   = NULL;

        // find parentheses and +address offset surrounding the mangled name
#ifdef DARWIN
        // OSX style stack trace
        for ( char *p = symbollist[i]; *p; ++p )
        {
            if (( *p == '_' ) && ( p > symbollist[i] ) && ( *(p-1) == ' ' ))
                begin_name = p-1;
            else if ( *p == '+' )
                begin_offset = p-1;
        }

        if ( begin_name && begin_offset && ( begin_name < begin_offset ))
        {
            *begin_name++ = '\0';
            *begin_offset++ = '\0';

            // mangled name is now in [begin_name, begin_offset) and caller
            // offset in [begin_offset, end_offset). now apply
            // __cxa_demangle():
            int status;
            char* ret = abi::__cxa_demangle( begin_name, funcname,
                                             &funcnamesize, &status );
            if ( status == 0 )
            {
                funcname = ret; // use possibly realloc()-ed string
                fprintf( out, " %-30s %-40s %s\n",
                         symbollist[i], funcname, begin_offset );
            }
            else
            {
                // demangling failed. Output function name as a C function with
                // no arguments.
                fprintf( out, " %-30s %-38s() %s\n",
                         symbollist[i], begin_name, begin_offset );
            }

#else // !DARWIN - but is posix
        // not OSX style
        // ./module(function+0x15c) [0x8048a6d]
        for ( char *p = symbollist[i]; *p; ++p )
        {
            if ( *p == '(' )
                begin_name = p;
            else if ( *p == '+' )
                begin_offset = p;
            else if ( *p == ')' && ( begin_offset || begin_name ))
                end_offset = p;
        }

        if ( begin_name && end_offset && ( begin_name < end_offset ))
        {
            *begin_name++ = '\0';
            *end_offset++ = '\0';
            if ( begin_offset )
                *begin_offset++ = '\0';

            // mangled name is now in [begin_name, begin_offset) and caller
            // offset in [begin_offset, end_offset). now apply
            // __cxa_demangle():
            int status = 0;
            char* ret = abi::__cxa_demangle( begin_name, funcname,
                                             &funcnamesize, &status );
            char* fname = begin_name;
            if ( status == 0 )
            {
                funcname = ret; // use possibly realloc()-ed string
                fname = ret;
            }

            if ( begin_offset )
            {
                fprintf( out, " %-30s ( %-40s + %-6s) %s\n",
                         symbollist[i], fname, begin_offset, end_offset );
            }
            else
            {
                fprintf( out, " %-30s ( %-40s %-6s) %s\n",
                         symbollist[i], fname, "", end_offset );
            }
#endif // !DARWIN - but is posix
        }
        else
        {
            // couldn't parse the line? print the whole line.
            fprintf( out, " %-40s\n", symbollist[i] );
        }
    }

    free( funcname );
    free( symbollist );
}
The new stack trace output looks something like this:
kxf_test: src/posix/kxf_pathos.cpp:824: virtual bool KxfPath::listFiles(
KxVector&, const KxfPath&, bool) const: Assertion `0' failed.
Caught signal 6 (SIGABRT)
stack trace:
/lib/libc.so.6 ( abort + 0x180 ) [0x7f1b6987ff70]
/lib/libc.so.6 ( __assert_fail + 0xf1 ) [0x7f1b698762b1]
./bin/linux-x86_64/debug/kxf_test ( ) [0x435a5d]
./bin/linux-x86_64/debug/kxf_test ( testPath() + 0x29b4) [0x415748]
./bin/linux-x86_64/debug/kxf_test ( doTest(char const*, void (*)()) + 0xc8 ) [0x424ce9]
./bin/linux-x86_64/debug/kxf_test ( main + 0xe4 ) [0x424e42]
/lib/libc.so.6 ( __libc_start_main + 0xfd ) [0x7f1b69869c4d]
./bin/linux-x86_64/debug/kxf_test ( ) [0x412cd9]
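The demangling step on its own can be isolated into a small helper, which is handy for testing it against names taken from a trace like the one above. This is a sketch rather than the code used in printStackTrace(): the function name demangle is made up, it returns a std::string for convenience, and it lets __cxa_demangle() allocate the buffer by passing NULL.

```cpp
#include <cxxabi.h>
#include <stdlib.h>
#include <string>

// Hypothetical helper: returns the demangled form of mangledName, or the
// input unchanged when it is not a mangled C++ symbol.
std::string demangle( const char* mangledName )
{
    int status = 0;

    // Passing NULL for the output buffer asks __cxa_demangle() to malloc()
    // a buffer of the right size itself. The caller must free() it.
    char* demangled = abi::__cxa_demangle( mangledName, NULL, NULL, &status );

    if ( status != 0 || demangled == NULL )
        return mangledName; // demangling failed, keep the raw name

    std::string result( demangled );
    free( demangled );
    return result;
}
```

For example, demangle( "_Z8testPathv" ) gives "testPath()" and demangle( "_Z6doTestPKcPFvvE" ) gives "doTest(char const*, void (*)())", matching the two C++ frames in the trace. Note that std::string allocates, so this form suits offline processing of a saved trace better than use inside a crash handler.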
Global Objects with Side Effects
There is one final thing to be aware of. This is implemented as a trivial class that simply has a single constructor that registers the signal handlers. The singleton object never has to be referred to anywhere else in your program. If you initialize the object inside main(), or in any of your regular code, everything will work fine. But if you do, crash dumps will not be available until after the start of main().
It would be better to instantiate this object in global scope so that the constructor runs before main(). The problem is that if you do this in a library, and the global object is never referred to anywhere else, the linker (at least with gcc) will not link this object in and the constructor will never fire. The answer is to allocate it as a member of some other global object that you know for certain will be referenced in any program. In my case, I just allocate the object as part of my memory manager, which will always be used:
class KxMemGlobals { ... KxStackTrace mStackTrace; ... };
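The pattern can be sketched in a few lines. The class and variable names below (StackTraceInstaller, MemGlobals, g_memGlobals) are stand-ins invented for this example, and the constructor only sets a flag where the real one would register the signal handlers.

```cpp
// Set by the installer's constructor. A real installer would register the
// signal handlers here instead of flipping a flag.
static bool g_handlersInstalled = false;

// Stand-in for KxStackTrace: all the work happens in the constructor.
class StackTraceInstaller
{
public:
    StackTraceInstaller() { g_handlersInstalled = true; }
};

// Stand-in for a global that every program is known to reference, such as
// a memory manager. Embedding the installer as a member ties its
// construction to that object.
class MemGlobals
{
public:
    StackTraceInstaller mStackTrace;
};

// Constructed during static initialization, before main() starts.
static MemGlobals g_memGlobals;

// Lets other code check that the constructor has already run.
bool handlersInstalled()
{
    return g_handlersInstalled;
}
```

By the time main() starts, handlersInstalled() already returns true, because g_memGlobals and its mStackTrace member were constructed during static initialization. Other global objects in different translation units may still be constructed earlier, so crashes in their constructors can go unreported.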
Resources:
C/C++ signal handling (lists all the standard signals)
CodeProject: StackWalker (a Windows stack-walking class)
It’s worth noting that backtrace_symbols won’t reliably work inside a signal handler. backtrace_symbols calls malloc, so if your program happened to crash while inside malloc then your signal handler won’t be able to acquire the heap lock and will deadlock.
backtrace_symbols_fd() does not call malloc
Great explanation, nice code — thank you!
However, I would like to use the code in my Apache-Licensed project, is your code free, what license would you like to be applied? Thank you again!
Stephan
This example code is public domain. Use it however you want.
https://stackwalker.codeplex.com/license
Also, the StackWalker project is licensed under the New BSD License
Excellent tutorial.
Unfortunately, I have a problem: calling StackWalker::ShowCallstack() only shows me the callstack of the signal handler, that is:
#0: StackWalker::ShowCallStack
#1: printStackTrace
#2: aborthandler
I’m running Windows 7 64 and a 32-bit application compiled with MSVC 2013, which might be the source of the problem, according to this thread I found: http://stackwalker.codeplex.com/discussions/642642
Any ideas that might help me?
Thanks in advance.
Looks like at some point this code got corrupted and some entries were replaced with HTML entities, for example: `if ( begin_name && end_offset && ( begin_name < end_offset ))`