Printing Stack Traces with File and Line

In an earlier article I described how to generate stack traces when your program crashes. That article showed file and line information for Win32 but not for Linux. In this article we describe how to add file and line information to a Linux/GCC stack trace.

Building Binaries with File and Line Information

This will only work if you build your program with file and line information. In GCC you have to add the -g flag. This will increase the size of your binaries quite a bit.

You should also consider the optimization level. Without optimization (-O0), the file and line information will be accurate. With optimization, for example the -O3 flag, many functions are inlined and code may be reordered or eliminated. The compiler does its best to keep the file and line information correct, but this is not always possible.

Extracting the Symbols

As described in the previous article, getting a stack trace is a matter of collecting the series of return addresses, one for each function call on the stack. You get these easily by calling backtrace(), which is declared in execinfo.h.
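As a baseline, here is a minimal sketch of capturing the raw addresses with glibc's backtrace() and printing them with backtrace_symbols(). This gives the binary and exported symbol name for each frame, but no file or line, which is the gap the rest of this article fills. The helper names captureStack() and printRawStack() are hypothetical, chosen for this example.

```cpp
#include <execinfo.h>   // backtrace(), backtrace_symbols()
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Capture up to maxFrames return addresses from the current call stack.
// Returns the number of addresses actually stored in frames.
int captureStack( void** frames, int maxFrames )
{
   return backtrace( frames, maxFrames );
}

// Print each raw address plus what backtrace_symbols() can tell us
// (binary name, exported symbol, offset), but no file or line.
void printRawStack()
{
   void* frames[64];
   int n = captureStack( frames, 64 );

   // backtrace_symbols() returns one malloc'ed block: free the array only,
   // not the individual strings.
   char** symbols = backtrace_symbols( frames, n );
   for ( int i = 0; i < n; i++ )
      printf( "#%d %p %s\n", i, frames[i], symbols ? symbols[i] : "??" );
   free( symbols );
}
```

The addresses in frames are exactly what the code below feeds into backtraceSymbols().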

Getting the symbols is trickier. You have to load the symbol tables and then search them for each address. The functions you need come from the BFD library that ships with GNU binutils: link against libbfd.a, and include the bfd.h header.

Roughly speaking, for each address in the stack we find the binary it belongs to, load that binary's symbol table, search it for the file and line closest to the address, and return the result. This is similar to what the addr2line utility does.

You’re going to need the following headers:

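// include all the headers you're going to need.
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>   // uint32_t, int32_t
#include <assert.h>   // assert()
#include <alloca.h>   // alloca()
#include <execinfo.h>
#include <dlfcn.h>
#include <link.h>
#define PACKAGE "stacktrace"  // newer bfd.h versions #error unless PACKAGE is defined
#include <bfd.h>
typedef uint32_t u32;         // short integer type names used by the code below
typedef int32_t  s32;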

Now let's look at the top-level function:

char** backtraceSymbols( void* const* addrList, int numAddr )
{
   char*** locations = (char***) alloca( sizeof( char** ) * numAddr );

   // initialize the bfd library
   bfd_init(); 

   int total = 0;
   for ( s32 i = 0; i < numAddr; i++ )
   {
      // find which executable or library the address belongs to
      FileMatch match( addrList[i] );
      dl_iterate_phdr( findMatchingFile, &match );

      // adjust the address in the global space of your program to an
      // offset in the relevant binary
      bfd_vma addr  = (bfd_vma)( addrList[i] );
              addr -= (bfd_vma)( match.mBase );

      // look up the symbol; the main program has an empty name
      if ( match.mFile && strlen( match.mFile ))
         locations[i] = processFile( match.mFile,      &addr, 1 );
      else
         locations[i] = processFile( "/proc/self/exe", &addr, 1 );

      // processFile() returns NULL on failure, so reserve room for "??"
      total += strlen( locations[i] ? locations[i][0] : "??" ) + 1;
   }

   // pack all the strings into one block: the pointer array first, then
   // the text, so that a single free() releases everything
   char** final = (char**)malloc( total + ( numAddr * sizeof( char* )));
   char* f_strings = (char*)( final + numAddr );

   for ( s32 i = 0; i < numAddr; i++ )
   {
      strcpy( f_strings, locations[i] ? locations[i][0] : "??" );
      free( locations[i] );   // free(NULL) is harmless
      final[i] = f_strings;
      f_strings += strlen( f_strings ) + 1;
   }

   return final;
}

This function takes an array of addresses from the stack trace and gets the file and line for each one. The important part of this code is the first loop over the addresses. The second loop packs all the strings into a single allocation, with the array of pointers at the front followed by the strings themselves, so that freeing the base array frees all the strings as well. Interesting, but not critical to this project.
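The single-allocation trick from the second loop can be isolated into a small helper. This is a sketch for illustration; packStrings() is a hypothetical name, not part of the article's code.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Copy n C strings into ONE malloc'ed block laid out as
//   [ ptr0 | ptr1 | ... | ptrN-1 | "str0\0" "str1\0" ... ]
// so the caller can release everything with a single free().
char** packStrings( const char* const* strs, int n )
{
   size_t textSize = 0;
   for ( int i = 0; i < n; i++ )
      textSize += strlen( strs[i] ) + 1;         // +1 for the NUL

   char** out = (char**)malloc( n * sizeof( char* ) + textSize );
   if ( !out )
      return NULL;

   char* text = (char*)( out + n );              // text starts after the pointer table
   for ( int i = 0; i < n; i++ )
   {
      size_t len = strlen( strs[i] ) + 1;
      memcpy( text, strs[i], len );
      out[i] = text;
      text  += len;
   }
   return out;
}
```

Because the strings live inside the same block as the pointers, there is nothing to free individually, which is what lets callers of backtraceSymbols() clean up with one free().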

dl_iterate_phdr() iterates over every object loaded into the current program (the executable itself and each shared library), calling a callback for each one. We pass it a pointer to our callback, findMatchingFile(), and a pointer to a FileMatch structure, match, that the callback writes its result to.

We initialize each FileMatch with the address we are searching for and call the iterator. When it returns, match.mFile holds the name of the binary that contains the address, if one was found. This is the name of the executable or library, not of the source file; that comes later.

Once the binary is found, we call processFile() to extract the exact source file name and line number.
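To see what dl_iterate_phdr() actually visits, the following sketch prints the name and load base of every object in the process. printObject() and listLoadedObjects() are hypothetical helpers for illustration. On glibc the main program is visited first and reported with an empty name, which is why backtraceSymbols() falls back to /proc/self/exe.

```cpp
#include <link.h>      // dl_iterate_phdr(), struct dl_phdr_info
#include <cassert>
#include <cstdio>

// Callback invoked once per loaded object. Returning 0 continues the
// iteration; a nonzero value stops it and becomes dl_iterate_phdr()'s result.
static int printObject( struct dl_phdr_info* info, size_t /*size*/, void* data )
{
   int* count = (int*)data;
   ++*count;

   // the main program is reported with an empty name
   const char* name = ( info->dlpi_name && info->dlpi_name[0] )
                        ? info->dlpi_name : "(main program)";
   printf( "%-40s base=0x%lx segments=%d\n",
           name, (unsigned long)info->dlpi_addr, (int)info->dlpi_phnum );
   return 0;
}

// Print every object currently loaded into the process; returns how many.
int listLoadedObjects()
{
   int count = 0;
   dl_iterate_phdr( printObject, &count );
   return count;
}
```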

Finding the Correct Symbol Table

class FileMatch 
{
public:
   FileMatch( void* addr ) : mAddress( addr ), mFile( NULL ), mBase( NULL ) {}

   void*       mAddress;
   const char* mFile;
   void*       mBase;
};

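// Callback for dl_iterate_phdr(): called once per loaded object. Returns 1
// (which stops the iteration) when the object contains match->mAddress.
static int findMatchingFile( struct dl_phdr_info* info, size_t size, void* data )
{
   FileMatch* match = (FileMatch*)data;

   for ( u32 i = 0; i < info->dlpi_phnum; i++ )
   {
      const ElfW(Phdr)& phdr = info->dlpi_phdr[i];

      // only loadable segments are mapped into memory
      if ( phdr.p_type == PT_LOAD ) 
      {
         // run-time address of the segment = link-time address + load base
         ElfW(Addr) vaddr = phdr.p_vaddr + info->dlpi_addr;
         ElfW(Addr) maddr = ElfW(Addr)(match->mAddress);
         if (( maddr >= vaddr ) && 
             ( maddr < vaddr + phdr.p_memsz )) 
         {
            match->mFile =        info->dlpi_name;
            match->mBase = (void*)info->dlpi_addr;
            return 1;
         }
      }
   }
   return 0;   // not in this object, keep iterating
}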

dl_iterate_phdr() calls findMatchingFile() once for each loaded binary (the executable and each shared library). If the address falls inside one of that binary's loadable (PT_LOAD) segments, we have found our match: we store the binary's file name and the base address at which it is mapped into the process.

It's possible for an address not to fall inside any loaded binary, for example if it is bogus because the stack has been corrupted. And even when the right binary is found, it may have been built without debug information, in which case the file and line lookup later on will fail. So you have to be prepared for either step to find nothing.
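A self-contained variant of the same lookup, which also computes the offset within the binary and reports explicitly when nothing is found, might look like the sketch below. AddrLocation, matchSegment() and locateAddress() are hypothetical names; the /proc/self/exe fallback mirrors what backtraceSymbols() does for the main program.

```cpp
#include <link.h>      // dl_iterate_phdr(), ElfW(), PT_LOAD
#include <cassert>
#include <cstdint>
#include <cstddef>

// Hypothetical result type: what we learn about one address.
struct AddrLocation
{
   uintptr_t   addr;     // address to look up (input)
   const char* file;     // binary containing it, or NULL if none
   uintptr_t   offset;   // addr relative to that binary's load base
};

static int matchSegment( struct dl_phdr_info* info, size_t, void* data )
{
   AddrLocation* loc = (AddrLocation*)data;
   for ( int i = 0; i < info->dlpi_phnum; i++ )
   {
      const ElfW(Phdr)& ph = info->dlpi_phdr[i];
      if ( ph.p_type != PT_LOAD )
         continue;
      uintptr_t start = info->dlpi_addr + ph.p_vaddr;
      if ( loc->addr >= start && loc->addr < start + ph.p_memsz )
      {
         // an empty name means the main program
         loc->file   = ( info->dlpi_name && info->dlpi_name[0] )
                         ? info->dlpi_name : "/proc/self/exe";
         loc->offset = loc->addr - info->dlpi_addr;
         return 1;   // stop iterating
      }
   }
   return 0;         // keep looking
}

// Returns false when the address lies in no loaded object (e.g. a bogus
// value from a corrupted stack); callers must handle that case.
bool locateAddress( const void* addr, AddrLocation* loc )
{
   loc->addr   = (uintptr_t)addr;
   loc->file   = NULL;
   loc->offset = 0;
   return dl_iterate_phdr( matchSegment, loc ) != 0;
}
```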

Finding the Right Binary and Loading its Symbols

At this point the address has been translated from the address space of the main program into the address space of the correct binary (e.g. into an offset within a library). Now we have to load that binary's symbol table and look up the file and line.

Here is our main function to do this:

static char** processFile( const char* fileName, bfd_vma* addr, int naddr )
{
   bfd* abfd = bfd_openr( fileName, NULL );
   if ( !abfd )
   {
      printf( "Error opening bfd file \"%s\"\n", fileName );
      return NULL;
   }

   if ( bfd_check_format( abfd, bfd_archive ) )
   {
      printf( "Cannot get addresses from archive \"%s\"\n", fileName );
      bfd_close( abfd );
      return NULL;
   }

   // make sure it really is an object file (executable or shared library)
   if ( !bfd_check_format( abfd, bfd_object )) 
   {
      printf( "File \"%s\" is not a recognized object file\n", fileName );
      bfd_close( abfd );
      return NULL;
   }

   asymbol** syms = kstSlurpSymtab( abfd, fileName );
   if ( !syms )
   {
      printf( "Failed to read symbol table for \"%s\"\n", fileName );
      bfd_close( abfd );
      return NULL;
   }

   char** retBuf = translateAddressesBuf( abfd, addr, naddr, syms );

   free( syms );

   bfd_close( abfd );
   return retBuf;
}

This is a fairly straightforward function: open the binary, do a few sanity checks, load the symbol table, and then do the actual file and line extraction in translateAddressesBuf().

Now let's look at how we load the symbol table. Given an open bfd, we return an array of symbols, which the caller must free() when it is done with them.

static asymbol** kstSlurpSymtab( bfd* abfd, const char* fileName )
{
   if ( !( bfd_get_file_flags( abfd ) & HAS_SYMS ))
   {
      printf( "Error bfd file \"%s\" flagged as having no symbols.\n", fileName );
      return NULL;
   }

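   asymbol** syms = NULL;   // bfd may leave this unset when it reads no symbols
   unsigned int size;

   // try the regular symbol table first, then fall back to the dynamic one
   long symcount = bfd_read_minisymbols( abfd, false, (void**)&syms, &size );
   if ( symcount == 0 )
        symcount = bfd_read_minisymbols( abfd, true,  (void**)&syms, &size );

   if ( symcount <= 0 ) 
   {
      printf( "Error bfd file \"%s\", found no symbols.\n", fileName );
      free( syms );   // free(NULL) is harmless
      return NULL;
   }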

   return syms;
}

Extracting File and Line

This function looks up a series of addresses in the given symbol table and returns the file and line for each one it can find:

static char** translateAddressesBuf( bfd* abfd, bfd_vma* addr, int numAddr, asymbol** syms )
{
   char** ret_buf = NULL;
   s32    total   = 0;

   char   b;
   char*  buf     = &b;   // dummy target for the sizing pass
   s32    len     = 0;    // a size of 0 makes snprintf() only measure

   // pass 0 measures how much space the strings need,
   // pass 1 writes them into a single allocation
   for ( u32 state = 0; state < 2; state++ ) 
   {
      if ( state == 1 ) 
      {
         ret_buf = (char**)malloc( total + ( sizeof(char*) * numAddr ));
         if ( !ret_buf )
            return NULL;
         buf = (char*)(ret_buf + numAddr);
         len = total;
      }

      for ( s32 i = 0; i < numAddr; i++ )
      {
         FileLineDesc desc( syms, addr[i] );

         if ( state == 1 )
            ret_buf[i] = buf;

         bfd_map_over_sections( abfd, findAddressInSection, (void*)&desc );

         s32 n;   // length of this entry including its NUL
         if ( !desc.mFound ) 
         {
            n = snprintf( buf, len, "[0x%llx] \?\? \?\?:0", (unsigned long long) addr[i] ) + 1;
         }
         else
         {
            const char* name = desc.mFunctionname;
            if ( name == NULL || *name == '\0' )
               name = "??";
            const char* file = desc.mFilename;
            if ( file != NULL ) 
            {
               const char* h = strrchr( file, '/' );
               if ( h != NULL )
                  file = h + 1;   // strip the directory part
            }
            n = snprintf( buf, len, "%s:%u %s", file ? file : "??", desc.mLine, name ) + 1;
         }

         if ( state == 0 )
            total += n;   // measuring: accumulate the size
         else
         {
            buf += n;     // writing: move past this string and its NUL
            len -= n;
         }
      }
   }

   return ret_buf;
}

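The outer loop runs twice. On the first pass, snprintf() is given a zero-length buffer, so it writes nothing and only returns how many characters each string needs; we add these up to find out how big a buffer we need. On the second pass we allocate that buffer and actually write the strings into it, one after another.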
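The measure-then-write pattern relies on a guarantee of C99 and C++11: when snprintf() is called with a size of 0 it writes nothing (the buffer may even be NULL) and returns the length the output would have had, not counting the NUL. Here is a minimal standalone sketch of the same two-pass idea; formatLocations() is a hypothetical helper, not part of the article's code.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Format n "file:line" strings into one malloc'ed block using two passes:
// pass 0 only measures, pass 1 allocates and writes.
char** formatLocations( const char* const* files, const unsigned* lines, int n )
{
   char** out   = NULL;
   char*  buf   = NULL;   // snprintf(NULL, 0, ...) is allowed
   size_t left  = 0;
   size_t total = 0;

   for ( int pass = 0; pass < 2; pass++ )
   {
      if ( pass == 1 )
      {
         out = (char**)malloc( n * sizeof( char* ) + total );
         if ( !out )
            return NULL;
         buf  = (char*)( out + n );
         left = total;
      }
      for ( int i = 0; i < n; i++ )
      {
         if ( pass == 1 )
            out[i] = buf;
         // +1 for the NUL, which snprintf() does not count
         size_t len = (size_t)snprintf( buf, left, "%s:%u", files[i], lines[i] ) + 1;
         if ( pass == 0 )
            total += len;   // measuring
         else
         {
            buf  += len;    // writing: step past this string
            left -= len;
         }
      }
   }
   return out;
}
```

Advancing buf and shrinking the remaining length after every string in the writing pass is what keeps each entry from overwriting the previous one.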

For each address, we iterate over all the sections of the binary with bfd_map_over_sections(). This works in a similar way to the other iterator: we provide an iteration function and a data structure for it to work on.

In this case most of the work is done by the FileLineDesc class. findAddressInSection() is just a small pass-through function that calls the FileLineDesc method of the same name.

And here is the code for that final part:

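class FileLineDesc
{
public:
   FileLineDesc( asymbol** syms, bfd_vma pc )
      : mPc( pc ), mFilename( NULL ), mFunctionname( NULL ), mLine( 0 ),
        mFound( false ), mSyms( syms ) {}

   void findAddressInSection( bfd* abfd, asection* section );

   bfd_vma      mPc;            // offset we are looking for
   char*        mFilename;      // results, valid when mFound is true
   char*        mFunctionname;
   unsigned int mLine;
   int          mFound;
   asymbol**    mSyms;
};

void FileLineDesc::findAddressInSection( bfd* abfd, asection* section )
{
   // stop searching once a match has been found
   if ( mFound )
      return;

   // skip sections that are not loaded into memory at run time
   // (binutils 2.34 and later replaced these accessors with the one-argument
   //  bfd_section_flags(), bfd_section_vma() and bfd_section_size())
   if (( bfd_get_section_flags( abfd, section ) & SEC_ALLOC ) == 0 )
      return;

   // skip sections whose address range does not contain mPc
   bfd_vma vma = bfd_get_section_vma( abfd, section );
   if ( mPc < vma )
      return;

   bfd_size_type size = bfd_section_size( abfd, section );
   if ( mPc >= ( vma + size ))
      return;

   // the offset passed in is relative to the start of the section
   mFound = bfd_find_nearest_line( abfd, section, mSyms, ( mPc - vma ),
                                   (const char**)&mFilename, (const char**)&mFunctionname, &mLine );
}

static void findAddressInSection( bfd* abfd, asection* section, void* data )
{
   FileLineDesc* desc = (FileLineDesc*)data;
   assert( desc );
   desc->findAddressInSection( abfd, section );
}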
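The pass-through function is the usual way to drive a C++ object from a C-style iterator that only hands back a void* pointer. The sketch below shows the same pattern without BFD: forEachValue() stands in for bfd_map_over_sections(), and FirstAbove plays the role of FileLineDesc, including the "stop once found" check. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// A C-style iterator: calls fn for each element, passing data through untouched.
typedef void (*VisitFn)( int value, void* data );

void forEachValue( const int* values, size_t n, VisitFn fn, void* data )
{
   for ( size_t i = 0; i < n; i++ )
      fn( values[i], data );
}

// Search state, analogous to FileLineDesc: remembers the first value above a limit.
class FirstAbove
{
public:
   explicit FirstAbove( int limit ) : mLimit( limit ), mFound( false ), mValue( 0 ) {}

   void visit( int value )
   {
      if ( mFound )   // later calls do nothing once we have a match
         return;
      if ( value > mLimit )
      {
         mFound = true;
         mValue = value;
      }
   }

   int  mLimit;
   bool mFound;
   int  mValue;
};

// The pass-through: recover the object from the void* and call its method.
static void visitTrampoline( int value, void* data )
{
   static_cast<FirstAbove*>( data )->visit( value );
}
```

Keeping the state in an object rather than in globals means several lookups can run independently, each with its own FirstAbove (or FileLineDesc) instance.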

Results

The returned file and line strings match one-to-one with the addresses returned by backtrace(), so you can pair them up and print a nice call stack with file and line. Something like this: