Packing Data Files into Compiled Executables
Have you ever wanted to distribute a compiled binary that included data files packed into the executable file?
Embedding a Data File Before Compilation
You can do this before compilation by converting the file into a C array of byte values, and then compiling that array into a statically allocated buffer.
For example, you could take a text file that looks something like this:
[STD_STYLE]
<style type="text/css">
body { text-align:center; padding:0; margin:1px 0 0; font: normal 12px Arial, Helvetica, Sans-serif; color:#333; }
.... etc ...
</div>
</body>
</html>
And use a tool to convert it to C code:
unsigned int std_tpl_len = 7807;
static unsigned char std_tpl_buf[] = {
    0x5b, 0x53, 0x54, 0x44, 0x5f, 0x53, 0x54, 0x59, 0x4c, 0x45, 0x5d, 0xa, 0x3c, 0x73, 0x74, 0x79,
    ... etc ...
    0x62, 0x6f, 0x64, 0x79, 0x3e, 0xa, 0x3c, 0x2f, 0x68, 0x74, 0x6d, 0x6c, 0x3e, 0xa, 0xa, 0x0,
};
const char* std_tpl = (const char*)&std_tpl_buf[0];
Then in your program, instead of calling fopen() to read the file, you just use the statically allocated buffer, in this case std_tpl, just as you would a buffer you had read in from a file.
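As a rough illustration of using the buffer in place of a file read, the sketch below uses a shortened, hypothetical stand-in for the generated array (only the first twelve bytes of the template plus the terminator). The helper name loadTemplate() is made up for this example and is not part of the original code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the generated array. A real one comes from the
// conversion tool and holds the whole file.
static unsigned char std_tpl_buf[] = {
    0x5b, 0x53, 0x54, 0x44, 0x5f, 0x53, 0x54, 0x59,
    0x4c, 0x45, 0x5d, 0x0a, 0x00,  // "[STD_STYLE]\n" plus a terminating NUL
};
unsigned int std_tpl_len = sizeof(std_tpl_buf) - 1;  // length without the NUL
const char* std_tpl = (const char*)&std_tpl_buf[0];

// Returns the embedded template as a string. No file is opened at run time.
std::string loadTemplate() {
    // Sanity check: the generated buffer is expected to be NUL-terminated.
    assert(std_tpl_buf[std_tpl_len] == 0);
    return std::string(std_tpl, std_tpl_len);
}
```

Code that previously parsed the contents of the file can be handed the result of loadTemplate(), or std_tpl directly, without any other change.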
The function to convert a file into C code looks something like this:
void convertToC( char* fileName )
{
    // readFromFile() is assumed here to return a NUL-terminated buffer and
    // to store its size, including the terminator, in srcLen.
    u32 srcLen;
    char* srcBuf = readFromFile( fileName, srcLen );

    // generate a C identifier from the file name
    char tmp[1024];
    char* ptmp = &tmp[0];
    char* symp = fileName;
    while ( *symp && ptmp < &tmp[sizeof( tmp ) - 1] )
    {
        *ptmp = ( isalnum( (unsigned char)*symp ) ? *symp : '_' );
        ptmp++;
        symp++;
    }
    *ptmp = '\0';
    ptmp = &tmp[0];

    // We're just going to dump it to the screen.
    // But you will want to write this to a file.
    // KXS_PUSH_TAB is an indentation manipulator from the author's own library.
    cout << "unsigned int " << ptmp << "_len = " << srcLen-1 << ";\n"
         << "static unsigned char " << ptmp << "_buf[] = {\n"
         << KXS_PUSH_TAB;

    // iterate over file buf and generate hex output:
    cout << hex;
    for ( u32 i = 0; i < srcLen; i++ )
    {
        if ( i && !( i % 16 ))
            cout << "\n";
        // cast through unsigned char so bytes >= 0x80 are not sign-extended
        cout << "0x" << (u32)(unsigned char)srcBuf[i] << ", ";
    }
    cout << dec << "\n};\nconst char* " << ptmp << " = (const char*)&"
         << ptmp << "_buf[0];\n";
}
Embedding Data After Compilation
But what if you want to embed the data files into your binary after compilation? This is useful, for example, if you want to embed license files into each binary, or if you have different data for each user. This way you don’t have to send the data and the binary as separate files for each installation; each user can be sent a single customized binary.
An easy trick is to just append the data files to the end of the binary. Windows, Linux and OS X binaries will all allow you to do this without affecting the functioning of the binary itself.
The trick is to create a magic value that you write at the end of the executable, plus an offset into the binary where the packed data payload starts. If the value exists, then your program knows that the data from that offset to the end of the binary is packed data.
Packing data into an executable looks like this:
// the magic as a string is: KJPK
#define PACK_MAGIC 0x4B4A504B

// an object that holds the magic value.
class PackEnd
{
public:
    PackEnd( u32 offset ) { mMagic = PACK_MAGIC; mOffset = offset; }
    PackEnd() { ; }
    u32 mMagic;
    u32 mOffset;
};

void packDataFiles( const char* fileName, const char* execName )
{
    // read in the binary executable - error handling removed.
    u32 execLen;
    const char* execBuf = readFile( execName, execLen );

    // examine the last 8 bytes of the binary.
    PackEnd* end = (PackEnd*)( execBuf + ( execLen - sizeof( PackEnd )));

    // If this is packed data, remove it. We just have to set the end of the
    // binary back to the true end before the payload.
    if ( end->mMagic == PACK_MAGIC )
        execLen = end->mOffset;

    // KxSerialObj is a buffer container that we will be writing data into.
    KxSerialObj serObj;

    // write the executable out.
    serObj.write( execBuf, execLen );

    // read in the payload file.
    u32 fileLen;
    const char* fileBuf = readFile( fileName, fileLen );

    // append it to the binary
    serObj.write( fileBuf, fileLen );

    // write out the pack structure: the magic, then the payload offset
    serObj << (u32)PACK_MAGIC << execLen;

    // We're using the current name of the executable to write to,
    // but you should probably write to a modified version of the exe
    // name. e.g. myapp_new.exe
    writeFile( execName, serObj.getBuffer(), serObj.getLength() );
}
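The listing above depends on helpers that are not shown (readFile, writeFile, KxSerialObj). As a minimal sketch of the same steps using only the standard library, something like the following could work. The names slurp, packFile, PackFooter and kPackMagic are invented for this example, and the footer is written in the host's byte order, so the packer and the reader are assumed to run on machines with the same endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

static const uint32_t kPackMagic = 0x4B4A504B;  // hypothetical value, "KJPK"

struct PackFooter {
    uint32_t magic;   // marks the file as carrying a payload
    uint32_t offset;  // where the payload starts (size of the bare executable)
};

// Reads a whole file into memory; returns false if it cannot be opened.
static bool slurp(const std::string& path, std::vector<char>& out) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    return true;
}

// Writes executable + payload + footer to outPath. A payload that is already
// attached to the executable is dropped first.
bool packFile(const std::string& exePath, const std::string& payloadPath,
              const std::string& outPath) {
    std::vector<char> exe, payload;
    if (!slurp(exePath, exe) || !slurp(payloadPath, payload)) return false;

    std::size_t exeLen = exe.size();
    if (exeLen >= sizeof(PackFooter)) {
        PackFooter old;
        std::memcpy(&old, exe.data() + exeLen - sizeof(old), sizeof(old));
        if (old.magic == kPackMagic && old.offset <= exeLen - sizeof(old))
            exeLen = old.offset;  // strip the previous payload and footer
    }

    PackFooter footer = { kPackMagic, static_cast<uint32_t>(exeLen) };
    std::ofstream out(outPath.c_str(), std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(exe.data(), static_cast<std::streamsize>(exeLen));
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    out.write(reinterpret_cast<const char*>(&footer), sizeof(footer));
    return out.good();
}
```

Writing to a separate outPath rather than over the input keeps the original binary intact if something goes wrong halfway through.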
Reading it while you are running is just a matter of first getting the file name of the running executable, reading it into memory, and extracting a pointer into the data payload.
To find the executable file name on Windows:
const char* getExecutableName()
{
    static char buf[MAX_PATH] = { '\0' };
    if ( buf[0] == '\0')
        GetModuleFileName( NULL, buf, MAX_PATH );
    return buf;
}
Or you could do it like this:
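#include <windows.h>
#include <tlhelp32.h>
#include <string.h>

const char* getExecutableName()
{
    static char buf[MAX_PATH] = { '\0' };
    if ( buf[0] == '\0')
    {
        HANDLE hSnapshot = ::CreateToolhelp32Snapshot( TH32CS_SNAPMODULE, GetCurrentProcessId() );
        if ( hSnapshot != INVALID_HANDLE_VALUE )
        {
            MODULEENTRY32 me32 = {0};
            me32.dwSize = sizeof(MODULEENTRY32);
            // the first module in the snapshot is the executable itself
            if ( Module32First( hSnapshot, &me32 ))
                strcpy( buf, me32.szExePath );
            CloseHandle(hSnapshot);
        }
    }
    return buf;
}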
On POSIX systems such as Linux and OS X, you get the executable name like this:
#include <unistd.h>
#include <limits.h>
#ifdef DARWIN
#include <sys/param.h>
#include <mach-o/dyld.h>
#endif // DARWIN

const char* getExecutableName()
{
    static char buf[PATH_MAX] = { '\0' };
    if ( !buf[0] )
    {
#ifdef DARWIN
        // size holds the buffer size on input; the call fails if it is too small.
        u32 size = sizeof( buf );
        if ( _NSGetExecutablePath( buf, &size ) != 0 )
            buf[0] = '\0';
#else // !DARWIN
        // on linux, you can get a symlink directly to the binary
        // through the /proc directory. readlink() does not
        // NUL-terminate, so we do it ourselves on success.
        ssize_t len = readlink( "/proc/self/exe", buf, sizeof( buf ) - 1 );
        if ( len != -1 )
            buf[len] = '\0';
#endif // DARWIN
    }
    return buf;
}
Getting a pointer to the packed data then looks something like this:
const char* getPackDataFile( KxSymbol fileName )
{
    // fileName is unused in this single-payload example.
    const char* execFileName = getExecutableName();
    u32 execLen;
    const char* execFile = readFile( execFileName, execLen );
    PackEnd* end = (PackEnd*)( execFile + ( execLen - sizeof( PackEnd )));

    // The payload runs from mOffset up to the start of the PackEnd structure.
    if ( end->mMagic == PACK_MAGIC )
        return ( execFile + end->mOffset );

    return NULL; // no packed data found
}
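The function above reads the whole executable into memory. A variation, sketched here with only the standard library, seeks to the end of the file and reads just the footer and the payload. This is a different approach from the author's, and the names readPayload, PackFooter and kPackMagic are invented for the example. The magic value is an assumption and must match whatever the packing step wrote.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

static const uint32_t kPackMagic = 0x4B4A504B;  // hypothetical value, "KJPK"

struct PackFooter {
    uint32_t magic;   // marks the file as carrying a payload
    uint32_t offset;  // where the payload starts
};

// Reads the payload appended to the file at `path` into `payload`.
// Returns false if the file cannot be opened or has no valid footer.
bool readPayload(const std::string& path, std::string& payload) {
    // std::ios::ate opens the stream positioned at the end of the file.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in) return false;

    const std::streamoff fileLen = in.tellg();
    const std::streamoff footerLen = static_cast<std::streamoff>(sizeof(PackFooter));
    if (fileLen < footerLen) return false;

    // The footer occupies the last sizeof(PackFooter) bytes.
    PackFooter footer;
    in.seekg(fileLen - footerLen, std::ios::beg);
    in.read(reinterpret_cast<char*>(&footer), sizeof(footer));
    if (!in || footer.magic != kPackMagic) return false;

    // The payload runs from footer.offset up to the start of the footer.
    const std::streamoff payloadEnd = fileLen - footerLen;
    const std::streamoff payloadStart = static_cast<std::streamoff>(footer.offset);
    if (payloadStart > payloadEnd) return false;

    payload.resize(static_cast<std::size_t>(payloadEnd - payloadStart));
    in.seekg(payloadStart, std::ios::beg);
    if (!payload.empty())
        in.read(&payload[0], static_cast<std::streamsize>(payload.size()));
    return static_cast<bool>(in);
}
```

In a real program the path argument would be the result of getExecutableName(). Checking the offset against the file size guards against a truncated or corrupted footer.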
This is a very simple example. For a real application you might want to consider multiple files with file names, etc. An easy way to do this would be to zip all the files you want to pack into a zip file, and then use an unzip library to read files out of the zip file.
This is a really interesting post. I am curious on the limitations of using a static buffer vs linking an .obj using ld.exe or similar and retrieving via extern char.
Interesting idea… but I’m not sure how to do that directly. The data that you link in would have to be packed into an object file – so you’d have to pack the data into an ELF file. I don’t know an easier way to do that than to transform it into text and compile it.