Identifying Image Format from the First Few “Magic” Bytes in C++

All popular image file formats (JPEG, PNG, GIF, etc.) can be identified from the first few bytes of the file. This is useful because file name extensions cannot always be trusted, and because images are often transferred in other ways (over HTTP, or embedded in other documents) where the image data may not have a file name at all.

A function to identify the most common formats is easy to write. First, let's define an enumeration for all the file types we will support:

enum ImageFileType
{
   IMAGE_FILE_JPG,      // joint photographic experts group - .jpeg or .jpg
   IMAGE_FILE_PNG,      // portable network graphics
   IMAGE_FILE_GIF,      // graphics interchange format
   IMAGE_FILE_TIFF,     // tagged image file format
   IMAGE_FILE_BMP,      // Microsoft bitmap format
   IMAGE_FILE_WEBP,     // Google WebP format, a type of .riff file
   IMAGE_FILE_ICO,      // Microsoft icon format
   IMAGE_FILE_INVALID   // unidentified image types
};

And now the image type detection function:

// u8 and u32 are assumed to be typedefs for 8-bit and 32-bit unsigned
// integers. memcmp is used rather than strncmp, because strncmp stops at
// the first zero byte and several signatures below contain zeros.
#include <cstring>

ImageFileType getImageTypeByMagic( const u8* data, u32 len )
{
   if ( len < 16 ) return IMAGE_FILE_INVALID;

   // .jpg:  FF D8 FF
   // .png:  89 50 4E 47 0D 0A 1A 0A
   // .gif:  GIF87a
   //        GIF89a
   // .tiff: 49 49 2A 00
   //        4D 4D 00 2A
   // .bmp:  BM
   // .webp: RIFF ???? WEBP
   // .ico   00 00 01 00
   //        00 00 02 00 ( cursor files )

   switch ( data[0] )
   {
      case (u8)'\xFF':
         return ( !std::memcmp( data, "\xFF\xD8\xFF", 3 )) ?
            IMAGE_FILE_JPG : IMAGE_FILE_INVALID;

      case (u8)'\x89':
         return ( !std::memcmp( data,
                                "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A", 8 )) ?
            IMAGE_FILE_PNG : IMAGE_FILE_INVALID;

      case 'G':
         return ( !std::memcmp( data, "GIF87a", 6 ) ||
                  !std::memcmp( data, "GIF89a", 6 ) ) ?
            IMAGE_FILE_GIF : IMAGE_FILE_INVALID;

      case 'I':
         return ( !std::memcmp( data, "\x49\x49\x2A\x00", 4 )) ?
            IMAGE_FILE_TIFF : IMAGE_FILE_INVALID;

      case 'M':
         return ( !std::memcmp( data, "\x4D\x4D\x00\x2A", 4 )) ?
            IMAGE_FILE_TIFF : IMAGE_FILE_INVALID;

      case 'B':
         return ( data[1] == 'M' ) ?
            IMAGE_FILE_BMP : IMAGE_FILE_INVALID;

      case 'R':
         if ( std::memcmp( data,     "RIFF", 4 ))
            return IMAGE_FILE_INVALID;
         if ( std::memcmp( data + 8, "WEBP", 4 ))
            return IMAGE_FILE_INVALID;
         return IMAGE_FILE_WEBP;

      case '\0':
         if ( !std::memcmp( data, "\x00\x00\x01\x00", 4 ))
            return IMAGE_FILE_ICO;
         if ( !std::memcmp( data, "\x00\x00\x02\x00", 4 ))
            return IMAGE_FILE_ICO;
         return IMAGE_FILE_INVALID;

      default:
         return IMAGE_FILE_INVALID;
   }
}
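
Since the detection function only needs the first 16 bytes, a caller working with files on disk does not have to load the whole image. The following is a minimal sketch of reading just the header; the helper name readMagicBytes is hypothetical and not part of the original code, and it assumes standard C++ file streams are acceptable.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not part of the original code): reads up to
// `capacity` bytes from the start of the file at `path` into `buffer`
// and returns how many bytes were actually read. Returns 0 if the file
// cannot be opened.
std::size_t readMagicBytes( const std::string& path,
                            unsigned char* buffer, std::size_t capacity )
{
   std::ifstream file( path, std::ios::binary );
   if ( !file ) return 0;

   // read() may hit end-of-file early; gcount() reports what was read.
   file.read( reinterpret_cast<char*>( buffer ),
              static_cast<std::streamsize>( capacity ) );
   return static_cast<std::size_t>( file.gcount() );
}
```

The returned count can be passed as the len argument of getImageTypeByMagic, which rejects anything shorter than 16 bytes.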


JPEG – Joint Photographic Experts Group

Like a lot of digital image formats, JPEG separates the codec (JPEG proper) from the file format that wraps the compressed data. In practice that file format is almost always JFIF or Exif.

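Every JPEG stream starts with the Start Of Image marker FF D8, and the marker that follows it also begins with FF, so all of these files start with the same three bytes:

FF D8 FF
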
In practice it is enough to detect just those. If you want to be more stringent, you can also check the identifier of the first segment, where ?? can be any value (the two bytes hold the segment length):

FF D8 FF E0 ?? ?? 4A 46 49 46 00 // APP0 segment, “JFIF”
FF D8 FF E1 ?? ?? 45 78 69 66 00 // APP1 segment, “Exif”

Files that match neither pattern can still be valid JPEG images that begin with a different segment, so the stricter test will reject some files that decoders accept. Exif headers are typical of files written by digital cameras.
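
A minimal sketch of the stricter check, assuming the usual APP0 “JFIF” and APP1 “Exif” segment identifiers. The function name is hypothetical and the segment length bytes are deliberately ignored.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns true if the buffer starts with a JPEG
// Start Of Image marker followed by a JFIF (APP0) or Exif (APP1) segment.
bool isJfifOrExifJpeg( const unsigned char* data, std::size_t len )
{
   if ( len < 11 ) return false;

   // FF D8 is Start Of Image; the third FF begins the next marker.
   if ( std::memcmp( data, "\xFF\xD8\xFF", 3 ) != 0 ) return false;

   // Bytes 4-5 hold the segment length and are not checked here.
   if ( data[3] == 0xE0 ) return std::memcmp( data + 6, "JFIF\0", 5 ) == 0;
   if ( data[3] == 0xE1 ) return std::memcmp( data + 6, "Exif\0", 5 ) == 0;
   return false;
}
```

Because some valid JPEG files begin with other segments, a check like this is probably best used as an extra hint rather than as a replacement for the three-byte test.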

PNG – Portable Network Graphics

The PNG specification simply lists the following 8 bytes as the file signature:

89 50 4E 47 0D 0A 1A 0A

Byte 0 is the non-ASCII value 0x89. Bytes 1-3 are the string “PNG”, followed by a CR-LF sequence, a Control-Z character, and a final LF.

GIF – CompuServe Graphics Interchange Format

These files simply start with one of two identifying strings: “GIF87a” or “GIF89a”. Both formats are in common use. GIF87a was the original format. GIF89a is an improved format that adds animation and transparency.
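
If the caller cares which of the two versions it has, the same comparison can report it. A small sketch follows; the function name and the 87/89 return convention are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns 87 or 89 for the two known GIF versions,
// or 0 if the buffer does not start with a GIF signature.
int gifVersion( const unsigned char* data, std::size_t len )
{
   if ( len < 6 ) return 0;
   if ( std::memcmp( data, "GIF87a", 6 ) == 0 ) return 87;
   if ( std::memcmp( data, "GIF89a", 6 ) == 0 ) return 89;
   return 0;
}
```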

TIFF – Tagged Image File Format

TIFF is one of those formats that consists of a container that can hold one or more images stored using some other encoding. A TIFF file can even contain image data compressed with another format, such as JPEG.

There is a little-endian and a big-endian version of the format, each with its own signature: the two-byte order mark “II” or “MM”, followed by the number 42 stored as a 16-bit integer in that byte order.

To detect whether a file is a TIFF container, check the first 4 bytes:

49 49 2A 00 // little endian
4D 4D 00 2A // big endian
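
Anything that goes on to read the rest of a TIFF file needs to know which byte order it uses, so it can be handy to report that instead of a plain yes or no. A minimal sketch, with a hypothetical enum and function name:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical result type for the sketch below.
enum TiffByteOrder
{
   TIFF_ORDER_NONE,           // not a TIFF signature
   TIFF_ORDER_LITTLE_ENDIAN,  // 49 49 2A 00
   TIFF_ORDER_BIG_ENDIAN      // 4D 4D 00 2A
};

// Returns the byte order announced by the first 4 bytes of the buffer.
// memcmp is used because both signatures contain a zero byte.
TiffByteOrder tiffByteOrder( const unsigned char* data, std::size_t len )
{
   if ( len < 4 ) return TIFF_ORDER_NONE;
   if ( std::memcmp( data, "\x49\x49\x2A\x00", 4 ) == 0 )
      return TIFF_ORDER_LITTLE_ENDIAN;
   if ( std::memcmp( data, "\x4D\x4D\x00\x2A", 4 ) == 0 )
      return TIFF_ORDER_BIG_ENDIAN;
   return TIFF_ORDER_NONE;
}
```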


BMP – Microsoft Bitmap Format

The .bmp format starts with the 2 bytes “BM”.


WebP – Google WebP Format

WebP files are technically RIFF files. RIFF is a container format like TIFF. A WebP file is a RIFF file whose form type is “WEBP”.

To detect it, first check that the first 4 bytes are “RIFF”, and then that bytes 8-11 are “WEBP”. Bytes 4-7 hold a size field and vary from file to file.
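
The two comparisons can be sketched as below. As an extra, the sketch also decodes the 4 bytes between the two tags, which RIFF defines as a little-endian chunk size (normally the file size minus 8). The function name and the output parameter are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true for a RIFF header whose form type is
// WEBP, and stores the RIFF chunk size (bytes 4-7, little endian) in
// *riffSize. *riffSize is left untouched when the function returns false.
bool parseWebpHeader( const unsigned char* data, std::size_t len,
                      std::uint32_t* riffSize )
{
   if ( len < 12 ) return false;
   if ( std::memcmp( data,     "RIFF", 4 ) != 0 ) return false;
   if ( std::memcmp( data + 8, "WEBP", 4 ) != 0 ) return false;

   *riffSize = static_cast<std::uint32_t>( data[4] )
             | static_cast<std::uint32_t>( data[5] ) << 8
             | static_cast<std::uint32_t>( data[6] ) << 16
             | static_cast<std::uint32_t>( data[7] ) << 24;
   return true;
}
```

Checking the form type matters because other common formats, such as WAV and AVI, are also RIFF files and share the first 4 bytes.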


ICO – Microsoft Icon Format

The ICO format was designed by Microsoft to contain icons and cursors. It has two variants: one for icon images (.ico) and one for cursor images (.cur). Except for the header, both formats are identical.

The file signatures are:

00 00 01 00 // .ico format
00 00 02 00 // .cur format
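
The detection function above reports both signatures as the same type. Where the difference matters, the third byte tells the two apart. A minimal sketch, with a hypothetical enum and function name:

```cpp
#include <cstddef>

// Hypothetical result type for the sketch below.
enum IconFileKind
{
   ICON_KIND_NONE,  // neither signature matched
   ICON_KIND_ICO,   // 00 00 01 00
   ICON_KIND_CUR    // 00 00 02 00
};

// Distinguishes icon files from cursor files by the first 4 bytes.
IconFileKind iconFileKind( const unsigned char* data, std::size_t len )
{
   if ( len < 4 ) return ICON_KIND_NONE;

   // Bytes 0, 1 and 3 are zero in both signatures.
   if ( data[0] != 0 || data[1] != 0 || data[3] != 0 )
      return ICON_KIND_NONE;

   if ( data[2] == 1 ) return ICON_KIND_ICO;
   if ( data[2] == 2 ) return ICON_KIND_CUR;
   return ICON_KIND_NONE;
}
```

A 4-byte signature made mostly of zeros is weak evidence on its own, so it may be worth treating a match as tentative.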


References

File Signatures Table: Gary Kessler’s list of magic bytes at the beginning of many popular file types.
List of file signatures: Wikipedia’s list of magic bytes, less complete than Kessler’s list.
JPEG: Wikipedia entry about JPEG.
CCITT Recommendation T.81: The original specification for JPEG.
JFIF, JPEG File Interchange Format, Version 1.02: The Library of Congress’ reference on JPEG.
Portable Network Graphics (PNG) Specification and Extensions: The libpng website has a section for PNG specifications.
GIF: Wikipedia article about the GIF format.
Graphics Interchange Format (GIF) Specification: The original 1987 GIF specification.
Graphics Interchange Format version 89a: The original specification for GIF89a.
Tagged Image File Format: Wikipedia article about the TIFF format.
BMP file format: Wikipedia article about the BMP format.
WebP – A new image format for the Web: Google’s page about WebP.
WebP: Wikipedia article about the WebP format.
Multimedia Programming Interface and Data Specifications 1.0: Microsoft’s original RIFF specification, the container format for WebP.
ICO (file format): Wikipedia article about the ICO format.