On any computer system with a small amount of main memory and no swap device you need to be extremely careful about how you allocate main system memory. For most data structures in a game you want to have them allocated statically. This way you don't have to worry about memory allocations failing during run-time. For some game systems however it really helps to have small pools of dynamically allocated memory. These pools can be quite small. For example for some of my GBA games I use a tiny buffer of 4k as scratch memory for the AI routines of all the enemies that are on the screen.

For these small general-purpose memory pools you cannot make too many assumptions about how the memory will be used. You want an allocator that will quickly give you memory chunks of any desired size. Because the pools are so small it is important that the memory overhead of dynamically managing these pools be very small as well. We present here a general-purpose memory manager that uses just 3% of the managed buffer for memory management overhead. This allocator must of course also coalesce adjacent free'd buffers and must choose the areas it allocates from carefully, to minimize dead space between allocated blocks (otherwise what is the point of getting the management overhead down to 3 percent?) It can manage buffers as small as 32 bytes and as large as 512k.

In the version we present here the memory manager manages the memory using 8 byte memory blocks. Of course you can use memory blocks of any size you want. Raising the block size will decrease the percentage of memory used for overhead, but will increase the amount of dead memory that cannot be allocated when the user asks for memory regions that are not multiples of the block size. Decreasing the block size does the reverse. So for a block size of 16 the overhead of the memory management data structures falls to 1.5%. The amount of "dead memory" will depend on the user's allocation pattern. When implementing a system like this you will want to consider the block size carefully.

The trick to making a memory manager with low memory overhead is to store as much of the heap management data structures as possible in the unallocated memory blocks. In our memory manager we use two basic data structures, a memory map that indicates which blocks are allocated, and an ordered binary tree that keeps track of all the free blocks. The memory map is stored in the beginning of the allocated buffer and is not available for allocation, but the binary tree, is stored in the free blocks themselves, and so costs us nothing.

For each memory block in the buffer we need to store two pieces of information in the memory map:

  • Is this block allocated or free?
  • And if it is allocated is this the last block in the allocated region?

To store these two pieces of information in the memory map we use 2 bits per block. So we can manage up to 32 bytes of data per byte of memory map. (This is where the 3% memory overhead comes from.)

For free blocks we don't need to store any information in the memory map. Once we know its free we know that we can find out all we want to know about it from the binary tree node that is stored inside the free block itself.

The memory manager also has a small 8-byte header that contains the total number of blocks being managed, an index into the root of the binary tree of free blocks, and a pointer to the first managed node.

The memory map follows the header, which is of the right size to store two bits of information per node. Finally each free block may contain a node for the binary tree. The tree nodes hold a pointer to the parent node, a pointer to all that node's smaller children, a pointer to all of the node's larger children and the size of the node. Each free node structure is 8-bytes, just fitting into the block.

The manager initializes all the bits in the memory map to free, with the binary tree holding a single node whose size is as big as all of the allocatable memory in the buffer.

To allocate a block you round the memory requested to the next block size, and then you walk the binary tree searching for the smallest available block that is large enough to allocate the required memory. Once an appropriate tree node is found, we shrink the node by the requested size. If no memory is left over the node is removed from the tree, if there is memory left over the node is resized and re-linked to the appropriate place in the tree. Finally you need to update the memory map to flag all of the allocated blocks as used, and you need to set the last block bit for the last block in the allocated region.

To free a block you find its area in the memory map, which you can do arithmetically from the pointer passed in, and start clearing allocated bits until you find a map entry that is labeled as the last block in the region. Then you check in the memory map to see if the areas adjacent to this block are free. If they are you merge the free'd block with its neighbors.

For a reasonably well-balanced tree the allocation operation will average m + log(n) time where n is the number of free nodes, and m is the size of the block. The free operation is linear with the size of the block. Our current implementation uses just a sorted binary tree. We do not do any automatic rebalancing, so it is possible, though unlikely, for the tree to become extremely unbalanced and slow the allocator down significantly when there are large numbers of free nodes. If this becomes a problem it is straightforward to implement a rebalancing routine like red-black. We have not found it to be necessary.

Below is some example code for a general purpose memory manager.


// the odd bit indicates used or unused. For used sections the even bit // indicates the end of a used section. #define MEM_UNUSED 0x0 #define MEM_USED 0x1 #define MEM_END 0x3 // each free region of memory in the memory pool starts with this structure // it is a binary tree of free nodes, with three pointers, one pointer to nodes // smaller than itself, another to nodes larger and finally one pointer to // parent nodes. If the heap gets very fractured *and* the tree becomes very // unbalanced this could be slow. typedef struct { u16 m_smaller_idx; u16 m_larger_idx; u16 m_parent_idx; u16 m_size; } mem_node_t; #define MEM_BAD_IDX 0xFFFF // this structure is 8 bytes. or one alloc block. // it is stored at the top of the memory pool buffer // to keep track of what memory has been allocated typedef struct { u16 m_num_nodes; u16 m_free_root; mem_node_t* m_nodes; u8 m_alloc_map[0]; // tricky tricky } mem_manager_t; // memory is managed in blocks of 8 bytes. void mem_init_pool( void* pool_memory, u32 size ) { mem_manager_t* p_manager = (mem_manager_t*)pool_memory; mem_node_t* p_node; u32 header_size = (( sizeof( mem_manager_t ) + 7 ) >> 3 ); u32 map_size; u32 num_blocks; u32 blocks_for_map; u32 i; // size is now in 8 byte blocks. If size were not a multiple of 8 // then the remainder is lost... as it should be. size >>= 3; // we cannot deal with buffers smaller than 4 blocks. ASSERT( size > 4 ); // this is the size of the memory pool in blocks. num_blocks = size - header_size; // determine the size of the alloc_map. 2 bits are stored in the // alloc map to keep track of one block. This means each // byte of alloca_map keeps track of the allocation of 4 blocks. map_size = (( num_blocks + 3 ) >> 2 ); // we must use some number of blocks to store the alloc map. // this is the number of blocks we need to store the map. blocks_for_map = (( map_size + 7 ) >> 3 ); // so now reduce the number of blocks to manage to account // for the blocks that we had to give to the map. num_blocks -= blocks_for_map; // total header size in blocks. header_size += blocks_for_map; // initialize the header p_manager->m_num_nodes = num_blocks; p_manager->m_free_root = 0; p_manager->m_nodes = (((mem_node_t*)pool_memory) + header_size ); for ( i = 0; i < map_size; i++ ) p_manager->m_alloc_map[i] = MEM_UNUSED; // initialize the single node. p_node = p_manager->m_nodes; p_node->m_smaller_idx = MEM_BAD_IDX; p_node->m_larger_idx = MEM_BAD_IDX; p_node->m_parent_idx = MEM_BAD_IDX; p_node->m_size = num_blocks; p_node = p_manager->m_nodes + ( num_blocks - 1 ); p_node->m_size = num_blocks; } #define MEM_NODE_PTR( IDX ) ( p_manager->m_nodes + IDX ) void mem_delete_node( mem_manager_t* p_manager, u16 node_idx ) { u16 x, y, z; // find successor node, if any z = node_idx; if (( MEM_NODE_PTR( z )->m_smaller_idx == MEM_BAD_IDX ) || ( MEM_NODE_PTR( z )->m_larger_idx == MEM_BAD_IDX )) { y = z; } else { y = MEM_NODE_PTR( z )->m_larger_idx; while( MEM_NODE_PTR( y )->m_smaller_idx != MEM_BAD_IDX ) y = MEM_NODE_PTR( y )->m_smaller_idx; } // x is y's only child if ( MEM_NODE_PTR( y )->m_smaller_idx != MEM_BAD_IDX ) { x = MEM_NODE_PTR( y )->m_smaller_idx; } else { x = MEM_NODE_PTR( y )->m_larger_idx; } // remove y from the parent chain if ( x != MEM_BAD_IDX ) MEM_NODE_PTR( x )->m_parent_idx = MEM_NODE_PTR( y )->m_parent_idx; if ( MEM_NODE_PTR( y )->m_parent_idx != MEM_BAD_IDX ) { u16 py = MEM_NODE_PTR( y )->m_parent_idx; if ( y == MEM_NODE_PTR( py )->m_smaller_idx ) MEM_NODE_PTR( py )->m_smaller_idx = x; else MEM_NODE_PTR( py )->m_larger_idx = x; } else { p_manager->m_free_root = x; } // if z and y are not the same, replace z with y if ( y != z ) { mem_node_t* y_node = MEM_NODE_PTR( y ); mem_node_t* z_node = MEM_NODE_PTR( z ); y_node->m_smaller_idx = z_node->m_smaller_idx; if ( y_node->m_smaller_idx != MEM_BAD_IDX ) MEM_NODE_PTR( y_node->m_smaller_idx )->m_parent_idx = y; y_node->m_larger_idx = z_node->m_larger_idx; if ( y_node->m_larger_idx != MEM_BAD_IDX ) MEM_NODE_PTR( y_node->m_larger_idx )->m_parent_idx = y; y_node->m_parent_idx = z_node->m_parent_idx; if ( z_node->m_parent_idx != MEM_BAD_IDX ) { mem_node_t* p_z = MEM_NODE_PTR( z_node->m_parent_idx ); if ( z == p_z->m_smaller_idx ) p_z->m_smaller_idx = y; else p_z->m_larger_idx = y; } else { p_manager->m_free_root = y; } } } void mem_insert_node( mem_manager_t* p_manager, u16 node_idx ) { // we assume this node's size is set correctly. // we will set all the other header values. mem_node_t* p_node = MEM_NODE_PTR( node_idx ); u32 current, parent; u32 size = p_node->m_size; // find the parent current = p_manager->m_free_root; parent = MEM_BAD_IDX; while ( current != MEM_BAD_IDX ) { mem_node_t* cur_node = MEM_NODE_PTR( current ); parent = current; current = (( size < cur_node->m_size ) ? cur_node->m_smaller_idx : cur_node->m_larger_idx ); } // link the new node to the parent p_node->m_parent_idx = parent; p_node->m_smaller_idx = MEM_BAD_IDX; p_node->m_larger_idx = MEM_BAD_IDX; // link the parent to the new node if ( parent != MEM_BAD_IDX ) { mem_node_t* parent_node = MEM_NODE_PTR( parent ); if ( size < parent_node->m_size ) parent_node->m_smaller_idx = node_idx; else parent_node->m_larger_idx = node_idx; } else { p_manager->m_free_root = node_idx; } } void mem_alloc_map_set_used( mem_manager_t* p_manager, u32 best_idx, u32 size ) { u32 byte_idx = ( best_idx >> 2 ); u32 bit_idx = best_idx & 0x3; u32 i, bytes_left; u8 byte, value; // set the first bits to MEM_USED byte = p_manager->m_alloc_map[byte_idx]; value = ( MEM_USED << ( bit_idx << 1 )); while (( bit_idx < 4 ) && ( size != 0 )) { byte |= value; bit_idx++; size--; value <<= 2; } if ( size == 0 ) goto final_byte; // put the byte back into the alloc map. p_manager->m_alloc_map[byte_idx++] = byte; // copy all the complete bytes over bytes_left = ( size >> 2 ); size &= 0x3; for ( i = 0; i < bytes_left; i++ ) p_manager->m_alloc_map[byte_idx++] = 0x55; if ( size == 0 ) { // set the last map indicator to MEM_END p_manager->m_alloc_map[byte_idx-1] |= ( 0x2 << 6 ); return; } // set the last bits in the final byte to MEM_USED bit_idx = size; value = MEM_USED; byte = p_manager->m_alloc_map[byte_idx]; while( size ) { byte |= value; size--; value <<= 2; } final_byte: // on the final byte clear one bit to indicate the end of a run of allocated // blocks ( MEM_END ). bit_idx--; byte |= ( 0x2 << ( bit_idx << 1 )); p_manager->m_alloc_map[byte_idx] = byte; } // returns the size of the buffer being freed u32 mem_alloc_map_set_unused( mem_manager_t* p_manager, u32 best_idx ) { u32 byte_idx = ( best_idx >> 2 ); u32 bit_idx = best_idx & 0x3; u8 byte, value1, value2; u32 size = 0; // set the first bits to MEM_UNUSED scan_byte: byte = p_manager->m_alloc_map[byte_idx]; value1 = ( 0x1 << ( bit_idx << 1 )); value2 = ( 0x2 << ( bit_idx << 1 )); while ( bit_idx < 4 ) { // clear the first bit in the set. byte &= ~value1; size++; bit_idx++; // if the second bit is on then this is the end of the run. if ( byte & value2 ) { // clear the second bit, store the modified byte and return size; byte &= ~value2; p_manager->m_alloc_map[byte_idx] = byte; return size; } value1 <<= 2; value2 <<= 2; } p_manager->m_alloc_map[byte_idx++] = byte; // while whole bytes are marked used just scan through them fast. while ( p_manager->m_alloc_map[byte_idx] == 0x55 ) { size += 4; p_manager->m_alloc_map[byte_idx++] = MEM_UNUSED; } // scan the last byte. We should only jump to the top once. bit_idx = 0; goto scan_byte; } bool mem_pointer_is_valid( void* pool_memory, void* buffer ) { mem_manager_t* p_manager = ( mem_manager_t*)pool_memory; if (((u8*)buffer) < ((u8*)p_manager->m_nodes )) return FALSE; if ((((u32)buffer) & 0x7 ) != 0 ) return FALSE; if (((u8*)buffer) >= ((u8*)p_manager->m_nodes) + ( p_manager->m_num_nodes << 3 )) return FALSE; return TRUE; } void* mem_alloc( void* pool_memory, u32 size ) { // interpret pool_memory as a mem_manager_t. mem_manager_t* p_manager = ( mem_manager_t*)pool_memory; u32 idx = p_manager->m_free_root; u32 best_idx = MEM_BAD_IDX; mem_node_t* p_node; if ( size == 0 ) return NULL; // convert the desired size into the nearest block size size = (( size + 7 ) >> 3 ); // find a node that is the right size while ( idx != MEM_BAD_IDX ) { p_node = p_manager->m_nodes + idx; if ( p_node->m_size == size ) { // found one exactly the right size. my search is done. best_idx = idx; goto found_node; } else if ( p_node->m_size > size ) { // found one that is big enough. best_idx = idx; idx = p_node->m_smaller_idx; } else { idx = p_node->m_larger_idx; } } // found no appropriate nodes. return ASSERT( best_idx != MEM_BAD_IDX ); if ( best_idx == MEM_BAD_IDX ) return NULL; found_node: // best_idx points to the best node. // unlink the current node mem_delete_node( p_manager, best_idx ); p_node = MEM_NODE_PTR( best_idx ); if ( size != p_node->m_size ) { // there is some memory left over to return to the allocator mem_node_t* new_node = MEM_NODE_PTR( best_idx + size ); new_node->m_size = ( p_node->m_size - size ); new_node = MEM_NODE_PTR( best_idx + ( p_node->m_size - 1 )); new_node->m_size = ( p_node->m_size - size ); // relink it to a new place mem_insert_node( p_manager, best_idx + size ); } // update the alloc map from best_idx to best_idx + size mem_alloc_map_set_used( p_manager, best_idx, size ); // assert that the pointer returned is a valid pointer ASSERT( mem_pointer_is_valid( pool_memory, p_node )); return p_node; } void mem_free( void* pool_memory, void* buffer ) { mem_manager_t* p_manager; u32 node_idx; u32 size; u32 below_idx, above_idx; u32 byte_idx; u32 bit_idx; u8 byte, value; if ( buffer == NULL ) return; // make sure the pointer passed in is even a possible free pointer ASSERT( mem_pointer_is_valid( pool_memory, buffer )); p_manager = ( mem_manager_t*)pool_memory; node_idx = ((((u32)buffer) - ((u32)p_manager->m_nodes)) >> 3 ); size = mem_alloc_map_set_unused( p_manager, node_idx ); // if we have an unused region below us... merge with that. // check if the 2 bit flag directly after us is unused. below_idx = node_idx + size; // don't try a merge below if we already are the last chunk in the pool if ( below_idx >= p_manager->m_num_nodes ) goto merge_above; byte_idx = ( below_idx >> 2 ); bit_idx = below_idx & 0x3; byte = p_manager->m_alloc_map[byte_idx]; value = ( 0x3 << ( bit_idx << 1 )); if (( byte & value ) == 0 ) { // it is, so read the appropriate block entry and merge. size += MEM_NODE_PTR( below_idx )->m_size; mem_delete_node( p_manager, below_idx ); } merge_above: // if we have an unused region above us... merge with that. // if we are the first block then we can't merge above if ( node_idx == 0 ) goto create_node; // first just check if the flag directly before us is unused above_idx = node_idx - 1; byte_idx = ( above_idx >> 2 ); bit_idx = above_idx & 0x3; byte = p_manager->m_alloc_map[byte_idx]; value = ( 0x3 << ( bit_idx << 1 )); if (( byte & value ) == 0 ) { // ok it is unused, so read the appropriate block entry and merge. u32 above_size = MEM_NODE_PTR( above_idx )->m_size; size += above_size; node_idx -= above_size; mem_delete_node( p_manager, node_idx ); } create_node: // create the new node MEM_NODE_PTR( node_idx )->m_size = size; MEM_NODE_PTR( node_idx + ( size - 1 ))->m_size = size; mem_insert_node( p_manager, node_idx ); } void mem_stats( void* pool_memory, u32* amount_free, u32* largest_block, u32* num_free_blocks ) { // interpret pool_memory as a mem_manager_t. mem_manager_t* p_manager = ( mem_manager_t*)pool_memory; u32 idx = p_manager->m_free_root; mem_node_t* p_node; u32 stack[0xFF]; u32 stack_idx = 1; stack[0] = MEM_BAD_IDX; *amount_free = 0; *largest_block = 0; *num_free_blocks = 0; while ( idx != MEM_BAD_IDX ) { // get the node pointer p_node = p_manager->m_nodes + idx; // add this node to the stats (*num_free_blocks)++; (*amount_free) += p_node->m_size; if ((*largest_block) < p_node->m_size ) (*largest_block) = p_node->m_size; // iterate to the next node. if ( p_node->m_larger_idx != MEM_BAD_IDX ) stack[stack_idx++] = p_node->m_larger_idx; if ( p_node->m_smaller_idx != MEM_BAD_IDX ) idx = p_node->m_smaller_idx; else idx = stack[--stack_idx]; } // computed data was in blocks... translate it to bytes (*amount_free) = ((*amount_free ) << 3 ); (*largest_block) = ((*largest_block) << 3 ); }