On any computer system with a small amount of main memory and no swap device
you need to be extremely careful
about how you allocate main system memory. For most data structures in
a game you want to have them allocated statically. This way you don't have
to worry about memory allocations failing during run-time. For some game systems
however it really helps to have small pools of dynamically allocated memory.
These pools can be quite small. For example for some of my GBA games I use a tiny buffer of 4k as
scratch memory for the AI routines of all the enemies that are on the screen.
For these small general-purpose memory
pools you cannot make too many assumptions about how the memory will be used.
You want an allocator that will quickly give you memory chunks of any desired
size. Because the pools are so small it is important that the memory overhead
of dynamically managing these pools be very small as well. We present here a
general-purpose memory manager that uses just 3% of the managed buffer for memory
management overhead. This allocator must of course also coalesce adjacent
free'd buffers and must choose the areas it allocates from carefully, to
minimize dead space between allocated blocks (otherwise what is the point of
getting the management overhead down to 3 percent?) It can manage buffers as
small as 32 bytes and as large as 512k.
In the version we present here the
memory manager manages the memory using 8 byte memory blocks. Of course you can
use memory blocks of any size you want. Raising the block size will decrease
the percentage of memory used for overhead, but will increase the amount of
dead memory that cannot be allocated when the user asks for memory regions that
are not multiples of the block size. Decreasing the block size does the
reverse. So for a block size of 16 the overhead of the memory management data
structures falls to 1.5%. The amount of "dead memory" will depend on
the user's allocation pattern. When implementing a system like this you will
want to consider the block size carefully.
The trick to making a memory manager
with low memory overhead is to store as much of the heap management data
structures as possible in the unallocated memory blocks. In our memory manager
we use two basic data structures, a memory map that indicates which blocks are
allocated, and an ordered binary tree that keeps track of all the free blocks.
The memory map is stored in the beginning of the allocated buffer and is not
available for allocation, but the binary tree, is stored in the free blocks
themselves, and so costs us nothing.
For each memory block in the buffer we
need to store two pieces of information in the memory map:
- Is this block allocated or free?
- And if it is allocated is this the last
block in the allocated region?
To store these two pieces of
information in the memory map we use 2 bits per block. So we can manage up to
32 bytes of data per byte of memory map. (This is where the 3% memory overhead
comes from.)
For free blocks we don't need to store
any information in the memory map. Once we know its free we know that we can
find out all we want to know about it from the binary tree node that is stored
inside the free block itself.
The memory manager also has a small
8-byte header that contains the total number of blocks being managed, an index
into the root of the binary tree of free blocks, and a pointer to the first
managed node.
The memory map follows the header,
which is of the right size to store two bits of information per node. Finally
each free block may contain a node for the binary tree. The tree nodes hold a
pointer to the parent node, a pointer to all that node's smaller children, a
pointer to all of the node's larger children and the size of the node. Each
free node structure is 8-bytes, just fitting into the block.
The manager initializes all the bits in
the memory map to free, with the binary tree holding a single node whose size
is as big as all of the allocatable memory in the buffer.
To allocate a block you round the
memory requested to the next block size, and then you walk the binary tree
searching for the smallest available block that is large enough to allocate the
required memory. Once an appropriate tree node is found, we shrink the node by
the requested size. If no memory is left over the node is removed from the
tree, if there is memory left over the node is resized and re-linked to the
appropriate place in the tree. Finally you need to update the memory map to
flag all of the allocated blocks as used, and you need to set the last block
bit for the last block in the allocated region.
To free a block you find its area in
the memory map, which you can do arithmetically from the pointer passed in, and
start clearing allocated bits until you find a map entry that is labeled as the
last block in the region. Then you check in the memory map to see if the areas
adjacent to this block are free. If they are you merge the free'd block with
its neighbors.
For a reasonably well-balanced tree the
allocation operation will average m + log(n) time where n is the number
of free nodes, and m is the size of the block. The free operation is linear
with the size of the block. Our current implementation uses just a sorted
binary tree. We do not do any automatic rebalancing, so it is possible, though
unlikely, for the tree to become extremely unbalanced and slow the allocator
down significantly when there are large numbers of free nodes. If this becomes
a problem it is straightforward to implement a rebalancing routine like red-black.
We have not found it to be necessary.
Below is some example code for a general purpose memory manager.
// the odd bit indicates used or unused. For used sections the even bit
// indicates the end of a used section.
#define MEM_UNUSED 0x0
#define MEM_USED 0x1
#define MEM_END 0x3
// each free region of memory in the memory pool starts with this structure
// it is a binary tree of free nodes, with three pointers, one pointer to nodes
// smaller than itself, another to nodes larger and finally one pointer to
// parent nodes. If the heap gets very fractured *and* the tree becomes very
// unbalanced this could be slow.
typedef struct
{
u16 m_smaller_idx;
u16 m_larger_idx;
u16 m_parent_idx;
u16 m_size;
} mem_node_t;
#define MEM_BAD_IDX 0xFFFF
// this structure is 8 bytes. or one alloc block.
// it is stored at the top of the memory pool buffer
// to keep track of what memory has been allocated
typedef struct
{
u16 m_num_nodes;
u16 m_free_root;
mem_node_t* m_nodes;
u8 m_alloc_map[0]; // tricky tricky
} mem_manager_t;
// memory is managed in blocks of 8 bytes.
void mem_init_pool( void* pool_memory, u32 size )
{
mem_manager_t* p_manager = (mem_manager_t*)pool_memory;
mem_node_t* p_node;
u32 header_size = (( sizeof( mem_manager_t ) + 7 ) >> 3 );
u32 map_size;
u32 num_blocks;
u32 blocks_for_map;
u32 i;
// size is now in 8 byte blocks. If size were not a multiple of 8
// then the remainder is lost... as it should be.
size >>= 3;
// we cannot deal with buffers smaller than 4 blocks.
ASSERT( size > 4 );
// this is the size of the memory pool in blocks.
num_blocks = size - header_size;
// determine the size of the alloc_map. 2 bits are stored in the
// alloc map to keep track of one block. This means each
// byte of alloca_map keeps track of the allocation of 4 blocks.
map_size = (( num_blocks + 3 ) >> 2 );
// we must use some number of blocks to store the alloc map.
// this is the number of blocks we need to store the map.
blocks_for_map = (( map_size + 7 ) >> 3 );
// so now reduce the number of blocks to manage to account
// for the blocks that we had to give to the map.
num_blocks -= blocks_for_map;
// total header size in blocks.
header_size += blocks_for_map;
// initialize the header
p_manager->m_num_nodes = num_blocks;
p_manager->m_free_root = 0;
p_manager->m_nodes = (((mem_node_t*)pool_memory) + header_size );
for ( i = 0; i < map_size; i++ )
p_manager->m_alloc_map[i] = MEM_UNUSED;
// initialize the single node.
p_node = p_manager->m_nodes;
p_node->m_smaller_idx = MEM_BAD_IDX;
p_node->m_larger_idx = MEM_BAD_IDX;
p_node->m_parent_idx = MEM_BAD_IDX;
p_node->m_size = num_blocks;
p_node = p_manager->m_nodes + ( num_blocks - 1 );
p_node->m_size = num_blocks;
}
#define MEM_NODE_PTR( IDX ) ( p_manager->m_nodes + IDX )
void mem_delete_node( mem_manager_t* p_manager, u16 node_idx )
{
u16 x, y, z;
// find successor node, if any
z = node_idx;
if (( MEM_NODE_PTR( z )->m_smaller_idx == MEM_BAD_IDX ) ||
( MEM_NODE_PTR( z )->m_larger_idx == MEM_BAD_IDX ))
{
y = z;
} else {
y = MEM_NODE_PTR( z )->m_larger_idx;
while( MEM_NODE_PTR( y )->m_smaller_idx != MEM_BAD_IDX )
y = MEM_NODE_PTR( y )->m_smaller_idx;
}
// x is y's only child
if ( MEM_NODE_PTR( y )->m_smaller_idx != MEM_BAD_IDX )
{
x = MEM_NODE_PTR( y )->m_smaller_idx;
} else {
x = MEM_NODE_PTR( y )->m_larger_idx;
}
// remove y from the parent chain
if ( x != MEM_BAD_IDX )
MEM_NODE_PTR( x )->m_parent_idx = MEM_NODE_PTR( y )->m_parent_idx;
if ( MEM_NODE_PTR( y )->m_parent_idx != MEM_BAD_IDX )
{
u16 py = MEM_NODE_PTR( y )->m_parent_idx;
if ( y == MEM_NODE_PTR( py )->m_smaller_idx )
MEM_NODE_PTR( py )->m_smaller_idx = x;
else
MEM_NODE_PTR( py )->m_larger_idx = x;
} else {
p_manager->m_free_root = x;
}
// if z and y are not the same, replace z with y
if ( y != z )
{
mem_node_t* y_node = MEM_NODE_PTR( y );
mem_node_t* z_node = MEM_NODE_PTR( z );
y_node->m_smaller_idx = z_node->m_smaller_idx;
if ( y_node->m_smaller_idx != MEM_BAD_IDX )
MEM_NODE_PTR( y_node->m_smaller_idx )->m_parent_idx = y;
y_node->m_larger_idx = z_node->m_larger_idx;
if ( y_node->m_larger_idx != MEM_BAD_IDX )
MEM_NODE_PTR( y_node->m_larger_idx )->m_parent_idx = y;
y_node->m_parent_idx = z_node->m_parent_idx;
if ( z_node->m_parent_idx != MEM_BAD_IDX )
{
mem_node_t* p_z = MEM_NODE_PTR( z_node->m_parent_idx );
if ( z == p_z->m_smaller_idx )
p_z->m_smaller_idx = y;
else
p_z->m_larger_idx = y;
} else {
p_manager->m_free_root = y;
}
}
}
void mem_insert_node( mem_manager_t* p_manager, u16 node_idx )
{
// we assume this node's size is set correctly.
// we will set all the other header values.
mem_node_t* p_node = MEM_NODE_PTR( node_idx );
u32 current, parent;
u32 size = p_node->m_size;
// find the parent
current = p_manager->m_free_root;
parent = MEM_BAD_IDX;
while ( current != MEM_BAD_IDX )
{
mem_node_t* cur_node = MEM_NODE_PTR( current );
parent = current;
current = (( size < cur_node->m_size ) ?
cur_node->m_smaller_idx : cur_node->m_larger_idx );
}
// link the new node to the parent
p_node->m_parent_idx = parent;
p_node->m_smaller_idx = MEM_BAD_IDX;
p_node->m_larger_idx = MEM_BAD_IDX;
// link the parent to the new node
if ( parent != MEM_BAD_IDX )
{
mem_node_t* parent_node = MEM_NODE_PTR( parent );
if ( size < parent_node->m_size )
parent_node->m_smaller_idx = node_idx;
else
parent_node->m_larger_idx = node_idx;
} else {
p_manager->m_free_root = node_idx;
}
}
void mem_alloc_map_set_used( mem_manager_t* p_manager,
u32 best_idx, u32 size )
{
u32 byte_idx = ( best_idx >> 2 );
u32 bit_idx = best_idx & 0x3;
u32 i, bytes_left;
u8 byte, value;
// set the first bits to MEM_USED
byte = p_manager->m_alloc_map[byte_idx];
value = ( MEM_USED << ( bit_idx << 1 ));
while (( bit_idx < 4 ) && ( size != 0 ))
{
byte |= value;
bit_idx++;
size--;
value <<= 2;
}
if ( size == 0 ) goto final_byte;
// put the byte back into the alloc map.
p_manager->m_alloc_map[byte_idx++] = byte;
// copy all the complete bytes over
bytes_left = ( size >> 2 );
size &= 0x3;
for ( i = 0; i < bytes_left; i++ )
p_manager->m_alloc_map[byte_idx++] = 0x55;
if ( size == 0 )
{
// set the last map indicator to MEM_END
p_manager->m_alloc_map[byte_idx-1] |= ( 0x2 << 6 );
return;
}
// set the last bits in the final byte to MEM_USED
bit_idx = size;
value = MEM_USED;
byte = p_manager->m_alloc_map[byte_idx];
while( size )
{
byte |= value;
size--;
value <<= 2;
}
final_byte:
// on the final byte clear one bit to indicate the end of a run of allocated
// blocks ( MEM_END ).
bit_idx--;
byte |= ( 0x2 << ( bit_idx << 1 ));
p_manager->m_alloc_map[byte_idx] = byte;
}
// returns the size of the buffer being freed
u32 mem_alloc_map_set_unused( mem_manager_t* p_manager, u32 best_idx )
{
u32 byte_idx = ( best_idx >> 2 );
u32 bit_idx = best_idx & 0x3;
u8 byte, value1, value2;
u32 size = 0;
// set the first bits to MEM_UNUSED
scan_byte:
byte = p_manager->m_alloc_map[byte_idx];
value1 = ( 0x1 << ( bit_idx << 1 ));
value2 = ( 0x2 << ( bit_idx << 1 ));
while ( bit_idx < 4 )
{
// clear the first bit in the set.
byte &= ~value1;
size++;
bit_idx++;
// if the second bit is on then this is the end of the run.
if ( byte & value2 )
{
// clear the second bit, store the modified byte and return size;
byte &= ~value2;
p_manager->m_alloc_map[byte_idx] = byte;
return size;
}
value1 <<= 2;
value2 <<= 2;
}
p_manager->m_alloc_map[byte_idx++] = byte;
// while whole bytes are marked used just scan through them fast.
while ( p_manager->m_alloc_map[byte_idx] == 0x55 )
{
size += 4;
p_manager->m_alloc_map[byte_idx++] = MEM_UNUSED;
}
// scan the last byte. We should only jump to the top once.
bit_idx = 0;
goto scan_byte;
}
bool mem_pointer_is_valid( void* pool_memory, void* buffer )
{
mem_manager_t* p_manager = ( mem_manager_t*)pool_memory;
if (((u8*)buffer) < ((u8*)p_manager->m_nodes ))
return FALSE;
if ((((u32)buffer) & 0x7 ) != 0 )
return FALSE;
if (((u8*)buffer) >=
((u8*)p_manager->m_nodes) +
( p_manager->m_num_nodes << 3 ))
return FALSE;
return TRUE;
}
void* mem_alloc( void* pool_memory, u32 size )
{
// interpret pool_memory as a mem_manager_t.
mem_manager_t* p_manager = ( mem_manager_t*)pool_memory;
u32 idx = p_manager->m_free_root;
u32 best_idx = MEM_BAD_IDX;
mem_node_t* p_node;
if ( size == 0 ) return NULL;
// convert the desired size into the nearest block size
size = (( size + 7 ) >> 3 );
// find a node that is the right size
while ( idx != MEM_BAD_IDX )
{
p_node = p_manager->m_nodes + idx;
if ( p_node->m_size == size )
{
// found one exactly the right size. my search is done.
best_idx = idx;
goto found_node;
} else if ( p_node->m_size > size ) {
// found one that is big enough.
best_idx = idx;
idx = p_node->m_smaller_idx;
} else {
idx = p_node->m_larger_idx;
}
}
// found no appropriate nodes. return
ASSERT( best_idx != MEM_BAD_IDX );
if ( best_idx == MEM_BAD_IDX ) return NULL;
found_node:
// best_idx points to the best node.
// unlink the current node
mem_delete_node( p_manager, best_idx );
p_node = MEM_NODE_PTR( best_idx );
if ( size != p_node->m_size )
{
// there is some memory left over to return to the allocator
mem_node_t* new_node = MEM_NODE_PTR( best_idx + size );
new_node->m_size = ( p_node->m_size - size );
new_node = MEM_NODE_PTR( best_idx + ( p_node->m_size - 1 ));
new_node->m_size = ( p_node->m_size - size );
// relink it to a new place
mem_insert_node( p_manager, best_idx + size );
}
// update the alloc map from best_idx to best_idx + size
mem_alloc_map_set_used( p_manager, best_idx, size );
// assert that the pointer returned is a valid pointer
ASSERT( mem_pointer_is_valid( pool_memory, p_node ));
return p_node;
}
void mem_free( void* pool_memory, void* buffer )
{
mem_manager_t* p_manager;
u32 node_idx;
u32 size;
u32 below_idx, above_idx;
u32 byte_idx;
u32 bit_idx;
u8 byte, value;
if ( buffer == NULL ) return;
// make sure the pointer passed in is even a possible free pointer
ASSERT( mem_pointer_is_valid( pool_memory, buffer ));
p_manager = ( mem_manager_t*)pool_memory;
node_idx = ((((u32)buffer) - ((u32)p_manager->m_nodes)) >> 3 );
size = mem_alloc_map_set_unused( p_manager, node_idx );
// if we have an unused region below us... merge with that.
// check if the 2 bit flag directly after us is unused.
below_idx = node_idx + size;
// don't try a merge below if we already are the last chunk in the pool
if ( below_idx >= p_manager->m_num_nodes ) goto merge_above;
byte_idx = ( below_idx >> 2 );
bit_idx = below_idx & 0x3;
byte = p_manager->m_alloc_map[byte_idx];
value = ( 0x3 << ( bit_idx << 1 ));
if (( byte & value ) == 0 )
{
// it is, so read the appropriate block entry and merge.
size += MEM_NODE_PTR( below_idx )->m_size;
mem_delete_node( p_manager, below_idx );
}
merge_above:
// if we have an unused region above us... merge with that.
// if we are the first block then we can't merge above
if ( node_idx == 0 ) goto create_node;
// first just check if the flag directly before us is unused
above_idx = node_idx - 1;
byte_idx = ( above_idx >> 2 );
bit_idx = above_idx & 0x3;
byte = p_manager->m_alloc_map[byte_idx];
value = ( 0x3 << ( bit_idx << 1 ));
if (( byte & value ) == 0 )
{
// ok it is unused, so read the appropriate block entry and merge.
u32 above_size = MEM_NODE_PTR( above_idx )->m_size;
size += above_size;
node_idx -= above_size;
mem_delete_node( p_manager, node_idx );
}
create_node:
// create the new node
MEM_NODE_PTR( node_idx )->m_size = size;
MEM_NODE_PTR( node_idx + ( size - 1 ))->m_size = size;
mem_insert_node( p_manager, node_idx );
}
void mem_stats( void* pool_memory, u32* amount_free, u32* largest_block,
u32* num_free_blocks )
{
// interpret pool_memory as a mem_manager_t.
mem_manager_t* p_manager = ( mem_manager_t*)pool_memory;
u32 idx = p_manager->m_free_root;
mem_node_t* p_node;
u32 stack[0xFF];
u32 stack_idx = 1;
stack[0] = MEM_BAD_IDX;
*amount_free = 0;
*largest_block = 0;
*num_free_blocks = 0;
while ( idx != MEM_BAD_IDX )
{
// get the node pointer
p_node = p_manager->m_nodes + idx;
// add this node to the stats
(*num_free_blocks)++;
(*amount_free) += p_node->m_size;
if ((*largest_block) < p_node->m_size )
(*largest_block) = p_node->m_size;
// iterate to the next node.
if ( p_node->m_larger_idx != MEM_BAD_IDX )
stack[stack_idx++] = p_node->m_larger_idx;
if ( p_node->m_smaller_idx != MEM_BAD_IDX )
idx = p_node->m_smaller_idx;
else
idx = stack[--stack_idx];
}
// computed data was in blocks... translate it to bytes
(*amount_free) = ((*amount_free ) << 3 );
(*largest_block) = ((*largest_block) << 3 );
}