Memory Management

Memory managers (MMs) can be added to programs or systems to improve on the memory management scheme implemented by the target OS. Typical improvements include faster memory access and allocation, reduced memory fragmentation, or both. The implementation of dedicated MMs can range from a simple indirection scheme, such as a suballocator that resides between program and OS, to full-blown applications that completely take over the OS's role in memory management. This section discusses theories and techniques that can be used by MM software.

Suballocators

Suballocators are MMs that are placed between a piece of software and the operating system. Suballocators take their memory from, and (eventually) return their memory to, the OS. In between they perform MM tasks for the software.

Quick and Dirty MM

The simplest form of a suballocator MM is one that actually only provides memory. The following example demonstrates this:

// DIRTY MM: a bump allocator. Note that there is no bounds checking,
// no alignment guarantee, and no way to free individual blocks.
unsigned char membuf[512000];
unsigned char *memptr = membuf;

inline void *DirtyMalloc(long size)
{
    void *p = memptr;   // hand out the current position in the buffer
    memptr += size;     // and simply bump the pointer past it
    return p;
}

When an application requests memory from this providing function, a slice of a preallocated block is returned. Although not a memory manager to take seriously, this DirtyMalloc does have some interesting characteristics: It is incredibly fast (MMs do not come any faster than this) and it will not give you any fragmentation. The downside to this DirtyMalloc is that memory is not actually managed at all; instead, only the whole initially claimed block can be freed at once (on program termination). But still there is some use for this kind of DirtyMalloc. Imagine you have to build a gigantic list of tiny blocks, sort it, print it, and then exit. In that case, you do not actually need a full memory manager as all the memory is used only once and can be thrown away immediately after usage. This code will be several times faster than a nicely implemented program that uses dynamic allocators such as malloc or new.
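For situations in which even a quick-and-dirty MM must not walk off the end of its buffer, a slightly safer variant can add a bounds check and pointer alignment while keeping the bump-pointer speed. The sketch below is illustrative; the names SafeDirtyMalloc and memused are assumptions, not part of the original listing:

```cpp
#include <cstddef>

// Illustrative refinement of DirtyMalloc: the same bump-pointer scheme,
// but with a bounds check and suitably aligned results.
alignas(std::max_align_t) static unsigned char membuf[512000];
static std::size_t memused = 0;

void *SafeDirtyMalloc(std::size_t size)
{
    // Round the request up to a multiple of the strictest alignment so the
    // returned pointer is usable for any object type.
    const std::size_t align = alignof(std::max_align_t);
    size = (size + align - 1) & ~(align - 1);

    if (memused + size > sizeof(membuf))
        return 0;                       // pool exhausted: report failure

    void *p = membuf + memused;
    memused += size;
    return p;
}
```

The extra comparison and rounding cost a few cycles per call, but the allocator still does no bookkeeping per block and still cannot free individual allocations.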

The Freelist MM

Dynamic memory is memory that your program can allocate and release at any time while it is running. It is allocated through the OS from a portion of the system memory resources called the heap. To obtain dynamic memory, your program has to invoke OS calls, and when that memory is no longer needed, it is released through OS calls again. When your program frequently uses dynamic memory allocation, these OS calls cause a noticeable slowdown. This is why it is interesting to consider not releasing all memory back to the OS immediately, but rather keeping some or all of it on hand within your program for later use. A list, which we will refer to as the freelist, keeps track of the free memory blocks. This technique is particularly effective when memory blocks of the same size are allocated and released in quick succession. Think of the use of structures as elements of a list or array: memory that was used to house a deleted list element can be used again when a new list element needs to be created. This time, however, no OS calls are needed.

Listing 9.1 presents the functionality you need to create a freelist. The functionality is presented in the form of a base class so it can be explained in a small separate listing and used in further listings in this section. The freelist is not exactly a normal base class though. Later in this section you will see why.

Code Listing 9.1. Freelist Basic Functionality
#include <stdio.h>
#include <stdlib.h>

//Base class definition.
class FreeListBase
{
     static FreeListBase* freelist;       // pointer to free list

     FreeListBase* next;

public:
    FreeListBase()  {}
    ~FreeListBase() {}

    inline void* operator new(size_t);     // overloaded new()
    inline void operator delete(void*);    // overloaded delete()
};

inline void* FreeListBase::operator new(size_t sz)
{
  if (freelist)
  {
         // get memory from the freelist if it is not empty.
         FreeListBase* p = freelist;
         freelist = freelist->next;
         return p;
  }
  return malloc(sz);  // call malloc() otherwise
}

inline void FreeListBase::operator delete(void* vp)
{
    FreeListBase* p = (FreeListBase*)vp;
    // link released memory into the freelist.
    p->next = freelist;
    freelist = p;
}
// Set the freelist pointer to NULL.
FreeListBase* FreeListBase::freelist = NULL;

The class FreeListBase overloads two important C++ operators, namely new and delete. Because FreeListBase has its own implementation of these operators, the standard new and delete operators are not invoked when you create or destroy FreeListBase objects. So, for example, when you do the following:

FreeListBase *pointerToAFreeListBase = new FreeListBase;
delete pointerToAFreeListBase;

the new and delete implementations of FreeListBase itself are invoked. You can easily test this by single stepping through this sample with a debugger. This setup provides you with a hook to do whatever you want at the moment a new instance of your class is created. The FreeListBase takes this opportunity to do two things:

  • When a FreeListBase object (or an object derived from a FreeListBase) is deleted, a pointer to its memory address is placed at the front of the freelist. The pointer freelist always points to the first piece of released memory in the freelist. A new piece of memory that is added to the list will have its next pointer pointing to the object that was previously at the front of the list.

  • When a new FreeListBase object is created, the freelist is consulted. If the freelist is not empty, a pointer to the piece of memory at the front of the freelist is returned, and the pointer freelist is set to point at the next piece of memory in the list. If, however, the freelist is empty, memory is allocated through the standard library call malloc, which in turn obtains it from the OS. The reason you want to use malloc here is that you need a piece of memory of a certain size (as provided by the parameter given in the call to new). This way any class derived from FreeListBase (which will have a different size) still receives the correct amount of memory. In fact, the standard new operator itself is typically also implemented on top of malloc functionality.

For this to work, however, the freelist pointer has to be set to NULL at some time before it is used, to indicate that the list is still empty. As the same freelist pointer should be accessible for all—future—instances of FreeListBase, it is a static member of the FreeListBase class. It therefore needs to be set globally—that is, outside the member functions of the class. This is what is done in the last line of Listing 9.1.

FreeListBase* FreeListBase::freelist = NULL;

Note that the constructor of FreeListBase is still called when a new FreeListBase object is created. You cannot use the constructor to implement the freelist functionality, however, as it is invoked after memory allocation for the new object has taken place.

Now let's start using FreeListBase and compare its speed to that of using only conventional OS calls.

Listing 9.2 shows two classes with which it is possible to store information on different temperatures. One class uses the FreeListBase as a base class; the other does not.

Code Listing 9.2. Using the FreeListBase
class TemperatureUsingFreelist : public FreeListBase
{
public:
    int     ID;
    int     maxTemp;
    int     minTemp;
    int     currentTemp;

    int average() { return (maxTemp + minTemp)/2;}
} ;

class TemperatureNotUsingFreelist
{
public:
    int     ID;
    int     maxTemp;
    int     minTemp;
    int     currentTemp;

    int average() { return (maxTemp + minTemp)/2;}
} ;

When an instance of TemperatureUsingFreelist is deleted, its freelist will remember the memory address of the instance and save it for future use. When an instance of TemperatureNotUsingFreelist is deleted, however, its memory is returned to the system. A newly created TemperatureNotUsingFreelist instance will therefore claim memory again through the standard new operator.

You can time both techniques (freelist memory management versus OS memory management) with two loops. One loop repeatedly creates and destroys TemperatureUsingFreelist instances; the other repeatedly creates and destroys TemperatureNotUsingFreelist instances. You can use these loops in the timing setup introduced in Chapter 5, "Measuring Time and Complexity." Here is an example loop:

for (int i = 1 ; i < 1000000; i++)
{
   TemperatureUsingFreelist*  t1;     // or TemperatureNotUsingFreelist

   t1 = new TemperatureUsingFreelist; // or TemperatureNotUsingFreelist
   delete t1;
   t1 = new TemperatureUsingFreelist; // or TemperatureNotUsingFreelist
   delete t1;
}

Although the exact timing results will differ per system, the relation between timing the above loop with TemperatureUsingFreelist instances and timing it with TemperatureNotUsingFreelist instances should look something like this:

Using freelist      160
Not using freelist  990

This shows that using a freelist is significantly faster when memory is actively reused. Timing results will be equal for both classes when no instances are deleted within the timing loop as the freelist is not used in that case.

As you know, the freelist functionality was placed in a base class to explain it clearly in a separate listing. When you use the freelist technique in practice, however, it is best to incorporate the functionality directly into the class that uses it. This is important because the FreeListBase class contains a static pointer to the available memory blocks, and this same pointer is shared by all classes derived from FreeListBase. When you inherit from FreeListBase, the freelist can at some point contain blocks of different sizes, placed there by different classes. Clearly this is undesirable, as the new operator simply returns the first available block regardless of its size.

When you do decide you want a general freelist that is shared between different classes, simply augment the FreeListBase with a size field (set in the delete operator) and a size check (done in the new operator). Note, however, that the extra set and check add some overhead to your FreeListBase. This means you have a tradeoff between speed (using a freelist per class is fastest) and footprint (using a shared freelist makes the code smaller).
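Such a shared, size-aware freelist might be sketched as follows. The function names and node layout are illustrative assumptions, not part of the listings above; only blocks whose recorded size exactly matches the request are reused:

```cpp
#include <cstdlib>
#include <cstddef>

// Hypothetical shared freelist: each released block remembers its size, so
// classes of different sizes can safely share one list.
struct SharedFreeNode
{
    SharedFreeNode *next;
    std::size_t     size;
};

static SharedFreeNode *sharedFreelist = NULL;

void *SharedAlloc(std::size_t sz)
{
    // Scan the list for a block of exactly the requested size.
    SharedFreeNode **link = &sharedFreelist;
    while (*link)
    {
        if ((*link)->size == sz)
        {
            SharedFreeNode *hit = *link;
            *link = hit->next;          // unlink the block and reuse it
            return hit;
        }
        link = &(*link)->next;
    }
    // Nothing suitable on the list: fall back to malloc. Allocate at least
    // enough room for the bookkeeping node the block needs when freed.
    return malloc(sz < sizeof(SharedFreeNode) ? sizeof(SharedFreeNode) : sz);
}

void SharedFree(void *vp, std::size_t sz)
{
    // Link the released block into the shared list, remembering its size.
    SharedFreeNode *p = (SharedFreeNode *)vp;
    p->size = sz;
    p->next = sharedFreelist;
    sharedFreelist = p;
}
```

Note how the size check turns allocation into a linear scan; this is exactly the speed cost of sharing one list that the tradeoff above refers to.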

Similarly, you could decide to expand the delete operator to be more intelligent. It could, for example, start to release memory back to the OS when the list reaches a certain size.
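One possible sketch of such a smarter delete operator caps the freelist length and returns surplus blocks to the system. The cap value, the counter, and the class name are illustrative additions, not from the original listing:

```cpp
#include <cstdlib>
#include <cstddef>

// Illustrative freelist whose delete operator stops hoarding memory once
// the list holds MAX_FREE blocks; further blocks go straight back via free().
class CappedFreeList
{
    static CappedFreeList *freelist;
    static int             freeCount;
    enum { MAX_FREE = 64 };             // keep at most 64 blocks on hand

    CappedFreeList *next;

public:
    void *operator new(std::size_t sz)
    {
        if (freelist)
        {
            CappedFreeList *p = freelist;
            freelist = freelist->next;
            --freeCount;
            return p;
        }
        return malloc(sz);
    }

    void operator delete(void *vp)
    {
        if (freeCount >= MAX_FREE)
        {
            free(vp);                   // list is full: give it back
            return;
        }
        CappedFreeList *p = (CappedFreeList *)vp;
        p->next = freelist;
        freelist = p;
        ++freeCount;
    }
};

CappedFreeList *CappedFreeList::freelist = NULL;
int             CappedFreeList::freeCount = 0;
```

The cap bounds the footprint of the freelist at the cost of an extra counter update per allocation and release.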

An example of a freelist can be found in the companion file 09Source01.cpp.

Simple Stack Memory Management

This section shows how a simple-but-effective MM scheme can be integrated into a class which provides stack functionality. Listing 9.3 shows a simple implementation of a stack class that relies on the OS for memory management.

Code Listing 9.3. Stack Without Dedicated Memory Management
#include <string.h>     // for strlen(), strcpy()
#include <iostream.h>   // for cout; on modern compilers use <iostream> and std::

#define MAXSIZE 100000

class Stack
{
 struct elem
 {
  elem  *previous;
  char  *name;
  int   size;
  int   id;
 } ;


public:
    Stack() {  last = NULL; totalSize = 0;}

    // store { name,id}
    void push(const char *s, const int nr);

    // retrieve { name,id}
    void pop(char *s, int &nr);

private:
    elem    *last;
    int     totalSize;
} ;


inline void Stack::push(const char *s, const int nr)
{  // add new item to the top of the stack

    int newSize = strlen(s) + 1;
    if (totalSize + newSize > MAXSIZE)
    {
        cout << "Error, Stack Overflow!!" << endl;
    }
    else
    {

        elem *newElem = new elem;
        newElem->name = new char [newSize];

        strcpy(newElem->name, s);
        newElem->id         = nr;
        newElem->previous   = last;
        newElem->size       = newSize;

        last                = newElem;
        totalSize           += newSize;
    }
}

inline void Stack::pop(char *s, int &nr)
{  // return item from the top of the stack and free it
    if (last != NULL)
    {
        elem *popped = last;

        strcpy(s, popped->name);
        nr  = popped->id;
        last    = popped->previous;

        totalSize -= popped->size;
        delete [] popped->name;
        delete popped;

    }
    else
    {
        cout << "Error, Stack Underflow!!" << endl;
    }
}

This stack class can be used for storing strings and corresponding IDs. Storage is done in a last-in first-out (LIFO) fashion. Usage can be as follows:

Stack            q;
char             name[NAME_SIZE];
int              id;

q.push("Myname", 123);  // Push name + ID on stack
q.pop(name, id);        // Pop name + ID from the stack.

When this class is used intensely, memory will slowly start to fragment as small blocks of memory are allocated and released repeatedly. Even though it might seem that the memory blocks used by this stack are found in a continuous block of memory because the class behaves as a stack, this is not the case. Typically, different parts of a program will call this kind of storage class at different times. In between, other classes will also use memory resources, which makes it extremely unlikely that allocated memory will be found in a continuous block.

BigChunkStack MM

As the fragmentation caused by the stack class of the previous section is a direct result of the dynamic nature in which it deals with memory requests, the simplest solution for combating fragmentation in this case is to allocate a large chunk of memory immediately and divide this according to incoming requests. Listing 9.4 shows a class that does just that.

Code Listing 9.4. Stack with Dedicated Memory Management
#include <stdio.h>
#include <string.h>
#include <iostream.h>   // on modern compilers use <iostream> and std::

#define MAXSIZE 100000


class BigChunkStack
{
 struct elem
 {
  int   id;
  int   previousElemSize;
  int   nameSize;
  char  name;   // first byte of the name; the rest of the string
                // is stored directly behind it in the pool
 };


public:
    BigChunkStack()
    {  totalSize = 0;
      emptyElemSize = sizeof(elem); lastElemSize = 0;}

    // store { name,id}
    void push(const char *s, const int nr);

    // retrieve { name,id}
    void pop(char *s, int &nr);

private:
    char    pool[MAXSIZE];
    int     totalSize;
    int     emptyElemSize;

    int     lastElemSize;
} ;


inline void BigChunkStack::push(const char *s, const int nr)
{  // add new item to the top of the stack

    int newStringSize   = strlen(s) + 1;
    int totalElemSize   = newStringSize + emptyElemSize;

    if (totalSize + totalElemSize > MAXSIZE)
    {
        cout << "Error, Stack Overflow!!" << endl;
    }
    else
    {

        elem *newElem       = (elem*)  (pool + totalSize);

        newElem->id         = nr;
        newElem->nameSize   = newStringSize;
        newElem->previousElemSize = lastElemSize;
        strcpy(&newElem->name, s);

        lastElemSize = totalElemSize;
        totalSize  += totalElemSize;
    }
}


inline void BigChunkStack::pop(char *s, int &nr)
{  // return item from the top of the stack and free it
    if (totalSize != 0)
    {
        totalSize       -= lastElemSize;
        elem *popped    = (elem*) (pool + totalSize);
        lastElemSize    = popped->previousElemSize;

        strcpy(s, &popped->name);
        nr      = popped->id;

    }
    else
    {
        cout << "Error, Stack Underflow!!"  << endl;
    }
}

The BigChunkStack class can be used in exactly the same manner as the Stack class in Listing 9.3. This means that for the user of the class, the dedicated memory management scheme of BigChunkStack is completely hidden, which is desirable.

The dynamic behavior of the BigChunkStack is very different from that of the Stack class. At creation time, an instance of BigChunkStack will contain a block of memory which it calls the pool and which has the size MAXSIZE. This memory pool will be used to contain stack elements that are pushed onto the BigChunkStack. The pool will not fragment and, when an instance of BigChunkStack is destroyed, the pool will be returned to the OS in one piece.

At first glance this technique might seem somewhat wasteful because of the large piece of memory that is in constant use. You should keep in mind, however, that the fragmented memory caused by the use of the Stack class will not be very useful after a while in any case. When a program has a runtime requirement that it should always be able to place a certain number of elements onto a stack, the usage of BigChunkStack will help guarantee this. This is especially interesting when a stack has a life span equal to, or close to, that of the program using it. This kind of MM system even makes it possible to calculate beforehand the minimal amount of memory that will be needed for a certain program to run correctly for an indefinite period of time.

A downside to the BigChunkStack is that the pool size must be known at compile time or, with some minor changes to the code, at the time the BigChunkStack is created. For certain system requirements this need not be a problem, and it might even be advantageous. Later in this chapter, in the section "Resizable Data Structures," you will see how to make these kinds of implementations more dynamic. Chapter 12, "Optimizing IO," contains examples of combining allocation of large memory blocks with maximizing IO throughput. The remainder of this chapter focuses on more MM theory.
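The "minor changes" mentioned above might look something like the following sketch, in which the pool size becomes a constructor argument allocated once with new[]. The class and member names are illustrative, and push() is simplified to store strings only:

```cpp
#include <cstring>

// Hypothetical runtime-sized variant of the BigChunkStack pool.
class DynamicChunkStack
{
    char *pool;
    int   poolSize;
    int   totalSize;

public:
    DynamicChunkStack(int size)
        : pool(new char[size]), poolSize(size), totalSize(0) {}

    ~DynamicChunkStack() { delete [] pool; }   // pool returned in one piece

    bool push(const char *s)
    {
        int len = (int)strlen(s) + 1;          // string plus terminator
        if (totalSize + len > poolSize)
            return false;                      // would overflow the pool
        memcpy(pool + totalSize, s, len);
        totalSize += len;
        return true;
    }

    int used() const { return totalSize; }
};
```

The pool is still claimed in one piece and returned in one piece; only the moment at which its size is fixed has moved from compile time to construction time.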

Memory Management Theory

Volumes have been written on theories of memory management. The proposed techniques range from subtle variations on a theme to implementations with completely different underlying concepts. As this book does not have memory management as its main topic but rather looks at it from the viewpoint of performance and footprint, this section discusses some of the main memory management concepts from a practical angle.

When creating an MM, design decisions need to be made concerning the following functional areas:

  • How to acquire and store initial memory to be managed

  • How to select memory for allocation requests

  • How to sort/merge released memory

How to Acquire and Store Initial Memory to Be Managed

In this chapter you have already seen two ways of storing memory to be managed by an MM: the freelist example used a dynamic list of free memory blocks which could be used to accommodate memory requests and the BigChunkStack used a large block of memory and broke it into smaller pieces to accommodate memory requests. Many variations on these concepts are of course possible. A combination of both concepts seems to be an ideal starting point for your MM. But whether your MM allocates all the memory it will use at the time of its initialization or claims more and more memory from the OS as it is needed, your MM will somehow have to keep track of the blocks of free memory that it can use. The remainder of this chapter refers to this list of blocks of free memory as the MM's freelist.

More will be said about memory block size considerations in the following section as this is really a topic in its own right.

How to Select Memory for Allocation Requests

After choosing schemes for storing free memory blocks and claiming initial memory for the manager, you are ready to decide how your MM will respond to memory allocation requests. When memory is requested from your MM, it has to choose which block to use from those it manages. Here are some block selection methods you might find interesting.

Best Fit

The Best Fit method entails searching for the smallest free block that is large enough to accommodate a certain memory request. The following example demonstrates the Best Fit method for a given MM with four memory blocks. The MM in this example is called on to accommodate two consecutive memory requests:

Initial Free Blocks:       A(160KB), B(70KB), C(99KB), D(120KB)

Request 1)
block size:            100KB
Best Fit:              D(120KB)

New Free Blocks:          A(160KB), B(70KB), C(99KB), D'(20KB)

Request 2)
block size:            130KB
Best Fit:              A(160KB)

New Free Blocks:          A'(30KB), B(70KB), C(99KB), D'(20KB)

At the start of the preceding example an MM has the free blocks of memory A through D. The sizes of the various blocks can be found after the block name between parentheses. The first request made to the example MM is for a block of 100KB. Only two blocks of the MM are actually large enough to accommodate this request: A(160KB) and D(120KB). The Best Fit method dictates that block D is used, as its size is closest to that of the requested memory block. After Request 1 has been dealt with, the MM contains a new block called D'. Block D' represents what is left of the original Block D after Request 1 has been serviced. The size of Block D' equals 120KB–100KB = 20KB.

The second request made to the example MM is for a block of 130KB. This time Block A is chosen, leaving a Block A' of 160KB–130KB = 30KB.

The idea behind the Best Fit method is that when a free block of exactly the right size can be found, it is selected and no further fragmentation is incurred. In all other cases, the amount of memory that is left over from a selected block is as small as possible. This means that in using the Best Fit method we ensure that large blocks of free memory are reserved as much as possible for large memory requests. This principle is demonstrated in the example; if Block A had been chosen to accommodate the first request, the MM would not have had a block large enough to accommodate the second request.

The danger with the Best Fit method is, of course, that the blocks left over after accommodating a memory request (like Blocks D' and A') are too small to be used again.
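The Best Fit selection described above can be sketched as a simple scan. The Block structure and the array representation are assumptions made for illustration; a real MM would walk its freelist:

```cpp
#include <cstddef>

// Illustrative Best Fit selection over an array of free blocks.
struct Block { std::size_t size; };

// Return the index of the smallest block large enough, or -1 if none fits.
int bestFit(const Block *blocks, int count, std::size_t request)
{
    int best = -1;
    for (int i = 0; i < count; ++i)
    {
        if (blocks[i].size < request)
            continue;                          // too small, skip it
        if (best == -1 || blocks[i].size < blocks[best].size)
            best = i;                          // smallest fit found so far
    }
    return best;
}
```

Run against the block sizes of the example above (160, 70, 99, 120), a 100KB request selects the 120KB block and a 130KB request selects the 160KB block, exactly as in the worked example.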

Worst Fit

Unsurprisingly, the Worst Fit method does the opposite of the Best Fit method. Worst Fit entails always using the largest free block available to accommodate a memory request. The following example demonstrates the Worst Fit method for an MM. The example MM has four memory blocks and it is called on to accommodate two consecutive memory requests:

Initial Free Blocks:       A(160KB), B(70KB), C(99KB), D(120KB)

Request 1)
block size:            70KB
Worst Fit:             A(160KB)

New Free Blocks:          A'(90KB), B(70KB), C(99KB), D(120KB)

Request 2)
block size:            40KB
Worst Fit:             D(120KB)

New Free Blocks:          A'(90KB), B(70KB), C(99KB), D'(80KB)

At the start of the preceding example the MM has free memory blocks A through D. Again the sizes of the various blocks can be found after the block name between parentheses. The first request made to the example MM is for a block of 70KB. The largest block available to the MM is A(160KB). The Worst Fit method dictates that this block should be chosen. After Request 1 has been dealt with, the MM has an Element A' that is 160KB–70KB or 90KB in size.

The second request made to the MM is for a block of 40KB. This time Block D is the largest block available, leaving a Block D' of 120KB–40KB or 80KB in size.

The idea behind the Worst Fit method is that when a request has been accommodated, the amount of memory that is left over from a selected block is as large as possible. This means that in using the Worst Fit method we try to ensure that leftover blocks are likely to be usable in the future.

The danger with the Worst Fit method is that after a while no block in the freelist will be large enough to accommodate a request for a large piece of memory.

First Fit

The First Fit method entails finding the first free block that is large enough to accommodate a certain memory request. This method allows memory requests to be dealt with expediently. As opposed to Best Fit and Worst Fit, not all the blocks managed by the MM need to be examined during block selection. What exactly happens as far as fragmentation is concerned depends on how the elements are ordered in the freelist.
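A First Fit scan might be sketched as follows; again the block representation is an assumption made for illustration. Note how the loop returns as soon as a fit is found, which is why on average far fewer blocks are examined than with Best Fit or Worst Fit:

```cpp
#include <cstddef>

// Illustrative First Fit selection over an array of free blocks.
struct FreeBlock { std::size_t size; };

// Return the index of the first block large enough, or -1 if none fits.
int firstFit(const FreeBlock *blocks, int count, std::size_t request)
{
    for (int i = 0; i < count; ++i)
        if (blocks[i].size >= request)
            return i;                          // first fit found: stop here
    return -1;                                 // nothing fits
}
```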

How to Sort/Merge Released Memory

When a block of memory is released and thus returned to your MM, a reference to the released memory needs to be inserted somewhere in your MM's freelist. The method used to determine where to insert a released block into the freelist has a large impact on how your MM actually works. Consider an MM that orders the blocks of its freelist by size, with the largest block at the beginning of the list and the smallest at the end. When the First Fit method is used on a freelist ordered this way, the MM actually implements a Worst Fit algorithm because it will always select the largest available block. Similarly, when an MM orders the blocks of its freelist by size but with the smallest block first, the First Fit method actually translates into a Best Fit method. Clearly it pays to spend some time on the sorting mechanism to make sure your MM performs as optimally as possible.

Other sorting methods follow:

  • By memory address

    This method sorts the freelist entries on the value of the address of the first byte of each free block. Using this method is particularly interesting when you want to design an MM that is able to recombine fragmented memory back into larger blocks. Whenever a reference to a freshly released block is inserted in the freelist, a quick check can be done to determine whether the preceding and/or following block in the freelist connects to the new block. When this is the case, several freelist entries can be replaced by a new entry (with the address of the first byte of the first block and the size of the blocks combined).

  • By release time

    This method keeps the released blocks in chronological order. The algorithm that implements this method is very straightforward: References to released blocks are always inserted at the beginning of the freelist, or perhaps always at the end of the freelist. By placing new blocks at the beginning of the freelist, the MM can be very fast when blocks of memory of the same size are allocated and deleted in quick succession.
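The address-ordered insertion with merging described in the first bullet might be sketched as follows. The node layout and names are illustrative assumptions; for brevity the sketch merges only with the following neighbor, and a full implementation would also merge with the preceding block:

```cpp
#include <cstddef>

// Illustrative node for an address-ordered freelist.
struct FreeNode
{
    FreeNode   *next;   // next free block, in ascending address order
    std::size_t size;   // size of this free block, header included
};

static FreeNode *addrList = NULL;

void insertAndCoalesce(FreeNode *blk)
{
    // Find the insertion point that keeps the list address-ordered.
    FreeNode **link = &addrList;
    while (*link && *link < blk)
        link = &(*link)->next;

    blk->next = *link;
    *link = blk;

    // Merge with the following block if the two are contiguous in memory:
    // two freelist entries become one larger entry.
    if (blk->next && (char *)blk + blk->size == (char *)blk->next)
    {
        blk->size += blk->next->size;
        blk->next  = blk->next->next;
    }
    // A full implementation would also merge with the preceding block.
}
```

Releasing two adjacent 64-byte blocks through this routine leaves the freelist holding a single 128-byte entry, which is precisely the recombination effect the address ordering is meant to enable.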

In this section, you have seen several well-known techniques for MM implementation. No rule requires you to choose between these techniques. Depending on the requirements of your program, you may choose to mix several techniques together or create an entirely new technique that better suits your requirements. Another important thing to realize is that you can use as many, or as few, MMs in your programs as you see fit. You do not necessarily need to create one main MM to handle all memory requests. In fact, as you have seen in the Freelist and BigChunkStack examples, you could create a dedicated MM for every single class if that would help you. You might even decide to use OS memory management for most memory requests and write your own MMs only for specific (critical) classes.

In the summary of this chapter you can find a table with an overview of these methods.
