Memory Management

Garbage collection, and memory management in general, will be the first and last things you work on. It is the apparent source of the most obvious performance problems, those that are quickest to fix, and something you will need to monitor constantly to keep in check. I say “apparent source” because, as we will see, many problems are actually due to an incorrect understanding of the garbage collector’s behavior and expectations. You need to think about memory performance just as much as CPU performance. This is also true for unmanaged code performance, but in .NET it is a little more prominent, as well as easier to deal with. It is so fundamental to smooth .NET operation that the most significant chunk of this book’s content is dedicated to just this topic.

Many people get very nervous when they think of the overhead garbage collection can cause. Once you understand it, though, it becomes straightforward to optimize your program for its operation. In the Introduction, you saw that the garbage collector can actually give you better overall heap performance in many cases because it deals with allocation and fragmentation better. In many ways, .NET’s memory management strategy, including the garbage collector, can actually be a benefit to your application, not a drawback.

I am covering garbage collection at the beginning of the book because so many of the concepts that come later will relate back to this chapter. Understanding the effect your program has on the garbage collector is so fundamental to achieving good performance, that it affects nearly everything else.

Memory Allocation

There are significant differences between how typical native heaps work and how the CLR’s garbage collected heaps work. The native heap in Windows maintains free lists to know where to put new allocations. Many long-running native code applications struggle with fragmentation. Time spent in memory allocation gradually increases as the allocator spends more and more time traversing the free lists looking for an open spot. Memory use continues to grow and, inevitably, the process will need to be restarted to begin the cycle anew. Some native programs deal with this by replacing the default implementation of malloc with custom allocation schemes that work hard to reduce this fragmentation. Windows also provides low-fragmentation heaps, which the CLR uses internally.

In .NET, memory allocation is trivial because it usually happens at the end of a memory segment and, in the normal case, is not much more than a few instructions: additions, a subtraction, and a comparison. In these simple cases, there are no free lists to traverse and little possibility of fragmentation. GC heaps can actually be more efficient because objects allocated together in time tend to be near one another on the heap, improving locality.

In the default allocation path, a small code stub will check the desired object’s size against the space remaining in a small allocation buffer. As long as the allocation fits, it is extremely fast and has no contention. Once the allocation buffer is exhausted, the GC allocator will take over and find a spot for the object (this may involve the use of free lists). Then a new allocation buffer will be reserved for future allocation requests.

The assembly code for this process is only a handful of instructions and is worth examining.

The C# to demonstrate this is just a simple allocation:

class MyObject
{
  int x;
  int y;
  int z;
}

static void Main(string[] args)
{
  var x = new MyObject();
}

Here is the breakdown of the calling code for the allocation:

; Copy method table pointer for the class into 
; ecx as argument to new()
; You can use !dumpmt to examine this value.
mov   ecx,3F3838h 

; Call new
call  003e2100

; Copy return value (address of object) into a register
mov   edi,eax

Here is the actual allocation:

; NOTE: Most code addresses removed for formatting reasons.
;
; Set eax to value 0x14, the size of the object to
; allocate, which comes from the method table
mov   eax,dword ptr [ecx+4] ds:002b:003f383c=00000014

; Put allocation buffer information into edx
mov   edx,dword ptr fs:[0E30h]

; edx+40 contains the address of the next available byte 
; for allocation. Add that value to the desired size.
add   eax,dword ptr [edx+40h]

; Compare the intended allocation against the 
; end of the allocation buffer.
cmp   eax,dword ptr [edx+44h]

; If we spill over the allocation buffer, 
; jump to the slow path
ja    003e211b

; update the pointer to the next free 
; byte (0x14 bytes past old value)
mov   dword ptr [edx+40h],eax

; Subtract the object size from the pointer to 
; get to the start of the new obj
sub   eax,dword ptr [ecx+4]

; Put the method table pointer into the
; first 4 bytes of the object. 
; eax now points to new object
mov   dword ptr [eax],ecx

; Return to caller
ret

; Slow Path - call into CLR method
003e211b  jmp clr!JIT_New (71763534)

In summary, this involves one direct method call and only nine instructions in the helper stub. That is hard to beat.

If you are using some configuration options such as server GC, then there is not even contention for the fast or the slow allocation path because there is a heap for every processor. .NET trades this simplicity in the allocation path for more complexity during de-allocation, but you do not have to deal with that complexity directly. You just need to learn how to optimize for it, which is what this chapter teaches.

There are some ways to force the allocator to go down the slow path, however. If the allocation buffer is not large enough or the end of the segment has been reached, then the slow path will be called. In addition, if the type being allocated has a finalizer, then the garbage collector needs to do more bookkeeping to track the object’s lifetime, so it too will use the slow path.

Garbage Collection Operation

The details of how the garbage collector makes decisions are continually being refined, especially as .NET becomes more prevalent in high-performance systems. The following explanation may contain details that will change in upcoming .NET versions, but the overall picture is unlikely to change drastically in the near future.

In a managed process, there are two types of heaps: unmanaged and managed. Unmanaged heaps are allocated with the VirtualAlloc Windows API and used by the operating system and CLR for unmanaged memory such as that for the Windows API, OS data structures, and even much of the CLR. The CLR allocates all managed .NET objects on the managed heap, also called the GC heap, because the objects on it are subject to garbage collection.

The managed heap is further divided into two types of heaps: the small object heap and the large object heap (LOH). Each one is assigned its own segments, which are blocks of memory belonging to that heap. Both the small object heap and the large object heap can have multiple segments assigned to them. The size of each segment can vary depending on your configuration and hardware platform.

Configuration                            32-bit Segment Size   64-bit Segment Size
Workstation GC                           16 MB                 256 MB
Server GC                                64 MB                 4 GB
Server GC with > 4 logical processors    32 MB                 2 GB
Server GC with > 8 logical processors    16 MB                 1 GB

The small object heap segments are further divided into generations. There are three generations, referenced casually as gen 0, gen 1, and gen 2. Gen 0 and gen 1 are always in the same segment, but gen 2 can span multiple segments, as can the large object heap. The segment that contains gen 0 and gen 1 is called the ephemeral segment.

Initial heap layout.

To start with, the small object heap is made up of one segment and the large object heap is another segment. Gen 2 and gen 1 start off at only a few bytes in size because they are empty so far.

Objects allocated on the small object heap pass through a lifetime process that needs some explanation. The CLR allocates all objects that are less than 85,000 bytes in size on the small object heap. They are always allocated in gen 0, usually at the end of the current used space. This is why allocations in .NET are extremely fast, as seen at the beginning of this chapter. If the fast allocation path fails, then the objects may be placed anywhere they can fit inside gen 0’s boundaries. If it will not fit in an existing spot, then the allocator will expand the current boundaries of gen 0 to accommodate the new object. This expansion occurs at the end of the used space towards the end of the segment. If this pushes past the end of the segment, it may trigger a garbage collection. The existing gen 1 space is untouched.

For small objects (less than 85,000 bytes), objects always begin their life in gen 0. As long as they are still alive, the GC will promote them to subsequent generations each time a collection happens. Garbage collections of gen 0 and gen 1 are sometimes called ephemeral collections.
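You can watch this promotion happen with the GC.GetGeneration method. A minimal sketch (the exact output can vary if a collection happens at an unexpected time, and debug builds may extend local lifetimes):

object obj = new object();
Console.WriteLine(GC.GetGeneration(obj)); // 0: freshly allocated

GC.Collect();
Console.WriteLine(GC.GetGeneration(obj)); // typically 1: survived one GC

GC.Collect();
Console.WriteLine(GC.GetGeneration(obj)); // typically 2: survived two GCs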

When a garbage collection occurs, a compaction may occur, in which case the GC physically moves the objects to a new location to free space in the segment. If no compaction occurs, the boundaries are merely redrawn.

Heap layout after garbage collection.

The individual objects have not moved, but the boundary lines have.

Compaction may occur in the collection of any generation and this is a relatively expensive process because the GC must fix up all of the references to those objects so they point to the new location, which may require pausing all managed threads. Because of this expense, the garbage collector will only do compaction when it is productive to do so, based on some internal metrics.

Once an object reaches gen 2, it remains there for the remainder of its lifetime. This does not mean that gen 2 grows forever—if the objects in gen 2 finally die off and an entire segment has no live objects, then the garbage collector can return the segment to the operating system or it can just hold on to it for future use. Process working set memory is not guaranteed to drop during a collection.

So what does alive mean? If the GC can reach the object from any of the known GC roots, following the graph of object references, then it is alive. Roots include your program’s static variables, thread stacks (which hold references to the local variables of all running methods), strong GC handles (such as pinned handles), and the finalizer queue. Note that you may have objects that no longer have any roots, but if those objects are in gen 2, then a gen 0 collection will not clean them up. They will have to wait for a full collection.

If gen 0 ever starts to fill up a segment and a collection cannot compact it enough, then the GC will allocate a new segment. The new segment will house a new gen 1 and gen 0 while the old segment is converted to gen 2. Everything from the old generation 0 becomes part of the new generation 1 and the old generation 1 is likewise promoted to generation 2 (which conveniently does not have to be copied).

Heap layout after more allocations and collections cause new segments to be allocated.

If gen 2 continues to grow, then it can span multiple segments. The LOH can also span multiple segments. Regardless of how many segments there are, generations 0 and 1 will always exist in the same segment. This knowledge of segments will come in handy later when we are trying to figure out which objects live where on the heap.

The large object heap obeys different rules. Any object that is at least 85,000 bytes in size is allocated on the LOH automatically and does not pass through the generational model—put another way, it is allocated directly in gen 2. The only types of objects that normally exceed this size are arrays and strings. For performance reasons, the LOH is not automatically compacted during collection and is thus easily susceptible to fragmentation. However, starting in .NET 4.5.1, you can compact it on-demand. Like gen 2, if memory in the LOH is no longer needed, then it can be reclaimed for other portions of the heap, but we will see later that ideally you do not want memory on the large object heap to be garbage collected at all.
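You can see the threshold in action with GC.GetGeneration, which reports LOH objects as gen 2. A small sketch (the 85,000-byte boundary applies to the total object size, including overhead, so the exact cutoff for an array length is a few bytes lower):

byte[] small = new byte[84000]; // below the threshold
byte[] large = new byte[85000]; // over the threshold

Console.WriteLine(GC.GetGeneration(small)); // 0: small object heap
Console.WriteLine(GC.GetGeneration(large)); // 2: large object heap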

In the LOH, the garbage collector always uses a free list to determine where to best place allocated objects. We will explore some techniques in this chapter to reduce fragmentation on this heap.

Note If you go poking around at the objects in the LOH in a debugger, you will notice that not only can the entire heap be smaller than 85,000 bytes in size, but that it can also have objects that are smaller than that size allocated on that heap. These objects are usually allocated by the CLR and you can ignore them.

A garbage collection runs for a specific generation and all generations below it. If it collects gen 1, it will also collect gen 0. If it collects gen 2, then all generations are collected, and the large object heap is collected as well. When a gen 0 or gen 1 collection occurs, the program is paused for the duration of the collection. For a gen 2 collection, portions of the collection can occur on a background thread, depending on the configuration options.

There are four phases to a garbage collection:

  1. Suspension: All managed threads in the application are forced to pause before a collection can occur. It is worth noting that suspension can only occur at certain safe points in code, like at a ret instruction. Native threads are not suspended and will keep running unless they transition into managed code, at which point they too will be suspended. If you have a lot of threads, a significant portion of garbage collection time can be spent just suspending threads.
  2. Mark: Starting from each root, the garbage collector follows every object reference and marks those objects as seen. Roots include thread stacks, pinned GC handles, and static objects.
  3. Compact: Reduce memory fragmentation by relocating objects to be next to each other and update all references to point to the new locations. This happens on the small object heap when needed and there is no way to control it. On the large object heap, compaction does not happen automatically at all, but you can instruct the garbage collector to compact it on-demand.
  4. Resume: The managed threads are allowed to resume.

The mark phase does not actually need to touch every object on the heap; it will only go through the target portion of the heap. For example, a gen 0 collection considers objects only from gen 0, a gen 1 collection will mark objects in both gen 0 and gen 1, and a gen 2, or full, collection will need to traverse every live object in the heap, making it potentially very expensive.

An additional wrinkle here is that an object in a higher generation may be a root for an object in a lower generation. To track objects across all generations, the GC uses a card table that summarizes the heap with an array of bits that each represent a heap range. The bit is set “dirty” on a memory write in the corresponding range. When a collection happens, the GC will also consider any objects located in a dirty region as roots. This enables the GC to traverse only a subset of objects in the higher generation and it is not as expensive as a full collection for that generation.

There are a couple of important consequences to the behavior described above.

First, the time it takes to do a garbage collection is almost entirely dependent on the number of live objects in the collected generation, not the number of objects you allocated. This means that if you allocate a tree of a million objects, as long as you cut off that root reference before the next GC, those million objects contribute nothing to the amount of time the GC takes.

Second, the frequency of a garbage collection is primarily determined by how much memory is allocated in a specific generation. Once that amount passes an internal threshold, a GC will happen for that generation. The threshold continually changes and the GC adapts to your process’s behavior. If doing a collection on a particular generation is productive (it promotes many objects), then it will happen more frequently, and the converse is true. Another trigger for GCs is the total available memory on a machine, independent of your application. If available memory drops below a certain threshold, garbage collection may happen more frequently in an attempt to reduce the overall heap size.

From this description, it may feel like garbage collections are out of your control. This could not be farther from the truth! It is usually possible to manipulate GC behavior by controlling your memory allocation patterns. Doing so requires an understanding of how the GC works, your allocation rate, how well you control object lifetimes, and what configuration options are available to you. Let’s take a closer look at those configuration options next.

Detailed Heap Layout

Now that we have seen how it works conceptually, examine these more detailed heap diagrams, drawn from a debugger session via the !eeheap -gc command.

Initial heap layout of a sample application with an ephemeral and a large object segment. The total heap size in the figure is about 258KB with about 4KB being used by the CLR directly. Gen 1 and gen 2 are only 12 bytes each.
After many small allocations and a large array, the ephemeral heap is unchanged—none of the allocations changed any boundaries. However, the LOH expanded from 17KB to over 400KB. Then a GC will happen.
After the collection, the large object heap is unchanged, but there are significant modifications to the ephemeral heap. Gen 1 grew significantly, and gen 0 shrunk correspondingly.

Configuration Options

The .NET Framework does not give you very many ways to configure the garbage collector out of the box. It is best to think of this as “less rope to hang yourself with.” For the most part, the garbage collector configures and tunes itself based on your hardware configuration, available resources, and application behavior. The few options that are provided control very high-level behaviors, and are mainly determined by the type of program you are developing.

Workstation vs. Server GC

The most important choice you have is whether to use workstation or server garbage collection.

Workstation GC is the default. In this mode, all GCs happen on the same thread that triggered the collection and run at the same priority. For simple apps, especially those that run on interactive workstations where many managed processes run, this makes the most sense. For computers with a single processor, this is the only option and trying to configure anything else will not have any effect.

Server GC creates a dedicated thread for each logical processor or core. These threads run at highest priority (THREAD_PRIORITY_HIGHEST), but are always kept in a suspended state until a GC is required. All garbage collections happen on these threads, not the application’s threads. After the GC, they sleep again.

In addition, the CLR creates a separate heap for each processor. Within each processor heap, there is a small object heap and a large object heap. From your application’s perspective, this is all logically the same heap—your code does not know which heap objects belong to and object references exist between all the heaps (they all share the same virtual address space).

Having multiple heaps gives a couple of advantages:

  • Garbage collection happens in parallel. Each GC thread collects one of the heaps. This can make garbage collection significantly faster than in workstation GC.
  • In some cases, allocations can happen faster, especially on the large object portion of the heap, where allocations are spread across all the heaps.

There are other internal differences as well, such as larger segment sizes, which can mean a longer time between garbage collections.

You configure server GC in the app.config file inside the <runtime> element:

<configuration>
   <runtime>
    <gcServer enabled="true"/>
   </runtime>
</configuration>

Should you use workstation or server GC? If your app is running on a multi-processor machine dedicated to just your application, then the choice is clear: server GC. It will provide the lowest latency collection in most situations. However, server GC also means a much higher working set, which brings you closer to physical memory limits. With more objects in memory, garbage collections may start taking longer, eating away at the advantage.

On the other hand, if you need to share the machine with multiple managed processes, the choice is not so clear. Server GC creates many high-priority threads and if multiple apps do that, they can all negatively affect one another with conflicting thread scheduling. In this case, it might be better to use workstation GC.

If you really want to use server GC in multiple applications on the same machine, another option is to affinitize the competing applications to specific processors. The CLR will create heaps only for the processors which are enabled for that application.

Whichever one you pick, most of the tips in this book apply to both types of collection.

Background GC

Background GC changes how the garbage collector performs gen 2 collections, allowing them to occur in the background while application threads keep executing. Gen 0 and gen 1 collections remain foreground GCs that block all application threads from executing.

Background GC works by having a dedicated thread for garbage collecting generation 2. For server GC there will be an additional thread per logical processor, in addition to the one already created for server GC in the first place. Yes, this means if you use server GC and background GC, you will have two threads per processor dedicated to GC, but this is not particularly concerning. It is not a big deal for processes to have many threads, especially when most of them are doing nothing most of the time. One thread is for foreground GC and runs at highest priority, but it is suspended most of the time. The thread for background GC runs at a lower priority concurrently with your application’s threads and will be suspended when the foreground GC threads become active, so that you do not have competing GC modes occurring simultaneously.

If you are using workstation GC, then background GC is always enabled. Starting with .NET 4.5, it is enabled on server GC by default, but you do have the ability to turn it off.

This configuration will turn off the background GC:

<configuration>
   <runtime>
     <gcConcurrent enabled="false"/>
   </runtime>
</configuration>

In practice, there should rarely be a reason to disable background GC. Disabling it will usually cause worse performance and more frequent foreground collections. If you want to prevent these background GC threads from ever taking CPU time from your application, and do not mind a potential increase in full, blocking GC latency or frequency, then you can turn this off, but you should measure the impact carefully.

Latency Modes

The garbage collector has a number of latency modes, most of them accessed via the GCSettings.LatencyMode property. The mode should rarely be changed, but the options can be useful at times.

Interactive is the default GC latency mode when concurrent garbage collection is enabled (which is on by default). This mode allows collections to run in the background.

Batch mode disables all concurrent garbage collection and forces collections to occur in a single batch. It is intrusive because it forces your program to stop completely during all GCs. It should not regularly be used, especially in programs with a user interface.

There are two low-latency modes you can use for a limited time. If you have periods of time that require critical performance, you can tell the GC not to perform expensive gen 2 collections.

  • LowLatency: For workstation GC only, it will suppress gen 2 collections.
  • SustainedLowLatency: For workstation and server GC, it will suppress full gen 2 collections, but it will allow background gen 2 collections. You must enable background GC for this option to take effect.

Both modes will greatly increase the size of the managed heap because compaction will not occur. If your process uses a lot of memory, you should avoid this feature.

Right before entering one of these modes, it is a good idea to force a last full GC by calling GC.Collect(2, GCCollectionMode.Forced). Once your code leaves this mode, do another GC.
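A minimal sketch of this pattern, using SustainedLowLatency (GCSettings lives in the System.Runtime namespace):

GC.Collect(2, GCCollectionMode.Forced);
GCLatencyMode oldMode = GCSettings.LatencyMode;
try
{
    GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    // ...latency-critical work...
}
finally
{
    GCSettings.LatencyMode = oldMode;
    GC.Collect(2, GCCollectionMode.Forced);
}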

You should never use either of the low-latency modes by default. They are designed for applications that must run without serious interruptions for a long time, but not 100% of the time. A good example is stock trading. During market hours, you do not want full garbage collections happening. When the market closes, you turn this mode off and perform full GCs until the market reopens.

Only turn on a low-latency mode if all of the following criteria apply:

  • The latency of a full garbage collection is never acceptable during normal operation.
  • The application’s memory usage is far lower than available memory. (If you want low-latency mode, then max out your physical memory.)
  • Your program can survive long enough until it turns off low-latency mode, restarts itself, or manually performs a full collection.

Finally, starting in .NET 4.6, you can declare regions where garbage collections are disallowed, using the NoGCRegion mode. This attempts to put the GC in a mode where it will not allow a GC to happen at all. This mode cannot be set via the LatencyMode property, however. Instead, you must use the TryStartNoGCRegion method.

There are some significant caveats:

  • You must specify your total expected memory allocation up front.
  • The amount you requested must be at most the size of an ephemeral segment. (See earlier in this chapter for a discussion of segment sizes.)
  • The CLR must be able to immediately allocate what you requested for both the ephemeral and the large object heap memory.

There are a number of overloads of TryStartNoGCRegion, but the following example demonstrates the one with all of the options:

bool success = GC.TryStartNoGCRegion(
    totalSize: 2000000, 
    lohSize: 1000000, 
    disallowFullBlockingGC: true);

if (success)
{
    try
    {
        // do allocations
    }
    finally
    {
        if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
        {
            GC.EndNoGCRegion();
        }
    }
}

The totalSize parameter is the total number of bytes that you expect to allocate in the region. The lohSize parameter indicates how many of those bytes you expect to go to the large object heap. The difference between totalSize and lohSize is the amount you expect to allocate on the ephemeral heap, and it must be at most the size of an ephemeral segment (segment sizes are listed earlier in this chapter). By default, if the CLR cannot allocate the requested memory, it will do a full blocking GC to attempt to free some space. The disallowFullBlockingGC parameter can disable this functionality.

You should only call EndNoGCRegion if the previous call to TryStartNoGCRegion succeeded. You cannot nest calls to TryStartNoGCRegion.

If your memory allocations go over the amount you reserved, the guarantee is no longer honored and a garbage collection could happen.

Note The low-latency or no-GC modes are not absolute guarantees. If the system is running low on memory and the garbage collector has the choice between doing a full collection or throwing an OutOfMemoryException, it will perform a full collection regardless of your mode setting.

Alternative latency modes are rarely used and you should think twice about using them because of the potential unintended consequences. If you think it is useful, perform careful measurement to make sure. Tweaking the latency mode may cause other performance problems such as having more ephemeral collections (gen 0 and 1) in an attempt to deal with the lack of full collections. You may just trade one set of problems for another.

Large Objects

By default, arrays are limited to both UInt32.MaxValue in number of elements and 2 GB in actual size. Using a configuration option, you can allow larger array sizes, but the maximum number of elements remains the same.

<configuration>  
  <runtime>  
    <gcAllowVeryLargeObjects enabled="true" />  
  </runtime>  
</configuration> 

This allows 64-bit processes to have arrays that span more than 2 GB in size. However:

  • The maximum number of elements is UInt32.MaxValue (4,294,967,295).
  • The maximum index of any dimension is 2,147,483,591 for single-byte element arrays, or 2,146,435,071 for other types.
  • The maximum size of other objects is unaffected.
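For example, with the option enabled in a 64-bit process, an allocation like the following becomes possible. This is just a sketch: without the configuration setting (or without enough memory), it will throw OutOfMemoryException.

// 300 million longs is about 2.4 GB, over the default 2 GB object
// size limit but well under the per-dimension element limit.
long[] big = new long[300000000];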

Advanced Options

Some GC options must be configured before the process starts because they are required during CLR initialization. In general, these settings will very rarely be necessary and you should strongly consider whether you need them.

These settings are configured via environment variables which are set on the command-line before you launch the process (which will receive a copy of the current environment).

Limit Heap Count

In server GC, there is a heap and at least one thread created for each processor. There may be times when you want to use fewer processors for GC, perhaps in tandem with changing the application’s processor affinity mask.

// Limit to using the first 16 processors.
// Requires: using System.Diagnostics;
Process currentProcess = Process.GetCurrentProcess();
long mask = (long)currentProcess.ProcessorAffinity;
mask &= 0xFFFF; 
currentProcess.ProcessorAffinity = (IntPtr)mask;

If the application is launched with processor affinity already applied, then server GC will automatically restrict the number of heaps and threads it creates for garbage collecting.

However, this limits the number of processors the application can use for general work as well. If you want your application to use all of the processors for its own work, but run GC on only a subset of those processors, you need to set the GCHeapCount variable, which was introduced to CoreCLR in mid-2016 and to the .NET Framework in version 4.7.

SET COMPLUS_GCHeapCount=<n>

This option is only valid when using server GC. Replace <n> with a number less than the number of logical processors in use.

You may want to use this if you need the benefits of server GC, but need to limit the amount of CPU used during GCs. Because server GCs run at a high priority, having a thread per core will stall all other processes on the machine. Usually, this is by design and there is an assumption that a server GC app “owns” the machine, but this option is there if you want to free up some processors. For example, you may have a 64-processor server and you want the parallelism and dedicated, fast GC threads that come with server GC, but 64 heaps may be overkill if you need to be more frugal and ensure other processes do not starve during GCs. In addition, you will lessen the amount of memory overhead if your total memory requirements are more modest.

Disable GC Thread Affinity

In normal circumstances, each server GC thread is affinitized to run on a specific logical processor. This means that during a GC, it is a virtual guarantee that the GC thread will take over the processor as the highest-priority running thread.

With the following setting, you can turn off affinitization, which will allow GC threads to run on any available processor. This will ensure that the server GC process will cooperate better with other processes.

SET COMPLUS_GCNoAffinitize=1

This setting is designed to work well with COMPLUS_GCHeapCount when you are improving the cooperation between your server GC application and other processes on the machine.

By turning this on, you are explicitly stating that you want more cooperation and less exclusivity. This means there is no chance that this setting improves your application’s performance, but it might improve your overall system performance.

Verify Heap

When optimizing code to achieve the highest levels of performance, it is unfortunately common to take shortcuts that can lead to bugs such as corrupting program state or even the heap structure itself. Heap corruption in .NET applications is almost always the result of buggy unmanaged code in the same process. However, it is still possible in managed-only applications and can indicate a bug within the CLR itself. When this happens, it can be extremely hard to debug because the crash will not happen at a deterministic place.

You can use the !VerifyHeap command to verify the heap within the debugger.

0:006> !VerifyHeap
object 04b05980: bad member 00000066 at 04B05984
Last good object: 04B057E4.
0:006> !do 04B057E4
Name:        System.Int32[]
MethodTable: 62281938
EEClass:     61e09600
Size:        412(0x19c) bytes
Array:       Rank 1, Number of elements 100, Type Int32
Fields:
None

Also, it can be tricky to deliberately get the heap into a state where problems manifest reliably. The heap can be in an in-between state while a GC is happening, so you need to take care to validate the heap only outside of a GC.

Thankfully, there is an easy way to do this outside of the debugger. You can turn on an option to cause the heap to be verified before and after every GC.

SET COMPLUS_HeapVerify=1

Turning on heap verification will cause performance to suffer as each GC will now force the heap to be validated, a process which will take longer depending on the size of your heap. If corruption is detected, an exception will be thrown and the process will be terminated.

Performance Tips

Reduce Allocation Rate

This almost goes without saying, but if you reduce the amount of memory you allocate, you reduce the pressure on the garbage collector. You can also reduce memory fragmentation and CPU usage. It can take some creativity to achieve this goal and it might conflict with other design goals.

Critically examine each object and ask yourself:

  • Do I really need this object at all?
  • Does it have fields that I can get rid of?
  • Can I reduce the size of arrays?
  • Can I reduce the size of primitives (Int64 to Int32, for example)?
  • Are some objects used only in rare circumstances and can therefore be initialized only when needed? (See the sketch after this list.)
  • Can I convert some classes to structs so they live on the stack, or as part of another object, and have no per-instance overhead?
  • Am I allocating a lot of memory, to use only a small portion of it?
  • Can I get this information in some other way?
  • Can I allocate memory up front?
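As one example of the lazy-initialization question above, Lazy<T> defers an expensive allocation until first use, so code paths that never need the object never pay for it. A minimal sketch, where ExpensiveReport is a hypothetical costly type:

class ExpensiveReport
{
  public string Summary { get; } = "...";
}

class ReportHolder
{
  private static readonly Lazy<ExpensiveReport> report =
    new Lazy<ExpensiveReport>(() => new ExpensiveReport());

  public static void PrintIfRequested(bool requested)
  {
    if (requested)
    {
      // The ExpensiveReport is allocated here, on first access only.
      Console.WriteLine(report.Value.Summary);
    }
  }
}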

Story In a server that handled user requests, we found out that one common type of request caused more memory to be allocated than the size of a heap segment. Since the CLR caps the maximum size of segments and gen 0 must exist in a single segment, we were guaranteed a GC on every single request. This is not a good spot to be in because there are few options besides reducing memory allocations.

The Most Important Rule

There is one fundamental rule for high-performance programming with regard to the garbage collector. In fact, the garbage collector was explicitly designed with this idea in mind:

Collect objects in gen 0 or not at all.

Put differently, you want objects to have an extremely short lifetime so that the garbage collector never touches them at all, or, if you cannot do that, they should go to gen 2 as fast as possible and stay there forever, never to be collected. This means maintaining references to long-lived objects forever. Often, it also means pooling reusable objects, especially anything on the large object heap.

Garbage collections get more expensive in each generation. You want to ensure there are many gen 0/1 collections and very few gen 2 collections. Even with background GC for gen 2, there is still a CPU cost that you would rather not pay, on a processor the rest of your program could be using.

Note You may have heard the myth that you should have 10 gen 0 collections for each gen 1 collection and 10 gen 1 collections for each gen 2 collection. This is not true. Just understand that you want to have lots of fast gen 0 collections and very few of the expensive gen 2 collections.

You want to avoid objects being promoted to gen 1 because those that are will tend to also be promoted to gen 2 in due course. Gen 1 is a sort of buffer before you get to gen 2.

Ideally, every object you allocate goes out of scope by the time the next gen 0 collection comes around. You can measure how long that interval is and compare it to the duration that data is alive in your application. See the end of the chapter for how to use tools to discover this information.
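GC.CollectionCount gives a cheap, in-process way to measure collection frequency. A minimal sketch, where RunWorkload is a hypothetical stand-in for your representative scenario:

int gen0 = GC.CollectionCount(0);
int gen1 = GC.CollectionCount(1);
int gen2 = GC.CollectionCount(2);

RunWorkload();

Console.WriteLine("Gen 0: {0}", GC.CollectionCount(0) - gen0);
Console.WriteLine("Gen 1: {0}", GC.CollectionCount(1) - gen1);
Console.WriteLine("Gen 2: {0}", GC.CollectionCount(2) - gen2);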

Obeying this rule requires a fundamental shift in your mindset if you are not used to it. It will inform nearly every aspect of your application, so get used to it early and think about it often.

Reduce Object Lifetime

The shorter an object’s lifetime, the less chance it has of being promoted to the next generation when a GC comes along. In general, you should not allocate objects until right before you need them. The exception would be when the cost of object creation is so high it makes sense to create them at an earlier point when it will not interfere with other processing.

On the other side of the object’s use, you want to make sure that objects go out of scope as soon as possible. For local variables, this can be after the last local usage, even before the end of the method. You can scope variables more narrowly using { } blocks, but this will probably not make a practical difference because the compiler will generally recognize when a local object is no longer used anyway. If your code spreads out operations on an object, try to reduce the time between the first and last uses so that the GC can collect the object as early as possible.

Rarely, you may find a need to explicitly null out a reference to a temporary object if it is a member or static field on a long-lived object. You would do this only if you want to prevent this object from being promoted by the garbage collector. First, try to change the design to make the reference a local variable where object life time is not as much an issue. If you decide to null out a field, this may make the code slightly more complicated because you will have more checks for null values scattered around. This can also create a tension between efficiency and always having full state available, particularly for debugging. One option to get around that problem is to convert the object you want to null out to another form. For example, serialize an XML document hierarchy to a string, or a temporary state object to a log message that can more efficiently record the state for debugging later. This technique is usually only necessary for large, temporary object graphs that are in fields for convenience purposes.
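A hypothetical sketch of that last technique, using types from System.Xml: the large document graph is converted to a compact string for later debugging, and the field is nulled so the graph can die young:

class RequestProcessor
{
  private XmlDocument tempDocument; // large temporary object graph
  private string documentLog;       // compact debugging record

  private void FinishRequest()
  {
    // Keep only what debugging actually needs...
    this.documentLog = this.tempDocument.OuterXml;
    // ...and drop the reference so the GC can reclaim the graph early.
    this.tempDocument = null;
  }
}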

Another way to manage this balance is to have variable behavior: run your program (or a specific portion of your program, say for a specific request) in a mode that does not null out references but keeps them around as long as possible for easier debugging.

Balance Allocations

As described at the beginning of this chapter, the GC works by following object references. In server GC, it does this on multiple threads at once. You want to exploit parallelism as much as possible, and if one thread hits a very long chain of nested objects, the entire garbage collection process will not finish until that long-running thread is complete. In addition, if a particular thread allocates more memory than others, it will trigger a GC more often than if the same allocations were spread across multiple heaps.

Thankfully, there are load-balancing algorithms. For allocations, when the GC detects that heaps are becoming unbalanced, it will start forcing allocations to occur on different heaps. This functionality has existed for the small object heap for many CLR versions, but balancing the large object heap has only happened since version 4.5. On the collection side, cores that run out of collection work can steal work from other heaps.

Problems with unbalanced heaps are less common now with these GC features, but if you suspect too-frequent or too-long GC pauses, it may be worth looking at your code for the presence of deep object trees or a thread bias in allocations.

If you do find that a single thread is responsible for most of the allocations, investigate ways to spread this responsibility around. Ensure that you are using Task objects or the thread pool to even out the possibility of different threads handling different requests. Avoid the pattern of a single thread processing a queue of requests and doing the bulk of allocations before handing off the work to other threads to finish processing.
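A sketch of the preferred shape, assuming requests is a collection and ProcessRequest is your hypothetical handler: let the thread pool spread the work (and therefore the allocations) across threads, rather than allocating everything on one thread up front.

// Each request is processed, and allocates, on a thread pool thread.
Parallel.ForEach(requests, request => ProcessRequest(request));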

Reduce References Between Objects

Objects that have many references to other objects will take more time for the garbage collector to traverse. A long GC pause time is often an indication of a large, complex object graph.

Another danger is that it becomes much harder to predict object lifetimes if you cannot easily determine all of the possible references to them. Reducing this complexity is a worthy goal just for sane code practices, but it also makes debugging and fixing performance problems easier.

Also, be aware that references between objects of different generations can cause inefficiencies in the garbage collector, specifically references from older objects to newer objects. For example, if an object in generation 2 has a reference to an object in generation 0, then every time a gen 0 GC occurs, a portion of gen 2 objects will also have to be scanned to see if they are still holding onto this reference to a generation 0 object. It is not as expensive as a full GC, but it is still unnecessary work if you can avoid it.

Avoid Pinning

Pinning an object fixes it in place so that the garbage collector cannot move it. Pinning exists so that you can safely pass managed memory references to unmanaged code. It is most commonly used to pass arrays or strings to unmanaged code, but is also used to gain direct fixed memory access to data structures or fields. If you are not doing interop with unmanaged code and you do not have any unsafe code, then you should not have the need to pin at all. However, even if you avoid explicit pinning in your own code, there are plenty of common APIs that need to do it anyway.

While the pinning operation itself is inexpensive, it throws a bit of a wrench into the garbage collector’s operation by increasing the likelihood of fragmentation. The garbage collector tracks those pinned objects so that it can use the free spaces between them, but if you have excessive pinning, it can still cause fragmentation and heap growth.

Pinning can be either explicit or implicit. Explicit pinning is performed either with a GCHandle of type GCHandleType.Pinned or with the fixed keyword, which must be inside code marked as unsafe. The difference between fixed and a handle is analogous to the difference between a using block and explicitly calling Dispose: fixed is more convenient, but cannot be used in asynchronous situations, whereas you can pass a handle around and free it in a callback.
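Here is a minimal sketch of both explicit mechanisms, where NativeWrite stands in for a hypothetical P/Invoke call that takes a pinned address:

byte[] buffer = new byte[256];

// fixed: convenient and scoped, but the pin cannot outlive the block.
unsafe
{
    fixed (byte* p = buffer)
    {
        NativeWrite((IntPtr)p, buffer.Length);
    }
}

// GCHandle: the pin can be passed around and freed later, such as in an
// asynchronous completion callback. Requires System.Runtime.InteropServices.
GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
try
{
    NativeWrite(handle.AddrOfPinnedObject(), buffer.Length);
}
finally
{
    handle.Free();
}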

Implicit pinning is more common, but can be harder to see and more difficult to remove. The most obvious source of pinning will be any objects passed to unmanaged code via Platform Invoke (P/Invoke). This is not just your own code—managed APIs that you call can, and often do, call native code, which will require pinning.

The CLR will also have pinned objects in its own data structures, but these should normally not be a concern.

Ideally, you should eliminate as much pinning as you can. If you cannot quite do that, follow the same rules for garbage collection: keep lifetime as short as possible. If objects are only pinned briefly then there is less chance for them to affect the next garbage collection. You also want to avoid having very many pinned objects at the same time. Pinning objects located in gen 2 or the LOH is generally fine because these objects are unlikely to move anyway. This can lead to a strategy of either allocating large buffers on the large object heap and giving out portions of them as needed, or allocating small buffers on the small object heap, but before pinning, ensure they are promoted to gen 2. This takes a bit of management on your part, but it can completely avoid the issue of having pinned buffers during a gen 0 GC.

Avoid Finalizers

Never implement a finalizer unless it is required. Finalizers are code that the garbage collector triggers to clean up unmanaged resources. They are called from a single thread, one after the other, and only after the garbage collector declares the object dead after a collection. This means that if your class has a finalizer, you are guaranteeing that its objects will stay in memory even after the collection that should have reclaimed them. There is also additional bookkeeping to be done on each GC because the finalizer list needs to be continually updated if objects are relocated. All of this combines to decrease overall GC efficiency and ensure that your program will dedicate more CPU resources to cleaning up your objects.

Not only that, but an object with a finalizer is slower to allocate. Instead of the “fast path” allocator, it must do extra bookkeeping to ensure the GC tracks the object for its lifetime.

If you do implement a finalizer, you must also implement the IDisposable interface to enable explicit cleanup, and call GC.SuppressFinalize(this) in the Dispose method to remove the object from the finalization queue. As long as you call Dispose before the next collection, it will clean up the object properly without the need for the finalizer to run. The following example correctly demonstrates this pattern. Note that you can (and often should) implement the Dispose pattern without implementing a finalizer.

class Foo : IDisposable
{
  private bool disposed = false;
  private IntPtr handle;
  private IDisposable managedResource;

  ~Foo()  // Finalizer
  {
    Dispose(false);
  }
  
  public void Dispose()
  {
    Dispose(true);
    GC.SuppressFinalize(this);
  }
  
  protected virtual void Dispose(bool disposing)
  {
    if (this.disposed)
    {
      return;
    }
    if (disposing)
    {
      // Not safe to do this from finalizer
      this.managedResource.Dispose();
    }

    // Cleanup unmanaged resources that are safe to 
    // do so in a finalizer
    UnsafeClose(this.handle);

    // If the base class is also IDisposable,
    // make sure you call base.Dispose(disposing);
    this.disposed = true;
  }  
}

All cleanup logic is centralized in the Dispose(bool) method. Everything else just calls it. The disposing variable indicates whether a developer explicitly called Dispose. If they did, then it is safe to Dispose of all resources. However, if this method is called via the finalizer, then there is no guarantee any referenced objects are still valid, so only those unmanaged resources explicitly owned by this object can be safely cleaned up in this method. In the context of a finalizer, very few assumptions can be made about the state of objects referenced by this object. The code must be simple and touch only memory guaranteed to belong only to this object and still be valid. Typically this means that you should not access any other finalizable object, or any other disposable object (unless you can guarantee its validity).

Only mark the protected version of Dispose virtual, allowing it to be overridden by child types. The disposed field tracks whether the object has already been disposed, allowing the Dispose method to be called more than once.

Dispose methods and finalizers should never throw exceptions. Should an exception occur during a finalizer’s execution, then the process will terminate. Finalizers should also be careful doing any kind of I/O, even as simple as logging.

Properly implementing this pattern is important to ensure that it works correctly with polymorphic types. You will have to exercise judgment on whether to implement finalizers on base types that themselves do not have unmanaged resources, but may have derived types that do have such resources. It may be required in some cases to take the performance hit for correctness, but this should be avoided if at all possible.

Any type that contains instances of other IDisposable types must itself implement IDisposable. In this way, IDisposable has a way of spreading through your data structures. Properly implemented, it should be easy to dispose of all the resources merely by calling the root IDisposable’s Dispose method.

Note You may have heard that finalizers are guaranteed to run. This is generally true, but not absolutely so. If a program is force-terminated then no more code runs and the process dies immediately. The finalizer thread is triggered by a garbage collection, so if there are no garbage collections, finalizers will not run. There is also a time limit to how long all of the finalizers are given on process shutdown. If your finalizer is at the end of the list, it may be skipped. Moreover, because finalizers execute sequentially, if another finalizer has an infinite loop bug in it, then no finalizers after it will ever run. This can lead to memory leaks. For all these reasons, you should not rely on finalizers to clean up state external to your process.

Avoid Large Object Allocations

Not all allocations go to the same heap. Objects over a certain size will go to the large object heap and immediately be in gen 2. The boundary for large object allocations was set at 85,000 bytes by doing a statistical analysis of programs of the day. Any object of that size or greater is judged to be “large” and it goes on a separate heap.

You want to avoid allocations on the large object heap as much as possible. Not only is collecting garbage from this heap more expensive, it is more likely to fragment, causing unbounded memory increases over time. Continuous allocations to the large object heap send a strong signal to the garbage collector to do continuous garbage collections—not a good place to be in.

To avoid these problems, you need to strictly control what your program allocates on the large object heap. What does go there should last for the lifetime of your program and be reused as necessary in a pooling scheme.

The large object heap does not automatically compact, but you may tell it to do so programmatically starting with .NET 4.5.1. However, you should use this only as a last resort, as it will cause a very long pause. Before explaining how to do that, the next few sections will explain how to avoid getting into that situation in the first place.
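The on-demand compaction itself is a two-step request. The compaction happens during the next full blocking collection, and the setting then reverts to its default:

GCSettings.LargeObjectHeapCompactionMode =
    GCLargeObjectHeapCompactionMode.CompactOnce;

// The next full blocking GC compacts the LOH. Expect a long pause.
GC.Collect();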

Avoid Copying Buffers

You should avoid copying data whenever you can. For example, suppose you have read file data into a MemoryStream (preferably a pooled one if you need large buffers). Once you have that memory allocated, treat it as read-only and have every component that needs it read from the same copy of the data.

A common requirement, then, is to refer to sub-ranges of a buffer, array, or memory range. .NET provides two ways to accomplish this at present.

The first option, available only for arrays, is the ArraySegment<T> struct to represent just a portion of the underlying array. This ArraySegment can be passed around to APIs independent of the original stream, and you can even attach a new MemoryStream to just that segment. Throughout all of this, no copy of the data has been made.

var memoryStream = new MemoryStream(2048);
var segment = new ArraySegment<byte>(memoryStream.GetBuffer(), 
                                     100, 
                                     1024);
...
var blockStream = new MemoryStream(segment.Array, 
                                   segment.Offset, 
                                   segment.Count);

The biggest problem with copying memory is not the CPU necessarily, but the GC. If you find yourself needing to copy a buffer, then try to copy it into another pooled or existing buffer to avoid any new memory allocations.
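For example, rather than allocating a destination array, copy into a buffer obtained from a pool (pool.GetBuffer here is a hypothetical API, similar in spirit to the pooling code at the end of this chapter):

byte[] destination = pool.GetBuffer(source.Length); // hypothetical pool
Buffer.BlockCopy(source, 0, destination, 0, source.Length);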

A newer option for representing pieces of existing buffers is the Span<T> struct. Span<T> is still in a pre-release phase at the time of this writing, but it will likely become finalized with the release of C# 7.2 or future upgrades to the runtime. To use this library, you will need to consume the System.Memory NuGet package and use Visual Studio 2017.

Span<T> is like an array in the sense that it represents a contiguous block of memory, but it has the distinction of being able to wrap managed memory, unmanaged memory, and stack memory with the same abstraction. For unmanaged memory, you can think of it as a smart wrapper that does pointer arithmetic for you.

The following examples of Span<T> come from the Span project in the accompanying sample code.

The first example creates a standard byte array on the managed heap and creates a span from a sub-portion of that array. (It could just as easily have spanned the entire array.)

{
...
    byte[] array = new byte[] {0, 1, 2, 3};
    Span<byte> byteSpan = new Span<byte>(array, 1, 2);
    PrintSpan(byteSpan);    
    ...
}

private static void PrintSpan<T>(Span<T> span)
{
    for (int i = 0; i < span.Length; i++)
    {
        ref T val = ref span[i];
        Console.Write(val);
        if (i < span.Length - 1) { Console.Write(", "); }
    }
    Console.WriteLine();
}

This produces the following output:

1, 2

This example uses a Span<T> to wrap a stack-allocated array:

unsafe
{
    int* stackMem = stackalloc int[4];
    Span<int> intSpan = new Span<int>(stackMem, 4);
    for (int i=0;i<intSpan.Length;i++)
    {
        intSpan[i] = 13 + i;
    }
    PrintSpan(intSpan);
}

As you can see, it uses the exact same semantics to wrap this array, and the same helper method can be used to print the values. Its output is:

13, 14, 15, 16

The next example is slightly more complex. When you allocate from the native heap, you specify the number of bytes you are allocating, but when you wrap unmanaged memory in a Span<T>, you are assigning types to that memory, so the length of the span is specified as a count of objects, not a count of bytes. This example accounts for that by multiplying the size of the objects we want by the count before we allocate.

unsafe
{
    const int ObjectCount = 4;
    int memSize = sizeof(int) * ObjectCount;
    IntPtr hNative = Marshal.AllocHGlobal(memSize);
    Span<int> unmanagedSpan = new Span<int>(hNative.ToPointer(), 
                                            ObjectCount);
    for (int i = 0; i < unmanagedSpan.Length; i++)
    {
        unmanagedSpan[i] = 100 + i;
    }
    PrintSpan(unmanagedSpan);
    Marshal.FreeHGlobal(hNative);
}

The output is:

100, 101, 102, 103

The final example makes use of one of the extension methods included in the library to convert a string into a ReadOnlySpan<char>. Unfortunately, there is no relationship between Span<T> and ReadOnlySpan<T> because Span<T> utilizes ref-return semantics to avoid copying values. That means we need a separate utility method to print the values.

{
...
    ReadOnlySpan<char> subString = 
      "NonAllocatingSubstring".AsSpan().Slice(13);
    PrintSpan(subString);
...
}

private static void PrintSpan<T>(ReadOnlySpan<T> span)
{
    for (int i = 0; i < span.Length; i++)
    {
        T val = span[i];
        Console.Write(val);
        if (i < span.Length - 1) { Console.Write(", "); }
    }
    Console.WriteLine();
}

The output of this code is:

S, u, b, s, t, r, i, n, g

There are also utility methods to convert from arrays and ArraySegment structs to Span<T> structs.

Pool Long-Lived and Large Objects

Remember the cardinal rule from earlier: Objects live very briefly or forever. They should either go away in gen 0 collections or last forever in gen 2. Some objects are essentially static—they are created and last the lifetime of the program naturally. Other objects do not obviously need to last forever, but their natural lifetime in the context of your program ensures they will live longer than the period of a gen 0 (and maybe gen 1) garbage collection. These types of objects are candidates for pooling. Another strong candidate for pooling is any object that you allocate on the large object heap, typically collections.

There is no single way to pool and there is no standard pooling API you can rely on. It really is up to you to develop a way that works for your application and the specific objects you need to pool.

One way to think about poolable objects is that you are turning a normally managed resource (memory) into something that you have to manage explicitly. .NET already has a pattern for dealing with finite managed resources: the IDisposable pattern. See earlier in this chapter for the proper implementation of this pattern. A reasonable design is to derive a new type and have it implement IDisposable, where the Dispose method puts the pooled object back in the pool. This will be a strong clue to users of that type that they need to treat this resource specially.

Implementing a good pooling strategy is not trivial and can depend entirely on how your program needs to use it, as well as what types of objects need to be pooled. Here is code that shows one example of a simple pooling class to give you some idea of what is involved. This code is from the PooledObjects sample program.

interface IPoolableObject : IDisposable
{
  int Size { get; }
  void Reset();
  void SetPoolManager(PoolManager poolManager);
}

class PoolManager
{
  private class Pool
  {
    public int PooledSize { get; set; }
    public int Count { get { return this.Stack.Count; } }
    public Stack<IPoolableObject> Stack { get; private set; }
    public Pool()
    {
      this.Stack = new Stack<IPoolableObject>();
    }

  }
  const int MaxSizePerType = 10 * (1 << 20); // 10 MB

  Dictionary<Type, Pool> pools = 
    new Dictionary<Type, Pool>();

  public int TotalCount
  {
    get
    {
      int sum = 0;
      foreach (var pool in this.pools.Values)
      {
        sum += pool.Count;
      }
      return sum;
    }
  }

  public T GetObject<T>() 
    where T : class, IPoolableObject, new()
  {
    Pool pool;
    T valueToReturn = null;
    if (pools.TryGetValue(typeof(T), out pool))
    {
      if (pool.Stack.Count > 0)
      {
        valueToReturn = pool.Stack.Pop() as T;
        // The object is leaving the pool, so its size no longer
        // counts against the pool's byte limit.
        pool.PooledSize -= valueToReturn.Size;
      }
    }
    if (valueToReturn == null)
    {
      valueToReturn = new T();
    }
    valueToReturn.SetPoolManager(this);

    return valueToReturn;
  }

  public void ReturnObject<T>(T value) 
    where T : class, IPoolableObject, new()
  {
    Pool pool;
    if (!pools.TryGetValue(typeof(T), out pool))
    {
      pool = new Pool();
      pools[typeof(T)] = pool;
    }

    if (value.Size + pool.PooledSize <= MaxSizePerType)
    {
      pool.PooledSize += value.Size;
      value.Reset();        
      pool.Stack.Push(value);
    }
  }
}

class MyObject : IPoolableObject
{
  private PoolManager poolManager;
  public byte[] Data { get; set; }
  public int UsableLength { get; set; }

  public int Size
  {
    get { return Data != null ? Data.Length : 0; }
  }

  void IPoolableObject.Reset()
  {
    UsableLength = 0;
  }

  void IPoolableObject.SetPoolManager(
    PoolManager poolManager)
  {
    this.poolManager = poolManager;
  }

  public void Dispose()
  {
    this.poolManager.ReturnObject(this);
  }
}
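
To make the intended usage concrete, here is a brief, hypothetical sketch: acquire an object from the pool and let Dispose return it.

var poolManager = new PoolManager();
using (var obj = poolManager.GetObject<MyObject>())
{
  // Pooled instances may already have storage; allocate on first use.
  if (obj.Data == null) { obj.Data = new byte[4096]; }
  obj.UsableLength = 100;
  // ...fill and read obj.Data up to obj.UsableLength...
} // Dispose returns the object to the pool here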

It may seem a burden to force pooled objects to implement a custom interface, but apart from convenience, this highlights a very important fact: in order to pool and reuse objects, you must be able to fully understand and control them. Your code must reset them to a known, safe state every time they go back into the pool. This means you should not naively pool third-party objects directly. By implementing your own objects with a custom interface, you are providing a very strong signal that the objects are special. You should be especially wary of pooling objects from the .NET Framework.

Pooling collections is particularly tricky because of their nature: you do not want to destroy the underlying data storage (that is the whole point of pooling, after all), but you must be able to signify an empty collection with available space. Thankfully, most collection types expose both a logical size (Count or Length) and a Capacity, which makes this distinction possible. Given the dangers of pooling the existing .NET collection types, it is better to implement your own collection types using the standard collection interfaces such as IList<T>, ICollection<T>, and others. See Chapter 6 for general guidance on creating your own collection types.
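
For example, List<T> makes this distinction cheap to exploit when resetting a pooled instance:

var list = new List<int>(1024);
list.Add(42);
// Clear() resets Count to 0 but leaves Capacity (and the underlying
// array) intact, which is exactly what a pooled collection needs.
list.Clear();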

An additional strategy is to have your poolable types implement a finalizer as a safety mechanism. If the finalizer runs, it means that Dispose was never called, which is a bug. You can choose to write something to the log, crash, or otherwise signal the problem. You must be very careful with this signaling, though, because touching memory that has been invalidated by the GC will cause a crash or hang.

Remember that a pool that never dumps objects is indistinguishable from a memory leak. Your pool should have a bounded size (in either bytes or number of objects), and once that has been exceeded, it should drop objects for the GC to clean up. Ideally, your pool is large enough to handle normal operations without dropping anything and the GC is only needed after brief spikes of unusual activity. Depending on the size and number of objects contained in your pool, dropping them may lead to long, full GCs. It is important to make sure your pool is tunable for your situation.

I do not usually reach for pooling as a default solution. As a general-purpose mechanism, it is clunky and error-prone, but you may find that your application benefits from pooling just a few types.

Case Study: RecyclableMemoryStream

I once worked on an application that managed federation to thousands of back-end network resources per second. Most of its work was reading bytes off the network or writing to it. Nearly 90% of all allocated memory was going towards MemoryStream objects that were being allocated and resized all over the place: string encoding, marshaling, unmarshaling, temporary buffers, and more. As a result, we were spending a phenomenal amount of time just doing GC—nearly 25% of all CPU time! Doing memory and CPU profiling quickly revealed the need for a better way to handle bytes than MemoryStream.

This section will discuss the design and some implementation details of a pooled MemoryStream class, called RecyclableMemoryStream. You can download the code at https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream or use it directly from Visual Studio with a NuGet package.

Our requirements for the replacement were:

  • Completely eliminate allocations to the large object heap
  • Spend less time in GC, especially with gen 2 GCs
  • Avoid memory leaks by bounding the pool size
  • Avoid memory fragmentation
  • Provide easy debuggability
  • Instrument for metrics and logging
  • Obey Dispose semantics
  • Be a drop-in replacement for MemoryStream, as much as possible
  • Be thread safe

These requirements were all met and led to the following list of features and implementation details:

  • Instead of pooling the streams themselves, the underlying buffers are pooled. This allows us to better instrument the streams and detect aberrant usage patterns such as stream reuse and leaking (especially important in pooling scenarios), as well as simpler reuse of the arrays. It also helps prevent fragmentation by using equal buffer sizes.
  • The streams are an abstraction on top of chained buffers, to appear as a single large buffer.
  • While streams themselves are not thread-safe, the acts of allocating and freeing pooled streams are thread-safe.
  • Each stream has an identifying tag which can help you debug incorrect pool usage.
  • Each stream can record the call stack of its allocation, to help you debug pool leaks.
  • Pool limits are flexible and configurable, restricting memory usage while still handling spikes.
  • Detailed events and metrics are surfaced so you can track usage over time.

The devil is in the details, as the saying goes, so let’s dive into some of the implementation.

Before you can allocate a RecyclableMemoryStream, you must create the pool manager, a RecyclableMemoryStreamManager object. This is the class that actually manages the buffer pools and tracks resource usage. Think of it like a miniature heap inside the CLR’s heap. On this class, you set all of your configuration options, like default buffer sizes, maximum size of the heap, and more. There is typically one manager object per process and it lives for the lifetime of the process. However, if you have wildly different usage scenarios, there is no problem using multiple RecyclableMemoryStreamManager objects.

The RecyclableMemoryStreamManager maintains two categories of buffers: the Small Pool and the Large Pool. The Small Pool is made of lots of equal-sized buffers. The “Small” in Small Pool refers to the size of the individual buffer, not the size of the pool. The buffers in the Small Pool are called blocks (because they are combined to form the longer stream). The Large Pool contains larger buffers, but far fewer of them, and is designed to be used less frequently (only when GetBuffer is called). Both pools use uniform buffer sizes to reduce the likelihood of heap fragmentation.

The pools in RecyclableMemoryStreamManager, with a block size of 128KB, a large buffer multiple of 1MB, and a maximum buffer size of 4 MB.
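
The sizes shown here are configurable when you construct the manager. A sketch, assuming the constructor overload that takes the block size, large buffer multiple, and maximum buffer size:

var manager = new RecyclableMemoryStreamManager(
    128 * 1024,        // block size: 128 KB
    1024 * 1024,       // large buffer multiple: 1 MB
    4 * 1024 * 1024);  // maximum buffer size: 4 MB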

Using this library is easy:

var sourceBuffer = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 }; 
var manager = new RecyclableMemoryStreamManager(); 
using (var stream = manager.GetStream("Test")) 
{ 
    stream.Write(sourceBuffer, 0, sourceBuffer.Length); 
}

This code creates a RecyclableMemoryStreamManager with default settings, grabs a stream, writes some bytes to it, and then returns the stream’s blocks to the pool with the Dispose call. This example passes the tag “Test” to the stream’s constructor. The tag is not unique per stream; it identifies the location in code where the stream was allocated, which can help in debugging. Tags are not required, but they are useful. Internally, each stream is also assigned a GUID that does uniquely identify it, which can be useful when tracing concurrent usage of multiple streams.

Internally, the RecyclableMemoryStream will grab a block from the manager. As more data is written to the stream, more blocks are chained together, and the stream’s APIs make this look like a single contiguous region of memory. As the length of the stream grows, total memory usage grows only by the block size (and that is assuming the blocks were not already pooled). This is in contrast to MemoryStream’s implementation, which doubles the stream’s capacity as it grows. That doubling is fine at a small scale, but can waste enormous amounts of memory at a large one.

As long as just Read and Write methods are used, only blocks will be used. However, sometimes it is necessary to get a single contiguous buffer. For this, there is the GetBuffer API, inherited from MemoryStream. When GetBuffer is called, a contiguous block must be returned. If there is only one block in use, then a reference to it is returned. If multiple blocks are used, then the Large Pool is used to satisfy the request, and bytes are copied from the blocks to the larger buffer. If the buffer requested is larger than the maximum buffer size of the pool, then a memory allocation occurs to satisfy the request.

It is worth noting that the buffer returned is at least as large as the data contained in it; it may in fact be much larger. You must use the stream’s Length property to determine how much data is actually in it. Naive users of the library sometimes ignore this and write huge buffers to the network or to files. After converting the stream to a buffer with an associated data length, it can be convenient to wrap both in an ArraySegment<byte> struct.
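
For example, a sketch that reuses the manager from the earlier snippet:

using (var stream = manager.GetStream("Example"))
{
    stream.Write(sourceBuffer, 0, sourceBuffer.Length);
    // GetBuffer may return more bytes than were written, so pair it
    // with Length to bound the valid region.
    var segment = new ArraySegment<byte>(
        stream.GetBuffer(), 0, (int)stream.Length);
    // ...pass segment to code that expects a contiguous buffer...
}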

The ToArray method is much less useful in a pooling scenario. It must return an array of exactly the right size, which means an allocation (possibly on the large object heap) as well as a memory copy. Because of these inefficiencies, ToArray should be avoided entirely.

I encourage you to study the code at the link provided earlier because it will be beneficial to understanding how the library attempts to avoid allocations while balancing the need for other requirements.

Once we implemented this library in production code, we saw allocation on the large object heap drop 99%. Worrying about expensive gen 2 collections became a thing of the past. The time spent in garbage collection dropped from 25% to less than 1%.

Reduce Large Object Heap Fragmentation

If you cannot completely avoid large object heap allocations, then you want to do your best to avoid fragmentation.

The large object heap can grow indefinitely if you are not careful, but it is mitigated by the free list. To take advantage of this free list, you want to increase the likelihood that memory allocations can be satisfied from holes in the heap.

One way to do this is to ensure that all allocations on the LOH are of uniform size, or at least multiples of some standard size. For example, a common need for LOH allocations is for buffer pools. Rather than have a hodge-podge of buffer sizes, ensure that they are all the same size, or in multiples of some well-known number such as one megabyte. This way, if one of the buffers does need to get garbage collected, there is a high likelihood that the next buffer allocation can fill its spot rather than going to the end of the heap.
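
For example, a hypothetical allocation helper might round every buffer request up to a uniform multiple:

const int BufferMultiple = 1 << 20; // 1 MB

// Rounding all LOH buffer sizes to a common multiple increases the
// chance that a freed slot can satisfy a later allocation.
static byte[] AllocateUniformBuffer(int minimumSize)
{
    int size = ((minimumSize + BufferMultiple - 1) / BufferMultiple)
               * BufferMultiple;
    return new byte[size];
}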

Force Full GCs in Some Circumstances

In nearly all cases, you should not force collections to happen outside of their normal schedule as determined by the GC itself. Doing so disrupts the automatic tuning the garbage collector performs and may lead to worse behavior overall. However, there are some considerations in a high-performance system that may cause you to reconsider this advice in very specific situations.

In general, it may be beneficial to force a GC to occur during a more optimal time to avoid a GC occurring during a worse time later on. Note that we are only talking about the expensive, ideally rare, full GCs. Gen 0 and gen 1 GCs can and should happen frequently to avoid building up a too-large gen 0 size.

Some situations may merit a forced collection:

  1. You are using low-latency GC mode. In this mode, heap size can grow and you will need to determine appropriate points to perform a full collection. See the section earlier in this chapter about low-latency GC.
  2. You have a natural downtime, or off-peak hours, in the application’s schedule. This may mean you are using a low-latency GC mode, but is not required.
  3. You occasionally make a large number of allocations that will live for a long time (forever, ideally). It makes sense to get these objects into gen 2 as quickly as possible. If these objects replace other objects that will now become garbage, you can just get rid of them immediately with a forced collection.
  4. You need to compact the large object heap because of fragmentation. See the section about large object heap compaction.

Situations 1 through 3 are all about avoiding full GCs during specific times by forcing them at other times. Situation 4 is about reducing your overall heap size if you have significant fragmentation on the LOH. If your scenario does not fit into one of those categories, you should not consider this a useful option.

To perform a full collection, call the GC.Collect method with the generation you want collected. Optionally, you can pass a GCCollectionMode value to control whether the collection is mandatory or left to the garbage collector’s discretion. There are three possible values:

  • Default: Currently, Forced.
  • Forced: Tells the garbage collector to start the collection immediately.
  • Optimized: Allows the garbage collector to decide if now is a good time to run.

GC.Collect(2);
// equivalent to:
GC.Collect(2, GCCollectionMode.Forced);

Story: This exact situation existed on a server that took user queries. Every few hours, we needed to reload over a gigabyte of data, replacing the existing data. Since this was an expensive operation and we were already routing requests away from the machine, we also forced two full GCs after the reload. This removed the old data and ensured that everything allocated in gen 0 either got collected or made it to gen 2, where it belonged. Then, once we resumed a full query load, there would be no huge, full GC to affect the first queries.

Compact the Large Object Heap On-Demand

Even if you do pooling, it is still possible that there are allocations you cannot control and the large object heap will become fragmented over time. Starting in .NET 4.5.1, you can tell the GC to compact the large object heap on the next full collection.

GCSettings.LargeObjectHeapCompactionMode = 
  GCLargeObjectHeapCompactionMode.CompactOnce;

Depending on the size of the large object heap, this can be a slow operation, up to multiple seconds. You may want to put your program in a state where it stops doing real work and force an immediate collection with the GC.Collect method.
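
For example, to compact immediately rather than waiting for the next naturally occurring full collection:

GCSettings.LargeObjectHeapCompactionMode = 
  GCLargeObjectHeapCompactionMode.CompactOnce;
// Force the full, blocking collection that performs the compaction.
GC.Collect(2, GCCollectionMode.Forced, blocking: true);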

This setting only affects the next full GC that happens. Once the next full collection occurs, GCSettings.LargeObjectHeapCompactionMode resets automatically to GCLargeObjectHeapCompactionMode.Default.

Because of the expense of this operation, I recommend you reduce the number of large object heap allocations to as little as possible and pool those that you do make. This will significantly reduce the need for compaction. View this feature as a last resort and only if fragmentation and very large heap sizes are an issue.

Get Notified of Collections Before They Happen

If your application absolutely should not be impacted by gen 2 collections, then you can tell the GC to notify you when a full GC is approaching. This will give you a chance to stop processing temporarily, perhaps by shunting requests off the machine, or otherwise putting the application into a more favorable state.

It may seem like this notification mechanism is the answer to all GC woes, but I recommend extreme caution. You should only implement this after you have optimized as much as you can in other areas. You can only take advantage of GC notifications if all of the following statements are true:

  1. A full GC is so expensive that you cannot afford to endure a single one during normal processing.
  2. You are able to turn off processing for the application completely. (Perhaps other computers or processes can do the work meanwhile.)
  3. You can turn off processing quickly (so you do not waste more time stopping processing than actually performing the GC).
  4. Gen 2 collections happen rarely enough to make this worth it.

Gen 2 collections will happen rarely only if you have large object allocations minimized and little promotion beyond gen 0, so it will still take a fair amount of work to get to the point where you can reliably take advantage of GC notifications.

Unfortunately, because of the imprecise nature of GC triggering, you can only specify the pre-trigger time in an approximate way with a number in the range 1–99. With a number that is very low, you will be notified much closer to when the GC will happen, but you risk having the GC occur before you can react to it. With a number that is too high, the GC may be quite far away and you will get a notification far too frequently, which is quite inefficient. It all depends on your allocation rate and overall memory load. Note that you specify two numbers: one for the gen 2 threshold and one for the large object heap threshold. As with other features, this notification is a best effort by the garbage collector. The garbage collector never guarantees you can avoid doing a collection.

To use this mechanism, follow these general steps:

  1. Call the GC.RegisterForFullGCNotification method with the two threshold values.
  2. Poll the GC with the GC.WaitForFullGCApproach method. This can wait forever or accept a timeout value.
  3. If the WaitForFullGCApproach method returns Success, put your program in a state acceptable for a full GC (e.g., turn off requests to the machine).
  4. Force a full collection yourself by calling the GC.Collect method.
  5. Call GC.WaitForFullGCComplete (again with an optional timeout value) to wait for the full GC to complete before continuing.
  6. Turn requests back on.
  7. When you no longer want to receive notifications of full GCs, call the GC.CancelFullGCNotification method.

Because this requires a polling mechanism, you will need to run a thread that can do this check periodically. Many applications already have some sort of “housekeeping” thread that performs various actions on a schedule. This may be an appropriate task, or you can create a separate dedicated thread.

Here is a full example from the GCNotification sample project demonstrating this behavior in a simple test application that allocates memory continuously. See the accompanying source code project to test this.

class Program
{
  static void Main(string[] args)
  {
    const int ArrSize = 1024;
    var arrays = new List<byte[]>();

    GC.RegisterForFullGCNotification(25, 25);

    // Start a separate thread to wait for GC notifications
    Task.Run(() => WaitForGCThread(null));

    Console.WriteLine("Press any key to exit");
    while (!Console.KeyAvailable)
    {
      try
      {
        arrays.Add(new byte[ArrSize]);
      }
      catch (OutOfMemoryException)
      {
        Console.WriteLine("OutOfMemoryException!");
        arrays.Clear();
      }
    }

    GC.CancelFullGCNotification();
  }

  private static void WaitForGCThread(object arg)
  {
    const int MaxWaitMs = 10000;
    while (true)
    {
      // There is also an overload of WaitForFullGCApproach 
      // that waits indefinitely
      GCNotificationStatus status =    
                     GC.WaitForFullGCApproach(MaxWaitMs);
      bool didCollect = false;
      switch (status)
      {
        case GCNotificationStatus.Succeeded:
          Console.WriteLine("GC approaching!");
          Console.WriteLine(
             "-- redirect processing to another machine -- ");
          didCollect = true;
          GC.Collect();
          break;
        case GCNotificationStatus.Canceled:
          Console.WriteLine("GC Notification was canceled");
          break;
        case GCNotificationStatus.Timeout:
          Console.WriteLine("GC notification timed out");
          break;
      }

      if (didCollect)
      {
        do
        {
          status = GC.WaitForFullGCComplete(MaxWaitMs);
          switch (status)
          {
            case GCNotificationStatus.Succeeded:
              Console.WriteLine("GC completed");
              Console.WriteLine(
              "-- accept processing on this machine again --");
              break;
            case GCNotificationStatus.Canceled:
              Console.WriteLine(
                  "GC Notification was canceled");
              break;
            case GCNotificationStatus.Timeout:
              Console.WriteLine(
                  "GC completion notification timed out");
              break;
          }
          // Looping isn't necessary, but it is useful if you want
          // to check other state before waiting again.
        } while (status == GCNotificationStatus.Timeout);        
      }
    }
  }
}

Another possible reason to use notifications is to decide when to compact the large object heap, but you could trigger that based on memory usage instead, which may be more appropriate.

Use Weak References For Caching

Weak references are references to an object that still allow the garbage collector to clean up the object. They are in contrast to the default strong references, which prevent collection completely (for that object). They are mostly useful for caching expensive objects that you would like to keep around, but are willing to let go if there is enough memory pressure. Weak references are a core CLR concept that are exposed through a couple of .NET classes:

  • WeakReference
  • WeakReference<T>

You should ignore the first one in favor of the generic version that was introduced in .NET 4.5. The non-generic version has API weaknesses that are resolved in the newer version, and I will only discuss the generic version here.

An example of a simple usage:

// The underlying Foo object can be garbage collected at any time!
WeakReference<Foo> weakRef = new WeakReference<Foo>(new Foo());
...
// Create a strong reference to the object, 
// now no longer eligible for GC
Foo myFoo;
if (weakRef.TryGetTarget(out myFoo))
{
    ...
}

Note that the reference to the WeakReference<T> object itself is strong, which means that it will not be collected out from under you—it is only the underlying target object that is weakly referenced. If you are memory-conscious enough to use WeakReference<T> then you might rightly be leery of continually allocating new WeakReference<T> objects. Thankfully, you can reuse these wrapper objects by using the SetTarget method to replace the underlying value as needed.
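
For example, a sketch (Foo is a placeholder type):

var weakRef = new WeakReference<Foo>(new Foo());
...
// Reuse the existing wrapper rather than allocating a new
// WeakReference<Foo> for the replacement object.
weakRef.SetTarget(new Foo());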

You can still have other references, both strong and weak, to the same object. Collection will only happen if the only references to it are weak (or non-existent).

Most applications do not need to use weak references at all, but there are some criteria that may indicate good usage:

  • Memory use needs to be tightly restricted (such as on mobile devices).
  • Object lifetime is highly variable. If your object lifetimes are predictable, you can just use strong references and control their lifetime directly.
  • Objects are large, but easy to create. Weak references are ideal for objects that would be nice to have around, but if they are not, you can easily regenerate them or do without. (Note that this also implies that the object size should be significant compared to the overhead of using the additional WeakReference<T> objects in the first place.)
  • You need secondary indexes for objects. See the example below.

Following are two examples of using WeakReference<T> for efficient caching.

Example: HybridCache

A good way to use WeakReference<T> is as part of a cache. Objects start out held through a strong reference, but after enough time of not being used (or some other criteria of your choosing), they can be demoted to being held by weak references, which may eventually disappear through garbage collection.

This example shows a simple cache that internally manages two levels of caches.

public class HybridCache<TKey, TValue>  where TValue : class
{
    class ValueContainer<T>
    {
        public T value;
        public long additionTime;
        public long demoteTime;
    }

    private readonly TimeSpan maxAgeBeforeDemotion;

    // Values live here until they hit their maximum age
    private readonly ConcurrentDictionary<TKey, 
                                          ValueContainer<TValue>> 
      strongReferences =
        new ConcurrentDictionary<TKey, ValueContainer<TValue>>();

    // Values are moved here after they hit their maximum age
    private readonly ConcurrentDictionary<
        TKey, 
        WeakReference<ValueContainer<TValue>>> 
      weakReferences =
        new ConcurrentDictionary<
          TKey, 
          WeakReference<ValueContainer<TValue>>>();

    public int Count
    {
        get 
        { 
            return this.strongReferences.Count; 
        }
    }

    public int WeakCount
    {
        get
        {
            return this.weakReferences.Count;
        }
    }

    public HybridCache(TimeSpan maxAgeBeforeDemotion)
    {
        this.maxAgeBeforeDemotion = maxAgeBeforeDemotion;
    }

    public void Add(TKey key, TValue value)
    {
        RemoveFromWeak(key);
        var container = new ValueContainer<TValue>();
        container.value = value;
        container.additionTime = Stopwatch.GetTimestamp();
        container.demoteTime = 0;
        this.strongReferences.AddOrUpdate(
          key, 
          container, 
          (k, existingValue) => container);
    }

    private void RemoveFromWeak(TKey key)
    {
        WeakReference<ValueContainer<TValue>> oldValue;
        weakReferences.TryRemove(key, out oldValue);
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        value = null;
        ValueContainer<TValue> container;
        if (this.strongReferences.TryGetValue(key, out container))
        {
            AttemptDemotion(key, container);
            value = container.value;
            return true;
        }

        WeakReference<ValueContainer<TValue>> weakRef;
        if (this.weakReferences.TryGetValue(key, out weakRef))
        {
            if (weakRef.TryGetTarget(out container))
            {
                value = container.value;
                return true;
            }
            else
            {
                RemoveFromWeak(key);
            }
        }
        return false;
    }

    /// <summary>
    /// Call this method periodically from another thread.
    /// </summary>
    public void DemoteOldObjects()
    {
        var demotionList = 
          new List<KeyValuePair<TKey, 
                                ValueContainer<TValue>>>();

        long now = Stopwatch.GetTimestamp();

        foreach (var kvp in this.strongReferences)
        {
            var age = CalculateTimeSpan(kvp.Value.additionTime, 
                                        now);
            if (age > this.maxAgeBeforeDemotion)
            {
                demotionList.Add(kvp);
            }
        }

        foreach (var kvp in demotionList)
        {
            Demote(kvp.Key, kvp.Value);
        }
    }

    private void AttemptDemotion(TKey key, 
                                 ValueContainer<TValue> container)
    {
        long now = Stopwatch.GetTimestamp();
        var age = CalculateTimeSpan(container.additionTime, now);
        if (age > this.maxAgeBeforeDemotion)
        {
            Demote(key, container);
        }
    }

    private void Demote(TKey key, 
                        ValueContainer<TValue> container)
    {
        ValueContainer<TValue> oldContainer;
        this.strongReferences.TryRemove(key, out oldContainer);
        container.demoteTime = Stopwatch.GetTimestamp();
        var weakRef = 
          new WeakReference<ValueContainer<TValue>>(container);
        this.weakReferences.AddOrUpdate(key, 
                                        weakRef, 
                                        (k, oldRef) => weakRef);
    }

    private static TimeSpan CalculateTimeSpan(long offsetA, 
                                              long offsetB)
    {
        long diff = offsetB - offsetA;
        double seconds = (double)diff / Stopwatch.Frequency;
        return TimeSpan.FromSeconds(seconds);
    }
}

Example: Secondary Index

This example uses weak references to make updates to a simple database more efficient by deferring immediate, potentially expensive index maintenance when objects are removed.

class Person
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public DateTime Birthday { get; set; }
}

class PersonDatabase
{
    private Dictionary<string, Person> index = 
      new Dictionary<string, Person>();
    private Dictionary<DateTime, 
                       List<WeakReference<Person>>> 
                         birthdayIndex =
      new Dictionary<DateTime, List<WeakReference<Person>>>();

    public bool NeedsIndexRebuild { get; private set; }

    public void AddPerson(Person person)
    {
        this.index[person.Id] = person;
        List<WeakReference<Person>> birthdayList;
        if (!this.birthdayIndex.TryGetValue(person.Birthday, 
                                            out birthdayList))
        {
            birthdayIndex[person.Birthday] 
             = birthdayList 
             = new List<WeakReference<Person>>();
        }

        birthdayList.Add(new WeakReference<Person>(person));
    }

    public void RemovePerson(string id)
    {
        index.Remove(id);
    }

    public bool TryGetById(string id, out Person person)
    {
        return this.index.TryGetValue(id, out person);
    }

    public bool TryGetByBirthday(DateTime birthday, 
                                 out List<Person> people)
    {
        people = null;
        List<WeakReference<Person>> weakPeople;
        if (this.birthdayIndex.TryGetValue(birthday, 
                           out weakPeople))
        {
            var list = new List<Person>(weakPeople.Count);
            foreach(var wp in weakPeople)
            {
                Person person;
                if (wp.TryGetTarget(out person))
                {
                    list.Add(person);
                }
                else
                {
                    // we got a null reference -- 
                    // we need to rebuild the indexes
                    this.NeedsIndexRebuild = true;
                }
            }
            if (list.Count > 0)
            {
                people = list;
                return true;
            }                
        }
        return false;
    }        
}

Object Resurrection

There is an overload of WeakReference<T>’s constructor that takes a Boolean value called trackResurrection:

WeakReference<MyObject> weakRef = 
  new WeakReference<MyObject>(myObj, trackResurrection: true);

Resurrection is when you do something like this in a class’s finalizer:

class MyObject
{
    static MyObject myObj;

    ~MyObject()
    {
        myObj = this;
    }
}

By doing this, you are taking an object that had no more references to it (hence why the finalizer ran) and re-adding a reference to it. This technique is sometimes used in advanced caching scenarios, but it has a number of drawbacks:

  • The object has already been promoted to gen 1 by the garbage collector, and will never be demoted to an earlier generation.
  • You must call GC.ReRegisterForFinalizer on an object or the finalizer will not run again for it.
  • The state of the object can be indeterminate. Objects with native resources will have released them and they will need to be reinitialized. It can be tricky working through the exact object state.
  • Any objects that the resurrected object refers to are also resurrected. If any of those objects have finalizers, they will also have already run, making your state trickier.

You should just consider this technique a bug unless you really understand the state of the objects you are dealing with. There are better ways to reuse objects.

If you do use resurrection, passing trackResurrection: true tells the WeakReference<T> to keep tracking the object even after its finalizer has run, allowing access to the resurrected object for longer. If your object does not have a finalizer, this parameter has no effect.

Dynamically Allocate on the Stack

Instead of allocating memory from the heap, it is possible to allocate dynamically sized buffers on the stack using stackalloc. Such allocations are faster than heap allocations and incur no garbage collection. However, there are some significant caveats:

  • Doing so is explicitly unsafe. Instead of a managed array, you receive back a pointer to the beginning of the buffer and you are responsible for ensuring you do not exceed its bounds.
  • You are severely limited in how much data you can allocate. Managed stacks are typically limited to 1 megabyte in size (or as little as 256KB in ASP.NET/IIS). Each stack frame takes some of that, and some frameworks can have very deep stacks.

To demonstrate how stackalloc works, see the StackAlloc sample program, which contains this code:

private static unsafe void DoStackAlloc(int size)
{
    int* buffer = stackalloc int[size];
    for (int i = 0; i < size; i++)
    {
        buffer[i] = i;
    }
}

The rest of the program runs this code in a loop, asking for input for how much to allocate. A sample run looks like this:

Enter size to stackalloc ('q' to exit): 100
Allocated 100-size array
Enter size to stackalloc ('q' to exit): 200
Allocated 200-size array
Enter size to stackalloc ('q' to exit): 100000
Allocated 100000-size array
Enter size to stackalloc ('q' to exit): 1000000

Process is terminated due to StackOverflowException.

StackOverflowException has the notable distinction of being an exception that your program cannot catch. The sample code wraps the allocation in an exception handler, but to no avail: when this exception is thrown, your application immediately exits. If you run under a debugger, however, the debugger can intercept it.

Despite the risks and limitations, stackalloc is a valuable tool when you want small, dynamically sized arrays in your methods without the overhead of a heap allocation.

Investigating Memory and GC

In this section, you will learn many tips and techniques to investigate what is happening on the GC heap. In many cases, multiple tools can give you the same information. I will endeavor to describe the use of a few in each scenario, where applicable.

Performance Counters

.NET supplies a number of Windows performance counters, all in the .NET CLR Memory category. All of these counters except for Allocated Bytes/sec are updated at the end of a collection. If you notice values getting stuck, it is likely because collections are not happening very often.

  • # Bytes in all Heaps: Sum of all heaps, except gen 0 (see the description for Gen 0 heap size).
  • # GC Handles: Number of handles in use.
  • # Gen 0 Collections: Cumulative number of gen 0 collections since process start. Note that this counter is also incremented for gen 1 and 2 collections because higher generation collections always imply collections of the lower generations as well.
  • # Gen 1 Collections: Cumulative number of gen 1 collections since process start. Note that this counter is incremented for gen 2 collections as well because a gen 2 collection implies a gen 1 collection.
  • # Gen 2 Collections: Cumulative number of gen 2 collections since process start.
  • # Induced GC: Number of times GC.Collect was called to explicitly start garbage collection.
  • # of Pinned Objects: Number of pinned objects the garbage collector observes during collection.
  • # of Sink Blocks in use: Each object has a header that can store limited information, such as a hash code or synchronization information. If there is any contention for use of this header, a sync block is created. These blocks are also used for interop metadata. A high value here can indicate lock contention. Yes, this counter’s name is misspelled (look at the description in PerfMon).
  • # Total committed Bytes: Number of bytes the garbage collector has allocated that are actually backed by the paging file.
  • # Total reserved Bytes: Number of bytes reserved by garbage collector, but not yet committed.
  • % Time in GC: Percentage of time the processor has spent in the GC threads compared to the rest of the process since the last collection. This counter does not account for background GC.
  • Allocated Bytes/sec: Number of bytes allocated on a GC heap per second. This counter is not updated continuously, but only when a garbage collection starts.
  • Finalization Survivors: Number of finalizable objects that survived a collection because they are waiting for finalization (which only happens in gen 1 collections). Also, see the Promoted Finalization-Memory from Gen 0 counter.
  • Gen 0 heap size: Maximum number of bytes that can be allocated in gen 0, not the actual number of bytes allocated.
  • Gen 0 Promoted Bytes/Sec: The rate of promotion from gen 0 to gen 1. You want this number to be as low as possible, indicating short memory lifetimes.
  • Gen 1 heap size: Number of bytes in gen 1, as of the last garbage collection.
  • Gen 1 Promoted Bytes/Sec: Rate of promotion from gen 1 to gen 2. A high number here indicates memory having a very long lifetime, and good candidates for pooling.
  • Gen 2 heap size: Number of bytes in gen 2, as of the last garbage collection.
  • Large Object Heap Size: Number of bytes on the large object heap.
  • Promoted Finalization-Memory from Gen 0: Total number of bytes that were promoted to gen 1 because an object somewhere in their tree is awaiting finalization. This is not just the memory from finalizable objects directly, but also the memory from any references those objects hold.
  • Promoted Memory from Gen 0: Number of bytes promoted from gen 0 to gen 1 at the last collection.
  • Promoted Memory from Gen 1: Number of bytes promoted from gen 1 to gen 2 at the last collection.

ETW Events

The CLR publishes numerous events about GC behavior. In most cases, you can rely on the tools to analyze these in aggregate for you, but it is still useful to understand how this information is logged in case you need to track down specific events and relate them to other events in your application. You can examine these in detail in PerfView with the Events view. Here are some of the most important:

  • GCStart_V1: Garbage collection has started. Fields include:
    • Count: The number of collections that have occurred since the process began.
    • Depth: Which generation is being collected.
    • Reason: Why the collection was triggered.
    • Type: Blocking, background, or blocking during background.
  • GCEnd_V1: Garbage collection has ended. Fields include:
    • Count, Depth: Same as for GCStart.
  • GCHeapStats_V1: Shows stats at the end of a garbage collection.
    • There are many fields, describing all aspects of the heap such as generation sizes, promoted bytes, finalization, handles, and more.
  • GCCreateSegment_V1: A new segment was created. Fields include:
    • Address: Address of the segment
    • Size: Size of the segment
    • Type: Small or large object heap
  • GCFreeSegment_V1: A segment was released. Just one field:
    • Address: Address of the segment
  • GCAllocationTick_V2: Emitted every time about 100KB (cumulative) has been allocated. Fields include:
    • AllocationSize: Exact size of the allocation that triggered the event.
    • Kind: Small or large object heap allocation.
  • GCFinalizersBegin_V1: Finalizers are starting to run.
  • GCFinalizersEnd_V1: Finalizers are done running.
    • Count: Number of finalizers executed.
  • GCCreateConcurrentThread_V1: A concurrent garbage collection thread was created.
  • GCTerminateConcurrentThread_V1: A concurrent garbage collection thread was terminated.
  • GCSuspendEE_V1: Threads are starting to suspend.
    • Reason: Why the suspension was initiated.
    • Count: GC Count at the time of this event.
  • GCSuspendEEEnd_V1: Threads are done suspending.
  • GCRestartEEBegin_V1: Threads start resuming.
  • GCRestartEEEnd_V1: Threads are done resuming.

The order of events that are received is important. For a normal, foreground GC of any generation, the sequence is:

  1. GCSuspendEE_V1: Start suspending threads.
  2. GCSuspendEEEnd_V1: All threads are suspended.
  3. GCStart_V1: GC begins.
  4. GCEnd_V1: GC work done.
  5. GCRestartEEBegin_V1: Start resuming threads.
  6. GCRestartEEEnd_V1: All threads resumed. GC complete.

If you want to analyze these events in your own applications or utilities, see the sections on TraceEvent and PerfView in Chapter 8 for an easy-to-use library. Through judicious use and analysis of ETW events you can detect whether operations in your application are being affected by GC (or any other type of external influence).
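
To give a flavor of what that looks like, here is a minimal sketch using the TraceEvent library (covered in Chapter 8). It assumes the Microsoft.Diagnostics.Tracing.TraceEvent NuGet package and requires administrator rights to create the session:

using (var session = new TraceEventSession("GCMonitorSession"))
{
    // Enable just the GC keyword from the CLR provider.
    session.EnableProvider(
        ClrTraceEventParser.ProviderGuid,
        TraceEventLevel.Informational,
        (ulong)ClrTraceEventParser.Keywords.GC);

    session.Source.Clr.GCStart += data =>
        Console.WriteLine("GC #{0}, gen {1}, reason: {2}",
                          data.Count, data.Depth, data.Reason);
    session.Source.Clr.GCStop += data =>
        Console.WriteLine("GC #{0} finished", data.Count);

    session.Source.Process(); // Blocks, pumping events.
}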

What Does My Heap Look Like?

WinDbg can give you a few different views of the heap. First, by segment:

!eeheap -gc

The output will look something like this:

Number of GC Heaps: 1
generation 0 starts at 0x05824e2c
generation 1 starts at 0x0532100c
generation 2 starts at 0x05321000
ephemeral segment allocation context: none
 segment     begin  allocated      size
05320000  05321000  05891ff4  0x570ff4(5705716)
Large object heap starts at 0x06321000
 segment     begin  allocated      size
06320000  06321000  07312c80  0xff1c80(16718976)
07900000  07901000  088ee660  0xfed660(16701024)
08a30000  08a31000  09a1e660  0xfed660(16701024)
09c80000  09c81000  0ac6e660  0xfed660(16701024)
0ac80000  0ac81000  0bc6e540  0xfed540(16700736)
...more segments...
Total Size:              Size: 0x213b9d94 (557555092) bytes.
------------------------------
GC Heap Size:    Size: 0x213b9d94 (557555092) bytes.

If the process is running server GC, there will be more than one heap, each with its own set of ephemeral, gen 2, and large object segments.

Another view is provided with the !HeapStat command, which aggregates across all segments to break down the sizes of each generation, including free space.

0:007> !HeapStat
Heap        Gen0      Gen1    Gen2          LOH
Heap0     446920   5258784      12    551849376

Free space:                                      Percentage
Heap0         12      1948       0        15936  SOH:  0% LOH:  0%

This output shows that there is very little in the gen 2 heap and an insignificant amount of free space (fragmentation). The letters SOH stand for “Small Object Heap,” meaning every segment other than the large object heap segments.

The !VMMap command shows information about virtual address regions and the levels of protection applied to them:

0:000> !VMMap
Start    Stop     Length    AllocProtect  Protect    State    Type    
00000000-00f5ffff 00f60000                NA         Free             
00f60000-00f60fff 00001000  ExWrCp        Rd         Commit   Image   
00f61000-00f61fff 00001000  ExWrCp                   Reserve  Image   
00f62000-00f62fff 00001000  ExWrCp        Rd         Commit   Image   
00f63000-00f63fff 00001000  ExWrCp                   Reserve  Image   
00f64000-00f64fff 00001000  ExWrCp        Rd         Commit   Image   
00f65000-00f65fff 00001000  ExWrCp                   Reserve  Image   
00f66000-00f66fff 00001000  ExWrCp        Rd         Commit   Image   
00f67000-00f67fff 00001000  ExWrCp                   Reserve  Image   
...

The !VMStat command will take that information and summarize it by State:

0:000> !VMStat
TYPE         MINIMUM     MAXIMUM     AVERAGE  BLK COUNT       TOTAL
====         =======     =======     =======  =========       =====
Free:
Small             8K         64K         43K         30      1,315K
Medium           84K        996K        332K         10      3,323K
Large         1,152K  2,090,816K    204,209K         17  3,471,563K
Summary           8K  2,090,816K     60,986K         57  3,476,203K

Reserve:
Small             4K         64K         34K         34      1,183K
Medium           68K      1,012K        299K         56     16,779K
Large         1,376K     32,768K     12,073K          7     84,515K
Summary           4K     32,768K      1,056K         97    102,479K

Commit:
Small             4K         64K         12K        204      2,575K
Medium           68K        964K        347K         44     15,307K
Large         1,048K     16,332K     12,716K         47    597,671K
Summary           4K     16,332K      2,086K        295    615,555K

Private:
Small             4K         64K         19K         88      1,716K
Medium           68K      1,012K        285K         57     16,267K
Large         1,376K     32,768K     15,215K         41    623,851K
Summary           4K     32,768K      3,450K        186    641,835K

Mapped:
Small             4K         64K         25K          8        204K
Medium           68K      1,004K        374K          6      2,247K
Large         1,540K     18,320K      5,442K          5     27,211K
Summary           4K     18,320K      1,561K         19     29,663K

Image:
Small             4K         64K         12K        142      1,839K
Medium           68K        964K        366K         37     13,571K
Large         1,048K     15,712K      3,890K          8     31,124K
Summary           4K     15,712K        248K        187     46,535K

The SysInternals tool VMMap can also give you a good summary of all the segments in a process. Once you have selected the process, highlight the Managed Heap in the table, and you will see a list of all segments in the process.

VMMap can break down all the various memory regions in a process, including GC heap segments.

How Long Does a Collection Take?

The GC records many events about its operation. You can use PerfView to examine these events in a very efficient way.

To see statistics on GC, start the AllocateAndRelease sample program.

Start PerfView and follow these steps:

  1. Menu Collect | Collect (Alt+C).
  2. Expand Advanced Options. You can optionally turn off all event categories except GC Only, but for now just leave the default selection, as GC events are included in .NET events.
  3. Check No V3.X NGEN Symbols. (This will make symbol resolution faster.)
  4. Click Start.
  5. Wait for a few minutes while it measures the process’s activity. See Chapter 1 for a discussion of how many samples you need to collect. (If collecting for more than a few minutes, you may want to turn off CPU events.)
  6. Click Stop Collection.
  7. Wait for the files to finish merging.
  8. In the resulting view tree, double-click on the GCStats node, which will open up a new view.
  9. Find the section for your process.

For each process, you will find a list of data points and a set of tables summarizing GC behavior.

At the top of each section is a list of items describing the overall information.

The GC Trace Summary items are:

  • CommandLine: The exact command that executed the process.
  • Runtime Version: The version of the CLR that is executing.
  • CLR Startup Flags: Flags controlling the behavior of the GC, such as CONCURRENT_GC or SERVER_GC.
  • Total CPU Time: Total time, in milliseconds, taken by the process during the profile.
  • Total GC CPU Time: Total time, in milliseconds, spent doing garbage collection.
  • Total Allocs: Amount of memory allocated.
  • GC CPU MSec/MB Alloc: How much time, in milliseconds, the GC spent processing each megabyte of allocated memory.
  • Total GC Pause: Amount of time, in milliseconds, the process was paused for GC.
  • % Time paused for Garbage Collection: GC pause time, expressed as a percentage of total CPU time.
  • % CPU Time spent Garbage Collecting: Can differ from the pause percentage if you are running server GC.
  • Max GC Heap Size: Maximum size of the GC heap during profiling.
  • Peak Process Working Set: Maximum size of the working set during profiling.
  • Peak Virtual Memory Usage: Maximum amount of virtual memory reserved during profiling.

Below that, you will find a table summarizing all the generations of GC.

The GCStats table for the AllocateAndRelease sample program. This shows you the number of GCs that occurred as well as interesting stats like the mean/max pause times, and allocation rates.

The GC Summary Info columns are:

  • Gen: Generation, including ALL, which aggregates all GCs into a single set of stats.
  • Count: Number of collections.
  • Max Pause: Longest time, in milliseconds, that GC was paused.
  • Max Peak MB: Maximum size of the generation on the heap.
  • Max Alloc MB/sec: Peak allocation rate.
  • Total Pause: Sum of all pause times, in milliseconds.
  • Total Alloc MB: Amount of memory allocated.
  • Alloc MB/MSec GC: Amount of memory allocated per millisecond of GC time. This is a measure of GC efficiency; higher numbers mean a more effective (or less intrusive) GC.
  • Survived MB/MSec GC: Amount of memory that survives a GC, per millisecond of GC time. This is another measure of GC efficiency; higher numbers mean more memory is surviving.
  • Mean Pause: Average pause time, in milliseconds.
  • Induced: Count of explicit GC invocations (GC.Collect).

Below this table, you will find even more detailed tables listing specific GC instances in various categories, such as “Pauses > 200 MSec”, “LOH Allocation Pause (due to background GC) > 200 MSec”, “Gen 2”, and “All GC Events”.

The GC Details columns are:

  • GC Index: The order in which the GC occurred.
  • Pause Start: Time stamp, in milliseconds from the start of the profile, of when the GC occurred.
  • Trigger Reason: Reason the GC happened.
  • Gen: Generation (0-2) and a letter code indicating the type of GC: N=NonConcurrent, B=Background, F=Foreground, I=Induced, i=induced, not forced.
  • Suspend Msec: The number of milliseconds required to suspend running threads.
  • Pause Msec: Total time the process was paused for GC.
  • % Pause Time: Percentage of time spent in GC since the previous GC.
  • % GC: Percentage of CPU time used by GC.
  • Gen0 Alloc MB: Amount allocated since the previous GC.
  • Gen0 Alloc Rate MB/sec: Allocation rate since the previous GC.
  • Peak MB: Peak size of the heap during GC.
  • After MB: Size of the heap after the GC completed.
  • Ratio Peak/After: A measure of efficiency; higher is better.
  • Promoted MB: Amount of memory that survived the GC.
  • Gen0 MB: Gen 0 size after this GC completed.
  • Gen0 Survival Rate %: Percentage of objects in gen 0 that survived the GC.
  • Gen0 Frag %: Percentage of gen 0 that is free space.
  • Gen1 MB: Gen 1 size after this GC completed.
  • Gen1 Survival Rate %: Percentage of objects in gen 1 that survived the GC.
  • Gen1 Frag %: Percentage of gen 1 that is free space.
  • Gen2 MB: Gen 2 size after this GC completed.
  • Gen2 Survival Rate %: Percentage of objects in gen 2 that survived the GC.
  • Gen2 Frag %: Percentage of gen 2 that is free space.
  • LOH MB: LOH size after this GC completed.
  • LOH Survival Rate %: Percentage of objects on the LOH that survived the GC.
  • LOH Frag %: Percentage of the LOH that is free space.
  • Finalizable Surv MB: Size of finalizable objects that survived the GC.
  • Pinned Obj: Number of pinned objects this GC promoted; fewer is better.

As you can see, there is a wealth of information with each GC event which you can use to analyze GC performance.

Where Are My Allocations Occurring?

Visual Studio can track .NET memory allocations via ETW sampling. Note that this is completely different from the Memory Usage profiler. That report is essentially a heap dump analyzer, showing static snapshots of the objects on the heap and their ownership references back to the roots. The .NET memory allocation report in the Performance Wizard uses ETW events to track which methods are actually doing the allocation, regardless of who ends up holding onto the references. It relies on the GCAllocationTick_V2 ETW event that the CLR emits for every 100KB of allocations.

Memory Profiling Report, which shows which methods allocate the most, as well as which types take up the most memory.

Clicking on a method name will again take you to the familiar Function Details view. Just remember that you are looking at memory allocations rather than CPU time.

Memory allocation function details, showing which called methods, as well as source lines, are responsible for allocations.

This report has many other views to drill down along different dimensions. The Allocation view in particular is interesting; it is what is shown when you click on a type name in the main summary view.

A summary of allocations by type, rather than method stack.

This view aggregates by type and shows you which methods contribute to their allocations most frequently.

Another option is PerfView, which can show you the same information as Visual Studio, and much more, though the interface is not quite as polished.

  1. With PerfView, collect either .NET or just GC Only events.
  2. Once completed, open the GC Heap Alloc Stacks view and select the desired process from the process list. (For a simple example, use the AllocateAndRelease sample program.)
  3. On the By Name tab, you will see types sorted in order of total allocation size. Double-clicking a type name will take you to the Callers tab, which shows you the stacks that made the allocations.

The GC Heap Alloc Stacks view shows the most common allocations in your process. The LargeObject entry is a pseudo node; double-clicking on it will reveal the actual objects allocated on the LOH.

See Chapter 1 for more information on using PerfView’s interface to get the most out of the view.

Using the above information, you should be able to find the stacks for all the allocations that occur in the test program, and their relative frequency. For example, in my trace, string allocation accounts for roughly 59.5% of all memory allocations.

You can also use CLR Profiler to find this information and display it in a number of ways.

Once you have collected a trace and the Summary window opens, click on the Allocation Graph button to open up a graphical trace of object allocations and the methods responsible for them.

CLR Profiler’s visual depiction of object allocation stacks quickly points you to objects you need to be most concerned about.

The most frequently allocated objects are also the ones most likely to be triggering garbage collections. Reduce these allocations and the rate of GCs will go down.

What Are All The Objects On The Heap?

All of the tools in this section will use the LargeMemoryUsage sample program, reproduced here:

class Program
{
  const int ArraySize = 1000;
  static object[] staticArray = new object[ArraySize];

  static void Main(string[] args)
  {
    var localArray = new object[ArraySize];

    var rand = new Random();
    for (int i = 0; i < ArraySize; i++)
    {
      staticArray[i] = GetNewObject(rand.Next(0, 4));
      localArray[i] = GetNewObject(rand.Next(0, 4));
    }

    Console.WriteLine("Examine heap now. Press any key to exit.");
    Console.ReadKey();
    
    // This will prevent localArray from being 
    // garbage collected before you take the snapshot
    Console.WriteLine(staticArray.Length);
    Console.WriteLine(localArray.Length);
  }

  private static Base GetNewObject(int type)
  {
    Base obj = null;
    switch (type)
    {
      case 0: obj = new A(); break;
      case 1: obj = new B(); break;
      case 2: obj = new C(); break;
      case 3: obj = new D(); break;
    }
    return obj;
  }
}

class Base
{
  private byte[] memory;
  protected Base(int size) { this.memory = new byte[size]; }
}

class A : Base { public A() : base(1000) { } }
class B : Base { public B() : base(10000) { } }
class C : Base { public C() : base(100000) { } }
class D : Base { public D() : base(1000000) { } }

This simple program just allocates a random mix of the four classes and waits for you to analyze the heap before exiting.

There are a number of ways to analyze this heap, starting with a very low level.

Using WinDbg, you could execute the !DumpHeap command to just dump a list of every single object on the heap:

0:007> !DumpHeap
 Address       MT     Size
02aa1000 00b2ac70       10 Free
02aa100c 00b2ac70       10 Free
02aa1018 00b2ac70       10 Free
02aa1024 71911eac       84     
02aa1078 71912000       84     
02aa10cc 71912044       84     
02aa1120 71912088       84     
02aa1174 719120cc       84     
02aa11c8 719120cc       84     
02aa121c 71912104       12     
02aa1228 71911d64       14

The MT column specifies the address of the Method Table, which is essentially equivalent to the class.

You can dump a specific object to get its information:

0:007> !DumpObj /d 02bb8cf4
Name:        LargeMemoryUsage.A
MethodTable: 00de4f6c
EEClass:     00de196c
Size:        12(0xc) bytes
File:        D:\SampleCode\...\LargeMemoryUsage.exe
Fields:
      MT   Field Offset          Type VT     Attr    Value Name
719160e8 4000003      4 System.Byte[]  0 instance 02bb8d00 memory

Dumping every object in the heap will usually be overwhelming. Thankfully, you can filter the output a bit, such as by type:

0:007> !DumpHeap -type LargeMemoryUsage.A
 Address       MT     Size
02aaba98 00de4f6c       12     
02ab82cc 00de4f6c       12     
02ab86cc 00de4f6c       12     
02ab8acc 00de4f6c       12 

You can also restrict the output to objects within a specified address range:

!DumpHeap -type LargeMemoryUsage.A 02aaba98 02ab86cc

The !DumpHeap command accepts a number of other useful parameters:

DumpHeap Parameter Description
-min Display objects at least the given size.
-max Display objects at most the given size.
-startAtLowerBound Start scanning the heap at the specified address (must be the address of an object).
-type Does a substring match of the argument against the type name.
-mt Displays only objects with the given method table address. This is a more precise way to get output for a specific type of object compared to -type, which can match multiple types.
-short Outputs object addresses only.
-strings Displays a summary of strings in the heap.
-stat Only displays the statistical summary.

While there is a rudimentary scripting language in WinDbg, doing advanced heap analysis can be difficult. Another option is to use CLR MD to analyze the objects.

// Enumerates every object on the GC heap. ClrRuntime comes from
// the Microsoft.Diagnostics.Runtime (CLR MD) package.
private static void PrintAllObjects(ClrRuntime clr)
{
    var heap = clr.Heap;

    foreach (var obj in heap.EnumerateObjects())
    {
        Console.WriteLine($"0x{obj.Address:x} - {obj.Type.Name}");
    }
}

Because you have programmatic access to the same properties of an object as in WinDbg, you can filter by the same criteria, or even come up with more complex criteria to find and analyze objects.
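
For example, here is a hedged sketch of a filter that mimics !DumpHeap’s -type and -min parameters (the method name and parameters are my own):

// Mimics "!DumpHeap -type <substring> -min <size>" using CLR MD
private static void DumpHeapFiltered(ClrRuntime clr,
                                     string typeSubstring,
                                     ulong minSize)
{
    foreach (var obj in clr.Heap.EnumerateObjects())
    {
        var type = obj.Type;
        if (type == null)
        {
            continue;
        }

        // Substring match against the type name, like -type
        if (!type.Name.Contains(typeSubstring))
        {
            continue;
        }

        // Size floor, like -min
        if (obj.Size < minSize)
        {
            continue;
        }

        Console.WriteLine($"0x{obj.Address:x} {obj.Size,10} {type.Name}");
    }
}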

So far, we have looked at ways to analyze each discrete object. That certainly comes in handy while debugging, but often when we are analyzing overall behavior, we want to consider all of the objects in aggregate.

Starting with Visual Studio 2013 Premium Edition (Enterprise Edition starting in Visual Studio 2015), there is a managed heap analyzer. You can access it after opening a managed memory dump by selecting “Debug Managed Memory”.

Higher editions of Visual Studio include this heap analysis view, which works from managed memory dumps.

From here, you can do three things:

  1. See all instances of that type (double-click on the type name)
  2. See the various roots of the objects (what is keeping them in memory)
  3. See what other types are referenced by the highlighted type

There is also a feature in the Performance Profiler to get heap snapshots during runtime. To access this, go to Analyze | Performance Profiler, then select Memory Usage. The output is a graph of memory usage against garbage collections. While the analysis is running, you can take snapshots of the heap whenever you want.

Memory usage graph, showing aggregate memory usage, GCs, and heap snapshots.

Clicking the size or object count in a snapshot will take you to a table of all the objects on the heap, and their paths to the root (what is keeping them alive).

Visual Studio’s heap snapshot view. This shows the various paths to root of each object type in aggregate, or with specific instances.

Each snapshot also allows you to see just the objects that changed from the previous snapshot, helping you analyze allocations over time.

These options give you a fairly basic, but useful overview of your heap. If you need more analytical power, then I recommend PerfView. PerfView will not show you individual objects, but its ability to show aggregated object relationships is unparalleled.

To use this feature in PerfView:

  1. From the Memory menu, select Take Heap Snapshot. Note that this does not pause the process (unless you check the option to Freeze), but it does have a significant impact on the process’s performance.
  2. Highlight the desired process in the resulting dialog.
  3. Click Dump GC Heap.
  4. Wait until collection is finished, then close the window.
  5. Open the file from the PerfView file tree (it may automatically open for you when you close the collection window).

You should see a table like this:

A PerfView trace of the largest objects in the heap.

It tells you immediately that D accounts for 88% of the program’s memory at 462 MB with 924 objects. You can also see that local variables are holding on to 258 MB of memory and that the staticArray object is holding on to 263 MB.

PerfView is somewhat unique in that you can control how the sub-objects contribute to the size of their parent objects. This is done with the folding configuration. You can specify a folding percentage, below which all memory is attributed to the parent object, or a folding pattern to specify that certain object types are always folded into their parent objects (they effectively disappear from analysis). See Chapter 1 for more details on how to use PerfView.

You can also get a graphical view of the same information with CLR Profiler. While the program is running, click the Show Heap Now button to capture a heap sample.

CLR Profiler shows you some of the same information as PerfView, but in a graphical format.

Where Is the Memory Leak?

There are many ways memory can leak, and all of the sections under “Investigating Memory and GC” in this chapter can help you narrow problems down, but there are a few general ways memory can leak in managed applications:

  • Unexpected references are being held to objects, preventing the garbage collector from cleaning them up.
  • Full garbage collections are rare, so a lot of otherwise unused, unrooted memory stays in the process. This is not usually a problem, and is somewhat by design.
  • There is high fragmentation, especially of the gen 2 or large object heap.
  • A high number of pinned objects are preventing efficient collections. See later in this chapter for diagnosing extreme pinning situations.

In Visual Studio (Premium or Enterprise editions), you can open up the heap dump and debug the managed heap. When you click on a type, the tabs below will allow you to see which other types are referencing those objects.

You can use PerfView for more detailed analysis:

  1. In the Memory menu, select Take Heap Snapshot.
  2. In the resulting dialog box, select the process to analyze and click Dump GC Heap. You can optionally freeze the process or force a GC to occur before collecting the snapshot.
  3. Once the snapshot is done, click Close.
PerfView’s Heap Snapshot dialog samples the managed heap for easy analysis.

Once the snapshot is completed, a file will show up in the left-hand pane. Double-click this file to open a view of the types in the heap. You can manipulate this view as any of the other stack views in PerfView (e.g., with grouping, folding, and filtering). You can double-click the entry for a type, which switches to the Referred-From view.

PerfView shows the ownership of the objects on the heap, allowing easy leak analysis.

This view clearly shows that the D objects belong to the staticArray variable and a local variable (local variables lose their names during compilation).

You can generally get a good sense for what is on the heap from this view. If you take two dumps separated in time, then you can use the Diff menu to calculate a difference between the two snapshots. This can give you an idea for what is accumulating uncollected, if anything.

If you open two heap snapshots, you can compare them to get a view that shows differences.
The Heap Diff view is the same as all stack views in PerfView, but the numbers will represent the difference of the target snapshot with the baseline.

Visual Studio and PerfView are mostly useful for aggregate analysis. PerfView is a sampling profiler, even when it analyzes the heap, so it will sometimes give a skewed picture of what the heap looks like. If you need to drill down onto a specific object, or get the absolute truth about the whole picture, then you need to start using the debugger or CLR MD.

In WinDbg, to get a quick summary of what is on the heap, run the !DumpHeap -stat command:

0:023> !DumpHeap -stat
...
71f718f8     8752       525120 System.Reflection.RuntimeMethodInfo
139e5424    15138       544968 System.Collections.Immutable.Sort...
71f7ffe4    11294       573796 System.Object[]
1370f7d0     4605       626280 Microsoft.VisualStudio.Compositio...
13707114     6190       990400 Microsoft.VisualStudio.Compositio...
1370f24c     5482      1227968 Microsoft.VisualStudio.Compositio...
71f8419c     4799      4684529 System.Byte[]
71f7fbf0   108732      8303452 System.String
00586810    30707     72014878      Free

It will produce a lot of output. I usually scroll to the end of the object summary to look at the largest consumers of heap space. (Note that after the object summary, it prints a list of objects that appear after free blocks—you want to scroll above that.)

If you do this a couple of times between letting the application run (and presumably leak), you can get a sense for what objects are taking up space. If you see the Free size increasing, that is an indication of either no collections happening or heap fragmentation. See later in this chapter for how to diagnose fragmentation.

The downside of WinDbg is that it is harder to get an overall picture of object ownership, especially for common objects like System.Byte[] or System.String. For this, use PerfView as described above.

If you want to analyze a single object, you will need to get its address first. To get the addresses of objects, use the !DumpStackObjects command, or use !DumpHeap to find objects of interest on the heap, as in this example:

0:004> !DumpHeap -type LargeMemoryUsage.C
 Address     MT   Size
021b17f0 007d3954     12   
021b664c 007d3954     12   
...   

Statistics:
    MT  Count  TotalSize Class Name
007d3954    475     5700 LargeMemoryUsage.C
Total 475 objects

Once you have the object’s address you can use the !gcroot command:

0:003> !gcroot 02ed1fc0 
HandleTable:
  012113ec (pinned handle)
  -> 03ed33a8 System.Object[]
  -> 02ed1fc0 System.Random

Found 1 unique roots (run '!GCRoot -all' to see all roots).

!gcroot is often adequate, but it may miss some cases; in particular, if your object is rooted by an older generation, you will need to use the !FindRoots command.

In order for this command to work you first need to set a breakpoint in the GC, right before a collection is about to happen, which you can do by executing:

!FindRoots -gen 0
g

This sets a breakpoint right before the next gen 0 GC happens. It then loses effect and you will need to run the command again to break on the following GC.

Once the code breaks, you need to find the object you are interested in and execute this command with its address:

!FindRoots 027624fc

If the object is already in a higher generation than the current collection generation, you will see output like this:

Object 027624fc will survive this collection:
  gen(0x27624fc) = 1 > 0 = condemned generation.

If the object itself is in the current generation being collected, but it has roots from an older generation, you will see something like this:

older generations::Root:  027624fc (object)->
  023124d4(System.Collections.Generic.List`1
  [[System.Object, mscorlib]])

If that is too tedious, you can build your own !gcroot command using CLR MD.

const string TargetType = "LargeMemoryUsage.D";

private static void PrintRootsOfObjects(ClrRuntime clr)
{
    PrintHeader("Roots of Object");

    Dictionary<ulong, ClrObject> childToParents = 
      new Dictionary<ulong, ClrObject>();
    var heap = clr.Heap;

    // Find an arbitrary object for demo purposes
    ClrObject targetObject = FindObjectOfType(clr, TargetType);

    if (targetObject.Address == 0)
    {
        Console.WriteLine(
          $"Could not find any objects of type {TargetType}");
        return;
    }

    // Analyze all objects, build up reference map
    foreach (var obj in heap.EnumerateObjects())
    {
        foreach (var objRef in obj.EnumerateObjectReferences())
        {
            childToParents[objRef.Address] = obj;
        }                
    }

    // Walk up the chain of references
    ClrObject currentObj = targetObject;
    int indentSize = 0;
    while(true)
    {
        Console.Write(new string(' ', indentSize));
        Console.WriteLine(
          $"0x{currentObj.Address:x} - {currentObj.Type.Name}");

        ClrObject parentObject;
        if (!childToParents.TryGetValue(currentObj.Address, 
                                        out parentObject))
        {
            break;
        }
        currentObj = parentObject;
        indentSize += 4;
    }
}

private static ClrObject FindObjectOfType(ClrRuntime clr, 
                                          string typeName)
{
    foreach (var obj in clr.Heap.EnumerateObjects())
    {                                
        if (obj.Type.Name == typeName)
        {
            return obj;
        }
    }
    return new ClrObject();
}

This produces output similar to the following:

Roots of Object
===============
0x2e46bfc - LargeMemoryUsage.D
    0x2e43428 - System.Object[]

How Big Are My Objects?

Calculating an object’s size is a bit tricky. Do you mean the size of all the fields in that object? What if there is a reference to another object, such as an array? Is that included? What if two objects both refer to each other?

Thankfully, most tools that show object size follow an algorithm that has these concepts:

  1. Exclusive size is the size of the object and all its fields. Referred objects are not included (though the references to those objects, at 4 or 8 bytes each, are).
  2. Inclusive size is the size of the object and all of the objects referred to by that object.
  3. Object references are traced until they run out of references, or they touch an already-examined object. This avoids double-counting.

To get object sizes in Visual Studio, use the Memory Usage profiler:

  1. Go to the Analyze | Performance profiler… menu option (or Alt+F2).
  2. Select Memory Usage.
  3. Execute the target program.
  4. When desired, take a heap snapshot.
  5. Stop profiling, or exit the target program.
Visual Studio’s Memory Usage profiler can show aggregate and individual object sizes, including referenced objects.

If you do not see the level of detail you expect, make sure that the table’s view options have “Collapse Small Objects” and “Just My Code” turned off.

In WinDbg, there are a couple of SOS commands that can show the same information.

The !DumpObj command can show exclusive size of an object:

0:007> !DumpObj /d 058e8230
Name:        LargeMemoryUsage.D
MethodTable: 035d4e74
EEClass:     035d1870
Size:        12(0xc) bytes
File:        D:\HighPerformanceDotNetBook\...\LargeMemoryUsage.exe
Fields:
      MT   Field Offset            Type VT     Attr    Value Name
71b54080 4000003      4   System.Byte[]  0 instance 2a895510 memory

You can see that it does not take into account the owned byte array. For that, use the !ObjSize command:

0:007> !ObjSize 058e8230
sizeof(058e8230) = 1000028 (0xf425c) bytes (LargeMemoryUsage.D)

If you run !ObjSize without any parameters, it will show a list of all threads and GC handles, totaling up the size of objects rooted by each one.

0:007> !ObjSize
...
Thread 5580 (LargeMemoryUsage.Program.Main(System.String[])
  [D:\HighPerformanceDotNetBook\...\Program.cs @ 29]):
  ebp+1c: 012ff37c -> 05383448: 283846000 (0x10eb2570) bytes
  (System.Object[])
...
Handle (pinned): 035b13ec -> 06383510: 286744176 (0x11175e70) bytes 
  (System.Object[])
Handle (pinned): 035b13f0 -> 06382500: 8864 (0x22a0) bytes 
  (System.Object[])
Handle (pinned): 035b13f4 -> 063822e0: 640 (0x280) bytes 
  (System.Object[])
Handle (pinned): 035b13f8 -> 0538121c: 12 (0xc) bytes 
  (System.Object)
Handle (pinned): 035b13fc -> 06381020: 8440 (0x20f8) bytes 
  (System.Object[])

CLR MD can also calculate this size, though you have to do the work of traversing the object graph yourself.

private static void PrintObjectSize(ClrRuntime clr)
{
    PrintHeader("Object Size");
    
    var obj = FindObjectOfType(clr, TargetType);
    Console.WriteLine($"0x{obj.Address:x} - {obj.Type.Name}");
    var heap = clr.Heap;
    // Evaluation stack
    Stack<ulong> stack = new Stack<ulong>();

    HashSet<ulong> considered = new HashSet<ulong>();

    int count = 0;
    ulong size = 0;
    stack.Push(obj.Address);

    while (stack.Count > 0)
    {                
        var objAddr = stack.Pop();
        if (considered.Contains(objAddr))
            continue;

        considered.Add(objAddr);
                        
        ClrType type = heap.GetObjectType(objAddr);
        if (type == null)
        {
            continue;
        }

        count++;
        size += type.GetSize(objAddr);
                                       
        type.EnumerateRefsOfObject(objAddr, 
                                   delegate (ulong child, 
                                             int offset)
        {
            if (child != 0 && !considered.Contains(child))
                stack.Push(child);
        });
    }
    Console.WriteLine($"Object Size: {obj.Size}");
    Console.WriteLine($"Full size: {size}");
}

The output looks like this:

Object Size
===========
0x4636c24 - LargeMemoryUsage.D
Object Size: 12
Full size: 1000024

If you are interested only in aggregate object sizes, then PerfView can give you this information and allow you to aggregate sub-objects in multiple ways to get very fine-grained analysis. This was described in the previous section.

Which Objects Are Being Allocated On the LOH?

Understanding which objects are being allocated on the large object heap is critical to ensuring a well-performing system. The important rule discussed earlier in this chapter states that all objects should be cleaned up in a gen 0 collection, or they need to live forever.

Large objects are cleaned up only by an expensive gen 2 GC, so they violate that rule out of the gate.
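
You can sanity-check this from your own code: objects on the LOH report generation 2 from GC.GetGeneration. A quick illustration (85,000 bytes is the default threshold; the array sizes here are only examples):

byte[] small = new byte[80 * 1024];  // 81,920 bytes: below the threshold
byte[] large = new byte[85 * 1024];  // 87,040 bytes: allocated on the LOH

Console.WriteLine(GC.GetGeneration(small));  // 0
Console.WriteLine(GC.GetGeneration(large));  // 2 (the LOH is collected with gen 2)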

To find out which objects are on the LOH, use PerfView and follow the previously given instructions for getting a GC event trace. In the resulting GC Heap Alloc Stacks view, in the By Name tab, you will find a special node that PerfView creates called “LargeObject.” Double-click on this to go to the Callers view, which shows which “callers” LargeObject has. In the sample program, they are all Int32 arrays. Double-clicking on those in turn will show where the allocations occurred.

PerfView can show large objects and their types with the stacks that allocated them.

CLR MD can also tell you which objects are in the large object heap.

private static void PrintLOHObjects(ClrRuntime clr)
{
    PrintHeader("LOH Objects (limit:10)");

    int objectCount = 0;
    const int MaxObjectCount = 10;
    if (clr.Heap.CanWalkHeap)
    {
        foreach (var segment in clr.Heap.Segments)
        {                    
            if (segment.IsLarge)
            {
                for (ulong objAddr = segment.FirstObject; 
                     objAddr != 0; 
                     objAddr = segment.NextObject(objAddr))
                {                            
                    var type = clr.Heap.GetObjectType(objAddr);
                    if (type == null)
                    {
                        continue;
                    }
                    var obj = new ClrObject(objAddr, type);
                    if (++objectCount > MaxObjectCount)
                    {
                        break;
                    }
                    Console.WriteLine(
                      $"{obj.Address} {obj.Type.Name}");
                }                        
            }
        }
    }
}

What Objects Are Being Pinned?

As covered earlier, a performance counter will tell you how many pinned objects the GC encounters during a collection, but that will not help you determine which objects are being pinned.

Use the Pinning sample project, which pins things via explicit fixed statements and by calling some Windows APIs.
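
If you do not have the sample handy, code along these lines will produce pinned handles for the tools below to find. This is only a sketch of the kinds of pinning the sample performs, not its actual contents:

using System;
using System.Runtime.InteropServices;

class PinningSketch
{
    static unsafe void Main()  // Compile with /unsafe
    {
        byte[] buffer = new byte[4096];

        // Scope-limited pin via the fixed statement
        fixed (byte* p = buffer)
        {
            // While inside this block, buffer cannot be moved by the GC
            p[0] = 1;
        }

        // Longer-lived pin via a pinned GCHandle
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Pass 'address' to unmanaged code here
        }
        finally
        {
            handle.Free();  // Always release pinned handles promptly
        }
    }
}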

Use WinDbg to view pinned objects with the !gchandles command:

0:010> !gchandles
  Handle Type      Object   Size   Data Type
...
003511f8 Strong    01fa5dbc     52      System.Threading.Thread
003511fc Strong    01fa1330    112      System.AppDomain
003513ec Pinned    02fa33a8   8176      System.Object[]
003513f0 Pinned    02fa2398   4096      System.Object[]
003513f4 Pinned    02fa2178    528      System.Object[]
003513f8 Pinned    01fa121c     12      System.Object
003513fc Pinned    02fa1020   4420      System.Object[]
003514fc AsyncPinned 01fa3d04     64    
  System.Threading.OverlappedData

You will usually see lots of System.Object[] objects pinned. The CLR uses these arrays internally for things like statics and other pinned objects. In the case above, you can see one AsyncPinned handle. This object is related to the FileSystemWatcher in the sample project.

Unfortunately, the debugger will not tell you why something is pinned, but often you can examine the pinned object and trace it back to the object that is responsible for it.

The following WinDbg session demonstrates tracing through object references to find higher-level objects that may give a clue to the origins of the pinned object. See if you can follow the trail of object references, starting with the address of the AsyncPinned handle from above.

0:010> !do 01fa3d04
Name:  System.Threading.OverlappedData
MethodTable: 64535470
EEClass:   646445e0
Size:  64(0x40) bytes
File:  C:\windows\Microsoft.Net\...\mscorlib.dll
Fields:
  MT  Field   Offset   Type VT   Attr  Value Name
64927254  4000700  4  System.IAsyncResult  0 instance 020a7a60 
  m_asyncResult
64924904  4000701  8 ...ompletionCallback  0 instance 020a7a70 
  m_iocb
...
0:010> !do 020a7a70
Name:  System.Threading.IOCompletionCallback
MethodTable: 64924904
EEClass:   6463d320
Size:  32(0x20) bytes
File:  C:\windows\Microsoft.Net\...\mscorlib.dll
Fields:
  MT  Field   Offset   Type VT   Attr  Value Name
649326a4  400002d  4  System.Object  0 instance 01fa2bcc _target
...
0:010> !do 01fa2bcc
Name:  System.IO.FileSystemWatcher
MethodTable: 6a6b86c8
EEClass:   6a49c340
Size:  92(0x5c) bytes
File:  C:\windows\Microsoft.Net\...\System.dll
Fields:
  MT  Field   Offset   Type VT   Attr  Value Name
649326a4  400019a  4  System.Object  0 instance 00000000 __identity
6a699b44  40002d2  8 ...ponentModel.ISite  0 instance 00000000 site
...

While the debugger gives you the maximum power, it is cumbersome at best. Instead, you can use PerfView, which can simplify a lot of the drudgery.

With a PerfView trace, you will see a view called “Pinning at GC Time Stacks” that will show you stacks of the objects being pinned across the observed collections.

PerfView will show you information about what types of objects are pinned across a GC, as well as some information about its likely origin.

You can also approach pinning problems by looking at the free space holes created in the various heaps, which is covered in the next section.

Where Is Fragmentation Occurring?

Fragmentation occurs when there are freed blocks of memory inside segments containing used blocks of memory. Fragmentation can occur at multiple levels, inside a GC heap segment, or at the virtual memory level for the whole process. Fragmentation becomes a problem when there are so many small free blocks that they are not usable for future allocations.

Fragmentation in gen 0 is usually not an issue unless you have a very severe pinning problem, where so many objects are pinned that each block of free space becomes too small to satisfy new allocations. This causes the size of the small object heap to grow, and more garbage collections will occur.

Fragmentation is usually more of an issue in gen 2 or the large object heap, especially if you are not using background GC. You may see fragmentation rates that seem high, perhaps even 50%, but this is not necessarily an indication of a problem. Consider the size of the overall heap, and if it is acceptable and not growing over time, you probably do not need to take action.

First, you will want to know if fragmentation is happening at all. WinDbg can show you what percentage of a heap is free space, indicating fragmentation, using the !HeapStat command:

0:023> !HeapStat
Heap     Gen0    Gen1     Gen2     LOH
Heap0 2870384 2423640 93212392 9692760

Free space:                                   Percentage
Heap0  177940   21480 65552412 6324464 SOH: 66% LOH: 65%

This prints each heap and tells you the percentage of free space in both small and large object heaps. For large object heap fragmentation, you can often deduce the likely culprits just by looking at which objects are on the large object heap and examining their sizes and related code. See earlier in this chapter for information on how to find this out.

You can get a summary of types and objects that are adjacent to free blocks with the !DumpHeap -stat command. At the very end of the heap summary, there will be some output like this:

Fragmented blocks larger than 0.5 MB:
    Addr     Size      Followed by
16b61000    1.7MB         16d08948 System.Byte[]
16d08d7c    1.7MB         16ec4aa4 System.Byte[]
16f530c4    6.0MB         1755fb10 System.Byte[]
175e978c    0.6MB         17680ae0 System.Byte[]
176b9694    1.8MB         1787fff4 System.Byte[]
1e461000    1.5MB         1e5d7300 System.Byte[]
1e5d7734    1.4MB         1e74660c System.Byte[]
1e746a40    2.4MB         1e9a20d8 System.Byte[]

If you need detailed information about fragmentation, including which specific objects are causing the free space holes, you can use other WinDbg commands.

Get a list of free blocks with !DumpHeap -type Free:

0:010> !DumpHeap -type Free
 Address     MT   Size
02371000 008209f8     10 Free
0237100c 008209f8     10 Free
02371018 008209f8     10 Free
023a1fe8 008209f8     10 Free
023a3fdc 008209f8     22 Free
023abdb4 008209f8    574 Free
023adfc4 008209f8     46 Free
023bbd38 008209f8    698 Free
023bdfe0 008209f8     18 Free
023d19c0 008209f8   1586 Free
023d3fd8 008209f8     26 Free
023e578c 008209f8   2150 Free
...

For each block, figure out which heap segment it is in with !eeheap -gc.

0:010> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x02371018
generation 1 starts at 0x0237100c
generation 2 starts at 0x02371000
ephemeral segment allocation context: none
     segment       begin     allocated  size
02370000  02371000  02539ff4  0x1c8ff4(1871860)
Large object heap starts at 0x03371000
     segment       begin     allocated  size
03370000  03371000  03375398  0x4398(17304)
Total Size:        Size: 0x1cd38c (1889164) bytes.
------------------------------
GC Heap Size:  Size: 0x1cd38c (1889164) bytes.

Dump all of the objects in that segment, or within a narrow range around the free space.

0:010> !DumpHeap 0x02371000 02539ff4
 Address     MT   Size
02371000 008209f8     10 Free
0237100c 008209f8     10 Free
02371018 008209f8     10 Free
02371024 713622fc     84   
02371078 71362450     84   
023710cc 71362494     84   
02371120 713624d8     84   
02371174 7136251c     84   
023711c8 7136251c     84   
0237121c 71362554     12
...

This is a manual and tedious process, but it does come in handy and you should understand how to do it. You can write scripts to process the output and generate the WinDbg commands for you based on previous output, but CLR Profiler can show you the same information in a graphical, aggregated manner that may be good enough for your needs.

CLR Profiler can show you a visual representation of the heap that makes it possible to see what types of objects are next to free space blocks. In this image, the free space blocks are bordered by blocks of System.Byte[] and an assortment of other types.

PerfView can also tell you when fragmentation is occurring in the GCStats view. Look at the Frag % columns. However, it does not tell you why, exactly.

The CLR MD library allows you to build your own tool to highlight fragmentation. Each ClrObject has a Type property, whose IsFree boolean property indicates whether that object represents free space on the heap.
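
For example, here is a hedged sketch of such a tool, reporting what fraction of each segment is free space (the method name is my own):

private static void PrintFragmentation(ClrRuntime clr)
{
    foreach (var segment in clr.Heap.Segments)
    {
        ulong free = 0;
        ulong total = 0;

        for (ulong objAddr = segment.FirstObject;
             objAddr != 0;
             objAddr = segment.NextObject(objAddr))
        {
            ClrType type = clr.Heap.GetObjectType(objAddr);
            if (type == null)
            {
                continue;
            }
            ulong size = type.GetSize(objAddr);
            total += size;
            if (type.IsFree)
            {
                free += size;  // Free blocks indicate fragmentation
            }
        }

        if (total > 0)
        {
            Console.WriteLine(
              $"Segment 0x{segment.Start:x}: {100.0 * free / total:F1}% free");
        }
    }
}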

Virtual Memory Fragmentation

You may also get virtual memory fragmentation, which can cause an unmanaged allocation to fail because it cannot find a large enough range to satisfy the request. This can include allocating a new GC heap segment, which means your managed memory allocations will fail.

Use VMMap (part of SysInternals) to get a visual representation of your process. It divides the address space into managed, native, and free regions. Selecting the Free portion will show you all free blocks. If the largest free block is smaller than a requested memory allocation, you will get an OutOfMemoryException.

VMMap shows you tons of memory-related information, including the size of all free blocks in the address range. In this case, the largest block is over 1.1 GB in size—plenty!

VMMap also has a fragmentation view that can show where these blocks fit in the overall process space.

VMMap’s fragmentation view shows free space in the context of other segments.

You can also retrieve this information in WinDbg:

!address -summary

This command produces this output:

...
-- Largest Region by Usage -- Base Address -- Region Size --
Free              26770000    49320000 (1.144 Gb)
...

You can retrieve information about specific blocks with the command:

!address -f:Free

This produces output similar to:

BaseAddr EndAddr+1 RgnSize  Type State   Protect     Usage
--------------------------------------------------------------
   0   150000   150000     MEM_FREE  PAGE_NOACCESS Free  

Virtual memory fragmentation is more likely in 32-bit processes, where you are limited to just two gigabytes of address space for your program by default. The biggest symptom of this is an OutOfMemoryException. The easiest way to fix this is to convert your application to a 64-bit process, with its 128-terabyte address space. If you cannot do this, your only choice is to become far more efficient in memory allocations. You will need to compact the heaps and you may need to implement significant pooling.

What Generation Is An Object In?

You can retrieve this information from inside your app’s own code by using the GC.GetGeneration method and passing it the object in question.
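
For example:

var obj = new object();
Console.WriteLine(GC.GetGeneration(obj));  // 0: newly allocated

GC.Collect();  // For demonstration only; avoid explicit collections
Console.WriteLine(GC.GetGeneration(obj));  // 1: obj survived one collection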

In WinDbg, once you obtain the address of the object of interest (say from !DumpStackObjects or !DumpHeap), use the !gcwhere command:

0:003> !gcwhere 02ed1fc0 
Address   Gen Heap segment  begin  allocated size
02ed1fc0  1   0  02ed0000 02ed1000 02fe5d4c  0x14(20)

In CLR MD, you can use the ClrHeap.GetGeneration method:

foreach(var obj in heap.EnumerateObjects())
{
    int gen = heap.GetGeneration(obj.Address);
}

Which Objects Survive Gen 0?

The simple way to do this is to enumerate all objects that are in the gen 1 or gen 2 portions of the heap.

CLR MD can do this for you with minimal code:

foreach(var obj in heap.EnumerateObjects())
{
    int gen = heap.GetGeneration(obj.Address);
    if (gen > 0)
    {
        // do some analysis
    }
}

On a big heap, it would be extremely inefficient to iterate through every object in the heap. If you are interested in just the gen 1 heap, for example, you can make this a little bit better by walking the heap per segment.

private static void PrintGen1ObjectsByHeapSegment(ClrRuntime clr)
{
    PrintHeader("Gen1 Objects by Heap Segment");
    if (clr.Heap.CanWalkHeap)
    {
        foreach(var segment in clr.Heap.Segments)
        {
            // Only the ephemeral segment contains gen0 and gen1
            if (segment.IsEphemeral)
            {
                //get range of gen 1
                ulong start = segment.Gen1Start;
                ulong end = start + segment.Gen1Length;
                Console.WriteLine(
                  $"Segment Info: Start: {start}, End {end}");

                for (ulong objAddr = segment.FirstObject; 
                     objAddr != 0; 
                     objAddr = segment.NextObject(objAddr))
                {
                    if (objAddr >= start && objAddr < end)
                    {
                        var type = 
                          clr.Heap.GetObjectType(objAddr);
                        if (type == null)
                        {
                            continue;
                        }
                        var obj = new ClrObject(objAddr, type);                        
                        Console.WriteLine(
                          $"{obj.Address} {obj.Type.Name}");
                    }
                }

                break;
            }
        }
    }
}

On the other hand, perhaps you want to debug which objects survive a specific garbage collection—perhaps you are in the debugger, sitting at a breakpoint, and you want to know what happens after the next GC. This is possible in WinDbg, but it is fairly involved.

In WinDbg, execute these commands:

!FindRoots -gen 0
g

This will set a breakpoint right before the next gen 0 collection begins. Once it breaks, you can send whatever commands you want to dump the objects on the heap. You can simply do:

!DumpHeap

This will dump every object on the heap, which may be excessive. Optionally, you can add the -stat parameter to limit output to a summary of the found objects (their counts, sizes, and types). However, if you want to limit your analysis to just gen 0, the !DumpHeap command allows you to specify an address range. Recall the description of memory segments from the top of the chapter and that gen 0 is at the end of the segment.

Basic segment layout

To get a list of heaps and segments, you can use the !eeheap -gc command:

0:003> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x02ef0400
generation 1 starts at 0x02ed100c
generation 2 starts at 0x02ed1000
ephemeral segment allocation context: none
     segment  begin  allocated  size
02ed0000  02ed1000  02fe5d4c 0x114d4c(1133900)
Large object heap starts at 0x03ed1000
     segment  begin     allocated  size
03ed0000  03ed1000  041e2898  0x311898(3217560)
Total Size:        Size: 0x4265e4 (4351460) bytes.
------------------------------
GC Heap Size:  Size: 0x4265e4 (4351460) bytes.

This command will give you a printout of each generation and each segment. The segment that contains gen 0 and gen 1 is called the ephemeral segment. !eeheap tells you the start of gen 0. To get the end of it, you merely need to find the segment that contains the start address. Each segment contains a number of addresses and the length. In the example above, the ephemeral segment starts at 02ed0000 and ends at 02fe5d4c. Therefore, the range of gen 0 on this heap is 02ef0400 - 02fe5d4c.

Now that you know this, you can put some limits on the !DumpHeap command and print only the objects in gen 0:

!DumpHeap 02ef0400 02fe5d4c

Once you have done that, you will want to compare what happens as soon as the GC is complete. This is a little trickier. You will need to set a breakpoint on an internal CLR method. This method is called when the CLR is ready to resume managed code. If you are using workstation GC, call:

bp clr!WKS::GCHeap::RestartEE

For server GC:

bp clr!SVR::GCHeap::RestartEE

Once you have set the breakpoints, continue execution (F5 or the g command). Once the GC is complete, the program will break again and you can repeat the !eeheap -gc and !DumpHeap commands.

Now you have two sets of outputs and you can compare them to see what changed and which objects are remaining after a GC. By using the other commands and techniques in this section, you can see who maintains a reference to that object.

Note: If you use server GC, then remember there will be multiple heaps. To do this kind of analysis, you will need to repeat the commands for each heap. The !eeheap command will print information for every heap in the process.

Who Is Calling GC.Collect Explicitly?

When code explicitly calls GC.Collect, it is called an “induced” garbage collection, and there are counters and ETW events that surface this information. However, they will not tell you who is calling it. You can easily search your own code base in Visual Studio or any advanced text editor, but if that turns up nothing, you will need to set a breakpoint on the GC.Collect method itself to see how your program gets to it.

In WinDbg, set a managed breakpoint on the GC class’s Collect method:

!bpmd mscorlib.dll System.GC.Collect

Continue executing. Once the breakpoint is hit, to see the stack trace of who called the explicit collection, do:

!DumpStack

What Weak References Are In My Process?

Because weak references are a type of GC handle, you can use the !gchandles command in WinDbg to find them:

0:003> !gchandles
  Handle Type      Object  Size  Data Type
006b12f4 WeakShort   022a3c8c  100   System.Diagnostics.Tracing...
006b12fc WeakShort   022a3afc  52  System.Threading.Thread
006b10f8 WeakLong  022a3ddc  32  Microsoft.Win32.UnsafeNati...
006b11d0 Strong    022a3460  48  System.Object[]
...

Handles:
  Strong Handles:     11
  Pinned Handles:     5
  Weak Long Handles:  1
  Weak Short Handles:   2

Weak Short handles are the normal weak references you may use. Weak Long handles track whether a finalized object has been resurrected (objects without a finalizer always have short handles). Resurrection can occur when an object has been finalized, and rather than letting the GC clean it up, you decide to reuse it by assigning the object to a new reference from the finalizer. This can be relevant for pooling scenarios. However, it is possible to do pooling without finalization, and given the complexities of resurrection, just avoid this in favor of deterministic methods.
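
In code, the difference between the two handle types is the trackResurrection argument to the WeakReference constructor:

var target = new object();

// Weak Short handle: stops tracking the target once finalization begins
var shortRef = new WeakReference(target, trackResurrection: false);

// Weak Long handle: keeps tracking the target through resurrection
var longRef = new WeakReference(target, trackResurrection: true);

Console.WriteLine(shortRef.IsAlive);  // True while target is strongly rooted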

What Finalizable Objects Are On The Heap?

WinDbg’s !FinalizeQueue command will show you all objects that are registered for finalization, as well as a summary of their types.

0:042> !FinalizeQueue
SyncBlocks to be cleaned up: 0
Free-Threaded Interfaces to be released: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------
generation 0 has 13 finalizable objects (288603b4->288603e8)
generation 1 has 6 finalizable objects (2886039c->288603b4)
generation 2 has 57247 finalizable objects (28828520->2886039c)
Ready for finalization 0 objects (288603e8->288603e8)
Statistics for all finalizable objects 
  (including all objects ready for finalization):
      MT Count TotalSize Class Name
72753184     1        12 System.WeakReference`1...
6df6bea8     1        12 System.Windows.Forms.VisualStyles...
6df68c44     1        12 System.Windows.Forms.ImageList...
584582f0     1        12 System.WeakReference`1...
58443158     1        12 Microsoft.Build.BackEnd.Components...
...

If you want to see a summary of objects that are ready for finalization, you can execute:

!FinalizeQueue -detail

This will show you a list of type names that are currently “freachable” (that is, eligible to have their finalizers called). If you want to get the specific objects that are in that category, you can use the address range given in the output to dump all objects within the “ready for finalization” range:

!DumpHeap 288603e8 288606c4

In CLR MD, you can use the EnumerateFinalizableObjectAddresses method to enumerate all finalizable objects:

private static void PrintFinalizableObjects(ClrRuntime clr)
{
    foreach (var objAddr in 
               clr.Heap.EnumerateFinalizableObjectAddresses())
    {
        ClrType type = clr.Heap.GetObjectType(objAddr);
        if (type == null)
        {
            continue;
        }
        ClrObject obj = new ClrObject(objAddr, type);
        // Do something with the object...
    }
}

Unfortunately, this does not tell you whether those objects are ready for finalization.

Summary

You need to understand garbage collection in-depth to truly optimize your applications. Choose the right configuration settings for your application, such as server GC if your application is the only one running on the machine, but be wary of using advanced configuration settings. Ensure that object lifetime is short, allocation rates are low, and that any objects that must live longer than the average GC frequency are pooled or otherwise kept alive in gen 2 forever. Make judicious use of stackalloc to avoid heap allocations.

Avoid pinning and finalizers if possible. Any LOH allocations should be pooled and kept around forever to avoid full GCs. Reduce LOH fragmentation by keeping objects a uniform size and occasionally compacting the heap on-demand. Consider GC notifications to avoid having full collections impact application processing at inopportune times.

The garbage collector is a deterministic component and you can control its operation by closely managing your object allocation rate and lifetime. You are not giving up control by adopting .NET’s garbage collector, but it does require a little more subtlety and analysis.
