Chapter 14. Immutable Data

WHAT'S IN THIS CHAPTER?

  • Understanding Why Immutability Is Good

  • Managing State and Mutation Responsibly

  • Passing References

  • Enhancing Performance

The term functional is similar to object oriented in that it represents a collection of language design choices and associated design methodologies. In this chapter, you'll explore the data side of functional programming in F# through a combination of language features and methodology. When you arrive on the other side, you'll know why immutable data structures are so important and have ways of handling mutation in a safe way.

THE PROBLEM WITH STATE

For years, the vast majority of programmers have combined data in mutable form and operations on that data in constructs known as classes. Along with the numerous advantages of this methodology come a number of disadvantages that are not often discussed. Most of these disadvantages stem from the use of these classes as black boxes that each encapsulate some part of the state of your program. This state is hidden by a mess of unspecified behavior, implicit in the method definitions of the classes.

When consuming these classes, you find yourself unable to directly understand their implementation. The same call to a class method may produce different results, and often does. This problem is not restricted to a single class because many classes mutate each other or are interlinked with complex networks of events. A single method call can cause a ripple effect of state changes throughout your program.

The foundation of your reasoning about these programs is unsound. As the number of possible state changes per call increases, so does the number of potential bugs. A single change to a class can cause bugs to appear in a completely different part of the program. This problem is compounded to a large degree by asynchronous, parallel, or event-driven programming.

If you have attempted to build a test harness for a large legacy class, you have likely experienced this firsthand. A precise model of behavior can be difficult to nail down when that behavior is defined by a large number of stateful constructs. Modern testing and design methodologies have been invented to help mitigate some of these problems, but little has been done to combat the source.

The good news is that by taking a functional approach you can eliminate the vast majority of these errors. By changing your thinking to be in terms of immutable data and transformations of that data, most of these problems simply evaporate. As all inputs and outputs are exposed, you can be sure that calls are doing only what you expect. Intermediate program states are not exposed because data is immutable and changes are made primarily through atomic reference swapping.

STATE SAFETY

Bar none, the biggest problem with modern object-oriented languages is that they make it near impossible to guarantee any kind of state safety. This single problem is responsible for everything from the modern concurrency nightmare to the epidemic of difficult-to-test code. Many sets of best practices have arisen to help. However, they come at a heavy cost in terms of both performance and code size because they lack language support.

Programwide State Safety

When data and operations on that data are inextricably intertwined, several problems arise. First, object graphs are subject to simultaneous and conflicting state changes. Second, it cannot be ensured that data is in the correct state for a call to complete successfully. Third, changes in seemingly unrelated objects may affect the output of a call. To understand these problems better, it is necessary to examine them in more detail and see how having immutable data prevents these types of error from occurring.

Errors arising from simultaneous operations on a stateful object graph are one of the primary reasons for the great interest in functional programming. An unfortunate side effect of classes being considered black boxes is that, without examining the contents of the class you're using, you can never be sure that it was written in a thread-safe way.

type UnsafeAccessClass() =
    let mutable i = 0
    member this.IncrementTenAndGetResult() =
        i <- i + 10
        i

As you can see, when multiple calls to UnsafeAccessClass.IncrementTenAndGetResult() occur on separate threads, the output will vary depending on the order of thread execution. To explore what can happen, take a look at the following small framework, which examines the possible outputs at discrete timings by breaking the method into a separate call for each of its lines.

open System
open System.Threading

type UnsafeAccessClass() =
    let mutable i = 0
    member this.IncrementTen() =
        i <- i + 10
        this
    member this.GetResult() = i

let asyncExecuteClass (ac: UnsafeAccessClass) (firstSleep: int, secondSleep: int) =
    async {
        Thread.Sleep( firstSleep*firstSleep*20 )
        let incremented = ac.IncrementTen()
        Thread.Sleep( secondSleep*secondSleep*20 )
        let result = incremented.GetResult()
        return result, DateTime.Now.Ticks
    }

let compareUnsafeExecution firstThreadSleeps secondThreadSleeps =
    let ac = new UnsafeAccessClass()
    let threadWork = [ asyncExecuteClass ac firstThreadSleeps;
                       asyncExecuteClass ac secondThreadSleeps]
    let results = Async.RunSynchronously (Async.Parallel threadWork)
    if snd results.[0] < snd results.[1] then
        [ "1st", (fst results.[0]); "2nd", (fst results.[1]) ]
    else
        [ "2nd", (fst results.[1]); "1st", (fst results.[0]) ]

Let's examine each of the possible output cases.

//Execution Order: 1,1,2,2
let case1 = compareUnsafeExecution (1, 2) (3, 4)
//Execution Order: 2,2,1,1
let case2 = compareUnsafeExecution (3, 4) (1, 2)
//Execution Order: 1,2,1,2
let case3 = compareUnsafeExecution (1, 3) (2, 4)
//Execution Order: 2,1,2,1
let case4 = compareUnsafeExecution (2, 4) (1, 3)
//Execution Order: 1,2,2,1
let case5 = compareUnsafeExecution (1, 4) (2, 3)
//Execution Order: 2,1,1,2
let case6 = compareUnsafeExecution (2, 3) (1, 4)

val case1 : (string * int) list = [("1st", 10); ("2nd", 20)]
val case2 : (string * int) list = [("2nd", 10); ("1st", 20)]
val case3 : (string * int) list = [("1st", 20); ("2nd", 20)]
val case4 : (string * int) list = [("2nd", 20); ("1st", 20)]
val case5 : (string * int) list = [("2nd", 20); ("1st", 20)]
val case6 : (string * int) list = [("1st", 20); ("2nd", 20)]


Notice that given just this simple example with a single mutated variable and two threads, there are six possible execution orderings. The result of this is two possible return values and two possible return orderings. The number of orderings will explode exponentially with the number of threads. Even worse, the number of different possible results has an exponential relationship with both the number of mutations and the number of threads.

Now, let's examine the possible outputs given an immutable class.

type SafeAccessClass(i) =
    new() = SafeAccessClass (0)
    member this.IncrementTen() = new SafeAccessClass(i + 10)
    member this.GetResult() = i

val case1 : (string * int) list = [("1st", 10); ("2nd", 10)]
val case2 : (string * int) list = [("2nd", 10); ("1st", 10)]
val case3 : (string * int) list = [("1st", 10); ("2nd", 10)]
val case4 : (string * int) list = [("2nd", 10); ("1st", 10)]
val case5 : (string * int) list = [("2nd", 10); ("1st", 10)]
val case6 : (string * int) list = [("1st", 10); ("2nd", 10)]

Note that the return ordering cannot be guaranteed. However, the combination of immutability and F#'s asynchronous workflows makes this a nonissue.

Much of the System.Drawing namespace works in a similarly stateful way, and extensive steps must be taken to ensure that calls to Clone() are performed in the correct places. Often, consumers of these classes make unwarranted assumptions about the state of internal class data, with disastrous results. When using immutable data, considerations of this type become completely unnecessary.

This leads to the problem of ensuring data is in the correct state before a call is made. Method calls may require a particular ordering for internal data structures to be properly initialized, or assumptions may be made about the state of method inputs. This results in the need for explicit preconditions to verify that these assumptions hold: preconditions that, when violated, result in runtime failures.

Let's take a quick look at an example of implicitly required ordering of method calls with a common pattern that is often used in C#:

exception NotInitialized

type InitializeClass() =
  let mutable initialized = false;
  member this.Initialize() =
    initialized <- true
  member this.Perform() =
    if not initialized then raise NotInitialized

The problems generated by the need for preconditions are two-fold. First, if the programmer forgets about even one precondition, the data may be in a bad state and cause an exception of an unhandled type to be thrown. Second, it is much better to correct program flow issues of this type at compile-time. As discussed in the previous chapter, it is possible to prevent the need for most of these preconditions by effectively leveraging the F# type system. Immutability also helps here immensely by ensuring that the contents of class fields do not change after construction. These two features alone eliminate the need for most precondition checks.
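To make this concrete, here is a minimal sketch of the same class reworked so that no precondition is needed (the InitializedClass name and its connectionString parameter are invented for this illustration). Because the required data must be supplied at construction time and never changes afterward, there is no uninitialized state for Perform to guard against.

type InitializedClass(connectionString: string) =
    // connectionString is guaranteed to exist and never changes,
    // so Perform needs no initialization check.
    member this.Perform() =
        printfn "Performing work against %s" connectionString

let worker = InitializedClass("Server=localhost")
worker.Perform()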

Finally, there is the issue of seemingly unrelated objects affecting each other in unexpected ways. As classes often hold shared or bidirectional references, a change in one class may cause unexpected behavior in another. As an example of this, take a look at a simple settings class and the classes that reference it.

type ListGenerationManager() =
    let mutable GenerateInstances = 0
    member this.InstancesToGenerate
        with get () = GenerateInstances
        and set (value) = GenerateInstances <- value
    member this.GetIntegerGenerator() = new IntegerListGenerator(this)
    member this.GetFloatGenerator() = new FloatListGenerator(this)

and IntegerListGenerator(manager: ListGenerationManager) =
    member this.GenerateInstances() =
        List.init manager.InstancesToGenerate (fun i -> int i)

and FloatListGenerator(manager: ListGenerationManager) =
    member this.GenerateInstances() =
        List.init manager.InstancesToGenerate (fun i -> float i)

In cases where invisible shared state is present, attempting to fix a bug without cataloging everywhere the classes sharing that state are constructed or used is extremely difficult. Without full knowledge of their context, it is impossible to correctly reason about their behavior. Furthermore, attempting to change the referenced class may cause unexpected behavioral changes to pop up in its consumers. Just as the runtime state changes ripple through these objects, so do the effects of behavioral changes.

With immutable data you can be confident that these types of situations do not occur as changes to the internal state of the object will always be accompanied by the construction of a new object. This prevents interobject behavioral dependencies from being created in the first place.

These common problems are all easy to avoid when programming with immutable data. Given the same input set, your functions will reliably produce the same output. Data is transformed and passed around, instead of being encapsulated and kept around. As you might expect, this greatly reduces the number of state-related bugs in a given program. As your data is unable to change, you can rest assured that these types of unexpected conditions will not occur.

Local Data State Safety

A big problem with mutable data is that it can be difficult or impossible to reason about what the contents of a variable will be after a call is made. This is part of the reason extensive debugging tools are needed when programming in most modern object-oriented languages. Will a passed-in list be kept in the same order? Will the data transform you gave as input still contain the same values? Worse, even after spotting a bug caused by the unexpected mutation of data, it can be time-consuming to find exactly where that mutation took place. Changes can occur anywhere the object is passed or held.

When classes are designed this way, the ordering of calls and the handling of data become very important. Without extreme care, brittle code is the result. A great example of this is the .NET List<T> class: any call that is handed an instance can potentially cause its contents to change.

open System
open System.Collections.Generic

type ListData =
    struct
      val mutable Number: int
      val mutable Name: string
      new( number, name ) = { Number = number; Name = name }
      override this.ToString() =
          String.Format( "Number: {0}, Name: {1}", this.Number, this.Name )
    end

let clrList = new List<ListData>( [| new ListData(1, "Johnny");
                                     new ListData(5, "Lisa");
                                     new ListData(3, "Mark") |]);

> clrList.ToArray();;
val it : ListData [] =
  [|Number: 1, Name: Johnny; Number: 5, Name: Lisa; Number: 3, Name: Mark|]

let sortedClrList =
    let ListDataComparison (d1:ListData) (d2:ListData) =
        d1.Number - d2.Number in
        clrList.Sort( ListDataComparison )
    clrList

> sortedClrList.ToArray();;
val it : ListData [] =
  [|Number: 1, Name: Johnny; Number: 3, Name: Mark; Number: 5, Name: Lisa|]

> clrList.ToArray();;
val it : ListData [] =
  [|Number: 1, Name: Johnny; Number: 3, Name: Mark; Number: 5, Name: Lisa|]

let reversedClrList =
    clrList.Reverse()
    clrList

> reversedClrList.ToArray();;
val it : ListData [] =
  [|Number: 5, Name: Lisa; Number: 3, Name: Mark; Number: 1, Name: Johnny|]

> sortedClrList.ToArray();;
val it : ListData [] =
  [|Number: 5, Name: Lisa; Number: 3, Name: Mark; Number: 1, Name: Johnny|]

> clrList.ToArray();;
val it : ListData [] =
  [|Number: 5, Name: Lisa; Number: 3, Name: Mark; Number: 1, Name: Johnny|]

One commonly used method of avoiding unexpected data mutation is to explicitly clone the instance each time it is passed. This adds extra code to your program and makes it slower, but at least it provides some semblance of safety.

let clrList = new List<ListData>( [| new ListData(1, "Johnny");
                                     new ListData(5, "Lisa");
                                     new ListData(3, "Mark") |]);
let sortedClrList =
    let newList = new List<ListData>(clrList)
    let ListDataComparison (d1:ListData) (d2:ListData) =
        d1.Number - d2.Number in
        newList.Sort( ListDataComparison )
    newList

> sortedClrList.ToArray();;
val it : ListData [] =
  [|Number: 1, Name: Johnny; Number: 3, Name: Mark; Number: 5, Name: Lisa|]

> clrList.ToArray();;
val it : ListData [] =
  [|Number: 1, Name: Johnny; Number: 5, Name: Lisa; Number: 3, Name: Mark|]

F# does all it can to prevent you from unexpectedly mutating the contents of a list, and it does so at compile time. In fact, even though the members of ListData are marked mutable, the following sample will not compile, because the lambda's data parameter is an immutable copy of the value type. The immutability of the data structure and its contents is maintained.

> let incrementedClrList =
   clrList.ForEach( (fun data -> data.Number <- data.Number + 1) )
   clrList

error FS0256: A value must be mutable in order to mutate the contents or take the
address of a value type, e.g. 'let mutable x = ...'

Unfortunately, other .NET languages don't make such guarantees. In these languages, great pains must be taken and great inefficiencies must be incurred to avoid mutation. The following C# example compiles and can be used without any errors being raised.

public static List<ListData> IncrementAllInList(List<ListData> list)
{
    list.ForEach((data => data.Number += 1));
    return list;
}

Therefore, when using these languages, it is often wise to do a deep clone before passing your list into another part of your program. This is the only way you can be confident that neither the order nor the content of the elements will change. This process is often very resource-intensive and can be painful to implement.

An alternative to this is to build a read-only view of a .NET list by using List.AsReadOnly. However, this returns a ReadOnlyCollection<T> rather than a List<T>, so it can't be used in places where a standard list is expected. Also, while a read-only view protects against reordering, element addition, and element removal, it won't stop a consumer from directly modifying the elements themselves. Unless you have control over the entire API, AsReadOnly and the wrapper it returns are little more than a bandage on a gaping wound.
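To see why the wrapper is only a partial fix, consider the following sketch, which uses a hypothetical mutable ItemData class (rather than the ListData struct above) purely for illustration. The read-only view blocks structural changes, but an element that is itself a mutable reference type can still be modified right through it.

open System.Collections.Generic

// Hypothetical mutable reference type used only for this illustration.
type ItemData(number: int) =
    let mutable n = number
    member this.Number
        with get () = n
        and set value = n <- value

let items = List<ItemData>([ ItemData(1); ItemData(2) ])
let readOnlyView = items.AsReadOnly()

// No Add or Sort is exposed by the wrapper, but element contents
// can still be mutated through it.
readOnlyView.[0].Number <- 99
printfn "%d" items.[0].Number   // prints 99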

Now, in C# you would wisely implement ICloneable and a generic List<T>.DeepClone() extension method to make this easier on yourself. However, the vast majority of the .NET Framework is composed of structs and sealed classes that do not implement ICloneable. For this reason, a separate deep clone method must be created for each combination of list and element type.

let deepCloneList (list: List<ListData>) =
    let newList = new List<ListData>()
    list.ForEach( (fun data ->
        newList.Add( new ListData( data.Number, data.Name ) ) ) )
    newList

Thankfully, F# comes with a number of immutable data structures that, when used, eliminate these problems. Now, take a look at similar code that uses F# lists instead (for more information on F# lists see Chapter 5).

> let initialList = [ new ListData(1, "Johnny");
                    new ListData(5, "Lisa");
                    new ListData(3, "Mark") ];

val initialList : ListData list =
  [Number: 1, Name: Johnny; Number: 5, Name: Lisa; Number: 3, Name: Mark]

> let sortedList =
    let numberSelector (d: ListData) = d.Number in
        List.sortBy numberSelector initialList

val sortedList : ListData list =
  [Number: 1, Name: Johnny; Number: 3, Name: Mark; Number: 5, Name: Lisa]

> let reversedList = List.rev initialList

val reversedList : ListData list =
  [Number: 3, Name: Mark; Number: 5, Name: Lisa; Number: 1, Name: Johnny]

Notice that, after being constructed, you can always be confident about the contents of an F# list instance. It will never be unexpectedly changed by a call. Coupled with immutable list members, this makes it easy to reason about exactly what your code will do even before writing a test or walking through it in a debugger.

To further this ability to reason about code, functional programs are constructed with a different approach than those that are object oriented. As you may have noticed, instead of using classes to create black boxes, you are using immutable data structures and functions that return new data. This practice ensures that when viewing code it is easy to tell exactly what is being done and exactly where those effects are applied, even in the face of out-of-scope code changes.

DATA MUTATION

One of the advantages of F# is that, although it encourages immutability, it is not a pure functional programming language. That is, F# allows for the mutation of data. This difference frees programmers to use the most expressive paradigm for the algorithm they are implementing, instead of being forced to use convoluted or inefficient code.

However, this puts the responsibility for state safety solely in your hands. Although pure functional languages force the programmer to behave, it is entirely possible to build F# programs with all the problems discussed in the previous two sections. For this reason, it is important to carefully consider the patterns you use to make state changes in your program.

Avoiding Mutation

An important part of becoming an effective F# programmer is learning to avoid the mutable keyword whenever possible. Once you get the hang of the language constructs you'll see that in the vast majority of cases data mutation is completely unnecessary. It is usually only when you begin to use imperative .NET framework classes, such as WinForms, that you need to start using a significant amount of mutable data.

As an example, consider putting numbers into buckets according to their least factor. Using mutable data this task might be accomplished in the following way:

let leastFactor n =
    let max = int (sqrt (float n))
    let mutable lastTried = 1
    let mutable keepLooping = true
    while keepLooping do
        lastTried <- lastTried + 1
        if n % lastTried = 0 || lastTried > max
        then keepLooping <- false
    if lastTried > max then n else lastTried

let bucketByLeastFactors numbers =
    let mutable buckets = Array.create (1 + List.max numbers) []
    for number in numbers do
        let lf = leastFactor number
        buckets.[lf] <- [number] @ buckets.[lf]
    buckets

let printFactorBuckets factorBuckets =
    for i in 0 .. (Array.length factorBuckets) - 1 do
        let factoredList = factorBuckets.[i]
        if not (List.isEmpty factoredList) then
            printfn "%d -> %A" i factoredList

let factorBuckets = bucketByLeastFactors [ 0 .. 25 ]
> printFactorBuckets factorBuckets;;

0 -> [0]
1 -> [1]
2 -> [24; 22; 20; 18; 16; 14; 12; 10; 8; 6; 4; 2]
3 -> [21; 15; 9; 3]
5 -> [25; 5]
7 -> [7]
11 -> [11]
13 -> [13]
17 -> [17]
19 -> [19]
23 -> [23]

It may look like employing mutable data here is your only option because, at the very least, it seems like you need a place in which to accumulate the numbers into buckets. However, if you restructure the flow of data, you will see that mutation is unnecessary in both functions.

let leastFactor n =
    let maxFactor = int (sqrt (float n))
    let rec factorTest = function
        | i when i > maxFactor -> n
        | i when n % i = 0 -> i
        | i -> factorTest (i + 1)
    factorTest 2

let bucketByLeastFactors (numbers: int list) =
    let rec addToBucket bucketIndex n bucketsList =
        let neededBuckets = bucketIndex - List.length bucketsList
        if neededBuckets >= 0 then
            addToBucket bucketIndex n
                (bucketsList @ [ for i in 0 .. neededBuckets -> [] ])
        else
            List.mapi
                (fun i bucket ->
                    if bucketIndex = i then ([n] @ bucket)
                    else bucket)
                bucketsList
    List.fold
        (fun bucketsList n -> (addToBucket (leastFactor n) n bucketsList))
        List.empty
        numbers

let printFactorBuckets factorBuckets =
    List.iteri
        (fun i factored ->
            if List.length factored > 0
            then printfn "%d -> %A" i factored)
        factorBuckets

let factorBuckets = bucketByLeastFactors [ 0 .. 25 ]
> printFactorBuckets factorBuckets;;

0 -> [0]
1 -> [1]
2 -> [24; 22; 20; 18; 16; 14; 12; 10; 8; 6; 4; 2]
3 -> [21; 15; 9; 3]
5 -> [25; 5]
7 -> [7]
11 -> [11]
13 -> [13]
17 -> [17]
19 -> [19]
23 -> [23]

First, compare the two leastFactor functions. The mutable, iterative version has been changed into a tail-recursive function by breaking the loop into subcases. It still effectively iterates from 2 up to maxFactor, but instead of looping and mutating a variable, each iteration calls factorTest with the candidate factor incremented by one.

The changes to bucketByLeastFactors are significantly more extensive. Because the goal is to build a list of lists, you must thread this data structure through each of your numbers to be factored. After each number is visited, a new bucketsList is returned. In effect, this is what List.fold does.

After the least factor is generated with the (leastFactor n) call, the result is passed, along with the current number and the current bucketsList, into the addToBucket function. This function takes a bucketsList and returns a new bucketsList with the passed-in number added to the bucket at the index of the least factor.

This is done by first checking whether the appropriate bucket exists. If it doesn't, enough empty buckets are appended to ensure that it does, and addToBucket recursively calls itself with the freshly expanded bucketsList. If it does exist, a new list of lists is created via the List.mapi function, the only difference being that the bucket at the least-factor index has the newly factored number prepended to it. This bucketsList is then returned and used in the next iteration of List.fold.

Getting the hang of this can be a bit difficult at first. This example contains two recursion patterns and two List module functions. F# provides quite a few constructs that help reduce the burden of immutability. The first step in becoming adept at F# is learning these patterns and constructs so that you can program effectively with immutable data. With a little practice they become as familiar as the design patterns and loops you have used for years in object-oriented programming.

When first starting out with F#, the prospect of learning all this can be quite intimidating. The most important thing to keep in mind is isolating mutable data within the confines of a single function. This practice provides most of the safety of immutability but without needing deep knowledge of functional patterns and immutable data structures.

Bubble and Assign

One technique for relatively safe state mutation, which is easy to begin with, is bubble and assign. When possible, you should always bubble your changes up the call stack and do your variable assignment within the same scope.

let simulationLoop () =
    let worldState = getInitialWorldStateArray()
    while simulationIsRunning() do
        let changes = getWorldStateChanges(worldState)
        for change in changes do
            let i, j, newValue = change
            worldState.[i].[j] <- newValue
        updateUI(worldState)

Note

This is a structural example and does not compile.

By simply returning changes to the current scope and then applying them, state changes can be isolated to this single function. Each iteration of the getWorldStateChanges function returns a set of unapplied changes to the worldState array. These changes are then applied to the array one at a time in a very controlled way.

This simple technique is easy to understand and effective in many cases. It keeps state changes within the scope in which the mutable data is defined and is generally low overhead. The trade-off is that you are limited in terms of where your data can be accessed from and the types of data that can be easily used.
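The following is a small compilable sketch of the same pattern, with the world state reduced to a 2D integer array and the helper functions stubbed out; all of these names are placeholders rather than library functions.

// Compilable sketch of bubble and assign; every name here is a stand-in.
let getInitialWorldStateArray () = Array.init 3 (fun _ -> Array.create 3 0)

// Changes are computed and returned, never applied, by this function.
let getWorldStateChanges (world: int [] []) =
    [ for i in 0 .. world.Length - 1 -> i, i, world.[i].[i] + 1 ]

let runSimulation iterations =
    let worldState = getInitialWorldStateArray ()
    for _ in 1 .. iterations do
        // All mutation is applied here, in the scope that owns worldState.
        for (i, j, newValue) in getWorldStateChanges worldState do
            worldState.[i].[j] <- newValue
    worldState

runSimulation 5 |> printfn "%A"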

Reference Cells

You will often find the bubble and assign technique too constraining. In these cases, reference cells are usually the easiest solution. With reference cells your state mutations are made with an atomic reference assignment operation. This replaces an entire data instance with a new one. As your state is enclosed in a heap allocated reference, you can now mutate it from within closures or other functions.

The most common way to do reference swapping is via a reference cell. To do this, define a reference cell with ref, assign to it with the := assignment operator, and retrieve its underlying value with the ! dereference operator.

> let refInt = ref 10;;
val refInt : int ref = {contents = 10;}

> refInt := 20;;
> refInt;;
val it : int ref = {contents = 20;}

> let normalInt = ! refInt;;
val normalInt : int = 20

Because using a reference cell effectively boxes your type within a reference type, it is now possible to mutate it from anywhere it or any closures that contain it are accessible. For this reason it is important to carefully consider the implications of both where the reference cell is located and where it is accessed from. It is well known that even judicious use of global variables can lead to major problems with parallelization, code reuse, and the ability to reason about what is happening. However, global variables are just the worst possible case of a larger domain of antipatterns. Any widely accessible mutable data will have these same characteristics, albeit to a lesser degree.

let gameLoop actors display input =
    let mutable gameIsRunning = true
    let worldState = ref (getInitialWorldStateArray())
    let applyChanges changes =
        for (i,j,value) in changes do
            (!worldState).[i].[j] <- value
    while gameIsRunning do
        for actor in actors do
            applyChanges (actor.getChanges !worldState)
        applyChanges (input.getChanges !worldState)
        display.update !worldState
        gameIsRunning <- not input.quit

Note

This is a structural example and does not compile.

Note that in the preceding example, all the state changes are confined to this one function. As a general rule of thumb, it is best to limit the assignment and passing of mutable data to as small a scope as possible.

Now, consider the following worst-case example:

let gameIsRunning = ref true
let worldState: option< option<int> array array > ref = ref None

let initializeWorldStateArray () =
    worldState := Some(getInitialWorldStateArray())

let updateGameStateFromActors () =
    for actor in !actors do
        actor.updateWorldState worldState

let updateGameStateFromUserInput () =
    userInput.updateWorldState worldState
    gameIsRunning := not userInput.quit

let updateUI () = ui.update worldState

let gameLoop () =
    initializeWorldStateArray()
    while !gameIsRunning do
        updateGameStateFromActors()
        updateUI()
        updateGameStateFromUserInput()

Note

This is a structural example and does not compile.

This is effectively a bunch of mutable variables and functions that transparently operate on those variables. As you do not even require function wrappers to access or change the data, the state of this program can be changed anywhere silently and is all but impossible to debug. Code constructed in this way is also notoriously difficult to reuse or parallelize.

If instead you keep your swaps close to your reference, moving the mutable reference declaration down the stack and pushing the needed changes up the stack, the negative effects of mutability are isolated to a small portion of your overall code. It is always best to keep mutable data access and assignment limited to as small a scope as possible.

Passing by Reference

Although often overlooked, passing by reference is a very effective way to perform mutation. As only those functions granted special access can manipulate the reference, you can be confident that hidden mutations are not happening elsewhere. Passing by reference requires that three steps be taken. First, the variable must be marked as mutable. Second, at the call site you must take the address of the value with the & operator. Third, the function must be annotated to take its argument as a reference with the byref keyword.

> let mutable number = 10
let doubleNumber (number: int byref) = number <- 2 * number
doubleNumber (&number);;

> number;;
val it : int = 20

By using the byref keyword, calls that perform mutation stand out, as they now require the address-of operator to be applied at the call site.

let gameLoop actors display input =
    let mutable gameIsRunning = true
    let mutable worldState = getInitialWorldStateArray()
    let applyChanges changes (worldState: 'a option [] [] byref) =
        List.iter
            (fun (i,j,value) -> (worldState).[i].[j] <- value)
            changes
    while gameIsRunning do
        for actor in actors do
            applyChanges (actor.getChanges worldState) (&worldState)
        applyChanges (input.getChanges worldState) (&worldState)
        display.update worldState
        gameIsRunning <- not input.quit

Note

This is a structural example and does not compile.

Effectively, the byref keyword makes a contract between the caller and the callee stating that the callee has permission to mutate the referenced value. For this reason it is the safest way to perform direct mutation.

Message Passing

A very effective way to enhance the safety of state mutation is through message passing. This entails using discriminated union subtypes (see Chapter 7, "Complex Composite Types") as the enumeration of possible messages along with the data corresponding to that message. Simply put, the type name, the message, describes the change to be made while the type itself contains the data necessary to complete that change. Message passing is most commonly used when describing programming with mailboxes. However, as you'll see in the Data and State Flow subsection of Chapter 15, "Data Types," mailboxes are not necessary to gain the safety benefits of message passing.
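As a small sketch of the idea (the message cases and the Account record here are invented for illustration), a discriminated union enumerates the legal state changes, and a single function folds those messages over an immutable state value; no mailbox is required.

// Hypothetical message type: each case names a change and carries its data.
type AccountMessage =
    | Deposit of decimal
    | Withdraw of decimal
    | Rename of string

// Immutable state; every applied message produces a new record.
type Account = { Name: string; Balance: decimal }

let applyMessage account message =
    match message with
    | Deposit amount  -> { account with Balance = account.Balance + amount }
    | Withdraw amount -> { account with Balance = account.Balance - amount }
    | Rename newName  -> { account with Name = newName }

let finalAccount =
    [ Deposit 100.0M; Withdraw 30.0M; Rename "Savings" ]
    |> List.fold applyMessage { Name = "Checking"; Balance = 0.0M }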

PERFORMANCE CONSIDERATIONS

As you saw in the State Safety section, the mutable nature of .NET constructs can make it difficult to reason about code. This can be alleviated somewhat by judicious copying, but that practice has significant performance implications. This is why F#'s immutable data structures are designed to minimize the number of objects that must be allocated and collected.

For example, when changes to a list are made, only the parts of it that are affected by the change are replaced in the new instance. In practice, this use of intelligent implementations under the hood largely mitigates the performance cost of immutability. These, and other optimizations, are why it is always important to avoid making assumptions about performance when using F#.
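A quick illustration of this sharing, using nothing beyond the core library: consing an element onto an existing list allocates a single new node, and the old nodes are reused rather than copied, which a reference-equality check makes visible.

let original = [ 2; 3; 4 ]
let extended = 1 :: original   // allocates one new node; the rest is shared

// The tail of the new list is the very same object as the original list.
printfn "%b" (System.Object.ReferenceEquals(original, extended.Tail))   // true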

One important trick to know is that you can enable timing in the F# interactive window. This is a fast way to check out the performance of new code without the need for a profiler or timing harness.

> #time;;

--> Timing now on

> let l = [0 .. 1000000];;
Real: 00:00:00.390, CPU: 00:00:00.390, GC gen0: 5, gen1: 2, gen2: 0

To turn it off, simply type the command again.

> #time;;

--> Timing now off

However, as not all optimizations are available to F# interactive, it is always best to ultimately use a profiler on Release mode generated assemblies to find areas of slow performance. At best, the #time feature can give you a general idea.

As testing every line of code written in this way would be tedious and time-consuming, it is important to understand the underlying implementation and performance characteristics of each of the basic F# data structures. For the sake of convenience, I'll expand upon what was previously stated in Chapter 7.

Lists

F# lists are immutable singly linked lists and so are designed for left-to-right access. Head access and forward iteration are always O(1). However, indexing to a particular element is O(n), where n is the index of that element in the list. For this reason, large F# lists are best used with forward iterative operations only. Thankfully, functional programming is largely oriented toward the iterative processing of data and, due to their immutability and data sharing, lists are often your best choice.

Due to the O(n) element access of linked lists, when using them it is important to carefully consider the performance characteristics of your implementation. When appending two lists, it is best to place the longer of the two on the right whenever possible. This allows the entire structure of the right list to be shared with the newly created list.

> let smallList = [0 .. 100]
let bigList = [0 .. 10000000];;

> smallList @ bigList;;
Real: 00:00:00.000, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0

> bigList @ smallList;;
Real: 00:00:02.214, CPU: 00:00:02.215, GC gen0: 29, gen1: 17, gen2: 0

Both ordering and operator use are important considerations when generating lists with recursive functions. If your recursive function appends to the right side of a list, the append will grow slower with every recursive call. However, it is also important to remember that if you have a recursive call on the left side of a list operation, your function cannot take advantage of tail recursion optimization and will both execute more slowly and be at risk of stack overflow. For these reasons, in most cases it is best to generate lists iteratively with accumulators, list comprehensions, or List.init.

A simple example of this is the reversal of a list. The following naive implementation will give O(n^2) performance.

let rec badReverse list =
    if List.isEmpty list then []
    else badReverse list.Tail @ [list.Head]

Looking at this example, two flags should immediately pop up in your mind. First, list.Head, a single element, occurs on the right side of the append operator, which means the entire accumulated list on the left must be copied on every call, making the append quite slow.

> badReverse [0 .. 10000];;
Real: 00:00:05.105, CPU: 00:00:05.101, GC gen0: 396, gen1: 90, gen2: 1
val it : int list =
  [10000; 9999; 9998; 9997; 9996; 9995; 9994; 9993; 9992; 9991; 9990; 9989;
   9988; 9987; 9986; 9985; 9984; 9983; 9982; 9981; 9980; 9979; ...]

Second, the recursive call is on the left side and so tail recursion optimization will not be used. This means you are at risk of a stack overflow.

> badReverse [0 .. 100000];;

Process is terminated due to StackOverflowException.

If you instead thread an accumulator through your list reversal, you can perform the operation in O(n) time and without the risk of a stack overflow. Generally, the phrase threading an accumulator simply means to pass yourself intermediate values with each recursion. This topic is discussed in further depth in Chapter 16.

let appendReverse list =
    let rec recReverse rest reversed =
        if List.isEmpty rest then reversed
        else recReverse rest.Tail ([rest.Head] @ reversed)
    recReverse list []

> appendReverse [0 .. 100000];;

Real: 00:00:00.033, CPU: 00:00:00.031, GC gen0: 1, gen1: 0, gen2: 0
val it : int list =
  [100000; 99999; 99998; 99997; 99996; 99995; 99994; 99993; 99992; 99991;
   99990; 99989; 99988; 99987; 99986; 99985; 99984; 99983; 99982; 99981; ...]

There's also a third issue here. Whenever you repeatedly add a single element to a list, it is better to use the cons operator than to create a new single-element list and append it; cons simply creates a single new list node, while append contains quite a lot of logic to cover all possible states of both lists. With lists of up to 100,000 elements, the impact is negligible. However, when you have millions of elements, the impact becomes quite significant.

let consReverse list =
    let rec recReverse rest reversed =
        if List.isEmpty rest then reversed
        else recReverse rest.Tail (rest.Head :: reversed)
    recReverse list []

> let list = [0 .. 1000000];;
Real: 00:00:00.291, CPU: 00:00:00.296, GC gen0: 7, gen1: 7, gen2: 2

> appendReverse list;;
Real: 00:00:00.345, CPU: 00:00:00.343, GC gen0: 6, gen1: 4, gen2: 1

> consReverse list;;
Real: 00:00:00.064, CPU: 00:00:00.062, GC gen0: 2, gen1: 2, gen2: 0

To simplify things further, you can use pattern matching syntax. This makes the two recursive cases stand out apart from each other better.

let patternMatchingReverse list =
    let rec recReverse rest reversed =
        match rest with
        | [] -> reversed
        | head::tail -> recReverse tail (head::reversed)
    recReverse list []

However, as in most cases, the best solution is to use the built-in list module operators as they are concise, safe, and designed with performance in mind.

> List.rev [0 .. 100000];;

Real: 00:00:00.015, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0

When using these operators, you can be confident that the underlying implementation has been done in an intelligent way and that your code will be easy for others to understand. Also, if you take a glimpse at the F# source code, you will see that library list operations are almost all implemented via pattern matching. When no existing list operation fits your need, pattern matching is most often the best choice.

Arrays

F# arrays are in all ways traditional .NET arrays. They are contiguous in memory and, just as in other .NET languages, indexing is always O(1). Also, as arrays are themselves a special class of CLR reference type, they are by far the most efficient way to store and operate on collections of value types in memory. However, appending to an array, or otherwise changing its structure, requires allocating a completely new array. For this reason, when managing large datasets that don't need random access, lists are often the better choice.
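To make the allocation point concrete, here is a small sketch showing that Array.append cannot extend the existing storage; it must allocate a new array and copy both inputs into it.

let source = [| 0 .. 4 |]

// A brand-new array is allocated; the original block is left untouched.
let appended = Array.append source [| 5; 6 |]

printfn "%b" (System.Object.ReferenceEquals(source, appended))   // false
printfn "%A" appended                                            // [|0; 1; 2; 3; 4; 5; 6|]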

It is also important to note that F# does not enforce immutability for arrays.

> let a = [|0 .. 5|];;
val a : int [] = [|0; 1; 2; 3; 4; 5|]

> a.[0] <- 1;;
val it : unit = ()

> a;;
val it : int [] = [|1; 1; 2; 3; 4; 5|]

When using arrays in F#, you must always keep in mind that their contents may change, especially when they are passed into standard .NET libraries. Of course, this is not as much of a problem when you completely control access to the array, and F# array assignment uses the <- operator, which is easy to spot. Also, when designing for fast concurrency with shared mutable datasets, arrays are often the ideal choice.

However, as discussed in the data mutation section, it's best to choose the scope of your mutable data carefully. Do think carefully before using globally shared mutable arrays.

Sequences

F# Seqs are simply instances of the IEnumerable<'T> interface. Similar to lists, sequence indexing is O(n), plus the cost of evaluating each prior element along the way. The benefits of using sequences are that they are evaluated lazily, on demand, and are simple to manage. However, this delayed evaluation can come at a high performance cost if you are not careful. For example, while Seqs cost very little to allocate, a sequence that is repeatedly iterated over can be very slow because its elements are re-evaluated on each pass.

> let simpleSeq = seq { for i in 0 .. 1000000 do yield i }
Real: 00:00:00.000, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0

> for i in 0 .. 10 do Seq.iter (fun i -> ()) simpleSeq
Real: 00:00:01.241, CPU: 00:00:01.232, GC gen0: 0, gen1: 0, gen2: 0

One way to mitigate this somewhat is to use Seq.cache. This function will generate a sequence that caches each value of its output when generated. The next time the sequence is used, already visited elements won't need to be evaluated. However, Seq.cache is only fast for expensive element computations. It does not remove the overhead in IEnumerable itself.

> let cachedSeq = Seq.cache simpleSeq;;
Real: 00:00:00.000, CPU: 00:00:00.000, GC gen0: 0, gen1: 0, gen2: 0

> for i in 0 .. 10 do Seq.iter (fun i -> ()) cachedSeq;;
Real: 00:00:00.688, CPU: 00:00:00.686, GC gen0: 73, gen1: 0, gen2: 0

Often a better approach for repeatedly iterated-over sequences is to convert them into a list at the first available opportunity. While conversion will carry the same cost as the initial sequence traversal, additional iterations will be much cheaper.

> let listFromSeq = Seq.toList simpleSeq
Real: 00:00:00.313, CPU: 00:00:00.312, GC gen0: 7, gen1: 2, gen2: 1

> for i in 0 .. 10 do List.iter (fun i -> ()) listFromSeq
Real: 00:00:00.031, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0

In this particular case, due to the large number of elements, the creation of the list is rather expensive. However, with only ten iterations, the conversion to a list is well worth the initial cost as it is nearly twice as fast.

Of course, this technique is not applicable when using sequences that are evaluated based on a function that references mutable data. However, this style of using sequences should usually be avoided for the same reasons described above in the State Safety section.

Tuples

Perhaps second only to lists, tuples are one of the most frequently used data structures in F#. Most often, they are used to conveniently bind together sets of function inputs and outputs, and they make for much more readable functional code. In .NET 4.0 and later, F# tuples are the .NET tuple types in the System namespace.

Although some were concerned with the design decision to implement tuples as reference types, after much testing it was found that this made little or no difference for the vast majority of applications and broadly increased tuple interoperability.[10] In cases where performance is paramount, such as number crunching in a tight loop, using a predefined struct in the place of a tuple may grant a small performance improvement.
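If profiling does point at tuple allocations in such a loop, the replacement is mechanical. The following sketch uses an invented Point2D struct in place of a float * float tuple; the struct version carries the same data without a per-value heap allocation.

// Tuple version: each point is a heap-allocated System.Tuple.
let tupleLength (x: float, y: float) = sqrt (x * x + y * y)

// Hypothetical struct carrying the same data as a value type.
type Point2D =
    struct
      val X: float
      val Y: float
      new( x, y ) = { X = x; Y = y }
    end

let structLength (p: Point2D) = sqrt (p.X * p.X + p.Y * p.Y)

let sameResult = tupleLength (3.0, 4.0) = structLength (Point2D(3.0, 4.0))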

Records

Records are similar to C# anonymous types and are used in much the same way. They allow you to quickly generate data-only classes with named fields for data processing. Similarly to tuples, the fact that records are immutable reference types may cause a small amount of additional overhead. In some cases you may find using a struct in their place grants a performance improvement.

Structs

Fundamentally, F# structs are no different from those in any other .NET language. As they are value types, they have significantly different performance characteristics than classes; structs are best used only in cases where the use of many small objects creates significant performance or memory overhead.

Because structs are value types, they can be allocated on the execution stack instead of the heap and do not need to be garbage collected; structs also have much smaller memory footprints than classes. However, it is also important to note that each time a value type is passed, it is copied. For this reason, haphazard use of structs can actually reduce the performance of your program.

One example of a case where structs can be beneficial is when repeatedly creating a large number of small data instances, such as in image processing.

type StructData =
    struct
      val R: byte
      val G: byte
      val B: byte
      new( r, g, b ) = { R = r; G = g; B = b }
    end

type RecordData = { R: byte; G: byte; B: byte }

let pixels = 1920 * 1200
> let structArray = [| for i in 0 .. pixels do
                        yield new StructData( 0uy, 0uy, 0uy )|]
Real: 00:00:00.387, CPU: 00:00:00.390, GC gen0: 0, gen1: 0, gen2: 0

> let recordArray = [| for i in 0 .. pixels do
                        yield { R = 0uy; G = 0uy; B = 0uy }|]
Real: 00:00:00.846, CPU: 00:00:00.780, GC gen0: 5, gen1: 2, gen2: 0

For the best possible performance in these situations, it is best to use only a combination of arrays and value types, that is, structs, enums, and primitives. When exclusively value types are used, the entire data structure can be laid out contiguously in memory. This is ideal for fast member access and caching.

One of the few differences with structs in F# is that members are immutable by default; this is done via automatic property generation. Because structs are primarily used to enhance performance, and the extra call needed to access a property does take a small amount of additional time, it is useful to know that when struct members are defined as mutable they are instead generated as publicly accessible fields.

type MutableStructData =
    struct
      val mutable R: byte
      val mutable G: byte
      val mutable B: byte
      new( r, g, b ) = { R = r; G = g; B = b }
    end

As always when dealing with mutable data, it is important to consider the data safety ramifications of this design decision before making this optimization. In the vast majority of cases, you'll find the improvement in removing one call is negligible. Even with each iteration accessing every RGB value in every pixel in the equivalent of a 20MP image, my tests saw only a 10% performance improvement. Also, it's important to always keep in mind that value types are copied when passed unless they are inside of an array or another reference type. This means that when you mutate them the changes will not show up in any parent function or method.
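A short sketch of this copy-on-pass behavior, using the MutableStructData type defined above: mutation inside the called function only affects a local copy, and the caller's value is untouched.

let tryToMutate (data: MutableStructData) =
    // data arrives as a copy; to mutate it we need a mutable local,
    // and the write is invisible to the caller either way.
    let mutable local = data
    local.R <- 255uy
    local.R

let original = MutableStructData(1uy, 2uy, 3uy)
printfn "inside: %d, caller still sees: %d" (tryToMutate original) original.R
// inside: 255, caller still sees: 1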

SUMMARY

As you saw in this chapter, F# gives you the tools to mitigate many of the problems inherent in object-oriented programming by providing a set of useful data constructs. Data immutability helps make your programs safe, parallelizable, and easy to understand.

However, F# neither enforces the use of immutable data structures nor good mutation practices. Without proper application, F# code can be just as buggy, rigid, and difficult to maintain as code in any other language. This especially goes for the optimizations mentioned in the Performance Considerations section of this chapter; as the benefits only rarely outweigh the risks, it is best to optimize only after your profiler has shown a problem. If you take only one thing away from this chapter, make it the careful consideration of every variable you make mutable.



[10] http://msdn.microsoft.com/en-us/magazine/dd942829.aspx
