Chapter 15. Data Types

WHAT'S IN THIS CHAPTER?

  • Understanding ambiguously defined data

  • Encapsulating state in types

  • Leveraging discriminated unions

  • Defining recursive data types

Specifying types is often thought of as merely a way to tell your computer how to handle the contents of some location in memory. However, as time has progressed, it has been discovered that data types actually can do much more. Beyond telling the compiler which operations to perform on binary data, types can provide a powerful way to label the conceptual meaning of your data. In effect, they become contracts about the semantic meaning of that data. Both data and functions labeled in this way are easier to reason about and can help to ensure the correctness of your code.

AMBIGUOUSLY TYPED DATA

A problem rife in current mainstream design methodologies is the frequent use of ambiguously typed data. A bool, a double, an exception and a null are all very general. They each carry no meaning other than that which is given to them by their surrounding context. The compiler can do nothing to check if your method inputs, and their internal uses are valid.

What is stopping you from passing a double that represents a probability into a method that wants a specific scalar measurement? Even NASA, with its legendary software QA, has experienced defects of this type. One of these failures has even caused the failure of a very expensive mission. You may be surprised to learn that null references and mishandled exceptions are exactly the same class of defect. They are both caused by inadequately constrained inputs and outputs. They both also cost countless hours in wasted bug-fixing time each year.

To combat this, F# has two features: units of measure (See Chapter 3, "Primitive Types") and discriminated unions (See Chapter 7, "Complex Composite Types"). Units of measure allow you to give semantic meaning to basic types and ensure that they are not allowed in places they don't belong. Discriminated unions allow you to do the same but for any type of function inputs and outputs. In leveraging these constructs you can greatly reduce the number of bugs in your programs.

FAILING FAST

The fail fast methodology applies well to all types of programming. When a program is designed to fail fast, computation isn't wasted, incorrect state changes are not made, and you are left in a better situation to correctly compensate for the specific type of failure right when it happens.

In using object-oriented design, you often think of failing fast as most important because incorrect state changes must be avoided at all costs. However, in statically checked functional programming, you think of failing fast to prevent incorrect programs from ever executing in the first place. Compile time is the fastest type of failure, the easiest to correct, and has the least impact on your users. The more classes of error that your compiler can protect you from, the more likely your code is to be correct after the first successful compilation.

SPECIFICITY

As mentioned in the chapter overview, using runtime preconditions to constrain data is a less than ideal solution. This practice greatly increases the chance of runtime failure from untested edge cases or errors introduced by the modification of code. If instead you can encode the specific meaning and constraints implied by that meaning into your types, you can easily avoid most of these types of errors. F#'s powerful type system allows you to do just that, and without a significant increase in the size of your code, a decrease in its performance or an increased maintenance cost.

In the data safety section of Chapter 14, "Immutable Data," you saw how immutability can help enhance your ability to reason about code. However, although it is easier to reason about the state of an immutable list, immutability in no way constrains its contents. That is, you can still pass an unordered list into a function that requires an ordered one.

let nLargestNumbers n sortedNumbers =
    let rec nFirstElements n list =
        match n with
        | 0 -> []
        | _ -> match list with
               | [] -> []
               | h::t -> [h] @ nFirstElements (n - 1) t
    nFirstElements n sortedNumbers
                                                              
SPECIFICITY

In some cases, your first inclination might be to check the order of that list as a precondition. However, ensuring the correct ordering of a list is an O(n) operation. As previously shown, you could just order the list each time the function is called, but this is an O(nLog(n)) operation. Wouldn't it be better if you could somehow encode the ordering of the list in its type?

As you'll see in this section, by leveraging discriminated unions you can both enforce these states for function inputs and label the state of your function outputs.

Option as an Example

A great place to start when thinking about the F# type system is the option type. As discussed in Chapter 5, "Composite Types," the option type allows you to encode the lack of a value in a much more type safe way than null. The reason the option type is such a great place to start is that this same idea of encoding meaning in the type system can be used in many other ways.

let getFormattedWebData url =
    let webData = getWebData url
    match webData with
    | Some(data) -> formatData data
    | None -> None

Option wraps another type and must be explicitly unwrapped later to be used. The returned information about the state must be dealt with before the data can be touched. As Option<T> must be explicitly checked before being opened, you are forced to verify its contents before trying to perform any computation with it. The great value in this is that when you have instances of types that aren't options, you can be sure that they contain a value. This is in contrast with null, which may always be present and so must repeatedly be checked for, thus littering your code with frequent precondition checks and potentially causing unexpected runtime null reference exceptions.

Now, it is important to know that when dealing with standard CLR libraries you may still encounter null in F#. In these cases it is always best to wrap these calls to return an option type immediately. However, when using with the F# specific libraries, the option type is used by default and this need not be a consideration.

Encapsulating State in Types

You just learned about how the option type gives you the advantage of no longer having to perform runtime precondition null checks on data's state as your checks are instead being enforced by the type system. This same idea can be extended beyond simply replacing null. In actuality, it can be used to encode and enforce any discretely identifiable piece of state information.

As a trivial example, consider an ordered list. With some algorithms the difference between having an appropriately ordered data set or an unordered data set can be the difference between O(n) and possible O(n^2) performance. Others, which are written to depend on ordering, will not function at all.

One example of the later is the removal of duplicates. With a sorted list this is a simple O(n) operation. Without ordering, this process requires a completely different, and much more complex, algorithm.

let removeDupesFromOrdered orderedList =
    List.foldBack (fun element noDupes ->
        match noDupes with
        | [] -> [element]
        | (h::_) -> if element = h then noDupes
                    else [element] @ noDupes)
        orderedList []

> removeDupesFromOrdered [0; 1; 1; 2; 3; 4; 5; 5; 6];;
val it : int list = [0; 1; 2; 3; 4; 5; 6]

What if you want to ensure that only a sorted list could be passed in to your function? In C# this would likely mean inheriting from ReadOnlyCollection and perhaps adding an extension method to list which returned a new sorted instance. However, this would mean writing a significant amount of code and, as constructors are not inherited, potentially updating that code with each additional .NET framework release.

However, in F# you can leverage the discriminated union feature and do the same thing with significantly less code, in a similar way to how the option type is implemented.

type OrderedList<'a> = | OrderedList of 'a list

let toOrderedList list = OrderedList(List.sort list)

let toList (OrderedList orderedList) = orderedList


let removeDuplicatesFromOrderedList (orderedList: OrderedList<'a>) =
    let removeDupes list =
        List.foldBack (fun element noDupes ->
            match noDupes with
            | [] -> [element]
            | (h::_) -> if element = h then noDupes
                        else element :: noDupes)
            list []
    match orderedList with
    | OrderedList list -> OrderedList(removeDupes list)
                                                                 
Encapsulating State in Types

As you can see here, you can no longer use normal lists with this version of the removeDuplicates.

removeDuplicatesFromOrderedList [0; 1; 1; 2; 2; 3]
error FS0001: This expression was expected to have type
    OrderedPropertyList<'a>
but here has type
    'b list

To use this function you now must wrap your list to encode in the type system that it is ordered.

> let orderedList = toOrderedList [0; 1; 1; 2; 2; 3]
val orderedList : OrderedList<int> = OrderedList [0; 1; 1; 2; 2; 3]

> removeDuplicatesFromOrderedList orderedList
val it : OrderedList<int> = OrderedList [0; 1; 2; 3]

Even more interesting, you can encode several different possible data states into one discriminated union type. This allows your functions to take or return many different information states and, in so doing, behave in the best possible way for every possible situation.

type PropertyList<'a> =
    | Normal of 'a list
    | Ordered of 'a list
    | NoDuplicates of 'a list

let CreateNoDuplicatesListFromPropertyList list =
    function
    | NoDuplicates(_) -> list
    | Normal(noProperty) ->
        NoDuplicates (noProperty |> Set.ofList |> List.ofSeq)
    | Ordered(ordered) ->
        NoDuplicates (List.foldBack (fun element noDupes ->
            match noDupes with
            | [] -> [element]
            | (h::_) -> if element = h then noDupes
                        else element :: noDupes) ordered [])
                                                                
Encapsulating State in Types

By taking information that would have been implicit in the ordering of your calls and using a type wrapper to make this assumption explicit, you are moving contextual constraints into the type system. This will increase the compile-time safety of your code by ensuring that your data will only go into the correct functions. This also allows you to verify things that would otherwise be time consuming to explicitly verify with preconditions or class wrappers, such as the internal state of a list.

Avoiding Exceptions

One significant enhancement that comes from the wrapping of state in a type is the lack of need for exceptions. Exceptions are both costly in terms of performance and dangerous in terms of behavior. Also, the need for many try blocks detracts significantly from the readability of your code.

In most cases, exceptions are not all that exceptional and so need not behave differently than a normal function return value. In the vast majority of cases, they are simply a way of having multiple function output types in languages that lack discriminated unions.

In F#, instead of frequently using exceptions, it is better to encode your various return states within discriminated unions. In cases where you might be tempted to return null, you use the option type as previously described. However, when you want to create your own set of return types, you need to define your own discriminated union.

type UriOutput =
    | Uri of System.Uri
    | Malformed of string

let buildUri stringUri =
    try Uri( new System.Uri(stringUri) )
    with | _ -> Malformed(stringUri)

As in this example, it is often advantageous to wrap existing .NET method calls in functions that return discriminated unions. In this way, you can greatly increase code readability. In F#, exceptions, along with their unseemly try-with blocks, are no longer necessary in most cases. When every state is enumerated explicitly, your code is much easier to maintain and is significantly easier to read.

Data and State Flow

Just as you saw with exceptions, discriminated unions allow you to encode many different data states directly into your type and so ensure that they are properly handled. Designing systems to behave in this way is a simple four-step process. As an example, consider a WebClient interface that takes either a Uri or a String and returns the correct data structure for the type of data found.

First, enumerate your input states with a discriminated union type.

type WebClientInput =
    | StringInput of String
    | UriInput of System.Uri
                                                    
Data and State Flow

Second, do the same for your possible output states.

type WebClientOutput =
    | MalformedUri of string
    | TextContent of string
    | BinaryContent of byte []
    | NoContent
    | WebClientException of System.Net.WebException
                                                        
Data and State Flow

Third, you must build your function to return each of these states in the appropriate situation.

let downloadWithWebClient (inputUri: WebClientInput) =
    let downloadFromUri (uri: System.Uri) =
        try
            use client = new System.Net.WebClient()
            let dlData = client.DownloadData(uri)
            if dlData.Length = 0 then NoContent
            else if (client.ResponseHeaders.["content-type"]
                           .StartsWith(@"text/"))
                then
                    let dlText =
                        System.Text.Encoding.Default.GetString(dlData)
                    TextContent(dlText)
                else
                    BinaryContent(dlData)
        with
           | :? System.Net.WebException as e -> WebClientException(e)
    match inputUri with
    | UriInput(classUri) -> downloadFromUri classUri
    | StringInput(stringUri) ->
match buildUri stringUri with
        | Uri(s) -> downloadFromUri s
        | Malformed(s) -> MalformedUri(s)
                                                         
Data and State Flow

Finally, consume and appropriately handle your expected return states.

let printWebClientOutput clientOutput =
    match clientOutput with
    | MalformedUri(uri) -> printfn "Input Uri was malformed: %s" uri
    | TextContent(content) -> printfn "Page Content: %s" content
    | BinaryContent(content) -> printfn "Binary Data: %d" content.Length
    | NoContent -> printfn "No content was found."
    | WebClientException(e) -> printsn "Exception: %s" (e.ToString())

                                                     
Data and State Flow

The F# compiler issue warnings when not all returned states are handled. This provides a much safer style of data handling than exceptions that are in no way encoded in the function signature. Also, as match statements support wildcards, you need to consider only the relevant states to your current context.

open System.IO

let downloadToFile (inputUri: WebClientInput) outputLocation =
    match downloadWithWebClient inputUri with
    | TextContent(text) -> File.WriteAllText( outputLocation, text )
    | BinaryContent(binary) -> File.WriteAllBytes( outputLocation, binary )
    | _ -> printfn "Download Failed"
                                                              
Data and State Flow

It is important to keep in mind that when using wildcards you will no longer receive compiler warnings when some states are not covered in your match statement. For this reason, it is usually best to explicitly match all states in production code. This ensures that potentially relevant states will not be missed with future code changes.

In this example, all your possible inputs and outputs are discretely defined. The beauty in this is that you can now handle these cases in an explicit and elegant way. If you wanted to add an additional input type, you would need only extend WebClientInput and add another case to your match statement. Similarly, if you wanted to add another output case, you simply extend WebClientOutput and add that case as a new return value of your function. This makes it simple to add new input types and output states as the need arises.

Recursively Defined Data Types

Beyond their use in increasing type safety, discriminated unions may also be used to define complex data structures through recursive type definitions. Recursive type definitions are discriminated unions in which some subtypes leverage either themselves or a child type in its definition. One simple example of this would be a binary tree.

type BinaryTree<'a> =
    | Node of BinaryTree<'a> * BinaryTree<'a>
    | Leaf of 'a
                                                             
Recursively Defined Data Types

In this case, a node may either be a leaf or contain a tuple of two more binary trees. This technique can be leveraged to quickly build data structures that would have taken orders of magnitude more code in other languages.

One might argue that, although the definition is simplified, it may be more difficult to implement functionality for this data structure based on the nodes themselves not containing any methods. As you'll see in the following depth-first search example, this is not the case. It is a simple matter to externalize what would have previously been object-oriented code.

let rec dfs tree leafData =
    match tree with
    | Leaf(l) -> if l = leafData then Some(l) else None
    | Node(a,b) -> let dfsA = dfs a leafData in
                   if Option.isSome dfsA then dfsA
                   else dfs b leafData

let binaryTree =
    Node(
        Node( Leaf(1), Leaf(2) ),
        Node( Leaf(3), Node( Leaf(4), Leaf(5) ) ) );;
val binaryTree : BinaryTree<int> =
  Node (Node (Leaf 1,Leaf 2),Node (Leaf 3,Node (Leaf 4,Leaf 5)))

dfs binaryTree 5;;
val it : int option = Some 5

dfs binaryTree 33;
val it : int option = None
                                                                  
Recursively Defined Data Types

As you can see here, recursive data type definitions are quite powerful constructs. When combined with recursive functions, they can be leveraged to build what would otherwise be complex data and difficult to maintain structures with very little code.

Although perhaps a bit intimidating at first, with a little practice it's a trivial matter to go from a definition in which an object calls its children to one in which a function recurses on each data structure node. In most cases, this can be done by replacing what would have been the this construct with the object operated on. Getting your head around this is one of many paradigm shifts necessary to become effective at F# when coming from an imperative language.

SUMMARY

Although it may seem more difficult to think carefully about the best way to type your data, it is not significantly different than spending time thinking about how you might structure your classes in object-oriented programming. A little bit of time spent upfront thinking conceptually about your data can give great benefits in terms of clarity and safety.

By making the compiler work for you, you can catch many more kinds of bugs at compile-time without having to hope that your tests check for those particular types of edge cases. In many cases, you can completely avoid using exceptions and do not need to deal with the extra conceptual overhead they introduce into your code. This all adds up to less technical debt and higher quality code. It's a big win for both the programmer writing the code and those who must maintain it later.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset