Chapter 17. Pipelining and Composition

WHAT'S IN THIS CHAPTER?

  • Understanding pipelining operators

  • Understanding composition operators

  • Getting started with pipelining function composition

  • Using advanced function composition

What is it about functional languages that make them so powerful for data structure processing? One answer to this is composability. Instead of being a built-in language construct such as for or foreach, functional abstractions like map and reduce are themselves functions. They are first-class language citizens, no different than any function you might write yourself. This means that they are much more flexible because they can be built one argument at a time, passed around, and even combined with other functions.

Out of this composability comes a data-oriented programming technique called pipelining. In pipelining, functions are combined or chained together with special operators. This allows you to pass the results of one function directly and unambiguously into the next. In this way, programs can be constructed as a series of discrete data transformations. As you will see, this approach produces short and expressive data processing programs.

BASIC COMPOSITION AND PIPELINING

When writing code using composition and pipelining, there are four primary operators. Each of these is a combination of two ideas, performing an operation to the left or right and either pipelining data or composing functions. When combined, they provide a powerful set of tools that allow you to write concise and visually appealing code.

Pipelining

When pipelining, the result of each function call is passed directly into the next. Data passes through this series of pipes undergoing conversions, filters, and normalizations. Eventually, when the data reaches the other side, you have the results you need as explained by a simple set of steps.

Although programming in this style has long been considered controversial in imperative-object oriented programming languages, functional languages provide you with the tools to do this in a safe and readable way. The end result is very concise and easy-to-maintain code. LINQ is a great example of this. Although initially designed to facilitate SQL-style querying, it also can be extended quite dramatically for declarative and transformation based programming in C# and VB.NET. However, even what LINQ offers can't come close to the flexibility provided by native functional composition.

Forward Pipe (|>)

The forward pipe operator is used to pass arguments to a function from the left side or the previous line. For example, given a data set like the following:

type Place = { Name: string; Population: int }

let places = [ { Name = "New York"; Population = 9000000; }
               { Name = "Los Angeles"; Population = 4000000; }
               { Name = "Frankfurt"; Population = 700000; }
               { Name = "Tokyo"; Population = 13000000;} ]

It's a simple matter to build a query that finds the cities with a population more than 5 million and then normalize the names of those cities by converting them to uppercase.

> let over5MilUppercase =
    places
    |> List.filter (fun p -> p.Population > 5000000)
    |> List.map (fun p -> p.Name.ToUpper() );;

val over5MilUppercase : string list = ["NEW YORK"; "TOKYO"]

Don't let the fact that operators look like pure syntax fool you; just like most other constructs in F#, under the hood they are functions. This becomes obvious if we inspect the operator in the F# interactive window.

> (|>);;

val it : ('a -> ('a -> 'b) -> 'b) = <fun:it@1>

The only difference between this and a normal function is that binary operators (operators that take two arguments) pull arguments from their left and right sides and are evaluated with low precedence.

Looking at the type signature, you can see it's not all that complicated. On the left side it takes a value of one type ('a) and on the right a function ('a -> 'b). It then returns a value of the same type as its function argument ('b). The passed in function is used to convert the input type into the output type. Were we to write this function ourselves, it would look like the following:

> let pf input func = func(input);;

val pf : 'a -> ('a -> 'b) -> 'b

> (pf);;

val it : ('a -> ('a -> 'b) -> 'b) = <fun:clo@3>

So, in the preceding example, places is on the left side of the first pipe operator and so corresponds to the first ('a) in the function type signature. On the right side something slightly more complex is going on; filter is first evaluated with only one of its arguments using a language feature called partial application. This feature is discussed thoroughly in Chapter 13.

> List.filter (fun p -> p.Population > 5000000);;

val it : (Place list -> Place list) = <fun:it@4-13>

The partially applied representation of filter takes a Place list and returns a Place list, fitting nicely into the right side ('a -> 'b) of the forward pipe operator's type signature. The operator grabs this partially applied filter function from its right side and applies the places value to it from the ('a). Just as if the filter had been called directly with the list of places, another list is returned ('b).

> places
    |> List.filter (fun p -> p.Population > 5000000);;

val it : Place list = [{Name = "New York"; Population = 9000000;};
                       {Name = "Tokyo"; Population = 13000000;}]

This same process is repeated each time the pipe operator is used. On the next line, the newly filtered list of places is taken from the right and a partially applied map is taken from the right. The result of applying this list of places to the mapping function is then returned.

That's it. The forward pipe operator simply changes the order you specify function and argument. The secret is in that the operator takes the value first (from its left side) and the function second (from its right). This allows the result of the previous line to be applied to a function on the current line.

This operator is one of the most commonly used in F#. As the F# compiler is single pass, having the primary argument on the previous line also means that its type can be evaluated before the function it is applied to. This greatly improves type inference and so eliminates the need for type annotations in many cases.

Backward Pipe (<|)

The backward pipe operator is used to pass arguments in to a function from the right side or next line. It is much the same as forward, except that the order of its arguments is reversed. It takes input on its right side and applies it to a function on its left.

Take a look at the example where the cities were filtered to populations under five million and then converted to uppercase using the backward pipe operator.

> let over5MilUppercase =
    List.map (fun (p: Place) -> p.Name.ToUpper() )
    <| (List.filter (fun (p: Place) -> p.Population > 5000000)
        <| places);;

val over5MilUppercase : string list = ["NEW YORK"; "TOKYO"]

Here, the list of places is backward piped into the filter function. In turn, the result of that is then backward piped into map. The primary difference here, other than the reversed ordering, is that additional parentheses are needed to ensure correct order of operation. This is because F# evaluates everything with the same precedence in the function contents from top to bottom and left to right. Without the parentheses, the compiler would try to pipe filter directly into the partially applied map function on the previous line and compilation would fail.

Under the hood, the backward pipe operator's underlying function is exactly the same as the forward operator, except the order of arguments is reversed.

> (<|);;

val it : (('a -> 'b) -> 'a -> 'b) = <fun:it@5-2>

> let pb func input = func(input);;

val pb : ('a -> 'b) -> 'a -> 'b

> (pb);;

val it : (('a -> 'b) -> 'a -> 'b) = <fun:clo@7-3>

Here you see ('a -> 'b) corresponding to the function taken from the left, 'a corresponding to the value taken from the right, and 'b corresponding to the result. The ('a ->'b) function is used to convert 'a into 'b.

Although it is much less commonly used than the forward pipelining operator for line displacement, the backward pipelining operator is often used for avoiding parentheses around subexpressions on the same line.

Seq.map (fun x -> x * 2) (Seq.init 10 (fun i -> i * 2))

Seq.map (fun x -> x * 2) <| Seq.init 10 (fun i -> i * 2)

Both of these examples are equivalent. However, the second is arguably more readable because it has fewer parentheses. This improvement in readability can be particularly great in cases where many small operations are performed on the same line to fill the arguments of a function.

Composition

Consider the case where you would want to apply a series of operations to a data set. In object-oriented languages, you might build a series of classes based on an interface, or functions that match some delegate, and then add them to a generic collection defined to take their shared underlying type. You could then loop over this collection, applying each of these transformations to the result of the previous in turn.

The great benefit in this approach is that a series of operations can be defined dynamically at runtime. However, this methodology has a serious limitation. With this approach, each transformation must take and return the same type. A single change in type means that the processing class or delegate can't be added to your collection.

Another approach might be to perform operations as you previously saw with pipelining. The result of each function call is passed directly into the next, which eventually ends up providing you with the result you need. The equivalent to this in object-oriented languages would be chaining the result of many static method calls.

With this approach, you've solved the problem of composing operations of different types. However, as each pipelining step or static method call is compiled into place and can't be changed, you've given up the ability to define which operations are used at runtime.

In F#, you can have both of these properties through a feature known as function composition. The composition operators allow you to combine any two functions with compatible type signatures into one without executing either. Among many other things, this allows you to both build up a large collection of data transforming functions at runtime and use many different intermediate types.

Forward Composition (>>)

The forward composition operator takes two functions and composes them into a single function in which the left is applied to the input first and then the right is applied to the result of that. Although simple conceptually, this operator can profoundly change what your code can do.

Take a look at how forward composition can be used in the example from forward pipelining.

> let findNamesOfPlacesOver5MilInUppercase =
    List.filter (fun p -> p.Population > 5000000)
    >> List.map (fun p -> p.Name.ToUpper());;

val findNamesOfPlacesOver5MilInUppercase : (Place list -> string list)

> findNamesOfPlacesOver5MilInUppercase places;;

val it : string list = ["NEW YORK"; "TOKYO"]

Taking a deeper look at the actual composition, you see the partially applied filter and map functions are combined into a single function that takes the type of the first function as input and returns the type of the second.

> List.filter (fun p -> p.Population > 5000000);;

val it : (Place list -> Place list) = <fun:it@16-8>

> List.map (fun p -> p.Name.ToUpper());;

val it : (Place list -> string list) = <fun:it@17-10>
> List.filter (fun p -> p.Population > 5000000)
  >> List.map (fun p -> p.Name.ToUpper());;

val it : (Place list -> string list) = <fun:it@18-15>

Although powerful, the forward composition operator is actually just a simple function under the hood.

> (>>);;

val it : (('a -> 'b) -> ('b -> 'c) -> 'a -> 'c) = <fun:it@26-16>

The forward composition operator takes one function on the left of type ('a -> 'b) and another on the right of ('b -> 'c). It then returns another function ('a -> 'c) that takes the same type as the first function but returns the same type as the second.

> let fc fun1 fun2 input = fun2( fun1( input ) );;

val fc : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c

> (fc);;

val it : (('a -> 'b) -> ('b -> 'c) -> 'a -> 'c) = <fun:clo@28>

Internally, these two functions are combined by simply calling one inside the other. When the operator is used, the argument functions are partially applied but the input variable remains free. This allows the resulting function to be called with the expected input type at any later time.

Backward Composition (<<)

The backward composition operator takes two functions and composes them into a single function in which the right is first applied to the input and then the left is applied to the result of that. As you might have expected, this is exactly the same as forward composition except with reversed arguments.

Take a look at how the backward composition operator can be used to make a functionally equivalent example to one shown in forward pipelining.

> let findNamesOfPlacesOver5MilInUppercase =
  List.map (fun (p: Place) -> p.Name.ToUpper() )
  << List.filter (fun (p: Place) -> p.Population > 5000000);;

val findNamesOfPlacesOver5MilInUppercase : (Place list -> string list)

> findNamesOfPlacesOver5MilInUppercase places;;

val it : string list = ["NEW YORK"; "TOKYO"]

The partially applied filter and map functions are composed together. This is just as in the version from forward composition, except the argument ordering is reversed. The list of places is then passed in to this composed function and the result is returned.

As you might have suspected, the backward composition operator's signature is the same as the forward version except with reversed arguments.

> (<<);;

val it : (('a -> 'b) -> ('c -> 'a) -> 'c -> 'b) = <fun:it@7>

Just the opposite of forward composition, the left side takes a function on the left of type ('a -> 'b) and on the right of ('c -> 'a). These are combined to form a new function of type ('c -> 'b) where 'c is the input type of the function on the right side, and 'b the output type of the function on the left.

> let fb fun1 fun2 input = fun1( fun2( input ) );;

val fb : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b

> (fb);;

val it : (('a -> 'b) -> ('c -> 'a) -> 'c -> 'b) = <fun:clo@9>

This is done in exactly the same way as with forward composition. The input functions on the left and right are partially applied to the operator's internal function, which is then returned. When evaluated, the input will first be applied to the function taken from the right and then the result of that will be applied to the function from the left.

APPLYING PIPELINING AND COMPOSITION

The most important thing to keep in mind when building data pipelines is the input and output types of the functions involved. When data is pipelined into a function, that data must have the same signature as that function expects. Similarly, when composing two functions, the output type of the first function must match the input type of the second.

If you understand the C# or VB.NET type system, this should make perfect sense to you. You can't fit a square peg into a round hole, as they say. However, unlike in other .NET languages, F# will never automatically convert your data to a different type. This applies even to classes written in other languages. Both the upcasting and downcasting of class instances must always be explicitly stated. In addition, the different numeric types such as int and float must be explicitly converted to each other. Keeping this in mind is useful when writing any programs in F# but particularly important when using composition and pipelining.

From Loops to Pipelining

Coming from the imperative programming world, it takes some change in perspective to move from thinking in terms of mutations in a loop to transformations in a function pipeline. The shift in perspective is one of focusing on what you want done in discrete steps instead of how you want to do it all at the same time.

Take a look at this simple example where you are trying to decide among several vacation spots.

type VacationLocation =
    { Name: string; Pop: int; Density: int; Nightlife: int }

let destinations =
    [ { Name = "New York"; Pop = 9000000; Density = 27000; Nightlife = 9 }
      { Name = "Munich"; Pop = 1300000; Density = 4300; Nightlife = 7 }
      { Name = "Tokyo"; Pop = 13000000; Density = 15000; Nightlife = 3 }
      { Name = "Rome"; Pop =  2700000; Density = 5500; Nightlife = 5 } ]

let findVacationImperitive data =
    let mutable outputList = []
    for i = List.length data - 1 downto 0 do
        let current = data.[i]
        let size = current.Pop / current.Density
        if current.Nightlife >= 5 &&
           size >= 200 &&
           current.Density <= 8000
        then
            outputList <- List.Cons(current.Name, outputList)
    outputList;;
                                                         
From Loops to Pipelining
> findVacationImperitive destinations;; val it : string list = ["Munich"; "Rome"]

In this imperative version, the data set is passed in, and if it passes a set of predicates, the name value of the record is added to a list. There's a lot going on here that you shouldn't need to think about. First, you need to explicitly build and maintain a data structure to store data in. Second, program flow must be explicitly controlled via a backward loop and an if statement. Third, all this logic is tangled together and difficult to extract in a meaningful way. If all you want to do is find the name of a city that meets some criteria, why must the movement of every chunk of data be explicitly stated? It ends up being a tangled mess.

This example is much clearer when pipelining is used. To convert to pipelining, each of the predicates in the if statement is changed into a filter statement. After being filtered we perform the conversion to just the city name with a map. Everything else in the example can be thrown away.

let findVacationPipeline data =
    data
    |> List.filter (fun x -> x.Nightlife >= 5)
    |> List.filter (fun x -> x.Pop / x.Density >= 200)
|> List.filter (fun x -> x.Density <= 8000)
    |> List.map (fun x -> x.Name);;
                                                         
From Loops to Pipelining
> findVacationPipeline destinations;; val it : string list = ["Munich"; "Rome"]

The collection of records is simply passed through a series of filters and converted to an output format. Each step is explicit about the action it performs and can be changed or removed easily. The programmer need not spend much time thinking about exactly how each of these steps is being done.

From Pipelining to Composition

Now, what if you were building this into a website that can help millions of people find the best vacation destination? One flexible way to approach this would be to dynamically build a set of filter functions defined by values given by the user.

> let getSimpleVacationPipeline nightlifeMin sizeMin densityMax =
    List.filter (fun x -> x.Nightlife >= nightlifeMin)
    >> List.filter (fun x -> x.Pop / x.Density >= sizeMin)
    >> List.filter (fun x -> x.Density <= densityMax);;

val getSimpleVacationPipeline :
  int -> int -> int -> (VacationLocation list -> VacationLocation list)

You can just call the getSimpleVacationPipeline function with the values for each of the given filters. When called, the lambda expressions given for each filter acts as a closure over the parameter value it uses. These filter functions are then composed into a single function that is returned. You can take this composite filter function and use it to perform your filtering step at a later time.

> let myPipeline = getSimpleVacationPipeline 5 200 8000;;

val myPipeline : (VacationLocation list -> VacationLocation list)

> let applyVacationPipeline data filterPipeline =
    data
    |> filterPipeline
    |> List.map (fun x -> x.Name);;

val applyVacationPipeline :
    'a -> ('a -> VacationLocation list) -> string list

> applyVacationPipeline destinations myPipeline;;

val it : string list = ["Munich"; "Rome"]

In this example, myPipeline is the composite filter function returned by the call to getSimpleVacationPipeline. This composite filter is then passed into the applyVacationPipeline function along with the data set. This function pushes the data into the composite filter and then performs the same simple map as was used in the pure pipelining example.

Advanced Composition

Although interesting, the static set of filters in the previous example is insufficient for our use case. Any reasonable web interface would allow the user to toggle filters on and off. To gain the needed functionality, you need to go one step further and leverage a combination of option types and the identity function.

The option type has been discussed in Chapter 15, "Data Types." It simply provides a way to say a value may or may not exist. It's quite like the idea of null, but much safer. The identity function is similarly quite simple. It's just a predefined function that takes a value and immediately returns it. Consider it as a kind of a no-op used to make the dynamic gluing of transformation functions a smoother process. In F#, the identity function is given the name id.

By making each argument to your pipeline building function an option type, you can specify if each corresponding function should be added to the filter pipeline by simply selecting if the input value is some value or none.

> let getVacationPipeline nightlifeMin sizeMin densityMax searchName =
    match nightlifeMin with
    | Some(n) -> List.filter (fun x -> x.Nightlife >= n)
    | None -> id
    >> match sizeMin with
       | Some(s) -> List.filter (fun x -> x.Pop / x.Density >= s)
       | None -> id
    >> match densityMax with
       | Some(d) -> List.filter (fun x -> x.Density <= d)
       | None -> id
    >> match searchName with
       | Some(sn) -> List.filter (fun x -> x.Name.Contains(sn))
       | None -> id;;
                                                              
Advanced Composition
val getVacationPipeline : int option -> int option -> int option -> string option -> (VacationLocation list -> VacationLocation list)

Now that each of the input arguments is an option type, we can specify them as Some or None to toggle each filter on or off. When a value is passed in as None, the identity function is returned instead of a filter. Notice that a new name searching filter was added in this example as a place to demonstrate passing in none.

> let myPipeline = getVacationPipeline (Some 5) (Some 200) (Some 8000) None;;

val myPipeline : (VacationLocation list -> VacationLocation list)

> applyVacationPipeline destinations myPipeline;;

val it : string list = ["Munich"; "Rome"]

So, as values were passed in for the first three filters, they were each included in the composition. The name searching filter was left out as None was passed in for its parameter. The result is a filter exactly the same as the previous example but composed based on the availability of input.

You might have noticed that this version of getVacationPipeline was already a bit large and had a significant amount of repeated code. As a final step, the repeated code can be factored out in to a single function. This step is rather advanced but will make adding new filters in the future much easier.

> let getVacationPipeline nightlifeMin sizeMin densityMax searchName =
    let getFilter filter some =
        match some with
        | Some(v) -> List.filter (filter v)
        | None -> id
    getFilter (fun nlMax x -> x.Nightlife >= nlMax) nightlifeMin
    >> getFilter (fun sMax x -> x.Pop / x.Density >= sMax) sizeMin
    >> getFilter (fun dMin x -> x.Density < dMin) densityMax
    >> getFilter (fun sName x -> x.Name.Contains(sName)) searchName;;

                                                 
Advanced Composition
val getVacationPipeline : int option -> int option -> int option -> string option -> (VacationLocation list -> VacationLocation list)

In this example, all the repeated code has been refactored into one function named getFilter. This function takes a filtering function of two arguments and an option type. The new argument to each of the filter functions is a place where the value of the option type will be filled in if it exists. You can see this step inside getFilter as (filter v). After the missing value is filled in, the filter function is passed as an argument to List.filter. Just as in each example leading up to this, the partially composed List.filter is returned and combined with its neighbors via the function composition operator.

Although succinct and usable, the example is by no means as far as this can be taken in F#. You might imagine representing a set of possible filters as a list and their corresponding optional inputs as another. These lists could be folded over with the forward composition operator and a function similar to getFilter. However, that is only one of many possible directions to take this. When functions are just another type of data, the possibilities are virtually limitless.

SUMMARY

The ability to treat functions as data, when combined with F#'s large library of data manipulation constructs, gives you a language in which you can build very powerful data manipulation programs but which are also short and easy to read. This is one of F#'s core strengths. To leverage this strength you first need to understand the composition and pipelining operations well. Next, spend time playing with the various list manipulation functions. In very little time, you'll gain full enough intuition to build composition helper functions and possibly your own data structures. The key to this is practice.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset