Chapter 10. Type Augmentation and Generic Computations

To this point in the book, it was easy to notice the direct link between the use pattern and the correspondent language feature. For example, Chapter 5, Algebraic Data Types, clearly showed that the native F# algebraic types are substitutes for custom classes. Increased quality and speed of implementations based on algebraic data types reflect the payoff for the feature use.

In this chapter, I will consider a certain language features that do not make the payoff from their use obvious. Nevertheless, these features are ubiquitous in F#. I mean the ambivalent pair of code generalization against the code specialization.

We are going to cover the following topics:

  • Code generalization techniques, or making the same functional code applicable to multiple function argument types
  • Code specialization techniques, or making the functional code more specific than usually may be achieved by using standard features

Each of the preceding patterns carries promises of certain benefits: improved performance, more succinctness, and better static type control. The goal of this chapter is to show you how to recognize the situations when these patterns are applicable and apply them, achieving the expected benefits.

Code generalization

Let me begin by stating that F# automatically generalizes (https://msdn.microsoft.com/visualfsharpdocs/conceptual/automatic-generalization-%5bfsharp%5d) arguments of functions where it is possible to deal with the multiplicity of types.

So far, we have mostly been dealing with the generalization of data collections. That is, a sequence is agnostic to the type of its elements. That's why we were able to write functions that operate on sequences of arbitrary generic type. And F# type inference spots and carries this property on.

Suppose that we proudly implement our own function of reversing a list as follows (Ch10_1.fsx):

let reverse ls = 
    let rec rev acc = function 
    | h::t -> rev (h::acc) t 
    | []   -> acc 
    rev [] ls 

Then, we may notice that the F# compiler infers the reverse : ls:'a list -> 'alist signature for it, where 'a indicates that the function can be applied to any type of list elements. And if we decide to check out how exactly our reverse function would behave with different argument types, we may observe its behavior is consistent for the following arguments (Ch10_1.fsx):

reverse [1;2;3] 
// val it : int list = [3; 2; 1] 
reverse ["1";"2";"3"] 
// val it : string list = ["3"; "2"; "1"] 

And even if we try to slightly abuse type system and mix different boxed types (Ch10_1.fsx):

reverse [box 1.0; box 2.0M; box 3I] 
//val it : obj list = [3 {IsEven = false; 
//                        IsOne = false; 
//                        IsPowerOfTwo = false; 
//                        IsZero = false; 
//                        Sign = 1;}; 2.0M; 1.0] 

The reverse function behaves as genuinely generic with regard to the type of argument list elements.

Fine, now let's do something seemingly similar and also utterly simple, such as shifting the argument to the left by a single bit (Ch10_1.fsx):

let twice x  = x <<< 1 

All of a sudden, the F# compiler infers the very specific twice : x:int -> int function signature. What's going on? Apparently, there are some types that allow this geeky way of making the value twice as big, for example, int64. Interestingly, let's look at what happens when we follow the function definition by the usage, as following (Ch10_1.fsx):

let twice x = x <<< 1 
twice 10L 
//val twice : x:int64 -> int64 
//val it : int64 = 20L 

Now the F# compiler has seemingly changed its mind about the signature of the twice  function, this time inferring the argument and result types as int64. This action is irreversible, which means that trying to follow the preceding evaluation with twice 10 will be now rejected with this diagnostics: this expression was expected to have type int64 but it has type int here.

What's going on? Why does the generalization seemingly fail?

Statically resolved type parameters

As we just noticed, the F# compiler inferred a monomorphic type for the (<<<) operator. A step toward polymorphism would assume the ability to express it somehow - while staying within the .NET type system that only such types are fine to be twice argument that work with the operator (<<<). In other words, the compiler should deal with a type constraint.

The problem is that this kind of constraint cannot be expressed within the F# compilation target language MSIL. That is, the latest .NET CLI standard (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf) in the II.10.1.7 Generic Parameters section constraints a type either by being a value type or a reference type for a concrete reference type that has the default constructor. This is the problem of the .NET type system rather than the F# language or compiler.

F# 4.0 Language Specification (http://fsharp.org/specs/language-spec/4.0/FSharpSpec-4.0-latest.pdf) hints at the observed compiler behavior in section 5.2.3:

Uses of overloaded operators do not result in generalized code unless definitions are marked as inline. For example, take a look at the function as shown here: let f x = x + x

It results in an f function, which can be used only to add one type of value, such as int or float. The exact type is determined by later constraints.

Fortunately, the F# compiler may enforce these (and some other) type of constraints at compile-time using the mechanism of statically resolved type parameters. This type parameter for our (<<<) operator will have the special "hat" prefix ^a, assuming that the type is statically known at the point of compilation (compared to 'a, assuming that the type can be anything). As this kind of statically polymorphic function would require a specific manner of compilation depending on the concrete statically resolved type that is aligned with constraints, the F# compiler achieves this goal with the help of inlining, as the language specification has hinted.

Function inlining

Let me apply the inlining mechanism to this failing twice function as shown here (Ch10_1.fsx):

let inline twice' x = x <<< 1 
// val inline twice' : 
//     x: ^a ->  ^a when 
//        ^a : (static member ( <<< ) :  ^a * int32 ->  ^a) 

Note how the automatically inferred signature of the inlined twice' function carries the hat type ^a along with the sought-for constraint regarding the type parameter statically resolved at compile-time: type ^a must have an operator (<<<) with the ^a * int32 -> ^a signature.

Great, now twice' begins to look like a polymorphic function, allowing, for example, the following evaluations (Ch10_1.fsx):

twice' 5    // int32 argument 
twice' 5u   // uint32 argument 
twice' 5L   // int64 argument 
twice' 5UL  // uint64 argument 
twice' 5y   // sbyte argument 
twice' 5uy  // byte argument 
twice' 5s   // int16 argument 
twice' 5us  // uint16 argument 
twice' 5I   // biginteger argument 
twice' 5n   // nativeint argument 

At the same time, it disallows the following evaluations (Ch10_1.fsx):

twice' 5m //The type 'decimal' does not support the operator '<<<' 
twice' 5.0 // The type 'float' does not support the operator '<<<' 
twice' "5"// The type 'string' does not support the operator '<<<' 
twice' '5' // The type 'char' does not support the operator '<<<' 

The compiler has provided all the above niceties after we merely added the inline qualifier to the function definition. But you should realize that the compiler literally injects the inlined function implementation into MSIL adjusting it for argument(s) having statically resolved concrete types. Inlining is the compilation method that allows to alleviate .NET CLR limitations.

Static constraints

As it was inferred by the F# compiler for the twice' function earlier, the argument x can be any type ^a as long as ^a : (static member (<<<) : ^a * int32 -> ^a). This condition, either inferred by the F# compiler or, perhaps, intentionally imposed by the programmer is named a static constraint (https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual/constraints-%5Bfsharp%5D). There are about a dozen of argument type constraint kinds. You may check the documentation (https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/generics/constraints) for their complete list.

Constraints can be combined together with the help of the and construction, as shown in the following code snippet (Ch10_2.fsx):

let inline constrained (param: ^a 
    when ^a: equality and ^a: comparison) = () 
 
type Good = Good 
 
[<NoEquality; NoComparison>]type Bad = Bad 
 
Good |> constrained 
// Compiles just fine 
Bad |> constrained 
// Error: type Bad does not support comparison constraint 

We have two discriminated unions here: Good, which is equitable and comparable by default, and Bad, which is also normally equitable and comparable, but here it is decorated with [<NoEquality; NoComparison>] attributes. As the constrained function requires its generic param argument to be of both equitable and comparable types, constrained Good gets compiled, while constrained Bad does not.

Explicit or inferred constraining?

A few years ago, I was tinkering with creating F# generic code. At that time, my thinking was along these lines: if I need to create a generic function that can deal with a restricted handful of argument types, the right approach would be to appropriately and explicitly constrain the latter. I even asked on StackOverflow (http://stackoverflow.com/q/16737675/917053) what the idiomatic approach would be. One of the answers (http://stackoverflow.com/a/16738811/917053) I want to quote here:

Explicit constraints are a way of stating what you want to do. What could be better than doing it and having the compiler statically prove that the argument is valid for the operations?

Enlightening observation, isn't it?

Nevertheless, what if you still want to explicitly limit the list of generic functions' valid argument types based on some external considerations? Then, you may use the following approach I was hinted at in the other answer (http://stackoverflow.com/a/16739483/917053) to my StackOverflow question, which is based on overloading static methods of the auxiliary type (Ch10_2.fsx):

[<AutoOpen>] 
module Restrict = 
    let inline private impl restricted = 
        printfn "%s type is OK" (restricted.GetType().FullName) 
 
    type Restricting = Restrict with 
        static member ($) (Restrict, value: byte) = impl value 
        static member ($) (Restrict, value: sbyte) = impl value 
        static member ($) (Restrict, value: int) = impl value 
        static member ($) (Restrict, value: uint32) = impl value 
        static member ($) (Restrict, value: bigint) = impl value 
 
    let inline doit restricted = Restrict $ restricted 

There are three components in the preceding snippet:

  • The private function, impl, which performs a required action on a generic restricted argument of type 'a; in other words, upon an argument that's not constrained whatsoever
  • The auxiliary type Restricting, which has a single discriminated union case, Restrict, augmented by the overloaded static member $ upon the required set of types (byte, sbyte, int, uint32, bigint picked just for the sake of illustration)
  • The user-facing function doit, whose restricted argument has been statically constrained by the other two pieces

Let's look at the workings of the preceding in the following script (Ch10_2.fsx):

doit 1uy 
doit 1y 
doit 1 
doit 1u 
doit 1I 
doit 1L // does not compile 
doit 1.0 // does not compile 
doit 1.0m // does not compile 
doit '1' // does not compile 

The first five use cases matching the restricted types compile just fine; the last four do not compile with the same diagnostics as expected:

no overloads match the op_Dollar" method

The successful execution of the first five use cases is presented in the following screenshot:

Explicit or inferred constraining?

Executing explicitly constrained generic code

Inlining scope

Inlining in F# is not limited to just module-level functions. It is perfectly fine to use inlining for static and instance methods of F# types. For example, in the following snippet, where we have type Bar with a static method doIt and type Foo on any generic type ^T that has static member doIt with the matching signature and the inline member Invoke calling the doIt method of ^T as shown here (Ch10_2.fsx):

type Bar() = 
  static member doIt() = 42 
 
type Foo< ^T when ^T: (static member doIt: unit -> int)>(data: ^T []) = 
  member inline this.Invoke () = (^T : (static member doIt : unit -> int) ()) 
 
let result = (Foo([|Bar()|]).Invoke()) 
// val result : int = 42 

An intricate matter that the preceding sample illustrates is accessing static or instance inline methods from outside of F# using, for example, a plain C# -> F# interoperability scenario. Remember that normal MSIL cannot support such constraints.

To address this subtlety the compiled MSIL for Invoke method implementation above that C# may access just throws an exception. The actual inlined body of the function is kept accessible only from F# in its assembly's metadata.

Inline optimizations

The relationship between inlining and code optimization is quite subtle. Usually, it is difficult to predict what consequences of generalization via inlining would be.

However, sometimes, it is possible to achieve tremendous performance improvements via inlining. A notorious example is the F# handling of the System.DateTime equality, where the compilation of the datetime1 = datetime2 expression involves boxing. Take a look at the following snippet (Ch10_2.fsx):

open System 
#time "on" 
let x, y = DateTime.MinValue, DateTime.MaxValue 
for i = 0 to 10000000 do x = y |> ignore 
//Real: 00:00:00.421, CPU: 00:00:00.406, GC gen0: 115, gen1: 2, gen2: 1 

Here, using just the = operator, we observe a certain garbage collection activity.

However, suppose we just inline the redefined equality operator, ==, as shown in the following snippet (Ch10_2.fsx):

open System 
#time "on" 
let inline eq<'a when 'a :> IEquatable<'a>> (x:'a) (y:'a) = x.Equals y 
let inline (==) x y = eq x y 
for i = 0 to 10000000 do x == y |> ignore 
//Real: 00:00:00.022, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0 

Then, we achieve no garbage collection activity whatsoever and an impressive 19(!) times better performance.

Writing generic code

I would like to wrap up the theme of writing generic code with just another example of a non-trivial generic function that I have implemented as a sample for a "functional programming" interview (https://infsharpmajor.wordpress.com/2013/05/03/if-google-would-be-looking-to-hire-f-programmers-part-4/): given an arbitrary positive number, find a next higher number consisting of the same digits. If that does not exist, then return the original number.

We will approach the solution as a generic function, allowing the argument to be of any integral type, or anything consisting of just digits, be it byte, BigInteger, or nativeint.

The base line approach would be to split the number into a list of digits, making the list of all digit permutations, assembling digits back into numbers, sorting the list of numbers, and finally, picking the element next to the given argument. Apparently, the time and space complexities of this "solution" are awful, so let's improve it:

  • The first useful observation of the optimized solution would be that the solution exists if a pair of adjacent digits exists in the given number, where the left-hand side is strictly less than the right-hand side.
  • The next useful observation would be that if we scan the list of digits of the given number from right to left by a sliding window of width 2, then the first pair that matches the first observation would be the place of change. Everything to the left of it (if any exists) must stay intact.
  • The final useful observation is to take the pair that matches the second observation. The sublist to the right including the right element of the pair is sorted from right to left. The digit that must substitute the left element of the pair must be the minimally greater digit from the sublist. The left element that we just substituted should be placed some place to the right, preserving the sublist order.

Now, if we concatenate (if nonempty) the sublist to the left of the changing digit, followed by the substituting digit, followed by the reversed sublist after accommodating the changing digit and convert the resulting digit list to the number, this would yield the solution with a surprisingly good time complexity of O(n) and a space complexity of O(n), where n is the number of digits in the original number. The solution snippet is as follows (Ch10_3.fsx):

let inline nextHigher number = 
    let g0 = LanguagePrimitives.GenericZero<'a> 
    let g1 = LanguagePrimitives.GenericOne<'a> 
    let g10 = (g1 <<< 3) + (g1 <<< 1) 
 
    let toDigits n = 
        let rec toDigitList digits n = 
            if n = g0 then digits 
            else toDigitList ((n % g10) :: digits) (n / g10) 
        toDigitList [] n 
 
    let fromDigits digits = 
        let rec fromDigitList n = function 
            | [] -> n 
            | h::t -> fromDigitList (n * g10 + h) t 
        fromDigitList g0 digits 
 
    let make p ll  = 
        ll |> List.rev |> List.partition ((<) p) 
        |> fun (x,y) -> (x.Head::y) @ (p::(x.Tail)) 
 
    let rec scan (changing: 'a list) source = 
        match source with 
        | [] -> changing 
        | h::t -> if h >= changing.Head then 
                    scan (h::changing) t 
                  else 
                    (List.rev t) @ (make h changing) 
 
    number |> toDigits 
           |> List.rev |> fun x -> scan [(x.Head)] (x.Tail) 
           |> fromDigits 

Let's see this in action by running some usage cases via FSI as shown in the following screenshot:

Writing generic code

Generic implementation of non-trivial function

Take into account the complexity of the static constraint expression that the F# compiler has inferred for nextHigher, as shown in the preceding screenshot. It would be really challenging to come up with an expression that complicated from the top of your head. Let the compiler do its job indeed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset