To this point in the book, it was easy to notice the direct link between the use pattern and the correspondent language feature. For example, Chapter 5, Algebraic Data Types, clearly showed that the native F# algebraic types are substitutes for custom classes. Increased quality and speed of implementations based on algebraic data types reflect the payoff for the feature use.
In this chapter, I will consider a certain language features that do not make the payoff from their use obvious. Nevertheless, these features are ubiquitous in F#. I mean the ambivalent pair of code generalization against the code specialization.
We are going to cover the following topics:
Each of the preceding patterns carries promises of certain benefits: improved performance, more succinctness, and better static type control. The goal of this chapter is to show you how to recognize the situations when these patterns are applicable and apply them, achieving the expected benefits.
Let me begin by stating that F# automatically generalizes (https://msdn.microsoft.com/visualfsharpdocs/conceptual/automatic-generalization-%5bfsharp%5d) arguments of functions where it is possible to deal with the multiplicity of types.
So far, we have mostly been dealing with the generalization of data collections. That is, a sequence is agnostic to the type of its elements. That's why we were able to write functions that operate on sequences of arbitrary generic type. And F# type inference spots and carries this property on.
Suppose that we proudly implement our own function of reversing a list as follows (Ch10_1.fsx
):
let reverse ls = let rec rev acc = function | h::t -> rev (h::acc) t | [] -> acc rev [] ls
Then, we may notice that the F# compiler infers the reverse : ls:'a list -> 'alist
signature for it, where 'a
indicates that the function can be applied to any type of list elements. And if we decide to check out how exactly our reverse
function would behave with different argument types, we may observe its behavior is consistent for the following arguments (Ch10_1.fsx
):
reverse [1;2;3] // val it : int list = [3; 2; 1] reverse ["1";"2";"3"] // val it : string list = ["3"; "2"; "1"]
And even if we try to slightly abuse type system and mix different boxed types (Ch10_1.fsx
):
reverse [box 1.0; box 2.0M; box 3I] //val it : obj list = [3 {IsEven = false; // IsOne = false; // IsPowerOfTwo = false; // IsZero = false; // Sign = 1;}; 2.0M; 1.0]
The reverse
function behaves as genuinely generic with regard to the type of argument list elements.
Fine, now let's do something seemingly similar and also utterly simple, such as shifting the argument to the left by a single bit (Ch10_1.fsx
):
let twice x = x <<< 1
All of a sudden, the F# compiler infers the very specific twice : x:int -> int
function signature. What's going on? Apparently, there are some types that allow this geeky way of making the value twice as big, for example, int64
. Interestingly, let's look at what happens when we follow the function definition by the usage, as following (Ch10_1.fsx
):
let twice x = x <<< 1 twice 10L //val twice : x:int64 -> int64 //val it : int64 = 20L
Now the F# compiler has seemingly changed its mind about the signature of the twice
function, this time inferring the argument and result types as int64
. This action is irreversible, which means that trying to follow the preceding evaluation with twice 10
will be now rejected with this diagnostics: this expression was expected to have type int64
but it has type int
here.
What's going on? Why does the generalization seemingly fail?
As we just noticed, the F# compiler inferred a monomorphic type for the (<<<
) operator. A step toward polymorphism would assume the ability to express it somehow - while staying within the .NET type system that only such types are fine to be twice
argument that work with the operator (<<<
). In other words, the compiler should deal with a type constraint.
The problem is that this kind of constraint cannot be expressed within the F# compilation target language MSIL. That is, the latest .NET CLI standard (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf) in the II.10.1.7 Generic Parameters section constraints a type either by being a value type or a reference type for a concrete reference type that has the default constructor. This is the problem of the .NET type system rather than the F# language or compiler.
F# 4.0 Language Specification (http://fsharp.org/specs/language-spec/4.0/FSharpSpec-4.0-latest.pdf) hints at the observed compiler behavior in section 5.2.3:
Uses of overloaded operators do not result in generalized code unless definitions are marked as inline. For example, take a look at the function as shown here:
let f x = x + x
It results in an f function, which can be used only to add one type of value, such as int or float. The exact type is determined by later constraints.
Fortunately, the F# compiler may enforce these (and some other) type of constraints at compile-time using the mechanism of statically resolved type parameters. This type parameter for our (<<<
) operator will have the special "hat" prefix ^a
, assuming that the type is statically known at the point of compilation (compared to 'a
, assuming that the type can be anything). As this kind of statically polymorphic function would require a specific manner of compilation depending on the concrete statically resolved type that is aligned with constraints, the F# compiler achieves this goal with the help of inlining, as the language specification has hinted.
Let me apply the inlining mechanism to this failing twice
function as shown here (Ch10_1.fsx
):
let inline twice' x = x <<< 1 // val inline twice' : // x: ^a -> ^a when // ^a : (static member ( <<< ) : ^a * int32 -> ^a)
Note how the automatically inferred signature of the inlined twice'
function carries the hat type ^a
along with the sought-for constraint regarding the type parameter statically resolved at compile-time: type ^a
must have an operator (<<<
) with the ^a * int32 -> ^a
signature.
Great, now twice'
begins to look like a polymorphic function, allowing, for example, the following evaluations (Ch10_1.fsx
):
twice' 5 // int32 argument twice' 5u // uint32 argument twice' 5L // int64 argument twice' 5UL // uint64 argument twice' 5y // sbyte argument twice' 5uy // byte argument twice' 5s // int16 argument twice' 5us // uint16 argument twice' 5I // biginteger argument twice' 5n // nativeint argument
At the same time, it disallows the following evaluations (Ch10_1.fsx
):
twice' 5m //The type 'decimal' does not support the operator '<<<' twice' 5.0 // The type 'float' does not support the operator '<<<' twice' "5"// The type 'string' does not support the operator '<<<' twice' '5' // The type 'char' does not support the operator '<<<'
The compiler has provided all the above niceties after we merely added the inline
qualifier to the function definition. But you should realize that the compiler literally injects the inlined function implementation into MSIL adjusting it for argument(s) having statically resolved concrete types. Inlining is the compilation method that allows to alleviate .NET CLR limitations.
As it was inferred by the F# compiler for the twice'
function earlier, the argument x
can be any type ^a
as long as ^a : (static member (<<<) : ^a * int32 -> ^a).
This condition, either inferred by the F# compiler or, perhaps, intentionally imposed by the programmer is named a static constraint (https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual/constraints-%5Bfsharp%5D). There are about a dozen of argument type constraint kinds. You may check the documentation (https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/generics/constraints) for their complete list.
Constraints can be combined together with the help of the and
construction, as shown in the following code snippet (Ch10_2.fsx
):
let inline constrained (param: ^a when ^a: equality and ^a: comparison) = () type Good = Good [<NoEquality; NoComparison>]type Bad = Bad Good |> constrained // Compiles just fine Bad |> constrained // Error: type Bad does not support comparison constraint
We have two discriminated unions here: Good
, which is equitable and comparable by default, and Bad
, which is also normally equitable and comparable, but here it is decorated with [<NoEquality; NoComparison>]
attributes. As the constrained
function requires its generic param
argument to be of both equitable and comparable types, constrained Good
gets compiled, while constrained Bad
does not.
A few years ago, I was tinkering with creating F# generic code. At that time, my thinking was along these lines: if I need to create a generic function that can deal with a restricted handful of argument types, the right approach would be to appropriately and explicitly constrain the latter. I even asked on StackOverflow (http://stackoverflow.com/q/16737675/917053) what the idiomatic approach would be. One of the answers (http://stackoverflow.com/a/16738811/917053) I want to quote here:
Explicit constraints are a way of stating what you want to do. What could be better than doing it and having the compiler statically prove that the argument is valid for the operations?
Enlightening observation, isn't it?
Nevertheless, what if you still want to explicitly limit the list of generic functions' valid argument types based on some external considerations? Then, you may use the following approach I was hinted at in the other answer (http://stackoverflow.com/a/16739483/917053) to my StackOverflow question, which is based on overloading static methods of the auxiliary type (Ch10_2.fsx
):
[<AutoOpen>] module Restrict = let inline private impl restricted = printfn "%s type is OK" (restricted.GetType().FullName) type Restricting = Restrict with static member ($) (Restrict, value: byte) = impl value static member ($) (Restrict, value: sbyte) = impl value static member ($) (Restrict, value: int) = impl value static member ($) (Restrict, value: uint32) = impl value static member ($) (Restrict, value: bigint) = impl value let inline doit restricted = Restrict $ restricted
There are three components in the preceding snippet:
impl
, which performs a required action on a generic restricted argument of type 'a
; in other words, upon an argument that's not constrained whatsoeverRestricting
, which has a single discriminated union case, Restrict
, augmented by the overloaded static member $
upon the required set of types (byte
, sbyte
, int
, uint32
, bigint
picked just for the sake of illustration)doit
, whose restricted argument has been statically constrained by the other two piecesLet's look at the workings of the preceding in the following script (Ch10_2.fsx
):
doit 1uy doit 1y doit 1 doit 1u doit 1I doit 1L // does not compile doit 1.0 // does not compile doit 1.0m // does not compile doit '1' // does not compile
The first five use cases matching the restricted types compile just fine; the last four do not compile with the same diagnostics as expected:
no overloads match the
op_Dollar"
method
The successful execution of the first five use cases is presented in the following screenshot:
Inlining in F# is not limited to just module-level functions. It is perfectly fine to use inlining for static and instance methods of F# types. For example, in the following snippet, where we have type Bar
with a static method doIt
and type Foo
on any generic type ^T
that has static member doIt
with the matching signature and the inline
member Invoke
calling the doIt
method of ^T
as shown here (Ch10_2.fsx
):
type Bar() = static member doIt() = 42 type Foo< ^T when ^T: (static member doIt: unit -> int)>(data: ^T []) = member inline this.Invoke () = (^T : (static member doIt : unit -> int) ()) let result = (Foo([|Bar()|]).Invoke()) // val result : int = 42
An intricate matter that the preceding sample illustrates is accessing static or instance inline
methods from outside of F# using, for example, a plain C# -> F# interoperability scenario. Remember that normal MSIL cannot support such constraints.
To address this subtlety the compiled MSIL for Invoke
method implementation above that C# may access just throws an exception. The actual inlined body of the function is kept accessible only from F# in its assembly's metadata.
The relationship between inlining and code optimization is quite subtle. Usually, it is difficult to predict what consequences of generalization via inlining would be.
However, sometimes, it is possible to achieve tremendous performance improvements via inlining. A notorious example is the F# handling of the System.DateTime
equality, where the compilation of the datetime1 = datetime2
expression involves boxing. Take a look at the following snippet (Ch10_2.fsx
):
open System #time "on" let x, y = DateTime.MinValue, DateTime.MaxValue for i = 0 to 10000000 do x = y |> ignore //Real: 00:00:00.421, CPU: 00:00:00.406, GC gen0: 115, gen1: 2, gen2: 1
Here, using just the =
operator, we observe a certain garbage collection activity.
However, suppose we just inline the redefined equality operator, ==
, as shown in the following snippet (Ch10_2.fsx
):
open System #time "on" let inline eq<'a when 'a :> IEquatable<'a>> (x:'a) (y:'a) = x.Equals y let inline (==) x y = eq x y for i = 0 to 10000000 do x == y |> ignore //Real: 00:00:00.022, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
Then, we achieve no garbage collection activity whatsoever and an impressive 19(!) times better performance.
I would like to wrap up the theme of writing generic code with just another example of a non-trivial generic function that I have implemented as a sample for a "functional programming" interview (https://infsharpmajor.wordpress.com/2013/05/03/if-google-would-be-looking-to-hire-f-programmers-part-4/): given an arbitrary positive number, find a next higher number consisting of the same digits. If that does not exist, then return the original number.
We will approach the solution as a generic function, allowing the argument to be of any integral type, or anything consisting of just digits, be it byte
, BigInteger
, or nativeint
.
The base line approach would be to split the number into a list of digits, making the list of all digit permutations, assembling digits back into numbers, sorting the list of numbers, and finally, picking the element next to the given argument. Apparently, the time and space complexities of this "solution" are awful, so let's improve it:
Now, if we concatenate (if nonempty) the sublist to the left of the changing digit, followed by the substituting digit, followed by the reversed sublist after accommodating the changing digit and convert the resulting digit list to the number, this would yield the solution with a surprisingly good time complexity of O(n)
and a space complexity of O(n)
, where n is the number of digits in the original number. The solution snippet is as follows (Ch10_3.fsx
):
let inline nextHigher number = let g0 = LanguagePrimitives.GenericZero<'a> let g1 = LanguagePrimitives.GenericOne<'a> let g10 = (g1 <<< 3) + (g1 <<< 1) let toDigits n = let rec toDigitList digits n = if n = g0 then digits else toDigitList ((n % g10) :: digits) (n / g10) toDigitList [] n let fromDigits digits = let rec fromDigitList n = function | [] -> n | h::t -> fromDigitList (n * g10 + h) t fromDigitList g0 digits let make p ll = ll |> List.rev |> List.partition ((<) p) |> fun (x,y) -> (x.Head::y) @ (p::(x.Tail)) let rec scan (changing: 'a list) source = match source with | [] -> changing | h::t -> if h >= changing.Head then scan (h::changing) t else (List.rev t) @ (make h changing) number |> toDigits |> List.rev |> fun x -> scan [(x.Head)] (x.Tail) |> fromDigits
Let's see this in action by running some usage cases via FSI as shown in the following screenshot:
Take into account the complexity of the static constraint expression that the F# compiler has inferred for nextHigher
, as shown in the preceding screenshot. It would be really challenging to come up with an expression that complicated from the top of your head. Let the compiler do its job indeed.