The conventional engineering vision of data transformations is that they occur over finite collections materialized in the memory, hence allowing these collections to be enumerated with Seq.length
, yielding a number of elements. However, the F# sequences (as well as .NET IEnumerable<T>
per se) grant the following generalization: in some cases, a more math-centric vision might be useful, which suggests looking at sequences as countable but not necessarily finite.
A meticulous reader may immediately object that the countable entity, when applied to practical computing, is necessarily finite because eventually, it is limited by underlying physical hardware, which comes out in boundary values, for example:
System.Int32.MaxValue = 2147483647 System.Int64.MaxValue = 9223372036854775807L
However, I would oppose this objection by saying that this mere consideration does not in any way limit the length of the F# sequences that might be produced. As a proof, let's implement at the low level without using any F# syntactic sugar the repeater, or sequence that when being given an element of any type returns the endless repetition of the given element.
I will begin with a plain vanilla IEnumerator<'T>
implementation as shown in the following code (Ch6_3.fsx
):
type private Repeater<'T>(repeated) = let _repeated = repeated interface System.Collections.Generic.IEnumerator<'T> with member x.Current = _repeated interface System.Collections.IEnumerator with member x.Current = box _repeated member x.MoveNext() = true member x.Reset() = () interface System.IDisposable with member x.Dispose() = ()
The preceding snippet is quite straightforward. The Repeater<'T>
type defines a class with which the single default constructor obtains the element to be repeated as repeated
and persists it within the class instance as _repeated
.
Then, as a fulfillment of the System.Collections.Generic.IEnumerator<'T>
contract, this interface is implemented with the single property Current
returning the persisted _repeated
value.
Then, the implementation of the nongeneric System.Collections.IEnumerator
interface follows with its three contract methods. Here is the place where the desired sequence behavior is coined: the Current
untyped property also returns a persisted _repeated
value, but this time, it's boxed according to the contract, yielding obj
. The MoveNext()
method, as Energizer Bunny says, should keep going, going, going... so that it always returns true
, which means that the next element is available no matter what. The Reset()
legacy method is just a stub.
Finally, a bogus implementation of System.IDisposable
that is required by the IEnumerator<'T>
contract completes the implementation.
Now, for the usage convenience, I add a thin wrapper that upcasts the implemented interface of Repeater<'T>
to the explicit System.Collections.Generic.IEnumerator<'T>
as shown in the following code (Ch6_3.fsx
):
let repeat<'T>(e) = (new Repeater<'T>(e) :> System.Collections.Generic.IEnumerator<'T>)
Finally, a generic makeSeq
shim function provides the conversion of any IEnumerator<'T>
into the corresponding seq<'T>
sequence by implementing both generic and nongeneric flavors of IEnumerable
as shown in the following code (Ch6_3.fsx
):
let makeSeq enumerator = { new System.Collections.Generic.IEnumerable<'U> with member x.GetEnumerator() = enumerator interface System.Collections.IEnumerable with member x.GetEnumerator() = (enumerator :> System.Collections.IEnumerator) }
Here, the enumerator
argument provides the underlying IEnumerator<'T>
for both implementations of IEnumerable
constituting an arbitrary F# seq
.
It's time for field tests! Executing the freshly created makeSeq
function with three different arguments that represent the repeat '.'
, repeat 42
, and repeat
"Hooray!"
enumerators in FSI yields sequences of indefinite length of the corresponding types, as demonstrated in the following screenshot:
However, how can we prove that these sequences are indeed of indefinite length? Ironically, only by counting elements: if for any indiscriminately big number, the sequence yields that many elements, then that is the proof that these sequences are of indefinite length. Unfortunately, this is exactly where we hit the already mentioned counting problem: counting might be effectively limited by the underlying hardware.
But wait a minute; .NET provides a numeric type that, for all practical purposes, represents an arbitrarily large countable System.Numerics.BigInteger (https://msdn.microsoft.com/en-us/library/system.numerics.biginteger(v=vs.110).aspx). So, it would be nice to base the counting on this type.
Assuming that you are not afraid of challenges, it would be a good exercise to implement a generic technique of counting not limited by standard int
. For F#, this task is not very complicated. I would suggest the following idiomatic approach (Ch6_3.fsx
):
let inline traverse n s = let counter = (Seq.zip (seq { LanguagePrimitives.GenericOne..n }) s) .GetEnumerator() let i = ref LanguagePrimitives.GenericOne let mutable last = Unchecked.defaultof<_> while counter.MoveNext() do if !i = n then last <- counter.Current i := !i + LanguagePrimitives.GenericOne last
The traverse
counting function is inlined in order to allow the compiler to build the compiled code aligned with the type on argument n
picked for the counting. The n
argument of traverse
represents the amount of elements expected to be generated. The second traverse
argument s
represents a generic unlimited sequence generator. makeSeq
with a given generic repetitive element makes a perfect second argument for traverse
.
The sequence counting enumerator can be elegantly expressed as Seq.zip
zipping together the presumably unlimited length sequence of makeSeq
and a limited length sequence that has exactly the expected arbitrarily large (within boundaries that the underlying type allows) number of elements. As zipping stops upon reaching the end of the shorter sequence, the counter
value represents exactly the desired enumerator obtained from the zipper expression result.
Finally, I traverse the obtained enumerator until it stops yielding elements keeping track of the last traversed element. This last
element, which is apparently a tuple of the last element number and the unbound sequence element, is returned as the evidence of arbitrary length. The following screenshot demonstrates how the field tests passed. The first test shows how traverse
worked with the BigInteger
counter; the second test just illustrates how to produce the sequence that is 10 elements longer than System.Int32.MaxValue
:
Another interesting experiment would be to generate a sequence longer than System.Int64.MaxValue
, which I leave to you as an exercise. My only concern is the time duration it may take to complete. My rough estimate shows that at the rate of 1,000,000 elements traversed per second, it would take no less than the next 29 centuries to complete; so, some serious revisions of the method and optimizations of the implementation may be due.
As you had a chance to notice recently, generating sequences with the de-sugared .NET way has a fair amount of moving parts and, frankly, is not the best one fits all use case. Fortunately, F# provides enough support through syntactic sugar as well as library functions, making the generation of sequences of finite and infinite lengths a breeze. Let's take a peek at them.
Sequence comprehensions allow you to represent a sequence as a special kind of expression, namely sequence expression (https://msdn.microsoft.com/en-us/library/dd233209.aspx). Or, the other way around, sequence expression, when being evaluated, yields a sequence.
Sequence comprehensions may take plenty of forms. We'll be discussing some that are typical.
These are the simplest forms of comprehensions, producing sequences from ranges. Observe that ranges are not limited to just numeric ones; any type supporting the 'get_One'
operator is fine too as shown here (Ch6_4.fsx
):
// odd int64 between 1 and 1000 seq { 1L .. 2L .. 1000L } // val it : seq<int64> = seq [1L; 3L; 5L; 7L; ...] // range not necessarily must be numeric! seq { 'A' .. 'Z' } // val it : seq<char> = seq ['A'; 'B'; 'C'; 'D'; ...]
These expressions generalize ranges by allowing the projection of one or more enumerations into something of another type. Also, note that the enumeration definition can be very flexible: from a simple range to nested enumerations to just another sequence as shown here (Ch6_4.fsx
):
// even int from 2 to 1000 seq { for i in 1..2..999 -> ((+) 1 i) } // val it : seq<int> = seq [2; 4; 6; 8; ...] // nested enumerations seq { for i in 1..10 do for j in 1..10 -> if i = j then 1 else 0} // val it : seq<int> = seq [1; 0; 0; 0; ...] // cartesian product tuple projection seq { for i in 1..10 do for j in 1..10 -> (i,j) } // val it : seq<int * int> = seq [(1, 1); (1, 2); (1, 3); ...] // cartesian product nested enumerations seq { for i in seq {'a'..'b'} do for j in 1..2 -> (i,j) } val it : seq<char * int> = seq [('a', 1); ('a', 2); ('b', 1); ('b', 2)]
All sequence comprehensions represent the idiomatic F# syntax sugar related to the extremely powerful mechanism of computation expressions (https://msdn.microsoft.com/en-us/library/dd233182.aspx), in particular, providing a convenient syntax for dreaded M-word things, also known as monads. Computation expressions represent the extremely powerful F# pattern of sequencing and combining computations. They may be custom built; however, F# also offers some built-in computation expressions: along with sequence expressions, there are asynchronous workflows and query expressions as well. I will be covering built-in computation expressions in the upcoming chapters of this book.
Arbitrary sequence expressions are just computations wrapped by seq { and }
tokens, although in contrast to the ranges and maps covered earlier, computations can be pretty much anything. Two constructs within sequence expression tokens play a special role as shown here:
yield <expression>
makes the expression value the next element of the final sequenceyield! <sequence expression>
(reads yield-bang) appends the sequence expression operand to the end of the final sequenceThe presence of yield!
turns arbitrary sequence expressions into extremely powerful data transformations. In particular, as seq {...}
is still an expression, being used as a return value of recursive functions, this pattern allows you to extremely succinctly and elegantly implement sequences of finite and infinite length, in particular, easily turn any finite sequence into an infinite circular one, which is often very convenient for the partitioning of other sequences through the element markup.
Enough words; let's look at some code!
I begin with a sample demonstrating how the entire pattern match construct can be nested into a sequence expression in order to detect when the sequence should halt. The following snippet produces a descending sequence of an integer from any non-negative number down to zero (Ch6_4.fsx
):
let rec descend top = seq { match top with | _ when top < 0 -> () | _ -> yield top yield! descend (top - 1) } // descend 3;; // val it : seq<int> = seq [3; 2; 1; 0] // descend -3;; // val it : seq<int> = seq []
Note how generation halting is achieved by returning unit
instead of yielding the next element.
So far, so good. Now let's generate just an endless sequence of alternating strings as shown (Ch6_4.fsx
):
let rec fizzbuzz = seq { yield "Fizz" yield "Buzz" yield! fizzbuzz } in fizzbuzz // val it : seq<string> = seq ["Fizz"; "Buzz"; "Fizz"; "Buzz"; ...]
To wrap up the theme, look at how elegantly the circularization of any arbitrary sequence can be achieved as shown here (Ch6_4.fsx
):
let rec circular ss = seq { yield! ss; yield! circular ss } circular (seq { yield '+'; yield '-' }) // val it : seq<char> = seq ['+'; '-'; '+'; '-'; ...]
Deserves two bangs that are required in the definition above to arrange the circularization indeed.
Now I turn to the support for sequence generation that the F# core libraries provide.
This method is for sequences of predefined length as the length sits right in the function signature. This is quite a simple function that assumes but does not prescribe a projection of the current element number. Here goes a sample with a projection of the sequence number in a string performed in the tacit (https://en.wikipedia.org/wiki/Tacit_programming) manner (Ch6_4.fsx
):
Seq.init 10 (sprintf "%s%d""I'm element #") //val it : seq<string> = // seq // ["I'm element #0"; "I'm element #1"; "I'm element #2"; // "I'm element #3"; ...]
This function is very similar to the previous one, but it is missing the first argument indeed, as shown here (Ch6_4.fsx
):
Seq.initInfinite (sprintf "%s%d""I'm element #") //val it : seq<string> = // seq // ["I'm element #0"; "I'm element #1"; "I'm element #2"; // "I'm element #3"; ...]
Pretty much nothing has changed, but the underlying abstraction is more powerful than the finite variant. Unfortunately, the power of abstraction can be easily hurt by the implementation limitation that shrewd F# programmers may guess: it has only as many element sequence numbers as the hardware architecture allows. This is easy to check with the following little hack (Ch6_4.fsx
):
Seq.initInfinite (fun _ -> ()) |> Seq.skip (System.Int32.MaxValue) //> //val it : seq<unit> = // Error: Enumeration based on System.Int32 exceeded System.Int32.MaxValue.
Ouch, it hurts!
The Seq.unfold
library function, which concludes the matter of sequence generation, is my favorite. Instead of bothering with sequence numbers, its projection function unwraps the recurrent relationship between the current and the next element. It also demonstrates a very smart manner of addressing the halt problem by prescribing the projection result as option
when returning None
signals to stop generating further elements. Let's look at this library function in action using rather worn down by bloggers and Academia Fibonacci numbers (https://en.wikipedia.org/wiki/Fibonacci_number) as an example shown here (Ch6_4.fsx
):
// Oh NO! Not Fibonacci again! let fibnums = Seq.unfold (fun (current, next) -> Some(current, (next, current+next)))(1,1) fibnums |> Seq.take 10 |> Seq.toList // val it : int list = [1; 1; 2; 3; 5; 8; 13; 21; 34; 55]
After some years of using F#, I'm still excited what clarity of intent it allows! The projection function literally explains itself, so I do not have anything to add.