Sequence of an indefinite length as a design pattern

The conventional engineering vision of data transformations is that they occur over finite collections materialized in the memory, hence allowing these collections to be enumerated with Seq.length, yielding a number of elements. However, the F# sequences (as well as .NET IEnumerable<T> per se) grant the following generalization: in some cases, a more math-centric vision might be useful, which suggests looking at sequences as countable but not necessarily finite.

A meticulous reader may immediately object that the countable entity, when applied to practical computing, is necessarily finite because eventually, it is limited by underlying physical hardware, which comes out in boundary values, for example:

System.Int32.MaxValue = 2147483647 
System.Int64.MaxValue = 9223372036854775807L 

However, I would oppose this objection by saying that this mere consideration does not in any way limit the length of the F# sequences that might be produced. As a proof, let's implement at the low level without using any F# syntactic sugar the repeater, or sequence that when being given an element of any type returns the endless repetition of the given element.

I will begin with a plain vanilla IEnumerator<'T> implementation as shown in the following code (Ch6_3.fsx):

type private Repeater<'T>(repeated) = 
  let _repeated = repeated 
    interface System.Collections.Generic.IEnumerator<'T> with 
    member x.Current = _repeated 
 
  interface System.Collections.IEnumerator with  
    member x.Current = box _repeated 
    member x.MoveNext() = true 
    member x.Reset() = () 
 
  interface System.IDisposable with 
    member x.Dispose() = () 

The preceding snippet is quite straightforward. The Repeater<'T> type defines a class with which the single default constructor obtains the element to be repeated as repeated and persists it within the class instance as _repeated.

Then, as a fulfillment of the System.Collections.Generic.IEnumerator<'T> contract, this interface is implemented with the single property Current returning the persisted _repeated value.

Then, the implementation of the nongeneric System.Collections.IEnumerator interface follows with its three contract methods. Here is the place where the desired sequence behavior is coined: the Current untyped property also returns a persisted _repeated value, but this time, it's boxed according to the contract, yielding obj. The MoveNext() method, as Energizer Bunny says, should keep going, going, going... so that it always returns true, which means that the next element is available no matter what. The Reset() legacy method is just a stub.

Finally, a bogus implementation of System.IDisposable that is required by the IEnumerator<'T> contract completes the implementation.

Now, for the usage convenience, I add a thin wrapper that upcasts the implemented interface of Repeater<'T> to the explicit System.Collections.Generic.IEnumerator<'T> as shown in the following code (Ch6_3.fsx):

let repeat<'T>(e) = 
  (new Repeater<'T>(e) 
  :> System.Collections.Generic.IEnumerator<'T>) 

Finally, a generic makeSeq shim function provides the conversion of any IEnumerator<'T> into the corresponding seq<'T> sequence by implementing both generic and nongeneric flavors of IEnumerable as shown in the following code (Ch6_3.fsx):

let makeSeq enumerator = 
{ 
  new System.Collections.Generic.IEnumerable<'U> with 
    member x.GetEnumerator() = enumerator 
  interface System.Collections.IEnumerable with 
    member x.GetEnumerator() = 
    (enumerator :> System.Collections.IEnumerator) 
} 

Here, the enumerator argument provides the underlying IEnumerator<'T> for both implementations of IEnumerable constituting an arbitrary F# seq.

It's time for field tests! Executing the freshly created makeSeq function with three different arguments that represent the repeat '.', repeat 42, and repeat"Hooray!" enumerators in FSI yields sequences of indefinite length of the corresponding types, as demonstrated in the following screenshot:

Sequence of an indefinite length as a design pattern

Generating sequences of indefinite length

However, how can we prove that these sequences are indeed of indefinite length? Ironically, only by counting elements: if for any indiscriminately big number, the sequence yields that many elements, then that is the proof that these sequences are of indefinite length. Unfortunately, this is exactly where we hit the already mentioned counting problem: counting might be effectively limited by the underlying hardware.

But wait a minute; .NET provides a numeric type that, for all practical purposes, represents an arbitrarily large countable System.Numerics.BigInteger (https://msdn.microsoft.com/en-us/library/system.numerics.biginteger(v=vs.110).aspx). So, it would be nice to base the counting on this type.

Assuming that you are not afraid of challenges, it would be a good exercise to implement a generic technique of counting not limited by standard int. For F#, this task is not very complicated. I would suggest the following idiomatic approach (Ch6_3.fsx):

let inline traverse n s = 
  let counter = 
    (Seq.zip 
    (seq { LanguagePrimitives.GenericOne..n }) s) 
    .GetEnumerator() 
  let i = ref LanguagePrimitives.GenericOne 
  let mutable last = Unchecked.defaultof<_> 
  while counter.MoveNext() do 
    if !i = n then last <- counter.Current 
      i := !i + LanguagePrimitives.GenericOne 
  last 

The traverse counting function is inlined in order to allow the compiler to build the compiled code aligned with the type on argument n picked for the counting. The n argument of traverse represents the amount of elements expected to be generated. The second traverse argument s represents a generic unlimited sequence generator. makeSeq with a given generic repetitive element makes a perfect second argument for traverse.

The sequence counting enumerator can be elegantly expressed as Seq.zip zipping together the presumably unlimited length sequence of makeSeq and a limited length sequence that has exactly the expected arbitrarily large (within boundaries that the underlying type allows) number of elements. As zipping stops upon reaching the end of the shorter sequence, the counter value represents exactly the desired enumerator obtained from the zipper expression result.

Finally, I traverse the obtained enumerator until it stops yielding elements keeping track of the last traversed element. This last element, which is apparently a tuple of the last element number and the unbound sequence element, is returned as the evidence of arbitrary length. The following screenshot demonstrates how the field tests passed. The first test shows how traverse worked with the BigInteger counter; the second test just illustrates how to produce the sequence that is 10 elements longer than System.Int32.MaxValue:

Sequence of an indefinite length as a design pattern

Checking out unlimited sequence workings

Another interesting experiment would be to generate a sequence longer than System.Int64.MaxValue, which I leave to you as an exercise. My only concern is the time duration it may take to complete. My rough estimate shows that at the rate of 1,000,000 elements traversed per second, it would take no less than the next 29 centuries to complete; so, some serious revisions of the method and optimizations of the implementation may be due.

Generating the F# sequences

As you had a chance to notice recently, generating sequences with the de-sugared .NET way has a fair amount of moving parts and, frankly, is not the best one fits all use case. Fortunately, F# provides enough support through syntactic sugar as well as library functions, making the generation of sequences of finite and infinite lengths a breeze. Let's take a peek at them.

Sequence comprehensions

Sequence comprehensions allow you to represent a sequence as a special kind of expression, namely sequence expression (https://msdn.microsoft.com/en-us/library/dd233209.aspx). Or, the other way around, sequence expression, when being evaluated, yields a sequence.

Sequence comprehensions may take plenty of forms. We'll be discussing some that are typical.

Ranges

These are the simplest forms of comprehensions, producing sequences from ranges. Observe that ranges are not limited to just numeric ones; any type supporting the 'get_One' operator is fine too as shown here (Ch6_4.fsx):

// odd int64 between 1 and 1000 
seq { 1L .. 2L .. 1000L } 
// val it : seq<int64> = seq [1L; 3L; 5L; 7L; ...] 
 
// range not necessarily must be numeric! 
seq { 'A' .. 'Z' } 
// val it : seq<char> = seq ['A'; 'B'; 'C'; 'D'; ...] 

Maps

These expressions generalize ranges by allowing the projection of one or more enumerations into something of another type. Also, note that the enumeration definition can be very flexible: from a simple range to nested enumerations to just another sequence as shown here (Ch6_4.fsx):

// even int from 2 to 1000 
seq { for i in 1..2..999 -> ((+) 1 i) } 
// val it : seq<int> = seq [2; 4; 6; 8; ...] 
 
// nested enumerations 
seq { for i in 1..10 do for j in 1..10 -> if i = j then 1 else 0} 
// val it : seq<int> = seq [1; 0; 0; 0; ...] 
 
// cartesian product tuple projection 
seq { for i in 1..10 do for j in 1..10 -> (i,j) } 
// val it : seq<int * int> = seq [(1, 1); (1, 2); (1, 3); ...] 
 
// cartesian product nested enumerations 
seq { for i in seq {'a'..'b'} do for j in 1..2 -> (i,j) } 
val it : seq<char * int> = seq [('a', 1); ('a', 2); ('b', 1); ('b', 2)] 
 

Arbitrary sequence expressions

All sequence comprehensions represent the idiomatic F# syntax sugar related to the extremely powerful mechanism of computation expressions (https://msdn.microsoft.com/en-us/library/dd233182.aspx), in particular, providing a convenient syntax for dreaded M-word things, also known as monads. Computation expressions represent the extremely powerful F# pattern of sequencing and combining computations. They may be custom built; however, F# also offers some built-in computation expressions: along with sequence expressions, there are asynchronous workflows and query expressions as well. I will be covering built-in computation expressions in the upcoming chapters of this book.

Arbitrary sequence expressions are just computations wrapped by seq { and } tokens, although in contrast to the ranges and maps covered earlier, computations can be pretty much anything. Two constructs within sequence expression tokens play a special role as shown here:

  • yield <expression> makes the expression value the next element of the final sequence
  • yield! <sequence expression> (reads yield-bang) appends the sequence expression operand to the end of the final sequence

The presence of yield! turns arbitrary sequence expressions into extremely powerful data transformations. In particular, as seq {...} is still an expression, being used as a return value of recursive functions, this pattern allows you to extremely succinctly and elegantly implement sequences of finite and infinite length, in particular, easily turn any finite sequence into an infinite circular one, which is often very convenient for the partitioning of other sequences through the element markup.

Enough words; let's look at some code!

I begin with a sample demonstrating how the entire pattern match construct can be nested into a sequence expression in order to detect when the sequence should halt. The following snippet produces a descending sequence of an integer from any non-negative number down to zero (Ch6_4.fsx):

let rec descend top =  
  seq { 
    match top with 
      | _ when top < 0 -> () 
      | _ -> 
      yield top 
      yield! descend (top - 1) 
  } 
 
// descend 3;; 
// val it : seq<int> = seq [3; 2; 1; 0] 
// descend -3;; 
// val it : seq<int> = seq [] 

Note how generation halting is achieved by returning unit instead of yielding the next element.

So far, so good. Now let's generate just an endless sequence of alternating strings as shown (Ch6_4.fsx):

let rec fizzbuzz = seq {  
  yield "Fizz" 
  yield "Buzz" 
  yield! fizzbuzz 
} 
in fizzbuzz 
 
// val it : seq<string> = seq ["Fizz"; "Buzz"; "Fizz"; "Buzz";  ...]

To wrap up the theme, look at how elegantly the circularization of any arbitrary sequence can be achieved as shown here (Ch6_4.fsx):

let rec circular ss = 
  seq { yield! ss; yield! circular ss } 
 
circular (seq { yield '+'; yield '-' }) 
// val it : seq<char> = seq ['+'; '-'; '+'; '-'; ...] 

Deserves two bangs that are required in the definition above to arrange the circularization indeed.

Library functions generating sequences

Now I turn to the support for sequence generation that the F# core libraries provide.

Seq.init

This method is for sequences of predefined length as the length sits right in the function signature. This is quite a simple function that assumes but does not prescribe a projection of the current element number. Here goes a sample with a projection of the sequence number in a string performed in the tacit (https://en.wikipedia.org/wiki/Tacit_programming) manner (Ch6_4.fsx):

Seq.init 10 (sprintf "%s%d""I'm element #") 
//val it : seq<string> = 
//  seq 
//    ["I'm element #0"; "I'm element #1"; "I'm element #2"; 
//    "I'm element #3"; ...] 
 

Seq.initInfinite

This function is very similar to the previous one, but it is missing the first argument indeed, as shown here (Ch6_4.fsx):

Seq.initInfinite (sprintf "%s%d""I'm element #") 
//val it : seq<string> = 
//  seq 
//    ["I'm element #0"; "I'm element #1"; "I'm element #2"; 
//    "I'm element #3"; ...] 

Pretty much nothing has changed, but the underlying abstraction is more powerful than the finite variant. Unfortunately, the power of abstraction can be easily hurt by the implementation limitation that shrewd F# programmers may guess: it has only as many element sequence numbers as the hardware architecture allows. This is easy to check with the following little hack (Ch6_4.fsx):

Seq.initInfinite (fun _ -> ()) 
|> Seq.skip (System.Int32.MaxValue) 
//> 
//val it : seq<unit> = 
//  Error: Enumeration based on System.Int32 exceeded System.Int32.MaxValue. 

Ouch, it hurts!

Seq.unfold

The Seq.unfold library function, which concludes the matter of sequence generation, is my favorite. Instead of bothering with sequence numbers, its projection function unwraps the recurrent relationship between the current and the next element. It also demonstrates a very smart manner of addressing the halt problem by prescribing the projection result as option when returning None signals to stop generating further elements. Let's look at this library function in action using rather worn down by bloggers and Academia Fibonacci numbers (https://en.wikipedia.org/wiki/Fibonacci_number) as an example shown here (Ch6_4.fsx):

// Oh NO! Not Fibonacci again! 
let fibnums = Seq.unfold (fun (current, next) -> 
  Some(current, (next, current+next)))(1,1) 
 
fibnums |> Seq.take 10 |> Seq.toList 
// val it : int list = [1; 1; 2; 3; 5; 8; 13; 21; 34; 55] 

After some years of using F#, I'm still excited what clarity of intent it allows! The projection function literally explains itself, so I do not have anything to add.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset