The sequence: Duality of data and calculation

What makes the F# sequence so ambient and versatile is its dual nature. Being a strongly typed generic data collection, it exposes the contained data via two archetypal .NET interfaces of the System.Collections.Generic namespace, namely IEnumerable<T> (https://msdn.microsoft.com/en-us/library/9eekhta0(v=vs.110).aspx) and IEnumerator<T> (https://msdn.microsoft.com/en-us/library/78dfe2yb(v=vs.110).aspx).

These interfaces personify the classic data pull protocol, where a data consumer actively pulls data from the producer. Indeed, the type of seq<'T> in the F# is defined as the following abbreviation:

type seq<'T> = System.Collections.Generic.IEnumerable<'T> 

The preceding line of code means in practice that each F# sequence is a data collection, which can be traversed by getting an enumerator that allows you to stir through the sequence from its head towards its tail, obtaining the values of its elements. The enumerator itself can be obtained using the GetEnumerator() method of the IEnumerable<'T> interface.

With the enumerator, which in turn implements the IEnumerator<'T> interface, the sequence can be traversed using the pair of members that constitute this interface: the Current property, which gets the value of the sequence element at the current position of the enumerator, and the MoveNext() method, which advances the position of the enumerator to the next element of the sequence.

Rather boring, right? Well, it might be boring when being applied to materialized data collections such as the F# list, where all elements exist in the physical memory space. However, nothing in the preceding scheme insists upon the element materialization! It doesn't require much mental effort to imagine IEnumerator<'T> being implemented by a calculation that returns the freshly constructed value of 'T in response to getting the Current property, and for MoveNext(), it just advances the imaginable current position marker of the sequence. The whole arrangement is immaterial with regard to the memory space occupied by elements, as there is no need to keep more than just one materialized Current element, right? With that, you just rediscovered the inner workings of the F# sequences!

Sequence as a lazy data collection

The F# sequences do not eagerly materialize data elements in the memory. This feature aligns very well with the data pull protocol. That is, the current sequence element is not required unless the sequence enumerator has reached its position in the sequence after a series of MoveNext() method invocations and the element value has been demanded via getting the enumerator's Current property.

However, in order to really master the F# sequences, it is important that you understand its nuances. In particular, it is important to be aware of whether sequence elements are materialized or not. If a sequence has been calculated on the enumerator's demand and is not being converted from a materialized collection, such as a list or an array, or is not cached, then there is normally no backing memory where sequence element values are persisted. On the contrary, if a sequence has been produced from a concrete collection by a library function, for example, Seq.ofList, then at least one instance of the original list must be present for the entire lifespan of the derived collection, as this list can be completely arbitrary and no way exists to recreate it from scratch in a manner similar to a sequence being re-enumerated multiple times if the re-enumeration is cheap and performance considerations do not prompt for caching.

Sequence as a calculation

As I've just mentioned, a sequence can be an enumeration atop a concrete data collection where the enumerator is implemented on the collection's side. However, more interesting cases represent sequences that have an enumerator that programmatically generates sequence elements upon the traversal pull demand. These might be different, syntactically sugared forms as sequence comprehensions, sequence computation expressions, or standard library functions representing the generator pattern considered at the beginning of the chapter. As a last resort, a sequence can be brought to life in a fully de-sugared manner by implementing some required interfaces. The last approach is the most tedious and error-prone; however, it grants unprecedented flexibility in comparison to other methods. In a majority of development situations, the custom sequence enumerator implementation is unwarranted; however, there might be some situations where there is simply no alternative to the custom implementation. This topic will be my next subject.

Sequence as an enumerator interface wrapper

Although implementing a custom sequence is a tedious task, it's not rocket science. I'm going to walk you through the process and make you understand it. No matter how simple or complex the custom sequence is, the implementation process will be the same.

To begin with, what do you think defines the sequence behavior? Apparently, it is not syntactic constructions used for sequence traversing, and it does not matter whether it's sugared or not sugared. All implementation specifics are abstracted by the entity standing behind any sequence (and even more broadly, behind any .NET collection): the enumerator. Enumerator is a class that must implement the previously mentioned strongly typed interface IEnumerator<'T> (https://msdn.microsoft.com/en-us/library/78dfe2yb(v=vs.110).aspx) of the System.Collections.Generic namespace. In turn, IEnumerator<'T> inherits from two other interfaces: the routine System.IDisposable interface and the legacy untyped IEnumerator interface of the System.Collections namespace. (Pay attention to the difference between typed System.Collections.Generic and untyped System.Collections namespaces). IEnumerator<'T> overrides the Current property and inherits the MoveNext() and Reset() methods of the IEnumerator interface. As these relationships between involved components are quite intricate, I have provided a component relationship diagram in the following figure to facilitate comprehension:

Sequence as an enumerator interface wrapper

The relationship between components constituting the F# sequence implementation

Considering these intricacies, the implementation plan for any custom F# sequence is as following:

  1. Provide a custom Enumerator class for the sequence that implements System.Collections.Generic.IEnumerator<'T>, System.Collections.IEnumerator, and System.IDisposable interfaces. For the first interface, just implement an override of the Current property, and the rest of the implementation goes into nongeneric IEnumerator.
  2. Provide a factory function that is similar to the GetEnumerator() methods of generic .NET collections that have a unit -> System.Collections.Generic.IEnumerator<'T> signature. This function constructs the requested instance of the enumerator passing through its own arguments directly to the constructor, then it upcasts the constructed instance into System.Collections.Generic.IEnumerator<'T> and returns the result as a function of the previously listed signature.
  3. Provide another factory function, this time to build a sought-for sequence out of the function built in Step 2.

As this may still sound a bit complicated, let's take a quick walk-through. I want us to implement the easiest thing: a strongly typed empty sequence, which is a sequence without elements, when its enumerator does not have anything to enumerate.

At the same time, apparently, it must be a normal sequence similar to any other native sequence from .NET libraries or a sugared one constructed with the F# language facilities or core libraries. Let's do this.

Step 1 - The custom enumerator implementation

The behavior of an empty sequence is pretty straightforward: both typed and untyped versions of the Current property never have a chance to work as any attempt to enumerate empty sequence must immediately terminate; MoveNext() always returns false, indicating that the end of the sequence has already been reached. Expressed in the F# code, these considerations are shown in the following snippet (Ch6_2.fsx):

type private DummyEnumerate<'T>() = 
  interface System.Collections.Generic.IEnumerator<'T> with 
    member x.Current = Unchecked.defaultof<'T> 
 
  interface System.Collections.IEnumerator with  
    member x.Current = box Unchecked.defaultof<'T> 
    member x.MoveNext() = false 
    member x.Reset() = () 
 
  interface System.IDisposable with  
    member x.Dispose() = () 

As mentioned earlier, System.Collections.Generic.IEnumerator<'T> overrides Current and inherits MoveNext() and Reset() of System.Collections.IEnumerator. Both Current properties use the typed default value; the Current property of the untyped enumerator boxes this default value according to the specification. Step 1 is now complete.

Step 2 - The custom enumerator factory

Step 2 is quite simple, especially in our case, where the implemented sequence does not have any specifics to be communicated to the enumerator at the time of construction as shown in the following code (Ch6_2.fsx):

let makeDummyEnumerator<'T>() = 
  fun() -> (new DummyEnumerate<'T>() 
    :> System.Collections.Generic.IEnumerator<'T>) 

Step 2 is now complete too.

Step 3 - The custom sequence factory

This one is a breeze, thanks to the great object expressions (https://msdn.microsoft.com/en-us/library/dd233237.aspx) feature of the F# as shown (Ch6_2.fsx):

let makeSeq enumerator = 
{ 
  new System.Collections.Generic.IEnumerable<_> with 
    member x.GetEnumerator() = enumerator() 
  interface System.Collections.IEnumerable with 
    member x.GetEnumerator() = 
    (enumerator() :> System.Collections.IEnumerator) 
} 

Here we go; it is easy to spot that this particular piece does not in any way depend on the produced sequence and is a good candidate to be a member of a Helpers library. The implementation is complete.

Now is a perfect moment to give it a test drive to check whether everything is really OK and that I did not miss anything. The results of a concise testing are reflected in the following screenshot:

Step 3 - The custom sequence factory

Testing the implemented empty sequence

The empty sequence is created with the following code:

let ss = makeSeq (makeDummyEnumerator<int>()) 

Then some checks are performed as shown:

  • ss |> Seq.isEmpty, as expected, returns true
  • ss |> Seq.length, as expected, equals 0
  • An attempt to skip some elements with ss |> Seq.skip 10 fails with the expected diagnostics

Before we switch to the next topic, I want to reiterate this: the de-sugared custom sequence implementation using bare .NET interfaces is not much fun. The good thing about it is that in most situations, you simply do not need to descend to this level. Syntactically sugared language constructions and core library functions will do the same job. However, once in a while, you will need to do something special, such as counting the number of times your code traverses a sequence, and this technique will be at your service.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset