In this chapter, we take a deep dive into one of the most essential arrangements of functional programming: sequences. The ability to represent any data transformation as a composition of atomic functions applied to the elements of an arbitrary enumerable data container is a must for a functional programmer. The goal of this chapter is to help you acquire this mental skill. The way towards this goal is paved by the following topics covered here:
Let's revisit the functional solution of the sample problem from Chapter 1, Begin Thinking Functionally. It represents the common functional pattern of finding a given property of the collection as follows:
Those of you who are attentive to detail may have already spotted the similarity of the preceding solution approach to the MapReduce (https://en.wikipedia.org/wiki/MapReduce) pattern, just without the possible partitioning and parallelization of the map phase for now. This similarity is not coincidental. After implementing a serious number of F# ETL (https://en.wikipedia.org/wiki/Extract,_transform,_load) tasks, big and small, for enterprise Line of Business (LOB) applications, I can conclude that the part of the F# core library covering basic operations upon enumerable sequences, namely the Collections.Seq library module of the Microsoft.FSharp.Collections (https://msdn.microsoft.com/en-us/library/ee353635.aspx) namespace, has already distilled the typical functional patterns of data sequence processing. Any effective F# developer should be fluent in expressing a sought-for data transformation as a combination of these library functions from Collections.Seq.
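The map-then-reduce shape mentioned above can be sketched with two functions from the Seq module. This is only a minimal illustration, not the Chapter 1 solution; the `words` data and the `longestLength` name are invented for this example.

```fsharp
// Hypothetical input data, invented for this sketch.
let words = seq [ "sequence"; "map"; "reduce"; "fold" ]

// Map phase: turn each element into the value of interest.
// Reduce phase: combine the mapped values into a single result.
let longestLength =
    words
    |> Seq.map String.length
    |> Seq.reduce max

printfn "%d" longestLength  // prints 8
```

Because both steps are plain functions over `seq<'T>`, either one can be swapped out without touching the other.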
Based on my own experience, this set of 70 library functions (for version 4.0 of F#) is hard to grok when you consider it as a list that is just alphabetically ordered by function name. It is hard to memorize what exactly each function does without distinguishing their commonalities and differences. This becomes easier once we start seeing each function as an implementation of a certain data transformation pattern. These patterns stem from years of accumulated experience in applying functional programming to data processing and are captured in the selection of functions that the F# designers have slated for inclusion into the core library.
I believe that by observing the Collections.Seq library constituents from this data processing pattern relationship angle, the following function groups can be distinguished:
'T
dealing exclusively with seq<'T> or 'T objects. For example, create a new sequence by skipping the first 100 elements of the original one.
Equipped with this classification approach, I've partitioned the library functions by the following set of patterns. Under each pattern, all the relevant library functions are listed along with their signatures. I encourage you to explore the signatures in order to spot the commonalities responsible for each group formation.
Additional information for those of you who are eager to dig deeper is given in the Ch6_1.fsx script of this book's accompanying code, where the use of each of the library functions is illustrated by a brief code sample.
average : seq<^T> -> ^T (requires member (+) and member DivideByInt and member get_Zero)
averageBy : ('T -> ^U) -> seq<'T> -> ^U (requires ^U with static member (+) and ^U with static member DivideByInt and ^U with static member Zero)
fold : ('State -> 'T -> 'State) -> 'State -> seq<'T> -> 'State
length : seq<'T> -> int
sum : seq<^T> -> ^T (requires member (+) and member get_Zero)
sumBy : ('T -> ^U) -> seq<'T> -> ^U (requires ^U with static member (+) and ^U with static member Zero)
max : seq<'T> -> 'T (requires comparison)
maxBy : ('T -> 'U) -> seq<'T> -> 'T (requires comparison)
min : seq<'T> -> 'T (requires comparison)
minBy : ('T -> 'U) -> seq<'T> -> 'T (requires comparison)
isEmpty : seq<'T> -> bool
reduce : ('T -> 'T -> 'T) -> seq<'T> -> 'T
exactlyOne : seq<'T> -> 'T
compareWith : ('T -> 'T -> int) -> seq<'T> -> seq<'T> -> int
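Every signature in the list above ends in a single value rather than a sequence. A minimal sketch of a few of these functions follows; the sample data and binding names are invented for illustration.

```fsharp
// Hypothetical sample data.
let readings = [ 3.0; 1.5; 4.5 ]

let total = Seq.sum readings        // 9.0
let mean  = Seq.average readings    // 3.0
let count = Seq.length readings     // 3

// maxBy compares elements by a projected key but returns the element itself.
let longest = Seq.maxBy String.length [ "a"; "abc"; "ab" ]   // "abc"

// fold threads an explicit accumulator through the sequence;
// here it accumulates the sum of squares, starting from 0.0.
let sumOfSquares = Seq.fold (fun acc x -> acc + x * x) 0.0 readings   // 31.5
```

Note that `fold` accepts an initial state and so works on empty input, whereas `reduce`, `average`, `max`, and `min` throw on an empty sequence.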
empty : seq<'T>
init : int -> (int -> 'T) -> seq<'T>
initInfinite : (int -> 'T) -> seq<'T>
singleton : 'T -> seq<'T>
unfold : ('State -> ('T * 'State) option) -> 'State -> seq<'T>
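The functions listed above take no sequence as input and produce one as output. The sketch below shows them building sequences from nothing more than a count, an index function, or a seed state; all names and values are illustrative assumptions.

```fsharp
// init: a finite sequence computed from the element index.
let squares = Seq.init 5 (fun i -> i * i) |> Seq.toList        // [0; 1; 4; 9; 16]

// initInfinite: lazy and unbounded, so it must be cut before materializing.
let firstEvens =
    Seq.initInfinite (fun i -> 2 * i)
    |> Seq.take 3
    |> Seq.toList                                              // [0; 2; 4]

// unfold: the generator returns Some (element, nextState) to continue
// or None to stop. The state here is a pair of consecutive Fibonacci numbers.
let fibsBelow100 =
    (0, 1)
    |> Seq.unfold (fun (a, b) -> if a < 100 then Some (a, (b, a + b)) else None)
    |> Seq.toList
```

`unfold` can be read as the mirror image of `fold`: instead of collapsing a sequence into a state, it expands a state into a sequence.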
cast : IEnumerable -> seq<'T>
cache : seq<'T> -> seq<'T>
delay : (unit -> seq<'T>) -> seq<'T>
readonly : seq<'T> -> seq<'T>
toArray : seq<'T> -> 'T []
toList : seq<'T> -> 'T list
ofArray : 'T array -> seq<'T>
ofList : 'T list -> seq<'T>
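The functions listed above leave the elements alone and change only the container or its evaluation behavior. The following sketch, using made-up data, shows `cast` adapting a non-generic .NET collection and `cache` preventing repeated evaluation of a lazy sequence.

```fsharp
open System.Collections

// A non-generic .NET collection, used here only as an illustration.
let legacy = ArrayList()
legacy.Add(1) |> ignore
legacy.Add(2) |> ignore

// cast: IEnumerable -> seq<int>; toArray then materializes the result.
let typed : seq<int> = Seq.cast legacy
let asArray = Seq.toArray typed                 // [|1; 2|]

// A lazy sequence that counts how many elements it has computed.
let evaluations = ref 0
let source =
    seq {
        for i in 1 .. 3 do
            evaluations.Value <- evaluations.Value + 1
            yield i * 10
    }

// cache: the second traversal is served from memory, not recomputed.
let cached = Seq.cache source
let first  = Seq.toList cached                  // [10; 20; 30]
let second = Seq.toList cached                  // [10; 20; 30]
// evaluations.Value is 3 here; without cache it would be 6.
```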
iter : ('T -> unit) -> seq<'T> -> unit
iter2 : ('T1 -> 'T2 -> unit) -> seq<'T1> -> seq<'T2> -> unit
iteri : (int -> 'T -> unit) -> seq<'T> -> unit
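All three signatures above return unit, so these functions exist purely for their side effects. In this sketch the side effect is appending to a mutable `ResizeArray`; the names and years are sample data chosen for the example.

```fsharp
// Hypothetical sample data.
let names = [ "Ada"; "Grace"; "Edsger" ]
let years = [ 1815; 1906; 1930 ]

// The side effect in this sketch: recording strings in a mutable list.
let visited = ResizeArray<string>()

// iteri passes the zero-based index along with each element.
names |> Seq.iteri (fun i name -> visited.Add(sprintf "%d:%s" i name))

// iter2 walks two sequences in lockstep, stopping at the shorter one.
Seq.iter2 (fun name year -> visited.Add(sprintf "%s-%d" name year)) names years

let result = List.ofSeq visited
// ["0:Ada"; "1:Grace"; "2:Edsger"; "Ada-1815"; "Grace-1906"; "Edsger-1930"]
```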
append : seq<'T> -> seq<'T> -> seq<'T>
collect : ('T -> 'Collection) -> seq<'T> -> seq<'U>
concat : seq<'Collection> -> seq<'T>
head : seq<'T> -> 'T
last : seq<'T> -> 'T
nth : int -> seq<'T> -> 'T
skip : int -> seq<'T> -> seq<'T>
take : int -> seq<'T> -> seq<'T>
sort : seq<'T> -> seq<'T>
sortBy : ('T -> 'Key) -> seq<'T> -> seq<'T>
truncate : int -> seq<'T> -> seq<'T>
distinct : seq<'T> -> seq<'T>
distinctBy : ('T -> 'Key) -> seq<'T> -> seq<'T>
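The functions listed above rearrange, trim, or join sequences without reshaping individual elements. A brief sketch follows, including the "skip the first 100 elements" case mentioned earlier; the data and binding names are illustrative.

```fsharp
// Hypothetical sample data.
let xs = [ 5; 3; 5; 1; 4; 3 ]

let skipped  = xs |> Seq.skip 2 |> Seq.toList          // [5; 1; 4; 3]
let firstTwo = xs |> Seq.take 2 |> Seq.toList          // [5; 3]
let unique   = xs |> Seq.distinct |> Seq.toList        // [5; 3; 1; 4]
let sorted   = xs |> Seq.sort |> Seq.toList            // [1; 3; 3; 4; 5; 5]

// truncate takes at most n elements; unlike take, it does not throw
// when the source is shorter than n.
let atMost100 = xs |> Seq.truncate 100 |> Seq.toList   // the whole list

// Skipping the first 100 elements of an (infinite) sequence of naturals.
let afterHundred = Seq.initInfinite id |> Seq.skip 100 |> Seq.head   // 100

// concat flattens a sequence of collections; collect maps, then flattens.
let flattened = [ [ 1; 2 ]; [ 3 ]; [] ] |> Seq.concat |> Seq.toList  // [1; 2; 3]
let repeated  =
    [ 1; 2; 3 ]
    |> Seq.collect (fun n -> List.replicate n n)
    |> Seq.toList                                       // [1; 2; 2; 3; 3; 3]
```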
choose : ('T -> 'U option) -> seq<'T> -> seq<'U>
exists : ('T -> bool) -> seq<'T> -> bool
exists2 : ('T1 -> 'T2 -> bool) -> seq<'T1> -> seq<'T2> -> bool
filter : ('T -> bool) -> seq<'T> -> seq<'T>
find : ('T -> bool) -> seq<'T> -> 'T
findIndex : ('T -> bool) -> seq<'T> -> int
forall : ('T -> bool) -> seq<'T> -> bool
forall2 : ('T1 -> 'T2 -> bool) -> seq<'T1> -> seq<'T2> -> bool
pick : ('T -> 'U option) -> seq<'T> -> 'U
skipWhile : ('T -> bool) -> seq<'T> -> seq<'T>
takeWhile : ('T -> bool) -> seq<'T> -> seq<'T>
tryFind : ('T -> bool) -> seq<'T> -> 'T option
tryFindIndex : ('T -> bool) -> seq<'T> -> int option
tryPick : ('T -> 'U option) -> seq<'T> -> 'U option
where : ('T -> bool) -> seq<'T> -> seq<'T>
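Each function in the list above takes a predicate or an option-returning chooser as its first argument. The sketch below exercises several of them on invented data to show how the result type varies (a sequence, an element, an option, or a bool) while the driving idea stays the same.

```fsharp
// Hypothetical sample data.
let numbers = [ 1 .. 10 ]

let evens = numbers |> Seq.filter (fun n -> n % 2 = 0) |> Seq.toList     // [2; 4; 6; 8; 10]
let small = numbers |> Seq.takeWhile (fun n -> n < 4) |> Seq.toList      // [1; 2; 3]
let rest  = numbers |> Seq.skipWhile (fun n -> n < 8) |> Seq.toList      // [8; 9; 10]

// find throws when nothing matches; tryFind returns an option instead.
let firstBig = numbers |> Seq.tryFind (fun n -> n > 8)                   // Some 9
let missing  = numbers |> Seq.tryFind (fun n -> n > 10)                  // None

let anyNegative = numbers |> Seq.exists (fun n -> n < 0)                 // false
let allPositive = numbers |> Seq.forall (fun n -> n > 0)                 // true

// choose filters and transforms in one pass: None drops the element,
// Some v keeps the transformed value v.
let squaresOfMultiplesOf3 =
    numbers
    |> Seq.choose (fun n -> if n % 3 = 0 then Some (n * n) else None)
    |> Seq.toList                                                         // [9; 36; 81]
```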
countBy : ('T -> 'Key) -> seq<'T> -> seq<'Key * int>
groupBy : ('T -> 'Key) -> seq<'T> -> seq<'Key * seq<'T>>
pairwise : seq<'T> -> seq<'T * 'T>
map : ('T -> 'U) -> seq<'T> -> seq<'U>
map2 : ('T1 -> 'T2 -> 'U) -> seq<'T1> -> seq<'T2> -> seq<'U>
mapi : (int -> 'T -> 'U) -> seq<'T> -> seq<'U>
scan : ('State -> 'T -> 'State) -> 'State -> seq<'T> -> seq<'State>
windowed : int -> seq<'T> -> seq<'T []>
zip : seq<'T1> -> seq<'T2> -> seq<'T1 * 'T2>
zip3 : seq<'T1> -> seq<'T2> -> seq<'T3> -> seq<'T1 * 'T2 * 'T3>
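In the signatures above, the element type of the result differs from that of the input: elements come out transformed, paired, or regrouped. A short sketch with made-up price data illustrates a few of these functions.

```fsharp
// Hypothetical sample data.
let prices = [ 10; 12; 9; 15 ]

// pairwise yields each element with its successor; map then reshapes each pair.
let deltas =
    prices
    |> Seq.pairwise
    |> Seq.map (fun (a, b) -> b - a)
    |> Seq.toList                                   // [2; -3; 6]

// scan is fold that also emits every intermediate state, seed included.
let runningTotals = prices |> Seq.scan (+) 0 |> Seq.toList          // [0; 10; 22; 31; 46]

// windowed produces overlapping arrays of the given size.
let windowSums =
    prices
    |> Seq.windowed 2
    |> Seq.map Array.sum
    |> Seq.toList                                   // [22; 21; 24]

// mapi supplies the zero-based index to the mapping function.
let labelled = prices |> Seq.mapi (fun i p -> sprintf "%d=%d" i p) |> Seq.toList

// zip stops at the shorter of its two inputs.
let zipped = Seq.zip [ "a"; "b" ] [ 1; 2; 3 ] |> Seq.toList         // [("a", 1); ("b", 2)]
```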