I promised in Chapter 4, Basic Pattern Matching, that I would add to the subject by covering active patterns; now is a perfect time. Remember matching with guards? Guards provide a way to drill down into the matched pattern-expression function by attaching an arbitrary calculation with a bool result.
The guard mechanism adds a certain customization potential to vanilla pattern matching, but it is somewhat detached: regardless of how much data decomposition is required to complete the guard calculation, all of that effort is discarded whether the guard matches or not. Wouldn't it be nice to have a fully customizable transition between the recognition and transformation phases of pattern matching? Active patterns aim exactly at this. Broadly speaking, active patterns are a special kind of function that may be used inside a pattern-expression.
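They allow you to implement some typical patterns of data transformation in a very terse and elegant manner: transforming data from one form into another, partitioning data into suitable and unsuitable cases, and categorizing data into subcategories that completely cover the domain.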
Let's look at how active patterns play with each case of these data processing patterns.
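Active patterns use a special naming convention when being defined within a let binding: the name must begin with an uppercase letter, as in the double-backtick-quoted name ``I'm active pattern``, and it must be enclosed in banana clips, (| and |), as in (|``Another active pattern``|).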
The data on which an active pattern works always comes as the last argument in the definition and, at the time of use, is taken from the context (match, function, or any other F# construct where pattern matching occurs); all but the last argument in a multi-argument definition are parameters that generalize the active pattern's workings.
Finally, when a literal is used in place of the last argument, the pattern-expression is considered matched when the result of the active pattern calculation matches the literal. If a name is used instead of the literal, then this name gets bound to the result of the active pattern calculation to be used in the corresponding result-expression transformation.
Does this sound confusing? In fact, it is easier than it may sound. Let me turn to some illustrative samples that might help.
The first one represents a dummy sample as shown in the following code (Ch7_7.fsx):

    let (|Echo|) x = x

    let checkEcho p =
        match p with
        | Echo 42 -> "42!"
        | Echo x -> sprintf "%O is not good" x

The Echo active pattern is very minimalistic; it just echoes the input into the result. Then, the checkEcho function puts this definition to use. In the first pattern-expression, it simply checks whether the result of the Echo p calculation (p is implicitly taken from the head of the match construction) equals 42. If it does, then the corresponding result-expression returns the string "42!". Otherwise, the next result-expression is evaluated by unconditionally binding the result of the Echo p calculation to the variable x, which in turn is used in the result-expression to produce a "... is not good" string.
So, when using the preceding sample in FSI, checkEcho 0 produces "0 is not good", while checkEcho 42 produces "42!".
Is it getting clearer? Another simple sample reinforcing this understanding would be an active pattern:
let (|``I'm active pattern``|) x = x + 2
While keeping the same type for the argument and result, this performs just a simple value transformation. The usage of the above active pattern is shown in the following screenshot:
The binding let (|``I'm active pattern``|) x = x + 2 that defines the active pattern does not match anything; instead, it takes the matched value and returns it, adding 2.
The binding let x = match 40 with ``I'm active pattern`` x -> x is used as a part of the match construct; given the input argument 40, it returns x bound to the sum value of 42.
The binding let (``I'm active pattern`` x) = 40 is a slightly mind-boggling example that becomes clear if you remember that the let binding of a value is a corner case of pattern-matching-based data disassembling, so ``I'm active pattern`` gets applied to the input argument 40 and binds the result 42 to x.
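The three bindings just discussed can be tried together as a small script. This is only a sketch: the names fromMatch and fromLet are hypothetical stand-ins for the x used in the text, renamed so that both bindings can coexist in one file.

```fsharp
// Single total active pattern: takes the matched value and returns it plus 2.
let (|``I'm active pattern``|) x = x + 2

// Used inside a match: the pattern is applied to 40 and its result is bound to x.
let fromMatch = match 40 with ``I'm active pattern`` x -> x

// Used inside a let binding: the same pattern disassembles 40 and binds the result.
let (``I'm active pattern`` fromLet) = 40

printfn "%d %d" fromMatch fromLet
```

Both values end up as 42, because in each case the pattern function receives 40 and the bound name receives its result.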
At this point, this specific use case of applying active patterns for data transformations should be clear enough; I want to apply it in a more practically sound use case.
It is a rather widespread technique to use globally unique identifiers, or GUIDs (https://en.wikipedia.org/wiki/Globally_unique_identifier) to label unique entities popping up in the course of running a business. For example, at Jet.com, GUIDs are used to label customer orders, merchant orders, merchant order items, shipping, fulfillment centers, SKUs...the complete list would be too long. These codes are mostly exchanged and displayed as strings of 32 hexadecimal digits. In some nodes of the system, it is required that you validate that a given string is a legitimate representation of a GUID. This task can be easily performed with the help of active patterns as shown here (Ch7_7.fsx):

    let hexCharSet = ['0'..'9'] @ ['a'..'f'] |> set in
    let (|IsValidGuidCode|) (guidstr: string) =
        let (|HasRightSize|) _ = guidstr.Length = 32
        // Subset check rather than set equality, so that a valid code
        // need not contain every one of the 16 hexadecimal digits
        let (|IsHex|) _ = Set.isSubset (guidstr.ToLower() |> set) hexCharSet
        match () with (HasRightSize rightsize & IsHex hex) -> rightsize && hex

The preceding code has many interesting bits and pieces, such as hexCharSet, the set of allowable hexadecimal characters, which is calculated only once and is local to the IsValidGuidCode active pattern definition; the pair of internal active patterns HasRightSize and IsHex, each responsible for a single verified property and disregarding its own input argument, using the one from the outer active pattern instead; and finally, the way two pattern-expressions are combined with &, again omitting the argument as it is already delivered to their bodies, and combining the final result within the result-expression based upon entities distilled in the complementary pattern-expression. Those of you who fully understand how the preceding snippet works can claim yourselves to be experts on the subject of active patterns.
To ensure that this code really works, let me perform a quick test drive. The upcoming figure reflects the results of this test, showing that the IsValidGuidCode active pattern correctly identifies the "abc" string as an invalid GUID and "0123456789AbCdEfFFEEDDCCbbAA9988" as a valid one:
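In script form, the same test drive might look like the following sketch. It restates the pattern so that the block stands alone, with two assumptions that are mine rather than the book's: hexCharSet is an ordinary top-level binding, and "is hex" is read as "every character is a hexadecimal digit" (a subset check). The describe helper is hypothetical.

```fsharp
// Allowed hexadecimal characters (lowercase; input is lowercased before checking).
let hexCharSet = ['0'..'9'] @ ['a'..'f'] |> set

// Single total active pattern: transforms a string into a bool verdict.
let (|IsValidGuidCode|) (guidstr: string) =
    let (|HasRightSize|) _ = guidstr.Length = 32
    // Assumption: every character must belong to hexCharSet (subset check).
    let (|IsHex|) _ = Set.isSubset (guidstr.ToLower() |> set) hexCharSet
    match () with
    | HasRightSize rightsize & IsHex hex -> rightsize && hex

// Hypothetical helper: matches the pattern's result against the literal true.
let describe (s: string) =
    match s with
    | IsValidGuidCode true -> "valid"
    | _ -> "invalid"

printfn "%s" (describe "abc")
printfn "%s" (describe "0123456789AbCdEfFFEEDDCCbbAA9988")
```

The first call reports "invalid" because the string has only three characters; the second reports "valid" because it has 32 characters, all of them hexadecimal digits.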
By the way, active patterns of the (|active pattern name|) form that I have covered so far are named single total active patterns, as they deal with a single data type, transforming it into the same or a different data type by the enclosed calculation. Another peculiarity of the samples considered so far is that all of them work on a single argument. I will cover active patterns with parameters later in this chapter.
My next foray into the use of F# active patterns as processing patterns concerns the typical practice of having data that may constitute one or more cases suitable for processing and "just the rest" that is unsuitable. In the spirit of F#'s ubiquitous use of options, active patterns capable of this manner of partitioning transform the input data type into an Option type, where the None case represents unsuitable data and Some wraps one or more types of suitable data.
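As a minimal sketch of this idea, consider a pattern that accepts only strings holding an integer. The Integer pattern and the describe function are hypothetical names chosen for illustration; they are not part of the book's sample files.

```fsharp
open System

// Partial active pattern: Some n for strings that parse as an integer, None otherwise.
let (|Integer|_|) (s: string) =
    match Int32.TryParse(s) with
    | true, n -> Some n
    | false, _ -> None

// The None result simply makes the first case fall through to the next one.
let describe input =
    match input with
    | Integer n -> sprintf "integer %d" n
    | _ -> "not an integer"

printfn "%s" (describe "42")
printfn "%s" (describe "abc")
```

Note that the parsed number is not thrown away, as it would be with a guard: the value wrapped in Some is bound to n and reused in the result-expression.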
The definition of such active patterns is unambiguously distinguishable by having the |_ characters prepended to the right-hand side banana clip |) of the active pattern definition. The active patterns of this type are called partial active patterns. A partial active pattern carries a single case name, so its name looks like this: (|name|_|); when several kinds of suitable data must be told apart, the value wrapped by Some can itself be of a type with several cases. Let's consider a pretty sizeable piece of real code from one of the Jet.com production systems to demonstrate this technique.
The task at hand is to process the invoices from Jet.com vendors (shipping carriers, payment processors, and others) that package their data in the form of comma-separated files. I use "comma-separated" broadly here, as separators can be any characters. Files may or may not have headers and can carry just a gazillion other irregularities. Uploading these invoices for processing followed by archiving is a problem that carries a certain complexity.
For the purposes of this chapter, I will take only a partially related problem, namely recognizing whether the last uploaded file is of a known Processable type and should be processed, or is not and should be rejected.
In order to keep the code implementing the preceding task reasonably short for the purposes of the book, I'll limit the number of vendors to just three, that is, FedEx and OnTrac shipping carriers and the Braintree payment processor.
I begin with the Processable type, which lists known vendor files as follows (Ch7_8.fsx):

    type Processable =
        | FedexFile
        | OnTracFile
        | BrainTreeFile
        with
            override this.ToString() =
                match this with
                | FedexFile -> "Fedex"
                | OnTracFile -> "OnTrac"
                | BrainTreeFile -> "BrainTree"

Nothing fancy here; just a common practice of representing domain entities with discriminated unions, perhaps slightly augmented.
Next, file headers are hardcoded here and also significantly stripped from the right-hand side, as the complete contents do not matter much (Ch7_8.fsx):

    let BraintreeHdr = "Transaction ID,Subscription ID,..."
    let FedexHdr = "\"Bill to Account Number\";\"Invoice Date\";..."
    let OntracHdr = "AccountNum,InvoiceNum,Reference,ShipDate,TotalCharge,..."

And finally, the active pattern definition is as follows (Ch7_8.fsx):

    open System.IO

    let (|IsProcessable|_|) (stream: Stream) =
        use streamReader = new StreamReader(stream)
        // ReadLine returns null when the stream is empty
        let hdr = streamReader.ReadLine()
        [ (Processable.BrainTreeFile, BraintreeHdr)
          (Processable.FedexFile, FedexHdr)
          (Processable.OnTracFile, OntracHdr) ]
        |> List.tryFind (fun x -> (snd x) = hdr)
        |> function
            | None ->
                // OnTrac files often carry no header: inspect the first data line instead
                if not (isNull hdr) && hdr.StartsWith("\"1\",")
                then Some (Processable.OnTracFile)
                else None
            | Some (processable, _) -> Some processable

The active pattern name, as expected, points to a partial active pattern; the argument is of type System.IO.Stream carrying the file contents, and the return is of type Processable option.
The function first creates a StreamReader and reads the first line from there into the hdr value.
Then, it takes a list of tuples whose members pair Processable cases with the string literals denoting the corresponding comma-separated file headers, and tries to find the element whose second part equals hdr. If such an element exists, then the file can be processed and the function returns the option value Some, wrapping the first part of the found tuple.
If the element is not found (the option value None case), consider at this point that OnTrac files often carry no headers. To exploit this knowledge, I look a bit further into the already read stream contents: if the file begins with symbols pointing to the OnTrac origin, the active pattern returns Some (Processable.OnTracFile); otherwise, the file is considered non-processable.
In my opinion, the IsProcessable active pattern represents quite a terse and clean implementation of the business feature.
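To see how such a pattern might be consumed, here is a self-contained sketch that feeds it in-memory streams instead of uploaded files. The definitions are restated in compact form; the header strings are the truncated placeholders from the text (including the literal "..."), and the streamOf and route helpers are hypothetical.

```fsharp
open System.IO
open System.Text

type Processable =
    | FedexFile
    | OnTracFile
    | BrainTreeFile

// Truncated placeholder headers, exactly as abbreviated in the text.
let BraintreeHdr = "Transaction ID,Subscription ID,..."
let FedexHdr = "\"Bill to Account Number\";\"Invoice Date\";..."
let OntracHdr = "AccountNum,InvoiceNum,Reference,ShipDate,TotalCharge,..."

let (|IsProcessable|_|) (stream: Stream) =
    use streamReader = new StreamReader(stream)
    let hdr = streamReader.ReadLine() // null for an empty stream
    [ (BrainTreeFile, BraintreeHdr); (FedexFile, FedexHdr); (OnTracFile, OntracHdr) ]
    |> List.tryFind (fun (_, knownHdr) -> knownHdr = hdr)
    |> function
        | Some (processable, _) -> Some processable
        | None ->
            // Headerless OnTrac fallback: check how the first data line starts.
            if not (isNull hdr) && hdr.StartsWith("\"1\",") then Some OnTracFile else None

// Hypothetical helper: wraps text in an in-memory stream standing in for a file.
let streamOf (text: string) : Stream =
    new MemoryStream(Encoding.UTF8.GetBytes(text)) :> Stream

// Hypothetical consumer: routes a file according to the pattern's verdict.
let route (stream: Stream) =
    match stream with
    | IsProcessable vendor -> sprintf "process as %A" vendor
    | _ -> "reject"

printfn "%s" (route (streamOf BraintreeHdr))
printfn "%s" (route (streamOf "\"1\",\"some data\""))
printfn "%s" (route (streamOf "unknown,header"))
```

The first two streams are recognized (one by its header, one through the headerless OnTrac fallback), while the third falls through to the wildcard case and is rejected. Keep in mind that the pattern disposes the reader, and with it the stream, so each stream can be matched only once.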
I wrap up our journey into the exciting world of F# active patterns with the active pattern type that applies to the processing pattern of categorization, or partitioning the data into the entirety of subcategories that completely cover the domain entity, not leaving any space for non-applicable outliers.
As some of you may have already deduced, the name associated with this active pattern is multicase active pattern. Its syntactic definition is also easily distinguishable from the cases already considered: between the banana clips it contains just a few case names separated from each other by | pipe symbols.
Let's delve into the illustrative sample. An e-commerce domain operating with payments considers different payment terms and policies. In particular, if the payment terms are not immediate, it makes sense to introduce a certain policy or policies concerned with determining when each particular payment is due. Hence, given the date on which a service or merchandise was supplied, whether the corresponding payment is due or not depends on the amount of time passed from that date to now.
The implementation using active patterns is very straightforward; just for simplicity, let's assume that the business has adopted a single policy of postponing the payments for no more than three days (certainly, the policy can be a subject of parameterization in a more sophisticated design) as shown here (Ch7_9.fsx):

    open System

    let (|Recent|Due|) (dt: DateTimeOffset) =
        if DateTimeOffset.Now.AddDays(-3.0) <= dt then Recent
        else Due

    let isDue = function
        | Recent -> printfn "don't do anything"
        | Due -> printfn "time to pay this one"

The function using the active pattern is also pretty simple, but this is OK for the purpose of illustration. The preceding code is presented in the following figure:
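A run of this categorization might be sketched as follows. To make the outcome easy to inspect, the sketch uses paymentStatus, a hypothetical variant of isDue that returns the message instead of printing it; the two sample dates are likewise made up for illustration.

```fsharp
open System

// Multicase active pattern: every date falls into exactly one of the two cases.
let (|Recent|Due|) (dt: DateTimeOffset) =
    if DateTimeOffset.Now.AddDays(-3.0) <= dt then Recent else Due

// Hypothetical variant of isDue that returns the message rather than printing it.
let paymentStatus = function
    | Recent -> "don't do anything"
    | Due -> "time to pay this one"

// Made-up supply dates: one inside the three-day window, one well outside it.
let suppliedYesterday = DateTimeOffset.Now.AddDays(-1.0)
let suppliedLastWeek = DateTimeOffset.Now.AddDays(-7.0)

printfn "%s" (paymentStatus suppliedYesterday)
printfn "%s" (paymentStatus suppliedLastWeek)
```

Because the two cases cover the whole domain, the match needs no wildcard case, and the compiler can check it for completeness.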
I forgot to mention that the maximal number of cases in F# 4.0 multicase active patterns as of today is limited to 7, which may be the limiting factor in using active patterns in some cases.