Active patterns

I promised in Chapter 4, Basic Pattern Matching, that I would add to the subject by covering active patterns; now is a perfect time. Remember matching with guards? Guards provide a way to refine a matched pattern-expression by attaching an arbitrary calculation that yields a bool result.

The guard mechanism adds a certain customization potential to vanilla pattern matching, but it is somewhat detached: no matter how much data decomposition is required to complete the guard calculation, all this effort is discarded, whether the guard succeeds or fails. Wouldn't it be nice to have a fully customizable transition between the recognition and transformation phases of pattern matching? Active patterns aim at exactly this. Broadly speaking, active patterns are a special kind of function that is allowed to be used inside a pattern-expression.

They allow you to implement some typical patterns of data transformation in a very terse and elegant manner, as follows:

  • Advanced transformations between types
  • Partitioning data into groups by relevant and irrelevant categories
  • Performing full categorization, in other words, taking any piece of data and processing it according to the one category, out of a few given, to which it belongs

Let's look at how active patterns play with each of these data processing patterns.

Type transformations with active patterns

Active patterns use a special naming convention when being defined within a let binding:

  • The name of the active pattern function must begin with a capital letter, even if it is double-ticked like this: ``I'm active pattern``
  • The name of the active pattern function must be wrapped in banana clips, (| and |), as in (|``Another active pattern``|)

The data on which an active pattern works always comes as the last argument in the definition and, at the time of use, is taken from the context (match, function, or any other F# construct where pattern matching occurs); all but the last argument in a multi-argument definition are parameters that generalize the active pattern's workings.

Finally, when a literal is used in place of the last argument, the pattern-expression is considered matched when the result of the active pattern calculation matches the literal. If a name is used instead of the literal, then this name gets bound to the result of the active pattern calculation, to be used in the corresponding result-expression transformation.

Does this sound confusing? In fact, it is easier than it may sound. Let me turn to some illustrative samples that might help.

The first one represents a dummy sample as shown in the following code (Ch7_7.fsx):

let (|Echo|) x = x 
let checkEcho p =  
  match p with 
  | Echo 42 -> "42!" 
  | Echo x -> sprintf "%O is not good" x 

The Echo active pattern is very minimalistic; it just echoes the input into the result. Then, the checkEcho function puts this definition to use. In the first pattern-expression, it simply checks whether the result of the Echo p calculation (p is implicitly taken from the head of the match construct) equals 42. If it does, then the corresponding result-expression returns the string "42!". Otherwise, the next result-expression is evaluated by unconditionally binding the result of the Echo p calculation to the variable x, which in turn is used in the result-expression to produce a "... is not good" string.

So, when using the preceding sample in FSI, checkEcho 0 produces "0 is not good", while checkEcho 42 produces "42!".

Is it getting clearer? Another simple sample reinforcing this understanding is the following active pattern:

let (|``I'm active pattern``|) x = x + 2 

While keeping the same type for the argument and result, this performs just a simple value transformation. The usage of the above active pattern is shown in the following screenshot:


A simple type transformation with an active pattern

The binding let (|``I'm active pattern``|) x = x + 2 that defines the active pattern does not match anything by itself; instead, it takes the matched value and returns it with 2 added.

The binding let x = match 40 with ``I'm active pattern`` x -> x uses the active pattern as part of a match construct: given the input argument 40, it returns x bound to the sum, 42.

The binding let (``I'm active pattern`` x) = 40 is a slightly mind-boggling example that becomes clear if you remember that the let binding of a value is a corner case of data disassembling based on pattern matching, so ``I'm active pattern`` gets applied to the input argument 40 and binds the result 42 to x.
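Since the screenshot itself is not reproduced here, the three bindings just discussed can be collected into a small script. This is only a sketch of what the FSI session presumably contained; the second value is renamed to y (my choice, not the original's) so that the snippet also compiles outside FSI, where rebinding x at the top level is not allowed.

```fsharp
// The active pattern: takes the matched value and returns it with 2 added.
let (|``I'm active pattern``|) x = x + 2

// Used inside a match construct: x is bound to 40 + 2.
let x = match 40 with ``I'm active pattern`` x -> x

// Used directly in a let binding, which is itself a pattern match.
// The name y is an assumption made here to avoid redefining x.
let (``I'm active pattern`` y) = 40

printfn "x = %d, y = %d" x y
```

Both x and y end up bound to 42, the result of applying the pattern function to 40.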

At this point, this specific use case of applying active patterns for data transformations should be clear enough; I want to apply it in a more practically sound use case.

It is a rather widespread technique to use globally unique identifiers, or GUIDs (https://en.wikipedia.org/wiki/Globally_unique_identifier), to label unique entities popping up in the course of running a business. For example, at Jet.com, GUIDs are used to label customer orders, merchant orders, merchant order items, shipping, fulfillment centers, SKUs...the complete list would be too long. These codes are mostly exchanged and displayed as strings of 32 hexadecimal digits. In some nodes of the system, you are required to validate that a given string is a legitimate representation of a GUID. This task can be easily performed with the help of active patterns, as shown here (Ch7_7.fsx):

let hexCharSet = ['0'..'9'] @ ['a'..'f'] |> set in 
let (|IsValidGuidCode|) (guidstr: string) = 
  let (|HasRightSize|) _ = guidstr.Length = 32 
  let (|IsHex|) _ = Set.isSubset (guidstr.ToLower() |> set) hexCharSet 
  match () with (HasRightSize rightsize & IsHex hex)-> rightsize && hex  

The preceding code has many interesting bits and pieces. First, the set of allowable hexadecimal characters, hexCharSet, is calculated only once and is local to the IsValidGuidCode active pattern definition. Second, the two internal active patterns, HasRightSize and IsHex, are each responsible for a single verified property, and each disregards its own input argument, using the one from the outer active pattern instead. Finally, the two pattern-expressions are combined with &, again omitting the argument as it is already delivered to their bodies, and the result-expression combines the values distilled by the two complementary pattern-expressions into the final result. Those of you who fully understand how the preceding snippet works can claim yourselves to be experts on the subject of active patterns.

To ensure that this code really works, let me perform a quick test drive. The upcoming figure reflects the results of this test, showing that the IsValidGuidCode active pattern correctly identifies the "abc" string as an invalid GUID and "0123456789AbCdEfFFEEDDCCbbAA9988" as a valid one:


Verifying a GUID string using active patterns
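As the figure is not reproduced here, the test drive can be sketched as a script. The snippet restates the definition so that it is self-contained, keeps hexCharSet as an ordinary top-level value for simplicity, and writes the hex check as a subset test; the latter is an assumption about the intended meaning (every character must be a hex digit, but not every hex digit has to occur). The result names isShortValid, isMixedValid, and isZerosValid are hypothetical.

```fsharp
let hexCharSet = ['0'..'9'] @ ['a'..'f'] |> set

let (|IsValidGuidCode|) (guidstr: string) =
  let (|HasRightSize|) _ = guidstr.Length = 32
  // Assumption: a subset test, so codes that lack some hex digits still pass.
  let (|IsHex|) _ = Set.isSubset (guidstr.ToLower() |> set) hexCharSet
  match () with (HasRightSize rightsize & IsHex hex) -> rightsize && hex

// Each let binding applies the pattern to the string on the right-hand side.
let (IsValidGuidCode isShortValid) = "abc"
let (IsValidGuidCode isMixedValid) = "0123456789AbCdEfFFEEDDCCbbAA9988"
let (IsValidGuidCode isZerosValid) = String.replicate 32 "0"

printfn "%b %b %b" isShortValid isMixedValid isZerosValid
```

The first string fails the size check, while the other two are 32 characters long and consist of hex digits only, so the script prints false true true.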

By the way, active patterns of the (|active pattern name|) form that I have covered so far are named single total active patterns, as they deal with a single data type, transforming it into the same or a different data type by the enclosed calculation. Another peculiarity of the samples considered so far is that all of them work on a single argument. I will cover active patterns with parameters later in this chapter.

Data partitioning with active patterns

My next foray into the use of F# active patterns as processing patterns concerns the typical situation where some of the data is suitable for processing and "just the rest" is not. In the spirit of F#'s ubiquitous use of options, active patterns that perform this kind of partitioning transform the input data into an Option type, where the None case represents unsuitable data and Some wraps the suitable data (possibly one of several kinds).

The definition of such an active pattern is unambiguously distinguishable by the |_ characters prepended to the right-hand side banana clip |). Active patterns of this type are called partial active patterns, and their names look like this: (|name|_|); a partial active pattern carries a single case name. Let's consider a pretty sizeable piece of real code from one of the Jet.com production systems to demonstrate this technique.
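Before the larger example, here is a minimal sketch of the syntax. The Integer pattern and the describe function are hypothetical names invented for illustration; the pattern returns Some for input that parses as an integer and None for everything else.

```fsharp
open System

// Hypothetical partial active pattern: note the |_ before the closing clip.
let (|Integer|_|) (s: string) =
  match Int32.TryParse s with
  | true, n -> Some n      // suitable data: wrap the parsed number
  | false, _ -> None       // "just the rest": not processable

let describe s =
  match s with
  | Integer n -> sprintf "integer %d" n
  | _ -> "not an integer"  // a catch-all is needed, as Integer may not match

printfn "%s" (describe "42")
printfn "%s" (describe "abc")
```

Unlike a guard, the pattern hands the parsed number n straight to the result-expression, so the parsing effort is not thrown away.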

The task at hand is to process the invoices from Jet.com vendors (shipping carriers, payment processors, and others) that package their data in the form of comma-separated files. I use "comma-separated" broadly here, as separators can be any characters. Files may or may not have headers and can carry just a gazillion other irregularities. Uploading these invoices for processing followed by archiving is a problem that carries a certain complexity.

For the purposes of this chapter, I will take only a partially related problem, namely recognizing whether the last uploaded file is of a known Processable type and should be processed or whether it is not and should be rejected.

In order to keep the code implementing the preceding task reasonably short for the purposes of the book, I'll limit the number of vendors to just three, that is, FedEx and OnTrac shipping carriers and the Braintree payment processor.

I begin with the Processable type, which lists the known vendor files, as follows (Ch7_8.fsx):

type Processable = 
| FedexFile 
| OnTracFile 
| BrainTreeFile 
with 
  override this.ToString() = match this with 
    | FedexFile -> "Fedex" 
    | OnTracFile -> "OnTrac" 
    | BrainTreeFile -> "BrainTree" 

Nothing fancy here; just a common practice of representing domain entities with discriminated unions, perhaps slightly augmented.

Next, the file headers are hardcoded; they are also significantly truncated on the right-hand side, as the complete contents do not matter much here (Ch7_8.fsx):

let BraintreeHdr = "Transaction ID,Subscription ID,..." 
let FedexHdr = "\"Bill to Account Number\";\"Invoice Date\";..." 
let OntracHdr = "AccountNum,InvoiceNum,Reference,ShipDate,TotalCharge,..." 

And finally, the active pattern definition is as follows (Ch7_8.fsx):

open System.IO 

let (|IsProcessable|_|) (stream: Stream) = 
  use streamReader = new StreamReader(stream) 
  // The first line of the file is the header candidate (null for an empty file)
  let hdr = streamReader.ReadLine() 
  [ Processable.BrainTreeFile, BraintreeHdr 
    Processable.FedexFile, FedexHdr 
    Processable.OnTracFile, OntracHdr ] 
  |> List.tryFind (fun x -> (snd x) = hdr) 
  |> function 
     | Some (fileType, _) -> Some fileType 
     // OnTrac files often carry no header; recognize them by their first field
     | None -> 
       if not (isNull hdr) && hdr.StartsWith("\"1\",") 
       then Some Processable.OnTracFile 
       else None 

The active pattern name, as expected, points to the partial active pattern, the argument is of type System.IO.Stream carrying the file contents, and its return is of type Processable option.

The function first creates a StreamReader and reads the first line from it into the hdr value.

Then, it takes a list of tuples, whose members pair Processable cases with the string literals denoting the corresponding comma-separated file headers, and tries to find the element whose second part is equal to hdr. If such an element exists, then the file can be processed and the function returns the option value Some, wrapping the first part of the found tuple.

If no element is found (the None case), I take into account that OnTrac files often do not carry headers. To exploit this knowledge, I look a bit further into the line already read from the stream: if it begins with the symbols pointing to an OnTrac origin, the active pattern returns Some (Processable.OnTracFile); otherwise, the file is considered non-processable.

In my opinion, the IsProcessable active pattern represents quite a terse and clean implementation of the business feature.
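To see the pattern at work without real vendor files, the following self-contained sketch feeds it in-memory streams. It restates a condensed version of the definitions; the truncated headers from the text are taken literally as placeholder strings, and the streamOf and recognize helpers are hypothetical names introduced only for this illustration.

```fsharp
open System.IO
open System.Text

type Processable = FedexFile | OnTracFile | BrainTreeFile

// The truncated headers from the text, used literally as placeholders.
let BraintreeHdr = "Transaction ID,Subscription ID,..."
let FedexHdr = "\"Bill to Account Number\";\"Invoice Date\";..."
let OntracHdr = "AccountNum,InvoiceNum,Reference,ShipDate,TotalCharge,..."

let (|IsProcessable|_|) (stream: Stream) =
  use streamReader = new StreamReader(stream)
  let hdr = streamReader.ReadLine()
  [ BrainTreeFile, BraintreeHdr; FedexFile, FedexHdr; OnTracFile, OntracHdr ]
  |> List.tryFind (fun (_, knownHdr) -> knownHdr = hdr)
  |> function
     | Some (fileType, _) -> Some fileType
     // Headerless OnTrac files are recognized by their first field
     | None when not (isNull hdr) && hdr.StartsWith("\"1\",") -> Some OnTracFile
     | None -> None

// Hypothetical helper: wraps a string into an in-memory stream.
let streamOf (text: string) =
  new MemoryStream(Encoding.UTF8.GetBytes(text)) :> Stream

// Hypothetical helper: exposes the outcome of the match as an option.
let recognize (stream: Stream) =
  match stream with
  | IsProcessable fileType -> Some fileType
  | _ -> None

let withHeader = recognize (streamOf (BraintreeHdr + "\n1,2"))
let headerless = recognize (streamOf "\"1\",\"abc\"")
let unknown = recognize (streamOf "some,unknown,header")
```

Here withHeader is Some BrainTreeFile, headerless is Some OnTracFile, and unknown is None, which corresponds to accepting the first two uploads and rejecting the third.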

Data categorization with active patterns

I wrap up our journey into the exciting world of F# active patterns with the active pattern type that applies to the processing pattern of categorization, or partitioning the data into the entirety of subcategories that completely cover the domain entity, not leaving any space for non-applicable outliers.

As some of you may have already deduced, the name associated with this active pattern is multicase active pattern. Its syntactic definition is also easily distinguishable from the cases already considered: between the banana clips, it contains just a few case names separated from each other by | pipe symbols.

Let's delve into an illustrative sample. An e-commerce domain operating with payments considers different payment terms and policies. In particular, if the payment terms are not immediate, it makes sense to introduce a certain policy or policies concerned with determining when each particular payment is due. Hence, given the date on which a service or merchandise was supplied, whether the corresponding payment is due depends on the amount of time that has passed from that date to now.

The implementation using active patterns is very straightforward; just for simplicity, let's assume that the business has adopted a single policy of postponing the payments for no more than three days (certainly, the policy can be a subject of parameterization in a more sophisticated design) as shown here (Ch7_9.fsx):

open System 
 
let (|Recent|Due|) (dt: DateTimeOffset) = 
  if DateTimeOffset.Now.AddDays(-3.0) <= dt then Recent 
  else Due 
 
let isDue = function 
| Recent -> printfn "don't do anything" 
| Due  -> printfn "time to pay this one" 

The function using the active pattern is also pretty simple, but this is OK for the purpose of illustration. The preceding code is presented in the following figure:


Multi-case active patterns for data categorization
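As the figure is not reproduced here, the following sketch shows how the categorization behaves for two sample dates. The dueMessage function is a hypothetical variant of isDue that returns the message instead of printing it, which makes the outcome easier to inspect.

```fsharp
open System

let (|Recent|Due|) (dt: DateTimeOffset) =
  if DateTimeOffset.Now.AddDays(-3.0) <= dt then Recent
  else Due

// Hypothetical variant of isDue: returns the message rather than printing it.
let dueMessage = function
  | Recent -> "don't do anything"
  | Due -> "time to pay this one"

// Supplied just now: still within the three-day grace period.
let today = dueMessage DateTimeOffset.Now

// Supplied a week ago: the grace period has expired.
let lastWeek = dueMessage (DateTimeOffset.Now.AddDays(-7.0))

printfn "%s / %s" today lastWeek
```

Because the two cases cover every possible date, the function needs no catch-all rule, in contrast to matching with a partial active pattern.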

I forgot to mention that the maximal number of cases in F# 4.0 multicase active patterns as of today is limited to 7, which may be the limiting factor in using active patterns in some cases.
