Chapter 8. Data Crunching – Data Transformation Patterns

After dealing with advanced patterns of function definition and application in the previous chapter, I want to revisit a topic that was only touched on in Chapter 6, Sequences - The Core of Data Processing Patterns, in connection with sequences. There, I claimed that the quite bulky Collections.Seq library absorbs and implements just a handful of universal data processing patterns. Then I regrouped the library members by assigning each of them to one of these patterns.

This chapter digs deeper into these patterns of data transformation, which are applicable not only to sequences but also to other data collections. The goal of this chapter is to help you develop the skill of expressing your data processing needs with functions that belong to a handful of typical polymorphic transformation categories, are composed from a handful of combinators, and operate upon the data collection types best suited to the task at hand. This approach allows you to cover the widest assortment of specific data transformations uniformly. Sticking to it is essential for F# practitioners, as it curbs the development of lengthy custom solutions without compelling reasons and adds to the positive properties of F# programs, such as succinctness, correctness, and performance.

In this chapter, we will inspect:

  • How the normalization of data transformation libraries in F# 4.0 reflects the commonalities of the underlying transformation patterns. These commonalities are polymorphic in nature, being applicable to the various data collections that the libraries target.
  • How the transformation patterns outlined in Chapter 6, Sequences - The Core of Data Processing Patterns, reveal themselves over various data collections.

It will be a long trip, so please stay with me, cool and hydrated.

Core data transformation libraries in F# 4.0

One of the enhancements to the FSharp.Core runtime brought by F# 4.0 is the normalization of the data collection modules (https://blogs.msdn.microsoft.com/fsharpteam/2014/11/12/announcing-a-preview-of-f-4-0-and-the-visual-f-tools-in-vs-2015/). It is quite interesting that this development:

  • Confirms the commonality of data processing patterns across data processing platforms. Functions such as map or filter can be found in functional programming languages such as F#, query tools such as LINQ, and scripting engines such as PowerShell, to name a few.
  • Recognizes that concrete functions belonging to these patterns are polymorphic and may be uniformly applied across different data collection types. F# 4.0 successfully delivers this polymorphism over the most frequently used data collection types, namely through the Array, List, and Seq modules.
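
The uniformity described above can be illustrated with a small sketch. It only assumes the standard filter and map functions of the List, Array, and Seq modules; the helper names isEven and square and the sample data are made up for illustration.

```fsharp
// Illustrative helpers (hypothetical names, not from the library)
let isEven n = n % 2 = 0
let square n = n * n

// The same filter-then-map pipeline, spelled identically per collection type
let fromList  = [1 .. 6]         |> List.filter isEven  |> List.map square
let fromArray = [|1 .. 6|]       |> Array.filter isEven |> Array.map square
let fromSeq   = seq { 1 .. 6 }   |> Seq.filter isEven   |> Seq.map square

printfn "%A" fromList              // the list result
printfn "%A" fromArray             // the array result
printfn "%A" (Seq.toList fromSeq)  // the sequence is lazy; force it to print its items
```

Only the module prefix changes between the three pipelines; the shape of the transformation stays the same, which is the point of the normalization.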

Overall, this library normalization added 95 new function implementations, each optimized per collection type, to the F# 4.0 data crunching offering. This addition bumps the total number of individual functions in the three collection modules mentioned previously to 309 (as of April 2016), which is definitely a sizable result. However, it would be really challenging for an average developer to memorize and recall this arrangement by heart without recognizing some formative principles.

Considering that most of the functions apply uniformly to the three base collection types (some of them naturally do not apply to some concrete collections; for example, toList does not apply to List), this still leaves 117 (as of April 2016) different function names just for the base data collections. And do not forget about a certain number of functions related to less widely used data collections, such as Set, IDictionary, or Array2D. How should you approach this variety?
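
As a minimal sketch of the asymmetry mentioned above, the conversion functions exist wherever a conversion actually changes the collection type. The value names and sample data below are illustrative only.

```fsharp
// A small source sequence (sample data for illustration)
let source = seq { 1 .. 3 }

// toList and toArray exist on Seq because the target type differs from the source
let asList  = Seq.toList source      // int list
let asArray = Seq.toArray source     // int []

// Array.toList exists as well; there is no List.toList,
// since converting a list to a list would be a no-op
let backToList = Array.toList asArray

// A less widely used collection: Set removes duplicates and keeps items ordered
let asSet = Set.ofList [3; 1; 3; 2]

printfn "%A" asList
printfn "%A" asArray
printfn "%A" (Set.toList asSet)
```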

Fortunately, there are only a handful of data transformation patterns. Recognizing the underlying pattern usually imposes an order on the associated library functions, leaving just a dozen or so functions associated with each pattern. Groups of this size are much easier to recall.

In the rest of the chapter, we will examine these concealed patterns and their corresponding cohesive function groups. The idiomatic code examples provided are meant to help you retain, recognize, and reuse the patterns.
