Filtering patterns

Also known as transformation patterns, filtering patterns find a subset of data, whether it be small, like a top 10 listing, or large, like the results of a deduplication.

Four patterns are presented in this chapter: filtering, Bloom filtering, top ten, and distinct.

As the most basic pattern, filtering serves as an abstract pattern for some of the other patterns. Filtering simply evaluates each record separately and decides, based on some condition, whether it should stay or go. Filter out records that are not of interest and keep ones that are. Consider an evaluation function f that takes a record and returns a Boolean value of true or false. If this function returns true, keep the record; otherwise, toss it out.
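The evaluation function f described above can be sketched in a few lines. This is a minimal plain-Python illustration, not Hadoop code: the names filter_records, is_error, and the sample log lines are hypothetical and chosen only to show the keep-or-toss decision being made one record at a time.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def filter_records(records: Iterable[T], f: Callable[[T], bool]) -> Iterator[T]:
    """Yield only the records for which the evaluation function f returns True."""
    for record in records:
        # Each record is judged on its own; no other record is consulted.
        if f(record):
            yield record


# Hypothetical sample data and condition, used only for illustration.
log_lines = [
    "INFO service started",
    "ERROR disk full",
    "INFO request served",
    "ERROR timeout",
]


def is_error(line: str) -> bool:
    """Evaluation function f: keep a line only if it is an error entry."""
    return line.startswith("ERROR")


errors = list(filter_records(log_lines, is_error))
print(errors)  # ['ERROR disk full', 'ERROR timeout']
```

Because the decision for one record never depends on any other record, logic of this shape fits naturally in a mapper and generally needs no reduce step.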

The SingleMapper job seen earlier is a good example of a filtering pattern.

Depending on the use case, a transformation pattern can be customized to generate the intended output.
