Chapter 9

Non-Data-Warehouse Antipatterns

Most applications rely on their database structure to help enforce data quality. Data warehouse applications are an exception—a data warehouse emphasizes the reading of data and makes data quality the responsibility of loading programs. The antipatterns in this chapter simplify reading but compromise the ability of database structure to enforce quality. Hence, these antipatterns are acceptable for data warehouses, but you should avoid them otherwise.

9.1 Derived Data Antipattern

9.1.1 Observation

A model has elements (entity types, relationships, attributes) that are not fundamental. These elements can be computed from other elements and lack substance in their own right. Derived data can complicate development and increases the likelihood of data corruption.

9.1.2 Exceptions

Consider using derived data for critical application elements or to resolve performance bottlenecks. For example, it is faster to store the current security positions for a broker account, than to compute them by processing past transactions. Derived data is often used for data warehouses to speed performance and ease the writing of queries.

9.1.3 Resolution

When possible, rework the model to eliminate derived elements. Otherwise carefully document derived data and pay special attention to checking for inconsistencies in your test plan. Sometimes it is helpful to use a generic mechanism to keep derived data consistent with base data—for example, it is entirely acceptable to have derived data that is computed via a database view.

9.1.4 Examples

In Figure 9.1 it is much better to include a person's birthdate rather than age which changes over time. The original model is also undesirable for a data warehouse.

Figure 9.1

Figure showing derived data: UML person model. Try to avoid derived data and focus on fundamental data.

Derived data: UML person model. Try to avoid derived data and focus on fundamental data.

Figure 9.2 and Figure 9.3 show a partial model for a library. The slash prefix in Figure 9.2 is UML notation for derived data. The dueDate can be computed from checkoutDate and checkoutPeriod. The calculatedFine is computed according to the formula. Figure 9.4 shows SQL Server code for the calculations.

Figure 9.2

Figure showing derived data: UML library model. If you do use derived data, then carefully document it.

Derived data: UML library model. If you do use derived data, then carefully document it.

Figure 9.3

Figure showing derived data: IDEF1X library model.

Derived data: IDEF1X library model.

Figure 9.4

Figure showing derived data: Sample SQL code.

Derived data: Sample SQL code.

Figure 9.5 shows a proper use of derived data for a fact table of a simple data warehouse. The gross profit is the retail price less the wholesale cost.

Figure 9.5

Figure showing derived data: Sale fact in a data warehouse. Derived data is often acceptable for data warehouses.

Derived data: Sale fact in a data warehouse. Derived data is often acceptable for data warehouses.

9.2 Parallel Attributes Antipattern

9.2.1 Observation

An entity type has groups of similar attributes. Such a model can be brittle, verbose, and awkward to extend.

9.2.2 Exceptions

Parallel attributes are often used for data warehouses to ease data retrieval. Data warehouses forego the enforcement of data quality, leaving that responsibility to the feeder applications and scripts that load data.

9.2.3 Resolution

Abstract and factor out commonality.

9.2.4 Examples

Figure 9.6a is taken from a legacy application. The repetition is obvious. An organization has different kinds of products for which both sales and profit are recorded. Figure 9.6b avoids the repetition and readily extends to new products and financial metrics. The original model is also undesirable for a data warehouse.

Figure 9.6

Figure showing parallel attributes: UML financial model. Abstract and factor out commonality.

Parallel attributes: UML financial model. Abstract and factor out commonality.

Dimensions and facts of data warehouses often have parallel attributes. For example, saleQuantity and salePrice in Figure 6.6 are parallel attributes. Figure 9.7 shows a Customer dimension with parallel attributes for address and phone number.

Figure 9.7

Figure showing parallel attributes: Customer dimension. Parallel attributes are often acceptable for data warehouses.

Parallel attributes: Customer dimension. Parallel attributes are often acceptable for data warehouses.

9.3 Parallel Relationships Antipattern

9.3.1 Observation

Two entity types have several (at least three) similar relationships.

9.3.2 Exceptions

As with parallel attributes, parallel relationships can be acceptable for a data warehouse in order to simplify queries for retrieving data.

9.3.3 Resolution

Abstract and factor out commonality.

9.3.4 Example

In Figure 9.8, a person can participate in making a movie in various ways. The left model shows five movie roles. The right model permits any number of roles.

Figure 9.8

Figure showing parallel relationships: UML movie model. Abstract and factor out commonality.

Parallel relationships: UML movie model. Abstract and factor out commonality.

Figure 9.9 shows another representation for a Customer dimension. The various addresses can be noted via parallel relationships. The use of parallel relationships could be preferred to Figure 9.7 if there were extensive Address fields to record.

Figure 9.9

Figure showing parallel relationships: Customer dimension. Parallel relationships are often acceptable for data warehouses.

Parallel relationships: Customer dimension. Parallel relationships are often acceptable for data warehouses.

9.4 Combined Entity Types Antipattern

9.4.1 Observation

An entity type has disparate attributes and lacks cohesion.

9.4.2 Exceptions

Giant tables are reasonable for input/output staging. It is also acceptable to combine entity types for the dimension tables of data warehouses.

9.4.3 Resolution

Make each concept its own entity type.

9.4.4 Example

In Figure 9.10 the contact position and contact phone depend on the contact name which in turn depends on the account. In the left model, several account records could have the same contact name with inconsistent positions and phone numbers. Note that the left model violates third normal form.

Figure 9.10

Figure showing combined entity types: UML customer account model. Make each concept its own entity type.

Combined entity types: UML customer account model. Make each concept its own entity type.

As a counterexample, it can be appropriate to combine entity types for a data warehouse. Specifically, such a combination can flatten a dimension. Many dimensions, when logically modeled, consist of several related entities. Data warehouses often collapse such snowflakes into a single dimension entity. For example, Figure 9.11 collapses the Product dimension of Figure 6.6 into a single entity.

Figure 9.11

Figure showing combined entity types: Collapsing a snowflaked dimension. It is acceptable to combine entity types for a dimension of a data warehouse.

Combined entity types: Collapsing a snowflaked dimension. It is acceptable to combine entity types for a dimension of a data warehouse.

9.5 Chapter Summary

An antipattern is a characterization of a common software flaw. The antipatterns in Table 9.1 simplify reading but compromise the ability of database structure to enforce quality. These antipatterns are often acceptable for data warehouses, but you should avoid them otherwise.

Table 9.1

Summary of Non-Data-Warehouse Antipatterns

Antipattern name

Observation

Exceptions

Resolution

Frequency

Derived data

A model has elements that are not fundamental.

OK for critical elements, bottlenecks, and data warehouses.

Rework the model to eliminate derived data.

Common

Parallel attributes

An entity type has groups of similar attributes.

Often used for data warehouses.

Abstract and factor out commonality.

Occasional

Parallel relationships

Two entity types have several similar relationships.

Can be acceptable for a data warehouse.

Abstract and factor out commonality.

Occasional

Combined entity types

An entity type has disparate attributes.

OK for I/O staging and data warehouses.

Make each concept its own entity type.

Occasional

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset