WHAT DENORMALIZATION ISN’T (II)

So far in this chapter I’ve given what I think is a reasonable definition of what denormalization is, and I’ve given some examples of what it isn’t. However, perhaps it was simply a mistake on my part to think the term ought to be used in any kind of precise or logical sense. Certainly it’s used very imprecisely in the industry at large; in fact, it seems to be used—especially in a data warehouse context—to refer to just about anything that could be regarded as bad design practice. What’s more, the practices in question are often explicitly recommended! (by the people who talk that way, that is). Examples of such bad practice include:

  • Using repeating groups

  • Permitting duplicate rows

  • Using nulls; worse (?), allowing nulls in keys

  • Mixing different kinds of information in the same column (using a separate “flag” column to specify the type of individual values in the column in question)

  • Using a single text column to represent what ought logically to be distinct columns

I’d like to add a note here on star schemas, since the “star schema” concept and “denormalization” are often mentioned together.[80] The basic idea behind this concept as follows. Suppose we wish to collect a history of business transactions for analysis purposes; for example, suppose in the case of suppliers and parts that we wish to record, for each shipment, the particular time interval in which that shipment occurred. Thus, we might identify time intervals by a time interval identifier (TINO), and introduce another relvar TI to relate those identifiers to the corresponding time intervals per se. The revised shipments relvar SP and the new time intervals relvar TI might look as shown in Figure 8-1. In star schema terminology, SP is the fact table and TI is a dimension table. The suppliers relvar S and the parts relvar P are also dimension tables (see Figure 8-2).[81] And the overall structure is referred to as a “star schema” because of a fancied resemblance of the corresponding entity/relationship diagram to a star, with the fact table being surrounded by—and connected by “spokes” or “rays” to—the dimension tables, as shown in Figure 8-2. (Those “spokes” or “rays” represent foreign key references, of course.)

Sample fact table (SP) and dimension table (TI)

Figure 8-1. Sample fact table (SP) and dimension table (TI)

Star schema for suppliers and parts (with time intervals)

Figure 8-2. Star schema for suppliers and parts (with time intervals)

Now, you might be wondering what the difference is between a star schema and a conventional relational design. In fact, a star schema for a simple example like the one under discussion is likely to be identical to a good relational design. In more complex situations, however, the dimension tables are often less than fully normalized (the objective here apparently being to avoid joins).[82] What's more, other relational design recommendations are often violated, too (see the bullet list earlier in this section).

Detailed discussion of star schemas and related matters is beyond the scope of this book; you can find a slightly more extended discussion in my book An Introduction to Database Systems (8th edition, Addison-Wesley, 2004).



[80] At least one authority claims it’s misleading to refer to star schemas as denormalized, however. “[The] use of denormalized when describing a star implies that the design started out as normalized. Most designs are not produced in such a manner. Not normalized would be a better description” (from Star Schema: The Complete Reference, by Chris Adamson, McGraw-Hill, 2010).

[81] For simplicity I choose to ignore (just for the sake of thye present discussion) the fact that the FD {CITY} → {STATUS} is supposed to hold in relvar S, and hence that relvar S is less than fully normalized.

[82] In this connection, consider this advice from a book on data warehouses: “[Resist] normalization ... Efforts to normalize any of the tables in a dimensional database solely in order to save disk space [sic!] are a waste of time ... The dimension tables must not be normalized ... Normalized dimension tables destroy the ability to browse” (from Ralph Kimball, The Data Warehouse Toolkit, John Wiley & Sons, 1996).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset