There are five denormalization techniques:

·         Standard

·         FUBES

·         Repeating groups

·         Repeating data elements

·         Summarization.

We will apply each of these five techniques to the Contact logical data model from the previous chapter, repeated here in Figure 17.1.

Figure 17.1 Contact logical data model

The standard method is the most common of the five denormalization techniques. The parent entity in the relationship disappears, and all of the parent’s data elements and relationships are moved down to the child entity. You’ll recall that the child entity is on the many side of the relationship and contains a foreign key back to the parent entity, which appears on the one side of the relationship.

Although the standard technique is traditionally applied to a one-to-many relationship, we can illustrate the standard technique using the subtyping structure from Figure 17.1. The subtyping structure must be resolved on the physical data model, as there is no subtyping relationship defined in a relational database. There are three ways of resolving the subtyping symbol on the physical data model: identity, rolling down, and rolling up.

Identity is the closest to subtyping, itself, because the subtyping symbol is replaced with a one-to-one relationship for each supertype/subtype combination. The main advantage of identity is that all of the business rules at the supertype level and at the subtype level remain the same as in the logical model. That is, we can continue to enforce relationships at the supertype or subtype levels, as well as enforce that certain data elements be required at the supertype or subtype levels. The main disadvantage of identity is that it can take more time to retrieve data, as it requires navigating multiple tables to access both the supertype and subtype information. Identity is not a form of denormalization, but it is shown in Figure 17.2 for completeness.

Figure 17.2 Identity method of resolving subtyping

In this example, we can continue to enforce certain rules at the supertype level, such as a Contact can have only one City and must have one Email Address. We can also enforce rules at the subtype level, such as a Person can work for one Company, and Person Last Name is a required data element that must always contain a value.

  In Figure 17.2, note the difference in the notation for dependent references (relationships). I’m sure you remember that, in the Logical Data Model, the triangle on the relationship indicates dependency, as shown below:

In the Physical Data Model, there is no notation to represent dependency. Here is the equivalent reference:

  In Figure 17.2, the table and column names include spaces, but this is not a software bug, nor is it a problem when generating the database. Remember that PowerDesigner draws a distinction between the Name and the Code of an object, and allows you to determine how Codes are derived from Names, or vice versa. It is the Code of an object, not the name, that is generated in the database. The columns in the Contact Phone table below illustrate default code names, which are suitable for use in a relational database.

You can choose to display object codes in symbols, rather than their names.

Another standard denormalization method is to eliminate table joins by merging tables. If you merge a parent table into a child table, it’s referred to as rolling down. Merging child tables into a parent table is referred to as rolling up.

  The ‘Collapse Tables’ feature allows you to roll up or roll down tables.

In our example model, rolling down is when the supertype is ‘rolled down’ into each subtype, moving all of the data elements and relationships from the supertype into each subtype, then removing the supertype from the data model. Rolling down can produce a more user-friendly structure than identity or rolling up, because subtypes are often more concrete concepts than supertypes, making it easier for the users of the data model to relate to the subtypes. For example, users of our Contact model would probably feel more comfortable with the concepts of Person and Company, rather than the concept of Contact. However, we are repeating relationships and data elements, which could reduce any user-friendliness gained from removing the supertype. In addition, the rolling down technique enforces only those rules present in the subtypes. This could lead to a less flexible data model, as we can no longer easily accommodate new subtypes without modifying the data model. See Figure 17.3 for what rolling down would look like in our Contact example.

Figure 17.3 Rolling down method of resolving subtyping

 

In the rolling down technique, we no longer have the Contact concept, and everything at the supertype level has been copied down into each subtype. Notice that although we have the user friendly concepts of Person and Company, we also have the redundancy of extra relationships to Phone, Specialty, and City, as well as repeating Email Address, Web Address, and address information for each subtype.

In Figure 17.3, two of the table names include the word ‘Collapsed’ – these tables were created by PowerDesigner during the de-normalization process. For the time being, I have retained the original Contact, Person, and Company tables in the model, though they don’t appear in the diagram. The Rolling Down technique can be driven from the LDM, by unchecking the ‘Generate parent’ option in the inheritance property sheet, and selecting ‘Generate children’.

Rolling up is when each subtype is ‘rolled up’ into the supertype, moving the data elements and relationships from each subtype into the supertype. The subtypes disappear and all data elements and relationships only exist at the supertype level. Rolling up adds flexibility to the data model because new types of the supertype can be added, often with no model changes. However, rolling up can also produce a more obscure model, as the audience for the model may not relate to the supertype as well as they would to the subtypes. In addition, we can only enforce business rules at the supertype level, not the subtype level. See Figure 17.4 for rolling up in our Contact example.

When we roll up, we need a way to distinguish the original subtypes from each other. Therefore, we add a data element that distinguishes people from companies, in this case Contact Type Code. Two of the values of Contact Type Code would represent ‘Person’ and ‘Company’.

Using rolling up, we still retain the business rules for Contact, yet lose the rules that were only enforced for Person or Company. For example, we cannot make Company Name a mandatory data element in Contact because a Person does not have a Company Name and may not be assigned to a Company.

The Rolling Up technique can be driven from the LDM, by unchecking the ‘Generate children’ option in the inheritance property sheet, and selecting ‘Generate parent’.  This causes all the attributes from sub-types to be migrated to the super-type entity.

In addition to choosing denormalization because of the need for faster retrieval time or for more user friendly structures, the standard way of denormalizing can be chosen in the following situations:

·         When you need to maintain the flexibility of the normalized model. Folding the data elements and relationships together using the standard approach still allows one-to-one and one-to-many relationships to exist. In Figure 17.4 for example, we did not lose the flexibility that a Contact can be a Person or Company (it is just harder to see and enforce).

·         When you want to reduce development time and complexity. Often there is a direct relationship between the number of tables and relationships on a model, and the amount of effort it will take to develop the application. A developer will need to write code that jumps from one table to another to collect certain data elements, and this can take time and add complexity. Denormalizing into fewer tables using the standard method means the data elements and relationships from different entities now exist in the same entity. In Figure 17.4, for example, if the developer needs to retrieve both the person and company name, they can easily do so from the same entity, Contact.

Use the ‘Table Collapsing’ feature to apply standard denormalization

Figure 17.4 Rolling up method of resolving subtyping

FUBES (Fold Up But Easily Separate) is an acronym I made up for a technique that uses the standard method of denormalizing while also allowing access to just the data elements from the parent side of a one-to-many relationship. There is an additional data element that contains a level code and additional instances for each of the parents. In Figure 17.5, for example, we have a subset of the logical data model for Calendar.

Figure 17.5 Subset of the Calendar logical data model

Table 17.2 contains sample values for each of these entities.

Table 17.2 Sample values for Figure 17.5

Year

Year Code

Year Description Text

2007

Two Thousand Seven

2008

Two Thousand Eight

2009

Two Thousand Nine

 

Quarter

Quarter Code

Year Code

Quarter Description Text

Q12009

2009

First Quarter Two Thousand Nine

Q22009

2009

Second Quarter Two Thousand Nine

Q32009

2009

Third Quarter Two Thousand Nine

 

Month

Month Code

Quarter Code

Month Description Text

Jan2009

Q12009

January Two Thousand Nine

Feb2009

Q12009

February Two Thousand Nine

Mar2009

Q12009

March Two Thousand Nine

 

If we decide to denormalize these three entities using the FUBES option, we would have the one entity in Figure 17.6.

Figure 17.6 FUBES being applied to Calendar

Table 17.3 contains sample values for this table.

You may be wondering what value FUBES provides. After all, we are adding a substantial amount of redundancy when using FUBES: we have all of the redundancy of the standard option, plus we are repeating each parent for the child, and repeating all of the parents for that parent. That is, repeating each quarter in our example, plus repeating the year for each quarter.

FUBES should be chosen when there is value in denormalizing, but there is still a need to access parent instances. Having an instance for each parent allows us to achieve better report performance, as we can directly tie to parent levels without having to roll up from the child. We can store sales at a year level, for example, and save the time of summarizing monthly level sales up into a year level. The value is that results can be retrieved extremely fast if we are doing reporting at the parent levels, in this example at the quarter and year level. For example, see Figure 17.7.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset