Using our ice cream example from Figure 16.9, with sample values in Table 16.5, recall the values for Ice Cream Theme Name:

·         Cake - Cake Ice Cream Theme

·         Trad – Traditional Ice Cream Theme

·         Winter – Winter Ice Cream Theme.

By examining this sample data and then validating it through a business expert, we see that Ice Cream Theme Name contains more than just the name of the theme. It also contains a theme description. Therefore, to resolve repeating data elements, we need to break description out from name, as shown in Figure 16.14. This now puts our data model in 1NF.

Figure 16.14 Ice cream example in 1NF

Returning to our contact example, by examining the data in Table 16.3 and the data model in Figure 16.13, and by asking even more questions, we can identify those data elements that need to be broken up into more refined data elements.

We may find ourselves asking many questions just to determine if there are any multi-valued data elements we need to identify. We can have another question template, such as:

Does a [[insert data element name here]] contain more than one piece of business information?

So these are all valid questions:             

·         Does a Contact Id contain more than one piece of business information?

·         Does an Email Address contain more than one piece of business information?

·         Does a Person Name contain more than one piece of business information?

·         Does a Person Title contain more than one piece of business information?

·         Does a Company Name contain more than one piece of business information?

·         Does a Mailing Address contain more than one piece of business information?

·         Does a Logo Image contain more than one piece of business information?

·         Does a Web Address contain more than one piece of business information?

·         Does a Phone Number contain more than one piece of business information?

·         Does a Specialty Description contain more than one piece of business information?

Often a modeler will encounter multi-valued data elements that do not need to be separated into distinct data elements. This is common when the distinct data element parts are not of interest to the business or industry and the data element is considered to be in its atomic form, even though it contains more than one piece of information. For example, Phone Number contains an area code and may contain a country code. We might, therefore, decide to show country code and area code as separate data elements on the model. However, do your business people ever need to see the area code or do they view phone number in the most granular form? The answer to this question determines whether the modeler breaks apart the Phone Number data element.

The cost of breaking apart a multi-valued data element unnecessarily is the potential for confusion when explaining the data model, extra development effort to break apart the actual data, and then extra development effort to put the pieces back together for the business user. There is also a chance that in putting the pieces back together, some data may be lost or changed, and the newly combined values no longer match the original data, thereby causing a data quality issue. Looking at the values from Table 16.3, we see that Phone Number contains different formats, and one phone number is even missing an area code. Phone Number, therefore, would be an example of a data element that would require extra development effort to break apart.

Assume that in our contact example, Person Name and Mailing Address are the only two data elements that require being shown in a more granular form. The model in Figure 16.16 resolves these two multi-valued data elements and is therefore in 1NF.

Recall the summary of all three normalization levels: Every data element must provide a fact about the key, the whole key, and nothing but the key. First Normal Form (1NF) is the Every data element must provide a fact about the key part. Second Normal Form (2NF) is the whole key part. This means each entity must have the minimal set of data elements that uniquely identifies each entity instance.

For example, in Figure 16.15, we have Employee Assignment.

Figure 16.15 Example of model not in 2NF

 

Figure 16.16 Contact data model with multi-valued data elements resolved, and therefore in 1NF

The primary key of Employee Assignment is Employee Identifier and Department Identifier. Do we have the minimal primary key for each of the non-key data elements? We would need to confirm with business experts whether the Employee Identifier is unique within a department or unique across the whole company. Most likely it is unique across the whole company, so we therefore do not have the minimal primary key for each of the non-key data elements. Employee Last Name probably needs only Employee Identifier to retrieve one value. Similarly, both Department Name and Department Cost Center most likely need only Department Identifier to retrieve one of their values. Therefore, we need to modify this model to get it into 2NF, as shown in Figure 16.17.

Figure 16.17 One option for putting the model in 2NF

Now Department Name and Department Cost Center are facts about only Department Identifier, and Employee Last Name is a fact about only Employee Identifier. The associative entity Employee Assignment links employees with their departments.

Normalization is a process of asking business questions. In this example, we could not complete 2NF without asking the business Can an Employee work for more than one Department at the same time? If the answer is Yes or Sometimes, then the model in Figure 16.17 is accurate. If the answer is No, then the model in Figure 16.18 prevails.

Figure 16.18 Another option for putting the model in 2NF

This model assumes answers to two other business questions:

·         Can a Department exist without an Employee?

·         Can an Employee exist without a Department?

The answers from the business would be Yes, a Department can exist without an Employee, and No, an Employee cannot exist without a Department.

As with 1NF, we will find ourselves asking many questions to determine if we have the minimal primary key. We can have another question template, such as:

Are all of the data elements in the primary key needed to retrieve a single instance of [[insert data element name here]]? In the Contact example shown in Figure 16.15, the ‘minimal set of primary key instances’ are Contact Id and Email Address.

So these are all valid questions for our contact example:             

·         Are both Contact Id and Email Address needed to retrieve a single instance of Person First Name?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Person Last Name?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Person Title?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Company Name?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Web Address?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Address Line Text?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Address City Name?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Address State Province Code?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Address Postal Code?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Address Country Code?

·         Are both Contact Id and Email Address needed to retrieve a single instance of Logo Image?

We realize that the answer to all of these questions is No. We do not need both the Contact Id and Email Address in the primary key of Contact. Either one of these is enough to uniquely identify a contact and return a single instance of any of the data elements in Contact. Therefore, we only need one of these in the primary key. We have full control over the values of Contact Id, so let’s use it as the primary key, and make Email Address an alternate key. You must also consider the uniqueness of the Email Address – can an Email Address be shared by several contacts? If the answer is Yes, then Email Address cannot be a primary key, merely a non-unique candidate key.

The updated model is shown in Figure 16.19.

Figure 16.19 Contact model in 2NF

Asking the previous list of questions at this point might lead us to a more descriptive and refined model. For example, we realize there are people and company information in the Contact entity. We can therefore make our model more descriptive and capture more business rules if we introduce subtyping into this structure.

Subtyping captures many more rules, but before we can show them, we need to confirm them with the business by asking more questions. For example, here are the questions we would need to ask to arrive at the model in Figure 16.20:

·         Which data elements in Contact are shared by both a Person and Company?

·         Which data elements are just for a Person?

·         Which data elements are just for a Company?

·         Which relationships to Contact are shared by both a Person and a Company?

Note that we learn that the Logo Image is really just for a Company, and not for a Person.

Figure 16.20 Contact model in 2NF updated with subtyping

In a subtyping structure, the primary key from the supertype is also the primary key for each subtype. By default, the primary key attributes for the subtype have the same names as the primary key attributes for the supertype, as shown in Figure 16.20. The modeler often renames the subtype primary keys to make them more meaningful. Renaming any foreign key, including a subtype’s primary key, is called role naming. For example, we can role name Contact Id in Person to Person Id, and Contact Id in Company to Company Id. Person Id and Company Id still contain the same value as Contact Id, but renaming the subtype’s primary key facilitates better communication.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset