Third Normal Form (3NF)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

PowerDesigner allows you to rename ordinary foreign key attributes, but it does not allow you to rename inherited supertype attributes. In Figure 16.20, therefore, the subtype entities are both identified by the ‘Contact Id’ attribute, though we may prefer to rename them to Person Id and Company Id.

Figure 16.21 shows a simple PowerDesigner model with renamed foreign key attributes, and illustrates the benefits and a potential pitfall of this approach.

Figure 16.21 Telephone Call Model with Attribute Role Names

An important relationship that was assumed in Figure 16.1 is missing in Figure 16.20; people work for Companies. By having the data elements together in the same entity in Figure 16.19, we assumed this was the case. Can a Person work for more than one Company? Can a Company contain more than one Person? These questions need to be answered by the business.

Let’s assume the answers to these questions are that a Person can only work for one Company, but doesn’t have to work for a Company at all, and that a Company can employ many People, but doesn’t have to employ any People. The updated model is shown in Figure 16.22. Note that the foreign key attribute in the Person entity has been renamed to ‘Employing Company Id’, to reflect the relationship role.

Figure 16.22 Contact model in 2NF updated with Person Company relationship

Third Normal Form (3NF)

Again, recall our summary of all three normalization levels: Every data element must provide a fact about the key, the whole key, and nothing but the key. First Normal Form (1NF) is the “Every data element must provide a fact about the key” part. Second Normal Form (2NF) is “the whole key” part, and Third Normal Form (3NF) is the “nothing but the key” part.

3NF requires the removal of hidden dependencies. Each data element must be directly dependent on the primary key, and not directly dependent on any other data elements within the same entity.

The data model is a communication tool. The relational logical data model communicates which data elements are facts about the primary key and only the primary key. Hidden dependencies complicate the model and make it difficult to determine how to retrieve values for each data element.

To resolve a hidden dependency, you will either need to remove the data element that is a fact about non-primary key data element(s) from the model, or you will need to create a new entity with a different primary key for the data element that is dependent on the non-primary key data element(s).

As with 1NF and 2NF, we will find ourselves asking many questions to uncover hidden dependencies. We can have another question template, such as:

“Is” [[insert data element name here]] “a fact about any other data element in this same entity?”

So these are all valid questions for the Contact entity within our contact example:

· Is Web Address a fact about any other data element within this same entity?

· Is Address Line Text a fact about any other data element within this same entity?

· Is Address City Name a fact about any other data element within this same entity?

· Is Address State Province Code a fact about any other data element within this same entity?

· Is Address Postal Code a fact about any other data element within this same entity?

· Is Address Country Code a fact about any other data element within this same entity?

We learn from asking these questions that there are some hidden dependencies within the address data elements. If we know the postal code, we can determine the city, state province, and country. Therefore, by moving city, state province, and country to a different entity, as shown in Figure 16.23 our model is now in 3NF.

Figure 16.23 Contact model in 3NF

You may be wondering why we did not also move country and state province to their own entities. After all, if we know the city, we should know the state province, and if we know the state province, we should know the country. The reason we did not break these apart is because we do not have enough information to guarantee we can find the single country for a given state province or the state province for a given city. Take Halifax for example – there are towns and cities called Halifax in England and Canada, and in North Carolina and Virginia, in the United States.

Note that we needed to ask even more questions to determine the right cardinality on the relationship line between City and Contact:

· Does a Contact have to have a City?

· Does a City have to contain at least one Contact?

We learned from the business that no, a contact does not have to have a city. We also learned that there could be cities in which no contacts reside. Therefore, the answer to the second question is “No”.

You will find that the more you normalize, the more you go from applying rules sequentially to applying them in parallel. For example, instead of first applying 1NF to your model everywhere, and then when you are done applying 2NF, and so on, you will find yourself looking to apply all levels at once. This can be done by looking at each entity and making sure the primary key is correct and that it contains a minimal set of data elements, and that all data elements are facts about only the primary key.

Abstraction Explained

Normalization is a mandatory technique on the relational logical data model. Abstraction is an optional technique. As mentioned earlier, abstraction brings flexibility to your data models by redefining and combining some of the data elements, entities, and relationships within the model into more generic terms.

Returning to our contact example, we may decide to abstract Email Address and the phone number structure into a more generic Communication Medium structure. Communication Medium could include any method of communicating with a contact, including email and phone numbers, as well as all future types of communication mediums such as text messaging, or anything else that comes along. This more abstract Contact data model, based on the model from Figure 16.23, is shown in Figure 16.24. The new entity symbols have a bolder line style than the others.

Table 16.6 contains some of the values in the Communication Medium structure.

Figure 16.24 Contact data model with Communication Medium

Table 16.6 Business card values in Communication Medium structure

Contact

Contact Id	Web Address	Address Line Text	Postal Code
123	www.stevehoberman.com	10 Main St	10021
54	findsonline.com
58	findsonline.com
42
14		58 Church Avenue	08901

Communication Medium Type

Communication Medium Type Code	Communication Medium Type Description
01	Telephone
02	Fax Number
03	Email Address
04	Text Messaging

Table 16.6 Business card values in Communication Medium structure (continued)

Communication Medium

Communication Medium Value	Communication Medium Type Code
212-555-1212	01
(973)555-1212	01
732-555-1212	01
908-333-1212	01
908-555-1212	02
554-1212	01
[email protected]	03
[email protected]	03
[email protected]	03
[email protected]	03
[email protected]	03

Contact Communication Medium

Contact Id	Communication Medium Value
123	212-555-1212
54	973-555-1212
58	973-555-1212
42	732-555-1212
14	908-333-1212
14	908-555-1212
14	554-1212
123	[email protected]
54	[email protected]
58	[email protected]
42	[email protected]
14	[email protected]

Notice the extra flexibility we gain with abstraction. A new email address for findsonline.com, for example, leads to a new entity instance in Communication Medium and Contact Communication Medium, instead of a new data element on a model, with subsequent database and application changes. Abstraction allows for greater flexibility, but does come with a price. Actually, three high prices:

· Loss of communication. The concepts we abstract are no longer represented explicitly on the model. That is, when we abstract, we often convert column names to entity instances. For example, Email Address is no longer a data element on the data model in Figure 16.24, but is, instead, an entity instance of Communication Medium, with a Communication Medium Type Code value of ‘03’ for ‘Email Address’. One of the main reasons we model is for communication, and abstracting can definitely hinder communication.

· Loss of business rules. When we abstract, we can also lose business rules. To be more specific, the rules we enforced on the data model before abstraction now need to be enforced through other means, such as through programming code. If we wanted to enforce that a Contact must have one and only one Email Address, for example, we can no longer enforce this rule through the abstracted data model in Figure 16.24. In PowerDesigner, we can document the rule as a Business Rule, and the Business Rule can be linked to the abstract entity and/or attributes, whichever makes the most sense.

· Additional development complexity. Abstracting requires sophisticated development techniques to turn columns into rows, when loading an abstract structure, or to turn rows back into columns, when populating a structure from an abstract source. Imagine the work to populate the data elements in Communication Medium from a source data element called Email that only contains the email address. It would be much easier for a developer to load a data element called Email into a data element called Email Address. The code would be simple and it would be very fast to load.

So, although abstraction provides flexibility to an application, it does come with a cost. It makes the most sense to use abstraction when the modeler or analyst anticipates additional types of something coming in the near future. For example, additional types of communication mediums, such as text messaging, would be a good justification for using the Communication Medium structure in Figure 16.24. Make sure you only abstract where it makes sense.

Abstract data models must be subject to the same rigorous change control as any other data model. For example, code values such as Communication Medium Type Code must be managed in the Logical Data Model, not just in the Physical Data Model. They are not ‘just data’ that can be added or removed without affecting the underlying model.

Consider using an ASAM to show the non-abstract view of the model. Use this model to agree on exactly what the data requirements are for a proposed change, and then consider the changes to the abstract Logical Data Model.

For a comprehensive review of four possible levels of abstraction for data models, see (Len Silverston, 2009). Chapter 1 introduces the four levels of abstraction, and further chapters describe how they can be applied in different scenarios.

Dimensional Modeling FAQ

In a previous section of this chapter, we discussed the dimensional logical data model. There are quite a few terms and guidelines in dimensional modeling that differ from relational modeling. This section covers these terms and guidelines in the form of Frequently Asked Questions (FAQ).

Recall the dimensional model from Figure 16.3, repeated below as Figure 16.25, as we go through each of these questions.

Figure 16.25 Dimensional logical data model of ice cream

Should you always model the lowest level of detail available in the business? The lowest level of detail for the meter is known as the grain. The grain in Figure 16.25, for example, is Ice Cream Container and Date. Do you design exactly for the questions the business needs answered, or do you add a more detailed grain in anticipation of more detailed questions with the same measures? My belief is that it is OK to provide the richer grain as long as it does not impact performance (e.g. Will it take a lot longer to return results at a date level instead of at a month level?), scope-creep (e.g. Will the project take another month to finish because of the extra grain?), data quality (e.g. Will the business be responsible for checking the quality of additional data?), or user-friendliness (Is the dimensional model no longer easy to understand?).

Can you relate dimensions to each other? Not on a dimensional logical data model. All relationships from dimensions must go through the meter. That is, you can never have relationships between different dimensions. In the dimensional model from Figure 16.25, you cannot show, for example, the rule capturing which Ice Cream Containers are available in a given Year. That is, we cannot create a relationship between Ice Cream Container and Year.

Can you have optional cardinality on the relationship lines connecting dimension levels? No, don’t leave navigation paths empty. It is a good practice to avoid null (i.e. empty) foreign keys on a dimensional model, because nulls can be interpreted differently by different databases and reporting tools (and users!). So to avoid unnecessary confusion and data discrepancies, make sure every hierarchy level is populated. If a specific Product does not roll up to a specific Product Line, for example, using a default value instead of leaving the foreign key in Product empty will minimize reporting errors and misunderstandings.

Can you combine relational and dimensional modeling techniques on the same logical model? No, not on the same logical model. A logical model must be relational OR dimensional; it cannot have properties of both. When you get to the physical model, one solution would be to combine them within the same schema. There are pros and cons, though (like everything in the physical). The pros include meeting multiple needs within the same structure, while cons include creating reporting complexity, such as loops. (A loop is when there is more than one navigation path to arrive at the meter.)

What is a conformed dimension? A conformed dimension is one that is shared across the business intelligence environment. Customer, Account, Employee, Product, Time, and Geography are examples of conformed dimensions. The term was made popular by Ralph Kimball and requires the modeler to design the conformed dimension with a much broader perspective than just the requirements for a single data mart. Conformed dimensions allow the navigator to ask questions that cross multiple marts. For example, a user could navigate from a Promotions data mart directly to a Sales data mart to track the impact of certain promotions. Most dimensions need to be conforming. In fact, it is rare to come across dimensions that will only ever be relevant for a single data mart.

What is a factless fact? The meter on a dimensional logical data model, is called a fact table on a dimensional physical data model. A fact table that does not contain any facts (i.e. measures) is called a factless fact (good name, huh?). Factless facts count events by summing relationship occurrences between the dimensions. For example, a fact table called Attendance contains no measures. It links to the Student, Course, and Semester dimensions with the goal of counting relationship occurrences of how many students take a particular course in a particular semester.

Logical Data Models in PowerDesigner

If you have worked through the earlier chapters, you have already tried all the PowerDesigner techniques you need to create a Logical Data Model:

Chapter 10	General ‘how-to’ knowledge
Chapter 11	Creating Entities
Chapter 12	Creating Data Elements
Chapter 13	Creating Relationships
Chapter 14	Creating Keys
Chapter 15	Creating Subject Area Models

Always use a PowerDesigner LDM to create a Logical Data Model.

LDM Settings

Before you add content to your model, we suggest that you set the options listed in Table 16.7. Remember to click on <Set As Default> within the model options and display preferences.

Table 16.7 LDM settings

Category	Settings
Model Options – Model Settings	Choose the notation required by your modeling standards. Enable links to requirements (if you intend to create a Requirements model).
Model Options – Migration Settings	Migrate Attribute Properties – select all options Foreign Attribute Name – set to “%PARENT% %ATTRIBUTE” (that’s just one space in the middle)
Model Options – Naming Convention	Enable Glossary for auto completion and compliance checking (if you use a Glossary).
Display Preferences – General Settings	Enable the following: Show Bridges at Intersections Auto Link Routing Snap to Grid Enable Word Wrapping If you have a preference for the color of lines and symbol fill, or for the use of symbol shadows, set them here.
Display Preferences – Entity	Hide all properties, except the list of attributes. Show the Stereotype property if you use it. Add the Description property to the list of available properties.
Display Preferences – Relationship	Hide all properties, except the name or the role names, depending on your approach to naming relationships (see below). Set the line style to ‘rounded right-angles’.
Display Preferences – Inheritance	Hide all properties.
Display Preferences – Inheritance Link	Set the line style to ‘rounded right-angles’.
Display Preferences – Traceability Link	Set the line style to ‘rounded right-angles’.

Approaches for Creating a LDM

You can use a variety of techniques to create a LDM in PowerDesigner. Which approach is appropriate for you will depend on the models and information you have available.

· Create from scratch

· Generate a LDM from a SAM (CDM) or another LDM – see Exercise 19

· Generate a LDM from a PDM

· Copy or replicate entities, etc. from other LDMs, including reference models held in the Library

· Use the Excel Import facility to create or update entities in your LDM. This is a great way of making effective use of existing documentation.

If you want to create a LDM to describe existing XML-based data structures, you will need a slightly more complex approach, using an intermediate model. In the steps listed below, we suggest you use a CDM as the intermediate model, but you could also use a PDM or Object-oriented class model if that’s useful to you:

Import the XML Schema into a new XML Model (XSM)
Generate a CDM from the XSM
Generate a LDM from the CDM.

To create new LDM entities based on LDM entities in PowerDesigner models, copy them from the existing model, and paste them into the new model. To create LDM entities based on existing CDM entities, open the CDM, and choose Generate Logical Data Model from the Tools menu.

Presenting the LDM

In order to achieve the potential benefits of a LDM, it must communicate concepts clearly. Look at the figures in this chapter and in Chapter 15, and you will see that we have taken more than one approach regarding the content of symbols – the content will need to suit the audience. As this is a monochrome book, we have not been able to use color as part of the message, but you can use color very effectively in a LDM, and even continue the color scheme to other related models.

Be prepared to create more than one diagram in your LDM, if you need to communicate your message to different audiences. For layout and formatting tips, refer to Chapter 4 in Data Modeling for the Business: A Handbook for Aligning the Business with IT using High-Level Data Models, ISBN 9780977140077.

Linking the LDM to Other Data Models

Linking a LDM to other related models is essential for impact analysis. The most important models to link to are:

1. Subject Area Models

o See Chapter 15 for a discussion of how to create and maintain these links

2. Reference models that shortcut or replica entities originate in

o Links are automatically maintained by PowerDesigner when you create shortcuts or replicas

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Third Normal Form (3NF)

Create new playlist

Sign In

Sign Up

Third Normal Form (3NF)

Abstraction Explained

Dimensional Modeling FAQ

Logical Data Models in PowerDesigner

LDM Settings

Approaches for Creating a LDM

Presenting the LDM

Linking the LDM to Other Data Models

Table of Contents for
Third Normal Form (3NF)