PowerDesigner allows you to rename ordinary foreign key attributes, but it does not allow you to rename inherited supertype attributes. In Figure 16.20, therefore, the subtype entities are both identified by the ‘Contact Id’ attribute, though we may prefer to rename them to Person Id and Company Id.

Figure 16.21 shows a simple PowerDesigner model with renamed foreign key attributes, and illustrates the benefits and a potential pitfall of this approach.

Figure 16.21 Telephone Call Model with Attribute Role Names

An important relationship that was assumed in Figure 16.1 is missing in Figure 16.20; people work for Companies. By having the data elements together in the same entity in Figure 16.19, we assumed this was the case. Can a Person work for more than one Company? Can a Company contain more than one Person? These questions need to be answered by the business.

Let’s assume the answers to these questions are that a Person can only work for one Company, but doesn’t have to work for a Company at all, and that a Company can employ many People, but doesn’t have to employ any People. The updated model is shown in Figure 16.22. Note that the foreign key attribute in the Person entity has been renamed to ‘Employing Company Id’, to reflect the relationship role.

Figure 16.22 Contact model in 2NF updated with Person Company relationship

Again, recall our summary of all three normalization levels: Every data element must provide a fact about the key, the whole key, and nothing but the key. First Normal Form (1NF) is the Every data element must provide a fact about the key part. Second Normal Form (2NF) is the whole key part, and Third Normal Form (3NF) is the nothing but the key part.

3NF requires the removal of hidden dependencies. Each data element must be directly dependent on the primary key, and not directly dependent on any other data elements within the same entity.

The data model is a communication tool. The relational logical data model communicates which data elements are facts about the primary key and only the primary key. Hidden dependencies complicate the model and make it difficult to determine how to retrieve values for each data element.

To resolve a hidden dependency, you will either need to remove the data element that is a fact about non-primary key data element(s) from the model, or you will need to create a new entity with a different primary key for the data element that is dependent on the non-primary key data element(s).

As with 1NF and 2NF, we will find ourselves asking many questions to uncover hidden dependencies. We can have another question template, such as:

Is [[insert data element name here]] a fact about any other data element in this same entity?

So these are all valid questions for the Contact entity within our contact example:             

·         Is Web Address a fact about any other data element within this same entity?

·         Is Address Line Text a fact about any other data element within this same entity?

·         Is Address City Name a fact about any other data element within this same entity?

·         Is Address State Province Code a fact about any other data element within this same entity?

·         Is Address Postal Code a fact about any other data element within this same entity?

·         Is Address Country Code a fact about any other data element within this same entity?

We learn from asking these questions that there are some hidden dependencies within the address data elements. If we know the postal code, we can determine the city, state province, and country. Therefore, by moving city, state province, and country to a different entity, as shown in Figure 16.23 our model is now in 3NF.

Figure 16.23 Contact model in 3NF

You may be wondering why we did not also move country and state province to their own entities. After all, if we know the city, we should know the state province, and if we know the state province, we should know the country. The reason we did not break these apart is because we do not have enough information to guarantee we can find the single country for a given state province or the state province for a given city. Take Halifax for example – there are towns and cities called Halifax in England and Canada, and in North Carolina and Virginia, in the United States.

Note that we needed to ask even more questions to determine the right cardinality on the relationship line between City and Contact:

·         Does a Contact have to have a City?

·         Does a City have to contain at least one Contact?

We learned from the business that no, a contact does not have to have a city. We also learned that there could be cities in which no contacts reside. Therefore, the answer to the second question is No.

You will find that the more you normalize, the more you go from applying rules sequentially to applying them in parallel. For example, instead of first applying 1NF to your model everywhere, and then when you are done applying 2NF, and so on, you will find yourself looking to apply all levels at once. This can be done by looking at each entity and making sure the primary key is correct and that it contains a minimal set of data elements, and that all data elements are facts about only the primary key.

Normalization is a mandatory technique on the relational logical data model. Abstraction is an optional technique. As mentioned earlier, abstraction brings flexibility to your data models by redefining and combining some of the data elements, entities, and relationships within the model into more generic terms.

Returning to our contact example, we may decide to abstract Email Address and the phone number structure into a more generic Communication Medium structure. Communication Medium could include any method of communicating with a contact, including email and phone numbers, as well as all future types of communication mediums such as text messaging, or anything else that comes along. This more abstract Contact data model, based on the model from Figure 16.23, is shown in Figure 16.24. The new entity symbols have a bolder line style than the others.

Table 16.6 contains some of the values in the Communication Medium structure.

Figure 16.24 Contact data model with Communication Medium

 

Table 16.6 Business card values in Communication Medium structure

Contact

Contact Id

Web Address

Address Line Text

Postal Code

123

www.stevehoberman.com

10 Main St

10021

54

findsonline.com

 

 

58

findsonline.com

 

 

42

 

 

 

14

 

58 Church Avenue

08901

 

Communication Medium Type

Communication Medium Type Code

Communication Medium Type Description

01

Telephone

02

Fax Number

03

Email Address

04

Text Messaging

 

Table 16.6 Business card values in Communication Medium structure (continued)

Communication Medium

Communication Medium Value

Communication Medium Type Code

212-555-1212

01

(973)555-1212

01

732-555-1212

01

908-333-1212

01

908-555-1212

02

554-1212

01

[email protected]

03

[email protected]

03

[email protected]

03

[email protected]

03

[email protected]

03

 

Contact Communication Medium             

Contact Id

Communication Medium Value

123

212-555-1212

54

973-555-1212

58

973-555-1212

42

732-555-1212

14

908-333-1212

14

908-555-1212

14

554-1212

123

[email protected]

54

[email protected]

58

[email protected]

42

[email protected]

14

[email protected]

 

Notice the extra flexibility we gain with abstraction. A new email address for findsonline.com, for example, leads to a new entity instance in Communication Medium and Contact Communication Medium, instead of a new data element on a model, with subsequent database and application changes. Abstraction allows for greater flexibility, but does come with a price. Actually, three high prices:

·         Loss of communication. The concepts we abstract are no longer represented explicitly on the model. That is, when we abstract, we often convert column names to entity instances. For example, Email Address is no longer a data element on the data model in Figure 16.24, but is, instead, an entity instance of Communication Medium, with a Communication Medium Type Code value of ‘03’ for ‘Email Address’. One of the main reasons we model is for communication, and abstracting can definitely hinder communication.

·         Loss of business rules. When we abstract, we can also lose business rules. To be more specific, the rules we enforced on the data model before abstraction now need to be enforced through other means, such as through programming code. If we wanted to enforce that a Contact must have one and only one Email Address, for example, we can no longer enforce this rule through the abstracted data model in Figure 16.24. In PowerDesigner, we can document the rule as a Business Rule, and the Business Rule can be linked to the abstract entity and/or attributes, whichever makes the most sense.

·         Additional development complexity. Abstracting requires sophisticated development techniques to turn columns into rows, when loading an abstract structure, or to turn rows back into columns, when populating a structure from an abstract source. Imagine the work to populate the data elements in Communication Medium from a source data element called Email that only contains the email address. It would be much easier for a developer to load a data element called Email into a data element called Email Address. The code would be simple and it would be very fast to load.

So, although abstraction provides flexibility to an application, it does come with a cost. It makes the most sense to use abstraction when the modeler or analyst anticipates additional types of something coming in the near future. For example, additional types of communication mediums, such as text messaging, would be a good justification for using the Communication Medium structure in Figure 16.24. Make sure you only abstract where it makes sense.

Abstract data models must be subject to the same rigorous change control as any other data model. For example, code values such as Communication Medium Type Code must be managed in the Logical Data Model, not just in the Physical Data Model. They are not ‘just data’ that can be added or removed without affecting the underlying model.

Consider using an ASAM to show the non-abstract view of the model. Use this model to agree on exactly what the data requirements are for a proposed change, and then consider the changes to the abstract Logical Data Model.

For a comprehensive review of four possible levels of abstraction for data models, see (Len Silverston, 2009). Chapter 1 introduces the four levels of abstraction, and further chapters describe how they can be applied in different scenarios.

In a previous section of this chapter, we discussed the dimensional logical data model. There are quite a few terms and guidelines in dimensional modeling that differ from relational modeling. This section covers these terms and guidelines in the form of Frequently Asked Questions (FAQ).

Recall the dimensional model from Figure 16.3, repeated below as Figure 16.25, as we go through each of these questions.

Figure 16.25 Dimensional logical data model of ice cream

Should you always model the lowest level of detail available in the business? The lowest level of detail for the meter is known as the grain. The grain in Figure 16.25, for example, is Ice Cream Container and Date. Do you design exactly for the questions the business needs answered, or do you add a more detailed grain in anticipation of more detailed questions with the same measures? My belief is that it is OK to provide the richer grain as long as it does not impact performance (e.g. Will it take a lot longer to return results at a date level instead of at a month level?), scope-creep (e.g. Will the project take another month to finish because of the extra grain?), data quality (e.g. Will the business be responsible for checking the quality of additional data?), or user-friendliness (Is the dimensional model no longer easy to understand?).

Can you relate dimensions to each other? Not on a dimensional logical data model. All relationships from dimensions must go through the meter. That is, you can never have relationships between different dimensions. In the dimensional model from Figure 16.25, you cannot show, for example, the rule capturing which Ice Cream Containers are available in a given Year. That is, we cannot create a relationship between Ice Cream Container and Year.

Can you have optional cardinality on the relationship lines connecting dimension levels? No, don’t leave navigation paths empty. It is a good practice to avoid null (i.e. empty) foreign keys on a dimensional model, because nulls can be interpreted differently by different databases and reporting tools (and users!). So to avoid unnecessary confusion and data discrepancies, make sure every hierarchy level is populated. If a specific Product does not roll up to a specific Product Line, for example, using a default value instead of leaving the foreign key in Product empty will minimize reporting errors and misunderstandings.

Can you combine relational and dimensional modeling techniques on the same logical model? No, not on the same logical model. A logical model must be relational OR dimensional; it cannot have properties of both. When you get to the physical model, one solution would be to combine them within the same schema. There are pros and cons, though (like everything in the physical). The pros include meeting multiple needs within the same structure, while cons include creating reporting complexity, such as loops. (A loop is when there is more than one navigation path to arrive at the meter.)

What is a conformed dimension? A conformed dimension is one that is shared across the business intelligence environment. Customer, Account, Employee, Product, Time, and Geography are examples of conformed dimensions. The term was made popular by Ralph Kimball and requires the modeler to design the conformed dimension with a much broader perspective than just the requirements for a single data mart. Conformed dimensions allow the navigator to ask questions that cross multiple marts. For example, a user could navigate from a Promotions data mart directly to a Sales data mart to track the impact of certain promotions. Most dimensions need to be conforming. In fact, it is rare to come across dimensions that will only ever be relevant for a single data mart.

What is a factless fact? The meter on a dimensional logical data model, is called a fact table on a dimensional physical data model. A fact table that does not contain any facts (i.e. measures) is called a factless fact (good name, huh?). Factless facts count events by summing relationship occurrences between the dimensions. For example, a fact table called Attendance contains no measures. It links to the Student, Course, and Semester dimensions with the goal of counting relationship occurrences of how many students take a particular course in a particular semester.

If you have worked through the earlier chapters, you have already tried all the PowerDesigner techniques you need to create a Logical Data Model:

Chapter 10

General ‘how-to’ knowledge

Chapter 11

Creating Entities

Chapter 12

Creating Data Elements

Chapter 13

Creating Relationships

Chapter 14

Creating Keys

Chapter 15

Creating Subject Area Models

Always use a PowerDesigner LDM to create a Logical Data Model.

Before you add content to your model, we suggest that you set the options listed in Table 16.7. Remember to click on <Set As Default> within the model options and display preferences.

Table 16.7 LDM settings

Category

Settings

Model Options –
Model Settings

Choose the notation required by your modeling standards.

Enable links to requirements (if you intend to create a Requirements model).

Model Options –
Migration Settings

Migrate Attribute Properties

– select all options

Foreign Attribute Name

– set to %PARENT% %ATTRIBUTE (that’s just one space in the middle)

Model Options –
Naming Convention

Enable Glossary for auto completion and compliance checking (if you use a Glossary).

Display Preferences – General Settings

Enable the following:

Show Bridges at Intersections

Auto Link Routing

Snap to Grid

Enable Word Wrapping

If you have a preference for the color of lines and symbol fill, or for the use of symbol shadows, set them here.

Display Preferences – Entity

Hide all properties, except the list of attributes. Show the Stereotype property if you use it.

Add the Description property to the list of available properties.

Display Preferences – Relationship

Hide all properties, except the name or the role names, depending on your approach to naming relationships (see below).

Set the line style to ‘rounded right-angles’.

Display Preferences – Inheritance

Hide all properties.

Display Preferences – Inheritance Link

Set the line style to ‘rounded right-angles’.

Display Preferences – Traceability Link

Set the line style to ‘rounded right-angles’.

You can use a variety of techniques to create a LDM in PowerDesigner. Which approach is appropriate for you will depend on the models and information you have available.

·         Create from scratch

·         Generate a LDM from a SAM (CDM) or another LDM – see Exercise 19

·         Generate a LDM from a PDM

·         Copy or replicate entities, etc. from other LDMs, including reference models held in the Library

·         Use the Excel Import facility to create or update entities in your LDM. This is a great way of making effective use of existing documentation.

If you want to create a LDM to describe existing XML-based data structures, you will need a slightly more complex approach, using an intermediate model. In the steps listed below, we suggest you use a CDM as the intermediate model, but you could also use a PDM or Object-oriented class model if that’s useful to you:

  1. Import the XML Schema into a new XML Model (XSM)
  2. Generate a CDM from the XSM
  3. Generate a LDM from the CDM.

To create new LDM entities based on LDM entities in PowerDesigner models, copy them from the existing model, and paste them into the new model. To create LDM entities based on existing CDM entities, open the CDM, and choose Generate Logical Data Model from the Tools menu.

In order to achieve the potential benefits of a LDM, it must communicate concepts clearly. Look at the figures in this chapter and in Chapter 15, and you will see that we have taken more than one approach regarding the content of symbols – the content will need to suit the audience. As this is a monochrome book, we have not been able to use color as part of the message, but you can use color very effectively in a LDM, and even continue the color scheme to other related models.

Be prepared to create more than one diagram in your LDM, if you need to communicate your message to different audiences. For layout and formatting tips, refer to Chapter 4 in Data Modeling for the Business: A Handbook for Aligning the Business with IT using High-Level Data Models, ISBN 9780977140077.

Linking a LDM to other related models is essential for impact analysis. The most important models to link to are:

1.      Subject Area Models

o        See Chapter 15 for a discussion of how to create and maintain these links

2.      Reference models that shortcut or replica entities originate in

o        Links are automatically maintained by PowerDesigner when you create shortcuts or replicas

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset