CHAPTER 7

What are keys?

More than one John Doe

Which is the right Customer?

Recall by the key

There is a lot of data out there, but how do you sift through it all to find what you’re looking for? That’s where keys come in. Keys allow us to efficiently retrieve data, as well as navigate from one physical table to another. This chapter defines keys and distinguishes between the terms candidate, primary, and alternate keys. Surrogate keys, foreign keys and their importance are also explained.

Data elements identify, describe, or measure the entity instances in which they reside. There is often a need to find specific entity instances using one or more data elements. Those data element(s) that allow us to find specific entity instances are known as keys. The Library of Congress assigns an ISBN (International Standard Book Number) to every book. When the ISBN for this book, 9780977140091, is entered into many search engines and database systems, the book entity instance Data Modeling Made Simple With PowerDesigner will be returned (try it!). A particular tax identifier can help us find an organization. The key Account Code can help us find a particular account.

A candidate key is one or more data elements that uniquely identify an entity instance. Sometimes a single data element identifies an entity instance, such as ISBN for a book, or Account Code for an account. Sometimes it takes more than one data element to uniquely identify an entity instance. For example, both a Promotion Type Code and Promotion Start Date are necessary to identify a promotion. When more than one data element makes up a key, we use the term ‘composite key’. Therefore, Promotion Type Code and Promotion Start Date together are a composite candidate key for a promotion.

A candidate key has three main characteristics:

·         Unique. There cannot be duplicate values in the data in a candidate key and it cannot be empty (also known as ‘nullable’). Therefore, the number of distinct values of a candidate key must be equal to the number of distinct entity instances. If the entity Book has ISBN as its candidate key, and if there are 500 book instances, there will also be 500 unique ISBNs.

·         Non-volatile. A candidate key value on an entity instance should never change. Since a candidate key is used to find a unique entity instance, you would be unable to find that instance if you were still trying to use the value before it was changed. Changing a candidate key would also mean changing it in every other entity in which it appears with the original value.

·         Minimal. A candidate key should contain only those data elements that are needed to uniquely identify an entity instance. If four data elements are listed as the composite candidate key for an entity, but only three are really needed for uniqueness, then only those three should make up the candidate key.

Figure 7.1 contains a data model before candidate keys have been identified.

Figure 7.1 Data model before candidate keys have been identified

·         Each Student may attend one or many Classes.

·         Each Class may contain one or many Students.

Note that we have a many-to-many relationship between Student and Class that was replaced by the entity Attendance and two one-to-many relationships (more on this in our normalization section). In reading a many-to-many relationship, I have found it helpful to ignore the entity in the middle (Attendance, in this example) and just read the labels between the entities on either side. For example, each Student may attend one or many Classes and each Class may contain one or many Students. Table 7.1 contains sample values for each of these entities.

Table 7.1 Sample values for Figure 7.1

Student

Student Number

Student First Name

Student Last Name

Student Date Of Birth

SM385932

Steve

Martin

1/25/1958

EM584926

Eddie

Murphy

3/15/1971

HW742615

Henry

Winkler

2/14/1984

MM481526

Mickey

Mouse

5/10/1982

DD857111

Donald

Duck

5/10/1982

MM573483

Minnie

Mouse

4/1/1986

LR731511

Lone

Ranger

10/21/1949

EM876253

Eddie

Murphy

7/1/1992

Attendance

Attendance Date

5/10/2009

6/10/2009

7/10/2009

 

Class

Class Full Name

Class Short Name

Class Description Text

Data Modeling Fundamentals

Data Modeling 101

An introductory class covering basic data modeling concepts and principles.

Advanced Data Modeling

Data Modeling 301

A fast-paced class covering techniques such as advanced normalization and ragged hierarchies.

Tennis Basics

Tennis One

For those new to the game of tennis, learn the key aspects of the game.

Juggling

 

Learn how to keep three balls in the air at once!

Based on our definition of a candidate key and a candidate key’s characteristics of being unique, non-volatile, and minimal, what would you choose as the candidate keys for each of these entities?

For Student, Student Number appears to be a valid candidate key. There are eight students and eight distinct values for Student Number. So unlike Student First Name and Student Last Name, which can contain duplicates like Eddie Murphy, Student Number appears to be unique. Student Date Of Birth can also contain duplicates such as ‘5/10/1982’, which is the Student Date Of Birth for both Mickey Mouse and Donald Duck. However, the combination of Student First Name, Student Last Name, and Student Date Of Birth may make a valid candidate key.

For Attendance, we are currently missing a candidate key. Although the Attendance Date is unique in our sample data, we will probably need to know which student attended which class on this particular date.

For Class, on first glance it appears that any of its data elements are unique, and would therefore qualify as a candidate key. However, Juggling does not have a Class Short Name. Therefore, because Class Short Name can be empty, we cannot consider it a candidate key. In addition, one of the characteristics of a candidate key is that it is non-volatile. I know, based on my teaching experience, that class descriptions can change. Therefore, Class Description Text also needs to be ruled out as a candidate key, leaving Class Full Name as the best option for a candidate key.

PowerDesigner reserves the term ‘Key’ for the Physical Data Model, and uses the term ‘Identifier’ in the Conceptual and Logical Data Models, hence the acronyms ‘pi’ (primary identifier), ‘fi’ (foreign identifier), and ‘ai’ (alternate identifier) in entity symbols. We’ll discuss what these mean shortly.

Even though an entity may contain more than one candidate key, we can only select one candidate key to be the primary key for an entity. A primary key is the candidate key that has been chosen to be the unique identifier for an entity. An alternate key is a candidate key that although unique, was not chosen as the primary key, but still can be used to find specific entity instances.

We have only one candidate key in the Class entity, so Class Full Name becomes our primary key. We have to make a choice in Student, however, because we have two candidate keys. Which Student candidate key would you choose as the primary key?

In selecting one candidate key over another as the primary key, consider succinctness and security. Succinctness means if there are several candidate keys, choose the one with the fewest data elements or shortest in length. In terms of security, it is possible that one or more data elements within a candidate key will contain sensitive data whose viewing should be restricted. We want to avoid having sensitive data in our entity’s primary key because the primary key can propagate as a foreign key and therefore spread this sensitive data throughout our database.

Considering succinctness and security in our example, I would choose Student Number over the composite Student First Name, Student Last Name, and Student Date Of Birth. It is more succinct and contains less sensitive data.

Figure 7.2 shows our data model updated with primary and alternate keys. In PowerDesigner, primary key attributes are indicated by appending <pi> to their name in the entity symbol. Columns in alternate keys have <ai> appended to their name. If the entity has more than one alternate key, a number is appended to <ai> to differentiate between them. For example, <ai1> and <ai2>.

Attendance now has as its primary key Student Number and Class Full Name, which appear to make a valid primary key.

Figure 7.2 Data model updated with primary and alternate keys

The primary key attributes in Attendance are both inherited from other entities; Attendance is dependent on and partially identified by the other entities. In PowerDesigner this is documented by marking the relationships as ‘dependent’; this changes the relationship notation to include a triangular symbol:

So to summarize, a candidate key consists of one or more data elements that uniquely identify an entity instance. The candidate key that is selected as the best way to identify each unique record in the entity becomes the primary key. The other candidate keys become alternate keys. Keys containing more than one data element are known as composite keys.

A surrogate key is a primary key that substitutes for a natural key, which is what the business sees as the unique identifier for an entity. It has no embedded intelligence and is used by IT (and not the business) for integration or performance reasons.

Surrogate keys are useful for integration, which is an effort to create a single, consistent version of the data. Applications such as data warehouses often house data from more than one application or system. Surrogate keys enable us to bring together information about the same entity instance that is identified differently in each source system. If the same concept, such as Student or Class, exists in more than one system, there is a good chance some amount of integration will be necessary. For example, Robert Jones in system XYZ and Renee Jane in system ABC might both be identified in their respective systems as RJ. But if we tried to bring them together using RJ to link them, the data would be incorrect – we’d have Robert and Renee identified as the same person. Instead, a different, non-overlapping, surrogate key could be assigned to each of them. Similarly, if Robert Jones is identified as RJ in system XYZ and BJ in system DEF, the information about him could be consolidated under a single surrogate key value. The fact that they’re the same would need to be determined through a separate effort.

Surrogate keys are also efficient. You’ve seen that a primary key may be composed of one or more attributes of the entity. A single surrogate key is more efficient to use than having to specify three or four (or five or six) attributes to locate the single record you’re looking for.

When using a surrogate key, always make an effort to determine the natural key, and then define an alternate key on this natural key. For example, assuming a surrogate key is a more efficient primary key than Class Full Name, we can create the surrogate key Class Id for Class and define an alternate key on Class Full Name, as shown in Figure 7.3. Table 7.2 contains the values in Class.

Figure 7.3 Data model updated with surrogate key

Table 7.2 Class values updated with surrogate key

Class Id

Class Full Name

Class Short Name

Class Description Text

1

Data Modeling Fundamentals

Data Modeling 101

An introductory class covering basic data modeling concepts and principles.

2

Advanced Data Modeling

Data Modeling 301

A fast-paced class covering techniques such as advanced normalization and ragged hierarchies.

3

Tennis Basics

Tennis One

For those new to the game of tennis, learn the key aspects of the game.

4

Juggling

 

Learn how to keep three balls in the air at once!

In this example, we are using Class Full Name as a candidate key, but in the real world, it is never a good idea to use a name as a key because names can change. It is often also difficult or impossible to make sure that names are unique.

A foreign key is a data element that provides a link to another entity. A foreign key allows a database management system to navigate from one entity to another. For example, we need to know who owns an Account, so we would want to include the identifier of the customer to whom it belongs in the entity. The Customer Id in Account is the primary key of that Customer in the Customer entity. Using this foreign key back to Customer enables the database management system to navigate from a particular account or accounts, to the customer or customers that own each account. Likewise, the database can navigate from a particular customer or customers, to find all of their accounts.

A foreign key is automatically created when we define a relationship between two entities. When a relationship is created between two entities, the entity on the many side of the relationship inherits the primary key from the entity on the one side of the relationship.

PowerDesigner Conceptual Data Models do not support Foreign Keys explicitly; Foreign Keys are not necessary when describing business concepts and the relationships between them. Indeed, Primary Keys are not mandatory in abstract Conceptual Data models.

Foreign Keys (Foreign Identifiers) are automatically included in any Logical or Physical Data Model you generate from a Conceptual Data Model.

PowerDesigner allows you to choose which attributes to migrate, and which identifier to migrate them from (if you have more than one, of course).

In Figure 7.3, there are two foreign keys in the Attendance entity. The Student Number foreign key points back to a particular student in the Student entity. The Class Id foreign key points back to a particular Class in the Class entity. Table 7.3 contains a few Attendance entity instances.

Table 7.3 Attendance entity instances

Student Number

Class Id

Attendance Date

SM385932

1

5/10/2009

EM584926

1

5/10/2009

EM584926

2

6/10/2009

MM481526

2

6/10/2009

MM573483

2

6/10/2009

LR731511

3

7/10/2009

By looking at these values and recalling the sample values from Tables 13.1 and 13.2, we learn that Steve Martin and Eddie Murphy both attended the Data Modeling Fundamentals class on 5/10/2009. Eddie Murphy also attended the Advanced Data Modeling Class with Mickey and Minnie Mouse on 6/10/2009. Lone Ranger took Tennis Basics (by himself, as usual) on 7/10/2009.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset