Chapter 16

Relational Database Design

This chapter summarizes my approach to database design. I start with a UML model of conceptual and logical intent and use that as the basis for preparing an IDEF1X model. Given an IDEF1X model, modern tools can readily generate SQL code to create the database structure. As a matter of software engineering discipline, I recommend a uniform policy for database design and deviate from the recommendations that follow only for good cause (such as demanding performance).

16.1 Mapping: Entity Types

As Figure 16.1 shows, each entity type normally maps to a table. Each attribute becomes a column in the table. There may be additional columns for existence-based identity, buried relationships, and generalization discriminators (to be explained).

Figure 16.1

Figure showing entity type: Mapping. Map each entity type to a table.

Entity type: Mapping. Map each entity type to a table.

Two alternative mappings are sometimes useful.

  • Fragmentation. For distributed databases it may be helpful to split an entity type into vertical or horizontal fragments. A vertical split maps attributes to columns in different tables. A horizontal split apportions records across multiple tables with identical columns.
  • Elided table. If an entity type has no attributes other than a primary key, it may not be mapped to a table and simply omitted. However, such an optimization seldom boosts performance. I discourage this option because it leads to an irregular schema. I prefer a strict correspondence between UML and IDEF1X models as a matter of software engineering rigor.

16.2 Mapping: Non-Qualified Relationships

Relationship mappings mostly depend on the multiplicity.

  • Many-to-many relationship. Map a many-to-many relationship to a distinct table (Figure 16.2). The primary key of the relationship combines the primary keys of the entity types. Role names become part of foreign key attribute names.

  • Simple one-to-many relationship. Bury a foreign key in the “many” table (Figure 16.3). Role names become part of foreign key attribute names.

  • Simple optional-to-exactly-one relationship. Bury a foreign key in the “optional” table (Figure 16.4). Role names become part of foreign key attribute names. The foreign key is unique in the buried table.

  • Other simple one-to-one relationship. Bury a foreign key in either table.
  • Relationship with attributes. Regardless of the multiplicity, map the relationship to a distinct table. Add relationship attributes to the distinct table.
  • Aggregation and composition. Follow the same mappings as the underlying relationship.
  • Ordered relationship. Use the same mapping as without ordering. Add a sequence number attribute and define a uniqueness constraint on the source entity type plus the sequence number (Figure 16.5).

  • Ternary and n-ary relationships. These seldom occur and many database design tools do not support them. Where possible, restate them as binary relationships. Otherwise promote them to entity types and define the appropriate uniqueness constraints.

Figure 16.2

Figure showing many-to-many relationship: Mapping. Map a many-to-many relationship to a distinct table.

Many-to-many relationship: Mapping. Map a many-to-many relationship to a distinct table.

Figure 16.3

Figure showing simple one-to-many relationship: Mapping. Bury a foreign key in the “many” table.

Simple one-to-many relationship: Mapping. Bury a foreign key in the “many” table.

Figure 16.4

Figure showing simple optional-to-exactly-one relationship: Mapping. Bury a foreign key in the “optional” table.

Simple optional-to-exactly-one relationship: Mapping. Bury a foreign key in the “optional” table.

Figure 16.5

Figure showing ordered relationship: Mapping. Use the same mapping as without ordering. The source entity type plus a sequence number is unique.

Ordered relationship: Mapping. Use the same mapping as without ordering. The source entity type plus a sequence number is unique.

Sometimes it is desirable to disregard the recommended mappings and promote simple one-to-one and one-to-many relationships to distinct tables. For example, Chapter 14 had a separate table for the relationship between Entity and Icon even though it was one-to-many (as well as for the relationships between Model-Diagram, Port-Tab, and Connection-Line). It would have been correct to bury the relationships, but from an architectural point of view I wanted to isolate dependencies between subject areas.

16.3 Mapping Qualified Relationships

There are four possible situations for qualified relationships.

  • One-to-optional after qualification. This is the most common situation (Figure 16.6). The underlying relationship is one-to-many without the qualifier and you should bury the source entity type key and the qualifier in the “many” table. The combination of the source entity type plus the qualifier is unique.

  • Optional-to-optional after qualification. This situation is also common (Figure 16.7). Bury the source entity type key and the qualifier in the table that is “many” without qualification. In the example, one record (the root of the tree has a null parentID) keeps the source entity type plus the qualifier from being unique. (As an alternative, you can promote the relationship to a table and then the qualified relationship has a unique key.)

  • Many-to-optional after qualification. These qualified relationships seldom occur (Figure 16.8). Promote the relationship to a table with a primary key of the source entity type plus the qualifier. Note that the combination of the related entity types (assemblyID and componentID in Figure 16.8) need not be unique; the same component part may be used multiple times for the same assembly. For example, a wiper may be used for the left windshield, right windshield, and rear window.

  • Optional-to-many after qualification. These qualified relationships also seldom occur (Figure 16.9). Bury the source entity type key and the qualifier in the “many” table. In the example the “ordered” constraint yields an alternate key — the combination of the source entity type plus the qualifier plus a sequence number is unique. If “ordered” was omitted from the model, there would still be a buried source entity type key plus the qualifier, but no alternate key constraint.

Figure 16.6

Figure showing qualified relationship, one-to-optional: Mapping. Bury fields in the “many” table. The source entity type plus the qualifier is unique.

Qualified relationship, one-to-optional: Mapping. Bury fields in the “many” table. The source entity type plus the qualifier is unique.

Figure 16.7

Figure showing qualified relationship, optional-to-optional: Mapping. Bury fields in the “many” table. The source entity type plus the qualifier is not unique.

Qualified relationship, optional-to-optional: Mapping. Bury fields in the “many” table. The source entity type plus the qualifier is not unique.

Figure 16.8

Figure showing qualified relationship, many-to-optional: Mapping. Promote the relationship to a table with a primary key of the source entity type plus the qualifier.

Qualified relationship, many-to-optional: Mapping. Promote the relationship to a table with a primary key of the source entity type plus the qualifier.

Figure 16.9

Figure showing qualified relationship, optional-to-many: Mapping. Bury fields in the “many” table. The source entity type plus the qualifier alone is not unique.

Qualified relationship, optional-to-many: Mapping. Bury fields in the “many” table. The source entity type plus the qualifier alone is not unique.

16.4 Mapping: Generalizations

Generalization organizes entity types by their similarities and differences. This book only considers single inheritance—a generalization for which a subtype has one supertype. (Chapter 8 recommends that data models avoid multiple inheritance—a generalization for which a subtype has multiple supertypes.)

As Figure 16.10 shows, I recommend that you map the supertype and each subtype to a table. The tables share common primary key values. The discriminator indicates the appropriate subtype table for each supertype record. Views can be helpful for consolidating the data of an entity across generalization levels.

Figure 16.10

Figure showing generalization (single inheritance): Mapping: Map the supertype and each subtype to a table.

Generalization (single inheritance): Mapping: Map the supertype and each subtype to a table.

Relational databases only partially support generalization. Relational databases cannot express the generalization partition, that each supertype record is further described by exactly one subtype. Also referential integrity is a one-way mechanism and cannot enforce the two-way correspondence between records of a supertype and a subtype. (Section 16.6 elaborates.) Applications must compensate with additional programming.

Several alternative mappings are sometimes helpful for generalization.

  • Elimination. You can optimize away tables that have no attributes other than a primary key. This reduces the number of tables but provides an irregular implementation.
  • Push supertype attributes down. You can eliminate the supertype table and duplicate supertype attributes for each subtype. This has the advantage of describing each entity in one table. However, it causes redundancy of structure and you may need to search multiple subtype tables to find an entity.
  • Push subtype attributes up. As a third alternative, you can eliminate the subtype tables and store all subtype attributes in the supertype table. Each record populates the relevant columns. This describes each entity in one table, but violates third normal form.
  • Generalization table. Use separate tables for the supertype and the subtypes and implement the generalization itself as a table. This can be helpful if you are merging two databases, one of which has the supertype and the other which has subtypes.

16.5 Design Identity

Chapter 11 discussed intrinsic identity — the ability to start from outside a database and find data with fields that have application meaning. In contrast, design identity is the ability to find data within a database. Databases implement design identity with primary keys. A primary key is a candidate key that is chosen for internal (foreign key) references. A table normally has a primary key, and has at most one primary key.

There are several approaches to implementing primary keys. Of these approaches I favor existence-based identity and existence-based identity + lookups.

  • Existence-based identity. An artificial number (called an ID) is the primary key of each entity table (Figure 16.11). The major DBMS products can generate such numbers; examples include Oracle sequences and the SQL Server identity property. With existence-based identity all primary keys are single attribute, small, and uniform in size. One of the few downsides is that the IDs can complicate debugging. (When debugging you have to dereference IDs to see meaningful fields.)

  • Existence-based identity + lookups. Another option (Figure 16.12) is to use a mnemonic abbreviation for lookup tables, and existence-based identity for everything else. “Lookup” data means metadata such as AccountType that is often implemented as a pick list in a user interface. Lookup tables have a small number of records (usually no more than tens of records) that are seldom updated.

  • Value-based identity. A combination of application attributes identify each entity. In Figure 16.13 names identify Region, Bank, and AccountType. Account is identified by bankName combined with accountNum. An advantage of value-based identity is that primary keys have intrinsic meaning.

    However primary key values can be difficult to change — a change to a primary key may require propagation to many foreign keys. Some entities lack natural real-world identifiers. Also primary keys can become lengthy, from the foreign keys of a series of related tables as Figure 16.14 illustrates. Figure 16.15 shows an awkward situation for Transaction. With value-based identity and referential integrity, the two paths lead to two copies of bankID and accountNumber. With value-based identity developers must abandon referential integrity to have a single copy of the fields.

  • Hybrid identity. A schema can combine existence-based identity with value-based identity. In Figure 16.16 Region, Bank, and AccountType have artificial identity and Account has identity derived from a bank reference combined with an account number.

  • Propagated identity. Identity can also propagate from one entity type to another. In Figure 16.17 the primary key of Exercise is also the primary key for Answer. (A book can have many exercises; an exercise may have an answer.) This is seldom a good idea as there is a “leakage” of information from one entity type to the other.

Figure 16.11

Figure showing existence-based identity (recommended): Example. A system-generated field is the primary key of the table for each entity.

Existence-based identity (recommended): Example. A system-generated field is the primary key of the table for each entity.

Figure 16.12

Figure showing existence-based + lookups (recommended): Example. Another option is to use a mnemonic abbreviation for lookups and existence-based identity for everything else.

Existence-based + lookups (recommended): Example. Another option is to use a mnemonic abbreviation for lookups and existence-based identity for everything else.

Figure 16.13

Figure showing value-based identity (discouraged): Example. A combination of application attributes identify each entity.

Value-based identity (discouraged): Example. A combination of application attributes identify each entity.

Figure 16.14

Figure showing value-based identity: Flaw. A sequence of foreign keys can lead to lengthy primary keys.

Value-based identity: Flaw. A sequence of foreign keys can lead to lengthy primary keys.

Figure 16.15

Figure showing value-based identity: Flaw. The two paths lead to two copies of bankID and accountNumber for Transaction. You must abandon referential integrity to have a single copy.

Value-based identity: Flaw. The two paths lead to two copies of bankID and accountNumber for Transaction. You must abandon referential integrity to have a single copy.

Figure 16.16

Figure showing hybrid identity (discouraged): Example. A schema can combine existence-based identity with value-based identity.

Hybrid identity (discouraged): Example. A schema can combine existence-based identity with value-based identity.

Figure 16.17

Figure showing propagated identity (discouraged): Example. Identity can also propagate from one entity type to another.

Propagated identity (discouraged): Example. Identity can also propagate from one entity type to another.

Table 16.1 summarizes the approaches to design identity and their trade-offs.

Table 16.1

Summary of Approaches to Design Identity

Identity approach

Definition

Advantages

Drawbacks

Recommendation

Existence-based

An artificial number is the primary key of each entity.

  • PKs are single attribute, small, and uniform in size.
  • Handles entities without real-world identifiers.
  • IDs can complicate debugging.

A good choice

Existence-based + lookups

A mnemonic for lookups and existence-based for everything else.

  • PKs are single attribute, small, and uniform in size.
  • Handles entities without real-world identifiers.
  • Simpler debugging than with existence-based identity alone.

A good choice

Value-based

A combination of application attributes identifies each entity.

  • PKs have intrinsic meaning.
  • Propagated PK values are difficult to change.
  • Some entities lack real-world identifiers.
  • Can have multiple paths for the same FK.

Seldom a good idea

Hybrid

Schema combines existence-based and value-based identity.

  • Irregular approach that lacks rigor.

A poor option

Propagated

PK of an entity comes from a relationship.

  • Awkward dependency between entities.

A poor option

Note: I recommend the use of existence-based identity or existence-based + lookups.

16.6 Referential Integrity

Once the IDEF1X model is in place, you should define referential integrity actions to enforce the model’s meaning. Referential integrity is a database mechanism that ensures that references to data really exist and that there are no dangling foreign keys. Normally you should let a DBMS enforce referential integrity constraints rather than write custom application code. If you use existence-based identity, there are no primary key updates to propagate and referential integrity actions are only needed for deletions.

Neither the UML model nor the IDEF1X model specify referential integrity precisely, so you will have to rely on your application understanding. I recommend the following guidelines for deletions.

  • Generalization. Cascade deletions for foreign keys that arise from generalization. For example, in Figure 16.18 OwnedAsset is a subtype of Asset and ownedAssetID references assetID. Upon deletion, an Asset record and an OwnedAsset record should both be deleted. Given separate supertype and subtype tables, there should be a on delete cascade clause for each subtype reference to the supertype.

    A relational DBMS can propagate deletion downward from the supertype to the subtypes. However, a relational DBMS cannot propagate deletion upward from a subtype toward the supertype. For example, the deletion of an Asset record could cause the deletion of the corresponding OwnedAsset. Then the deletion of an OwnedAsset could cause deletion of its RentedAssets. But deletion of a RentedAsset record cannot propagate upward to Asset. Consequently, it is necessary to first delete the Asset records that are RentedAssets and then delete the Asset record for the OwnedAsset.

    Some organizations forbid cascaded deletions as a matter of policy (fearful of inadvertent errors). In that case there will have to be additional programming code to propagate a deletion from a supertype record to descendant subtype records.

  • Buried relationship, minimum multiplicity of zero. There are two choices — either set the foreign key to null or forbid deletion. The choice depends on a model’s meaning. For example in Figure 10.23, upon deletion of a Facility the references in the corresponding Locations should be set to null. In contrast, it is appropriate to forbid the deletion of a Document with child Documents via DocumentFlow (Figure 10.15).
  • Buried relationship, minimum multiplicity of one. Forbid the deletion or cascade the effect of a deletion. For example, in Figure 10.1 it should not be possible to delete an AccountType that is referenced by Accounts. In Figure 10.5 the deletion of an Address should cascade to deletion of its AddressRoles (AddressRoles are clearly secondary to Address).
  • Relationship table. Cascade deletions to the records of a relationship table or forbid the deletions. For example, in Figure 10.5 deletion of an Actor could lead to deletion of the AddressRole_Actor records (thinking that the relationship records are incidental to an Actor). It would be reasonable to forbid deletion of an AddressRole with dependent AddressRole_Actor records (to avoid accidentally deleting important Actor data).

Figure 16.18

Figure showing referential integrity for generalization. A relational DBMS cannot propagate deletion upward from the subtype toward the supertype.

Referential integrity for generalization. A relational DBMS cannot propagate deletion upward from the subtype toward the supertype.

16.7 Miscellaneous Database Constraints

SQL has powerful constraint mechanisms that are part of the language. As much as possible, it is desirable to place declarative constraints in the database rather than write imperative constraints via programming code. The not null clause enforces that a column of a table must have a value. The previous section discussed referential integrity which ensures that there are no dangling referents. Unique indexes (next section) can enforce candidate keys. In addition SQL has triggers and general constraints.

16.7.1 SQL Triggers

A trigger performs a database command upon the occurrence of a specified event and satisfaction of a condition. [Ehnasri-2006] Although it is a dangerous practice, triggers can be used to enforce database constraints. The concern is that careless use of triggers can lead to explosions of database activity — one trigger fires, causing other triggers to fire, leading to an extensive cascade. One trigger in isolation is straightforward to understand. However a database with numerous triggers can be inscrutable.

It is especially important not to use SQL triggers to implement referential integrity. This was done with some of the old DBMS products of the past. Modern SQL has declarative referential integrity that is well understood and efficient — you should use it. Triggers are several orders of magnitude slower for executing referential integrity and should not be used for that purpose.

A proper use of triggers is for propagating data — to update related applications, to synchronize distributed databases, or to feed data warehouses. Triggers can also be helpful for keeping derived data consistent with its underlying base data.

16.7.2 General SQL Constraints

SQL also supports general constraints with the check constraint. Models imply some of these constraints. Others are details that are lacking from the model and rely on your application understanding.

The purpose of the generalization discriminator is to indicate which subtype record elaborates each supertype record. Accordingly a discriminator must be an enumeration with one value for each of the subtypes. For example, in Figure 16.18 assetDiscrim is an enumeration with two values: RentedAsset and OwnedAsset. With SQL assetDiscrim would be stored as a string that is not null. A check constraint could enforce that the string value was in the list {‘RentedAsset’, ‘OwnedAsset’}.

SQL check constraints are also useful for enforcing domains. A SQL table has many columns each of which has a domain. A domain specifies a datatype, constraints on the data, and semantic meaning of the data. Thus the domain for UPC codes may store data as a string of digits with a specified length and have a rule to verify the check digit at the end. (See Chapter 11 for a discussion of UPC codes.) As another example, in Figure 16.18 a RentedAsset’s endTime must be greater than its startTime.

SQL check constraints can also enforce enumerations. Enumerations often arise and should be enforced by the database rather than application code. The following are enumerations: actualOrEstimate (Figure 10.7), grade (Figure 10.11), format (Figure 10.15), priority (Figure 10.37), and outcome (Figure 10.37).

16.8 Indexes

Indexes serve two purposes: enforcing uniqueness for primary and candidate keys as well as enabling fast database traversal. Most relational DBMSs create indexes as a side effect of declaring primary keys and candidate keys. I recommend that you also create an index for each foreign key that is not subsumed by a primary key or candidate key. These foreign key indexes are important because they enable the fast performance that users expect when they traverse a model. Joins often occur across relationships and across the levels of generalization hierarchies. Joins are orders of magnitude more efficient if foreign keys and primary keys have indexes.

You should incorporate foreign key indexes in your initial database design because they are straightforward to include and there is no good reason to defer them. The database administrator (DBA) may define additional indexes to fine-tune performance. The DBA may also use DBMS-specific features.

16.9 Generating SQL Code

If you have a modern tool, it is relatively easy to generate SQL code from a database design. With ERwin I pay attention to the following.

  • Domains. Define pertinent domains for the application, giving each a datatype and relevant constraints.
  • Nulls. Specify nullability. ERwin enforces that primary keys are not null. You can check the box so that candidate key fields and mandatory application fields are also not null. For flexibility, if you are unsure, you should permit a column to be null.
  • Default value. Enter a default value for the appropriate columns. ERwin adds default values to create table statements.
  • Check constraints. Enter miscellaneous constraints. I include check constraints in create table statements (instead of alter statements).
  • Keys. I check the options to include primary keys and unique (candidate) keys as part of the create table statements.
  • Referential integrity. Add referential integrity actions via relationship properties. Given the use of existence-based identity, there are no on-update clauses for foreign keys. I specify that alter statements be used to create on-delete clauses for foreign keys. (There can be problems with circular code if foreign key clauses are included with the create table statement.)
  • Indexes. Check the flag to index foreign keys. ERwin does not consider if a foreign key index is subsumed by a primary key or candidate key index. The overhead of this duplicate indexing is usually trivial.
  • Storage. You can set the initial size of each table and indicate how space should grow as records are added.

16.10 Chapter Summary

This chapter summarizes my approach to database design. I start with a UML model of conceptual and logical intent and use that as the basis for preparing an IDEF1X model. Modern tools, such as ERwin, can then generate SQL code to create the database design. Here is a summary of my preferred database design practices.

  • Entity type. Map each entity type to a table and each attribute to a column. Define a primary key for each entity type and additional unique keys as needed. Make sure all primary-key and unique-key columns are not null.
  • Many-to-many relationships. Promote each one to a table. The primary key of the relationship combines the primary keys of the entity types.
  • Simple one-to-x relationships. Bury a foreign key in the table for the x entity type. If the one-end is mandatory, then the foreign key is not null.
  • Relationship with attributes. Regardless of the multiplicity, promote each one to a table. Add relationship attributes to the table.
  • Aggregation and composition. Use the same mappings as the underlying relationship.
  • Ordered relationship. Use the same mapping as without ordering. Add a sequence number attribute and define a uniqueness constraint on the source entity type plus the sequence number.
  • Qualified relationship, one-to-optional. Bury the source entity type key and the qualifier in the “many” table. The combination of the source entity type plus the qualifier is unique.
  • Qualified relationship, optional-to-optional. Bury the source entity type key and the qualifier in the “many” table. The combination of the source entity type plus the qualifier is not unique.
  • Qualified relationship, many-to-optional. Promote the relationship to a table with a primary key of the source entity type plus the qualifier. The combination of the related entity types need not be unique.
  • Qualified relationship, optional-to-many. Bury the source entity type key and the qualifier in the “many” table. The source entity type key plus the qualifier is not unique.
  • Generalization. Create separate tables for the supertype and each subtype. With my naming protocol the primary key names vary, but an entity should have the same primary key value throughout the levels of a generalization.
  • Identity. Add an artificial number column to the table for each entity type and make it the primary key. Modern relational DBMSs can readily generate existence-based IDs. As an option it is acceptable to instead use a mnemonic abbreviation for lookup tables.
  • Referential integrity. Enforce referential integrity for every foreign key (unless there is an unusual performance issue). Specify referential integrity actions for deletion.
  • General constraints. Forego the use of triggers for constraints, but use SQL check constraints on domains and tables as needed.
  • Indexes. Make sure that every foreign key is covered by an index. These indexes are important for searching and joining tables efficiently. Add other incidental indexes as required.

Table 16.2 summarizes the recommended mapping rules.

Table 16.2

Summary of Relational DBMS Mapping Rules

Concept

Model construct

Relational DBMS construct

Entity type

Entity type

Table

Non-qualified relationship

Many-to-many

Distinct table

Simple one-to-many

Buried foreign key

Simple one-to-one

Relationship with attributes

Distinct table

Aggregation

Same as underlying relationship

Composition

Ordered relationship

Qualified relationship

One-to-optional

Buried foreign key + qualifier

Optional-to-optional

Buried foreign key + qualifier

Many-to-optional

Distinct table

Optional-to-many

Buried foreign key + qualifier

Generalization

Separate supertype and subtype tables

Bibliographic Notes

Many of the ideas in this chapter come from my consulting and database reverse engineering experiences.

[Bruce-1992] is a good reference for IDEF1X. [Elmasri-2006] is a good general database reference.

References

[Bruce-1992] Thomas A. Bruce. Designing Quality Databases with IDEF1X Information Models. New York, New York: Dorset House, 1992.

[Elmasri-2006] Ramez Elmasri and Shamkant B. Navathe. Fundamentals of Database Systems (5th Edition). Boston, Massachusetts: Addison-Wesley, 2006.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset