Chapter 8
An antipattern is a characterization of a common software flaw. When you find an antipattern, substitute the correction. Universal antipatterns are antipatterns that you should avoid for all applications.
An entity type has a self relationship with the same multiplicity and role names on each end. Typically this is a many-to-many self relationship. Symmetric relationships can be troublesome for programming and are always troublesome for relational databases.
There are no exceptions for relational database designs. Avoid symmetric relationships.
Promote the relationship to an entity type in its own right. The improved model not only resolves the symmetry but is often more expressive.
Consider Figure 8.1a and Figure 8.2a. RelatedContract involves two contracts but the symmetry is troublesome. If each pairing is entered once, it is not clear which contract should be first and which is second. If each pairing is entered twice, the amount of storage increases and any change requires double update. If more than two contracts are related, the situation is messier yet (for three contracts that are double stored: C1-C2, C2-C1, C1-C3, C3-C1, C2-C3, C3-C2). Furthermore, the antipattern does not require that the related contracts be different. None of this is desirable.
The improved model (Figure 8.1b and Figure 8.2b) breaks the symmetry. To find related contracts traverse as follows: start with a Contract, find the possible ContractRelationship, then traverse back to Contract (excluding the initial contract) to obtain the related Contracts. Figure 8.3 shows the corresponding SQL Server code; the code is efficient if the join fields are indexed. (The SQL code presumes existence-based identity; see Chapter 16.)
The revised model has further advantages. The coupling is no longer binary and can readily support three or more related contracts. The model could be extended to make Contract to ContractRelationship many-to-many with different relationship types. For example, one relationship type could be successor contracts (one contract replacing another). A second relationship type could be alternative contracts (several contracts being considered as alternatives for purchase).
For another example, consider the words in a dictionary (Figure 8.4). An inferior model relates word meanings directly. Also the inferior model cannot handle a group of interchangeable words. Looking in the Framemaker 8 online thesaurus, the first definition of “account” has four synonyms (chronicle, history, annals, and report). The SynonymSet supports a group of word meanings.
A model has obsolete elements (entity types, relationships, attributes). They may have been relevant in the past but are extraneous now.
It is acceptable for a model to have small amounts (no more than a few percent of the total) of dead elements. Large amounts of junk cause confusion and complicate maintenance.
Either cut the dead elements from the model or place them in isolation. For example, some commercial products have a special documentation section for deprecated database tables that will be removed in future releases.
Some databases have relic tables from past releases. It is acceptable to keep deprecated tables for a while, but eventually they should be removed. You should be suspicious of tables with zero records.
The name and documentation for a field do not indicate the kind of data that is stored.
A few user-defined fields as well as miscellaneous comments are acceptable as an extensibility mechanism. Otherwise disguised fields are seldom justified.
A relational database is supposed to be declarative. A field name should be informative and describe the data that is stored.
Disguised fields can arise in several ways.
Anonymous fields. Figure 8.5 shows an excerpt from a legacy application with anonymous address fields. Figure 8.6 shows some corresponding data. To find a city, you must search multiple fields. Worse yet, it could be difficult to distinguish the city of Chicago from Chicago street. You may need to parse a field to separate city, state, and postal code. It would be much better to put address information in distinct fields that are clearly named.
Chapter 2 presented hardcoded trees with a different entity type for each level. Such an approach can be justified for models where the structure does not vary and it is important to enforce the sequence of types.
The antipattern also involves a fixed hierarchy but one with little difference between the entity types. Such a model is brittle, permits duplicate and contradictory data, and is difficult to maintain and extend.
Sometimes the hardcoding of artificial levels is desirable for its simplicity. For example, I needed to convert bill-of-material formats for a past project. The source was a hierarchical indented list and the target was parent-child pairings. One program generated the hierarchy as output and another required the pairings as input. I did not want to program the recursive descent of a tree. Instead I hardcoded a fixed number of levels and quickly wrote a SQL query. Hardcoded levels can be acceptable for a prototype or throwaway code.
Abstract and consolidate the levels. Use one of the tree patterns to relate the levels.
Figure 8.7a shows a three-level hierarchy from a legacy application where an individual contributor has a supervisor who in turn has a manager. The limitation to three levels is arbitrary. Many questions come to mind. How should the software deal with an individual contributor who becomes a supervisor and then becomes a manager? Should there be three different records? Does the user multiply enter data, such as names, phone numbers, and addresses (omitted in the example)?
The improved model (Figure 8.7b) is simpler, more expressive, and avoids these issues. There can be an arbitrary number of management levels. An employee reports to a boss who is also an employee. The boss reports to his or her boss continuing up the reporting hierarchy. The field employeeType is an enumeration with the values of “Manager,” “Supervisor,” and “IndividualContributor.” A boss can manage many subordinates and a subordinate has at most one boss. The highest ranking employee in the database has no boss. The ‘/’ prefix is UML notation for derived data (see the Appendix for further explanation).
A model has a deep generalization. In many cases extensive taxonomies are motivated by object-oriented programming and are inadvisable.
If there is a formal standard for a taxonomy (such as for biological organisms) you should use it. Otherwise I cannot think of a justification. Normally, it is best to avoid deep generalization.
As an example, [Americazoo] presents a taxonomy for mammals, including Figure 8.8 for wolves. Each indentation is a lower level in the taxonomy. Figure 8.8 corresponds to a generalization that is seven levels deep.
As a guideline a database taxonomy should be no more than four levels deep. (A programming taxonomy should be no more than five levels deep.)
Many years ago at GE Global Research, we built a software modeling tool, called OMTool. The tool was based on a metamodel with a taxonomy that was seven levels deep. Even though the taxonomy was sound, we found it difficult to remember all the levels complicating development. In retrospect we regretted using such a deep taxonomy.
On another project we prepared a large equipment taxonomy (50 pages long!). [Blaha-2003] We understood the problem well and had access to domain experts. Nevertheless, we had trouble with modeling. The taxonomy was so extensive that it was difficult to determine where to place equipment. Also the various types of equipment had many fields and we kept discovering additional data. For this project, it would have been better to forego a hardcoded taxonomy and instead use a softcoding approach. [Blaha-2006]
Section 10.2 shows a sound taxonomy that is three levels deep.
A model has a number of free-standing entity types. From the problem understanding, it would seem that they should be related.
Some disconnected entity types can be acceptable (as a guideline, no more than 10% of the entity types). But it is suspicious when a model has many of them.
Recognize that the model is likely to be incomplete. Determine the missing relationships and add them.
Many Eclipse (www.eclipse.org) applications generate XML files for storing persistent data. An application may also have a database that developers populate apart from Eclipse. If you reverse engineer the database, the resulting model can appear to be incomplete. Ideally the database should store both the added data and the Eclipse data.
A model has one or more serious conceptual flaws. Modeling errors lead to bugs in the finished software, complicate development and maintenance, and can impair performance.
Errors in models become errors in the finished software. Errors are never acceptable, but sometimes you have to live with them, such as with legacy systems and vendor software.
Understand the flaw and how it may have come about. If possible, correct the model.
Over the years I have reverse engineered several modeling tools and inspected their internal metamodels. One would expect tool developers to have excellent models, but that is not always the case. Some data modeling tools have the deep flaw of directed relationships.
As Figure 8.9 shows, a directed relationship has “from” and “to” entity types. Using such a tool to construct Figure 8.1b, Contract would be “from” and ContractRelationship would be “to” or vice versa. This is nonsense. The model simply states that Contract and ContractRelationship are related. An improved model (Figure 8.9b) introduces the notion of a role which is the intersection of an entity type and a relationship. As a side benefit, the improved model can support ternary relationships.
For another example, I reverse engineered a vendor product for a library catalog system and found that the database implemented a linked list. There is no such thing as a “linked list” in a conceptual model of a library. Someone did not abstract properly and misguidedly put implementation concepts in a conceptual model.
A model has multiple inheritance. Multiple inheritance is a generalization for which an entity type inherits information from multiple supertypes.
Multiple inheritance is not appropriate for data models. It can be acceptable as a mechanism for programming reuse and for other kinds of models.
Avoid multiple inheritance in data models. Degrade the model if necessary. There is no clean way to implement multiple inheritance with a database. In practice I have found that multiple inheritance seldom occurs with databases and is not worth the bother.
In Figure 8.10a and Figure 8.11a FullTimeUnionEmployee is both FullTimeEmployee and UnionEmployee. The model does not show it, but an extended model could define three additional combinations: FullTimeNonUnionEmployee, PartTimeUnionEmployee, and PartTimeNonUnionEmployee. Figure 8.10b and Figure 8.11b use a workaround (others are possible) to eliminate the multiple inheritance.
A relational database is degraded to fit some other paradigm.
Such a technique is highly questionable.
Rework the model and architecture to avoid such degradation.
Many years ago, I was reverse engineering the database of a commercial product and was perplexed. The resulting model had many disconnected entity types with only a smattering of relationships. I could not understand how so much information could be missing and suspected that many relationships were disguised. I decided to cross check the schema with the hierarchical screen layout and discovered the missing relationships. The software vendor confirmed my understanding.
The product supported a fixed hierarchy of depth three. Apparently the vendor created the original product with a proprietary hierarchical database. Then the vendor migrated to client-server technology and devised an isomorphic hierarchical database using a relational database for the server. Figure 8.12 shows the structure for level 1, 2, and 3 tables. The pointer fields are hidden parent pointers.
Figure 8.13 shows a table from another application. I am not sure what the attributes mean. My best guess is that this table was being used for populating a spreadsheet.
An antipattern is a characterization of a common software flaw. As you construct models, you should be alert for antipatterns and correct them. Table 8.1 summarizes universal antipatterns—antipatterns to always avoid—along with their exceptions and resolution.
Summary of Universal Antipatterns
Antipattern name |
Observation |
Exceptions |
Resolution |
Frequency |
---|---|---|---|---|
Symmetric relationship |
A self relationship has the same multiplicity and roles on each end. |
None |
Promote the relationship to an entity type. |
Common |
Dead elements |
A model has unused elements. |
Acceptable in small amounts. |
Delete them or isolate them. |
Common |
Disguised fields |
Field names do not describe data. |
A few user-defined fields. |
Use meaningful names. |
Common |
Artificial hardcoded levels |
There is a fixed hierarchy of similar entity types. |
Use only with great caution. |
Consolidate the levels and use a tree pattern. |
Occasional |
Excessive generalization |
There is a deep generalization. |
None |
A db taxonomy should be at most four levels deep. |
Occasional |
Disconnected entity types |
A model has freestanding entity types. |
A few can be acceptable. |
Add the missing relationships. |
Occasional |
Modeling errors |
There is a serious conceptual flaw. |
None |
Fix the model. |
Occasional |
Multiple inheritance |
A model has multiple inheritance. |
Avoid for data models. |
Use a workaround. |
Seldom |
Paradigm degradation |
A relational db is degraded to some other paradigm. |
Highly questionable. |
Rework the model and architecture. |
Seldom |
[Brown-1998] discusses antipatterns for programming, architecture, and management. The book is informative, but the authors oversell the technology—antipatterns are not a panacea for the difficulties of software development. As [Brooks-1987] notes, there is no silver bullet for improving software quality. [Laplante-2006] builds on [Brown-1998] and adds further management antipatterns as well as cultural antipatterns.
Many of the examples in this chapter came from my experiences with database reverse engineering — starting with existing database structures and inferring the underlying models. [Premerlani-1994] and [Blaha-1995] present unusual database designs that were found during reverse engineering.
[Americazoo] http://www.americazoo.com/goto/index/mammals/classification.htm
[Blaha-1995] Michael Blaha and William Premerlani. Observed idiosyncrasies of relational database designs. Second Working Conference on Reverse Engineering, July 1995, Toronto, Ontario, 116—125.
[Blaha-2003] Michael Blaha. Data store models are different than data interchange models. Proceedings of the International Workshop on Meta-Models and Schemas for Reverse Engineering (ateM 2003), November 2003, Victoria, BC.
[Blaha-2006] Michael Blaha. Designing and Implementing Softcoded Values. IEEE Computer Society ReadyNote, 2006.
[Brown-1998] William J. Brown, Raphael C. Malveau, Hays W. “Skip” McCormick, and Thomas J. Mowbray. AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis. New York: John Wiley & Sons, Ltd, 1998.
[Brooks-1987] Frederick P. Brooks, Jr. No silver bullet: Essence and accidents of software engineering. IEEE Computer 20, 4 (April 1987), 10-19.
[Laplante-2006] Phillip A. Laplante and Colin J. Neill. Antipatterns: Identification, Refactoring, and Management. Boca Raton, FL: Auerbach Publications, 2006.
[Premerlani-1994] William Premerlani and Michael Blaha. An approach for reverse engineering of relational databases. Communications ACM 37, 5 (May 1994), 42-49.