Now, to use this data, you have to indicate the table you want to join to, which is very much an unnatural way to do a join. You can use a universally unique GUID key so that all references to the data in the table are unique, eliminating the need for the specifically specified related table name. However, I find when this method is employed if the RelatedTableName is actually used, it is far clearer to the user what is happening.
A major concern with this method is that you cannot use constraints to enforce the relationships; you need either to use triggers or to trust the middle layers to validate data values, which definitely increases the costs of implementation/testing, since you have to verify that it works in all cases, which is something we trust for constraints; even triggers are implemented in one single location.
One reason this method is employed is that it is very easy to add references to the one table. You just put the key value and table name in there, and you are done. Unfortunately, for the people who have to use this for years and years to come, it would have just been easier to spend a bit longer and do some more work, because the generic relationship means that using a constraint is not possible to validate keys, leaving open the possibility of orphaned data.
A second way to do this that is marginally better is to just include keys from all tables, like this:
CREATE TABLE JournalEntry
(
JournalEntryId <int or uniqueidentifier> PRIMARY KEY,
SalesOrderId <int or uniqueidentifier> NULL REFERERENCES
SalesOrder(SalesOrderId),
TroubleTicketId <int or uniqueidentifier> NULL REFERERENCES
TroubleTicket(TroubleTicketId),
<other columns>
);
This is better, in that now joins are clearer and the values are enforced by constraints, but now, you have one more problem (that I conveniently left out of the initial description). What if you need to store some information about the reason for the journal entry? For example, for an order, are you commenting in the journal for a cancelation notice?
Extending the design, it seems like a decent idea that one JournalEntry might relate to more than one SalesOrder or JournalEntry. So, the better idea is to model it more like Figure 8-20.
Figure 8-20. Objects linked for maximum usability/flexibility
CREATE TABLE JournalEntry
(
JournalEntryId <int or uniqueidentifier> PRIMARY KEY,
<other columns>
)
CREATE TABLE SalesOrderJournalEntry
(
JournalEntryId <int or uniqueidentifier>
REFERENCES JournalEntry(JournalId),
SalesOrderId <int or uniqueidentifier>,
REFERENCES SalesOrder(SalesOrderId),
<SalesOrderSpecificColumns>
PRIMARY KEY (JournalEntryId, SalesOrderId)
)
CREATE TABLE TroubleTicketJournalEntry
(
JournalEntryId <int or uniqueidentifier>
REFERENCES JournalEntry(JournalId),
TroubleTicketId <int or uniqueidentifier>,
REFERENCES TroubleTicket (TroubleTicketId),
<TroubleTicketSpecificColumns>
PRIMARY KEY (JournalEntryId, SalesOrderId)
)
Note that this database is far more self-documented as well, though it does make it harder to implement. You can easily find the relationships between the tables and join on them. Yes, there are a few more tables, but that can play to your benefit as well in some scenarios, but most important, you can represent any data you need to represent, in any cardinality or combination of cardinalities needed. This is the goal in almost any design.
As much as I would like to deny it, or at least find some way to avoid it, people need to have unstructured notes to store various bits and pieces of information about their data. I will confess that a large number of the systems I have created in my career included some column that allowed users to insert freeform text. In the early days, it was a varchar(256) column, then varchar(8000) or text, and now varchar(max). It is not something that you can get away from, because users need this scratchpad just slightly more than Linus needs his security blanket. And it is not such a terrible practice, to be honest. What is the harm in letting the user have a place to note some bit of information about their data? I know I personally have tons of OneNote notebooks with lots of unstructured data.
However, if given too much leeway, or too many generic buckets for text strewn about in a database, far too often what happens is that notes become a replacement for doing actual design. The notes section becomes a replacement for things that ought to be a full-blown column. Should we have a column for special dietary restrictions? Nah, just put it in the notes column. Once the users do something once and particularly find it useful, they will do it again. And they tell their buddies, “Hey, I have started using notes to indicate that the order needs processing. Saved me an hour yesterday.” And then it costs the programmers 200 hours sorting out unstructured data.
Probably the most common use of this I have seen that concerns me is contact notes. I have done this myself in the past, where I have a column that contains formatted text something like the following on a Customer table. Users can add new notes but usually are not allowed to go back and change the notes.
ContactNotes
--------------------------------------------------------------------------------
2008-01-11 – Stuart Pidd -Spoke to Fred on the phone. Said that his wangle was
broken, referencing Invoice 20001. Told him I would check and call back
tomorrow.
2008-02-15 – Stuart Pidd – Fred called back, stating his wangle was still
broken, and now it had started to dangle. Will call back tomorrow.
2008-04-12 – Norm Oliser – Stu was fired for not taking care of one of our best
customers.
--------------------------------------------------------------------------------
This generally is not the best solution to the problem, even for a very small organization. The proper solution is almost certainly to take this data that is being stored into this text column and apply the rigors of normalization to it. Clearly, in this example, you can see three “rows” of data, with at least three “columns.” So instead of having a Customer table with a ContactNotes column, implement the tables like this:
CREATE TABLE Customer
(
CustomerId int CONSTRAINT PKCustomer PRIMARY KEY
<other columns>
)
CREATE TABLE CustomerContactNotes
(
CustomerId int,
NoteTime datetime,
PRIMARY KEY (CustomerId, NoteTime),
UserId datatype, --references the User table
Notes varchar(max)
)
You might even stretch this to the model we discussed earlier with the journal entries where the notes are a generic part of the system and can refer to the customer, multiple customers, and other objects in the database. This might even link to a reminder system to remind Stu to get back to Fred, and he would not be jobless. Though one probably should have expected such out of a guy named Stu Pidd (obviously).
Even using XML to store the notes in this structured manner would be an amazing improvement. You could then determine who entered the notes, what the day was, and what the notes were, and you could fashion a UI that allowed the users to add new fields to the XML, right on the fly. What a tremendous benefit to your users and, let’s face it, to the people who have to go in and answer questions like this, “How many times have we talked to this client by phone?”
The point of this section is simply this: educate your users. Give them a place to write the random note, but teach them that when they start to use notes to store the same specific sorts of things over and over, their jobs could be easier if you gave them a place to store their values that would be searchable, repeatable, and so on. Plus, never again would you have to write queries to “mine” information from notes.
Tip SQL Server provides a tool to help search text called Full Text Search. It can be very useful for searching textual data in a manner much like a typical web search. However, it is no replacement for proper design that makes a different column and row from every single data point that the users are typically interested in.
Summary
This chapter was dedicated to expanding the way you think about tables and to giving you some common solutions to problems that are themselves common. I was careful not to get too esoteric with my topics in this chapter. The point was simply to cover some solutions that are a bit beyond the basic table structures I covered in earlier chapters but not so beyond them that the average reader would say “Bah!” to the whole chapter as a waste of time.
The following are the “good” patterns we covered:
We finished up with a section on anti-patterns and poor design practices, including some pretty heinous ones:
Of course, these lists are not exhaustive of all of the possible patterns out there that you should use or not use, respectively. The goal of this chapter was to help you see some of the common usages of objects so you can begin to put together models that follow a common pattern where it makes sense. Feedback, particularly ideas for new sections, is always desired at [email protected].