There are lots of principles and heuristics involved in designing a relational database schema—in fact, it can get quite complicated—but in practice you can usually come up with a pretty reasonable schema if you just keep breaking up a problematic table or field until you have a set of related tables that comply with the rules. The process is called DECOMPOSITION.
Eliminate Non-Scalar Values
In this sense, SCALAR means single—as opposed to compound—values, like separating the given names into separate fields. Sometimes these values should be separate fields in the table, and sometimes you’ll just need to find a way to keep people from entering the wrong stuff. The decision always has to be based on what the data means and how it’s going to be used. There aren’t any absolute answers.
Eliminate Repeating Groups
Repeating groups are almost always going to need to be decomposed into one or more additional tables.
Make sure every table has a candidate key
A CANDIDATE KEY is one or more fields that uniquely identify the table. A table might have more than one candidate key, and it not the same as a PRIMARY KEY, which is one or more fields that you choose to identify the table in relationships. An identity value is never a candidate key (although identity values are useful primary keys).
Make sure all the fields depend on the candidate key
Anything that doesn’t almost certainly belongs in a separate table.
Put On Your Thinking Hat
Fix the Ingredient and Supplier schema that we’ve been working with.
Extra Credit:
In the recipe schema on page 16, one table contains a field that violates normal form. Can you find it?
How’d You Do?
Here’s one way to do it. You might have made some slightly different decisions based on how you think the data will be used, and that’s fine. Notice that I haven’t shown identity values here, which I almost certainly would use in a real implementation. In this example, I thought it was clearer to use semantic primary keys. If you used them, that’s fine too.
Extra Credit:
In the schema on page 16, the BakersPercentage
value depends on multiple rows.