CHAPTER 8

8.1 This exercise is (deliberately) a repeat in different words of Exercise 4.6. The claim is incorrect, as was shown in the answer to that earlier exercise.

8.2 Here are some of my own responses to the views expressed:

  • “Many queries are much easier to understand if the data is denormalized”: I suspect that “understand” here ought really to be “formulate” (for instance, understanding the query “Get all supplier details” has nothing to do with how the database is designed). If I’m right, then the claim might be valid. But the opposite claim is valid too!—many queries are easier to formulate if the data isn’t denormalized, as I showed in the body of the chapter.

  • The interviewer suggests that denormalization can cause integrity problems and can reduce flexibility in supporting unanticipated queries. I agree with these suggestions.

  • “Normalization, and its emphasis on elimination of redundant storage, is purely a transaction processing issue”: Normalization is about reducing redundancy, not reducing redundant storage—though I suppose the consultant might be forgiven for conflating the two, given the implementations most widely available today. But it’s certainly not “a transaction processing issue”! As I put it in Chapter 1, when we do database design in general, and when we do normalization in particular, we’re concerned primarily with what the data is, not with how it’s going to be used.

  • “When users view data, they see it in a redundant form”: Sometimes they do, sometimes they don’t. But even if they do, that’s not an argument for a denormalized design; for example, the user could be presented with a denormalized perception of the data by means of the conventional view mechanism, while the underlying database remains properly normalized (see the first sketch following this list).

  • “In order to transform data into a form that is useful to users ...”: This is simply a tendentious remark.

  • “[Join] is essentially a way of dynamically denormalizing data for greater ease of use”: The user might think of joins being done dynamically, but there’s no reason in general why they can’t be done statically (i.e., ahead of time)—and I believe they often would be, given a well architected DBMS.[193] (One familiar, if limited, way of precomputing a join is shown in the second sketch following this list.) It’s also untrue to suggest that the result of a join must always be denormalized. “Greater ease of use” is another tendentious remark.

  • “[Users] can’t tolerate the time and cost of joins”: Joins aren’t necessarily time consuming or expensive. Again, it depends on the implementation.

  • “To address the problem, companies replicate data in an ever increasing number of decision support databases, which represent denormalized views of the data”: This might be true, but if it is, it’s an indictment of current implementations, not an argument for denormalization as such.
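
By way of illustration of the point about views: what follows is a minimal sketch, using the familiar suppliers-and-parts tables S and SP and assuming the usual column names, of how users who want to see supplier and shipment data “prejoined” can be given exactly that perception while the base tables stay fully normalized:

     CREATE VIEW S_SP AS            /* denormalized perception, normalized base tables */
          SELECT S.SNO , S.SNAME , S.STATUS , S.CITY , SP.PNO , SP.QTY
          FROM   S JOIN SP ON S.SNO = SP.SNO ;

Queries can then be formulated against S_SP directly, while updates and integrity constraints continue to apply to the properly normalized relvars S and SP.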
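
As for the point about joins being done ahead of time: materialized views (“snapshots”) are one familiar, if limited, way of precomputing a join. They’re not part of the SQL standard, and they’re certainly not what footnote 193 has in mind; the following is just a sketch of the general idea, using syntax supported by (for example) PostgreSQL and Oracle:

     CREATE MATERIALIZED VIEW S_SP_PRE AS     /* join computed once, ahead of time */
          SELECT S.SNO , S.SNAME , S.STATUS , S.CITY , SP.PNO , SP.QTY
          FROM   S JOIN SP ON S.SNO = SP.SNO ;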

8.3 First of all, surrogate keys are not the same thing as tuple IDs. For one thing (to state the obvious), surrogates identify entities and tuple IDs identify tuples, and there’s certainly nothing like a one to one correspondence between entities and tuples. (Think of derived tuples in particular—for example, tuples in the result of some query. In fact, it’s not at all clear that derived tuples will have tuple IDs anyway.) Furthermore, tuple IDs usually have performance connotations, but surrogates don’t (access to a tuple via its tuple ID is usually assumed to be fast, but no such observation applies to surrogates). Also, tuple IDs are usually concealed from the user, but surrogates mustn’t be, thanks to The Information Principle (see Exercise 2.1); in other words, it’s probably (and desirably!) not possible to store a tuple ID in a database relvar, while it certainly (and desirably) is possible to store a surrogate in a database relvar. In a nutshell: Surrogate keys have to do with logical design, tuple IDs have to do with physical design.
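
To make the point about visibility concrete, here is a sketch (hypothetical names and types) of a relvar with a surrogate key: the surrogate is an ordinary, user visible attribute, declared like any other key, whereas there’s no way, and shouldn’t be, to declare a column that holds tuple IDs, precisely because tuple IDs aren’t part of the logical design at all.

     CREATE TABLE CUSTOMER
          ( CUST_SID  CHAR(12)    NOT NULL ,   /* surrogate key : an ordinary, visible column */
            CUST_NAME VARCHAR(50) NOT NULL ,
            CUST_CITY VARCHAR(30) NOT NULL ,
            UNIQUE ( CUST_SID ) ) ;

(How values for CUST_SID get generated is a separate question, of course; the point here is only that the surrogate lives at the logical level.)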

Are surrogate keys a good idea? Well, observe first that the relational model has nothing to say on this question; like the business of design in general, in fact, whether or not to use surrogate keys has to do with how to apply the relational model, not with the relational model as such.

That said, I have to say too that the question of whether surrogate keys are good or bad is far from straightforward. There are strong arguments on both sides: so many, in fact, that I can’t possibly do justice to them here (though some of them are summarized in Chapter 15). For further details, see the paper “Composite Keys,” in my book Relational Database Writings 1989-1991 (Addison-Wesley, 1992). Note: The paper is called “Composite Keys” because surrogate keys are most likely to be useful in practice in situations in which existing keys, and corresponding foreign keys, are composite keys specifically.
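
To illustrate that last point with a sketch (hypothetical relvars and types): if shipments are identified by the composite key {SNO,PNO}, then every relvar that references shipments must carry a composite foreign key; give shipments a surrogate key and such references collapse to a single attribute.

     CREATE TABLE SHIPMENT                         /* "natural" key is the composite {SNO,PNO} */
          ( SHIP_SID CHAR(12) NOT NULL ,           /* surrogate key                            */
            SNO      CHAR(5)  NOT NULL ,
            PNO      CHAR(6)  NOT NULL ,
            QTY      INTEGER  NOT NULL ,
            UNIQUE ( SHIP_SID ) ,
            UNIQUE ( SNO , PNO ) ) ;

     CREATE TABLE SHIPMENT_CHARGE                  /* references shipments via the surrogate ; */
          ( SHIP_SID CHAR(12)     NOT NULL ,       /* without it, this key and the foreign key */
            CHARGE   NUMERIC(9,2) NOT NULL ,       /* would both have to be ( SNO , PNO )      */
            UNIQUE ( SHIP_SID ) ,
            FOREIGN KEY ( SHIP_SID ) REFERENCES SHIPMENT ( SHIP_SID ) ) ;

Whether that convenience is worth the extra attribute is exactly the kind of tradeoff the arguments pro and con are concerned with.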

8.4 I show solutions in SQL, just for a change. Defining the first in terms of the second (in outline):

     SELECT DISTINCT EX.ENO ,
          ( SELECT PAY
            FROM   EMP AS EY
            WHERE  EY.ENO = EX.ENO
            AND    MONTH = 'Jan' ) AS JAN_PAY ,
             ...
          ( SELECT PAY
            FROM   EMP AS EY
            WHERE  EY.ENO = EX.ENO
            AND    MONTH = 'Dec' ) AS DEC_PAY
     FROM   EMP AS EX

Defining the second in terms of the first (again in outline):

     SELECT ENO , 'Jan' AS MONTH , JAN_PAY AS PAY FROM EMP
     UNION
      ...
     UNION
     SELECT ENO , 'Dec' AS MONTH , DEC_PAY AS PAY FROM EMP
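
For reference, and on the assumption that the two designs are as sketched in the body of the chapter, here in outline is what the two versions of EMP might look like in SQL (I call them EMP1 and EMP2 purely to keep them apart here, and the types are just guesses):

     CREATE TABLE EMP1                     /* first design : one row per employee            */
          ( ENO     CHAR(5)      NOT NULL ,
            JAN_PAY NUMERIC(9,2) NOT NULL ,
            /* ... one such column for each of the remaining months ... */
            DEC_PAY NUMERIC(9,2) NOT NULL ,
            UNIQUE ( ENO ) ) ;

     CREATE TABLE EMP2                     /* second design : one row per employee per month */
          ( ENO   CHAR(5)      NOT NULL ,
            MONTH CHAR(3)      NOT NULL ,
            PAY   NUMERIC(9,2) NOT NULL ,
            UNIQUE ( ENO , MONTH ) ) ;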


[193] I have in mind here, primarily, a DBMS implemented using the facilities of The TransRelational™ Model (and a similar remark applies to all of my uses of that phrase “well architected” throughout the present book). You can find a preliminary (and very incomplete) description of that model in my book An Introduction to Database Systems (8th edition, Addison-Wesley, 2004), and a much more comprehensive account in my book Go Faster! The TransRelational™ Approach to DBMS Implementation (Ventus, 2002, 2011).
