MANAGING REDUNDANCY

The fact that the design of Example 12 from the previous section is redundant is clearly shown by the fact that the specified equality dependency holds (constraint C12). And there are, at least in principle, four basic approaches to dealing with the kind of redundancy illustrated by that example:

  1. Raw design only

  2. Declare the constraint

  3. Use a view

  4. Use a snapshot

Let’s take a closer look.

1. Raw Design Only

This is perhaps the approach most likely to be encountered in practice, given the limited functionality provided by most of today’s DBMS implementations. The idea is simply that:

  1. Relvars PAYMENTS and TOTALS are defined exactly as shown in the previous section.

  2. Constraint C12 is not declared to the DBMS.

  3. Maintaining the derived data is the user’s responsibility one hundred percent. (Or some user’s responsibility, at any rate; the maintenance might be done by means of a triggered procedure, but some user still has to write the code for that procedure.)[167]

In effect, this approach trades off (a) the extra work involved on the part of the user—or some user, at any rate—in executing certain updates (as well as the associated performance hit) against (b) the improved performance obtained when executing certain queries. But there are no guarantees; if the user makes a mistake during some update that (in effect) causes Constraint C12 to be violated, well, tough.

2. Declare the Constraint

In this approach Constraint C12 is explicitly declared to the DBMS and the DBMS takes the responsibility for enforcing it. Maintaining the derived data is still the user’s responsibility, though, exactly as it was under the previous approach. What’s more, if the user carries out this task reliably and correctly, the constraint checking will never fail, and it will thus, in effect, constitute pure overhead on the user’s updates. But we can’t dispense with the constraint, precisely because we need the system to check that the user is carrying out the maintenance task reliably and correctly.

3. Use a View

Clearly it would be better if, instead of simply declaring the constraint, we could actually inform the system of the rule by which the derived data is defined and have the system perform the derivation process automatically. And we can; that’s exactly what the view mechanism does. To be specific, we can replace the base relvar TOTALS by a view (or “virtual relvar”) of the same name, thus:

     VAR TOTALS VIRTUAL   /* Tutorial D syntax for defining a view */
       ( SUMMARIZE PAYMENTS BY { CUSTNO } :
                 { TOTAL := SUM ( AMOUNT ) } ) ;

Now the user no longer has to worry about maintaining the derived data; moreover, there’s now no way that Constraint C12 can possibly be violated, and there’s no need even to state it any more, except perhaps informally (as a means of telling the user the semantics of the view, perhaps). Note, however, that the user does have to be explicitly told not to try to maintain the totals! This fact doesn’t mean the user has to be told that relvar TOTALS is a view, though; it just means the user has to be told that the maintenance task will effectively be performed by the system.

4. Use a Snapshot

The drawback to the view solution, however, is that the derivation process is performed every time the view is referenced (even if no updates have been done since the last time it was referenced). Thus, if the object of the exercise is in to do the derivation work at update time in order to improve subsequent query performance, the view solution is clearly inadequate. In that case, we should use a snapshot instead of a view:

     VAR TOTALS SNAPSHOT
       ( SUMMARIZE PAYMENTS BY { CUSTNO } :
                 { TOTAL := SUM ( AMOUNT ) } )
         REFRESH ON EVERY UPDATE ;

The snapshot concept has its origins in a paper by Michel Adiba.[168] Basically, snapshots, like views, are derived relvars; unlike views, however, they’re real, not virtual—that is, they’re represented not just by their definition in terms of other relvars, but also (at least conceptually) by their own separately materialized copy of the data. In other words, defining a snapshot is much like executing a query, except that:

  1. The result of the query is kept in the database under the specified name (TOTALS in the example) as a read-only relvar (read-only, that is, apart from the periodic refresh—see point b. immediately following).

  2. Periodically (ON EVERY UPDATE in the example) the snapshot is refreshed—that is, its current value is discarded, the query is executed again, and the result of that new execution becomes the new snapshot value.

    The general form of the REFRESH clause is

         REFRESH EVERY <now and then>

where <now and then> might be, for example, MONTH or WEEK or DAY or HOUR or n MINUTES or MONDAY or WEEKDAY (and so on). In particular, the specification REFRESH [ON] EVERY UPDATE means the snapshot is kept permanently in synch with the relvar(s) from which it is derived—which is presumably just what we want, in the case of Example 12.

Now, in this section so far I’ve concentrated on Example 12 and “derived data.” However, the fact is that all forms of redundancy can be thought of as derived data: If x is redundant, then by definition x can be derived from something else in the database. (Limiting use of the term derived data to the kind of situation illustrated by Example 12 is thus misleading, and not recommended.) It follows that the foregoing analysis—in particular, the four different approaches to dealing with derived data—can be generalized to apply to all kinds of redundancy, at least in principle. Note in particular that the third and fourth of those approaches, using views and snapshots respectively, both constitute examples of what’s sometimes called controlled redundancy. Redundancy is said to be controlled if it does exist (and the user is aware of it), but the task of “propagating updates” to ensure that it never leads to any inconsistencies is managed by the system, not the user. Uncontrolled redundancy can be a problem, but controlled redundancy shouldn’t be. In fact, I want to go further—I want to say that while it’s probably impossible, and maybe not even desirable, to eliminate redundancy one hundred percent, any redundancy that isn’t eliminated ought at least to be controlled. In particular, we need support for snapshots. (Fortunately, many commercial products do now support snapshots, albeit under the deprecated name materialized views.)



[167] Note in particular that relvar TOTALS ought never to be updated at all, except for the updates that are needed to keep the two relvars “in synch,” as it were. (As a matter of fact, an analogous observation applies to the other three approaches as well, mutatis mutandis.)

[168] Michel Adiba: “Derived Relations: A Unified Mechanism for Views, Snapshots, and Distributed Data,” Proc. 1981 Int. Conf. on Very Large Data Bases, Cannes, France (September 1981). See also the earlier version “Database Snapshots,” by Michel E. Adiba and Bruce G. Lindsay, IBM Research Report RJ2772 (March 7th, 1980).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset