Chapter 9

SQL and Views

They’re concerned by adverse publicity and that I have to move more into public eye.
Problem is to define first the exact view we want to project.

—H. R. Haldeman: The Haldeman Diaries: Inside the Nixon White House (1994)

Intuitively, there are several different ways of looking at what a view is, all of which are valid and all of which can be helpful in the right circumstances:

  • A view is a virtual relvar; in other words, it’s a relvar that “looks and feels” just like a base relvar but (unlike a base relvar) doesn’t exist independently of other relvars—rather, it’s defined in terms of such other relvars.

  • A view is a derived relvar; in other words, it’s a relvar that’s explicitly derived (and known to be derived, at least by some people) from certain other relvars. Note: If you’re wondering what the difference is between a derived relvar and a virtual one (see the previous bullet item), I should explain that all virtual relvars are derived but some derived ones aren’t virtual. See the section “Views and Snapshots,” later in this chapter.

  • A view is a “window into” the relvars from which it’s derived; thus, operations on the view are to be understood as “really” being operations on those underlying relvars.

  • A view is what some writers call a “canned query” (more precisely, it’s a named relational expression).

As usual, in what follows I’ll discuss these ideas in both relational and SQL terms. But talking of SQL, let me remind you of something I said in Chapter 1: A view is a table!—or, as I would prefer to say, a relvar. SQL documentation often uses expressions like “tables and views,” thereby suggesting that tables and views are different things, but they’re not; in fact, the most important thing about a view is precisely that it’s a table (just as, in mathematics, the most important thing about, say, the union of two sets is precisely that it’s a set). So don’t fall into the common trap of thinking the term table means a base table specifically. People who fall into that trap aren’t thinking relationally, and they’re likely to make mistakes as a consequence; in fact, several such mistakes can be found in the design of the SQL language itself. Indeed, it could be argued that the very names of the operators CREATE TABLE and CREATE VIEW in SQL are and always were at least a psychological mistake, in that they tend to reinforce both (a) the idea that the term table means a base table specifically and (b) the idea that views and tables are different things. Be on the lookout for confusion in this area.

One last preliminary point: On the question of whether the database should “always” be accessed through views, see the section “SQL Column Naming” in Chapter 3 or the section “The Reliance on Attribute Names” in Chapter 6.

VIEWS ARE RELVARS

Of those informal characterizations listed above of what a view is, the following definition might appear to favor one over the rest (but those informal characterizations are all equivalent anyway, loosely speaking):

Definition: A view V is a relvar whose value at time t is the result of evaluating a certain relational expression at that time t. The expression in question (the view defining expression) is specified when V is defined and must mention at least one relvar.

The following examples (“London suppliers” and “non London suppliers”) are repeated from Chapter 8, except that now I give SQL definitions as well:

VAR LS VIRTUAL                         CREATE VIEW LS
  ( S WHERE CITY = 'London' ) ;          AS ( SELECT *
                                              FROM   S
                                              WHERE  CITY = 'London' )
                                           WITH CHECK OPTION ;

VAR NLS VIRTUAL                        CREATE VIEW NLS
  ( S WHERE CITY ≠ 'London' ) ;          AS ( SELECT *
                                              FROM   S
                                              WHERE  CITY <> 'London' )
                                            WITH CHECK OPTION ;

Note that these views are restriction views—their value at any given time is a certain restriction of the value at that time of relvar S. Some syntax issues:

  • The parentheses in the SQL examples are unnecessary but not wrong; I include them for clarity. The parentheses in the Tutorial D examples are required.

  • CREATE VIEW in SQL allows a parenthesized commalist of view column names to appear following the view name, as in this example:

    CREATE VIEW SDS ( SNAME , DOUBLE_STATUS )
      AS ( SELECT DISTINCT SNAME , 2 * STATUS
           FROM   S ) ;

    Recommendation: Don’t do this—follow the recommendations given in Chapter 3 under “SQL Column Naming” instead. For example, the foregoing view can equally well (in fact, better) be defined like this:

    CREATE VIEW SDS
      AS ( SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
           FROM S ) ;

    Note in particular that this latter style means we’re telling the system once instead of twice that one of the view columns is called SNAME.

  • CREATE VIEW in SQL also allows WITH CHECK OPTION to be specified if—but only if!—SQL regards the view as updatable. Recommendation: Always specify this option if possible. See the section “Update Operations” for further discussion.

The Principle of Interchangeability

Since views are relvars, essentially everything I’ve said in previous chapters regarding relvars in general applies to views in particular. Subsequent sections discuss specific aspects of this observation in detail. First, however, there’s a more fundamental point I need to explain.

Consider the example of London vs. non London suppliers again. In that example, S is a base relvar and LS and NLS are views. But it could have been the other way around—that is, we could have made LS and NLS base relvars and S a view, like this (Tutorial D only, for simplicity):

VAR LS BASE RELATION
  { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;

VAR NLS BASE RELATION
  { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;

VAR S VIRTUAL ( LS UNION NLS ) ;

Note: In order to guarantee that this design is logically equivalent to the original one, we would also have to state and enforce certain additional constraints—including in particular constraints to the effect that every CITY value in LS is London and no CITY value in NLS is— but I omit such details here for simplicity. See the sections “Views and Constraints” and “Update Operations” later for further consideration of such matters.

Be that as it may, the message of the example is that, in general, which relvars are base ones and which virtual is arbitrary (at least from a formal point of view). In the example, we could design the database in at least two different ways: ways, that is, that are logically distinct but information equivalent. (By information equivalent here, I mean the two designs represent the same information; i.e., for any query on one, there’s a logically equivalent query on the other.1) And The Principle of Interchangeability follows logically from such considerations:

Definition: The Principle of Interchangeability (of base and virtual relvars) states that there must be no arbitrary and unnecessary distinctions between base and virtual relvars. In other words, views should “look and feel” just like base relvars so far as users are concerned.

Here are some implications of this principle:

  • As in fact I’ve already suggested, views are subject to integrity constraints, just like base relvars. (We usually think of integrity constraints as applying to base relvars specifically, but The Principle of Interchangeability shows this position isn’t really tenable.) See the section “Views and Constraints,” later.

  • In particular, views have keys (and so I should perhaps have included some key specifications in my examples of views prior to this point; Tutorial D permits such specifications but SQL doesn’t). They might also have foreign keys, and foreign keys might refer to them. Again, see the section “Views and Constraints,” later.

  • I didn’t mention this point in Chapter 1, but the “entity integrity” rule is supposed to apply specifically to base relvars, not views. It thereby violates The Principle of Interchangeability. Of course, I reject that rule anyway, because it has to do with nulls (I also reject it because it has to do with primary keys specifically instead of keys in general, but let that pass).

  • Many SQL products, and the SQL standard, provide some kind of “row ID” feature.2 If that feature is available for base tables but not for views—which in practice is quite likely—then it violates The Principle of Interchangeability. (It probably violates The Information Principle, too. See Appendix A.) Now, row IDs as such aren’t part of the relational model, but that fact in itself doesn’t mean they have to be prohibited. But I observe as an important aside that if those row IDs are regarded—as they are, most unfortunately, in the SQL standard, as well as in at least some of the major SQL products— as some kind of object ID in the object oriented sense, then they are prohibited, very definitely! Object IDs are effectively pointers, and (to repeat from Chapter 2) the relational model explicitly prohibits pointers.

  • The distinction discussed in the previous chapter between single relvar and multirelvar constraints is more apparent than real (and the terminology is therefore deprecated, somewhat, for that very reason). Indeed, an example in that chapter—essentially the same London vs. non London suppliers example, in fact—showed that the very same constraint (viz., “suppliers numbers are unique”) could be a single relvar constraint with one design for the database and a multirelvar constraint with another.

  • Perhaps most important of all, we must be able to update views—because if not, then that fact in itself would constitute the clearest possible violation of The Principle of Interchangeability. Again, see the section “Update Operations,” later.

Relation Constants

You might have noticed that, in the formal definition I gave at the beginning of the present section for what a view was, I said the defining expression had to mention at least one relvar. Why? Because if it didn’t, the “virtual relvar” wouldn’t be a relvar at all!—I mean, it wouldn’t be a variable, and it wouldn’t be updatable. For example, the following is a valid CREATE VIEW statement in SQL:

CREATE VIEW S_CONST AS
     ( SELECT temp.*
       FROM ( VALUES ( 'S1' , 'Smith' , 20 , 'London' ) ,
                     ( 'S2' , 'Jones' , 10 , 'Paris'  ) ,
                     ( 'S3' , 'Blake' , 30 , 'Paris'  ) ,
                     ( 'S4' , 'Clark' , 20 , 'London' ) ,
                     ( 'S5' , 'Adams' , 30 , 'Athens' ) )
              AS temp ( SNO , SNAME , STATUS , CITY ) ) ;

But this view certainly can’t be updated. In other words, it’s not a variable at all, let alone a virtual one; rather, it’s what might be called a named relation constant. To elaborate:

  • First of all, I regard the terms constant and value as synonymous. Note, therefore, that there’s a logical difference between a constant and a literal; a literal isn’t a constant but is, rather, a symbol—sometimes referred to as a self-defining symbol—that denotes a constant (as in fact we already know from Chapter 2).

  • Strictly speaking, there’s also a logical difference between a constant and a named constant; a constant is a value, but a named constant is like a variable, except that its value can’t be changed. That said, however, for the remainder of this brief discussion I’ll take the term constant to mean a named constant specifically, for brevity.

  • Constants can be of any type you like, naturally, but relation constants (i.e., constants of some relation type) are my major focus here. Now, Tutorial D doesn’t currently allow users to define their own relation constants, but if it did, a relation constant (or “relcon”) definition would probably look something like this example:

    CONST PERIODIC_TABLE INIT ( RELATION
        { TUPLE { ELEMENT 'Hydrogen' , SYMBOL 'H'  , ATOMICNO  1 } ,
          TUPLE { ELEMENT 'Helium'   , SYMBOL 'He' , ATOMICNO  2 } ,
            ....................................................
          TUPLE { ELEMENT 'Uranium'  , SYMBOL 'U'  , ATOMICNO 92 } } ) ;

Now, I do believe it would be desirable to provide some kind of relation constant or “relcon” functionality along the lines sketched above. In fact, Tutorial D does already provide two system defined relcons: namely, TABLE_DUM and TABLE_DEE, both of which are extremely important, as we know. But that’s the only “relcon” support it provides, and SQL doesn’t provide any at all. It’s true that (as we’ve seen) such support can be simulated by means of the conventional view mechanism; however, there’s a serious logical difference involved here, and I don’t think it helps the cause of understanding to pretend that constants are variables.

VIEWS AND PREDICATES

The Principle of Interchangeability says that views are supposed to “look and feel” just like base relvars. It follows that (a) a view must have a relvar predicate, and further that (b) the parameters to that predicate must correspond one to one to the attributes of the relvar—i.e., the view—in question. However, the predicate that applies to a view V is a derived predicate: It’s derived from the predicates for the relvars in terms of which V is defined, in accordance with the semantics of the relational operations involved in the view defining expression. In fact, you already know this: In Chapter 6, I explained that every relational expression has a corresponding predicate, and of course a view has exactly the predicate that corresponds to its defining expression. For example, consider view LS (“London suppliers”) once again, as defined near the beginning of the section “Views Are Relvars.” That view is a restriction of relvar S, and its predicate is therefore the logical AND of the predicate for S and the restriction condition:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY,

AND

city CITY is London.

Or more colloquially:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in London.

Note, however, that this more colloquial form obscures the fact that CITY is a parameter. Indeed it is a parameter, but the corresponding argument is always the constant value London. (Precisely for this reason, in fact, a more realistic version of view LS would probably project away the CITY attribute. I prefer not to do this here, in order to keep the example and corresponding discussion as simple as possible.)

In similar fashion, the predicate for view NLS can be stated thus:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY, which isn’t London.

RETRIEVAL OPERATIONS

The Principle of Interchangeability implies that (a) users should be able to operate on views as if they were base relvars and (b) the DBMS should be able to map those user operations into suitable operations on the base relvars in terms of which the views are ultimately defined. Note: I say “ultimately defined” here because if views really do behave just like base relvars, then one thing we can do is define further views on top of them, as in this SQL example:

CREATE VIEW LS_STATUS
  AS ( SELECT SNO , STATUS
       FROM   LS ) ;   /* LS is the "London suppliers" view */

In this section, I limit my attention to the mapping of read-only or “retrieval” operations, for simplicity (I remind you that the operations of the relational agebra are indeed all read-only). In fact, the process of mapping a read-only operation on a view to operations on the underlying relvars is in principle quite straightforward. By way of example, suppose we issue this SQL query on the London suppliers view LS (and here I deliberately show all dot qualifications explicitly):

SELECT LS.SNO
FROM   LS
WHERE  LS.STATUS > 10

First, then, the DBMS replaces the reference to the view in the FROM clause by the expression that defines that view, yielding:

SELECT LS.SNO
FROM   ( SELECT S.*
         FROM   S
         WHERE  S.CITY = 'London' ) AS LS
WHERE    LS.STATUS > 10

This expression can now be directly evaluated. However, and for performance reasons perhaps more significantly, it can first be simplified—see the section “Expression Transformation” in Chapter 6—to:

SELECT S.SNO
FROM   S
WHERE  S.CITY = 'London'
AND    S.STATUS > 10

In all likelihood, this latter expression is the one that will actually be evaluated.

Now, it’s important to understand that the foregoing procedure works precisely because of the relational closure property. Closure implies among other things that wherever we’re allowed to use the name of a variable to denote the value of the variable in question—for example, in a query—we can always replace that name by a more general expression (just so long as that expression denotes a value of the appropriate type, of course). In the FROM clause, for example, we can have an SQL table name; thus we can also have a more general SQL table expression, and that’s why we’re allowed to substitute the expression that defines the view LS for the name LS in the example. Note: For obvious reasons, the foregoing procedure (i.e., for implementing read-only operations on views) is known as the substitution procedure.

Incidentally, it’s worth noting that the substitution procedure didn’t always work in early versions of SQL—to be specific, in versions prior to SQL:1992—and the reason was that those early versions didn’t fully support the closure property. As a result, certain apparently innocuous queries against certain apparently innocuous tables (actually views) failed, and failed, moreover, in ways that were hard to explain. Here’s a simple example. First the view definition:

CREATE VIEW V
  AS ( SELECT CITY , SUM ( STATUS ) AS SST
       FROM   S
       GROUP  BY CITY ) ;

Now a query:

SELECT CITY
FROM   V
WHERE  SST > 25

This query failed in the SQL standard, prior to 1992, because simple substitution yielded something like the following syntactically invalid expression:

SELECT CITY
FROM   S
WHERE  SUM ( STATUS ) > 25     /* warning: invalid !!! */
GROUP  BY CITY

(This expression is invalid because SQL doesn’t allow “set function” invocations like SUM(STATUS) to be used in the WHERE clause in this manner.)

Now, the standard has been fixed in this regard, as you probably know;3 however, it doesn’t follow that the products have!—and as a matter of fact, the last time I looked (admittedly some time ago now) there was at least one major product that hadn’t. Indeed, precisely because of problems like the foregoing among others, the product in question actually implements certain view retrievals by materialization instead of substitution; that is, it actually evaluates the view defining expression, builds a table to hold the result of that evaluation, and then executes the requested retrieval against that materialized table. And while such an implementation might be argued to conform to the letter of the relational model, as it were, I don’t think it can be said to conform to the spirit. (It probably won’t perform very well, either. Not to mention the point that, in any case, such a technique clearly won’t work for updates as opposed to retrievals.)

VIEWS AND CONSTRAINTS

To repeat, The Principle of Interchangeability says that views are supposed to “look and feel” just like base relvars. It follows that views not only have relvar predicates like base relvars, they also have relvar constraints like base relvars—by which I mean they have both individual relvar constraints and what in Chapter 8 I called a total relvar constraint (for the relvar in question). As with predicates, however, the constraints that apply to a view V are derived: They’re derived from the constraints for the relvars in terms of which V is defined, in accordance with the semantics of the relational operations involved in the defining expression. By way of example, consider view LS once again. That view is a restriction of relvar S—i.e., its defining expression specifies a restriction operation on relvar S—and so its (total) relvar constraint is the logical AND of the (total) relvar constraint for S and the specified restriction condition. Let’s suppose for the sake of the example that the only constraint that applies to base relvar S is the constraint that {SNO} is a key. Then the total relvar constraint for view LS is the AND of that key constraint and the constraint that the city is London, and view LS is required to satisfy that constraint at all times. (In other words, The Golden Rule applies to views just as it does to base relvars.)

For simplicity, from this point forward I’ll use the term view constraint to refer to any constraint that applies to some view. Now, just because view constraints are always derived in the sense explained above, it doesn’t follow that there’s no need to declare them explicitly. For one thing, the system might not be “intelligent” enough to carry out the inferences needed to determine for itself the constraints that apply to some view; for another, such explicit declarations can at least serve documentation purposes (i.e., they can help explain the semantics of the view in question to users, if not to the system); and there’s another reason too, which I’ll get to in a little while.

I claim, then, that it should be possible to declare explicit constraints for views. In particular, then, it should be possible (a) to include explicit KEY and FOREIGN KEY specifications in view definitions and (b) to allow the target relvar in a FOREIGN KEY specification to be a view. Here’s an example to illustrate possibility (a):

VAR LS VIRTUAL ( S WHERE CITY = 'London' )
    KEY { SNO } ;

Tutorial D does permit such specifications; SQL doesn’t. Recommendation: In SQL, include such specifications in the form of comments. For example:

CREATE VIEW LS
  AS ( SELECT S.*
       FROM   S
       WHERE  S.CITY = 'London'
       /* UNIQUE ( SNO ) */ )
     WITH CHECK OPTION ;

Note: As I’ve said, SQL doesn’t permit view constraints to be formulated explicitly as part of the view definition. However, logically equivalent constraints can always be formulated by means of CREATE ASSERTION (if it’s supported, that is!). More generally, in fact, CREATE ASSERTION allows us to formulate constraints of any kind we like for any table that could be a view if we chose to define it as such—in other words, for any table that can be defined by some arbitrarily complex table expression (which is to say, any table at all).4 I’ll have more to say about this possibility in a few moments.

Now, having said that it should be possible to declare explicit constraints on views, I should now add that sometimes it might be a good idea not to, because it could lead to redundant checking. For example, as I’ve said, the specification KEY{SNO} clearly applies to view LS— but that’s because it applies to base relvar S as well,5 and declaring it explicitly for view LS might conceivably lead to the same constraint being checked twice. (But it should still be stated as part of the view documentation, somehow, because it’s certainly part of the semantics of the view.)

Perhaps more to the point, there definitely are situations where declaring view constraints explicitly could be a good idea. Here’s an example, expressed in SQL for definiteness. We’re given two base tables that look like this (in outline):

CREATE TABLE FDH
     ( FLIGHT ... ,
       DESTINATION ... ,
       HOUR ... ,
       UNIQUE ( FLIGHT ) ) ;

CREATE TABLE DFGP
     ( DAY ... ,
       FLIGHT ... ,
       GATE ... ,
       PILOT ... ,
       UNIQUE ( DAY , FLIGHT ) ) ;

The tables have predicates as follows:6

  • FDH:   Flight FLIGHT leaves at hour HOUR for destination DESTINATION.

  • DFGP: On day DAY, flight FLIGHT with pilot PILOT leaves from gate GATE.

They’re subject to the following constraints (expressed here in a kind of pseudo logical style):

IF ( f1,n1,h ) , ( f2,n2,h ) ∈ FDH AND
   ( d,f1,g,p1 ) , ( d,f2,g,p2 ) ∈ DFGP
THEN f1 = f2 AND p1 = p2 /* and n1 = n2, incidentally */

IF ( f1,n1,h ) , ( f2,n2,h ) ∈ FDH AND
   ( d,f1,g1,p ) , ( d,f2,g2,p ) ∈ DFGP
THEN f1 = f2 AND g1 = g2 /* and n1 = n2, incidentally */

Explanation: The first of these constraints says:

  1. If two rows of FDH, one for flight f1 (with destination n1) and one for flight f2 (with destination n2), have the same HOUR h, and

  2. If two rows of DFGP, one each for the FLIGHTs f1 and f2 from those two FDH rows, have the same DAY d and GATE g, then

  3. Those two FDH rows must be one and the same and those two DFGP rows must be one and the same. In other words, if we know the HOUR, DAY, and GATE, then the FLIGHT and PILOT (and DESTINATION) are determined.

The second constraint is analogous:

  1. If two rows of FDH, one for flight f1 (with destination n1) and one for flight f2 (with destination n2), have the same HOUR h, and

  2. If two rows of DFGP, one each for the FLIGHTs f1 and f2 from those two FDH rows, have the same DAY d and PILOT p, then

  3. Those two FDH rows must be one and the same and those two DFGP rows must be one and the same. In other words, if we know the HOUR, DAY, and PILOT, then the FLIGHT and GATE (and DESTINATION) are determined.

    Now, stating these constraints directly in terms of the two base tables is fairly nontrivial:

    CREATE ASSERTION BTCX1 CHECK
         ( NOT ( EXISTS ( SELECT * FROM FDH AS FX WHERE
                 EXISTS ( SELECT * FROM FDH AS FY WHERE
                 EXISTS ( SELECT * FROM DFGP AS DX WHERE
                 EXISTS ( SELECT * FROM DFGP AS DY WHERE
                 FY.HOUR = FX.HOUR AND
                 DX.FLIGHT = FX.FLIGHT AND
                 DY.FLIGHT = FY.FLIGHT AND
                 DY.DAY = DX.DAY AND
                 DY.GATE = DX.GATE AND
               ( FX.FLIGHT <> FY.FLIGHT OR
                 DX.PILOT <> DY.PILOT ) ) ) ) ) ) ) ;

    CREATE ASSERTION BTCX2 CHECK
         ( NOT ( EXISTS ( SELECT * FROM FDH AS FX WHERE
                 EXISTS ( SELECT * FROM FDH AS FY WHERE
                 EXISTS ( SELECT * FROM DFGP AS DX WHERE
                 EXISTS ( SELECT * FROM DFGP AS DY WHERE
                          FY.HOUR = FX.HOUR AND
                          DX.FLIGHT = FX.FLIGHT AND
                          DY.FLIGHT = FY.FLIGHT AND
                          DY.DAY = DX.DAY AND
                          DY.PILOT = DX.PILOT AND
                        ( FX.FLIGHT <> FY.FLIGHT OR
                          DX.GATE <> DY.GATE ) ) ) ) ) ) ) ;

But stating them in the form of key constraints on a view definition, if that were permitted, would take care of matters nicely:

CREATE VIEW V AS
     ( SELECT * FROM FDH NATURAL JOIN DFGP ,
       UNIQUE ( DAY , FLIGHT )             /* this is      */
       UNIQUE ( DAY , HOUR , GATE ) ,      /* hypothetical */
       UNIQUE ( DAY , HOUR , PILOT ) ) ;   /* syntax !!!!! */

Explanation: The join of FDH and DFGP is on {FLIGHT} and is a one to many join, and it has heading {FLIGHT,DESTINATION,HOUR,DAY,GATE,PILOT}. Moreover, since {DAY,FLIGHT} is a key for the many side of the join (i.e., relvar DFGP), it’s clearly a key for the result. But as we saw from our earlier analysis of the two stated constraints, the following functional dependencies also hold in that result:

{ DAY , HOUR , GATE } → { FLIGHT , PILOT , DESTINATION }

{ DAY , HOUR , PILOT } → { FLIGHT , GATE , DESTINATION }

It follows that {DAY,HOUR,GATE} and {DAY,HOUR,PILOT} are also keys for the result of the join. Hence the UNIQUE specifications shown.

Unfortunately, however, the foregoing solution isn’t available to us, because SQL doesn’t allow UNIQUE specifications (or constraint specifications of any kind, of course) on views. But we can and should at least specify those hypothetical view constraints in terms of suitable assertions, as follows:

CREATE VIEW V AS ( SELECT * FROM FDH NATURAL JOIN DFGP ) ;

CREATE ASSERTION VCX1
       CHECK ( UNIQUE ( SELECT DAY , FLIGHT FROM V ) ) ;

CREATE ASSERTION VCX2
       CHECK ( UNIQUE ( SELECT DAY , HOUR , GATE FROM V ) ) ;

CREATE ASSERTION VCX3
       CHECK ( UNIQUE ( SELECT DAY , HOUR , PILOT FROM V ) ) ;

In fact, of course, we don’t actually need to define the view V in order to define these constraints—we could simply replace the references to view V in the UNIQUE expressions in the constraints by the defining expression for V, like this:7

CREATE ASSERTION VCX1
       CHECK ( UNIQUE ( SELECT DAY , FLIGHT
                        FROM   FDH NATURAL JOIN DFGP ) ) ;

CREATE ASSERTION VCX2
       CHECK ( UNIQUE ( SELECT DAY , HOUR , GATE
                        FROM   FDH NATURAL JOIN DFGP ) ) ;

CREATE ASSERTION VCX3
       CHECK ( UNIQUE ( SELECT DAY , HOUR , PILOT
                        FROM   FDH NATURAL JOIN DFGP ) ) ;

Note: I didn’t mention the point in Chapter 8, but Tutorial D does provide direct support for saying the relation denoted by some relational expression is required to satisfy some key constraint.8 By way of illustration, here are Tutorial D analogs of assertions VCX1, VCX2, and VCX3:

CONSTRAINT VCX1 ( FDH JOIN DFGP ) KEY { DAY , FLIGHT } ;

CONSTRAINT VCX2 ( FDH JOIN DFGP ) KEY { DAY , HOUR , GATE } ;

CONSTRAINT VCX3 ( FDH JOIN DFGP ) KEY { DAY , HOUR , PILOT } ;

UPDATE OPERATIONS

Note: Much of the discussion in this section is based on material from my book View Updating and Relational Theory: Solving the View Update Problem (O’Reilly, 2013). Refer to Appendix G for further details.

I claimed earlier that The Principle of Interchangeability implies that views must be updatable. Now, I can hear some readers objecting right away: Surely some views just can’t be updated, can they? For example, consider a view defined as the join S JOIN P—a many to many join, observe—of relvars S and P on the basis of their sole common attribute CITY; surely we can’t insert a tuple into, or delete a tuple from, that view, can we? Note: I apologize for the sloppy manner of speaking here; as we know from Chapter 5, there’s no such thing as “inserting a tuple” or “deleting a tuple” in the relational model (updates, like all relational operations, are always set level). But to be too pedantic about such matters in the present discussion would get in the way of understanding, probably.

Well, even if were true—which in fact it isn’t, as I’ll show later—that updates can’t be done on a view like S JOIN P, let me point out that some updates can’t be done on some base relvars, either. For example, inserting a tuple into base relvar SP will fail if the SNO value in that tuple doesn’t currently exist in base relvar S. Thus, updates on base relvars can always fail on integrity constraint violations—and the same is true for updates on views. So it isn’t that some views are inherently nonupdatable; rather, it’s just that some updates on some views—like some updates on some base relvars—fail on integrity constraint violations (i.e., violations of The Golden Rule).

Let’s look at a detailed example—not the S JOIN P example (I’ll get back to that one later) but the “London vs. non London suppliers” example once again, involving views LS and NLS. By The Principle of Interchangeability, the behavior of these two relvars, and indeed that of relvar S also, shouldn’t depend on which relvars if any are base ones and which if any are views. Until further notice, therefore, let’s suppose all three are base relvars:

VAR S   BASE RELATION { ... } KEY { SNO } ;
VAR LS  BASE RELATION { ... } KEY { SNO } ;
VAR NLS BASE RELATION { ... } KEY { SNO } ;

As the definitions show, {SNO} is a key for each of these relvars. (As a matter of fact, {SNO} in each of relvars LS and NLS is a foreign key, referencing the key {SNO} in relvar S, though I haven’t bothered to show these constraints explicitly.) The relvars are also clearly subject to the following additional constraints:9

CONSTRAINT ... LS  = ( S WHERE CITY = 'London' ) ;
CONSTRAINT ... NLS = ( S WHERE CITY ≠ 'London' ) ;

What’s more, these constraints taken singly or together imply certain additional ones, as follows:

CONSTRAINT ... IS_EMPTY ( LS  WHERE CITY ≠ 'London' ) ;
CONSTRAINT ... IS_EMPTY ( NLS WHERE CITY = 'London' ) ;

CONSTRAINT ... S = UNION { LS , NLS } ;
CONSTRAINT ... IS_EMPTY ( JOIN { LS { SNO } , NLS { SNO } } ) ;

The first two of these additional constraints are self-explanatory; the third says every supplier is represented in either LS or NLS, and the fourth says no supplier is represented in both. (In other words, the union in the third constraint is actually a disjoint union, and the join in the fourth constraint is actually an intersection.)

Now, in order to ensure these constraints remain satisfied when updates are done, certain compensatory actions (or rules) need to be in effect. In general, a compensatory action is an additional update, over and above some update that’s requested by the user, that’s performed automatically by the DBMS, precisely in order to avoid some integrity violation that might otherwise occur. Cascade delete is a typical example (see Chapter 5).10 In the case at hand, in fact, it should be clear that “cascading” is exactly what we need to deal with DELETE operations in particular. First, deleting tuples from either LS or NLS clearly needs to “cascade” to cause those same tuples to be deleted from S; so we might imagine a couple of compensatory actions— actually cascade delete rules—that look something like this (hypothetical syntax):

ON DELETE ls FROM LS : DELETE ls FROM S ;

ON DELETE nls FROM NLS : DELETE nls FROM S ;

Second, deleting tuples from S clearly needs to “cascade” to cause those same tuples to be deleted from whichever of LS or NLS they appear in:

ON DELETE s FROM S : DELETE ( s WHERE CITY = 'London' ) FROM LS ,
                     DELETE ( s WHERE CITY ≠ 'London' ) FROM NLS ;

Aside: Given that an attempt (via DELETE, as opposed to I_DELETE) to delete a nonexistent tuple has no effect—see Chapter 5—this latter rule could be simplified to just this:

ON DELETE s FROM S : DELETE s FROM LS , DELETE s FROM NLS ;

However, the original formulation is perhaps preferable, inasmuch as it’s clearly more specific. End of aside.

Analogously, we’ll need some compensatory actions (“cascade insert rules”) for INSERT operations:

ON INSERT ls INTO LS : INSERT ls INTO S ;

ON INSERT nls INTO NLS : INSERT nls INTO S ;

ON INSERT s INTO S : INSERT ( s WHERE CITY = 'London' ) INTO LS ,
                     INSERT ( s WHERE CITY ≠ 'London' ) INTO NLS ;

As for explicit UPDATE operations, they can be regarded, at least in the case at hand, as a DELETE followed by an INSERT; in other words, the compensatory actions for UPDATE are just a combination of the actions for DELETE and INSERT, loosely speaking.11 For example, consider the following UPDATE on relvar S:

UPDATE S WHERE SNO = 'S1' : { CITY := 'Oslo' } ;

What happens here is this:

  1. The existing tuple for supplier S1 is deleted from relvar S and (thanks to the cascade delete rule from S to LS) from relvar LS also.

  2. Another tuple for supplier S1, with CITY value Oslo, is inserted into relvar S and (thanks to the cascade insert rule from S to NLS) into relvar NLS also. In other words, the tuple for supplier S1 has moved from relvar LS to relvar NLS!—now speaking very loosely, of course.

    Suppose now that the original UPDATE had been directed at relvar LS rather than relvar S:

    UPDATE LS WHERE SNO = 'S1' : { CITY := 'Oslo' } ;

Now what happens is this:

  1. The existing tuple for supplier S1 is deleted from relvar LS and (thanks to the cascade delete rule from LS to S) from relvar S also.

  2. An attempt is made to insert another tuple for supplier S1, with CITY value Oslo, into relvar LS. This attempt fails, however, because it violates the constraint on that relvar that the CITY value must always be London. So the update fails overall; the first step (viz., deleting the original tuple for supplier S1 from LS and S) is undone, and the net effect is that the database remains unchanged.

And now I come to my real point: Everything I’ve said in this discussion so far applies pretty much unchanged if some or all of the relvars concerned are views. For example, suppose as we originally did that S is a base relvar and LS and NLS are views:

VAR S   BASE RELATION { ................. } KEY { SNO } ;
VAR LS  VIRTUAL ( S WHERE CITY = 'London' ) KEY { SNO } ;
VAR NLS VIRTUAL ( S WHERE CITY ≠ 'London' ) KEY { SNO } ;

Now consider a user who sees only views LS and NLS, but wants to be able to behave as if those views were actually base relvars. Of course, that user will be aware of the corresponding relvar predicates, which as we saw earlier are essentially as follows:

  • LS:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY (which is London).

  • NLS:

Supplier SNO is under contract, is named SNAME, has status STATUS, and is located in city CITY (which isn’t London).

That same user will also be aware of the following constraints (as well as the fact that {SNO} is a key for both relvars):

CONSTRAINT ... IS_EMPTY ( LS  WHERE CITY ≠ 'London' ) ;
CONSTRAINT ... IS_EMPTY ( NLS WHERE CITY = 'London' ) ;
CONSTRAINT ... IS_EMPTY ( JOIN { LS { SNO } , NLS { SNO } } ) ;

However, the user won’t be aware of any of the compensatory actions as such, precisely because that user isn’t aware that LS and NLS are actually views of relvar S; in fact, the user won’t even be aware of the existence of relvar S (which is why the user is also unaware of the constraint that the union of LS and NLS is equal to S). But updates by that user on relvars LS and NLS will all work correctly, just as if LS and NLS really were base relvars.

What about a user who sees only view LS, say (i.e., not view NLS and not base relvar S), but still wants to behave as if LS were a base relvar? Well, that user will certainly be aware of the pertinent relvar predicate and the following constraint:

CONSTRAINT ... IS_EMPTY ( LS WHERE CITY ≠ 'London' ) ;

Clearly, this user mustn’t be allowed to insert tuples into this relvar, nor to update supplier numbers within this relvar, because such operations have the potential to violate constraints of which this user is unaware (and must be unaware). Again, however, there are parallels with base relvars as such: With base relvars in general, it’ll be the case that certain users will be prohibited from performing certain updates on certain relvars (e.g., consider a user who sees only base relvar SP and not base relvar S). So this state of affairs doesn’t in and of itself constitute a violation of The Principle of Interchangeability, either.

One last point: Please understand that I’m not suggesting that the DBA should have to specify, explicitly, all of the various constraints and compensatory actions that apply in connection with any given view. Au contraire: In many cases if not all, I believe the DBMS should be able to determine those constraints and actions for itself, automatically, from the pertinent view definitions.12

From the foregoing discussion, I hope you can see that it’s not that updates are intrinsically impossible on views; rather, it’s just that some updates on some views fail on a violation of The Golden Rule. It follows that in order to support updates on a view V properly, the system needs to know the total relvar constraint, VC say, that applies to V. In other words, it needs to be able to perform constraint inference, so that, given the constraints that apply to the relvars in terms of which V is defined, it can determine VC. As I’m sure you realize, however, SQL products today do, or are capable of doing, very little in the way of such constraint inference. As a result, SQL’s support for view updating is quite weak (and this is true of the standard as well as the major products). I’ll have more to say regarding the specifics of SQL’s view updating support in the next subsection but one (“More on SQL”). First, however, I want discuss one particular aspect of that support in some detail: namely, the CHECK option.

The CHECK Option

Consider the following SQL INSERT on view LS (“London suppliers”) from the previous subsection:

INSERT INTO LS ( SNO , SNAME , STATUS , CITY )
       VALUES  ( 'S6', .............. , 'Madrid' ) ;

This INSERT maps to:

INSERT INTO S  ( SNO , SNAME , STATUS , CITY )
       VALUES  ( 'S6', .............. , 'Madrid' ) ;

(The only change is to the target table name.) Observe now that the new row violates the constraint for view LS, because the city isn’t London. So what happens? By default, SQL will insert that row into base table S; however, precisely because it doesn’t satisfy the defining expression for view LS, it won’t be visible through that view. From the perspective of that view, in other words, the new row just drops out of sight (alternatively, we might say the INSERT is a “no op”—again, from the perspective of the view). Actually, however, what’s happened from the perspective of view LS is that The Assignment Principle has been violated! (Recall from Chapter 5 and elsewhere that The Assignment Principle states that after assignment of value v to variable V, the comparison v = V must evaluate to TRUE.)

Now, I hope it goes without saying that the foregoing behavior is logically incorrect. It wouldn’t be tolerated in Tutorial D. As for SQL, the CHECK option is provided to address the problem: If (but only if) WITH CASCADED CHECK OPTION is specified for a given view, then updates to that view are required to conform to the defining expression for that view. Recommendation: Specify WITH CASCADED CHECK OPTION on view definitions whenever possible. Be aware, however, that SQL permits such a specification only if it regards the view as updatable,13 and (as we’ll see in the next subsection) not all logically updatable views are regarded as such in SQL.

Note: The alternative to CASCADED is LOCAL, but don’t use it. (The reason I say this is that the semantics of LOCAL are bizarre in the extreme—so bizarre, in fact, that (a) I don’t want to waste time and space and energy attempting to explain them here, and in any case (b) it’s hard to see why anyone would ever want such semantics. Indeed, it’s hard to resist the suspicion that LOCAL was included in the standard originally for no other reason than to allow certain flawed implementations, extant at the time, to be able to claim conformance.) It’s all right to specify neither CASCADED nor LOCAL, however, because CASCADED is the default.

More on SQL

As we’ve seen, SQL’s support for view updating is limited. It’s also extremely hard to understand!—in fact, the standard is even more impenetrable in this area than it usually is. The following extract (which is quoted verbatim from the 2003 version of the standard, SQL:2003) gives some idea of the complexities involved:

[The] <query expression> QE1 is updatable if and only if for every <query expression> or <query specification> QE2 that is simply contained in QE1:

  1. QE1 contains QE2 without an intervening <query expression body> that specifies UNION DISTINCT, EXCEPT ALL, or EXCEPT DISTINCT.

  2. If QE1 simply contains a <query expression body> QEB that specifies UNION ALL, then:

    1. QEB immediately contains a <query expression> LO and a <query term> RO such that no leaf generally underlying table of LO is also a leaf generally underlying table of RO.

    2. For every column of QEB, the underlying columns in the tables identified by LO and RO, respectively, are either both updatable or not updatable.

  3. QE1 contains QE2 without an intervening <query term> that specifies INTERSECT.

  4. QE2 is updatable.

Here’s my own gloss on the foregoing extract:

  • First of all, it doesn’t even seem to make sense, at least on the face of it. To be specific, the opening sentence says, in effect, that four conditions a), b), c), and d) have to be satisfied “for every ... QE2 that is simply contained in QE1”—yet item b) in particular has nothing to do with QE2 (indeed, it doesn’t even mention it).

  • Next, even if I’m wrong and the extract does make sense, note that it states just one of the many rules that have to be taken in combination in order to determine whether a given view is updatable in SQL.

  • The rules in question aren’t given all in one place but are scattered over many different portions of the standard.

  • All of those rules rely on a variety of additional concepts and constructs—updatable columns, leaf generally underlying tables, <query term>s, and so on—that are in turn defined in still further portions of the standard.

Because of such considerations, I won’t even attempt a precise characterization here of just which views SQL regards as updatable. Loosely speaking, however, they do at least include the following:

  1. Views defined as a restriction and/or projection of a single base table

  2. Views defined as a one to one or one to many join of two base tables (in the one to many case, only the many side is updatable)

  3. Views defined as a UNION ALL or INTERSECT of two distinct base tables

  4. Certain combinations of Cases 1-3 above

But even these limited cases are treated incorrectly, thanks to SQL’s lack of proper support for (a) constraint inference, (b) The Golden Rule, and (c) The Assignment Principle, and thanks also to the fact that SQL permits (d) nulls and (e) duplicate rows. And the picture is complicated still further by the fact that SQL identifies four distinct cases: A view in SQL can be updatable, potentially updatable, simply updatable, or insertable into. Now, the standard does define these terms formally, but it gives no insight into their intuitive meaning or why they were given those names. However, I can at least say that “updatable” refers to UPDATE and DELETE and “insertable into” refers to INSERT, and a view can’t be insertable into unless it’s updatable.14 But note the suggestion that some views might permit some updates but not others (e.g., DELETEs but not INSERTs), and the further suggestion that it’s therefore possible that DELETE and INSERT might not be inverses of each other. Both of these facts, if facts they are, I regard as further violations of The Principle of Interchangeability.

Regarding Case 1 above, however, I can be a little more precise. To be specific, an SQL view is certainly updatable if all of the following conditions are satisfied:

  • The defining expression is either (a) a simple SELECT expression (not a UNION, INTERSECT, or EXCEPT involving two such expressions) or (b) an “explicit table” (see Chapter 12) that’s logically equivalent to such an expression. Note: I’ll assume for simplicity in what follows that Case (b) is automatically converted to Case (a).

  • The SELECT clause in that SELECT expression implicitly or explicitly specifies ALL, not DISTINCT.

  • After expansion of any “asterisk style” items, every item in the SELECT item commalist is a simple column name (possibly dot qualified, and possibly with a corresponding AS specification), and no such item appears more than once.

  • The FROM clause in that SELECT expression takes the form FROM T [AS ...], where T is the name of an updatable table (either a base table or an updatable view).

  • The WHERE clause, if any, in that SELECT expression contains no subquery in which the FROM clause references T.

  • The SELECT expression has no GROUP BY or HAVING clause.

Recommendation: Lobby the SQL vendors to improve their support for view updating as soon as possible.

The S JOIN P Example

Now I’d like to come back to the S JOIN P example. The truth is, the example discussed in detail earlier (London vs. non London suppliers) was so simple that I suspect some readers might still be harboring doubts about my general claim—my claim, that is, that all views are updatable, modulo only possible Golden Rule violations. In an attempt to buttress that claim further, therefore, I want to examine a case that, historically, many people have regarded as “impossible”: to be specific, the many to many join case, of which S JOIN P is an example.

First let me simplify matters somewhat by eliminating considerations that are irrelevant to my purpose. To be specific, let’s assume, purely for the purposes of the present discussion, that relvar S has just two attributes, SNO and CITY, and relvar P also has just two attributes, PNO and CITY. Now let’s define their join as a view called SCP:

VAR SCP VIRTUAL ( S JOIN P ) KEY { SNO , PNO } ;

Sample values are shown in Fig. 9.1. Note: As you can see from that figure, I’ve dropped from our usual sample values any supplier whose city isn’t also a part city and any part whose city isn’t also a supplier city. Please note, however, that I don’t intend to maintain this simplification throughout the discussion that follows; that is, I’m not going to assume for the purposes of that discussion that every supplier city has to be a part city and vice versa, even though I do need to simplify the presentation somewhat for space and other reasons.

image

Fig. 9.1: Relvars S, P, and SCP—sample values

As in the case of London vs. non London suppliers, the first thing I’m going to do is see what happens if we think of view SCP as just another base relvar, living alongside the base relvars in terms of which it’s defined. Clearly, then, the following constraint holds:

CONSTRAINT ... SCP = S JOIN P ;

Now let’s consider some updates on relvar S (since the roles played by suppliers and parts are clearly symmetric in this example, there’s no need to consider updates on relvar P as well). First an INSERT:15

INSERT ( S5 , Athens ) INTO S ;

It should be clear that this INSERT does exactly what it says, no more and no less. And the same goes for the following DELETE, which removes the tuple just inserted:

DELETE ( S5 , Athens ) FROM S ;

But what about this INSERT?—

INSERT ( S7 , Paris ) INTO S ;

The point here is, of course, that there are some parts in Paris, viz., parts P2 and P5. Thus, this INSERT can and will succeed, just so long as it additionally has the effect of inserting the following tuples into relvar SCP:

( S7 , Paris , P2 )
( S7 , Paris , P5 )

Moreover, the following DELETE can and will now succeed, just so long as it has the additional effect of removing those extra tuples from relvar SCP:

DELETE ( S7 , Paris ) FROM S ;

From these examples and others like them, I hope it’s clear that the following compensatory actions are appropriate:

ON INSERT s INTO S : INSERT ( P JOIN s ) INTO SCP ;
ON INSERT p INTO P : INSERT ( S JOIN p ) INTO SCP ;

ON DELETE s FROM S : DELETE ( P JOIN s ) FROM SCP ;
ON DELETE p FROM P : DELETE ( S JOIN p ) FROM SCP ;

I turn now to updates on the join SCP. The compensatory action (or rule) for INSERT is fairly obvious:

ON INSERT i INTO SCP :
   INSERT i { SNO , CITY } INTO S ,
   INSERT i { PNO , CITY } INTO P ;

Note: I say this rule is obvious, but its consequences might not be, at least not immediately. Let’s look at a couple of examples, using the sample values from Fig. 9.1:

  1. Suppose we insert (S9,London,P1) into SCP. This INSERT will cause (S9,London) to be inserted into S but will have no effect on P, because (P1,London) already appears in P. But inserting (S9,London) into S will cause the insert rule for S to come into play, and the net effect will be that (S9,London,P4) and (S9,London,P6) will be inserted into SCP in addition to the originally requested tuple (S9,London,P1).

  2. Suppose we insert (S7,Paris,P7) into SCP. The net effect will be to insert (S7,Paris) into S, (P7,Paris) into P, and all of the following tuples into SCP:

    ( S7 , Paris , P7 )

    ( S7 , Paris , P2 )
    ( S7 , Paris , P5 )

    ( S2 , Paris , P7 )
    ( S3 , Paris , P7 )

Now, the foregoing insert rule (for inserts on SCP) might loosely be characterized as “Insert S subtuples unless they already exist and insert P subtuples unless they already exist.” Thus, intuition and symmetry both suggest that the delete rule (for deletes on SCP) should be “Delete S subtuples unless they exist elsewhere and delete P subtuples unless they exist elsewhere.”16 Formally:

ON DELETE d FROM SCP :
   DELETE ( ( S MATCHING d ) NOT MATCHING SCP ) FROM S ,
   DELETE ( ( P MATCHING d ) NOT MATCHING SCP ) FROM P ;

Again let’s consider some examples, using the sample values from Fig. 9.1:

  1. Suppose we delete all tuples from SCP where the city is Paris. This DELETE will cascade to delete the tuples for suppliers S2 and S3 from relvar S and the tuples for parts P2 and P5 from relvar P.

  2. Suppose we delete all tuples for supplier S1 from SCP. This DELETE will cascade to delete the tuple (S1,London) from relvar S but will have no effect on relvar P, because SCP still contains some tuples where the city is London—to be specific, the tuples (S4,London,P1), (S4,London,P4), and (S4,London,P6).

  3. Suppose we attempt to delete just the tuple (S1,London,P1) from SCP. This attempt must fail; since SCP contains other tuples for both supplier S1 and part P1, the attempted DELETE has no effect on relvars S and P, and so if it were allowed to succeed we would have a Golden Rule violation on our hands (to be specific, SCP would no longer be equal to the join of S and P).

I’ll leave it as an exercise for you to show that, given the foregoing insert and delete actions, explicit UPDATEs all work as intuitively expected.

And now, as I’m sure you’ve been expecting, I’m going to claim that everything I’ve been saying in the foregoing discussion applies pretty much unchanged if some or all of the relvars concerned are views. In particular, let S and P be base relvars as usual and let SCP be a view:

VAR S   BASE RELATION { ................. } KEY { SNO } ;
VAR P   BASE RELATION { ................. } KEY { PNO } ;
VAR SCP VIRTUAL ( S JOIN P ) KEY { SNO , PNO } ;

Now consider a user who sees only relvar SCP (the view). As far as that user is concerned, then, that view will behave in all respects exactly as if it were a base relvar (though it’s only fair to point out that the behavior in question won’t be entirely straightforward, as I’ll explain in a moment). The predicate is:

Supplier SNO and part PNO both have city CITY.

The user will be aware of this predicate, and aware also of the fact that {SNO,PNO} is a key. Moreover, the user will also be aware that the following functional dependencies (FDs) hold as well:

{ SNO } → { CITY }
{ PNO } → { CITY }

These FDs are effectively inherited from relvars S and P, respectively.

Now, I didn’t point this out before, but in fact the following multivalued dependencies (MVDs) also hold in SCP (and the user will be aware of these MVDs, too):

{ CITY } →→ { SNO } | { PNO }

I don’t want to get into details about MVDs in general (they’re discussed in depth in the book Database Design and Relational Theory: Normal Forms and All That Jazz, a companion to the present book). All I want to say here is that (a) the fact that these particular MVDs hold mean that relvar SCP isn’t in fourth normal form (4NF),17 and (b) together, these MVDs are equivalent to the following constraint:

CONSTRAINT ... SCP = JOIN { SCP { SNO , CITY } , SCP { PNO , CITY } } ;

(i.e., SCP is equal at all times to the join of its projections on {SNO,CITY} and {PNO,CITY}).

Since SCP isn’t in 4NF, there are bound to be situations where updating it turns out to be a little awkward. (Incidentally, note that this observation is valid regardless of whether SCP is a base relvar or a view.) Let me be more specific. First of all, updates in general must abide by those MVDs, of course. Second, INSERTs in particular are subject to the following rule:

ON INSERT i INTO SCP :
   INSERT ( SCP JOIN i { SNO , CITY } ) INTO SCP ,
   INSERT ( SCP JOIN i { PNO , CITY } ) INTO SCP ;

This rule might look a little complicated, but it’s basically just a combination of the earlier rules for INSERTs on S, P, and SCP, revised to eliminate references to S and P as such.

Aside: Observe the implication that such a rule ought to, and indeed does, apply even in the case where SCP is a base relvar and relvars S and P don’t exist (or are hidden); indeed, we could have arrived at this rule by considering just relvar SCP in isolation. Observe further that I don’t give a corresponding delete rule. In fact, however, DELETEs on SCP will always fail unless they request, explicitly or implicitly, deletion of all tuples for some particular supplier(s) and/or deletion of all tuples for some particular part(s). E.g., given the sample values shown in Fig. 9.1, a request to delete just the tuple (S1,London,P1) will fail, while a request to delete all Paris tuples will succeed (as indeed we saw earlier in both cases). End of aside.

I’d like to close this rather lengthy section on view updating by repeating something I said earlier (because I think it’s important): Please understand that I’m not suggesting that the DBA should have to specify, explicitly, all of the various constraints and compensatory actions that apply in connection with any given view. On the contrary, I believe that (in many cases if not all) the DBMS should be able to determine those constraints and actions for itself, automatically, from the pertinent view definitions.

WHAT ARE VIEWS FOR?

So far in this chapter, I’ve been tacitly assuming you already know what views are for—but now I’d like to say something about that topic nonetheless. In fact, views serve two rather different purposes:

  • The user who actually defines view V is, obviously, aware of the corresponding defining expression exp. Thus, that user can use the name V wherever the expression exp is intended; however, such uses are basically just shorthand, and are explicitly understood to be just shorthand by the user in question. (What’s more, the user in question is unlikely to request any updates on V—though if such updates are requested, they must perform as expected, of course.)

  • By contrast, a user who’s merely informed that V exists and is available for use is supposed (at least ideally) not to be aware of the expression exp; to that user, in fact, V is supposed to look and feel just like a base relvar, as I’ve already explained at length. And it’s this second use of views that’s the really important one, and the one I’ve been concentrating on, tacitly, throughout this chapter prior to this point.

Logical Data Independence

The second of the foregoing purposes is intimately related to the question of logical data independence. Recall from Chapter 1 that physical data independence means we can change the way the data is physically stored and accessed without having to make corresponding changes in the way the data is perceived by the user. Reasonably enough, then, logical data independence means we can change the way the data is logically stored and accessed without having to make corresponding changes in the way the data is perceived by the user. And it’s views that are supposed to provide that logical data independence.

By way of example, suppose that for some reason (the precise reason isn’t important here) we wish to replace base relvar S by base relvars LS and NLS, as follows:

VAR LS BASE RELATION        /* London suppliers */
  { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;

VAR NLS BASE RELATION       /* non London suppliers */
  { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;

As we saw earlier, the old relvar S is the disjoint union of the two new relvars LS and NLS (and LS and NLS are both restrictions of that old relvar S). So we can define a view that’s exactly that union, and name it S:

VAR S VIRTUAL ( LS D_UNION NLS ) KEY { SNO } ;

(Note that now I’ve specified D_UNION instead of UNION, for explicitness.) Any expression that previously referred to base relvar S will now refer to view S instead. Thus, assuming the system supports operations on views correctly—unfortunately a rather large assumption, given the state of today’s products—users will be immune to this particular change in the logical structure of the database.

I note in passing that replacing the original suppliers relvar S by its two restrictions LS and NLS isn’t a totally trivial matter. In particular, something might have to be done about the shipments relvar SP, since that relvar has a foreign key that references the original suppliers relvar S. See Exercise 9.8 at the end of the chapter.

VIEWS AND SNAPSHOTS

Throughout this chapter, I’ve been using the term view in its original sense—the sense, that is, in which (in the relational context, at least) it was originally defined. Unfortunately, however, some terminological confusion has arisen in recent years: certainly in the academic world, and to some extent in the commercial world also. Recall that a view can be thought of as a derived relvar. Well, there’s another kind of derived relvar too, called a snapshot. As the name might perhaps suggest, a snapshot, although it’s derived, is real, not virtual—meaning it’s represented not just by its definition in terms of other relvars, but also, at least conceptually, by its own separate copy of the data. For example (to invent some syntax on the fly):

VAR LSS SNAPSHOT ( S WHERE CITY = 'London' )
    KEY { SNO }
    REFRESH EVERY DAY ;

Defining a snapshot is just like executing a query, except that:

  • The result of the query is saved in the database under the specified name (LSS in the example) as a “read-only relvar” (read-only, that is, apart from the periodic refresh—see the bullet item immediately following).

  • Periodically (EVERY DAY in the example) the snapshot is refreshed, meaning its current value is discarded, the query is executed again, and the result of that new execution becomes the new snapshot value. Of course, other REFRESH options are possible too: for example, EVERY MONDAY, EVERY 5 MINUTES, EVERY MONTH, and so on.

In the example, therefore, snapshot LSS represents the data as it was at most 24 hours ago.

Snapshots are important in data warehouses, distributed systems, and many other contexts. In all such cases, the rationale is that applications can often tolerate—in some cases even require—data “as of” some particular point in time. Reporting and accounting applications are a case in point; such applications typically require the data to be frozen at an appropriate moment (for example, at the end of an accounting period), and snapshots allow such freezing to occur without locking out other applications.

So far, so good. The problem is, snapshots have come to be known (at least in some circles) not as snapshots at all but as materialized views. But they aren’t views! Views aren’t supposed to be materialized at all;18 as we’ve seen, operations on views are supposed to be implemented by mapping them into suitable operations on the underlying relvars. Thus, “materialized view” is simply a contradiction in terms. Worse yet, the unqualified term view is now often taken to mean a “materialized view” specifically—again, at least in some circles—and so we’re in danger of no longer having a good term to mean a view in the original sense. In this book I do use the term view in its original sense, but be warned that it doesn’t always have that meaning elsewhere. Recommendations: Never use the term view, unqualified, to mean a snapshot; never use the term materialized view; and watch out for violations of these recommendations on the part of others!

EXERCISES

9.1 Define a view consisting of supplier-number : part-number pairs for suppliers and parts that aren’t colocated. Give both Tutorial D and SQL definitions.

9.2 Let view LSSP be defined as follows (SQL):

CREATE VIEW LSSP
  AS ( SELECT SNO , SNAME , STATUS , PNO , QTY
       FROM   S NATURAL JOIN SP
       WHERE  CITY = 'London' ) ;

Here’s a query on this view:

SELECT DISTINCT STATUS , QTY
FROM   LSSP
WHERE  PNO IN
     ( SELECT PNO
       FROM   P
       WHERE  CITY <> 'London' )

What might the query that’s actually executed on the underlying base tables look like?

9.3 What key(s) does view LSSP from Exercise 9.2 have? What’s the predicate for that view?

9.4 Given the following Tutorial D view definition—

VAR HP VIRTUAL ( P WHERE WEIGHT > 14.0 ) KEY { PNO } ;

—show the converted form after the substitution procedure has been applied for each of the following expressions and statements:

  1. HP WHERE COLOR = 'Green'

  2. ( EXTEND HP : { W := WEIGHT + 5.3 } ) { PNO , W }

  3. INSERT HP RELATION { TUPLE { PNO 'P9' , PNAME 'Screw' , WEIGHT 15.0 ,
                                 COLOR 'Purple' , CITY 'Rome' } } ;

  4. DELETE HP WHERE WEIGHT < 9.0 ;

  5. UPDATE HP WHERE WEIGHT = 18.0 : { COLOR := 'White' } ;

9.5 Give SQL solutions to Exercise 9.4.

9.6 Give as many reasons as you can think of for wanting to be able to declare keys for a view.

9.7 Using either the suppliers-and-parts database or any other database you happen to be familiar with, give some further examples (over and above the London vs. non London suppliers example, that is) to illustrate the point that which relvars are base and which virtual is largely arbitrary.

9.8 In the body of the chapter, in the discussion of logical data independence, I discussed the possibility of restructuring—i.e., changing the logical structure of—the suppliers-and-parts database by replacing base relvar S by two of its restrictions (LS and NLS). However, I also noted that such a replacement wasn’t a completely trivial matter. Why not?

9.9 Investigate any SQL product available to you:

  1. Are there any apparently legitimate queries on views that fail in that product? If so, state as precisely as you can which ones they are. What justification does the vendor offer for failing to provide full support?

  2. What updates on what views does that product support? Be as precise as you can in your answer. Are the view updating rules in that product identical to those in the SQL standard?

  3. More generally, in what ways—there will be some!—does that product violate The Principle of Interchangeability?

9.10 Distinguish between views and snapshots. Does SQL support snapshots? Does any product that you’re aware of?

9.11 What’s a “materialized view”? Why is the term deprecated?

9.12 Consider the suppliers-and-parts database, but ignore relvar P for simplicity. Here in outline are two possible designs for suppliers and shipments:

  1. S  { SNO , SNAME , STATUS , CITY }
    SP { SNO , PNO , QTY }

  2. SSP { SNO , SNAME , STATUS , CITY , PNO , QTY }
    XSS { SNO , SNAME , STATUS , CITY }

Design a. is as usual. In Design b., by contrast, relvar SSP contains a tuple for every shipment, giving the applicable part number and quantity and full supplier details, and relvar XSS contains supplier details for suppliers who supply no parts at all. (Are these designs information equivalent?) Write view definitions to express Design b. as views of Design a. and vice versa. Also, show the applicable constraints for each design. Does either design have any obvious advantages over the other? If so, what are they?

9.13 Following on from the previous exercise: In the body of the chapter, I said two database designs were information equivalent if they represented the same information (meaning that for every query on one, there’s a logically equivalent query on the other). But can you pin down this notion more precisely?

9.14 Views are supposed to provide logical data independence. But didn’t I say in Chapter 6 that a hypothetical mechanism called “public tables” was supposed to perform that task? How do you account for the discrepancy?

ANSWERS

9.1

VAR NON_COLOCATED VIRTUAL
    ( ( S { SNO } JOIN P { PNO } ) NOT MATCHING ( S JOIN P ) )
    KEY { SNO , PNO } ;

CREATE VIEW NON_COLOCATED
  AS ( SELECT SNO , PNO
       FROM   S , P
       WHERE  S.CITY <> P.CITY
       /* UNIQUE ( SNO , PNO ) */ ) ;

9.2 Substituting the view definiting expression for the view reference in the outer FROM clause, we obtain:

SELECT DISTINCT STATUS , QTY
FROM ( SELECT SNO , SNAME , STATUS , PNO , QTY
       FROM   S NATURAL JOIN SP
       WHERE  CITY = 'London' ) AS LSSP
WHERE PNO IN
     ( SELECT PNO
       FROM   P
       WHERE  CITY <> 'London' )

This simplifies (potentially!) to:

SELECT DISTINCT STATUS , QTY
FROM   S NATURAL JOIN SP
WHERE  CITY = 'London'
AND    PNO IN
     ( SELECT PNO
       FROM   P
       WHERE  CITY <> 'London' )

9.3 The sole key is {SNO,PNO}. The predicate is: Supplier SNO is under contract, is named SNAME, has status STATUS, is located in London, and supplies part PNO in quantity QTY.

9.4 Note that a. and b. are expressions, the rest are statements.

  1. ( P WHERE WEIGHT > 14.0 ) WHERE COLOR = 'Green'

    This expression can be simplified to:

    P WHERE WEIGHT > 14.0 AND COLOR = 'Green'

    The simplification is worth making, too, because the first formulation implies (or at least suggests) two passes over the data while the second implies just one.

  2. ( EXTEND ( P WHERE WEIGHT > 14.0 ) :
                     { W := WEIGHT + 5.3 } ) { PNO , W }

  3. INSERT ( P WHERE WEIGHT > 14.0 )
           RELATION { TUPLE { PNO 'P9' , PNAME 'Screw' , WEIGHT 15.0 ,
                                         COLOR 'Purple' , CITY 'Rome' } } ;

    Observe that this INSERT is logically equivalent to a relational assignment in which the target is specified as something other than a simple relvar reference. The ability to update views implies that such assignments must indeed be legitimate, both syntactically and semantically, although the corresponding syntax isn’t currently supported in Tutorial D (neither for assignment in general nor for INSERT in particular). Note: Similar but not identical remarks apply to parts d. and e. below.

  4. DELETE ( P WHERE WEIGHT > 14.0 ) WHERE WEIGHT < 9.0 ;

    This syntax is currently illegal, although oddly enough the following (which is obviously logically equivalent to that just shown) is legal:

    DELETE P WHERE WEIGHT > 14.0 AND WEIGHT < 9.0 ;

    Of course, this DELETE is actually a “no op,” because WEIGHT > 14.0 AND WEIGHT < 9.0 is a logical contradiction. Do you think the optimizer would be able to recognize this fact?

  5. UPDATE ( P WHERE WEIGHT > 14.0 ) WHERE WEIGHT = 18.0 :
                                                  { COLOR := 'White' } ;

    Again this syntax is currently illegal, but the following is legal:

    UPDATE P WHERE WEIGHT > 14.0 AND WEIGHT = 18.0 :
                                            { COLOR := 'White' } ;

    Do you think the optimizer would be able to recognize the fact that the restriction condition WEIGHT > 14.0 here can be ignored?

9.5 Here first is an SQL version of the view definition from Exercise 9.4:

CREATE VIEW HP AS
    ( SELECT PNO , PNAME , COLOR , WEIGHT , CITY
      FROM   P
      WHERE  WEIGHT > 14.0
      /* UNIQUE ( PNO ) * / ) ;

For parts a.-e., I first show an SQL analog of the Tutorial D formulation, followed by the expanded form:

  1. SELECT HP.PNO , HP.PNAME , HP.COLOR , HP.WEIGHT , HP.CITY
    FROM   HP
    WHERE  HP.COLOR = 'Green'

    SELECT HP.PNO , HP.PNAME , HP.COLOR , HP.WEIGHT , HP.CITY
    FROM ( SELECT PNO , PNAME , COLOR, WEIGHT , CITY
           FROM   P
           WHERE  WEIGHT > 14.0 ) AS HP
    WHERE  HP.COLOR = 'Green'

    I leave further simplification, here and in subsequent parts, as a subsidiary exercise (barring explicit statements to the contrary).

  2. SELECT PNO , WEIGHT + 5.3 AS W
    FROM   HP

    SELECT HP.PNO , HP.WEIGHT + 5.3 AS W
    FROM ( SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , P.CITY
           FROM   P
           WHERE  P.WEIGHT > 14.0 ) AS HP

  3. INSERT INTO HP ( PNO , PNAME , WEIGHT , COLOR , CITY )
           VALUES ( 'P9' , 'Screw' , 15.0 , 'Purple' , 'Rome' ) ;

    INSERT INTO ( SELECT P.PNO , P.PNAME , P.WEIGHT , P.COLOR , P.CITY
                  FROM   P
                  WHERE  P.WEIGHT > 14.0 ) AS HP
           VALUES ( 'P9' , 'Screw' , 15.0 , 'Purple' , 'Rome' ) ;

    The remarks regarding Tutorial D in the solution to Exercise 9.4c apply here also, mutatis mutandis.

  4. DELETE FROM ( SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , P.COLOR
                  FROM   P
                  WHERE  P.WEIGHT > 14.0 ) AS HP
    WHERE HP.WEIGHT < 9.0 ;

    This transformed version isn’t valid SQL syntax, but this time a valid equivalent is a little easier to find:

    DELETE FROM P WHERE WEIGHT > 14.0 AND WEIGHT < 9.0 ;

    (As noted in the answer to Exercise 9.4d, this DELETE is actually a “no op.”)

  5. UPDATE ( SELECT P.PNO , P.PNAME , P.COLOR , P.WEIGHT , P.COLOR
             FROM   P
             WHERE  P.WEIGHT > 14.0 ) AS HP
    SET    COLOR = 'White'
    WHERE  HP.WEIGHT = 18.0 ;

    Syntactically valid equivalent:

    UPDATE P
    SET    COLOR = 'White'
    WHERE  WEIGHT = 18.0 AND WEIGHT > 14.0 ;

9.6 Here are some:

  • If users are to operate on views instead of base relvars, it’s clear that those views should look to the user as much like base relvars as possible. In accordance with The Principle of Interchangeability, in fact, the user shouldn’t have to know they’re views at all but should be able to treat them as if they were base relvars. And just as the user of a base relvar needs to know what keys that base relvar has, so the user of a view needs to know what keys that view has. Explicitly declaring those keys is the obvious way to make that information available.

  • The DBMS might be unable to infer keys for itself (this is almost certainly the case, in general, with SQL products on the market today). Explicit declarations are thus likely to be the only means available (to the DBA, that is) of informing the DBMS, as well as the user, of the existence of such keys.

  • Even if the DBMS were able to infer keys for itself, explicit declarations would at least enable the system to check that its inferences and the DBA’s explicit specifications were consistent.

  • The DBA might have some knowledge that the DBMS doesn’t, and might thus be able to improve on the DBMS’s inferences.

  • As shown in the body of the chapter, such a facility could provide a simple and convenient way of stating certain important constraints that could otherwise be stated only in some circumlocutory fashion.

Subsidiary exercise: Which if any of the foregoing points do you think apply not just to key constraints in particular but to integrity constraints in general?

9.7 One example is as follows: The suppliers relvar is equal to the join of its projections on {SNO,SNAME}, {SNO,STATUS}, and {SNO,CITY}—just so long as appropriate constraints are in force, that is. (What are those constraints exactly?) So we could make those projections base relvars and make the join a view. See also the answer to Exercise 9.12.

9.8 Here are some pertinent observations. First, the replacement process itself involves several steps, which might be summarized as follows:

/* define the new base relvars: */

VAR LS BASE RELATION
  { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;

VAR NLS BASE RELATION
  { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
  KEY { SNO } ;

/* copy the data to the new base relvars: */

LS  := ( S WHERE CITY = 'London' ) ;

NLS := ( S WHERE CITY ≠ 'London' ) ;

/* drop the old relvar: */

DROP VAR S ;

/* create the desired view: */

VAR S VIRTUAL ( LS D_UNION NLS ) KEY { SNO } ;

Now we must do something about the foreign key in relvar SP that references the old base relvar S. Clearly, it would be best if that foreign key could now simply be taken as referring to the view S instead.19 However, if this is impossible (as it typically is in today’s products), then we might want to define another base relvar as follows:

VAR SS BASE RELATION { SNO CHAR } KEY { SNO } ;

And populate this relvar:

SS := S { SNO } ;

(This assignment assumes that relvar S hasn’t been dropped yet. Alternatively, we could assign to SS the union of LS{SNO} and NLS{SNO}.)

Now we need to add the following foreign key specification to the definitions of relvars LS and NLS:

FOREIGN KEY { SNO } REFERENCES SS

Finally, we must change the specification for the foreign key {SNO} in relvar SP to refer to SS instead of S.

9.9 a. No answer provided—except to note that if it’s hard to answer the question for some product, then that very fact is part of the point of the exercise in the first place. b. As for part a., but more so. c. Ditto.

9.10 For the distinction, see the body of the chapter. SQL doesn’t support snapshots at the time of writing. (It does support CREATE TABLE AS—see the last part of the answer to Exercise 1.16 in Chapter 1—which allows a base table to be initialized when it’s created, but CREATE TABLE AS has no REFRESH option.)

9.11 “Materialized view” is a deprecated term for a snapshot. The term is deprecated because it muddies concepts that are logically distinct and ought to be kept distinct—by definition, views simply aren’t materialized, so far as the relational model is concerned—and it’s leading us into a situation in which we no longer have a clear term for a concept that we did have a clear term for, originally. It should be firmly resisted. (I realize I’ve probably already lost this battle, but I’m an eternal optimist.) In fact, I’m tempted to go further; it seems to me that people who advocate use of the term “materialized view” are betraying their lack of understanding of the relational model in particular and the distinction between model and implementation in general.

9.12 First, here’s a definition of Design b. in terms of Design a. (pertinent constraints included):

VAR SSP VIRTUAL ( S JOIN SP )
    KEY { SNO , PNO } ;

VAR XSS VIRTUAL ( S NOT MATCHING SP )
    KEY { SNO } ;

CONSTRAINT B_FROM_A IS_EMPTY ( SSP { SNO } JOIN XSS { SNO } ) ;

(Constraint B_FROM_A and the specified key constraints are together what we would have to tell the user if we wanted the user to think of relvars SSP and XSS as base relvars, not views.) And here’s a definition of Design a. in terms of Design b.:

VAR S VIRTUAL ( XSS D_UNION SSP { ALL BUT PNO , QTY } )
    KEY { SNO } ;

VAR SP VIRTUAL ( SSP { SNO , PNO , QTY } )
    KEY { SNO , PNO } ;

CONSTRAINT A_FROM_B IS_EMPTY ( SP NOT MATCHING S ) ;

Given these constraints, the designs are information equivalent. But Design a. is superior, because the relvars in that design are in fifth normal form (5NF). By contrast, relvar SSP in Design b. isn’t even in second normal form; as a consequence, it displays redundancy and is thereby subject to certain “update anomalies.” Consider also what happens with Design b. if some supplier ceases to supply any parts, or used not to supply any but now does. Further discussion of the problems with Design b. is beyond the scope of this book; I just note that (as the example suggests) database design disciplines like normalization can help with the task of choosing “the best” design from a set of designs that are information equivalent.

Incidentally, I note in passing that—given that {SNO} is a key for relvar S—constraint A_FROM_B here shows another way of formulating a referential constraint. In practice, of course, it would be simpler just to include the following foreign key specification as part of the definition of relvar SP:

FOREIGN KEY { SNO } REFERENCES S

9.13 The following discussion relies on the fact that (as Appendix A explains in more detail) databases are really variables—i.e., we really need to draw a distinction between database values and database variables, analogous to that between relation values and relation variables. Let DBD1 and DBD2 be (logical) database designs; let DB1 and DB2 be database variables conforming to DBD1 and DBD2, respectively; and let db1 and db2 be the current values of DB1 and DB2, respectively. Further, let there exist mappings M12 and M21—i.e., sequences of relational algebra operations, loosely speaking—that transform db1 into db2 and db2 into db1, respectively. Then db1 and db2 are information equivalent, meaning that for every expression involving only relations from db1, there’s an expression involving only relations from db2 that evaluates to the same result (and vice versa).

Now let database variables DB1 and DB2 be such that for every possible value db1 of DB1 there exists an information equivalent value db2 of DB2 (and vice versa). Then DB1 and DB2 per se are information equivalent, as are the corresponding designs DBD1 and DBD2.

Now let database variables DB1 and DB2, as well as their current values db1 and db2, be information equivalent. Let U1 be an update on DB1 that transforms db1 into db1′. Then there must exist an update U2 on DB2 that transforms db2 into db2′, such that db1′ and db2′ are information equivalent. Note that the remarks of this paragraph apply in particular to the case in which DB1 consists only of base relvars and DB2 consists only of views of relvars in DB1.

Finally, let database variables DB1 and DB2, as well as their current values db1 and db2, not be information equivalent. Then there must exist an expression involving only relations from db1 with no counterpart involving only relations from db2 (or vice versa), and there must exist an update on DB1 with no counterpart on DB2 (or vice versa) —speaking somewhat loosely in both cases. Again note that the remarks of this paragraph apply in particular to the case in which DB1 consists only of base relvars and DB2 consists only of views of relvars in DB1.

9.14 (You might want to review the section “The Reliance on Attribute Names” in Chapter 6 before reading this answer.) Yes, views should indeed have been sufficient to solve the logical data independence problem. But the trouble with views as conventionally understood is that a view definition specifies both the application’s perception of some portion of the database and the mapping between that perception and the database “as it really is.” In order to achieve the kind of data independence I’m talking about here, those two specifications need to be kept separate (and the mapping specification in particular needs to be hidden from the user).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset