SQL and Relational Theory

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Footnotes

Preface to the First Edition

¹ In response to reader requests, in the third edition I’ve moved the answers that are specific to a given chapter to the end of the chapter in question and deleted the old Appendix F.

Preface to the Second Edition

² It was published in 2012.

³ In this connection, I’d like to acknowledge the contribution of a reader of the first edition, Thomas Uhren, who found an embarrassingly large number of errors. I’ll try harder in future. I promise.

⁴ Note that the name Tutorial D is always set in boldface.

Chapter 1: Setting the Scene

¹ There’s at least one pundit who doesn’t. The following is a direct quote from a document purporting (like this book!) to offer advice to SQL users: “Don’t use joins ... Oracle and SQL Server have fundamentally different approaches to the concept ... You can end up with unexpected result sets ... You should understand the basic types of join clauses ... Equijoins are formed by retrieving all the data from two separate sources and combining it into one large table ... Inner joins are joined on the inner columns of two tables. Outer joins are joined on the outer columns of two tables. Left joins are joined on the left columns of two tables. Right joins are joined on the right columns of two tables.” Your comment here.

² The standard has been through several versions, or editions, over the years. The version current at the time of writing is SQL:2011 (a formal reference for which can be found in Appendix G); the previous version was SQL:2008, the one before that was SQL:2003, the one before that was SQL:1999, and the one before that was SQL:1992. Most of the SQL features discussed in this book were present in SQL:1992, and often in even earlier versions.

³ These particular limitations were added in SQL:2003; they didn’t apply to SQL:1992, which is where explicit JOIN invocations were first introduced, nor to SQL:1999.

⁴ Strictly speaking, this sentence should read “Every relvar has at least one candidate key” (see the section “Relations vs. Relvars,” later). A similar remark applies elsewhere in this chapter as well. Exercise 1.1 at the end of the chapter addresses this issue.

⁵ See the answers to Exercises 5.27 in Chapter 5 and 7.23 in Chapter 7 for further discussion of this convention.

⁶ This definition is deliberately somewhat simplified. A better definition can be found in Chapter 5.

⁷ I’m sorry—I’m trying to suspend disbelief, but I do find it difficult ...The fact is, as we’ll see in Chapters 3 and 4, a tuple with “nothing at all” as the “value” of some component simply isn’t a tuple in the first place, and that really ought to be the end of the discussion. But let’s soldier on.

⁸ Except that Codd additionally defined an operator called divide. I’ll explain in Chapter 7 why I omit that operator here.

⁹ More than one reviewer observed that this sentence didn’t seem to make sense (how can a system be used as a method?). Well, if you’re too young to be familiar with the term access method, then I envy you; but the fact is, that term, inappropriate though it certainly was (and is), was widely used for many years to mean a simple record level I/O facility, of one kind or another.

¹⁰ It’s also used in connection with keys (see Chapter 5).

¹¹ “First” normal form because, as I’m sure you know, it’s possible to define a series of higher normal forms—second normal form, third normal form, and so on—that are relevant to the discipline of database design.

¹² What I’ve described in this paragraph is indeed in accordance with standard scientific practice; however, you might have encountered a different convention in less formal contexts. To be specific, some people use “B is a subset of A” to mean what I mean when I say B is a proper subset of A, and use “B is a subset of or equal to A” to mean what I mean when I say B is a subset of A. Similarly for supersets, of course, mutatis mutandis.

¹³ You might be thinking this claim can’t be 100 percent true for update operations. If so, you might be right as far as today’s SQL products are concerned; nevertheless, I still claim it’s true in principle. See the section “Update Operations” in Chapter 9 for further discussion.

¹⁴ Of course, this specific scheme will be inadequate if there can exist suppliers (like supplier S5 in Fig. 1.3) who currently supply no parts.

¹⁵ I say this knowing full well that the majority of today’s SQL products do provide a variety of options for hashing, partitioning, indexing, clustering, and otherwise organizing the data as it appears in physical storage. Despite this state of affairs, I still consider the mapping from base tables to physical storage in those products to be fairly direct. (For that very reason, in fact, elsewhere I’ve labeled those products “direct image systems.” For further explanation see my book Go Faster! The TransRelational™ Approach to DBMS Implementation, discussed briefly in Appendix G.)

¹⁶ Several reviewers complained about this fact—that is, they felt I should be using SQL itself, not some nonstandard language like Tutorial D, in order to illustrate relational ideas. (One even suggested the book be renamed “Tutorial D and Relational Theory”!) But SQL as such was never intended to be a vehicle for illustrating relational ideas, while Tutorial D explicitly was; and in any case, SQL simply isn’t adequate to the task. Indeed, if it were, a book like this one wouldn’t be necessary.

¹⁷ So the relational community isn’t perfect, either! But SQL makes essentially the same mistake, of course, because it too has just one term, table, that sometimes has to be understood as meaning a table value and sometimes a table variable.

¹⁸ Note that tuple ordering does indeed constitute a way of representing information—namely, by position; that is, the fact that a given tuple appears here and not there certainly does represent information, of some kind.

Chapter 2: Types and Domains

¹ And then only rational numbers of the first kind, such as 3/8.

² DBMS = database management system. Note that there’s a logical difference between a DBMS and a database! Unfortunately, the industry very commonly uses the term database when what it means is either some DBMS product, such as Oracle, or the particular copy of such a product that happens to be installed on a particular computer. I do not follow this usage in this book. The problem is, if you call the DBMS a database, what do you call the database?

³ DBA = database administrator.

⁴ This observation is valid regardless of whether we’re in an SQL context, as in the present discussion, or otherwise—but I should make it clear that selectors in SQL aren’t as straightforward as they might be, and selector as such isn’t an SQL term. I should also make it clear that selectors have nothing to do with the SQL SELECT operator. (Come to that, they also have nothing to do with the restriction operator of relational algebra, which people sometimes refer to as selection.)

⁵ Again this observation is valid regardless of whether we’re in an SQL context or some other context—though (as with selectors) THE_ operators in SQL aren’t as straightforward as they might be, and “THE_ operator” as such isn’t an SQL term. (I note too that some types might have more than one associated THE_ operator. See Chapter 8 for further discussion.)

⁶ If you’re not familiar with orthogonality as an important language design principle, you can read about it in “A Note on Orthogonality” in my book Relational Database Writings 1994-1997 (Addison-Wesley, 1998).

⁷ Observe that I don’t claim R3 is well designed—indeed, it probably isn’t—but that’s not the point. I’m concerned here with what’s legal, not with questions of good design. The design of R3 is legal.

⁸ In case you’re wondering, the difference is that sets in general can contain anything, but relations contain tuples. Note, however, that a relation certainly resembles a general set inasmuch as it too can be regarded as a single value.

⁹ The individual elements in an SQL multiset don’t have to be rows but can be values of any available SQL type—for example, integers. The same goes for arrays as well.

¹⁰ Throughout this book I treat declared and defined as synonymous.

¹¹ The logical difference between type and representation is important here. To spell the matter out, the operators associated with type T are the operators associated with type T as such—not the operators associated with the representation of type T. For example, just because the representation for type SNO happens to be CHAR (say), it doesn’t follow that we can concatenate two supplier numbers; we can do that only if concatenation is an operator that’s defined for type SNO. (In fact I did mention exactly this example in passing in the section “Equality Comparisons,” as you might recall.)

¹² This paragraph touches on another important logical difference, incidentally: namely, that between arguments and parameters (see Exercise 2.5 at the end of the chapter). Note too that the POINT selector, unlike the SNO and PNO selectors discussed earlier, takes two arguments (because points are represented by pairs of values, not just by a single value). See Chapter 8 for further discussion.

¹³ The concept might be familiar, but it seems to be quite difficult to find a good definition for it in the literature! See Exercise 2.2.

¹⁴ This sentence is only an approximation to the truth. A more accurate statement would be: Nongenerated types—see later in the present section—are scalar; generated types (e.g., relation types) are typically nonscalar, but don’t have to be. An example of a scalar generated type is the SQL type CHAR(25) (see the next section).

¹⁵ Note that it does make sense to talk about the heading of a tuple—tuples have headings just as relations do, as will be explained in more detail in the next chapter.

¹⁶ True bit string types—BIT(n) and BIT VARYING(n), where n was the length in bits—were introduced in SQL:1992 but dropped again in SQL:2003.

¹⁷ SQL actually calls q simply the scale, but there are good reasons to prefer the term scale factor.

¹⁸ SQL also supports a ROW type generator, as we’ll see in the section “Row and Table Types in SQL,” later. In fact, it also supports ARRAY, MULTISET, and REF (but, oddly enough, not TABLE) as type generators.

¹⁹ Unfortunately, however, that support is severely flawed. First of all, SQL supports coercions (see the section immediately following this one), with the consequence that “=” can give TRUE even when the comparands are of different types. Second, in the case of character string types, it’s possible for “=” to give TRUE even when the comparands are of the same type but clearly distinct (see the next section but one, “Collations in SQL”). And it’s also possible—for all types, not just character string types— for “=” not to give TRUE even when the comparands aren’t distinguishable; in particular, this happens when (but not only when) the comparands are both null. Also, for certain types not discussed in detail in this book, including type XML and certain user defined types, “=” isn’t defined at all. Note: This list of flaws in SQL’s support for “=” is not complete.

²⁰ One reviewer suggested that the “strangeness” of the union in this example might not matter in practice, since at least no information has been lost in the result. Well, that observation might be valid, in this particular example; I don’t want to argue the point. But if the SQL language designers want to define an operator that manifestly doesn’t behave like the union operator of the relational model (or the union operator of set theory, come that), then it seems to me that, first, it doesn’t help the cause of understanding to call that operator “union”; second (and rather more important), it isn’t up to me to show that such a “union” can sometimes cause problems—rather, it’s up to those language designers to show that it can’t.

²¹ As I’m sure you know, the scalar comparison operators supported by SQL are “=”, “<>” (not equals), “<”, “<=” (less than or equals), “>”, and “>=” (greater than or equals).

²² As a historical note, I remark that in the original (i.e., IBM) version of SQL, the only available collation—which was based on the internal coding scheme, of course—supported PAD SPACE only, and did that only tacitly. The reason for this state of affairs was a desire on the part of the language designers in IBM to conform to the corresponding rules for PL/I.

²³ For some reason SQL refers to the components of row types produced by invocation of the ROW type constructor (and to the components of rows of such types) not as columns but as fields.

²⁴ More generally, the expression n! (which is read as either “n factorial” or “factorial n” and is often pronounced “n bang”) is defined as the product n * (n-1) * ... * 2 * 1.

²⁵ Perhaps I should elaborate briefly on what I mean by the term pointer. A pointer is a value (an address, essentially) for which certain special operators—notably certain referencing and dereferencing operators—are, and in fact must be, defined. Here are rough definitions of those operators: (a) Given a variable V, the referencing operator applied to V returns the address of V; (b) given a value v of type pointer (i.e., an address), the dereferencing operator applied to v returns the variable that v points to (i.e., the variable located at the given address).

²⁶ A much more extensive discussion of the logical difference between foreign keys and pointers can be found in the paper “Inclusion Dependencies and Foreign Keys” (see Appendix G).

Chapter 3: Tuples and Relations, Rows and Tables

¹ In previous editions of this book, headings were denoted {H} instead of H.

² Indeed, this ambivalence is reflected in the standard’s very definition of the concept, which reads as follows: “null value: A special value that is used to indicate the absence of any data value.” In other words: Null is a value that means there isn’t a value.

³ Strictly speaking, I shouldn’t be talking about assignments of any kind in this chapter, because assignment has to do with variables and this chapter is concerned with values, not variables. But it’s convenient to include at least this brief mention of SQL row assignment here.

⁴ But see Exercise 3.15.

⁵ Why exactly is the definite article correct here (“the” row)?

⁶ Indeed, I think it could be argued that one reason we hear so much about the need for “multidimensional databases” (for decision support applications in particular) is precisely because so many people fail to realize that relations are multidimensional already.

⁷ The Tutorial D expression r{A,B,…,C} denotes the projection of relation r on attributes A, B, …, C. See Chapter 6 for further discussion.

⁸ In other words, the expression IS_EMPTY(r) is logically equivalent to both of the following: (a) r = r WHERE FALSE; (b) r{} = TABLE_DUM. Note: Regarding the second of these, see the section immediately following.

⁹ Here’s the pertinent quote from the standard: “Two [bags] A and B are distinct if there exists a value V in the element type of A and B, including the null value [sic], such that the number of elements in A that are not distinct from V does not equal the number of elements in B that are not distinct from V.” I hope that’s perfectly clear! Note that the extract quoted does indeed define what it means for two bags to be equal, because—simplifying considerably—if A and B aren’t distinct in SQL terms, then they must be equal. Note: SQL also has direct support for equality testing on arrays.

¹⁰ Note carefully that (as mentioned in Chapter 2) a bag of rows in SQL isn’t the same thing as an SQL table, because it can’t be operated upon by means of SQL’s regular table operators.

¹¹ What’s more, the standard doesn’t guarantee that the single column, in each of those two bags of rows resulting from the two TABLE invocations in the example, has any prescribed column name (see footnote 12, following). In particular, it doesn’t guarantee that the column name in question is CITY. Of course, this fact is probably insignificant in the present context, but it could easily be very significant indeed in other contexts.

¹² It’s certainly true to say that SQL fails in this latter case to enforce the rule against duplicate column names. However, it’s not quite true to say it fails to enforce the rule against anonymous columns—if some column would otherwise have no name, the DBMS is supposed to give that column a name that’s unique within its containing table but is otherwise “implementation dependent” (see Chapter 12). In practical terms, however, there’s no real difference between saying something is implementation dependent and saying it’s undefined. Calling such columns anonymous is thus not too far from the truth.

¹³ For this reason, in fact, I always show the keyword explicitly, even when it’s not required. It can be hard to remember when keywords are optional in SQL and when they’re mandatory. And in any case it would surely seem strange, in the case of AS in particular, to talk about something being an “AS specification” if there isn’t any AS.

¹⁴ Another useful informal characterization is this: Relations r1 and r2 represent the same information if and only if, for any query q1 that can be addressed to r1, there’s a corresponding query q2 that can be addressed to r2 that produces the same result (and vice versa). What’s more, this notion can readily be extended to sets of relations, thus: Let s1 and s2 be sets of relations. Then s1 and s2 represent the same information if and only if, for any query q1 that can be addressed to the relations in s1, there’s a corresponding query q2 that can be addressed to the relations in s2 that produces the same result (and vice versa). For further discussion, see Exercise 9.13 in Chapter 9.

Chapter 4: No Duplicates, No Nulls

¹ One reviewer felt strongly that an even more powerful practical argument (in fact, the most practical argument of all) is simply that duplicates don’t match reality—a database that permits duplicates just hasn’t been designed properly and can’t be, as I put it in Chapter 1, “a faithful model of reality.” I’m very sympathetic to this position. But this book isn’t about database design, and in any case duplicates are much more than just a design issue. Thus, what I’m trying to do here is show the problems duplicates can cause, regardless of whether they’re due to bad design. A detailed analysis of this whole issue, design aspects included, can be found in the paper “Double Trouble, Double Trouble” (see Appendix G).

² Here’s as good a place as any to stress the point that—contrary to common practice in the industry, perhaps— my use of the unqualified term “optimization” (and related terms) in this book always refers to something the DBMS is responsible for, not something the user has to do. In other words, I’m not talking about what’s sometimes called “manual optimization.”

³ I once quoted this line in a seminar, and an attendee said “You can say that again!” To which I replied “Yes—there’s a logical difference between logic and rhetoric.”

⁴ See the section “Summarization” in Chapter 7 for an explanation of why I generally set the SQL phrase “set function” in quotation marks.

⁵ The implication is that SELECT DISTINCT might take longer to execute than SELECT ALL, even if that DISTINCT is effectively a “no op.” Well, that might be so; I don’t want to labor the point; I’ll just observe that the reason those implementations typically can’t optimize away unnecessary DISTINCTs is that they don’t understand how key inference works (i.e., they can’t figure out the keys that apply to the result of an arbitrary table expression). This latter issue is explored in depth in a paper by Hugh Darwen, “The Role of Functional Dependence in Query Decomposition” (see Appendix G). Note: Let me add as a point of history that the explicit ALL in SELECT ALL wasn’t added to SQL until long after the language was first defined. So the original default wasn’t really SELECT ALL but just SELECT, making it that much easier to fall into the various traps that duplicates can cause—as well as suggesting, albeit subtly and psychologically, that by specifying something extra (DISTINCT), you were explicitly asking the system to do something extra, thereby inevitably getting worse performance.

⁶ A more accurate statement is: If the boolean expression in a WHERE clause evaluates to UNKNOWN, that UNKNOWN gets coerced to FALSE. Incidentally, it’s interesting to note that, by contrast, if the boolean expression in a CHECK clause—i.e., in a CREATE ASSERTION statement (see Chapter 8) —evaluates to UNKNOWN, that UNKNOWN gets coerced not to FALSE but to TRUE! This state of affairs (this inconsistency, rather) might reasonably be regarded as yet another nail in the nulls coffin. See the answer to Exercise 8.21g in Chapter 8 for further discussion.

⁷ I’m relying here on the fact that (as noted earlier) the intended interpretation of null is value unknown, from which it follows that the fact that “the CITY is null” for part P1 means part P1 does have some city, but we don’t know what it is. (In fact, if part P1 had no city at all—i.e., if the property of having a city didn’t apply to part P1—then that part shouldn’t have been mentioned in the table in the first place. See the discussion of relvar predicates in Chapter 5.)

⁸ SQL’s UNION JOIN operator, which was a flawed attempt to support an already flawed operator called outer union, was introduced in SQL:1992 and dropped again in SQL:2003.

⁹ It’s not respectable because the result has no proper predicate. Again, see the discussion of such matters in Chapter 5, also (especially) Appendix C.

¹⁰ Useful, that is, if we buy into the notion that 3VL as such is useful, which of course I don’t.

¹¹ If {SNO,PNO} is the primary key for shipments, then columns SP.SNO and SP.PNO couldn’t permit nulls without violating the entity integrity rule. So in case such a possibility bothers you (it doesn’t bother me, because I don’t believe in that rule anyway), let me change the example slightly; let me introduce another column, SHIPNO (shipment number), into the shipments table, and let me make {SHIPNO} the primary key. Then {SNO,PNO} will still be a key, but it won’t be the primary key, and the entity integrity rule therefore won’t apply. (Incidentally, the very fact that the entity integrity rule is supposed to apply only to primary keys and not to keys in general seems to me to be another reason to regard that rule with suspicion. Not to mention the fact that it’s also supposed to apply only to base tables and not to tables in general, which I think makes it even more suspect still.)

¹² Note that the standard uses True, Unknown, and False in prose discussions but TRUE, UNKNOWN, and FALSE in its SQL grammar.

¹³ Note that one way of distinguishing them might simply be by position—a is here and b is over there.

¹⁴ Note that the dyadic tables are shown here in a style slightly different from that used in the body of the chapter. Both styles are acceptable, but (as will be noted again in Chapter 10) sometimes one style is more convenient, sometimes the other is.

Chapter 5: Base Relvars, Base Tables

¹ For another argument against updating through cursors, see Exercise 4.5 in Chapter 4.

² There’s at least one product that doesn’t (at least, not 100 percent), because it does what it calls inflight checking. See Chapter 8 for further discussion.

³ The common special case DELETE R WHERE bx can be thought of as shorthand for DELETE R (R WHERE bx).

⁴ Even though this tactic—i.e., specifying the option—does fix the problem at hand, observe that it does still involve left to right column ordering and thus is still somewhat nonrelational in spirit. As Hugh Darwen once remarked to me (in a private communication): “The syntax of a language should in all places be in the spirit of that language. Then it’s easier to learn, because people get to know what to expect. A proper relational language attaches no significance to column ordering. Not anywhere.”

⁵ On the other hand, it does make sense to say of some relation that it either does or does not satisfy some key constraint. We might even go further and say, a trifle sloppily, that a relation that satisfies a given key constraint actually “has” the key in question—though such a manner of speaking is likely to cause confusion, and I wouldn’t recommend it.

⁶ Also known as functional dependence. The terms dependence and dependency are used interchangeably in the literature (and in this book), in contexts such as the one at hand.

⁷ Columns A1, A2, ..., An must be the columns named in some UNIQUE or PRIMARY KEY specification in the definition of table T1, but they don’t have to appear in that UNIQUE or PRIMARY KEY specification in the same sequence as they do in the FOREIGN KEY specification for table T2. Moreover, they, and the parentheses surrounding them, can be omitted entirely from this latter specification—but if so, then (a) they must appear in a PRIMARY KEY specification, not a UNIQUE specification, for table T1, and (b) they must appear in that specification in the appropriate sequence.

⁸ In case you’re wondering about the SQL terminology here, ON DELETE CASCADE is a “referential triggered action” and CASCADE by itself is a “referential action.”

⁹ Or the relational calculus. Either way, it comes to the same thing (see Chapter 10).

¹⁰ Recall from Chapter 3 that the expression n! (which is read as either “n factorial” or “factorial n” and is often pronounced “n bang”) is defined as the product n * (n-1) * ... * 2 * 1.

¹¹ Or even CITY is the name of somebody’s favorite teddy bear. There’s nothing in the relvar definition to say that CITY has to denote a city.

Chapter 6: SQL and Relational Algebra I: The Original Operators

¹ Actually, we’ll be meeting some operators later in this chapter—n-adic join, q.v., is a case in point—that are allowed to take no relations at all as input, though they do still produce a relation as output.

² This is true in the algebra but not necessarily in SQL. For example, if T1 and T2 are SQL table names, we typically can’t write things like T1 UNION T2—we have to write something like SELECT * FROM T1 UNION SELECT * FROM T2 instead.

³ Again, this is something that’s true in the algebra but not necessarily in SQL. See the BNF grammar for SQL table expressions in Chapter 12.

⁴ The formulation just shown was the only one supported in SQL as originally defined. The other possibilities were added in SQL:1992.

⁵ Here’s a test of your SQL knowledge: For which of these three formulations do the corresponding columns (just the CITY columns, in the example) have to be of the same type?

⁶ With one slight exception (?): Some writers regard relational inclusion (“⊆”) as a relational operation—more specifically, as part of the relational algebra—even though it produces a result that’s a truth value, not a relation. The point isn’t very important, however; certainly it’s not worth fighting over here.

⁷ I assume for the sake of the example that the comparison PNAME > SNAME is a sensible one—though if it is, then attributes PNAME and SNAME must presumably represent “the same kind of information,” and in accordance with my own recommendations in Chapter 3, therefore, I really ought to have given them the same name.

⁸ I remark in passing out that the phrase “duplicate elimination,” which is used almost universally (not just in SQL contexts), would more accurately be duplication elimination.

⁹ A relvar (as opposed to a relation) of such a high degree is perhaps unlikely, since it would almost certainly be in violation of the principles of normalization. But such violations aren’t exactly unknown in practice.

¹⁰ Relations r1, r2, ..., rn must be joinable, of course (see Exercise 6.16). Note: For psychological reasons, Tutorial D also supports n-adic versions of INTERSECT and TIMES, but I’ll skip the details here.

¹¹ As noted in Chapter 4, the SQL “set function” SUM yields null, not zero, if it’s invoked on an empty set of numbers. But this is just a logical mistake on the part of SQL—it has no bearing on the present discussion.

¹² Except that the set of common columns can be empty in Case 1 but not in Case 3.

¹³ Perhaps I should inject a small note of caution here. In practice, it’s very common for SQL tables to have some kind of “comments” column; thus, there’s a risk that NATURAL JOIN might produce unexpected results, unless some appropriate naming discipline is followed (or some appropriate renaming is done) in connection with such columns.

¹⁴ Greater-than join is a special case of what’s called θ-join, which I’ll be discussing later in this chapter.

¹⁵ I omitted CORRESPONDING from examples in earlier chapters because at the time it would only have been a distraction.

¹⁶ In the interest of completeness, I note that omitting the BY option is actually equivalent to specifying BY (C1,C2,...,Cn), where C1, C2, ..., Cn are all of the common columns, in the left to right order in which they appear in the first operand table.

¹⁷ Note, incidentally, that the condition SA < SB wouldn’t be legal if supplier numbers were of some user defined type (SNO, say) and the operator “<” hadn’t been defined in connection with that type.

¹⁸ This particular limitation was added in SQL:2011; it didn’t apply to SQL:1999, which is where WITH specifications were first introduced, nor to SQL:2003.

¹⁹ Except that the expression in question mustn’t be such that it relies on context for its evaluation; in other words, it must be what’s called a closed expression. For example, “S WHERE STATUS = 20” is closed, but “STATUS = 20” isn’t (it’s open instead). Of course, a similar rule applies in SQL also.

²⁰ One reviewer asked why CITY is mentioned in the predicate at all, since it isn’t part of the result of the projection. This is an important question! A short answer is: Because that result is obtained by projecting away the CITY attribute specifically, nothing more and nothing less. A much longer answer can be found in my book Logic and Databases: The Roots of Relational Theory (Trafford, 2007), pages 387-391 (see Appendix G).

²¹ Cost based optimizing is beyond the scope of this book because it has to do with how the data is physically stored, which isn’t a relational issue by definition. But I should at least note that such optimizing is possible in the first place only because (as we saw in Chapter 1) the relational model insists on there being a sharp and rigid distinction between the logical and physical levels of the system, which has the effect among other things of keeping access strategies out of applications.

²² Strictly speaking, the SQL analogs of these operators aren’t commutative, because—among other things—the left to right column order of the result depends on which operand is specified first. Indeed, the disciplines recommended in this book in connection with these operators are designed, in part, precisely to avoid such problems. More generally, the possibility of such problems occurring is one reason out of many why you’re recommended never to write SQL code that relies on column positioning.

²³ We could even, if the query involves (say) four relations r1, r2, r3, and r4, join (e.g.) r1 and r3 first, join r4 and r2 second, and then join the results of those two joins.

²⁴ Actually such an error might not occur in SQL, because SQL permits coercions. But Tutorial D doesn’t, and the observation is certainly true of Tutorial D.

²⁵ It’s from Atsushi Ohori, Peter Buneman, and Val Breazu-Tannen: “Database Programming in Machiavelli—A Polymorphic Language with Static Type Inference,” Proc. ACM SIGMOD International Conference on Management of Data, Portland, Ore. (June 1989).

²⁶ It doesn’t really have a column called S.SNO, either (it has a column called SNO, unqualified, instead); however, there’s a bizarre syntax rule to the effect that the column can be referred to by that dot qualified name anyway, as in the case at hand. (When I say the rule is bizarre, I mean it’s extremely difficult to state precisely, as well as being both counterintuitive and logically incorrect.)

²⁷ In other words, although a column reference of the form “S.SNO” would be legal in the SELECT clause here—see part e. of the exercise—the expanded form of the expression “S.*” in that same context includes no such reference!

²⁸ The CORRESPONDING specification could safely be omitted here—why, exactly?—but it’s easier, and shouldn’t hurt, always to specify CORRESPONDING, even when it’s logically unnecessary.

²⁹ Actually it’s not quite correct to say the SQL expression shown denotes an n-adic join, because (a) for n-adic join, the relations involved are required to be n-way joinable and the sequence in which they’re specified is irrelevant, whereas (b) the SQL expression shown is defined to perform the individual dyadic joins in the order specified (first join t1 and t2, then join the result of that join and t3, and so on).

Chapter 7: SQL and Relational Algebra II: Additional Operators

¹ Recall from Chapter 6 that two relations are joinable if and only if attributes with the same name are of the same type (i.e., are in fact one and the same attribute, formally speaking).

² Also known, a trifle inappropriately (?), as antijoin.

³ And/or ungroup (see later in this chapter).

⁴ More accurately, it defines a corresponding range variable. See Chapter 12 for further explanation.

⁵ As noted elsewhere in this book, in mathematics the expression “n!” (n factorial) is often pronounced “n bang”; hence my choice of pronunciation for the symbol “!!”. Note: If you prefer a syntax that’s based on keywords, you could consider replacing “!!r” by “IMAGE IN r”.

⁶ The seven different operators are discussed in excruciating detail in the paper “A Brief History of the Relational Divide Operator” (see Appendix G).

⁷ Tutorial D doesn’t directly support the original divide operator, and r1 DIVIDEBY r2 is thus not valid Tutorial D syntax.

⁸ If you’re wondering what the logical difference is here, consider the slightly different query “Get suppliers who supply all purple parts” (the point being, of course, that there are no purple parts). If there aren’t any purple parts, then every supplier supplies all of them!—even supplier S5, who supplies no parts at all, and is thus not represented in relvar SP, and so can’t be returned by any analogous DIVIDEBY expression. And if you’re still wondering, then see the further discussion of this example in Chapter 11.

⁹ But nonscalar aggregate operators can be defined too, as we’ll see in the section “GROUP, UNGROUP, and Relation Valued Attributes.”

¹⁰ It might be claimed, somewhat more reasonably, that the COUNT invocations within those expressions are SQL aggregate operator invocations. But the whole point about such invocations is that they can’t appear as “stand alone” expressions in SQL; rather, they can only appear as part of some table expression, because they rely on that expression to identify the table over which the aggregation is to be done. For example, a statement like “SET X = COUNT(*);” would be meaningless in SQL, since it fails to identify the table whose rows are to be counted.

¹¹ But see further remarks on this topic in the next aside, on pages 232-233.

¹² By contrast, as noted in Chapter 4, the SQL analogs of these operators all return null if their argument is empty (except for COUNT and COUNT(*), which do correctly return zero).

¹³ Because of this fact, RENAME can be used to switch the names of attributes, like this: R RENAME {A AS B, B AS A}.

¹⁴ More accurately, it would be understood as defining a range variable that ranges over that table (see Chapter 12).

¹⁵ More tests of your SQL knowledge: In the example under discussion, would it be possible to save keystrokes by using WITH to introduce a name for the common subexpression “SELECT SUM(SPY.QTY) FROM SP AS SPY WHERE SPY.SNO = SPX.SNO”? Also, would it be legal to attach “AS TOTQ” to the appearance of that subexpression within the WHERE clause?

¹⁶ Note that no COALESCE is needed in this example—but why not?

¹⁷ Not to mention the fact that SUMMARIZE involves a syntactic construct that looks a bit like an aggregate operator invocation but isn’t one—which (as pointed out earlier) is a good reason why it might be better to dispense with SUMMARIZE altogether.

¹⁸ As a matter of fact, UNGROUP can also be defined in terms of EXTEND, though the details are rather more complicated than they are for GROUP.

¹⁹ I do not guarantee that these SQL expressions are completely legal, or adequate even if they’re legal—the relevant portions of the SQL standard are extremely difficult to understand. If you seriously want to know more about SQL’s TABLE and UNNEST operators, then I recommend Hugh Darwen’s book SQL: A Comparative Survey (see Appendix G).

²⁰ We’ve met this operator before—see the discussion of table equality comparisons in the section “Tables in SQL” in Chapter 3.

²¹ Note, therefore, that relation r isn’t exactly being “extended” in the usual sense, so it might be nice to find a better keyword than EXTEND for the purpose.

²² Alternatively, we could replace the entire assignment by the multiple assignment “DELETE P s1, INSERT P s2;” (see Chapter 8).

²³ Nothing to do with the closure property of the relational algebra.

²⁴ Actually Tutorial D goes beyond the relational algebra as conventionally understood in that it provides TCLOSE as a built in operator. I show it as a user defined operator here just to show how recursive operators might be defined in Tutorial D.

²⁵ In particular, therefore, it can’t appear in a view definition—despite the fact that at least one well known SQL product allows it to! Note: It’s sometimes suggested—and, sadly, the SQL standard now explicitly supports this idea—that ORDER BY is needed in connection with what are called quota queries, but it isn’t (see Exercise 7.14).

²⁶ I suppose SQL might claim it is defined for rows, as opposed to tuples (again, see Chapter 3).

²⁷ The example thus points up an important difference between RVAs in a relational system and hierarchies in a system like IMS (or XML?). In IMS, the hierarchies are “hardwired into the database,” as it were; in other words, we’re stuck with whatever hierarchies the database designer has seen fit to give us. In a relational system, by contrast, we can dynamically construct whatever hierarchies we want, by means of appropriate operators of the relational algebra.

Chapter 8: SQL and Constraints

¹ As noted in Chapter 5, constraints constrain updates and updates apply to variables, not values, so it does make sense to talk of a constraint “applying to” some variable.

² A minor exception to this rule arises in connection with Tutorial D’s type inheritance support, but that exception need not concern us here.

³ I would much have preferred to use the more formal term object in this sentence in place of the very vague term thing, but object has become a loaded word in computing contexts.

⁴ In SQL, that shorthand would typically involve a specification of the form UNIQUE(SNO) as part of the CREATE TABLE for table S. The semantics of such a specification are explained by the standard as follows (I’ve adapted the standard’s own generic phrasing to apply to the specific case at hand): “The constraint UNIQUE(SNO) is not satisfied if and only if EXISTS (SELECT * FROM S WHERE NOT (UNIQUE (SELECT SNO FROM S))) is true.” I hope that’s perfectly clear. Note the reference to the SQL UNIQUE operator, discussed in the present chapter in the next bullet item.

⁵ But is this SQL formulation valid? As you can see, it involves an equality comparison in which the comparands are denoted by subqueries. Since subqueries evaluate to tables, it appears we’re trying to test two tables for equality—yet we saw in Chapter 3 that SQL doesn’t directly support table equality comparisons. See Exercise 12.5 in Chapter 12 for further discussion.

⁶ Except that (as you’ll recall from Chapter 2) constraints in SQL are supposed not to contain “possibly nondeterministic expressions,” a rule that could cause serious problems in practice. See Chapter 12 for further discussion.

⁷ Not to be confused with attribute constraints (see the end of the section “Type Constraints,” earlier in the chapter).

⁸ Chapter 11 of the book Applied Mathematics for Database Professionals, by Lex de Haan and Toon Koppelaars (highly recommended, by the way), goes into great detail on what’s involved in writing your own constraint enforcement code. See Appendix G.

⁹ The standard reference (also highly recommended) is Transaction Processing: Concepts and Techniques, by Jim Gray and Andreas Reuter. Again, see Appendix G.

¹⁰ Well, it can still be described as a single relvar constraint with the revised design, but that single relvar constraint is a single relvar constraint on a view (view S). See Chapter 9 for a discussion of view constraints in general.

¹¹ The constraint must be stated declaratively, however; obviously there’s no way the optimizer can “understand” and exploit constraints that have been specified procedurally (and so we have here another strong reason for requiring declarative constraint support).

¹² If despite everything I’ve said you’re still bothered by this idea, please refer to the section titled “Eventual Consistency” in Appendix F for further clarification.

¹³ In case you’re wondering how that deferring is done, I should explain that in general—there are some exceptions that don’t need to concern us here—every SQL constraint is declared to be (a) either DEFERRABLE or NOT DEFERRABLE, and if DEFERRABLE then (b) either INITIALLY DEFERRED or INITIALLY IMMEDIATE. Then, at run time, the statement SET CONSTRAINTS <constraint name commalist> <option>, where <option> is either DEFERRED or IMMEDIATE, sets the “mode” of the specified constraint(s) accordingly. (Of course, the constraint(s) in question must have been defined to be DEFERRABLE for SET CONSTRAINTS to apply.) COMMIT forces all DEFERRABLE constraints into immediate mode; if some integrity check then fails, the COMMIT fails, and the transaction is rolled back.

¹⁴ I’m told, however, that this functionality is likely to be provided in some future version of the standard.

¹⁵ Or that supplier S1 used to be located in London, or that supplier S1 has an office in London, or that supplier S1 doesn’t have an office in London, or any of an infinite number of other possible interpretations (corresponding, of course, to an infinite number of possible relvar predicates).

¹⁶ “Unlikely” is right; every relvar is supposed to be subject to a key constraint at the very least.

¹⁷ I don’t mean to suggest here that system enforcement of constraints implies bad performance; in fact, I think it ought to improve performance. (Not to mention the fact that user enforcement is highly nontrivial, and very likely to be incorrect! As mentioned in an earlier footnote, the book by Lex de Haan and Toon Koppelaars, Applied Mathematics for Database Professionals, gives a good idea of what’s involved in such enforcement.) All I mean is, there tends to be a huge emphasis in vendor development effort on performance issues, to the exclusion of other matters such as data integrity.

¹⁸ But see the answer to Exercise 8.23 below.

¹⁹ No such rule exists in SQL, however. What’s more, any implementation that tried to impose such a rule would be in violation of the standard!—i.e., the SQL standard explicitly permits “keys” to be declared that the user and the system both know to be proper superkeys. The “justification”—such as it is—for this state of affairs is beyond the scope of this book.

Chapter 9: SQL and Views

¹ I’ve touched on this notion before in this book—see the answer to Exercise 3.7 in Chapter 3 (though at that point I wasn’t using the term information equivalence as such). See also Exercise 9.13.

² In the standard, that feature goes by the name of REF types and reference values (see Chapter 2).

³ For the example under discussion, according to my own reading of the standard, the substitution procedure now yields an expression along the following lines: SELECT CITY FROM S WHERE (SELECT AST FROM (SELECT CITY, SUM (STATUS) AS AST FROM S GROUP BY CITY)) > 25.

⁴ Any table, that is, so long as the definition of the table in question doesn’t involve a possibly nondeterministic expression, a complication I choose to ignore for now. See Chapter 12 for further discussion.

⁵ A more accurate statement is: The specification KEY{SNO} applies to view LS as a logical consequence of the fact that it applies to base relvar S. Note, however, that the two specifications don’t mean the same thing—the one for view LS means suppliers numbers are unique with respect to London suppliers, the one for base relvar S means they’re unique with respect to all suppliers.

⁶ The tables are rather obviously not very well designed, and for that reason you might think I’m “stacking the deck” in an attempt to make my point seem more convincing than it really is. So I’d like to say the example isn’t really mine at all; rather, it’s a lightly edited version of one from Joe Celko’s article “Back to the Future” (Database Programming & Design 4, No. 12, December 1991).

⁷ If you look carefully, you’ll see I’m not exactly replacing those references to V by the defining expression for V. The reason is that (as we saw in Chapter 6) SQL requires an explicit JOIN invocation like FDH NATURAL JOIN DFGP to have a “SELECT * FROM” prefix if it appears at the outermost level of nesting but allows it not to have such a prefix otherwise.

⁸ The same goes for foreign key constraints, as a matter of fact. See the paper “Inclusion Dependencies and Foreign Keys” (mentioned in Appendix G).

⁹ Formally, these two constraints are equality dependencies (EQDs). In general, an EQD is an expression of the form rx = ry, where rx and ry are relational expressions of the same type; it can be read as “The relations denoted by rx and ry are equal” (in other words, they’re one and the same relation).

¹⁰ Cascade delete in particular is usually thought of as applying to foreign key constraints specifically, but the concept of compensatory actions is actually more general—it applies to constraints of all kinds. Also, don’t get the idea that such actions must always take the form of simple “cascades”; while all of the examples examined in the present subsection do happen to take that form, more complicated cases might well require actions of some less straightforward form.

¹¹ In fact this state of affairs holds true in all cases, not just for the particular example under consideration (see Exercise 8.30 in Chapter 8), though the details might not always be entirely straightforward. For further explanation, see the book mentioned earlier, View Updating and Relational Theory: Solving the View Update Problem (again, see Appendix G).

¹² Again I refer you to the book View Updating and Relational Theory: Solving the View Update Problem for further discussion.

¹³ And then only if the view defining expression isn’t possibly nondeterministic (see Chapter 12). Incidentally, note the implication here that SQL does allow updates on “possibly nondeterministic views,” and the further implication that SQL is thus apparently quite willing to allow certain updates to have unpredictable results! This state of affairs strikes me as odd, given that (as far as I know) the rationale for not allowing possibly nondeterministic expressions in constraints was precisely to avoid updates having unpredictable results.

¹⁴ The asymmetry here is intuitively odd. For example, an argument might be made—not by me!—that you can do DELETEs but not INSERTs on a union view, because the delete rule is obvious (delete from both operands) but the insert rule isn’t (do we insert into both operands or just one—and if just one, which?). But an exactly analogous argument would surely say you can do INSERTs but not DELETEs on an intersection view (?). In other words, if some views are “deletable from but not insertable into,” then surely others must be “insertable into but not deletable from.”

¹⁵ In keeping with our simplifying assumption that it makes sense to talk about inserting and deleting individual tuples, in this section I’m going to use a kind of pseudocode style—which I trust is self-explanatory—for INSERT and DELETE operations.

¹⁶ Some writers have suggested that the “and” in that informal characterization should really be “or,” meaning in terms of the S JOIN P example that (e.g.) deleting Paris tuples from SCP can and should be achieved by deleting Paris tuples just from S or just from P instead of from both. Arguments for and against this position are considered in detail in View Updating and Relational Theory: Solving the View Update Problem. For the purposes of the present discussion, I’ll stay with the rule as stated here.

¹⁷ As a matter of fact it isn’t even in second normal form (2NF), thanks to those FDs {SNO} → {CITY} and {PNO} → {CITY}. But it’s the violation of 4NF that leads to the relvar’s characteristic behavior as a many to many join.

¹⁸ Despite the fact that, as we saw earlier, there’s at least one product on the market that does materialize them at least some of the time for implementation reasons.

¹⁹ Indeed, logical data independence is a strong argument in favor of allowing constraints in general to be defined for views as well as base relvars.

Chapter 10: SQL and Logic

¹ The name SQL originally stood for Structured Query Language; the idea behind that name was that SQL queries typically consisted of subqueries nested inside other subqueries (i.e., that was the “structure” being alluded to). More specifically, SQL allowed such subqueries—i.e., SELECT – FROM – WHERE expressions, loosely speaking—to appear nested inside the WHERE clause of another such expression, recursively. But that was all! The ability to nest subqueries elsewhere (in particular, in the SELECT and FROM clauses) wasn’t part of SQL as originally defined, nor was it added to the language until nearly 20 years later, as part of SQL:1992.

² Thanks to Lauri Pietarinen of Relational Consulting Oy, Helsinki, Finland, for drawing this splendid example to my attention. The example is included and analyzed in detail in Ernest Nagel: “Symbolic Notation, Haddocks’ Eyes, and the Dog-Walking Ordinance,” in James Newman (ed.), The World of Mathematics, Vol. 3. Mineola, N.Y.: Dover Publications (2000).

³ Conventional logic (i.e., so called two-valued logic, 2VL) is thus truth functionally complete. In general, a logic is truth functionally complete if and only if every possible connective can be defined in terms of the given ones. As noted in Exercise 4.11 in Chapter 4, truth functional completeness is an extremely important property; a logic without it would be like an arithmetic that was missing certain operations (the operation of addition, say) and would thus be of extremely limited utility.

⁴ Easier, yes, but it’s still necessary to appeal to certain laws of transformation that I haven’t yet explained (though they’re intuitively obvious). See Chapter 11 for further discussion.

⁵ As noted in Chapter 5, statements in logic aren’t the same thing as statements in a programming language; in some respects, in fact, a statement in logic is more like a programming language expression, at least inasmuch as it denotes a value (a truth value, of course). In logic contexts, therefore, I’ll use the terms statement and expression (both in this chapter and the next) more or less interchangeably—and I apologize if this usage on my part leads to any confusion.

⁶ Existentially just to be definite. Quantifying universally instead would make no difference to the point I’m making here.

⁷ Even, sometimes, in logic textbooks, where the practice really ought to be deprecated.

⁸ A couple of remarks on terminology: First, the term proto tuple, standing for “prototype tuple,” is apt but nonstandard (in fact, a standard term for the concept doesn’t seem to exist). Second, I’ve referred to the construct following the keyword WHERE as a predicate. Syntactically speaking, however, it’s just a boolean expression—but of course a boolean expression can be regarded as the concrete representation of a predicate. In general, I tend to use “predicate” when I’m discussing logic as such (including relational calculus), but “boolean expression” when I’m discussing some concrete language such as Tutorial D or SQL.

⁹ Actually the standard does, but products typically don’t—they tend to use the term correlation name instead (see Chapter 12 for further discussion), or else some wildly inappropriate term such as alias or table alias or (surely the most grotesque of the many I’ve seen) join variable.

¹⁰ It might help to point out that SQL’s EXISTS is rather similar to Tutorial D’s IS_NOT_EMPTY (see Chapter 3). See the section “Some Equivalences,” later.

¹¹ To be a little more precise about the matter: Suppose tx denotes a nonempty restriction of some table T and the restriction condition evaluates to UNKNOWN for every row in T; then EXISTS (tx) ought logically to return UNKNOWN but will actually return TRUE, in SQL.

¹² In practice, the term tuple calculus is used mainly to distinguish the version of the relational calculus discussed in the present chapter from the domain calculus, which is a version of the relational calculus in which the variables range over domains—i.e., types—instead of relations. But there’s no need to discuss the domain calculus in this book; if you want to know more, you can find a detailed explanation in my book An Introduction to Database Systems (see Appendix G).

¹³ To elaborate: Consider by way of example the proposition EXISTS x (p(x)), where p is a predicate with just one parameter, x. If x ranges over an infinite set, then any attempt to use an “iterated OR” algorithm for evaluating the proposition will inevitably be flawed, since the algorithm might never terminate (it might never find the one value of x that satisfies p). Likewise, any attempt to use an “iterated AND” algorithm for FORALL x (p(x)) will also inevitably be flawed, since again the algorithm might never terminate (it might never find the one value of x that fails to satisfy p).

¹⁴ A note on syntax: Recall from Chapter 7 that Tutorial D supports the aggregate operators AND and OR, thereby allowing us to write, e.g., AND (SP, QTY > 0), to express the fact that QTY values in relvar SP must be greater than zero. The discussions of the present section suggest that more “user friendly” names for these operators might well be FORALL and EXISTS, respectively. For example, the expression FORALL (SP, QTY > 0) does read quite well from an intuitive point of view. Likewise, EXISTS (SP, QTY > 250) seems to be an intuitively pleasing way of expressing the fact that at least one QTY value in relvar SP must be greater than 250.

¹⁵ By contrast, the UNIQUE quantifier gives FALSE if its range is empty.

¹⁶ It follows that AT_MOST_ONE (or perhaps NO_DUPS) would be a better name for the SQL operator than UNIQUE, at least in a context like the one under discussion. (Come to that, AT_LEAST_ONE might be a better name than EXISTS, too, both for the existential quantifier as such and for SQL’s analog of that quantifier.)

¹⁷ Note that in this context, by contrast, the SQL keyword UNIQUE does mean exactly one.

¹⁸ Don’t confuse relational completeness with any other kind of completeness: in particular, with truth functional completeness, mentioned in an earlier footnote.

¹⁹ The term valid is something of a loaded word in logical contexts. I’m using it in these exercises simply to mean that the statement in question is true, regardless of what values are assigned to any variables involved (in other words, the statement in question is a tautology).

²⁰ In other words, I’m claiming that x here is a bound variable, and hence that the sentence could be rephrased thus: “There exists some city, say x, such that supplier S2 is located in city x.”

²¹ Except for TCLOSE—but TCLOSE wasn’t included in the original definition of what it meant to be relationally complete, anyway. Note: You might be wondering about some of the other operators from Chapter 7 (e.g., EXTEND, SUMMARIZE, and GROUP and UNGROUP). In fact, Hugh Darwen and I show in our book Databases, Types, and the Relational Model: The Third Manifesto (see Appendix G) that these operators can indeed be defined in terms of restrict, project, etc., at least in principle.

²² To state the matter a little more precisely: An SQL analog of the algebraic expression R RENAME {A AS B} is the (extremely inconvenient!) SQL expression SELECT X, Y, ..., Z, A AS B FROM R (where X, Y, ..., Z are all of the columns of R apart from A, and I choose to overlook the fact that the SQL expression results in a table with a left to right ordering to its columns).

²³ I choose to overlook the fact that SQL would actually require such a table expression, when used in any of the contexts under discussion, to be accompanied by a pointless range variable definition.

Chapter 11: Using Logic to Formulate SQL Expressions

¹ I remark, for what it’s worth, that SQL and the relational model and SQL are both essentially products of California.

² What would happen if we omitted that ELSE TRUE?

³ I’m being sloppy here. The phrase “range over table P” ought really to be “range over the table value that’s the current value of the table variable called P” (and similarly for “range over table S,” of course). Of course, SQL has no explicit notion of table values vs. table variables anyway.

⁴ All query results shown in this chapter are based on the usual sample data values, of course. Note: According to reviewers, at least two SQL products gave the same result here regardless of whether or not DISTINCT was specified. If so, then the products in question would seem to have a bug in this area.

⁵ It’s worth pointing out in passing that the tactic of introducing names for subexpressions is reminiscent, somewhat, of the use of WITH in simplifying complex expressions as discussed in Chapter 6. But there’s a difference: For WITH, the subexpressions in question are required to be closed, whereas no such requirement applies in the present context. Indeed, all we’re doing in the present context is, in effect, simple text substitution, which is not what happens with WITH.

⁶ Actually we could employ the same trick in mapping EXISTS—i.e., we could define EXISTS SX (bx) as mapping to EXISTS (SELECT k FROM S AS SX WHERE (sbx)), instead of EXISTS (SELECT * FROM S AS SX WHERE (sbx)). For symmetry I’ve done exactly this in the SQL formulation of constraint CX6 that follows.

⁷ A similar remark applies to the EXISTS invocation, of course.

⁸ Note that the “or” in this sentence is implicitly inclusive. Would it be more correct if it were exclusive? Would it make any difference?

⁹ And that TRUE is logically correct! This behavior is certainly a little surprising, given that SQL’s EVERY “set function” incorrectly returns null, not TRUE, if its argument is empty. (EVERY is, of course, the “set function” analog of ALL in the context under discussion.) The reason for the inconsistency is that—as perhaps you’ve guessed—SQL’s ALL or ANY comparisons were defined before nulls were added to the language. (Is there a moral here?) Analogous remarks apply to ANY comparisons also, mutatis mutandis.

¹⁰ Or is it? What if supplier or part cities could be null?

¹¹ Note that both expressions involve some coercion. As a slightly nontrivial exercise, you might like to try figuring out exactly what coercions are involved in each case.

Chapter 12: Miscellaneous SQL Topics

¹ A detailed though not necessarily exhaustive discussion of this issue can be found on pages 141-144 of the book A Guide to the SQL Standard (4th edition), by Hugh Darwen and myself (see Appendix G).

² As in, for example, UPDATE S SET S.STATUS = S.STATUS + 1 (the second S.STATUS here is legal but the first isn’t).

³ Actually, the current version of the standard—viz., SQL:2011—does sometimes use the the term range variable, but it doesn’t do so either exclusively or systematically, and correlation name still seems to be the preferred term.

⁴ Here I might admit if pressed to a sneaking sympathy with a remark an old friend once made to me in connection with this very point: “You mathematicians are all alike—you spend hours agonizing over things that are perfectly obvious to everybody else.”

⁵ The standard calls them table references, but that term is hardly very apt. In most languages, a variable reference is a special case of an expression; syntactically, it’s just a variable name, used to denote the value of the variable in question. But an SQL “table reference” isn’t a special case of a table expression—not in the sense in which the latter term is used in this book, and (perhaps more to the point) not in the sense in which it’s used in SQL, either.

⁶ As the example suggests, the column name commalist in a range variable definition is required, somewhat annoyingly, to be exhaustive—there’s no way to rename just some of the columns concerned and not others. Also, note the need here to be fully cognizant of SQL’s rules regarding left to right column ordering in the result of the explicit JOIN!

⁷ I’ll omit them from most of my own examples in the remainder of this chapter, however, because (a) using explicit range variables might distract from the main point I’m trying to make with those examples and (b) those examples are all fairly simple, anyway.

⁸ It was legal in SQL:1992 but became illegal in SQL:2003.

⁹ Nor in view definitions, if WITH CHECK OPTION is specified.

¹⁰ What follows represents my own understanding and paraphrasing of the pertinent text from SQL:1992 (except that I’ve taken into account certain minor revisions made in subsequent versions of the standard). More important, I follow SQL:1992 here in talking about character string types only. The rules have since been extended to include as possibly nondeterministic (a) expressions involving data of certain user defined types and (b) expressions involving invocations of certain user defined operators (routines, to use the standard’s term). Further details are beyond the scope of this book.

¹¹ Appendix D gives a BNF grammar for relational expressions (also relational assignment) in Tutorial D.

¹² It might help to point out that, loosely speaking, explicit joins can’t appear at the outermost level of nesting, while WITH specifications can’t appear anywhere else.

Appendix A: The Relational Model

¹ Of course, one immediate objection to that claim is that there are clearly many nonrelational databases in existence already: hierarchic databases, CODASYL databases, and so on. True enough—but note carefully that those older databases were never meant to be application neutral and fully general purpose; rather, they were typically built to serve some specific application. As a consequence, they don’t and can’t provide all of the functionality that a truly general purpose database is supposed to provide—ad hoc query, view support, full data independence, flexible security and integrity controls, and so forth. In fact, I would go further; I would argue that, to the extent they fail to be fully relational, even modern SQL databases are subject to some of these same criticisms. In other words, I regard nonrelational databases in general as little more than application specific data stores, and I’m very tempted to say they shouldn’t be called databases, as such, at all.

² Which is why I set them all enclosed in quotation marks. I’ll drop those quotation marks from this point forward because I know how annoying they can be, but you should think of them as still being there in some virtual kind of sense.

³ My own opinion, for what it’s worth, is that the semistructured model and the object model are just the old hierarchic model warmed over and the old network model warmed over, respectively.

⁴ Actually I think these remarks are rather charitable; in my opinion, those other models are really little more than slightly abstract, but otherwise ad hoc, storage structures that have been elevated above their station and will not stand the test of time.

⁵ Here for the record is, verbatim, the way those “six models” are labeled in Celko’s book: 1. Chris Date = No Duplicates, No NULLs. 2. E. F. Codd, RM Version I. 3. E. F. Codd, RM Version II. 4. SQL-92 = Duplicates, One NULL. 5. Duplicates, One NULL, Non-1NF Tables. 6. Rick Snodgrass = Temporal SQL.

⁶ “I see relational theory as simply a body of theory to which many people are contributing in different ways” (E. F. Codd, in an interview in Data Base Newsletter 10, No. 2, March 1982).

⁷ As explained in Chapter 2, the relational model doesn’t rely on the distinction between scalar and nonscalar types in any formal sense. I appeal to it here (as elsewhere in this book) merely as an aid to intuition.

⁸ I’ve already said it’s misleading to think of the relational model as structure plus integrity plus manipulation because all three aspects are inextricably intertwined. Well, the same goes for the five components to be described in the following subsections, of course (at least to some extent).

⁹ Actually this paragraph applies to all types, scalar or otherwise.

¹⁰ It goes without saying that object databases, XML databases, and more generally nonrelational databases of any kind, do all violate it, necessarily.

¹¹ We’d also have to choose which data we wanted to represent as relations and which as arrays, probably without any good guidelines to help us in making such choices. And what about the database catalog? Would it contain relations, or arrays? Or a mixture?

¹² To all of the foregoing I’d like to add a comment Hugh Darwen once made to me (in a private communication): “UNION CORRESPONDING was added to SQL in 1992, presumably to fill some perceived gap in functionality. Suppose it had been part of the language as originally defined; when if ever would the need have emerged to introduce a UNION based on left to right column ordering instead?” I note too that questions like this one apply to a whole host of constructs that have been added to SQL since it was first defined.

¹³ So here we have another reason—a somewhat philosophical reason, perhaps—for rejecting the notion of duplicates.

¹⁴ In this connection, we’d also like to see an implementation that’s more sophisticated in certain respects than most current SQL implementations typically are. More specifically, we’d like to see an implementation based on The TransRelational™ Model (see Appendix G).

¹⁵ As a matter of fact, Darwen and I show in our Manifesto book that every algebraic operator discussed in this book, with the sole exception of TCLOSE, can be expressed in terms of just two primitives, remove (which is basically “project over all attributes but one”) and either nand or nor (which are basically algebraic analogs of the logical operators with the same names—see the answer to Exercise 10.4 in Chapter 10).

¹⁶ Yes, it really was thought of in such terms. Here’s a quote from the very first paper on the language we now know as SQL (see Appendix G): “Examples of such users are accountants, engineers, architects, and urban planners. It is for this class of users that [SQL] is intended.”

Appendix B: SQL Departures from the Relational Model

¹ I remind you too, one more time, that All logical differences are big differences.

² To be fair, this criticism applies to Tutorial D too at the time of writing.

³ One consequence of this point is that two rows can be duplicates of each other without being identical. And one consequence of that point is that the definition of (e.g.) SQL’s UNION DISTINCT operator has to look something like this: Let tables t1 and t2 be the union operands; let r be a row that’s a duplicate of some row in t1 and a duplicate of some row in t2; then the result table contains (a) exactly one duplicate of every such row r and (b) no row that’s not a duplicate of some such row r. (And even this definition is incomplete, of course—we need additional rules specifying the column names and types of that result table.) Contrast the simple relational definition of union in Chapter 6.

Appendix C: A Relational Approach to Missing Information

¹ In SQL, by contrast, it’s considered to be of every type. To quote the standard: “Every data type includes a special value, called the null value … [that] is neither equal to any other value nor not equal to any other value.” Note that phrase “null value,” bythe way! Since the crucial point about nulls is that they’re not values, that phrase is simply a contradiction in terms.

² They can be expressed in SQL, too, though not quite so easily (exercise for the reader). In fact, they’re both examples of what are called, for obvious reasons, equality dependencies or EQDs. (EQDs were mentioned briefly in Chapter 9, as you might recall.) Note that if an EQD is in effect, spanning two or more tables, then certain updates on just one of those tables will necessarily cause that EQD to be violated. See the discussion of multiple assignment and related matters in Chapter 8.

³ I have to use Tutorial D here, not SQL, because the example under discussion is a yes/no query, as we’ll see in a moment. As a consequence, it relies on the special relations TABLE_DEE and TABLE_DUM (the only relations of degree zero—see Chapter 3), and SQL doesn’t support those relations.

⁴ Even the terms know and knowledge might be a little strong in contexts such as those at hand (the terms believe and beliefs might be better)—but I’ll stay with know and knowledge for the purposes of the present discussion.

Appendix F: No SQL and Relational Theory

¹ Actually it was, originally. The first use of the term, by Carlo Strozzi in 1995, was as the name of a relational DBMS that was explicitly intended to support something that wasn’t SQL (see www.strozzi.it/cgi-bin/CSA/tw7/l/en_US/NoSQL/Home%20Page).

² The industry tends to call them not systems but “stores,” but (as with calling the DBMS a database, a usage I objected to in Chapter 2) this nomenclature seems to confuse the stored data as such with the software system that manages it.

³ Since this appendix has something of an SQL flavor (or a NoSQL flavor, rather), throughout most of the discussion I’ll stick with the SQL terminology of tables, rows, and columns.

⁴ The term schema is much used in the database industry, in the SQL world in particular. However, I’ve avoided it in this book because, to be frank, it isn’t always clear exactly what it means. Sometimes it’s used as if it were just a synonym for database design (either logical or physical). Sometimes it’s used to mean a set of definitional statements, such as CREATE TABLE in SQL, that correspond to such a design (but then how do statements like ALTER TABLE and DROP TABLE fit in with that definition?). Also, in the more specific form relation schema, it’s sometimes used to mean a relvar or relation heading, or a relvar or relation heading together with the constraints that apply to the relvar or relation in question, or sometimes a relvar or relation heading together with just the pertinent key constraint(s). And so on.

⁵ The book Graph Databases, by Ian Robinson, Jim Webber, and Emil Efrem (O’Reilly, 2013), gives the following justification for this state of affairs: “[Query] execution times increase as the size of tables and the number of joins gow (so called join pain).” Well, this is obviously true—but in a well architected system, that growth should be linear (though I grant it’s probably not linear in today’s mainstream SQL products). See my book, mentioned in Appendix G, Go Faster! The TransRelational™ Approach to DBMS Implementation.

⁶ They’re also highly reminiscent of what in Appendix A I called application specific data stores.

⁷ It’s worth noting that functional segmentation as discussed in the previous section will tend to produce such hierarchic structures naturally, and even if it doesn’t it can be forced to.

⁸ I explained in Chapter 8 how the term isolation as used in the database context (in particular, in the context of the ACID properties of transactions) doesn’t mean quite the same thing as it does in ordinary English. What I’m saying now is that an analogous remark applies to consistency as well, mutatis mutandis.

⁹ The advantages Darwen is referring to here are basically what Codd claimed as his objectives for introducing the relational model (see the section “Objectives of the Relational Model” in Appendix A).

¹⁰ Or equivalent functionality, which as we saw in Chapter 8 is available in the SQL standard in the form of base table constraints.

Appendix G: Suggestions for Further Reading

¹ I apologize for the number of times my own name appears as either author or coauthor in this list, but given the nature of the material, such a state of affairs is—if you’ll pardon me for saying so—more or less inevitable.

² Of course, SEQUEL hadn’t actually been implemented when this paper was written, nor so far as I know had it even been formally defined. If it had been (either defined or implemented, that is), some of the items in the subsequent bullet list would certainly have had to change, and some of them probably wouldn’t have survived—a state of affairs that might explain some of the differences here referred to.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for SQL and Relational Theory

Create new playlist

Sign In

Sign Up