SQL and Relational Theory

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 3

Tuples and Relations, Rows and Tables

[I] have reduced several great confused Volumes into a few perspicuous Tables.

—John Graunt (1662)

From the first two chapters you should have gained a pretty good understanding of what tuples and relations are, at least intuitively. Now I want to define those concepts more precisely, and I want to explore some of the consequences of those more precise definitions; also, I want to describe the analogous SQL constructs (viz., rows and tables) and offer some specific recommendations to help with our goal of using SQL relationally. Perhaps I should warn you that the formal definitions might look a little daunting—but that’s not unusual with formal definitions; the concepts themselves are quite straightforward, once you’ve struggled through the formalism, and you should be ready to do that by now because the terminology, at least, should be quite familiar to you.

WHAT’S A TUPLE?

Is this a tuple?—

┌────────────┬──────────────┬──────────────────┬─────────────┐
│ SNO : CHAR │ SNAME : CHAR │ STATUS : INTEGER │ CITY : CHAR │
├────────────┼──────────────┼──────────────────┼─────────────┤
│ S1 │ Smith │ 20 │ London │
└────────────┴──────────────┴──────────────────┴─────────────┘

Well, no, it isn’t—it’s a picture of a tuple, not a tuple as such (and note that for once I’ve included the type names in that picture as well as the attribute names). As we saw in Chapter 1, there’s a logical difference between a thing and a picture of a thing, and that difference can be very important. For example, tuples have no left to right ordering to their attributes, and so the following is an equally good (bad?) picture of the very same tuple:

┌──────────────────┬──────────────┬─────────────┬────────────┐
│ STATUS : INTEGER │ SNAME : CHAR │ CITY : CHAR │ SNO : CHAR │
├──────────────────┼──────────────┼─────────────┼────────────┤
│ 20 │ Smith │ London │ S1 │
└──────────────────┴──────────────┴─────────────┴────────────┘

Thus, while I’ll certainly be showing many pictures like these in the pages to follow, please keep in mind that they’re only pictures, and they can sometimes suggest some things that aren’t true.

With that caveat out of the way, I can now say exactly what a tuple is:

Definition: A heading H is a set of n attributes (n ≥ 0),¹ each consisting of an attribute name Ai and a corresponding type name Ti, such that the attribute names Ai are all distinct. The value n is the degree of H; a heading of degree one is unary, a heading of degree two is binary, a heading of degree three is ternary, ..., and more generally a heading of degree n is n-ary. Let each attribute Ai (i = 1, 2, ..., n) be associated with an attribute value vi of type Ti, and let each of the n attribute : value pairs that results be called a component. The set— call it t—of all n components so defined is a tuple value (or just a tuple for short) over the attributes of H. H is the tuple heading (or just heading for short) for t, and the degree and attributes of H are, respectively, the degree and attributes of t.

Thus, for example, with reference to either of the pictures above of the tuple for supplier S1, we have:

Attribute names: SNO, SNAME, STATUS, CITY.
Corresponding type names: CHAR, CHAR, INTEGER, CHAR.
Attributes: SNO:CHAR, SNAME:CHAR, STATUS:INTEGER, CITY:CHAR.
Corresponding attribute values: 'S1', 'Smith', 20, 'London'. Note the quotes enclosing the character string values here, incidentally; I didn’t show any such quotes in the pictures, but perhaps I should have done—it would have been more correct.

Aside: Suppose for a moment, as we did in Chapter 2, that attribute SNO was of type SNO (a user defined type) instead of type CHAR. Then it would be even more incorrect to say the SNO value in the tuple we’re talking about was S1, or even 'S1'; rather, it would be SNO('S1'). A value of type SNO is a value of type SNO, not a value of type CHAR!—a difference in type is certainly a logical difference. (Recall from Chapter 2 that the expression SNO('S1') is a selector invocation—in fact, a literal—of type SNO.) End of aside.
Heading: {SNO:CHAR, SNAME:CHAR, STATUS:INTEGER, CITY:CHAR}.
Degree: 4.

By the way, it’s sometimes convenient to represent headings on paper (like tuples) by means of a picture, as in this example:

┌────────────┬──────────────┬──────────────────┬─────────────┐
│ SNO : CHAR │ SNAME : CHAR │ STATUS : INTEGER │ CITY : CHAR │
└────────────┴──────────────┴──────────────────┴─────────────┘

Of course, this picture represents a set, and so the order in which the attributes are shown is arbitrary. Here’s another picture of the same heading:

┌──────────────────┬──────────────┬─────────────┬────────────┐
│ STATUS : INTEGER │ SNAME : CHAR │ CITY : CHAR │ SNO : CHAR │
└──────────────────┴──────────────┴─────────────┴────────────┘

Exercise: How many different pictures of this same general nature could we draw to represent this same heading? Answer: 4! (factorial 4) = 4 * 3 * 2 * 1 = 24.

Now, a tuple is a value; like all values, therefore, it has a type (as we know from Chapter 2), and that type, like all types, has a name. In Tutorial D, such names take the form TUPLE H, where H is the heading. In our example, the name is:

TUPLE { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

The order in which the attributes are specified is arbitrary, of course. Note, however, that Tutorial D uses spaces, not colons, to separate attribute names from their corresponding type names.

To repeat, a tuple is a value. Like all values, therefore, it must be returned by some selector invocation (a tuple selector invocation, naturally, if the value is a tuple). Here’s a tuple selector invocation for our example (Tutorial D again):

TUPLE { SNO 'S1' , SNAME 'Smith' , STATUS 20 , CITY 'London' }

The order in which the components are specified is again arbitrary. Note, however, that in Tutorial D each component is specified by means of the pertinent attribute name by itself (i.e., without the corresponding type name), separated by spaces from an expression denoting the pertinent attribute value; there’s no need to specify the attribute type explicitly, because it’s necessarily the same as that of the specified expression.

Here’s another example of a tuple selector invocation (unlike the previous one, this one isn’t a literal, because not all of its arguments are specified as literals in turn):

TUPLE { SNO SX , SNAME 'Johns' , STATUS TX , CITY CX }

I’m assuming here that SX, TX, and CX are variables of types CHAR, INTEGER, and CHAR, respectively.

As these examples indicate, a tuple selector invocation in Tutorial D consists in general of the keyword TUPLE, followed by a commalist of attribute-name : expression pairs (but without the colon separators), the whole commalist being enclosed in braces. Note, therefore, that the keyword TUPLE does double duty in Tutorial D—it’s used in connection both with tuple selector invocations, as we’ve just seen, and with tuple type names as we saw earlier. An analogous remark applies to the keyword RELATION also (see the section “What’s a Relation?” later in this chapter).

Consequences of the Definitions

Now I want to highlight some important consequences of the foregoing definitions. The first is this: No tuple ever contains any nulls. The reason is that, by definition, every tuple contains a value (of the appropriate type) for each of its attributes, and as we saw in Chapter 1 nulls aren’t values—despite the fact that SQL does often, though not invariably, refer to them explicitly as null values. Recommendation: Since the phrase “null value” is a contradiction in terms, don’t use it; always say just “null” instead. Observe that this recommendation isn’t just a matter of pedantry; rather, it’s a matter of thinking straight. SQL itself manages to make numerous mistakes in its handling of nulls, and some of those mistakes can be traced directly to the fact that SQL does sometimes, though not always, think of null as a value.²

Now, if no tuple ever contains any nulls, then no relation does so either, a fortiori; so right away we have at least a formal reason for rejecting the concept of nulls—but in the next chapter I’ll give some much more pragmatic reasons as well.

The next consequence—or pair of consequences, rather—is: Every subset of a tuple is a tuple and every subset of a heading is a heading. (I did mention these points in Chapter 1, but now I want to elaborate on them.) By way of example, given our usual tuple for supplier S1, what we might call the {SNO,CITY} value within that tuple is itself another tuple (of degree two):

┌────────────┬─────────────┐
│ SNO : CHAR │ CITY : CHAR │
├────────────┼─────────────┤
│ S1 │ London │
└────────────┴─────────────┘

Its heading is as indicated, and its type is thus TUPLE {SNO CHAR, CITY CHAR}. Likewise, the following is a tuple also:

┌────────────┐
│ SNO : CHAR │
├────────────┤
│ S1 │
└────────────┘

This tuple is of degree one, and its type is TUPLE {SNO CHAR}.

Now, as I’m sure you know, the empty set—i.e., the set that contains no elements—is a subset of every set. It follows that the empty heading is a valid heading!—and hence that a tuple with an empty set of components is a valid tuple (though it’s a little hard to draw pictures of such a tuple on paper, and I’m not even going to try). A tuple with an empty heading has type TUPLE { }; indeed, we sometimes refer to it explicitly as a 0-tuple, in order to emphasize the fact that it has no components and is of degree zero. We also sometimes call it an empty tuple. Now, you might be thinking such a tuple is unlikely to be of much use in practice; in fact, however, it turns out, perhaps rather surprisingly, to be of crucial importance. I’ll have more to say about it in the section “TABLE_DUM and TABLE_DEE,” later.

Let’s get back to the original tuple for supplier S1 (i.e., the one of degree four) for a moment. Suppose we’re given that tuple and we want to access the actual value of some attribute, say the SNO attribute, from that tuple. Then we have to extract that value, somehow, from the tuple that contains it. Tutorial D uses syntax of the form SNO FROM t for this purpose (where t is any expression that denotes a tuple with an attribute called SNO). SQL uses dot qualification: t.SNO.

Note: It follows from the foregoing paragraph that a value v and a tuple t that contains just that value v aren’t the same thing; in particular, they’re of different types. This logical difference is analogous to that described in Chapter 2, between a tuple t and a relation r that contains just that tuple t; these aren’t the same thing either (they too are of different types).

Now I’d like to turn to the notion of tuple equality. (Again I mentioned this notion in Chapter 1, but now I want to elaborate on it.) Recall first from Chapter 2 that the “=” comparison operator is—in fact, must be—defined for every type, and tuple types are no exception. Basically, two tuples are equal if and only if they’re the very same tuple (just as, for example, two integers are equal if and only if they’re the very same integer). But it’s worth spelling out the semantics of tuple equality in detail, since so much in the relational model depends on it. (For example, candidate keys, foreign keys, and most if not all of the operators of the relational algebra are defined in terms of it.) Here then is a precise definition:

Definition: Tuples t and t′ are equal if and only if they have the same attributes A1, A2, ..., An—in other words, they’re of the same type—and, for all i (i = 1, 2, ..., n), the value v of Ai in t is equal to the value v′ of Ai in t′.

Also (to repeat from Chapter 1, this might seem obvious, but it needs to be said), two tuples are duplicates of each other if and only if they’re equal. Thus, e.g., the tuple for supplier S1 in the suppliers relation of Fig. 1.3 is equal to, and is therefore a duplicate of, itself—and it isn’t equal to, or a duplicate of, anything else (any other tuple in particular).

By the way, it’s an immediate consequence of the foregoing definition that all 0-tuples are duplicates of one another. For this reason, we’re within our rights if we talk in terms of the 0-tuple instead of “a” 0-tuple, and indeed we usually do. Note, moreover, that we can validly say that the 0-tuple is a subset, or “subtuple,” of every tuple (just as we can say the empty set is a subset of every set).

So the comparison operator “=”, and therefore the comparison operator “≠” also, do both necessarily apply to tuples. However, the operators “<” and “>” do not apply. The reason is that tuples are fundamentally sets (sets of components), and such operators make no sense for sets.

In closing this section, let me draw your attention to Exercise 3.16 at the end of the chapter, which I strongly recommend you devote some thought to. Later chapters in the book will appeal to several of the matters raised by that exercise.

ROWS IN SQL

SQL supports rows, not tuples; in particular, it supports row types, a row type constructor, and row value constructors, which are analogous, somewhat, to Tutorial D’s tuple types, TUPLE type generator, and tuple selectors, respectively. (Row types and row type constructors, though not row value constructors, were also discussed in Chapter 2.) But these analogies are loose at best, because, crucially, rows, unlike tuples, have a left to right ordering to their components. For example, the expressions ROW(1,2) and ROW(2,1)—both of which are legitimate row value constructor invocations in SQL—represent two different SQL rows. Note: The keyword ROW in an SQL row value constructor invocation is optional; in practice, it’s almost always omitted.

Thanks to that left to right ordering, row components (“fields”) in SQL can be, and indeed are, identified by ordinal position instead of by name. For example, consider the following row value constructor invocation (actually it’s a row literal, though SQL doesn’t use that term):

( 'S1' , 'Smith' , 20 , 'London' )

This row clearly has (among other things) a component with the value 20; logically speaking, however, we can’t say that component is “the STATUS component,” we can only say it’s the third component.

I should add that rows in SQL always contain at least one component; SQL has no analog of the 0-tuple of the relational model (i.e., there’s no “0-row”).

As discussed in Chapter 2—recall the example involving the SQL row variable SRV— SQL also supports a row assignment operation.³ In particular, such assignments are involved (in effect) in SQL UPDATE statements. For example, the following UPDATE statement—

UPDATE S
SET STATUS = 20 , CITY = 'London'
WHERE CITY = 'Paris' ;

—is defined to be logically equivalent to this one (note the row assignment in the second line):

UPDATE  S
SET   ( STATUS , CITY ) = ( 20 , 'London' )
WHERE   CITY = 'Paris' ;

As for comparison operations, most boolean expressions in SQL, including (believe it or not) simple “scalar” comparisons in particular, are actually defined in terms of rows rather than scalars. Here’s an example of a SELECT expression in which the WHERE clause contains an explicit row comparison:

SELECT SNO
FROM S
WHERE ( STATUS , CITY ) = ( 20 , 'London' )

This SELECT expression is logically equivalent to the following one:

SELECT SNO
FROM S
WHERE STATUS = 20 AND CITY = 'London'

As another example, the expression

SELECT SNO
FROM S
WHERE ( STATUS , CITY ) <> ( 20 , 'London' )

is logically equivalent to:

SELECT SNO
FROM S
WHERE STATUS <> 20 OR CITY <> 'London'

Note carefully in the expanded form of this example that the two individual comparisons in the WHERE clause are connected by OR, not AND.

Moreover, since row components have a left to right ordering, SQL is also able to support “<” and “>” as row comparison operators. Here’s an example:

SELECT SNO
FROM S
WHERE ( STATUS , CITY ) > ( 20 , 'London' )

This expression is logically equivalent to:

SELECT SNO
FROM S
WHERE STATUS > 20 OR ( STATUS = 20 AND CITY > 'London' )

In practice, however, the vast majority of row comparisons involve rows of degree one, as here:

SELECT SNO
FROM S
WHERE ( STATUS ) = ( 20 )

Now, all of the expressions denoting comparands in the examples so far have been, specifically, row value constructor invocations. But now I need to explain that SQL has a syntax rule to the effect that if such an invocation consists of a single scalar expression enclosed in parentheses, then the parentheses can optionally be dropped, as here:

SELECT SNO
FROM S
WHERE STATUS = 20

The “row comparison” in the WHERE clause in this example is thus effectively a scalar comparison (STATUS and 20 are both scalar expressions). Strictly speaking, however, there’s no such thing as a scalar comparison in SQL; the expression STATUS = 20 is still technically a row comparison (and the “scalar” comparands are effectively coerced to rows), so far as SQL is concerned.

Recommendation: Unless the rows being compared are of degree one (and thus effectively scalars), don’t use the comparison operators “<”, “<=”, “>”, and “>=”; they rely on left to right column ordering, they have no direct counterpart in the relational model, and in any case they’re seriously error prone. (It’s relevant to note in this connection that when this functionality was first proposed for SQL, the standardizers had great difficulty in defining the semantics properly; in fact, it took them several iterations before they got it right.)

WHAT’S A RELATION?

I’ll use our usual suppliers relation as a basis for examples in this section. Here’s a picture:

┌────────────┬──────────────┬──────────────────┬─────────────┐
│ SNO : CHAR │ SNAME : CHAR │ STATUS : INTEGER │ CITY : CHAR │
├════════════┼──────────────┼──────────────────┼─────────────┤
│ S1         │ Smith        │               20 │ London      │
│ S2         │ Jones        │               10 │ Paris       │
│ S3         │ Blake        │               30 │ Paris       │
│ S4         │ Clark        │               20 │ London      │
│ S5         │ Adams        │               30 │ Athens      │
└────────────┴──────────────┴──────────────────┴─────────────┘

And here’s a definition:

Definition: Given a heading H, a body B conforming to H is a set of m tuples (m ≥ 0), each with heading H. The value m is the cardinality of B. The pair (H,B)—call it r—is a relation value (or just a relation for short) over the attributes of H. H is the relation heading (or just heading for short) for r, and the degree and attributes of H and the cardinality of B are, respectively, the degree, attributes, and cardinality of r.

I’ll leave it as an exercise for you to interpret the suppliers relation in terms of the foregoing definition. However, I will at least explain why we call such things relations. Basically, each tuple in a relation represents an n-ary relationship, in the ordinary natural language sense of that term, interrelating a collection of n values (one such value for each tuple attribute); the full set of tuples in a given relation represents the full set of such relationships that happen to exist at some given time; and, mathematically speaking, that set of tuples is a relation. Thus, the explanation often heard, to the effect that the relational model is so called because it lets us “relate one table to another,” though accurate in a kind of secondary sense, really misses the point: The relational model is so called because it deals with certain abstractions that we can think of informally as “tables” but are known in mathematics, formally, as relations.

Now, a relation, like a tuple, is itself a value and has a type, and that type has a name. In Tutorial D, such names take the form RELATION H, where H is the heading—for example:

RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

(The order in which the attributes are specified is arbitrary, of course.) Also, every relation value is denoted by some relation selector invocation—for example:

RELATION
  { TUPLE { SNO 'S1' , SNAME 'Smith' , STATUS 20 , CITY 'London' } ,
    TUPLE { SNO 'S2' , SNAME 'Jones' , STATUS 10 , CITY 'Paris'  } ,
    TUPLE { SNO 'S3' , SNAME 'Blake' , STATUS 30 , CITY 'Paris'  } ,
    TUPLE { SNO 'S4' , SNAME 'Clark' , STATUS 20 , CITY 'London' } ,
    TUPLE { SNO 'S5' , SNAME 'Adams' , STATUS 30 , CITY 'Athens' } }

The order in which the tuples are specified is arbitrary. Here’s another example (unlike the previous one, this one isn’t a literal):

RELATION { tx1 , tx2 , tx3 }

I’m assuming here that tx1, tx2, and tx3 are tuple expressions and are all of the same tuple type. As these examples suggest, a relation selector invocation in Tutorial D consists in general⁴ of the keyword RELATION, followed by a commalist enclosed in braces of tuple expressions (and those tuple expressions must all be of the same tuple type).

Consequences of the Definitions

Most of the properties of relations I talked about in Chapter 1 are direct consequences of the definitions discussed above, but there are some points I didn’t call out explicitly before, and I want to elaborate on some of the others. The first two I want to mention are as follows:

Relations never contain duplicate tuples—because the body of a relation is a set (a set of tuples) and sets in mathematics don’t contain duplicate elements.
Relations never contain nulls—because the body of a relation is a set of tuples, and we’ve already seen that tuples in turn never contain nulls.

But these two points are so significant, and there’s so much I need to say about them, that I’ll defer detailed treatment of them to the next chapter. In the next few sections, I’ll address a series of possibly less weighty issues (?) arising from the definitions.

RELATIONS AND THEIR BODIES

The first point I want to discuss is this: Every subset of a body is a body—or, loosely, every subset of a relation is a relation. (Once again I mentioned this fact in Chapter 1, but now I want to say a little more about it.) In particular, since the empty set is a subset of every set, a relation can have a body that consists of an empty set of tuples (and we call such a relation an empty relation). For example, suppose there are no shipments right now. Then relvar SP will have as its current value the empty shipments relation, which we might draw like this (and now I revert to the convention by which we omit the type names from headings in informal contexts; throughout the rest of the book, in fact, I’ll feel free to regard headings as either including or excluding the attribute type names—whichever best suits my purpose at the time):

┌─────┬─────┬─────┐
│ SNO │ PNO │ QTY │
├═════┼═════┼─────┤
└─────┴─────┴─────┘

Note that, given any particular relation type, there’s exactly one empty relation of that type—but empty relations of different types aren’t the same thing, precisely because they’re of different types. For example, the empty suppliers relation isn’t equal to the empty parts relation (their bodies are equal but their headings aren’t).

Consider now the relation depicted here:

┌─────┬─────┬─────┐
│ SNO │ PNO │ QTY │
├═════┼═════┼─────┤
│ S1 │ P1 │ 300 │
└─────┴─────┴─────┘

This relation contains just one tuple (equivalently, it’s of cardinality one). If we want to access the single tuple it contains, then we’ll have to extract it somehow from its containing relation. Tutorial D uses syntax of the form TUPLE FROM rx for this purpose, where rx is any expression that denotes a relation of cardinality one—for example, it might be the expression RELATION {TUPLE {SNO 'S1', PNO 'P1', QTY 300}}, which is in fact a relation selector invocation (actually it’s a literal). SQL, by contrast, uses coercion: If (a) tx is a table expression that’s being used as a row subquery (meaning it appears where a row expression is expected), then (b) the table t denoted by tx is supposed to contain just one row r, and (c) that table t is coerced to that row r. Here’s an example (it’s the row assignment example from the section “Row and Table Types in SQL” in Chapter 2):

SET SRV = ( S WHERE SNO = 'S1' ) ;

We also need to be able to test whether a given tuple t appears in a given relation r. In Tutorial D:

t ∊ r

This expression returns TRUE if t appears in r and FALSE otherwise. The symbol “∊” (a stylized Greek epsilon) denotes the set membership operator; the expression t ∊ r can be read as “t [is] in r” or “t appears in r.” In fact, as you’ve probably realized, “∊” is essentially SQL’s IN—except that the left operand of SQL’s IN is usually a scalar, not a row, which means there’s some coercion going on once again (i.e., the scalar is coerced to the row that contains it).⁵ Here’s an example (“Get suppliers who supply at least one part”):

SELECT SNO , SNAME , STATUS , CITY
FROM   S
WHERE  SNO IN        /* "SNO" coerced to "ROW(SNO)" */
     ( SELECT SNO
       FROM   SP )

As I’m sure you know, SQL also supports NOT IN. The Tutorial D analog is “∉”; in other words, the Tutorial D expression “t ∉ r” means tuple t isn’t in relation r.

RELATIONS ARE n-DIMENSIONAL

I’ve stressed the point several times that, while a relation can be pictured as a table, it isn’t a table. (To say it one more time, a picture of a thing isn’t the same as the thing.) Of course, it can be very convenient to think of a relation as a table; after all, tables are user friendly; indeed, as noted in Chapter 1, it’s the fact that we can think of relations, informally, as tables—sometimes more explicitly as “flat” or “two-dimensional” tables—that makes relational systems intuitively easy to understand and use, and makes it intuitively easy to reason about the way such systems behave. In other words, it’s a very nice property of the relational model that its basic data structure, the relation, has such an intuitively attractive pictorial representation.

Unfortunately, however, many people seem to have been blinded by that attractive pictorial representation into thinking that relations as such are “flat” or “two-dimensional.” But they’re not. Rather, if relation r has n attributes, then each tuple in r represents a point in a certain n-dimensional space (and the relation overall represents a set of such points). For example, each of the five tuples appearing in our usual suppliers relation represents a certain point in a certain 4-dimensional space (the four dimensions corresponding, of course, to the four attributes of that relation), and the relation overall can thus be said to be four-dimensional. Thus, relations in general are n-dimensional, not two-dimensional.⁶ As I’ve written elsewhere (in quite a few places, in fact): Let’s all vow never to say “flat relations” ever again.

RELATIONAL COMPARISONS

Like tuple types, relation types are no exception to the rule that the “=” comparison operator must be defined for every type; that is, given two relations r1 and r2 of the same relation type, we must at least be able to test whether they’re equal. Here’s an example, expressed in Tutorial D as usual, of an equality comparison on relations:

S { CITY } = P { CITY }

The left comparand here is the projection of suppliers on {CITY},⁷ the right comparand is the projection of parts on {CITY}, and the comparison returns TRUE if these two projections are equal, FALSE otherwise. In other words, the comparison (which is a boolean expression) means: “The set of supplier cities is equal to the set of part cities” (and it evaluates to either TRUE or FALSE, of course).

Other comparisons might be useful, too. For example, we might want to test whether r1 includes r2 (meaning every tuple in r2 is also in r1), or whether r1 properly includes r2 (meaning every tuple in r2 is also in r1 but r1 contains at least one tuple that isn’t in r2). Here’s an example involving proper inclusion:

S { SNO } ⊃ SP { SNO }

The symbol “⊃” here means “properly includes” (or, equivalently, “is a proper superset of”). The meaning of this expression (considerably paraphrased) is: “At least one supplier supplies no parts at all” (which again necessarily evaluates to either TRUE or FALSE).

Other useful relational comparison operators include “⊇” (“includes,” “is a superset of”), “⊆” (“is included in,” “is a subset of”), and “⊂” (“is properly included in,” “is a proper subset of”). Note: Of these various operators, the “⊆” operator in particular is usually referred to, a trifle arbitrarily, as the relational inclusion operator.

Finally, one extremely common requirement is to be able to perform an “=” comparison between some given relation r and an empty relation of the same type—in other words, a test to see whether r is empty. So it’s convenient to define a shorthand:

IS_EMPTY ( r )

This expression is defined to return TRUE if relation r is empty and FALSE otherwise.⁸ I’ll be relying on it heavily in chapters to come (especially Chapter 8). The inverse operator can be useful too:

IS_NOT_EMPTY ( r )

This expression is logically equivalent to NOT (IS_EMPTY(r)).

TABLE_DUM AND TABLE_DEE

Recall from the discussion of tuples earlier in this chapter that the empty set is a subset of every set, and hence that there’s such a thing as the empty tuple (also called the 0-tuple), and of course that tuple has an empty heading. For exactly the same reason, a relation too might have an empty heading—a heading is a set of attributes, and there’s no reason why that set shouldn’t be empty. Such a relation is of type RELATION { }, and its degree is zero.

Let r be a relation of degree zero, then. How many such relations are there? The answer is: Just two. First, r might be empty (meaning it contains no tuples)—remember there’s always exactly one empty relation of any given type. Second, if r isn’t empty, then the tuples it contains must all be 0-tuples. But there’s only one 0-tuple!—equivalently, all 0-tuples are duplicates of one another—and so r can’t possibly contain more than one of them. So there are indeed just two relations with no attributes: one with just one tuple, and one with no tuples at all. For obvious reasons, I’m not going to try drawing pictures of these relations (in fact, this is the one place where the idea of thinking of relations as tables breaks down completely).

Now, you might well be thinking: So what? Why on earth would I ever want a relation that has no attributes at all? Even if they’re mathematically respectable (which they are), surely they’re of no practical significance? In fact, however, it turns out they’re of very great practical significance indeed: so much so, that we have pet names for them—we call them TABLE_DUM and TABLE_DEE, or DUM and DEE for short (DUM is the empty one, DEE is the one with one tuple). And what makes them so significant is their meanings, which are FALSE (or no) for DUM and TRUE (or yes) for DEE. They have the most fundamental meanings of all. Note: I’ll be discussing the whole notion of relations and their meaning in much more detail in Chapters 5 and 6.

By the way, a good way to remember which is which is this: DEE and yes both have an “E”; DUM and no don’t.

Now, I haven’t covered enough in this book yet to show concrete examples of DUM and DEE in action, as it were, but we’ll see plenty of examples of their use in the pages ahead. Here I’ll just mention one point that should make at least intuitive sense at this early juncture: These two relations (especially TABLE_DEE) play a role in the relational algebra that’s analogous to the role played by zero in conventional arithmetic. And we all know how important zero is; in fact, it’s hard to imagine an arithmetic without zero (the ancient Romans tried, but it got them into a lot of trouble). Well, it should be equally hard to imagine a relational algebra without TABLE_DEE. Which brings us to SQL ... SQL, since it has no counterpart to the 0-tuple, clearly (but unfortunately) has no counterpart to TABLE_DUM or TABLE_DEE either.

Aside: Perhaps I should say a little more about those pet names TABLE_DUM and TABLE_DEE. First, for the benefit of non English speakers, I should explain that they’re basically just wordplay on Tweedledum and Tweedledee, who were originally characters in a children’s nursery rhyme and were subsequently incorporated into Lewis Carroll’s Through the Looking-Glass and What Alice Found There (1871). Second, the names are perhaps a little unfortunate, given that these two relations are precisely the ones that can’t reasonably be depicted as tables! But we’ve been using them (the names, that is) for so long now in the relational world that I don’t think we’re going to change them. End of aside.

TABLES IN SQL

Note: Throughout this section, by the term table I mean a table value specifically—an SQL table value, that is—and not a table variable (which is what CREATE TABLE and CREATE VIEW create). I’ll discuss table variables in Chapter 5.

Now, I explained in Chapter 2 that SQL doesn’t really have anything analogous to the concept of a relation type at all; instead, an SQL table is just a collection of rows, where (a) the rows are of a certain row type and (b) the collection is (in general) a bag, not necessarily a set. It follows that SQL doesn’t really have anything analogous to the RELATION type generator, either—though as we know from Chapter 2 it does support other type generators, including ROW, ARRAY, and MULTISET. It does, however, have something called a table value constructor that’s analogous, somewhat, to a relation selector. Here’s an example:

VALUES ( 1 , 2 ), ( 2 , 1 ), ( 1 , 1 ), ( 1 , 2 )

This expression (actually it’s a table literal, though SQL doesn’t use this term) evaluates to a table with four—not three!—rows and two columns. What’s more, those columns have no names. As I’ve already explained, the columns of an SQL table are ordered, left to right; as a consequence, those columns can be, and sometimes have to be, identified by ordinal position instead of name.

By way of another example, consider the following table value constructor invocation:

VALUES ( 'S1' , 'Smith' , 20 , 'London' ) ,
       ( 'S2' , 'Jones' , 10 , 'Paris'  ) ,
       ( 'S3' , 'Blake' , 30 , 'Paris'  ) ,
       ( 'S4' , 'Clark' , 20 , 'London' ) ,
       ( 'S5' , 'Adams' , 30 , 'Athens' )

In order for this expression to be regarded as a fair approximation to its relational counterpart (i.e., a relation literal denoting the relation that’s the current value of relvar S as shown in Fig. 1.3), we must:

Ensure that if the ith ordinal position, within any of the rows specified by the VALUES expression, corresponds to attribute A of the intended relational counterpart (viz., the suppliers relation, in the example), then the ith ordinal position in all of those rows corresponds to that same attribute A.
Ensure that all of the values in the ith ordinal position are values of the type appropriate for that attribute A.
Ensure that the same row isn’t specified twice.

Note: As you know, in the relational model a heading is a set of attributes. In SQL, by contrast, because columns have a left to right ordering, it would be more correct to regard a heading as a sequence, not a set, of attributes (or columns, rather). If the recommendations of this book are followed, however, this logical difference can mostly (?) be ignored.

What about table assignment and comparison operators? Well, table assignment is a big topic, and I’ll defer the details to Chapter 5. As for table comparisons, SQL has no direct support—not even for equality!—but workarounds are available. For example, here’s an SQL counterpart to the Tutorial D comparison S{CITY} = P{CITY}:

NOT EXISTS ( SELECT CITY FROM S
             EXCEPT
             SELECT CITY FROM P )
AND
NOT EXISTS ( SELECT CITY FROM P
             EXCEPT
             SELECT CITY FROM S )

And here’s a counterpart to the Tutorial D comparison S{SNO} ⊃ SP{SNO}:

EXISTS ( SELECT SNO FROM S
         EXCEPT
         SELECT SNO FROM SP )
AND
NOT EXISTS ( SELECT SNO FROM SP
             EXCEPT
             SELECT SNO FROM S )

Aside: I said above that SQL has no direct support for table equality comparisons, and that’s true. As a consequence, the following putative SQL analog of the Tutorial D comparison S{CITY} = P{CITY}—

( SELECT DISTINCT CITY FROM S ) =
( SELECT DISTINCT CITY FROM P )

—is illegal. But the odd thing is, SQL does have direct support for equality comparisons on bags, including as a special case bags of rows in particular.⁹ Moreover, it also has an operator for converting a table to a bag of rows.¹⁰ So we can do the desired equality comparison by converting the tables to bags of rows and then comparing those bags. So far so good ... Believe it or not, however, the operator that converts a table to a bag of rows is called TABLE (!). Thus, the desired comparison can legitimately be formulated in SQL as follows:

TABLE ( SELECT DISTINCT CITY FROM S ) =
TABLE ( SELECT DISTINCT CITY FROM P )

But this trick only works for equality comparisons—SQL has no direct support for “⊃” etc., certainly not for tables, and not for bags of rows either.¹¹ End of aside.

COLUMN NAMING IN SQL

In the relational model, (a) every attribute of every relation has a name (i.e., anonymous attributes are prohibited), and (b) such names are unique within the relevant relation (i.e., duplicate attribute names are prohibited). In SQL, analogous rules are enforced sometimes, but not always. To be specific, they’re enforced for the tables that happen to be the current values of table variables—defined via CREATE TABLE or CREATE VIEW—but not for the tables that result from evaluation of some table expression.¹² Strong recommendation: Use AS specifications whenever necessary to give proper column names to columns that otherwise (a) wouldn’t have a name at all or (b) would have a name that wasn’t unique. Here are some examples:

SELECT DISTINCT SNAME , 'Supplier' AS TAG
FROM S
SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
FROM S
SELECT MAX ( WEIGHT ) AS MBW
FROM P
WHERE COLOR = 'Blue'
CREATE VIEW SDS
AS ( SELECT DISTINCT SNAME , 2 * STATUS AS DOUBLE_STATUS
FROM S ) ;
SELECT DISTINCT S.CITY AS SCITY , P.CITY AS PCITY
FROM   S , SP , P
WHERE  S.SNO = SP.SNO
AND    SP.PNO = P.PNO
SELECT temp.*
FROM ( SELECT * FROM S JOIN P ON S.CITY > P.CITY )
AS temp ( SNO , SNAME , STATUS , SCITY ,
PNO , PNAME , COLOR , WEIGHT , PCITY )

Of course, the foregoing recommendation can safely be ignored if there’s no subsequent need to reference the otherwise anonymous or nonuniquely named columns. For example, the third of the foregoing examples could safely be abbreviated in some circumstances (in a WHERE or HAVING clause, perhaps) to just:

SELECT MAX ( WEIGHT )
FROM P
WHERE COLOR = 'Blue'

Perhaps more important, note that the recommendation unfortunately can’t be followed at all in the case of tables specified by means of VALUES expressions. However, workarounds are available. For example, the following is legal:

SELECT temp.SNO , temp.SNAME , temp.STATUS , temp.CITY
FROM ( VALUES ( 'S1' , 'Smith' , 20 , 'London' ) ,
              ( 'S2' , 'Jones' , 10 , 'Paris' ) ,
              ( 'S3' , 'Blake' , 30 , 'Paris' ) ,
              ( 'S4' , 'Clark' , 20 , 'London' ) ,
              ( 'S5' , 'Adams' , 30 , 'Athens' ) )
       AS temp ( SNO , SNAME , STATUS , CITY )

Explanation: I’ve enclosed the VALUES expression in parentheses (thereby making it a subquery), attached an AS specification, and specified column names as well as a “correlation name” in that AS specification (see Chapter 12; see also Example 6 above).

Important note: The operators of the relational algebra rely on proper attribute naming in a variety of ways. For example, as we’ll see in Chapter 6, the relational UNION operator requires its operands to have the same heading (and hence the same attribute names), and the result then has the same heading as well. One advantage of this scheme is precisely that it avoids the complexities caused, in SQL, by reliance on ordinal position! In order to use SQL relationally, therefore, you should apply the same discipline to the SQL analogs of those relational operators. Strong recommendation: As a prerequisite to enforcing such a discipline, if two columns in SQL represent “the same kind of information,” give them the same name wherever possible. (That’s why, for example, the two supplier number columns in our running example, the suppliers-and-parts database, are both called SNO and not, say, SNO in one table and SNUM in the other.) Conversely, if two columns represent different kinds of information, it’s usually a good idea to give them different names.

The only case where it’s impossible to follow the foregoing recommendation is when two columns in the same table both represent the same kind of information. For example, consider an SQL table EMP with (among other things) columns representing “employee number” and “manager number,” respectively, where a manager number is itself another employee number. Obviously, these two columns will have to have different names, say ENO and MNO, respectively. As a consequence, some column renaming will sometimes have to be done, as in the following join example (note the specification “ENO AS MNO” in the third line):

( SELECT ENO , MNO FROM EMP ) AS temp1
  NATURAL JOIN
( SELECT ENO AS MNO , ... FROM EMP ) AS temp2
/* where "..." is EMP columns other than ENO and MNO,              */
/* and the AS specifications at the end of lines 1 and 3 are there */
/* because they're required by the SQL standard (see Chapter 12)   */

Such renaming will also have to be done, if you want to use SQL relationally, if columns simply haven’t been named appropriately in the first place (e.g., if you’re confronted with a database that’s been defined by somebody else—doubtless a common state of affairs in practice). A strategy you might want to consider in such circumstances is the following:

For each table T in the database, define a view V that’s identical to table T except possibly for some column renaming.
Make sure all views so defined abide by the column naming discipline described above.
Operate in terms of those views instead of the underlying tables.

Unfortunately, it’s impossible to ignore the fact 100 percent that columns do have an ordinal position in SQL. (Of course, it’s precisely because of that fact that SQL is able to get away with its anonymous columns and duplicate column names.) Note in particular that columns still have an ordinal position in SQL even when they don’t need to (i.e., when they’re all properly named anyway); this observation applies to columns in base tables and views in particular. Strong recommendation: Never write SQL code that relies on such ordinal positioning. Examples of where SQL attaches significance to such positioning include (but probably aren’t limited to):

SELECT * (see Chapter 12)
The FROM clause, if more than one table is specified (see Chapter 6)
Explicit JOIN operations (see Chapter 6)
UNION, INTERSECT, and EXCEPT operations, if CORRESPONDING isn’t specified (see Chapter 6)
The column name commalist, if specified, following the definition of a range variable (see Chapter 12)
The column name commalist, if specified, in CREATE VIEW (see Chapter 9)
INSERT, if no column name commalist is specified (see Chapter 5)
VALUES expressions as described in the present chapter
Row assignments and comparisons, also as described in the present chapter
ALL and ANY comparisons, if the comparands are of degree greater than one (see Chapter 11)

CONCLUDING REMARKS

In this chapter I’ve given precise definitions for the fundamental concepts tuple and relation. As I said earlier, those definitions can be a little daunting at first, but I hope you were able to make sense of them after having read the first two chapters. I also discussed tuple and relation types, and tuple and relation selectors and comparisons, as well as a number of important consequences of the definitions; in particular, I briefly described the important relations TABLE_DUM and TABLE_DEE. I also discussed the SQL counterparts of all of these notions, where such counterparts exist. In closing, I’d like to stress the importance of the recommendations, in the section immediately preceding this one, regarding column naming in SQL. Later chapters will rely heavily on those recommendations.

EXERCISES

3.1 Define as precisely as you can the terms attribute, body, cardinality, degree, heading, relation, relation type, and tuple.

3.2 State as precisely as you can what it means for (a) two tuples to be equal; (b) two relations to be equal.

3.3 Write Tutorial D tuple selector invocations for a typical tuple from (a) the parts relvar, (b) the shipments relvar. Also show SQL counterparts to those selector invocations.

3.4 Write a typical Tutorial D relation selector invocation. Also show an SQL counterpart to that selector invocation.

3.5 (This is essentially a repeat of Exercise 1.8 from Chapter 1, but you should be able to give a more comprehensive answer now.) There are many differences between a relation and a table. List as many as you can.

3.6 The attributes of a tuple can be of any type whatsoever (well, almost; can you think of any exceptions?). Give an example of (a) a tuple with a tuple valued attribute (TVA), (b) a tuple with a relation valued attribute (RVA).

3.7 Give an example of a relation with (a) one RVA, (b) two RVAs. Also give two more relations that represent the same information as those relations but don’t involve RVAs. Also give an example of a relation with an RVA such that there’s no relation that represents precisely the same information but has no RVA.

3.8 Explain the relations TABLE_DUM and TABLE_DEE in your own words. Why exactly doesn’t SQL support them?

3.9 As we saw in the body of the chapter, TABLE_DEE means TRUE and TABLE_DUM means FALSE. Do these facts mean we could dispense with the usual BOOLEAN data type? Also, DEE and DUM are relations, not relvars. Do you think it would ever make sense to define a relvar of degree zero?

3.10 What if any is the logical difference—as opposed to the obvious syntactic difference— between the following two SQL expressions?

VALUES ( 1 , 2 ), ( 2 , 1 ), ( 1 , 1 ), ( 1 , 2 )

VALUES ( ( 1 , 2 ), ( 2 , 1 ), ( 1 , 1 ), ( 1 , 2 ) )

3.11 What exactly does the following SQL expression mean?

SELECT SNO
FROM S
WHERE ( NOT ( ( STATUS , SNO ) <= ( 20 , 'S4' ) ) ) IS NOT FALSE

3.12 Explain in your own words what it means to say that relations are n-dimensional.

3.13 List as many situations as you can think of in which SQL regards left to right column ordering as significant.

3.14 Give an SQL analog for the Tutorial D expression IS_NOT_EMPTY(r).

3.15 I said in the body of the chapter that a relation selector invocation in Tutorial D consists of the keyword RELATION, followed by a commalist enclosed in braces of tuple expressions (and those tuple expressions must all be of the same tuple type)—and I implied, though I didn’t actually say as much, that the type of the relation denoted by the overall expression was RELATION H, where TUPLE H was the common type of all of the specified tuple expressions. But what if the set of specified tuple expressions is empty?—in other words, what if the relation being specified is empty? How can its type be determined?

Following on from the foregoing, how can we specify an empty table in SQL?

3.16 A tuple is a set (a set of components); so do you think it might make sense to define versions of the usual set operators (union, intersection, etc.) that apply to tuples?

3.17 State in your own words, as carefully as you can, the discipline described in the body of the chapter regarding SQL column names.

3.18 The column naming discipline referred to in the previous exercise relies on the use of AS specifications. But such specifications can appear in SQL in many different contexts; moreover, the syntax sometimes takes the form “X AS <something>” and sometimes “<something> AS X” (if you see what I mean); and the keyword is sometimes optional and sometimes mandatory.¹³ List all of the contexts in which AS can appear, showing which are of the form “X AS ...” and which of the form “... AS X”, and in which cases the keyword is optional.

ANSWERS

3.1 See the body of the chapter.

3.2 Two values of any kind are equal if and only if they’re the very same value (meaning they must be of the same type, a fortiori). In particular, therefore, (a) two tuples t and t′ are equal if and only if they have the same attributes A1, A2, ..., An and, for all i (i = 1, 2, ..., n), the value v of Ai in t is equal to the value v′ of Ai in t′; (b) two relations r and r′ are equal if and only if they have the same heading and the same body (i.e., their headings are equal and their bodies are equal).

3.3 Tutorial D tuple selector invocations (actually literals):

TUPLE { PNO 'P1' , PNAME 'Nut' ,
COLOR 'Red' , WEIGHT 12.0 , CITY 'London' }

TUPLE { SNO 'S1' , PNO 'P1' , QTY 300 }

SQL analogs (“row value constructor” invocations):

ROW ( 'P1' , 'Nut' , 'Red' , 12.0 , 'London' )

ROW ( 'S1' , 'P1' , 300 )

Observe the lack of column names (or field names, rather, to use the official SQL term) and the reliance on left to right ordering in these SQL expressions. Note: The keyword ROW could be omitted in both cases without changing the semantics.

3.4 The following selector invocation (actually a literal) denotes a relation of two tuples:

RELATION { TUPLE { SNO 'S1' , PNO 'P1' , QTY 300 } ,
TUPLE { SNO 'S1' , PNO 'P2' , QTY 200 } }

SQL analog (a “table value constructor” invocation, involving two “row value constructor” invocations):

VALUES ROW ( 'S1' , 'P1' , 300 ) ,
ROW ( 'S1' , 'P2' , 200 )

Again the keyword ROW could be omitted in both cases without changing the semantics. By the way, the fact that there are no parentheses enclosing the commalist of row value constructor invocations isn’t an error. In fact, the following SQL expression—

VALUES ( ROW ( 'S1' , 'P1' , 300 ) ,
ROW ( 'S1' , 'P2' , 200 ) )

(which is certainly legal, syntactically speaking)—denotes something entirely different! See the answer to Exercise 3.10 below.

3.5 The list that follows is based on one in my book An Introduction to Database Systems (see Appendix G).

Each attribute in the heading of a relation involves a type name, but those type names are usually omitted from tables (where by tables I mean tabular pictures of relations).
Each component of each tuple in the body of a relation involves a type name and an attribute name, but those type and attribute names are usually omitted from tabular pictures.
Each attribute value in each tuple in the body of a relation is a value of the applicable type, but those values (or literals denoting those values, rather) are usually shown in some abbreviated form—for example, S1 instead of 'S1'—in tabular pictures.
The columns of a table have a left to right ordering, but the attributes of a relation don’t. One implication of this point is that (unlike attributes) columns can have duplicate names, or even no names at all. For example, consider the SQL expression

SELECT DISTINCT S.CITY , S.STATUS * 2 , P.CITY
FROM S , P

What are the column names in the result of this expression?
The rows of a table have a top to bottom ordering, but the tuples of a relation don’t.
A table might contain duplicate rows, but a relation never contains duplicate tuples.
Tables in SQL always have at least one column, while relations are allowed to have no attributes at all (see the section “TABLE_DUM and TABLE_DEE” in the body of the chapter).
Tables in SQL are allowed to include nulls, which relations most certainly aren’t.
Tables (in the sense of tabular pictures) are “flat” or two-dimensional, but relations are n-dimensional.

3.6 One exception is as follows: Since no database relation can have an attribute of any pointer type, no tuple in such a relation can have an attribute of any pointer type either. The other exception is a little harder to state, but what it boils down to is this: If tuple t has heading H, then no attribute of t can be defined in terms of any tuple or relation type with that same heading H, at any level of nesting.

Here’s a Tutorial D expression denoting a tuple with a tuple valued attribute (TVA) called ADDR:

TUPLE { NAME 'Superman' ,
ADDR TUPLE { STREET '1600 Pennsylvania Ave. ' ,
CITY 'Washington' , STATE 'DC' , ZIP '20500' } }

And here’s a Tutorial D expression denoting a tuple with a relation valued attribute (RVA) called PNO_REL:

TUPLE { SNO 'S2' ,
PNO_REL RELATION { TUPLE { PNO 'P1' } ,
TUPLE { PNO 'P2' } } }

3.7 For a relation with one RVA, see relation R4 in Fig. 2.2 in Chapter 2; for an equivalent relation with no RVA, see relation R1 in Fig. 2.1 in Chapter 2. As for one with two RVAs, consider the table on the left below. The intended meaning is: Course CNO can be taught by every teacher TNO in TEACHER (and no other teachers) and uses every textbook XNO in TEXT (and no other textbooks). The table on the right represents a relation without RVAs that conveys the same information.

┌─────┬─────────┬─────────┐    ┌─────┬─────┬─────┐
│ CNO │ TEACHER │ TEXT    │    │ CNO │ TNO │ XNO │
├═════┼─────────┼─────────┤    ├═════┼═════┼═════┤
│     │ ┌─────┐ │ ┌─────┐ │    │ C1  │ T2  │ X1  │
│ C1  │ │ TNO │ │ │ XNO │ │    │ C1  │ T2  │ X2  │
│     │ ├═════┤ │ ├═════┤ │    │ C1  │ T4  │ X1  │
│     │ │ T2  │ │ │ X1  │ │    │ C1  │ T4  │ X2  │
│     │ │ T4  │ │ │ X2  │ │    │ C1  │ T5  │ X1  │
│     │ │ T5  │ │ └─────┘ │    │ C1  │ T5  │ X2  │
│     │ └─────┘ │         │    │ C2  │ T4  │ X2  │
│     │ ┌─────┐ │ ┌─────┐ │    │ C2  │ T4  │ X4  │
│ C2  │ │ TNO │ │ │ XNO │ │    │ C2  │ T4  │ X5  │
│     │ ├═════┤ │ ├═════┤ │    └─────┴─────┴─────┘
│     │ │ T4  │ │ │ X2  │ │
│     │ └─────┘ │ │ X4  │ │
│     │         │ │ X5  │ │
│     │         │ └─────┘ │
└─────┴─────────┴─────────┘

As for a relation with an RVA such that there’s no relation without an RVA that represents precisely the same information, one simple example can be obtained from Fig. 2.2 in Chapter 2 by just replacing the PNO_REL value for (say) supplier S2 by an empty relation:

┌─────┬─────────┐
│ SNO │ PNO_REL │
├═════┼─────────┤
│     │ ┌─────┐ │
│ S2  │ │ PNO │ │
│     │ ├═════┤ │
│     │ └─────┘ │
│     │ ┌─────┐ │
│ S3  │ │ PNO │ │
│     │ ├═════┤ │
│     │ │ P2  │ │
│     │ └─────┘ │
│     │ ┌─────┐ │
│ S4  │ │ PNO │ │
│     │ ├═════┤ │
│     │ │ P2  │ │
│     │ │ P4  │ │
│     │ │ P5  │ │
│     │ └─────┘ │
└─────┴─────────┘

Subsidiary exercise: Why exactly is there no relation without an RVA that represents the same information as the relation just shown?

However, it isn’t necessary to invoke the notion of an empty relation in order to come up with an example of a relation with an RVA such that there’s no relation without an RVA that represents precisely the same information. (Subsidiary exercise: Justify this remark! If you give up, refer to the discussion of the SIBLING example in Chapter 7.)

Note: Perhaps I should elaborate on what it means for two relations to represent the same information. Basically, relations r1 and r2 represent the same information if and only if it’s possible to map r1 into r2 and vice versa by means of operations of the relational algebra.¹⁴ With reference to relations R4 in Fig. 2.2 in Chapter 2 and R1 in Fig. 2.1 in Chapter 2, for example, we have the following (as will in fact be noted again in Chapter 7):

R4 = R1 GROUP { PNO } AS PNO_REL

R1 = R4 UNGROUP PNO_REL

Each relation can thus be defined in terms of the other, and the two therefore do represent the same information. See Chapter 7 for further discussion of the GROUP and UNGROUP operators in particular.

3.8 TABLE_DEE and TABLE_DUM (DEE and DUM for short) are the only relations with no attributes; DEE contains exactly one tuple (the 0-tuple), DUM contains no tuples at all. SQL doesn’t support them because tables in SQL are always required to have at least one column. (In other words, SQL’s version of the relational algebra is like an arithmetic that has no zero.) As for why this is so, your guess is as good as mine.

3.9 (Note: You might want to come back and take another look at this answer after reading Chapter 10.) We need the concept of relations in general before we can have the concept of relations of degree zero in particular. The concept of relations in general depends on predicate logic. Predicate logic depends on propositional logic. Propositional logic depends on the truth values TRUE and FALSE. So if we tried to replace TRUE and FALSE by DEE and DUM, we would be going round in circles!

Also, it would be a little odd to say the least if all boolean expressions suddenly became relational expressions, and host languages thus suddenly all had to support relational data types.

Would it make sense to define a relvar of degree zero? It’s hard but not impossible to imagine a situation in which such a relvar might be useful—but that’s not the point. Rather, the point is that the system shouldn’t include a prohibition against defining such a relvar. If it did, then that fact would constitute a violation of orthogonality, and such violations always come back to bite us eventually.

3.10 The first denotes an SQL table of four rows (three distinct ones, plus a duplicate of one of those three). The second denotes an SQL table of one row, that row consisting of four “field” values all of which are rows in turn. Note that none of the fields involved is named in either case.

3.11 The given expression is semantically equivalent to this one:

SELECT SNO
FROM   S
WHERE  STATUS > 20
OR   ( STATUS = 20 AND SNO > 'S4' )
OR     STATUS IS NULL
OR     SNO IS NULL

3.12 See the body of the chapter.

3.13 See the body of the chapter.

3.14 EXISTS (t), where t is the SQL analog of the relational expression r. Note: Another possibility is (SELECT COUNT(*) FROM (t)) > 0; however, this possibility is slightly deprecated, for reasons to be explained in Chapter 10.

3.15 The complete syntax for a relation selector invocation in Tutorial D is as follows:

RELATION [ <heading> ] { <tuple exp commalist> }

And the syntax for <heading> is as explained in the body of the chapter. Moreover, there’s a syntax rule to the effect that a <heading> must be specified if the <tuple exp commalist> is empty (it can be omitted otherwise). By way of example, therefore, the empty suppliers relation can be specified as follows:

RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR } { }

As an aside, I note that TABLE_DEE and TABLE_DUM can be thought of as shorthand for the relation selector invocations RELATION { } { TUPLE { } } and RELATION { } { }, respectively.

As for SQL: The SQL analog of a relation selector invocation is a VALUES expression. The syntax is:

VALUES <row exp commalist>

As you can see, there’s nothing here analogous to the optional <heading> component of a Tutorial D selector invocation. As a consequence, the <row exp commalist> mustn’t be empty, and SQL has no direct way of specifying an empty table. Thus, workarounds are needed. For example, the empty suppliers table might be specified as follows:

( SELECT * FROM S WHERE FALSE )

3.16 Yes! However, we would of course want such operators always to produce a valid tuple as a result (i.e., we would want closure for such operations, just as we have closure for relational operations). For tuple union, for example, we would want the input tuples to be such that attributes with the same name have the same value (and are therefore of the same type, a fortiori). By way of example, let t1 and t2 be a supplier tuple and a shipment tuple, respectively, and let t1 and t2 have the same SNO value. Then the union of t1 and t2, t1 UNION t2, is a tuple of type TUPLE {SNO CHAR, SNAME CHAR, STATUS INTEGER, CITY CHAR, PNO CHAR, QTY INTEGER}, with components as in t1 or t2 or both (as applicable). E.g., if t1 is (S1,Smith, 20,London) and t2 is (S1,P1,300)—to adopt an obvious shorthand notation for tuples—then their union is the tuple (S1,Smith,20,London,P1,300). Note: This operation might reasonably be called tuple join instead of tuple union.

Of course, it’s not just the usual set operators that might reasonably be adapted to tuples specifically—the same goes for certain of the well known relational operators, too. A particularly important example is provided by the tuple projection operator, which is a straightforward adaptation of the relational projection operator. For example, let t be a supplier tuple; then the projection t{SNO,CITY} of t on attributes {SNO,CITY} is that subtuple of t that contains just the SNO and CITY components from t. Likewise, t{CITY} is that subtuple of t that contains just the CITY component from t, and t{ } is that subtuple of t that contains no components at all (in other words, it’s the 0-tuple). In fact, it’s worth noting explicitly that every tuple has a projection on the empty set of attributes whose value is, precisely, the 0-tuple.

3.17 See the body of the chapter.

3.18 AS is used in SELECT clauses (to introduce column names); CREATE VIEW (ditto); FROM clauses (to introduce range variable names—by contrast, the syntax used to introduce column names in this context doesn’t use AS); WITH specifications; and many other contexts not discussed in this book.

You were also asked (a) in which cases the keyword was optional; (b) in which cases the AS specification took the form “<something> AS name”; and (c) in which cases it took the form “name AS <something>“: No answer provided.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for SQL and Relational Theory

Create new playlist

Sign In

Sign Up

Chapter 3

WHAT’S A TUPLE?

Consequences of the Definitions

ROWS IN SQL

WHAT’S A RELATION?

Consequences of the Definitions

RELATIONS AND THEIR BODIES

RELATIONS ARE n-DIMENSIONAL

RELATIONAL COMPARISONS

TABLE_DUM AND TABLE_DEE

TABLES IN SQL

COLUMN NAMING IN SQL

CONCLUDING REMARKS

EXERCISES

ANSWERS

Table of Contents for
SQL and Relational Theory