SQL and Relational Theory

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 2

Types and Domains

A major purpose of type systems is to avoid embarrassing questions about representations, and to forbid situations in which these questions might come up.

—Luca Cardelli and Peter Wegner: “On Understanding Types, Data Abstraction, and Polymorphism”
ACM Comp. Surv. 17, No. 4 (December 1985)

This chapter is related only tangentially to the main theme of the book. Types are certainly fundamental, and the ideas discussed in this chapter are certainly important (they might help to dispel certain common misconceptions, too). However, type theory as such isn’t a specially relational topic, and type-related matters don’t seem—at least on the surface—to have much to do with SQL daily life, as it were. What’s more, while there are certainly SQL problems in this area, there isn’t much you can do about them, for the most part; I mean, there isn’t much concrete advice I can offer to help with the goal of using SQL relationally (though there is some, as you’ll see). So you might want to give this chapter just a “once over lightly” reading on a first pass, and come back to it after you’ve absorbed more of the material from later chapters.

TYPES AND RELATIONS

Data types (types for short) are fundamental to computer science. Relational theory in particular requires a supporting type theory, because relations are defined over types—that is, every attribute of every relation is defined to be of some type, and the same is true of relvars too, of course. For example, I’m going to assume throughout this book that attribute STATUS of the suppliers relvar S is defined to be of type INTEGER. Under that assumption, every relation that’s a possible value for relvar S must also have a STATUS attribute of type INTEGER— which means in turn that every tuple in such a relation must also have a STATUS attribute that’s of type INTEGER, which means in turn that the tuple in question must have a STATUS value that’s an integer.

I’ll be discussing such matters in more detail later in the chapter. For now, let me just say that—with certain important exceptions, which I’ll also be discussing later—a relational attribute (i.e., an attribute of a relation or relvar) can be of any type whatsoever, implying among other things that such types can be arbitrarily complex. In particular, those types can be either system or user defined. In this book, however, I don’t plan to say very much about user defined types as such, because:

The whole point about user defined types—from the point of view of the user who is merely using them, that is, as opposed to the user who actually has the job of defining them—is that they’re supposed to behave just like system defined types anyway.
Few users will ever be faced with the job of defining a type (and even for those who are, the business of defining a type doesn’t involve much in the way of specifically relational considerations in any case).

From this point forward, therefore, you can take the term type to mean a system defined type specifically, unless the context demands otherwise. The relational model prescribes just one such type, BOOLEAN (the most fundamental type of all). That type contains exactly two values: two truth values, to be specific, denoted by the literals TRUE and FALSE, respectively. However, real systems will support a variety of other system defined types as well, of course, and I’ll assume for definiteness that types INTEGER (integers), RATIONAL (rational numbers), and CHAR (character strings of arbitrary length) are all system defined types. Note: I’ll discuss the system defined types supported by SQL in particular later in the chapter.

Aside: A rational number is a number that can be expressed as the ratio of two integers (e.g., 3/8, 593/370, -4/3); an irrational number is a number that can’t be so expressed (e.g., π, √2). Rational numbers in turn fall into two categories: (a) those whose fractional part can be expressed in decimal notation by means of a finite sequence of digits followed by an infinite sequence of zeros, which can be ignored without loss (e.g., 3/8 = 0.375000...), and (b) those whose fractional part can be expressed in decimal notation by means of a possibly empty finite sequence of digits followed by another finite sequence of digits, the first of which is nonzero, that infinitely repeats (e.g., 593/370 = 1.60270270...). By contrast, the fractional part of an irrational number in decimal notation consists of an infinite, nonrepeating sequence of digits (e.g., π = 3.14159..., √2 = 1.41421...). Now, many programming languages support a numeric type they call REAL. A real number is a number that’s either rational or irrational; computers being finite, however, the only real numbers they’re capable of representing precisely are necessarily rational ones.¹ Hence Tutorial D’s choice of the keyword RATIONAL. End of aside.

In the interest of historical accuracy, I should now explain that when Codd first defined the relational model, he said relations were defined over domains, not types. In fact, however, domains and types are exactly the same thing. Now, you can take this claim as a position statement on my part, if you like, but I want to present a series of arguments in support of that position. I’ll start with the relational model as Codd originally defined it; thus, I’ll use the term domain, not type, until further notice. There are two major topics I want to discuss, one in each of the next two sections:

Equality comparisons and “domain check override”: This part of the discussion I hope will convince you that domains really are types.
Data value atomicity and first normal form: And this part I hope will convince you that the types in question can be arbitrarily complex.

EQUALITY COMPARISONS

Despite what I said a few moments ago about ignoring user defined types, I’m going to assume in the present section, purely for the sake of the example, that the supplier number (SNO) attributes in relvars S and SP are of some user defined type—sorry, domain—which I’ll assume for simplicity is called SNO as well. Likewise, I’m going to assume that the part number (PNO) attributes in relvars P and SP are also of a user defined type (or domain) with the same name, PNO. Please note that these assumptions aren’t crucial to my argument; it’s just that I think they make the argument a little more convincing, and perhaps easier to follow.

I’ll start with the fact that, “as everyone knows,” two values can be compared for equality in the relational model only if they come from the same domain. For example, the following comparison (which might be part of the WHERE clause in some SQL query) is obviously valid:

SP.SNO = S.SNO /* OK */

By contrast, this one obviously (?) isn’t:

SP.PNO = S.SNO /* not OK */

But why isn’t it? Answer: Because part numbers and supplier numbers are different kinds of things—they’re defined on different domains. So the general idea is that the DBMS² should reject any attempt to perform any relational operation (join, union, whatever) that involves, either explicitly or implicitly, an equality comparison between values from different domains. For example, suppose some user wants to find suppliers (like supplier S5 in the sample values of Fig. 1.3 in Chapter 1) who currently supply no parts at all. The following might be an attempt to formulate this query in SQL:

SELECT S.SNO , S.SNAME , S.STATUS , S.CITY
FROM   S
WHERE  NOT EXISTS
     ( SELECT *
       FROM   SP
       WHERE  SP.PNO = S.SNO )      /* not OK */

(There’s no terminating semicolon here because this is an expression, not a statement. See Exercise 2.24 at the end of the chapter.)

As the comment says, this formulation is certainly not OK. The reason is that, in the last line, the user presumably meant to say WHERE SP.SNO = S.SNO, but by mistake—probably just a slip of the typing fingers—he or she said WHERE SP.PNO = S.SNO instead. And, given that we’re indeed talking about a simple typo (probably), it would be a friendly act on the part of the DBMS to interrupt the user at this point, highlight the error, and perhaps ask if the user would like to correct it before proceeding.

Now, I don’t know of any SQL product that actually behaves in the way I’ve just suggested; in today’s products, depending on exactly how you’ve set up the database, either the query will simply fail or it’ll give the wrong answer. Well ... not exactly the wrong answer, perhaps, but the right answer to the wrong question. (Does that make you feel any better?)

To repeat, therefore, the DBMS should reject a comparison like SP.PNO = S.SNO if it isn’t valid. However, Codd felt there should be a way in such a situation for the user to make the DBMS go ahead and do the comparison anyway, even though it’s apparently not valid, on the grounds that sometimes the user will know more than the DBMS does. Now, it’s hard for me to do justice to this idea, because I frankly don’t think it makes sense—but let me give it a try. Suppose it’s your job to design a database involving, let’s say, customers and suppliers; and you therefore decide to have a domain of customer numbers and a domain of supplier numbers; and you build your database that way, and start using it, and everything works just fine for a year or two. Then, one day, one of your users comes along with a query you never heard before— namely: “Are any of our customers also suppliers to us?” Observe that this is a perfectly reasonable query; observe too that it might involve a comparison between a customer number and a supplier number (a cross domain comparison) to see if they’re equal. And if it does, well, the system certainly mustn’t prevent you from doing that comparison, because (of course) the system certainly mustn’t prevent you from posing a reasonable query.

On the basis of such arguments, Codd proposed what he called “domain check override” (DCO) versions of certain of his relational operators. A DCO version of join, for example, would perform the join even if the joining attributes were defined on different domains. In SQL terms, we might imagine this proposal being realized by means of a new clause, IGNORE DOMAIN CHECKS, that could be included in an SQL query as in this example:

SELECT ...
FROM ...
WHERE CNO = SNO
IGNORE DOMAIN CHECKS

And this new clause would be separately authorizable—most users wouldn’t be allowed to use it (perhaps only the DBA³ would be allowed to use it).

Before analyzing the DCO idea in detail, I want to look at a simpler example. Consider the following two SQL queries on the suppliers-and-parts database:

SELECT ...                   │    SELECT ...
FROM   P , SP                │    FROM   P , SP
WHERE  P.WEIGHT = SP.QTY     │    WHERE  P.WEIGHT - SP.QTY = 0

Assuming, reasonably enough, that weights and quantities are defined on different domains, the query on the left is clearly invalid. But what about the one on the right? According to Codd, that one’s valid! In his book The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), page 47, he says that in such a situation “the DBMS [merely] checks that the basic data types are the same”; in the case at hand, those “basic data types” are all just numbers, loosely speaking, and so that check succeeds.

To me, this conclusion is unacceptable. Clearly, the expressions P.WEIGHT = SP.QTY and P.WEIGHT - SP.QTY = 0 both mean essentially the same thing. Surely, therefore, they must both be valid or both be invalid; the idea that one might be valid and the other not surely makes no sense. So it seems to me there’s something strange about Codd-style domain checking in the first place, before we even get to domain check override. (In essence, in fact, Codd-style domain checking applies only in the very special case where both comparands are specified as simple attribute references. Observe that the comparison P.WEIGHT = SP.QTY falls into this special category but the comparison P.WEIGHT - SP.QTY = 0 doesn’t.)

Let’s look at some even simpler examples. Consider the following comparisons (each of which might appear as part of an SQL WHERE clause, for example):

S.SNO = 'X4' P.PNO = 'X4' S.SNO = P.PNO

I hope you agree it’s at least plausible that the first two of these could be valid (and evaluate successfully, and possibly even give TRUE) and the third not. But if so, then I hope you also agree there’s something strange going on; apparently, we can have three values a, b, and c such that a = c is true and b = c is true, but as for a = b—well, we can’t even do the comparison, let alone have it come out true! So what’s going on?

I return now to the fact that attributes S.SNO and P.PNO are defined on domains SNO and PNO, respectively, and my claim that domains are actually types; as previously noted, in fact, I’m assuming for the sake of the present discussion that those particular domains SNO and PNO are actually user defined types. Now, it’s possible (even likely) that those user defined types are both physically represented in terms of the system defined type CHAR; in fact, let’s assume such is the case, just to be definite. However, those representations are part of the implementation, not the model—they’re irrelevant to the user, and as we saw in Chapter 1 they’re supposed to be hidden from the user. In particular, therefore, the operators that apply to supplier numbers and part numbers are the operators defined in connection with those types, not the operators that happen to be defined in connection with type CHAR (see the section “What’s a Type?” later in this chapter). For example, we can concatenate two character strings, but we probably can’t concatenate two supplier numbers (we could do this latter only if concatenation were an operator defined in connection with type SNO).

As the foregoing paragraph suggests, however, when we define a type, we do also have to define the operators that can be used in connection with values and variables of the type in question. And one operator we must define is what’s called a selector operator, which allows us to select, or specify, an arbitrary value of the type in question.⁴ In the case of type SNO, for example, the selector (which in practice would probably also be called SNO) allows us to select the particular SNO value that has some specified CHAR representation. Here’s an example:

SNO('S1')

This expression is an invocation of the SNO selector, and it returns a certain supplier number: namely, the one represented by the character string 'S1'. Likewise, the expression

PNO('P1')

is an invocation of the PNO selector, and it returns a certain part number: namely, the one represented by the character string 'P1'. In other words, the SNO and PNO selectors effectively work by taking a certain CHAR value and converting it to a certain SNO value and a certain PNO value, respectively.

Now let’s get back to the comparison S.SNO = 'X4'. As you can see, the comparands here are of different types (types SNO and CHAR, to be specific; in fact, 'X4' is a character string literal). Since they’re of different types, they certainly can’t be equal (recall from the beginning of the present section that two values can be compared for equality “only if they come from the same domain”). But the system does at least know there’s an operator—namely, the SNO selector—that effectively performs CHAR to SNO conversions. So it can invoke that operator, implicitly, to convert the CHAR comparand to a supplier number, thereby effectively replacing the original comparison by this one:

S.SNO = SNO('X4')

Now we’re comparing two supplier numbers, which is legitimate.

In the same kind of way, the system can effectively replace the comparison P.PNO = 'X4' by this one:

P.PNO = PNO('X4')

But in the case of the comparison S.SNO = P.PNO, there’s no conversion operator known to the system—at least, let’s assume not—that will convert a supplier number to a part number or the other way around, and so the comparison fails on a type error: The comparands are of different types, and there’s no way to make them be of the same type.

Note: Implicit type conversion as illustrated in the foregoing examples is often called coercion in the literature. In the first example, therefore, we can say the character string 'X4' is coerced to type SNO; in the second it’s coerced to type PNO. I’ll have a little more to say about coercion in SQL in particular in the section “Type Checking and Coercion in SQL,” later.

To continue with the example: Another operator we must define when we define a type like SNO or PNO is what’s called, generically, a THE_ operator, which—in the case at hand— effectively converts a given SNO or PNO value to the character string (or whatever else it is) that’s used to represent it.⁵ Assume for the sake of the example that the THE_ operators for types SNO and PNO are called THE_SC and THE_PC, respectively. Then, if we really did want to compare S.SNO and P.PNO for equality, the only sense I can make of that requirement is that we want to test whether the corresponding character string representations are the same, which we might do like this:

THE_SC ( S.SNO ) = THE_PC ( P.PNO )

In other words: Convert the supplier number to a string, convert the part number to a string, and compare the two strings.

As I’m sure you can see, the mechanism I’ve been sketching here, involving selectors and THE_ operators, effectively provides both (a) the domain checking we want in the first place and (b) a way of overriding that checking, when desired, in the second place. Moreover, it does all this in a clean, fully orthogonal, non ad hoc manner.⁶ By contrast, domain check override doesn’t really do the job; in fact, it doesn’t really make sense at all, because it confuses types and representations (as noted previously, types are a model concept, representations are an implementation concept).

Now, you might have realized that what I’m talking about is here is what’s known in language circles as strong typing. Different writers have slightly different definitions for this term, but basically it means that (a) everything—in particular, every value and every variable—has a type, and (b) whenever we try to perform some operation, the system checks that the operands are of the right types for the operation in question (or, possibly, that they’re coercible to those right types). Observe, moreover, that this mechanism works for all operations, not just for the equality comparisons I’ve been discussing; the emphasis in the literature, in discussions of domain checking, on equality and other comparison operations is sanctioned by historical usage but is in fact misplaced. For example, consider the following expressions:

P.WEIGHT * SP.QTY

P.WEIGHT + SP.QTY

The first of these is probably valid (it yields another weight: namely, the total weight of the pertinent shipment). The second, by contrast, is probably not valid (what could it possibly mean to add a weight and a quantity?).

I’d like to close this section by stressing the absolutely fundamental role played—not just in type theory!—by the equality operator (“=”). It wasn’t just an accident that the discussions above happened to focus on the question of comparing two values for equality specifically. The fact is, equality truly is central, and the relational model requires it to be supported for every type. Indeed, since a type is basically a set of values (see the section “What’s a Type?”), without the “=” operator we couldn’t even say what values constitute the type in question! That is, given some type T and some value v, we couldn’t say, absent that operator, whether or not v was one of the values in the set of values constituting type T.

What’s more, the relational model also specifies the semantics of the “=” operator, as follows: If v1 and v2 are values of the same type, then v1 = v2 evaluates to TRUE if v1 and v2 are the very same value and FALSE otherwise. (As a matter of fact, I said exactly this in Chapter 1, as you might recall.) By contrast, if v1 and v2 are values of different types, then v1 = v2 has no meaning—it’s not even a legal comparison—unless v1 can be coerced to the type of v2 or the other way around, in which case we aren’t really talking about a comparison between v1 and v2 as such anyway.

DATA VALUE ATOMICITY

I hope the previous section succeeded in convincing you that domains really are types, no more and no less. Now I want to turn to the issue of data value atomicity and the related notion of first normal form (1NF for short). In Chapter 1, I said that 1NF meant that every tuple in every relation contains just a single value (of the appropriate type) in every attribute position—and it’s usual to add that those “single values” are supposed to be “atomic.” But this latter stipulation raises the obvious question: What does it mean for data to be atomic?

Well, on page 6 of the book mentioned earlier (The Relational Model for Database Management Version 2), Codd defines atomic data as data that “cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions).” Even if we ignore that parenthetical exclusion, however, this definition is a trifle puzzling; even at best, it’s not very precise. For example, what about character strings? Are character strings atomic? Well, every database product I know provides a variety of operators—LIKE, SUBSTR (substring), “| |” (concatenate), and so on—that rely by definition on the fact that character strings in general can be “decomposed into smaller pieces by the DBMS.” So are such strings atomic? What do you think?

Here are some other examples of values whose atomicity is at least open to question and yet we would certainly want to allow as attribute values in tuples in relations:

Bit strings
Rational numbers (which might be regarded as being decomposable into integer and fractional parts)
Dates and times (which might be regarded as being decomposable into year / month / day and hour / minute / second components, respectively)

And so on.

Before drawing any conclusions from the discussion so far, I’d like to consider another example, one that some might regard as more startling, in a way. Refer to Fig. 2.1 below. Relation R1 in that figure is a reduced version of the shipments relation from our running example; it shows that certain suppliers supply certain parts, and it contains one tuple for each pertinent {SNO,PNO} combination. Further, let’s agree for the sake of the example that supplier numbers and part numbers are indeed atomic; then we can presumably agree that R1, at least, is in 1NF.

Fig. 2.1: Relations R1, R2, and R3

Now suppose we replace R1 by R2, which shows that certain suppliers supply certain groups of parts (attribute PNO in R2 is what some writers would call multivalued, and values of that attribute are groups of part numbers). Then most people would surely say that R2 is not in 1NF; in fact, it looks like a case of “repeating groups,” and repeating groups are the one thing that just about everybody agrees 1NF is supposed to prohibit (because such groups are obviously not atomic—right?).

Well, let’s agree for the sake of the argument that R2 isn’t in 1NF. But suppose we now replace R2 by R3. Then I claim that R3 is in 1NF! For consider:

First, note that I’ve renamed the attribute PNO_SET, and I’ve enclosed the groups of part numbers that are PNO_SET values in braces, to emphasize the fact that each such group is indeed a single value: a set value, to be sure, but a set is still, at a certain level of abstraction, a single value.
Second (and regardless of what you might think of my first argument), the fact is that a set like {P2,P4,P5} is no more and no less decomposable by the DBMS than a character string is. Like character strings, sets do have some inner structure; as with character strings, however, it’s convenient to ignore that structure for certain purposes. In other words, if character strings are compatible with the requirements of 1NF—that is, if character strings are atomic—then sets must be, too.⁷

The real point I’m getting at here is that the notion of atomicity has no absolute meaning; it just depends on what we want to do with the data. Sometimes we want to deal with an entire set of part numbers as a single thing; sometimes we want to deal with individual part numbers within that set—but then we’re descending to a lower level of detail, or lower level of abstraction. The following analogy might help. In physics (which after all is where the terminology of atomicity comes from) the situation is exactly parallel: Sometimes we want to think about individual atoms as indivisible things, sometimes we want to think about the subatomic particles (i.e., the protons, neutrons, and electrons) that go to make up those atoms. What’s more, protons and neutrons, at least, aren’t really indivisible, either—they contain a variety of “subsubatomic” particles called quarks. And so on, possibly (?).

Let’s return for a moment to relation R3. In Fig. 2.1, I showed PNO_SET values as general sets. But it would be more useful in practice if they were, more specifically, relations (see Fig. 2.2, where I’ve changed the attribute name to PNO_REL). Why would it be more useful? Because relations, not general sets, are what the relational model is all about.⁸ As a consequence, the full power of the relational algebra immediately becomes available for the relations in question—they can be restricted, projected, joined, and so on. By contrast, if we were to use general sets instead of relations, then we would need to introduce new operators (set union, set intersection, and so on) for dealing with those sets ... Much better to get as much mileage as we can out of the operators we already have!

Fig. 2.2: Relation R4 (a revised version of R3)

Terminology: Attribute PNO_REL in Fig. 2.2 is a relation valued attribute (RVA). Of course, the underlying domain is relation valued too (that is, the values it’s made up of are relations). I’ll have more to say about RVAs in Chapter 7; here let me just note that SQL doesn’t support them. (More precisely, it doesn’t support what would be its analog of RVAs, table valued columns. Oddly enough, however, it does support columns whose values are arrays, and columns whose values are rows, and even columns whose values are “multisets of rows”—where a multiset, also known as a bag, is like a set except that it permits duplicates.⁹ Columns whose values are multisets of rows thus do look a little bit like “table valued columns”; however, they aren’t table valued columns, because the values they contain can’t be operated upon by means of SQL’s regular table operators and thus aren’t regular SQL table values, by definition.)

Now, I chose the foregoing example deliberately, for its shock value. After all, relations with RVAs do look rather like “relations with repeating groups,” and you’ve probably always heard that repeating groups are a no-no in the relational world. But I could have used any number of different examples to make my point; I could have shown attributes (and therefore domains) that contained arrays; or bags (multisets); or lists; or photographs; or audio or video recordings; or X rays; or fingerprints; or XML documents; or any other kind of value, “atomic” or “nonatomic,” you might care to think of. Attributes, and therefore domains, can contain anything (any values, that is).

Incidentally, you might recall that a few years ago we were hearing a great deal about so called “object/relational” systems. Well, the foregoing paragraph goes a long way toward explaining why a true object/relational system would in fact be nothing more nor less than a true relational system—which is to say, a system that supports the relational model, with all that such support entails. After all, the whole point about an object/relational system from the user’s point of view is precisely that we can have attribute values in relations that are of arbitrary complexity. Perhaps a better way to say it is: A proper object/relational system is just a relational system with proper type support (including proper user defined type support in particular)—which just means it’s a proper relational system, no more and no less. And what some are pleased to call “the object/relational model” is, likewise, just the relational model, no more and no less.

WHAT’S A TYPE?

From this point forward I’ll favor the term type over the term domain. So what is a type, exactly? In essence, it’s a named, finite set of values—all possible values of some specific kind: for example, all possible integers, or all possible character strings, or all possible supplier numbers, or all possible XML documents, or all possible relations with a certain heading (and so on). To elaborate briefly:

The types we’re interested in are always finite because we’re dealing with computers, which (as pointed out in connection with type RATIONAL earlier in the chapter) are finite by definition.
Note also that qualifier named: Types with different names are different types.

Moreover:

Every value is of some type—in fact, of exactly one type, except possibly if type inheritance is supported, a concept that’s beyond the scope of this book.

Aside: If every value is of exactly one type, then no value is of two or more types, and thus types are always disjoint (nonoverlapping). However, perhaps I need to elaborate on this point briefly. As one reviewer of this chapter said, surely types WarmBloodedAnimal and FourLeggedAnimal overlap? Indeed they do; but what I’m saying is that if types overlap, then for a variety of reasons we’re getting into the realm of type inheritance—in fact, into the realm of what’s called multiple type inheritance. Since those reasons, and indeed the whole topic of inheritance as such, are independent of the context we’re in (be it relational or something else), I’m not going to discuss them in this book. End of aside.
Every variable, every attribute, every operator that returns a result, and every parameter of every operator is defined, or declared, to be of some type.¹⁰ And to say that, e.g., variable V is declared to be of type T means, precisely, that every value v that can legally be assigned to V is in turn of type T.
Every expression denotes some value and is therefore of some type: namely, the type of the value in question, which is to say the type of the value returned by the outermost operator in the expression (where by “outermost” I mean the operator that’s executed last). For example, the type of the expression

( a / b ) + ( x - y )

is the type declared for the operator “+”, whatever that happens to be.

The fact that parameters in particular are declared to be of some type touches on an issue that I’ve mentioned but haven’t properly discussed as yet: namely, the fact that associated with every type there’s a set of operators for operating on values and variables of the type in question—where to say that operator Op is “associated with” type T basically just means that operator Op has a parameter of type T.¹¹ For example, integers have the usual arithmetic operators; dates and times have special calendar arithmetic operators; XML documents have what are called “XPath” and “XQuery” operators; relations have the operators of the relational algebra; and every type has the operators of assignment (“:=”) and equality comparison (“=”). Thus, any system that provides proper type support—and “proper type support” here certainly includes the ability for users to define their own types—must provide a way for users to define their own operators, too, because types without operators are useless. Note: User defined operators can be defined in association with system defined types as well as user defined ones (or a mixture, of course), as you would surely expect.

Observe now that, by definition, values and variables of a given type T can be operated upon only by means of the operators associated with that type T. For example, in the case of the system defined type INTEGER:

The system provides an assignment operator “:=” for assigning integer values to integer variables.
It also provides a format for writing integer literals. (However, it doesn’t provide any more general integer selector operators, nor does it provide any corresponding THE_ operators, because—as should be obvious if you think about it—such operators aren’t needed for a system defined type like INTEGER.)
It also provides comparison operators “=”, “≠”, “<”, and so on, for comparing integer values.
It also provides arithmetic operators “+”, “*”, and so on, for performing arithmetic on integer values.
It does not provide string operators LIKE, SUBSTR (substring), “| |” (concatenate), and so on, for performing string operations on integer values; in other words, string operations on integer values aren’t supported.

By contrast, in the case of the user defined type SNO (still assuming it is user defined), we would certainly define the necessary selector and THE_ operators, and we would also define assignment (“:=”) and comparison operators (“=”, “≠”, possibly “<”, and so on). However, we probably wouldn’t define operators “+”, “*”, and so on, which would mean that arithmetic on supplier numbers wouldn’t be supported (what could it possibly mean to add or multiply two supplier numbers?).

From everything I’ve said so far, then, it should be clear that defining a new type involves at least all of the following:

Defining a name for the type (obviously enough).
Defining the values that make up that type. I’ll discuss this aspect in detail in Chapter 8.
Defining the hidden physical representation for values of that type. As noted earlier, this is an implementation issue, not a model issue, and I won’t discuss it further in this book (at least, not much).
Defining one or more selector operators for selecting, or specifying, values of that type. Note: Here’s as good a place as any to point out in the interest of accuracy that the selectors for type T aren’t “associated with” type T in the sense that they have a parameter of type T; rather, they return a result of type T.
Defining the operators, including in particular assignment (“:=”), equality comparison (“=”), and THE_ operators, that apply to values and variables of that type (see below).
For those operators that return a result, defining the type of that result (again, see below).

Observe that points 4, 5, and 6 taken together imply that (a) the system knows precisely which expressions are legal, and (b) for those expressions that are legal it knows the type of the result as well.

By way of example, suppose we have a user defined type POINT, representing geometric points in two-dimensional space. Here then is the Tutorial D definition—I could have used SQL, but operator definitions in SQL involve a number of details that I don’t want to get into here—for an operator called REFLECT which, given a point P with cartesian coordinates (x,y), returns the “reflected” or “inverse” point with cartesian coordinates (-x,-y):

1.  OPERATOR REFLECT ( P POINT ) RETURNS POINT ;
2.     RETURN POINT ( - THE_X ( P ) , - THE_Y ( P ) ) ;
3.  END OPERATOR ;

Explanation:

Line 1 shows that the operator (a) is called REFLECT, (b) takes a single parameter P, of type POINT, and (c) returns a result also of type POINT (so the type of the operator is declared to be POINT).
Line 2 is the operator implementation code. It consists of a single RETURN statement. The value to be returned is a point, and it’s obtained by invoking the POINT selector; that invocation has two arguments, corresponding to the X and Y coordinates of the point to be returned. Each of those arguments is defined by means of a THE_ operator invocation; those invocations yield the X and Y coordinates of the point argument corresponding to parameter P, and negating those coordinates leads us to the desired result.¹²
Line 3 marks the end of the definition.

Now, the discussions in this section so far have been framed in terms of user defined types, for the most part. But similar considerations apply to system defined types also, except that in this case the various definitions are furnished by the system instead of by some user. For example, if INTEGER is a system defined type, then it’s the system that defines the name, defines legal integer values, defines the hidden representation, and—as we’ve already seen— defines a corresponding literal format, defines the corresponding operators “:=”, “=”, “+”, and so on (though users can define additional operators, of course, if they want to).

There’s one last point I want to make. I’ve mentioned selector operators several times; what I haven’t said, however (at least not explicitly), is that selectors—more precisely, selector invocations—are really just a generalization of the more familiar concept of a literal.¹³ What I mean by this remark is that all literals are selector invocations, but not all selector invocations are literals; in fact, a selector invocation is a literal if and only if its arguments are themselves all specified as literals in turn. For example, POINT(X,Y) and POINT(1.0,2.5) are both invocations of the POINT selector, but only the second is a POINT literal. It follows that every type has (must have) an associated format for writing literals. And for completeness I should add that every value of every type must be denotable by means of some literal of the type in question.

SCALAR vs. NONSCALAR TYPES

Types are frequently said to be either scalar or nonscalar. Loosely, a type is scalar if it has no user visible components and nonscalar otherwise—and values, variables, attributes, operators, parameters, and expressions of some type T are scalar or nonscalar according as type T itself is scalar or nonscalar. For example:

Type INTEGER is a scalar type; hence, values, variables, and so on of type INTEGER are also all scalar, meaning they have no user visible components.
Tuple and relation types are nonscalar—the pertinent user visible components being the corresponding attributes—and hence tuple and relation values, variables, and so on are also all nonscalar.

That said, I must now emphasize that these notions are quite informal. Indeed, we’ve already seen that the concept of data value atomicity has no absolute meaning, and “scalarness” is really just that same concept by another name. Thus, the relational model certainly doesn’t rely on the scalar vs. nonscalar distinction in any formal sense. In this book, however, I do rely on it informally; I mean, I do find it intuitively useful on occasion. To be specific, I occasionally use the term scalar in connection with types that are neither tuple nor relation types, and the term nonscalar in connection with types that are either tuple or relation types.¹⁴

Aside: Another term you’ll sometimes hear used to mean “scalarness” is encapsulation. Be aware, however, that this term is also used—especially in object oriented contexts—to refer to the physical bundling, or packaging, of code and data (or operator definitions and data representation definitions, to be more precise). But this latter use of the term mixes model and implementation considerations; clearly the user shouldn’t care, and shouldn’t need to care, whether code and data are physically bundled together or are kept separate. End of aside.

Let’s look at an example. Here’s a Tutorial D definition for the base relvar S (“suppliers”) —and note that, for simplicity, I now define the attributes all to be of some system defined type:

1.  VAR S BASE
2.      RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
3.      KEY { SNO } ;

Explanation:

The keyword VAR in line 1 means this is a variable definition; S is the name of that variable, and the keyword BASE means the variable is a base relvar specifically.
Line 2 specifies the type of this variable. The keyword RELATION shows it’s a relation type; the rest of the line specifies the set of attributes that make up the corresponding heading (where, as you’ll recall from Chapter 1, an attribute is defined to be an attribute-name : type-name pair, and no two attributes in the same heading have the same attribute name). The type is, of course, a nonscalar type. No significance attaches to the order in which the attributes are specified.
Line 3 defines {SNO} to be a key for this relvar.

In fact, the example also illustrates another point—namely, that the type

RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

is an example of a generated type. A generated type is a type that’s obtained by invoking some type generator (in the example, the type generator is, specifically, RELATION). You can think of a type generator as a special kind of operator; it’s special because (a) it returns a type instead of a value, and (b) it’s invoked at compile time instead of run time. For instance, most programming languages support a type generator called ARRAY, which lets users define a variety of specific array types. For present purposes, however, the only type generators we’re interested in are TUPLE and RELATION. Here’s an example involving the TUPLE type generator:

VAR STV /* tuple variable */
TUPLE { STATUS INTEGER , SNO CHAR , CITY CHAR , SNAME CHAR } ;

The value of variable STV at any given time is a tuple with the same heading as that of relvar S (I’ve deliberately specified the attributes in a different order, just to show the order doesn’t matter).¹⁵ Thus, we might imagine a code fragment that (a) extracts a one-tuple relation (perhaps the relation containing just the tuple for supplier S1) from the current value of relvar S, then (b) extracts the single tuple from that one-tuple relation, and finally (c) assigns that tuple to the variable STV. In Tutorial D:

STV := TUPLE FROM ( S WHERE SNO = 'S1' ) ;

Important: I don’t want you to misunderstand me here. While a variable like STV might certainly be needed in some application program that accesses the suppliers-and-parts database, I’m not saying such a variable can appear inside the database itself. A relational database contains variables of exactly one kind—namely, relation variables (relvars). In other words, relvars are the only kind of variable allowed in a relational database. Note: This fact—i.e., that relvars are the only kind of variable allowed in a relational database—constitutes what’s called The Information Principle. I’ll have more to say about it in Appendix A.

By the way, note carefully that (as the foregoing example suggests) there’s a logical difference between a tuple t and the relation r that contains just that tuple t. In particular, they’re of different types—t is of some tuple type and r is of some relation type (though the types do at least have the same heading, or in other words the same attributes and same degree).

Finally, a few miscellaneous points to close this section:

Even though tuple and relation types do have user visible components (namely, their attributes), there’s no suggestion that those components have to be physically stored as such, in the form in which they’re seen by the user. In fact, the physical representation of tuples and relations should be hidden from the user, just as it is for scalar values (recall the discussion of physical data independence in Chapter 1).
Like scalar types, tuple and relation types certainly need associated selector operators (and literals as a special case). I’ll defer the details to the next chapter. They don’t need THE_ operators, however; instead, they have operators that provide access to the corresponding attributes, and those operators play a role somewhat analogous to that played by THE_ operators in connection with scalar types.
Tuple and relation types also need assignment and equality comparison operators. I gave an example of tuple assignment earlier in the present section; I’ll defer details of the other operators—relational assignment, and tuple and relational equality comparisons—to the next chapter.

SCALAR TYPES IN SQL

I turn now to SQL. SQL supports the following more or less self-explanatory system defined scalar types (it also allows users to define their own types, but as I’ve already said I don’t intend to say much about user defined types in this chapter):

BOOLEAN      INTEGER           CHARACTER(n)
             SMALLINT          CHARACTER VARYING(n)
             BIGINT            CHARACTER LARGE OBJECT(n)
             NUMERIC(p,q)      BINARY(n)
             DECIMAL(p,q)      BINARY VARYING(n)
             FLOAT(p)          BINARY LARGE OBJECT(n)

This isn’t a complete list—other SQL system defined types include an “XML document” type (XML); a variety of “national character string types” (NATIONAL CHARACTER(n), etc.); and a variety of datetime types (DATE, TIME, TIMESTAMP, INTERVAL). However, I’ll be ignoring such types, mostly, for the purposes of this book. Points arising:

A number of defaults, abbreviations, and alternative spellings, including INT for INTEGER, CHAR for CHARACTER, VARCHAR for CHARACTER VARYING, VARBINARY for BINARY VARYING, CLOB for CHARACTER LARGE OBJECT, BLOB for BINARY LARGE OBJECT, are also supported.
As you can see, SQL, unlike Tutorial D, requires its various character string types to have an associated length specification.
The same goes for the various BINARY types. Note that BINARY doesn’t mean binary numbers, it means bit string (or, perhaps more accurately, byte string, since the associated length specification gives the corresponding length in octets).¹⁶
The p in NUMERIC, DECIMAL, and FLOAT is the associated precision, and the q in NUMERIC and DECIMAL is the associated scale factor (p > 0, 0 ≤ q ≤ p).¹⁷ Thus, e.g., the specification DECIMAL(5,2) denotes decimal numbers in the range -999.99 to +999.99, inclusive.
Strictly speaking, CHAR (for example) isn’t really a type as such—rather, it’s a type generator. By contrast, CHAR(25), for example, is a type as such, and it’s obtained by invoking that type generator with the value 25 as sole argument to that invocation. What’s more, analogous remarks apply to every “scalar type” in the foregoing list except for type BOOLEAN and the various integer types (SMALLINT, INTEGER, BIGINT).¹⁸ For simplicity, however, I’ll overlook this point in what follows (most of the time, at any rate) and continue to refer to CHAR and the rest as if they were indeed types as such, much as SQL itself does.
Literals of more or less conventional format are supported for all of these types.
An explicit assignment operator is supported for all of these types. The syntax is:

SET <scalar variable ref> = <scalar exp> ;

Scalar assignments are also performed implicitly when various other operations (e.g., FETCH) are executed. Note: Throughout this book in formal syntax definitions like the one just shown, I use ref and exp as convenient abbreviations for reference and expression, respectively.
An explicit equality comparison operator is also supported for all of these types.¹⁹ The syntax is:

<scalar exp> = <scalar exp>

Equality comparisons are also performed implicitly when numerous other operations (e.g., joins and unions, grouping and duplicate elimination operations, and many others) are executed.
Regarding type BOOLEAN in particular, I should point out that although it’s included in the standard, it’s supported by few if any of the mainstream SQL products. Of course, boolean expressions can always appear in WHERE, ON, and HAVING clauses, even if the system in question doesn’t support type BOOLEAN as such. In such a system, however, no table can have a column of type BOOLEAN, and no variable can be declared to be of type BOOLEAN. As a consequence, workarounds (e.g., “yes/no columns”) might sometimes be needed.
Finally, in addition to the foregoing scalar types, SQL also supports something it calls domains. However, SQL’s domains aren’t types at all; rather, they’re just a kind of factored out “common column definition,” with a number of rather strange properties that are well beyond the scope of this book. You can use them if you like, but don’t make the mistake of thinking they’re true relational domains (i.e., types).

Note: One thing SQL’s domains most definitely don’t do is constrain comparisons. For example, suppose columns S.CITY and P.CITY are defined on SQL domains SCD and PCD, respectively. Then you might expect the comparison S.CITY = P.CITY to fail. However, it won’t, not necessarily; rather, it’ll fail if and only if the data types underlying those domains fail to satisfy the requirements for such comparisons as outlined in the section immediately following. In SQL, in other words, such comparisons are legal if and only if the columns involved have what might be called compatible data types, regardless of whether they’re defined on the same SQL domain, and regardless even of whether any SQL domains as such are involved at all.

TYPE CHECKING AND COERCION IN SQL

SQL supports only a weak form of strong typing (if you see what I mean). To be specific:

BOOLEAN values can be assigned only to BOOLEAN variables and compared only with BOOLEAN values.
Numeric values can be assigned only to numeric variables and compared only with numeric values (where “numeric” means INTEGER, SMALLINT, BIGINT, NUMERIC, DECIMAL, or FLOAT).
Character string values can be assigned only to character string variables and compared only with character string values (where “character string” means CHAR, VARCHAR, or CLOB).
Bit string values can be assigned only to bit string variables and compared only with bit string values (where “bit string” means BINARY, VARBINARY, or BLOB).

Thus, for example, an attempt to compare a number and a character string is illegal. However, an attempt to compare (say) two numbers is legal, even if those numbers are of different types—say INTEGER and FLOAT, respectively (in this example, the INTEGER value will be coerced to type FLOAT before the comparison is done). Which brings me to the question of type coercion.

Now, it’s a widely accepted principle in computing that coercions are generally best avoided, because they’re error prone. In SQL in particular, one bizarre consequence of permitting coercions is that certain unions, intersections, and differences can yield a result with rows that don’t appear in either operand! By way of example, consider the SQL tables T1 and T2 shown in Fig. 2.3 below. Let column X be of type INTEGER in table T1 but NUMERIC(5,1) in table T2, and let column Y be of type NUMERIC(5,1) in table T1 but INTEGER in table T2. Now consider the SQL query:

SELECT X , Y FROM T1
UNION
SELECT X , Y FROM T2

The result is shown as the rightmost table (T3) in Fig. 2.3. As the figure suggests, columns X and Y in that result are both of type NUMERIC(5,1), and all values in those columns are obtained, in effect, by coercing some INTEGER value to type NUMERIC(5,1). Thus, the result consists exclusively of rows that appear in neither T1 nor T2! —a very strange kind of union, you might be forgiven for thinking.²⁰

Fig. 2.3: A very strange “union”

Strong recommendations: Do your best to avoid coercions wherever possible. (My own clear preference would be to do away with them entirely, regardless of whether we’re in the SQL context or any other context.) In the SQL context in particular, I recommend that you ensure that columns with the same name are always of the same type; this discipline, along with others recommended elsewhere in this book, will go a long way toward ensuring that type conversions in general are avoided. And when they can’t be avoided, I recommend doing them explicitly, using CAST or some CAST equivalent. For example (with reference to the foregoing UNION query):

SELECT CAST ( X AS NUMERIC(5,1) ) AS X , Y FROM T1
UNION
SELECT X , CAST ( Y AS NUMERIC(5,1) ) AS Y FROM T2

For completeness, however, I need to add that certain coercions are unfortunately built into the very fabric of SQL and so can’t be avoided. (I realize the following remarks might not make much sense at this point in the book, but I don’t want to lose them.) To be specific:

If a table expression tx is used as a row subquery, then the table t denoted by tx is supposed to have just one row r, and that table t is coerced to that row r. Note: The term subquery occurs ubiquitously in SQL contexts. I’ll explain it in detail in Chapter 12; prior to that point, you can take it to mean, albeit a trifle loosely, just a SELECT expression enclosed in parentheses.
If a table expression tx is used as a scalar subquery, then the table t denoted by tx is supposed to have just one column and just one row and hence to contain just one value v, and that table t is doubly coerced to that value v. Note: This case occurs in connection with SQL-style aggregation in particular (see Chapter 7).
In practice, the row expression rx in the ALL or ANY comparison rx θ sq—where (a) θ is a simple scalar comparison operator, such as “<” or “>”,²¹ followed by the keyword ALL or ANY, and (b) sq is a subquery—often consists of a simple scalar expression, in which case the scalar value denoted by that expression is effectively coerced to a row that contains just that scalar value. Note: Throughout this book, I use the term row expression to mean either a row subquery or a row selector invocation (where row selector in turn is my preferred term for what SQL calls a row value constructor—see Chapter 3); in other words, I use row expression to mean any expression that denotes a row, just as I use table expression to mean any expression that denotes a table. As for ALL or ANY comparisons, they’re discussed in Chapter 11.

Finally, SQL also uses the term coercion in a very special sense in connection with character strings. The details are beyond the scope of this book.

COLLATIONS IN SQL

SQL’s rules regarding type checking and coercion, in the case of character strings in particular, are (sadly) rather more complex than I’ve been pretending so far, and I need to elaborate somewhat. Actually it’s impossible in a book of this nature to do more than just scratch the surface of the matter, but the basic idea is this: Any given character string (a) consists of characters from one associated character set and (b) has one associated collation. A collation— also known as a collating sequence—is a rule that’s associated with a specific character set and governs the comparison of strings of characters from that character set. Let C be a collation for character set S, and let a and b be any two characters from S. Then C must be such that exactly one of the comparisons a < b, a = b, and a > b evaluates to TRUE and the other two to FALSE (under C). Note: In early versions of SQL there was just one character set, that character set had just one collation, and that collation was based on the numerical order of the binary codes used to represent the characters in that character set. But there’s no intrinsic reason why collating sequences should have to depend on internal coding schemes, and there are good practical reasons why they shouldn’t.

So much for the basic idea. However, there are complications. One arises from the fact that any given collation can (or, rather, must) have either PAD SPACE or NO PAD defined for it. Suppose the character strings 'AB' and 'AB ' (note the trailing space in the second of these) have the same character set and the same collation. Then those two strings are clearly distinct, and yet they’re considered to “compare equal” if PAD SPACE applies. Recommendation: Don’t use PAD SPACE—always use NO PAD instead, if possible. Note, however, that the choice between PAD SPACE and NO PAD affects comparisons only—it makes no difference to assignments.²²

Another complication arises from the fact that the comparison a = b might evaluate to TRUE under a given collation, even if the characters a and b are distinct. For example, we might define a collation called CASE_INSENSITIVE in which each lowercase letter is defined to compare equal to its uppercase counterpart. As a consequence, again, strings that are clearly distinct will sometimes compare equal.

We see, therefore, that certain comparisons of the form v1 = v2 can give TRUE in SQL even if v1 and v2 are distinct (possibly even if they’re of different types, thanks to SQL’s support for coercion). I’ll use the term “distinct, considered equal” to refer to such pairs of values. Now, equality comparisons are performed, often implicitly, in numerous contexts—examples include MATCH, LIKE, UNIQUE, UNION, and JOIN—and the kind of equality involved in all such cases is indeed such that “distinct, considered equal” values are treated as equal. For example, let collation CASE_INSENSITIVE be as defined above, and let PAD SPACE apply to that collation. Then, if the PNO columns of tables P and SP both use that collation, and if 'P2' and 'p2 ' are PNO values in, respectively, some row of P and some row of SP, those two rows will be regarded as satisfying the foreign key constraint from SP to P, despite the lowercase p and trailing spaces in the foreign key value.

What’s more, when evaluating expressions involving operators such as UNION, INTERSECT, EXCEPT, JOIN, GROUP BY, DISTINCT (and so on), the system sometimes has to decide which of several “distinct, considered equal” values is to be chosen as the value of some column in some result row. Unfortunately, SQL itself fails to give complete guidance in such situations. As a consequence, certain table expressions are indeterminate—the SQL term is possibly nondeterministic—in the sense that SQL doesn’t fully specify how they should be evaluated; indeed, they might quite legitimately give different results on different occasions. For example, if collation CASE_INSENSITIVE applies to column C in table T, then SELECT MAX(C) FROM T might return 'ZZZ' on one occasion and 'zzz' on another, even if T hasn’t changed in the interim.

I won’t give SQL’s rules here for when a given expression is “possibly nondeterministic” (see Chapter 12 for further discussion). It’s important to note, however, that such expressions aren’t allowed in integrity constraints (see Chapter 8), presumably because they could cause updates to succeed or fail unpredictably. Observe in particular, therefore, that this rule implies among other things that many table expressions—even simple SELECT expressions, sometimes—aren’t allowed in constraints if they involve a column of some character string type! Strong recommendation: Avoid possibly nondeterministic expressions as much as you can.

Aside: As I’ve just said, possibly nondeterministic expressions aren’t allowed in constraints. Oddly enough, however, they are allowed to appear in queries and updates, where they can surely do just as much damage (?). End of aside.

ROW AND TABLE TYPES IN SQL

Here repeated from the section “Scalar vs. Nonscalar Types” is an example of a tuple variable definition:

VAR STV TUPLE { STATUS INTEGER , SNO CHAR , CITY CHAR , SNAME CHAR } ;

The expression TUPLE {...} here is, as you’ll recall, an invocation of the TUPLE type generator. SQL has a corresponding ROW type generator (though it calls it a type constructor). Here’s an SQL analog of the foregoing Tutorial D example:

DECLARE SRV /* SQL row variable */
        ROW ( SNO    VARCHAR(5) ,
              SNAME  VARCHAR(25) ,
              STATUS INTEGER ,
              CITY   VARCHAR(20) ) ;

Unlike tuples, however, rows in SQL have a left to right ordering to their components.²³ In the case at hand, therefore, there are actually 4! = 4 * 3 * 2 * 1 = 24 different row types all consisting of the same four components (!).²⁴

SQL also supports row assignment. Recall this Tutorial D tuple assignment:

STV := TUPLE FROM ( S WHERE SNO = 'S1' ) ;

Here’s an SQL row assignment analog:

SET SRV = ( S WHERE SNO = 'S1' ) ;

The expression on the right side here is a row subquery—i.e., it’s a table expression, syntactically speaking, but it’s a table expression that’s acting as a row expression. That’s why there’s no explicit counterpart in the example to Tutorial D’s TUPLE FROM (see the discussion of subqueries and coercion in the section “SQL Type Checking and Coercion” a couple of pages back).

Row assignments are also involved, in effect, in SQL UPDATE statements (see Chapter 3).

Turning now to tables: Interestingly, SQL doesn’t really have a TABLE type generator (or type constructor, as SQL would probably call it) at all!—i.e., it has nothing directly analogous to the RELATION type generator described earlier in this chapter. However, it does have a mechanism, CREATE TABLE, for defining what by rights should be called table variables. For example, recall this Tutorial D definition from the section “Scalar vs. Nonscalar Types”:

VAR S BASE
RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }
KEY { SNO } ;

Here’s an SQL analog:

CREATE TABLE S
     ( SNO    VARCHAR(5)  NOT NULL ,
       SNAME  VARCHAR(25) NOT NULL ,
       STATUS INTEGER     NOT NULL ,
       CITY   VARCHAR(20) NOT NULL ,
       UNIQUE ( SNO ) ) ;

Note carefully, however, that there’s nothing—no sequence of linguistic tokens—in this example that can logically be labeled “an invocation of the TABLE type constructor.” (This fact might become more apparent when you realize that the specification UNIQUE (SNO), which defines a certain integrity constraint on suppliers, doesn’t have to come after the column definitions but can appear almost anywhere—e.g., between the definitions of columns SNO and SNAME. Not to mention the NOT NULL specifications on the individual column definitions, which also define certain integrity constraints.) In fact, to the extent that the variable S can be regarded (in SQL) as having any type at all, that type is nothing more than bag of rows, where the rows in question are of type ROW (SNO VARCHAR(5), SNAME VARCHAR(25), STATUS INTEGER, CITY VARCHAR(20)).

That said, I should say too that SQL does support something it calls “typed tables.” The term isn’t very appropriate, however, because if TT is a “typed table” that has been defined to be “of type T,” then TT is not of type T, and neither are its rows! More important, I think you should avoid such tables anyway, because they’re inextricably intertwined with SQL’s support for pointers, and pointers are explicitly prohibited in the relational model.²⁵ In fact, if some table T has a column whose values are pointers to rows in some “target” table T′, then that table T can’t possibly represent a relation in the relational model sense. (As a matter of fact, the target table T′ can’t do so either.) As I’ve just indicated, however, tables like table T are unfortunately permitted in SQL; the pointers are called reference values, and the columns that contain them are said to be of some REF type. Quite frankly, it’s not clear why these features are included in SQL at all; certainly there seems to be no useful functionality that can be achieved with them that can’t equally well—in fact, better—be achieved without them. Strong recommendation: Don’t use them, nor any features related to them.

Aside: To avoid a possible confusion, I should add that SQL actually uses the terminology of “referencing” in two quite different senses. One is as sketched above. The other, and older, sense has to do with foreign keys; a foreign key value in one row is said to “reference” the row that contains the corresponding target key value. Note, however, that foreign keys certainly aren’t pointers!—there are many logical differences between the two concepts, including in particular the fact that foreign keys refer to rows, which are values, whereas pointers are addresses and therefore, by definition, refer to variables. (Recall from Chapter 1 that it’s variables, not values, that “have location.” Values, having no location, certainly don’t have addresses.) End of aside.

CONCLUDING REMARKS

It’s a common misconception that the relational model deals only with rather simple types: numbers, strings, perhaps dates and times, and not much else. In this chapter, I’ve tried to show among other things that this is indeed a misconception. Rather, relations can have attributes of any type whatsoever (other than as noted in just a moment)—the relational model nowhere prescribes just what those types must be, and in fact they can be as complex as you like (they can even be relation types). In other words, the question as to what types are supported is orthogonal to the question of support for the relational model itself. Or, less precisely but more catchily: Types are orthogonal to tables.

I also remind you that the foregoing state of affairs in no way violates the requirements of first normal form—first normal form just means that every tuple in every relation contains a single value, of the appropriate type, in every attribute position. Now we know that those types can be anything, we also know that all relations are in first normal form by definition.

Finally, I mentioned in the introduction to this chapter that there were certain important exceptions to the rule that relational attributes can be of any type whatsoever. In fact, there are two (both of which I’ll simplify just slightly for present purposes). The first is that if relation r is of type T, then no attribute of r can itself be of type T (think about it!). The second (which in fact I’ve already touched on) is that no relation in the database can have an attribute of any pointer type. Prerelational databases were full of pointers, and access to such databases involved a lot of pointer chasing, a state of affairs that made application programming error prone and direct end user access to those databases almost impossible. (These aren’t the only problems with pointers, but they’re among the more obvious ones.) Codd wanted to get away from such problems in his relational model, and of course he succeeded.

EXERCISES

2.1 What’s a type? What’s the difference between a domain and a type?

2.2 What do you understand by the term selector? And what exactly is a literal?

2.3 What’s a THE_ operator?

2.4 Physical representations are always hidden from the user: True or false?

2.5 This chapter has touched on several more logical differences (refer back to Chapter 1 if you need to refresh your memory regarding this important notion), including:

argument	vs.	parameter
database	vs.	DBMS
foreign key	vs.	pointer
generated type	vs.	nongenerated type
relation	vs.	type
type	vs.	representation
user defined type	vs.	system defined type
user defined operator	vs.	system defined operator

What exactly is the logical difference in each of these cases?

2.6 Explain in your own words the difference between the concepts scalar and nonscalar.

2.7 What do you understand by the term coercion? Why is coercion a bad idea?

2.8 Why doesn’t domain check override make sense?

2.9 What’s a type generator?

2.10 Define first normal form. Why do you think it’s so called?

2.11 Let X be an expression. What’s the type of X? What’s the significance of the fact that X is of some type?

2.12 Using the definition of the REFLECT operator in the body of the chapter (section “What’s a Type?”) as a template, define a Tutorial D operator that, given an integer, returns the cube of that integer.

2.13 Let LENGTH be a user defined type, with the obvious semantics. Use Tutorial D to define an operator that, given the length of two adjacent sides of a rectangle, returns the corresponding area.

2.14 Give an example of a relation type. Distinguish between relation types, relation values, and relation variables.

2.15 Use SQL or Tutorial D or both to define relvars P and SP from the suppliers-and-parts database. If you give both SQL and Tutorial D definitions, identify as many differences between them as you can. What’s the significance of the fact that relvar P (for example) is of a certain relation type?

2.16 With reference to the departments-and-employees database from Chapter 1 (see Fig. 1.1), suppose the attributes are of the following user defined types:

DNO    : DNO
DNAME  : NAME
BUDGET : MONEY
ENO    : ENO
ENAME  : NAME
SALARY : MONEY

Suppose departments also have a LOCATION attribute, of user defined type CITY (say). Which of the following scalar expressions (or would-be expressions) are valid? For those that are, state the type of the result; for the others, give an expression that will achieve what appears to be the desired effect.

LOCATION = 'London'
ENAME = DNAME
SALARY * 5
BUDGET + 50000
ENO > 'E2'
ENAME || DNAME
LOCATION || 'burg'

2.17 It’s sometimes suggested that types are really variables, in a sense. For example, employee numbers might grow from three digits to four as a business expands, so we might need to update “the set of all possible employee numbers.” Discuss.

2.18 A type is a set of values and the empty set is a legitimate set; thus, we might define an empty type to be a type where the set in question is empty. Can you think of any uses for such a type?

2.19 In the relational world, the equality operator “=” applies to every type. By contrast, SQL doesn’t require “=” to apply to every type, and it doesn’t fully define the semantics in all of the cases where it does apply. What are the implications of this state of affairs?

2.20 Following on from the previous exercise, we can say that if v1 = v2 evaluates to TRUE in the relational world, then executing some operator Op on v1 and executing that same operator Op on v2 always has exactly the same effect, for all possible operators Op. But this is another precept that SQL violates. Can you think of any examples of such violation? What are the implications?

2.21 Why are pointers excluded from the relational model?

2.22 The Assignment Principle—which is very simple, but fundamental—states that after assignment of the value v to the variable V, the comparison V = v evaluates to TRUE (see Chapter 5). Yet again, however, this is a precept that SQL violates (fairly ubiquitously, in fact). Can you think of any examples of such violation? What are the implications?

2.23 Do you think that types “belong to” databases, in the same sense that relvars do?

2.24 In the first example of an SQL SELECT expression in this chapter, I pointed out that there was no terminating semicolon because the expression was an expression and not a statement. But what’s the difference?

2.25 Explain as carefully as you can the logical difference between a relation with a relation valued attribute (RVA) and a “relation with a repeating group.”

2.26 What’s a subquery?

2.27 To repeat from Exercise 2.19: In the relational world, the equality operator “=” applies to every type. But what about type BOOLEAN? And what about SQL’s row and table types?

ANSWERS

2.1 A type is a named, finite set of values—all possible values of some specific kind: for example, all possible integers, or all possible character strings, or all possible supplier numbers, or all possible XML documents, or all possible relations with a certain heading (etc., etc.). There’s no difference between a domain and a type. Note: SQL does draw a distinction between domains and types, however. The distinction shows up most immediately in the fact that SQL supports both a CREATE TYPE statement and a CREATE DOMAIN statement. To a first approximation, CREATE TYPE can be thought of as SQL’s counterpart to the TYPE statement of Tutorial D, which I’ll be discussing in Chapter 8 (though there are many, many differences, not all of them trivial, between the two). As for CREATE DOMAIN, it might be regarded, very charitably, as SQL’s attempt to provide a tiny part of the total functionality of CREATE TYPE (it was introduced in SQL:1992, while CREATE TYPE wasn’t introduced until SQL:1999); now that CREATE TYPE exists, there seems little reason to use, or even support, CREATE DOMAIN at all.

2.2 Every type has at least one associated selector; a selector is an operator that allows us to select, or specify, an arbitrary value of the type in question. Let T be a type and let S be a selector for T; then every value of type T must be returned by some successful invocation of S, and every successful invocation of S must return some value of type T. See Chapter 8 for further discussion. Note: Selectors are provided “automatically” in Tutorial D—since they’re required by the relational model, at least implicitly—but not, in general, in SQL. In fact, although the selector concept necessarily exists, SQL doesn’t really have a term for it; certainly selector as such isn’t an SQL term. Further details are beyond the scope of this book.

A literal is a “self-defining symbol”; it denotes a value that can be determined at compile time. More precisely, a literal is a symbol that denotes a value that’s fixed and determined by the symbol in question (and the type of that value is therefore also fixed and determined by the symbol in question). Here are some Tutorial D examples:

4                               /* a literal of type INTEGER  */
'XYZ'                           /* a literal of type CHAR     */
FALSE                           /* a literal of type BOOLEAN  */
2.5                             /* a literal of type RATIONAL */
POINT ( 5.0 , 2.5 )             /* a literal of type POINT    */

(The last of these involves the user defined type POINT from the body of the chapter.)

Every value of every type, tuple and relation types included, must be denotable by means of some literal (of the applicable type, of course). A literal is a special case of a selector invocation; to be specific, it’s a selector invocation all of whose arguments are themselves specified as literals in turn (implying in particular that a selector invocation with no arguments at all, like the INTEGER selector invocation 4, is a literal by definition). Note finally that there’s a logical difference between a literal as such and a constant—a constant is a value, while a literal is a symbol that denotes such a value. (By the same token. there’s a logical difference between a literal and a value—as just stated, a value is a constant, such as the constant 3, while a literal is a symbol that denotes such a constant.)

Aside: Some languages also support so called “named constants.” A named constant denotes a value—the constant in question—that can be referenced by means of a name that’s not just a simple literal representation of that constant. In other words, a named constant resembles a named variable, in that it can be thought of as an abstraction of a storage location that contains a value; however, it differs from a variable in two obvious ways. First, it can never serve as the target for an assignment operation. Second, every reference to the pertinent name always denotes the same value. End of aside.

2.3 A THE_ operator is an operator that provides access to some component of some “possible representation,” or possrep, of some specified value of some specified type (see Chapter 8 for further discussion). Note: THE_ operators are effectively provided “automatically” in both Tutorial D and SQL, to a first approximation. However, although the THE_ operator concept necessarily exists, SQL doesn’t exactly have a term for it; certainly THE_ operator as such isn’t an SQL term. Further details are beyond the scope of this book.

2.4 True in principle; might not be completely true in practice (but to the extent it isn’t, we’re talking about a confusion over model vs. implementation). Incidentally, the epigraph to the chapter is highly pertinent to the present exercise. Here it is again: “A major purpose of type systems is to avoid embarrassing questions about representations, and to forbid situations in which these questions might come up.” In other words, types are a good idea because they raise the level of abstraction (without a proper type system, everything would be nothing but tedious—and error prone—bit twiddling). And here’s another nice quote (this one’s from Andrew Wright: “On Sapphire and Type-Safe Languages,” CACM 46, No. 4, April 2003): “[Types make] program development and debugging easier by making program behavior more understandable.”

2.5 A parameter is a formal operand in terms of which some operator is defined. An argument is an actual operand that’s substituted for some parameter in some invocation of the operator in question. (People often use these terms as if they were interchangeable; much confusion is caused that way, and you need to be on the lookout for it.) Note: There’s also a logical difference between an argument as such and the expression that’s used to specify it. For example, consider the expression (2 + 3) - 1, which represents an invocation of the arithmetic operator “-”. The first argument to that invocation is the value five, but that argument is specified by the expression 2 + 3, which represents an invocation of the arithmetic operator “+”. (In fact, of course, every expression represents some operator invocation. Even a simple variable reference—V, say—can be be regarded as representing an invocation of a certain operator: namely, the operator that returns the current value of the specified variable V.)

A database is a repository for data. (Note: In the relational world, we might say, a little more specifically, that a database is a container for relvars. But much more precise definitions are possible; one such can be found in Chapter 5 of this book. See also Appendix A.) A DBMS is a software system for managing databases; it provides data storage, recovery, concurrency, integrity, query/update, and other services.

A foreign key is a subset of the heading of some relvar, values of which must be equal to values of some “target” key in some other relvar (or possibly in the same relvar). A pointer is a value (an address, essentially) for which certain special operators—notably certain referencing and dereferencing operators—can (and in fact must) be defined.²⁶ Note: Brief definitions of the referencing and dereferencing operators were given in a footnote in the body of the chapter.

A generated type is a type obtained by executing some type generator such as ARRAY, RELATION, or (in SQL) CHAR; specific array, relation, and (in SQL) character string types are thus generated types. A nongenerated type is a type that’s not a generated type.

A relation is a value; it has a type—a relation type, of course—but it isn’t itself a type. A type is a named, finite set of values: viz., all possible values of some particular kind.

Type is a model concept; types have semantics that must be understood by the user. Representation is an implementation concept; representations are supposed to be hidden from the user. In particular (and as noted in the body of the chapter), if X is a value or variable of type T, then the operators that apply to X are the operators defined for values and variables of type T, not the operators defined for the representation that applies to values and variables of type T. For example, just because the representation for type ENO (“employee numbers”) happens to be CHAR, say, it doesn’t follow that we can concatenate two employee numbers; we can do that only if concatenation is an operator that’s defined for values of type ENO. See the answer to Exercise 2.4 above for further discussion.

A system defined (or built in) type is a type that’s available for use as soon as the system is installed (it “comes in the same box the system comes in”). A user defined type is a type whose definition and implementation are provided by some suitably skilled user after the system is installed. (To the user of such a type, however—as opposed to the user who actually defines that type—that type should look and feel just like a system defined type.)

A system defined (or built in) operator is an operator that’s available for use as soon as the system is installed (it comes in the same box the system comes in). A user defined operator is an operator whose definition and implementation are provided by some suitably skilled user after the system is installed. (To the user of such an operator, however—as opposed to the user who designs and implements that operator—that operator should look and feel just like a system defined operator.) User defined operators can take arguments of either user or system defined types (or a mixture), but system defined operators can obviously take arguments of system defined types only.

2.6 A scalar type is a type that has no user visible components; a nonscalar type is a type that’s not a scalar type. Values, variables, and operators (etc.) are scalar or nonscalar according as their type is scalar or nonscalar. Be aware, however, that these terms are neither very formal nor very precise, in the final analysis. In particular, we’ll meet a couple of important relations in Chapter 3 called TABLE_DUM and TABLE_DEE that are “scalar” by the foregoing definition!—or so it might be argued, at least.

2.7 Coercion is implicit type conversion. It’s deprecated because it’s error prone (but note that this is primarily a pragmatic issue; whether or not coercions are permitted has little or nothing to do with the relational model as such).

2.8 Because it muddles type and representation.

2.9 A type generator is an operator that returns a type instead of a value (and is invoked at compile time instead of run time). The relational model requires support for two such: namely, TUPLE and RELATION. Points arising:

Types generated by the TUPLE and RELATION type generators are nonscalar, but there’s no reason in principle why generated types have to be nonscalar. SQL in particular supports several scalar type generators (CHAR, NUMERIC, REF, and many others).
Type generators are known by many different names in the literature, including type constructors (the SQL term), parameterized types, polymorphic types, type templates, and generic types.

2.10 A relation is in first normal form (1NF) if and only if every tuple contains a single value, of the appropriate type, in every attribute position; in other words, every relation is in first normal form. Given this fact, you might be forgiven for wondering why we bother to talk about the concept at all (and in particular why it’s called “first”). The reason, as I’m sure you know (and as was in fact mentioned in Chapter 1), is that (a) we can extend it to apply to relvars as well as relations, and then (b) we can define a series of “higher” normal forms for relvars that turn out to be important in database design. In other words, 1NF is the base on which those higher normal forms build. But it really isn’t all that important as a notion in itself.

Note: I should add that 1NF is one of those concepts whose definition has evolved somewhat over the years. It used to be defined to mean that every tuple had to contain a single “atomic” value in every attribute position. As we’ve come to realize, however (and as I tried to show in the body of the chapter), the concept of data value atomicity actually has no objective meaning. An extensive discussion of such matters can be found in the paper “What First Normal Form Really Means” (see Appendix G).

2.11 The type of X is the type, T say, specified as the type of the result of the operator to be executed last—“the outermost operator”—when X is evaluated. That type is significant because it means X can be used in exactly (that is, in all and only) those positions where a literal of type T can appear.

2.12	OPERATOR CUBE ( I INTEGER ) RETURNS INTEGER ; RETURN I * I * I ; END OPERATOR ;
2.13	OPERATOR AREA_OF_R ( H LENGTH , W LENGTH ) RETURNS AREA ; RETURN H * W ; END OPERATOR ;

I’m assuming here, not unreasonably, that (a) it’s legal to multiply (“*”) a value of type LENGTH by another such value, and (b) the result of such a multiplication is a value of type AREA (another user defined type).

2.14 The following relation type is the type of the suppliers relvar S:

RELATION { SNO CHAR , SNAME CHAR , STATUS INTEGER , CITY CHAR }

The suppliers relvar S itself is a variable of this type. And every legal value of that variable—for example, the value shown in Fig. 1.3 in Chapter 1—is a value of this type.

2.15 SQL definitions are given in the answer to Exercise 1.13 in Chapter 1. Tutorial D definitions:

VAR P BASE RELATION
  { PNO CHAR , PNAME CHAR , COLOR CHAR , WEIGHT RATIONAL , CITY CHAR }
  KEY { PNO } ;

VAR SP BASE RELATION
  { SNO CHAR , PNO CHAR , QTY INTEGER }
  KEY { SNO , PNO }
  FOREIGN KEY { SNO } REFERENCES S
  FOREIGN KEY { PNO } REFERENCES P ;

Some differences between the SQL and Tutorial D definitions:

As noted in the answer to Exercise 1.13 in Chapter 1, SQL specifies keys and foreign keys, along with table columns (and certain other items too, beyond the scope of the present discussion), all inside the same set of parentheses—a fact that makes it hard to determine exactly what the pertinent type is. (As a matter of fact, SQL doesn’t really support the concept of a relation type—or table type, rather—at all. See Chapter 3 for further discussion.)
The left to right order in which columns are listed matters in SQL. See Chapter 3 for further discussion.
SQL tables don’t have to have keys at all.

The significance of the fact that relvar P, for example, is of a certain relation type is as follows:

The only values that can ever be assigned to relvar P are relations of that type.
A reference to relvar P can appear wherever a literal of that type can appear (as in, for example, the expression P JOIN SP), in which case it denotes the relation that happens to be the current value of that relvar at the pertinent time. (In other words, a relvar reference is a valid relational expression in Tutorial D; note, however, that an analogous remark does not apply to SQL, at least not 100 percent.) See Chapters 6 and 12 for further discussion.

One further point: As you can see, I’ve defined attribute QTY to be of type INTEGER. However, my reason for doing so is partly historical—every DBMS I know supports type INTEGER, while few DBMSs if any support the type that would really be more appropriate in the case at hand (viz., NONNEGATIVE_INTEGER, with the obvious semantics). Of course, we could make NONNEGATIVE_INTEGER a user defined type, but as I’ve said I don’t want to get into too much detail regarding user defined types in this book.

2.16 I assume throughout the following answers a.-g. that a given type T always has a selector with the same name. See Chapter 8 for further discussion.

Not valid; LOCATION = CITY('London').
Valid; BOOLEAN.
Presumably valid; MONEY. I’m assuming that multiplying a money value by an integer returns another money value.
Not valid; BUDGET + MONEY(50000).
Not valid; ENO > ENO('E2').
Not valid; NAME(THE_C(ENAME) || THE_C(DNAME)). I’m assuming that type NAME has a single “possrep component”—see Chapter 8—called C, of type CHAR.
Not valid; CITY(THE_C(LOCATION) || 'burg'). I’m assuming that type CITY has a single “possrep component” called C, of type CHAR.

2.17 Such an operation logically means replacing one type by another, not “updating a type” (types aren’t variables and hence can’t be updated, by definition). Consider the following. First of all, the operation of defining a type doesn’t actually create the corresponding set of values; conceptually, those values already exist, and always will exist (think of type INTEGER, for example). Thus, all the “define type” operation (the TYPE statement, in the case of Tutorial D— see Chapter 8) really does is introduce a name by which that set of values can be referenced. Likewise, dropping a type doesn’t actually drop the corresponding values, it just drops the name that was introduced by the corresponding “define type” operation. It follows that “updating a type” really means dropping the type name and then reintroducing that very same name to refer to a different set of values. Of course, there’s nothing to preclude support for some kind of pragmatic “alter type” shorthand to simplify matters—and SQL does support such an operator, in fact—but invoking such a shorthand shouldn’t be thought of as “updating the type.”

2.18 The empty type is certainly a valid type; however, it wouldn’t make much sense to define a variable to be of such a type, because no value could ever be assigned to such a variable! Despite this fact, the empty type turns out to be crucially important in connection with type inheritance—but that’s a topic that’s (sadly) beyond the scope of the present book. Refer to the book Databases, Types, and the Relational Model: The Third Manifesto, by Hugh Darwen and myself (see Appendix G), if you want to know more.

2.19 Let T be an SQL type for which “=” isn’t defined and let C be a column of type T. Then C can’t be part of a key or foreign key, nor can it be part of the argument to DISTINCT or GROUP BY or ORDER BY, nor can restrictions or joins or unions or intersections or differences be defined in terms of it. And what about implementation constructs such as indexes? There are probably other implications as well.

Second, let T be an SQL type for which the semantics of “=” are user defined (so T is necessarily user defined itself), and let C be a column of type T. Then the effects of making C part of a key or foreign key or applying DISTINCT or GROUP BY (etc., etc.) to it will be user defined as well. Note: Presumably for this very reason, the standard doesn’t actually allow such a column C to be used in all of the contexts just mentioned—and possibly not in any of them (?). The specifics of exactly what’s allowed are baroque in the extreme, however; so if you want to know more, I’m afraid I’m going to have to refer you to the standard itself (see Appendix G).

2.20 Here’s a trivial example of such violation. Let X be the character string 'AB ' (note the trailing space), let Y be the character string 'AB', and let PAD SPACE apply to the pertinent collation. Then the comparison X = Y gives TRUE, and yet the operator invocations CHAR_LENGTH(X) and CHAR_LENGTH(Y) give 3 and 2, respectively. (Note too that even though the comparison X = Y gives TRUE, the comparison X||X = Y||Y doesn’t!) I leave the detailed implications for you to think about, but it should be clear that problems are likely to surface in connection with DISTINCT, GROUP BY, and ORDER BY operations among others (as well as in connection with keys, foreign keys, and certain implementation constructs, such as indexes).

2.21 Because (a) they’re logically unnecessary, (b) they’re error prone, (c) end users can’t readily use them, (d) they’re clumsy—in particular, they have a direction to them, which other values don’t—and (e) they undermine type inheritance. (Details of this last point are beyond the scope of this book.) There are other reasons too. See the paper cited in a footnote to the answer to Exercise 2.5, “Inclusion Dependencies and Foreign Keys,” for further discussion.

2.22 One answer has to do with nulls; if we “set X to null” (which isn’t really assigning a value to X, because nulls aren’t values, but never mind), the comparison X = NULL certainly doesn’t give TRUE. There are many other examples too, not involving reliance on nulls. E.g., let X be a variable of type CHAR(3), let Y be the character string 'AB' (no trailing space), and let NO PAD apply to the pertinent collation. Then assigning Y to X will actually set X to the string 'AB ' (one trailing space), and after that assignment the comparison X = Y will give FALSE. Again I leave the implications for you to think about.

2.23 No! (Which database does type INTEGER belong to?) In an important sense, the whole subject of types and type management is orthogonal to the subject of databases and database management. We might even imagine the need for a “type administrator,” whose job it would be to look after types in a manner analogous to that in which the database administrator looks after databases.

2.24 An expression represents an operator invocation, and it denotes a value; it can be thought of as a rule for computing or determining the value in question. (Incidentally, the arguments to that operator invocation are themselves specified as expressions in turn—though the expressions in question might just be simple literals or simple variable references.) By contrast, a statement doesn’t denote a value; instead, it causes some action to occur, such as assigning a value to some variable or changing the flow of control. In SQL, for example,

X + Y

is an expression, but

SET Z = X + Y ;

is a statement.

2.25 An RVA is an attribute whose type is some relation type, and whose values are therefore relations of that type (see Chapter 7 for further discussion). A repeating group is an “attribute” of some type T whose values aren’t values of type T—note the contradiction in terms here!— but, rather, bags or sets or sequences (or ...) of values of type T. Note: Type T here is often a tuple type (or something approximating a tuple type). In a system that allows repeating groups, for example, a file might be such that each record consists of an ENO field (employee number), an ENAME field (employee name), and a repeating group JOBHIST, in which each entry consists of a JOB field (job title), a FROM field, and a TO field (where FROM and TO are dates).

2.26 “Subquery” is an SQL term meaning, loosely, a SELECT expression enclosed in parentheses. Later chapters will elaborate (especially Chapter 12).

2.27 Regarding SQL row and table types, see Chapter 3. As for type BOOLEAN, yes, “=” does apply; TRUE is equal to TRUE and FALSE is equal to FALSE. In SQL, what’s more, “<” applies as well!—FALSE is considered to be less than TRUE (i.e., the comparison “FALSE < TRUE” returns TRUE, in SQL).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for SQL and Relational Theory

Create new playlist

Sign In

Sign Up