The New Relational Database Dictionary

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Introduction

This dictionary contains over 1,700 entries dealing with issues, terms, and concepts involved in, or arising from use of, the relational model of data. Most of the entries include not only a definition as such—often several definitions, in fact—but also an illustrative example (sometimes more than one). What’s more, I’ve tried to make those entries as clear, precise, and accurate as I can; they’re based on my own best understanding of the material, an understanding I’ve gradually been honing over some 45 years of involvement in this field.

I’d also like to stress the fact that the dictionary is, as advertised, relational. To that end, I’ve deliberately omitted many topics that are only tangentially connected to relational databases as such (in particular, topics that have to do with database technology in general, as opposed to relational databases specifically); for example, I have little or nothing to say about security, recovery, or concurrency matters. I’ve also omitted certain SQL topics that—despite the fact that SQL is supposed to be a relational language—aren’t really relational at all (cursors, outer join, and SQL’s various “retain duplicates” options are examples here). At the same time, I’ve deliberately included a few nonrelational topics in order to make it clear that, contrary to popular opinion, the topics in question are indeed nonrelational (index is a case in point here).

I must explain too that this is a dictionary with an attitude. It’s my very firm belief that the relational model is the right and proper foundation for database technology and will remain so for as far out as anyone can see, and many of the definitions in what follows reflect this belief. As I said in my book SQL and Relational Theory: How to Write Accurate SQL Code (3rd edition, O’Reilly Media Inc., 2015):

In my opinion, the relational model is rock solid, and “right,” and will endure. A hundred years from now, I fully expect database systems still to be based on Codd’s relational model. Why? Because the foundations of that model—namely, set theory and predicate logic—are themselves rock solid in turn. Elements of predicate logic in particular go back well over 2000 years, at least as far as Aristotle (384–322 BCE).

Partly as a consequence of this state of affairs, I haven’t hesitated to mark some term or concept as deprecated if I believe there are good reasons to avoid it, even if the term or concept in question is in widespread use at the time of writing. Materialized view is a case in point here.

The Suppliers-and-Parts Database

Many of the examples used to illustrate the definitions are based on the familiar (not to say hackneyed) suppliers-and-parts database. I apologize for dragging out this old warhorse yet one more time, but as I’ve said many times before, I believe that using the same example—or essentially the same example, at any rate—in a variety of different publications can be a help, not a hindrance, in learning. Here are the relvar definitions for that database (and if you don’t know what a relvar is, then please see the pertinent dictionary entry!):

VAR S BASE RELATION
  { SNO    SNO ,
    SNAME  NAME ,
    STATUS INTEGER ,
    CITY   CHAR }
  KEY { SNO } ;

VAR P BASE RELATION
  { PNO    PNO ,
    PNAME  NAME ,
    COLOR  COLOR ,
    WEIGHT WEIGHT ,
    CITY   CHAR }
  KEY { PNO } ;

VAR SP BASE RELATION
  { SNO    SNO ,
    PNO    PNO ,
    QTY    QTY }
  KEY { SNO , PNO }
  FOREIGN KEY { SNO } REFERENCES S
  FOREIGN KEY { PNO } REFERENCES P ;

These definitions are expressed in a language called Tutorial D (see the section “Technical Issues” below for further explanation). The semantics are as follows:

Relvar S represents suppliers under contract. Each supplier has one supplier number (SNO), unique to that supplier; one name (SNAME), not necessarily unique; one status value (STATUS); and one location (CITY). Attributes SNO, SNAME, STATUS, and CITY are of types SNO, NAME, INTEGER, and CHAR, respectively.
Relvar P represents kinds of parts. Each kind of part has one part number (PNO), which is unique; one name (PNAME); one color (COLOR); one weight (WEIGHT); and one location where parts of that kind are stored (CITY). Attributes PNO, PNAME, COLOR, WEIGHT, and CITY are of types PNO, NAME, COLOR, WEIGHT, and CHAR, respectively.
Relvar SP represents shipments (it shows which parts are shipped, or supplied, by which suppliers). Each shipment has one supplier number (SNO), one part number (PNO), and one quantity (QTY). There’s at most one shipment at any given time for a given supplier and given part, and so the combination of supplier number and part number is unique to the shipment in question. Attributes SNO, PNO, and QTY are of types SNO, PNO, and QTY, respectively.

Fig. 1 shows a set of sample values for these relvars. Examples in the body of the dictionary assume those specific values, where applicable.

Fig. 1: The suppliers-and-parts database—sample values

Alphabetization

For alphabetization purposes, I’ve followed these rules:

Blanks precede numerals.
Numerals precede letters.
Uppercase precedes lowercase.
Punctuation symbols (parentheses, hyphens, underscores, etc.) are treated as blanks.

Technical Issues

Keywords, variable names, and the like are set in all uppercase throughout.
Coding examples are expressed, mostly, in a language called Tutorial D. Now, I believe those examples are reasonably self-explanatory, but in any case that language is largely defined in the dictionary itself in the entries for the various relational operators (projection, join, and so on). A comprehensive description of the language can be found if needed in the book Databases, Types, and the Relational Model: The Third Manifesto (3rd edition), by C. J. Date and Hugh Darwen (Addison-Wesley, 2007). To elaborate briefly: As its subtitle indicates, that book—the Manifesto book for short—also introduces and explains The Third Manifesto, which is a precise though somewhat formal definition of the relational model and a supporting type theory (including a comprehensive model of type inheritance). In particular, that book uses the name D as a generic name for any language that conforms to the principles laid down by The Third Manifesto. Any number of distinct languages could qualify as a valid D; sadly, however, SQL isn’t one of them, which is why coding examples are expressed for the most part in Tutorial D and not SQL. (Tutorial D is, of course, a valid D; in fact, it was expressly designed to be suitable as a vehicle for illustrating and teaching the ideas of The Third Manifesto.)

Note: Tutorial D has been revised and extended somewhat since the Manifesto book was first published. A description of the current version can be found in the book Database Explorations: Essays on The Third Manifesto and Related Topics, by C. J. Date and Hugh Darwen (Trafford, 2010)—available online at the Manifesto website www.thethirdmanifesto.com.¹ What’s more, that Explorations book also includes some proposals for extending the language still further (e.g., to incorporate explicit foreign key support), proposals that for the purposes of this dictionary I assume to have been adopted.
Following on from the previous point, I should make it clear that definitions in this dictionary are intended to conform fully to the relational model as defined by The Third Manifesto. As a consequence, you might find certain aspects of those definitions a trifle surprising—for example, the assertion in the entry for deferred checking that such checking is logically flawed. As I’ve said, this is a dictionary with an attitude.
The notion of set is ubiquitous in the database world. On paper, a set is typically represented by a comma separated list (or “commalist”) of items denoting the elements that constitute the set in question, the whole enclosed in braces, as here: {a,b,c}. (Blanks appearing immediately before the first item or any comma, or immediately after the last item or any comma, are ignored.) Throughout this dictionary, therefore, I use braces to enclose commalists of items whenever the items in question are meant to denote the elements of some set, implying among other things that (a) the order in which the items appear within that commalist is immaterial and (b) if some item appears more than once, it’s treated as if it appeared just once.
Tutorial D in particular uses braces to enclose the commalist of argument expressions in certain n-adic (prefix) operator invocations. If the operator in question is idempotent, as in the case of, e.g., JOIN, then the argument expression commalist truly does represent a set of arguments, and the remarks of the previous paragraph apply unconditionally. For other operators, however, the argument expression commalist represents a bag of arguments, not a set—in which case the order in which the argument expressions appear is still immaterial, but repetition has significance (despite the fact that Tutorial D and this dictionary do still both use braces in such a context). For example, the operator XOR (“exclusive OR”)—meaning the version of that operator defined in this dictionary, at any rate—isn’t idempotent. As a consequence, the Tutorial D expressions

XOR { TRUE , FALSE }

and

XOR { TRUE , FALSE , TRUE }

aren’t logically equivalent—the first returns TRUE and the second FALSE.
The notion of logic is, of course, also ubiquitous in the database world. The relational model in particular is firmly based on logic. More precisely, it’s based on conventional two-valued logic (“2VL”), and all references to logic in this dictionary should be taken as referring to that logic specifically, except very occasionally where the context demands otherwise. Note: As these remarks suggest, many of the dictionary entries do have to do with concepts from logic. Unfortunately, logic texts (and logicians) vary widely not just in the terminology they use but also, in some cases, in the substance of their definitions. The definitions I give are the ones I find most appropriate myself, but be warned that they’re sometimes at odds with others you can find in the literature.
A note on the relational operators: Perhaps unfortunately, it has become standard practice in the database world to use terms such as projection, join, and so on in two somewhat different senses. To be specific, they’re used to refer sometimes to those operators as such and sometimes to the results obtained when those operators are invoked. I’ve followed this practice myself in this dictionary on occasion, and hope it won’t lead to confusion.
In fact, it has become standard practice to use terms such as projection, join, and so on in another sense also. By definition, these operators apply to relation values specifically. In particular, of course, they apply to the values that happen to be the current values of relvars. It thus clearly makes sense to talk about, e.g., the join of relvars R1 and R2, meaning the relation r that results from taking the join of the current values r1 and r2, respectively, of those two relvars. In some contexts, however (normalization, for example, also view processing), it turns out to be convenient to use expressions like “the join of relvars R1 and R2” in a slightly different sense. To be specific, we might say, loosely but very conveniently, that some relvar, R say, is the join of relvars R1 and R2—meaning, more precisely, that the value of R is equal at all times to the join of the values of R1 and R2 at the time in question. In a sense, therefore, we can talk in terms of joins of relvars per se, rather than just in terms of joins of current values of relvars. Analogous remarks apply to all of the relational operations.
Regarding projection in particular, please note that Tutorial D treats projection as having very high precedence, in order to reduce the number of parentheses that might otherwise be required in relational expressions. For example, the Tutorial D expression

SP JOIN S { SNO }

is defined to be equivalent to

SP JOIN ( S { SNO } )

and not

( SP JOIN S ){ SNO }
Talk of projection raises yet another point. Here’s the definition from the pertinent dictionary entry:

Let relation r have attributes called A1, A2, ..., An (and possibly others). Then (and only then) the expression r{A1,A2,...,An} denotes the projection of r on {A1, A2, ..., An}, and it returns the relation with heading {A1,A2,...,An} and body consisting of all tuples t such that there exists a tuple in r that has the same value for attributes A1, A2, ..., An as t does.

Now, if the result has heading {A1,A2,...,An}, then by definition each of those Ai’s is an <attribute name, type name> pair. But in the projection expression r{A1,A2,...,An}, each of those Ai’s is just an attribute name. (The syntax works because attribute names are unique within the pertinent heading and thus imply the associated type names.) So there’s a kind of punning going on here: The very same symbol Ai is being used to denote slightly different things in different contexts.

Generalizing slightly from the foregoing remarks, please understand that the term attribute is sometimes used in the body of the dictionary to mean an attribute name rather than an attribute as such; likewise, the term heading is sometimes used to mean a set of attribute names rather than a set of attributes as such. I apologize if you find this state of affairs confusing, but once again it’s fairly standard practice.

Note: While I’m on the subject of headings, I should mention that in previous versions of this dictionary, headings were denoted {H}; in the present version, by contrast, they’re denoted simply H (i.e., the enclosing braces have been dropped).
There’s another convention I need to mention (yet again it’s fairly standard, but it’s worth spelling out in detail in order to avoid any possible confusion). It’s illustrated by, e.g., the entry for joinable, which includes the following sentence:

Relations r1, r2, ..., rn (n ≥ 0) are joinable if and only if for all i and j, relations ri and rj are joinable (1 ≤ i ≤ n, 1 ≤ j ≤ n).

Consider the opening part of this sentence—“Relations r1, r2, ..., rn (n ≥ 0) are joinable.” Here the case n = 0 is to be understood as meaning, not that there exists a relation, not mentioned in the commalist, called r0, but rather that the commalist is empty—i.e., there aren’t in fact any relations at all.

Similarly, consider the closing part of the sentence—“relations ri and rj are joinable (1 ≤ i ≤ n, 1 ≤ j ≤ n).” Here the case n = 0 is to be understood as meaning that there aren’t any i’s or j’s, and hence that there are no relations ri and rj.
I’d also like to draw your attention to still another standard convention, followed throughout this dictionary (and in fact spelled out explicitly in the pertinent dictionary entries): viz., I use the generic term update in lowercase to refer to—among other things—the familiar INSERT, DELETE, and UPDATE operators considered collectively. By contrast, when I want to refer to the UPDATE operator as such, I’ll set it in uppercase (“all caps”) as just shown.
Certain of the definitions and examples make use of a simplified notation for tuples. For example, consider the SP tuple shown in Fig. 1 for supplier S1 and part P1. A formal Tutorial D representation of that tuple might look like this:

TUPLE { SNO SNO('S1') , PNO PNO('P1') , QTY QTY(300) }

In the simplified notation under discussion, however, the same tuple would be represented thus:

<S1,P1,300>

—or, very occasionally, sometimes even thus:

S1 P1 300
This dictionary has almost nothing to say about distributed databases or related matters. The reason is that the whole point about a distributed database as far as the relational model is concerned is that it’s supposed to look exactly like a nondistributed database! In other words, all of the problems of distributed databases (and problems there most certainly are) are, at least in an ideal system, problems of physical implementation, not problems of the logical model.
Finally, please note that all references to SQL in this dictionary are to the version of that language defined by the official SQL standard. As you might be aware, however, that standard has been through several versions, or editions, over the years. The version current at the time of writing—and the version on which references to SQL in this dictionary are based—is the 2011 version (“SQL:2011”). Here’s the formal reference:

International Organization for Standardization (ISO), Database Language SQL, Document ISO/IEC 9075:2011.

Publishing History and Structure of This Edition

This is the third version, or edition, of this dictionary; the first (with the title The Relational Database Dictionary) was published by O’Reilly Media Inc. in 2006, and the second (with the title The Relational Database Dictionary, Extended Edition) by Apress in 2008. The following remarks are taken from the introduction to that second edition:

It’s a fact of life that dictionaries always expand from one edition to the next. The first edition of this dictionary had just over 600 entries; this one has over 900—an almost 50 percent increase. New entries include atomic relvar, attribute reference, cardinality constraint, class, computational completeness, connection trap, default, field, Great Divide, overriding, referential cycle, safe expression, stored procedure, and many others. I’ve also taken the opportunity to improve (and in a few cases correct) several of the existing entries; examples here include derived relation, essentiality, fifth normal form, foreign key, JD implied by superkeys, NAND, NOR, ordering, and pointer. No entries have been removed!

One thing I was slightly surprised to discover in working on this edition was the extent to which database concepts rely, ultimately, on certain mathematical terms and constructs. As a result, I decided to include a few somewhat mathematical entries; examples here include boolean algebra, group, inverse, nonnegative, partial ordering, and mathematical (as opposed to relational model) definitions for relation and tuple. The relevance of such entries might not be immediately apparent, but I felt it was useful to collect them together in one place in order to serve as a convenient reference for anyone who wishes to delve a little more deeply into the precise meaning and origins of a term like relational algebra (or the term relation itself, come to that).

The foregoing remarks, suitably amended, apply to this new edition as well, but with even more force (which is why I decided to use the slightly revised, but I believe merited, title The New Relational Database Dictionary). There are now over 1,700 entries in total (an almost 90% increase over the previous edition); new ones include axiom of choice, constant reference, disjoint INSERT, domain of discourse, double negation, exclusive union, individual constant, logical difference, mediator, possibly nondeterministic, primary key attribute, Query-By-Example, repeating field, scalar operator, and tuple product. In addition, numerous existing entries have been expanded and improved (and occasionally corrected), cosmetic improvements have been made throughout, and many more examples have been included.

But the foregoing remarks are far from being the whole of the story. Indeed, the major reason for the increase in size in this edition is that I decided to include, this time around, both (a) definitions arising from the underlying theory of types—including those having to do with the concept of type inheritance in particular—and (b) definitions arising from the use of interval types in particular. Thus, the dictionary is now divided into three parts, as follows:

Part I: Given that relations have attributes and attributes have types (also called domains), it’s clear that relational theory does rely on, or assume, a supporting type theory. But nowhere does it say what that theory has to look like. In other words, relational theory and type theory are, at least to a first approximation, completely independent of one another. At the same time, it’s quite difficult—certainly less than fully satisfactory, at least—to define and illustrate relational concepts properly without saying something about the underlying theory of types. Thus, Part I of this new dictionary (“Types and Relations”), which effectively subsumes the previous edition in its entirety, now contains numerous entries having to do with that type theory specifically. (Those entries, like the ones having to do with relational theory as such, are all intended to conform to the prescriptions laid down by The Third Manifesto. As you’ll soon see, however, the inclusion of such entries inevitably led to the inclusion of several further entries dealing with concepts from the world of object orientation (OO). But those entries too are intended to conform to the prescriptions of The Third Manifesto, inasmuch as it makes sense for them to do so.)
Part II: As mentioned earlier in these introductory notes, the Manifesto book not only defines a theory of types as such, it builds on that theory to define a model of type inheritance (“the Manifesto model”).² Part II of the dictionary (“Inheritance”) deals with terms and concepts arising in connection with that model. The definitions and examples in that part of the dictionary are intended to conform to that model specifically. More details can be found in the Manifesto book.
Part III: Finally, Part III of the dictionary (“Intervals”) deals with terms and concepts arising in connection with the theory of intervals. Interval theory provides the formal underpinnings for the support of data of any of a variety of interval types; in particular, it supports the pragmatically important case of temporal data specifically. The definitions and examples in this part of the dictionary are intended to conform to the theory presented in the book Time and Relational Theory: Temporal Data in the Relational Model and SQL, by C. J. Date, Hugh Darwen, and Nikos A. Lorentzos (Morgan Kaufmann, 2014), where further details can be found.

Note: All three parts include a few additional remarks of an introductory nature that are specific to the part in question.

Acknowledgments

This dictionary was Jonathan Gennick’s brainchild. Indeed, Jonathan originally intended to write it himself, and I’m very grateful to him for stepping out of the limelight, as it were, and letting me steal his idea and run with it as I’ve done. Jonathan and I have very different writing styles, and what follows is no doubt a long way from what he originally had in mind; but I hope it at least does justice to his overall vision. I’d also like to thank Apress (publisher of the second edition) for allowing me to return to O’Reilly Media Inc. (publisher of the first edition) with this vastly expanded new version, and my friends and colleagues Hugh Darwen and (for Part III in particular) Nikos Lorentzos for numerous helpful comments and much technical assistance over the past several years. It goes without saying that any remaining errors and infelicities are my own responsibility.

C. J. Date
Healdsburg, California 2015

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for The New Relational Database Dictionary

Create new playlist

Sign In