
I assume you’re a database practitioner and therefore reasonably familiar with SQL already. To be specific, I assume you have a working knowledge of either the SQL standard or (perhaps more likely in practice) at least one SQL product. However, I don’t assume you have a deep knowledge of relational theory as such (though I do hope you understand that the relational model is a good thing in general, and adherence to it wherever possible is a desirable goal). In order to avoid misunderstandings, therefore, I’ll be describing various features of the relational model in detail, as well as showing how to use SQL to conform to those features. But what I won’t do is attempt to justify all of those features; rather, I’ll assume you’re sufficiently experienced in database matters to understand why, e.g., the notion of a key makes sense, or why you sometimes need to do a join, or why many-to-many relationships need to be supported. (If I were to include such justifications, this would be a very different book; quite apart from anything else, it would be much bigger than it already is. In any case, that book has already been written.)

I’ve said I expect you to be reasonably familiar with SQL. However, I should add that I’ll be explaining certain aspects of SQL in detail anyway—especially aspects that might be encountered less frequently in practice. (The SQL notion of possibly nondeterministic expressions is a case in point here. See Chapter 12.)

Database in Depth

This book is based on, and intended to replace, an earlier one with the title Database in Depth: Relational Theory for Practitioners (O’Reilly Media Inc., 2005). My aim in that earlier book was as follows (this is a quote from the preface):

After many years working in the database community in various capacities, I’ve come to realize there’s a real need for a book for practitioners (not novices) that explains the basic principles of relational theory in a way not tainted by the quirks and peculiarities of existing products, commercial practice, or the SQL standard. I wrote this book to fill that need. My intended audience is thus experienced database practitioners who are honest enough to admit they don’t understand the theory underlying their own field as well as they might, or should. That theory is, of course, the relational model—and while it’s true that the fundamental ideas of that theory are all quite simple, it’s also true that they’re widely misrepresented, or underappreciated, or both. Often, in fact, they don’t seem to be understood at all. For example, here are a few relational questions ... How many of them can you answer?[1]

  1. What exactly is first normal form?

  2. What’s the connection between relations and predicates?

  3. What’s semantic optimization?

  4. What’s an image relation?

  5. Why is semidifference important?

  6. Why doesn’t deferred integrity checking make sense?

  7. What’s a relation variable?

  8. What’s prenex normal form?

  9. Can a relation have an attribute whose values are relations?

  10. Is SQL relationally complete?

  11. Why is The Information Principle important?

  12. How does XML fit with the relational model?

This book provides answers to these and many related questions. Overall, it’s meant to help database practitioners understand relational theory in depth and make good use of that understanding in their professional day-to-day activities.

As the final sentence in this extract indicates, it was my hope that readers of that book would be able to apply its ideas for themselves, without further assistance from me as it were. But I’ve since come to realize that, contrary to popular opinion, SQL is such a difficult language that it can be far from obvious how to use it without violating relational principles. I therefore decided to expand the original book to include explicit, concrete advice on exactly that issue (how to use SQL relationally, I mean). So my aim in the present book is still the same as before—I want to help database practitioners understand relational theory in depth and make good use of that understanding in their professional activities—but I’ve tried to make the material a little easier to digest, perhaps, and certainly easier to apply. In other words, I’ve included a great deal of SQL-specific material (and it’s this fact, more than anything else, that accounts for the increase in size over the previous book).

Further Remarks on the Text

I need to take care of several further preliminaries. First of all, my own understanding of the relational model has evolved over the years, and continues to do so. This book represents my very latest thinking on the subject; thus, if you detect any technical discrepancies—and there are a few—between this book and other books you might have seen by myself (including in particular the one the present book is meant to replace), the present book should be taken as superseding. Though I hasten to add that such discrepancies are mostly of a fairly minor nature; what’s more, I’ve taken care always to relate new terms and concepts to earlier ones, wherever I felt it was necessary to do so.

Second, I will, as advertised, be talking about theory—but it’s an article of faith with me that theory is practical. I mention this point explicitly because so many seem to believe the opposite: namely, that if something’s theoretical, it can’t be practical. But the truth is that theory (at least, relational theory, which is what I’m talking about here) is most definitely very practical indeed. The purpose of that theory is not just theory for its own sake; the purpose of that theory is to allow us to build systems that are 100 percent practical. Every detail of the theory is there for solid practical reasons. As Stéphane Faroult, a reviewer of the earlier book, wrote: “When you have a bit of practice, you realize there’s no way to avoid having to know the theory.” What’s more, that theory is not only practical, it’s fundamental, straightforward, simple, useful, and it can be fun (as I hope to demonstrate in the course of this book).

Of course, we really don’t have to look any further than the relational model itself to find the most striking possible illustration of the foregoing thesis. Indeed, it really shouldn’t be necessary to have to defend the notion that theory is practical, in a context such as ours: namely, a multibillion dollar industry totally founded on one great theoretical idea. But I suppose the cynic’s position would be “Yes, but what has theory done for me lately?” In other words, those of us who do think theory is important must continually be justifying ourselves to our critics—which is another reason why I think a book like this one is needed.

Third, as I’ve said, the book does go into a fair amount of detail regarding features of SQL or the relational model or both. (It deliberately has little to say on topics that aren’t particularly relational; for example, there isn’t much on transactions.) Throughout, I’ve tried to make it clear when the discussions apply to SQL specifically, when they apply to the relational model specifically, and when they apply to both. I should emphasize, however, that the SQL discussions in particular aren’t meant to be exhaustive. SQL is such a complex language, and provides so many different ways of doing the same thing, and is subject to so many exceptions and special cases, that to be exhaustive—even if it were possible, which I tend to doubt—would be counterproductive; certainly it would make the book much too long. So I’ve tried to focus on what I think are the most important issues, and I’ve tried to be as brief as possible on the issues I’ve chosen to cover. And I’d like to claim that if you do everything I tell you, and don’t do anything I don’t tell you, then to a first approximation you’ll be safe: You’ll be using SQL relationally. But whether that claim is justified, or to what extent it is, must be for you to judge.

To the foregoing I have to add that, unfortunately, there are some situations in which SQL just can’t be used relationally. For example, some SQL integrity checking simply has to be deferred (usually to commit time), even though the relational model explicitly rejects such checking as logically flawed. The book does offer advice on what to do in such cases, but I fear it often boils down to just “Do the best you can.” At least I hope you’ll understand the risks involved in departing from the model.
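To make the notion of deferred checking concrete, here is a minimal sketch. It is an illustration only, not the book's own example: it assumes SQLite (driven from Python's standard `sqlite3` module) rather than the standard SQL discussed in the text, and the tables `dept` and `emp` are hypothetical. The foreign key is declared `DEFERRABLE INITIALLY DEFERRED`, so the check is postponed until `COMMIT`.

```python
import sqlite3

# Autocommit mode, so the BEGIN/COMMIT statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to

# Hypothetical tables: every emp row must reference an existing dept row,
# but the check is deferred to commit time.
conn.execute("CREATE TABLE dept (dno INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE emp ("
    " eno INTEGER PRIMARY KEY,"
    " dno INTEGER NOT NULL"
    " REFERENCES dept (dno) DEFERRABLE INITIALLY DEFERRED)"
)

# Counts emp rows whose department does not exist.
ORPHANS = (
    "SELECT COUNT(*) FROM emp WHERE NOT EXISTS"
    " (SELECT * FROM dept WHERE dept.dno = emp.dno)"
)

# Transaction 1: the constraint is violated part way through, repaired before COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO emp (eno, dno) VALUES (1, 10)")  # dept 10 does not exist yet
orphans_mid_transaction = conn.execute(ORPHANS).fetchone()[0]
conn.execute("INSERT INTO dept (dno) VALUES (10)")
conn.execute("COMMIT")

# Transaction 2: the violation is never repaired, so the COMMIT itself fails.
commit_rejected = False
conn.execute("BEGIN")
conn.execute("INSERT INTO emp (eno, dno) VALUES (2, 99)")
try:
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    commit_rejected = True
    conn.execute("ROLLBACK")  # the failed COMMIT leaves the transaction open

emp_rows = conn.execute("SELECT eno, dno FROM emp").fetchall()
```

The first transaction shows the situation the model objects to: between the two `INSERT` statements the database holds a row that violates the constraint, and a query issued at that point can see it. The second shows that the violation is detected only at `COMMIT`, by which time any number of statements might have relied on the inconsistent state.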

I should say too that some of the recommendations offered aren’t specifically relational anyway but are, rather, just matters of general good practice, though sometimes there are relational implications (implications that can be a little unobvious, too, perhaps I should add). “Avoid coercions” is a good example here.

Fourth, please note that I use the term SQL throughout the book to mean the standard version of that language exclusively, not some proprietary dialect, barring explicit statements to the contrary. In particular, I follow the standard in assuming the pronunciation “ess cue ell,” not “sequel” (though this latter is common in the field), thereby saying things like “an SQL table,” not “a SQL table.”

Fifth, the book is meant to be read in sequence, pretty much, except as noted here and there in the text itself (most of the chapters do rely to some extent on material covered in earlier ones, so you shouldn’t jump around too much). Also, each chapter includes a set of exercises. You don’t have to do those exercises, of course, but I think it’s a good idea to have a go at some of them at least. Answers, often giving more information about the subject at hand, are given in Appendix F.

Finally, I’d like to mention that I have some live seminars available based on the material in this book. See or for further details. An online version of one of those seminars is available too, at

I’d been thinking for some time about revising the earlier book to include more on SQL in particular, but the spur that finally got me down to it was sitting in on a class, late in 2007, for database practitioners. The class was taught by Toon Koppelaars and was based on the book he wrote with Lex de Haan (see Appendix G of the present book), and very good it was, too. But what struck me most about that class was seeing firsthand the kinds of difficulties the attendees had in applying relational and logical principles to their use of SQL. Now, I do assume those attendees had some knowledge of those topics (they were database practitioners, after all), but it seemed to me they really needed some guidance in the application of those ideas to their daily database activities. And so I put this book together. So I’m thankful, first of all, to Toon and Lex for providing me with the necessary impetus to get started on this project. I’m grateful also to my reviewers Herb Edelstein, Sheeri Kritzer, Andy Oram, Peter Robson, and Baron Schwartz for their comments on earlier drafts, and Hugh Darwen and Jim Melton for other technical assistance. Next, I’d like to thank my wife Lindy, as always, for her support throughout this and all of my other database projects over the years. Finally, I’m grateful to everyone at O’Reilly, especially Isabel Kunkle and Andy Oram, for their encouragement, contributions, and support throughout the production of this book.

C. J. Date

Healdsburg, California


[1] For reasons that aren’t important here, I’ve replaced a few of the questions in this list by new ones.

