Another query: “Get full supplier details for suppliers who supply all purple parts.” Note: This query, or one very like it, is often used to demonstrate a flaw in the relational divide operator as originally defined. See the further remarks on this topic at the end of the present section.
Here first is a logical formulation:
{ SX } WHERE FORALL PX ( IF PX.COLOR = 'Purple' THEN EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
(“names of suppliers SX such that, for all parts PX, if PX is purple, there exists a shipment SPX with SNO equal to the supplier number for supplier SX and PNO equal to the part number for part PX”). First we apply the implication law:
{ SX } WHERE FORALL PX ( NOT ( PX.COLOR = 'Purple' ) OR EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
Next De Morgan:
{ SX } WHERE FORALL PX ( NOT ( ( PX.COLOR = 'Purple' ) AND NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) )
Apply the quantification law:
{ SX } WHERE NOT EXISTS PX ( NOT ( NOT ( ( PX.COLOR = 'Purple' ) AND NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) ) ) )
Double negation:
{ SX } WHERE NOT EXISTS PX ( ( PX.COLOR = 'Purple' ) AND NOT EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
Drop some parentheses and map to SQL:
SELECT * FROM S AS SX WHERE NOT EXISTS ( SELECT * FROM P AS PX WHERE PX.COLOR = 'Purple' AND NOT EXISTS ( SELECT * FROM SP AS SPX WHERE SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
Recall now from Chapter 7 that if there aren’t any purple parts, every supplier supplies all of them—even supplier S5, who supplies no parts at all (see the discussion of empty ranges in Chapter 10 for further explanation). So the result is the entire suppliers relation:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Now, you might have had some difficulty in following the transformations in the foregoing example, and you might also be having some difficulty in understanding the final SQL formulation. Well, a useful technique, when the expressions start getting a little complicated as in this example, is to abstract a little by introducing symbolic names for subexpressions (I did briefly mention this point in the previous chapter, but now I want to get more specific). Let’s use exp1 to denote the subexpression
PX.COLOR = 'Purple'
and exp2 to denote the subexpression
EXISTS SPX ( SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO )
(note that both of these subexpressions can be directly represented, more or less, in SQL). Then the original relational calculus expression becomes:
{SX } WHERE FORALL PX ( IFexp1
THENexp2
)
As I said in the previous chapter, now we can see the forest as well as the trees (as it were), and we can start to apply our usual transformations—though now it seems to make more sense to apply them in a different sequence, precisely because we do now have a better grasp of the big picture. First, then, the quantification law:
{SX } WHERE NOT EXISTS PX ( NOT ( IFexp1
THENexp2
) )
Implication law:
{SX } WHERE NOT EXISTS PX ( NOT ( NOT (exp1
) ORexp2
) )
De Morgan:
{SX } WHERE NOT EXISTS PX ( NOT ( NOT (exp1
AND NOT (exp2
) ) ) )
Double negation:
{SX } WHERE NOT EXISTS PX (exp1
AND NOT (exp2
) )
Finally, expand exp1 and exp2 and map to SQL:
SELECT * FROM S AS SX WHERE NOT EXISTS ( SELECT * FROM P AS PX WHERE PX.COLOR = 'Purple' AND NOT EXISTS ( SELECT * FROM SP AS SPX WHERE SPX.SNO = SX.SNO AND SPX.PNO = PX.PNO ) )
As I think this example demonstrates, SQL expressions obtained by the techniques under discussion are often quite hard to understand directly; as I said earlier, however, we know they’re correct, because of the systematic manner in which they’ve been derived.[158]
As an aside, I can’t resist showing a Tutorial D version of the example by way of comparison:
S WHERE ( !!SP ) { PNO } ⊇ ( P WHERE COLOR = 'Purple' ) { PNO }
Now let me explain the remark I made at the beginning of this section, regarding divide. Let’s denote the restriction P WHERE COLOR = ‘Purple’ by the symbol PP. Also, let’s simplify the query at hand—“Get full supplier details for suppliers who supply all purple parts”—such that it asks for supplier numbers only, instead of full supplier details. Then it might be thought that the query could be represented by the following algebraic expression:
SP { SNO , PNO } DIVIDEBY PP { PNO }
Note: DIVIDEBY here represents the divide operator as originally defined. See Chapter 7 if you need an explanation of this point.
With our usual sample data values, however, relation PP, and hence the projection of relation PP on {PNO}, are both empty (because there aren’t any purple parts), and the foregoing expression therefore returns the supplier numbers S1, S2, S3, and S4. But if there aren’t any purple parts, then every supplier supplies all of them (see the discussion of empty ranges in the previous chapter)—even supplier S5, who supplies no parts at all. And the foregoing division can’t possibly return supplier number S5, because it extracts supplier numbers from SP instead of S, and supplier S5 isn’t currently represented in SP. So the informal characterization of that division as “Get supplier numbers for suppliers who supply all purple parts” is incorrect; it should be, rather, “Get supplier numbers for suppliers who supply at least one part and also supply all purple parts.” As this example demonstrates, therefore (and to repeat something I said in Chapter 7), the divide operator doesn’t really solve the problem it was originally, and explicitly, intended to solve.
[158] It’s worth pointing out in passing that the tactic of introducing names for subexpressions is reminiscent, somewhat, of the use of WITH in simplifying complex expressions as discussed in Chapter 6. But there’s a difference: For WITH, the subexpressions in question are required to be closed, whereas no such requirement applies in the present context. Indeed, all we’re doing in the present context is, in effect, simple text substitution, which is not what happens with WITH.