As promised earlier, there’s a little more I want to say about the GROUP BY and HAVING clauses. Consider this query: “For each part supplied by no more than two suppliers, get the part number and city and the total quantity supplied of that part.” Here’s a possible logical (relational calculus) formulation:
{ PX.PNO , PX.CITY , TPQ := SUM ( SPX.QTY WHERE SPX.PNO = PX.PNO , QTY ) } WHERE COUNT ( SPY WHERE SPY.PNO = PX.PNO ) ≤ 2
SQL formulation:
SELECT PX.PNO , PX.CITY , ( SELECT COALESCE ( SUM ( SPX.QTY ) , 0 ) FROM SP AS SPX WHERE SPX.PNO = PX.PNO ) AS TPQ FROM P AS PX WHERE ( SELECT COUNT ( * ) FROM SP AS SPY WHERE SPY.PNO = PX.PNO ) <= 2
Result:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
As the opening to this section suggests, however, the interesting thing about this example is that it’s one that might appear to be more easily—certainly more succinctly—expressed using GROUP BY and HAVING, thus:
SELECT PX.PNO , PX.CITY , COALESCE ( SUM ( SPX.QTY ) , 0 ) AS TPQ FROM P AS PX , SP AS SPX WHERE PX.PNO = SPX.PNO GROUP BY PX.PNO HAVING COUNT ( * ) <= 2
But:
In that GROUP BY / HAVING formulation, is the appearance of PX.CITY in the SELECT item commalist legal? Answer: Yes, it is—at least according to the standard—though it used not to be. (I did mention this point in Chapter 7, but I’ll repeat it here for convenience.) Let S be a SELECT expression with a GROUP BY clause, and let column C be referenced in the SELECT clause of S. In earlier versions of SQL, then, C had to be one of the grouping columns (or be referenced inside a “set function” invocation, but let’s agree to ignore that possibility for simplicity). In the current version, by contrast, it’s required only that C—or {C}, rather—be functionally dependent on the grouping columns.
Do you think the GROUP BY / HAVING formulation is easier to understand? (Debatable.)
Does the GROUP BY / HAVING formulation work correctly for parts that aren’t supplied by any suppliers at all? (No, it doesn’t.)
Are the formulations equivalent if the database permits nulls? Or duplicates?
As a further exercise, give SQL formulations (a) using GROUP BY and HAVING, (b) not using GROUP BY and HAVING, for the following queries:
Get supplier numbers for suppliers who supply N different parts for some N > 3.
Get supplier numbers for suppliers who supply N different parts for some N < 4.
What do you conclude from this exercise?