Set operations combine the results of more than one query into a single result set. They include UNION, INTERSECT, and EXCEPT. PostgreSQL implements all of them and offers many important optimizations to speed them up.
The planner is able to push restrictions down into the set operation, opening the door for fancy indexing and speedups in general. Let's take a look at the following query, which shows us how this works:
test=# EXPLAIN SELECT *
FROM
(
SELECT aid AS xid
FROM a
UNION ALL
SELECT bid FROM b
) AS y WHERE xid = 3; QUERY PLAN
----------------------------------------------------------------
Append (cost=0.29..12.89 rows=2 width=4)
-> Index Only Scan using idx_a on a
(cost=0.29..8.30 rows=1 width=4)
Index Cond: (aid = 3)
-> Index Only Scan using idx_b on b
(cost=0.57..4.59 rows=1 width=4)
Index Cond: (bid = 3)
(5 rows)
What you can see here is that two relations are added to each other. The trouble is that the only restriction is outside the subselect. However, PostgreSQL figures out that the filter can be pushed further down the plan. Therefore, xid = 3 is attached to aid and bid, opening up the option for us to use indexes on both tables. By avoiding the sequential scan on both tables, the query will run a lot faster.
The UNION clause is different as it will filter out duplicates. The following plan shows how that works:
test=# EXPLAIN SELECT *
FROM
(
SELECT aid AS xid
FROM a
UNION SELECT bid
FROM b
) AS y
WHERE xid = 3; QUERY PLAN
----------------------------------------------------------------
Unique (cost=12.92..12.93 rows=2 width=4)
-> Sort (cost=12.92..12.93 rows=2 width=4)
Sort Key: a.aid
-> Append (cost=0.29..12.91 rows=2 width=4)
-> Index Only Scan using idx_a on a
(cost=0.29..8.30 rows=1 width=4)
Index Cond: (aid = 3)
-> Index Only Scan using idx_b on b
(cost=0.57..4.59 rows=1 width=4)
Index Cond: (bid = 3)
(8 rows)
PostgreSQL has to add a Sort node on top of the Append node to ensure that duplicates can be filtered later on.