Speedup set operations

Set operations combine the results of more than one query into a single result set. They include UNION, INTERSECT, and EXCEPT. PostgreSQL implements all of them and offers many important optimizations to speed them up.

The planner is able to push restrictions down into the set operation, opening the door for fancy indexing and speedups in general. Let's take a look at the following query, which shows us how this works:

test=# EXPLAIN SELECT * 
FROM 
(
   SELECT aid AS xid 
   FROM a 
   UNION ALL 
   SELECT bid FROM b
) AS y 
WHERE xid = 3; 
                           QUERY PLAN 
---------------------------------------------------------------- 
 Append (cost=0.29..12.89 rows=2 width=4) 
   -> Index Only Scan using idx_a on a 
         (cost=0.29..8.30 rows=1 width=4) 
         Index Cond: (aid = 3) 
   -> Index Only Scan using idx_b on b 
         (cost=0.57..4.59 rows=1 width=4) 
         Index Cond: (bid = 3) 
(5 rows)

What you can see here is that two relations are added to each other. The trouble is that the only restriction is outside the subselect. However, PostgreSQL figures out that the filter can be pushed further down the plan. Therefore, xid = 3 is attached to aid and bid, opening up the option for us to use indexes on both tables. By avoiding the sequential scan on both tables, the query will run a lot faster.

Note that there is a distinction between the UNION clause and the UNION ALL clause. The UNION ALL clause will just blindly append the data and deliver the results of both tables.

The UNION clause is different as it will filter out duplicates. The following plan shows how that works:

test=# EXPLAIN SELECT * 
 FROM 
 (
     SELECT aid AS xid 
     FROM a 
     UNION SELECT bid 
     FROM b
 ) AS y 
WHERE xid = 3; 
                               QUERY PLAN 
---------------------------------------------------------------- 
 Unique (cost=12.92..12.93 rows=2 width=4) 
   -> Sort (cost=12.92..12.93 rows=2 width=4) 
         Sort Key: a.aid 
         -> Append (cost=0.29..12.91 rows=2 width=4) 
               -> Index Only Scan using idx_a on a 
                     (cost=0.29..8.30 rows=1 width=4) 
                     Index Cond: (aid = 3) 
               -> Index Only Scan using idx_b on b 
                     (cost=0.57..4.59 rows=1 width=4) 
                     Index Cond: (bid = 3) 
(8 rows)

PostgreSQL has to add a Sort node on top of the Append node to ensure that duplicates can be filtered later on.

Many developers who are not fully aware of the difference between the UNION clause and the UNION ALL clause complain about bad performance because they are unaware that PostgreSQL has to filter out duplicates, which is especially painful in the case of large datasets.

Table of Contents for Speedup set operations

Create new playlist

Sign In

Sign Up

Table of Contents for
Speedup set operations