Join pruning

PostgreSQL also provides an optimization called join pruning. The idea is to remove joins if they are not needed by the query. This can come in handy if queries are generated by some middleware or some ORM. If a join can be removed, it naturally speeds things up dramatically and leads to less overhead.

The question now is, how does join pruning work? Here is an example:

CREATE TABLE x (id int, PRIMARY KEY (id));
CREATE TABLE y (id int, PRIMARY KEY (id));

First of all, two tables are created. Make sure that both sides of the join condition are actually unique. Those constraints will be important in a minute.

Now, we can write a simple query:

test=# EXPLAIN SELECT * 
FROM x LEFT JOIN y ON (x.id = y.id)
WHERE x.id = 3; QUERY PLAN --------------------------------------------------------------------------- Nested Loop Left Join (cost=0.31..16.36 rows=1 width=8) Join Filter: (x.id = y.id) -> Index Only Scan using x_pkey on x
(cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 3) -> Index Only Scan using y_pkey on y
(cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 3) (6 rows)

As you can see, PostgreSQL will join those tables directly. So far, there are no surprises. However, the following query has been slightly modified. Instead of selecting all the columns, it only selects those columns on the left-hand side of the join:

test=# EXPLAIN SELECT x.* 
FROM x LEFT JOIN y ON (x.id = y.id)
WHERE x.id = 3; QUERY PLAN --------------------------------------------------------------------- Index Only Scan using x_pkey on x (cost=0.15..8.17 rows=1 width=4) Index Cond: (id = 3) (2 rows)

PostgreSQL will go for a direct inside scan and skip the join completely. There are two reasons why this is actually possible and logically correct:

  • No columns are selected from the right-hand side of the join; thus, looking those columns up doesn't buy us anything.
  • The right-hand side is unique, which means that joining cannot increase the number of rows due to duplicates on the right-hand side.

If joins can be pruned automatically, then the queries may be a magnitude faster. The beauty here is that an increase in speed can be achieved by just removing columns that may not be needed by the application anyway.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset