Merge joins

Finally, there is the merge join. The idea is to join two inputs that are both sorted on the join key. If both sides of the join are sorted, the system can simply read rows from the top of each list, compare them, return the ones that match, and move on in whichever list holds the smaller key. The main requirement is that both lists are sorted. Here is a sample plan:

Merge join
    Sort table 1
        Sequential scan table 1
    Sort table 2
        Sequential scan table 2
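The merge step at the top of such a plan can be sketched in a few lines of Python. This is only an illustration of the general technique, not PostgreSQL's actual executor code; the function name merge_join, the tuple-based rows, and the sample data are all assumptions made for this example.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Join two lists of rows that are ALREADY sorted by the join key.

    Hypothetical helper for illustration only; rows are plain tuples
    and the join key defaults to the first column.
    """
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1          # left row has no partner, advance left
        elif left_key > right_key:
            j += 1          # right row has no partner, advance right
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            # Pair every left row with this key with every row in the run.
            while i < len(left) and key(left[i]) == left_key:
                for right_row in right[j:j_end]:
                    result.append((left[i], right_row))
                i += 1
            j = j_end
    return result


# The "Sort" nodes of the plan correspond to sorting both inputs first.
table_1 = sorted([(3, "c"), (1, "a"), (2, "b"), (2, "bb")])
table_2 = sorted([(2, "x"), (4, "z"), (2, "y"), (3, "w")])
joined = merge_join(table_1, table_2)
```

Each input is walked from top to bottom once; only runs of duplicate keys on the right side are revisited. That is why the merge itself is cheap and the real cost lies in producing the sorted input.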

To join these two tables (table 1 and table 2), the data has to be provided in sorted order. In many cases, PostgreSQL will simply sort the data. However, there are other ways to provide the join with sorted data. One way is to consult an index, as shown in the following example:

Merge join
    Index scan table 1
    Index scan table 2

One side of the join, or both sides, can use sorted data coming from lower levels of the plan. If the table is accessed directly, an index is the obvious choice, but only if the returned result set is significantly smaller than the entire table. Otherwise, the overhead almost doubles, because the entire index has to be read and then the entire table. If the result set is a large portion of the table, a sequential scan is more efficient, especially if the data is being accessed in primary key order.

The beauty of a merge join is that it can handle a lot of data. The downside is that data has to be sorted or taken from an index at some point.

Sorting is O(n * log(n)). Therefore, sorting 300 million rows to perform the join is not attractive either.
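To get a feeling for that number, the following sketch evaluates n * log2(n) for 300 million rows. It is a back-of-the-envelope estimate only: the base-2 logarithm and the helper name sort_comparisons are assumptions for this example, and constant factors, memory limits, and disk I/O are ignored.

```python
import math


def sort_comparisons(n):
    """Rough n * log2(n) comparison count for sorting n rows.

    Hypothetical helper: ignores constant factors, memory and disk I/O.
    """
    return n * math.log2(n)


rows = 300_000_000
estimate = sort_comparisons(rows)   # total comparisons, order of magnitude
per_row = estimate / rows           # the log2(n) factor paid per row

print(f"approx. comparisons: {estimate:.2e}")
print(f"comparisons per row: {per_row:.1f}")
```

Under these assumptions, the sort needs several billion comparisons, or roughly 28 per row, before the join can return its first row.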

Note that, as of PostgreSQL 10.0, all the join options described here are also available in a parallel version. The optimizer therefore not only considers the standard join options, but also evaluates whether it makes sense to run the query in parallel.
