Making use of parallel queries

Starting with version 9.6, PostgreSQL supports parallel queries. Support for parallelism has been improved gradually over time, and version 11 added even more functionality to this important feature. In this section, we will take a look at how parallelism works and what can be done to speed things up.

Before digging into the details, it is necessary to create some sample data, as follows:

test=# CREATE TABLE t_parallel AS 
SELECT * FROM generate_series(1, 25000000) AS id;
SELECT 25000000

After loading the initial data, we can run our first parallel query. A simple count will show what a parallel query looks like in general:

test=# explain SELECT count(*) FROM t_parallel;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=258537.40..258537.41 rows=1 width=8)
   ->  Gather  (cost=258537.19..258537.40 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=257537.19..257537.20 rows=1 width=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..228153.75 rows=11753375 width=0)
(5 rows)

Let's take a detailed look at the execution plan of this query. First, PostgreSQL performs a parallel sequential scan. This implies that PostgreSQL will use more than one CPU core to process the table (block by block), and each worker will create a partial aggregate. The job of the gather node is to collect those partial results and pass them on for the final aggregation. The gather node marks the end of parallelism: it is important to mention that parallelism is (currently) never nested, so there can never be a gather node inside another gather node. In this example, PostgreSQL has decided on two worker processes. Why is that?

Let's consider the following variable:

test=# SHOW max_parallel_workers_per_gather;
max_parallel_workers_per_gather
---------------------------------
2
(1 row)

max_parallel_workers_per_gather limits the number of worker processes allowed below the gather node to two. The important thing is this: if a table is small, it will never use parallelism. The size of a table has to be at least 8 MB, as defined by the following config setting:

test=# SHOW min_parallel_table_scan_size;
min_parallel_table_scan_size
------------------------------
8MB
(1 row)

Now, the rule for parallelism is as follows: the size of the table has to triple in order for PostgreSQL to add one more worker process. In other words, to get four additional workers, you need at least 81 times as much data. This makes sense: if your database grows 100-fold, your storage system usually doesn't become 100 times faster, so the number of cores that can be put to good use is somewhat limited.
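To see what this rule means in practice, the worker thresholds it implies can be computed with a quick query (a sketch assuming the default min_parallel_table_scan_size of 8 MB; internally, the planner works on block counts, so treat these numbers as approximate):

test=# SELECT n AS workers_planned, (8 * 3 ^ (n - 1))::int AS min_table_size_mb
FROM generate_series(1, 6) AS n;
 workers_planned | min_table_size_mb
-----------------+-------------------
               1 |                 8
               2 |                24
               3 |                72
               4 |               216
               5 |               648
               6 |              1944
(6 rows)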

However, our table is fairly large:

test=# \d+
                       List of relations
 Schema |    Name    | Type  | Owner |  Size  | Description
--------+------------+-------+-------+--------+-------------
 public | t_parallel | table | hs    | 864 MB |
(1 row)
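If you prefer plain SQL over psql meta-commands, essentially the same information can be fetched with a function call (a small sketch using the built-in pg_relation_size function; the value can differ marginally from what \d+ reports, since \d+ also counts auxiliary forks):

test=# SELECT pg_size_pretty(pg_relation_size('t_parallel'));
 pg_size_pretty
----------------
 864 MB
(1 row)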

In this example, max_parallel_workers_per_gather limits the number of cores. If we change this setting, PostgreSQL will decide on more cores:

test=# SET max_parallel_workers_per_gather TO 10;
SET
test=# explain SELECT count(*) FROM t_parallel;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=174119.72..174119.73 rows=1 width=8)
   ->  Gather  (cost=174119.20..174119.71 rows=5 width=8)
         Workers Planned: 5
         ->  Partial Aggregate  (cost=173119.20..173119.21 rows=1 width=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..160619.36 rows=4999936 width=0)
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)

In this case, we get five workers (just as expected): at 864 MB, the table is above the 648 MB threshold for five workers, but below the 1,944 MB needed for six.
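Conversely, setting the limit to zero switches parallelism off entirely for the current session, which is a handy way to compare serial and parallel plans (a quick sketch; we set the value back to 10 afterwards so that the following examples behave as shown):

test=# SET max_parallel_workers_per_gather TO 0;
SET
test=# SET max_parallel_workers_per_gather TO 10;
SET

With the value at zero, EXPLAIN shows a plain Aggregate on top of a sequential scan; the Gather node and the partial aggregation disappear.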

However, there are cases in which you'll want the number of cores being used for a certain table to be a lot higher. Just imagine a 200 GB database, 1 TB of RAM, and only a single user. This user could use up all the CPU without harming anybody else. ALTER TABLE can be used to overrule what we have just discussed:

test=# ALTER TABLE t_parallel SET (parallel_workers = 9);
ALTER TABLE

If you want to overrule the times-three rule that determines the number of desired CPUs, you can use ALTER TABLE to hardwire the number of workers explicitly.

Note that max_parallel_workers_per_gather will still be effective and serve as the upper limit.
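To return to the built-in heuristic, the storage parameter can simply be removed again using standard ALTER TABLE syntax (note that the following examples assume the setting is still in place, so don't run this just yet):

test=# ALTER TABLE t_parallel RESET (parallel_workers);
ALTER TABLE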

If you look at the plan, you will see that the number of cores will actually be considered:

test=# explain SELECT count(*) FROM t_parallel;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=146342.71..146342.72 rows=1 width=8)
   ->  Gather  (cost=146341.77..146342.68 rows=9 width=8)
         Workers Planned: 9
         ->  Partial Aggregate  (cost=145341.77..145341.79 rows=1 width=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..138397.42 rows=2777742 width=0)
 JIT:
   Functions: 4
   Options: Inlining false, Optimization false, Expressions true, Deforming true
(8 rows)

Time: 2.454 ms

However, this doesn't mean that all of those workers are actually used at execution time:

test=# explain analyze SELECT count(*) FROM t_parallel;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=146342.71..146342.72 rows=1 width=8)
     (actual time=1164.445..1164.445 rows=1 loops=1)
   ->  Gather  (cost=146341.77..146342.68 rows=9 width=8)
         (actual time=1164.427..1164.533 rows=8 loops=1)
         Workers Planned: 9
         Workers Launched: 7
         ->  Partial Aggregate  (cost=145341.77..145341.79 rows=1 width=8)
               (actual time=1128.530..1128.530 rows=1 loops=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..138397.42 rows=2777742 width=0)
                     (actual time=0.218..1023.449 rows=3125000 loops=8)
 Planning Time: 0.028 ms
 JIT:
   Functions: 18
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.703 ms, Inlining 0.000 ms, Optimization 1.119 ms,
           Emission 14.707 ms, Total 17.529 ms
 Execution Time: 1164.922 ms
(12 rows)

As you can see, only seven worker processes were launched, despite the fact that nine were planned. Note that the leader process also takes part in executing the query, which is why the scan shows loops=8: seven workers plus the leader. But why were fewer workers launched than planned? In this example, two more variables come into play:

test=# SHOW max_worker_processes;
max_worker_processes
----------------------
8
(1 row)

test=# SHOW max_parallel_workers;
max_parallel_workers
----------------------
8
(1 row)

The first setting, max_worker_processes, tells PostgreSQL how many background worker processes are generally available. max_parallel_workers states how many of those workers are available to parallel queries. Why are there two parameters? Background workers aren't used only by the parallel query infrastructure; they can also be used for other purposes, which is why the developers decided on two separate parameters.
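If you want to see the workers in action, you can inspect pg_stat_activity from a second session while a parallel query is running (a small sketch; the backend_type column is available as of PostgreSQL 10):

test=# SELECT pid, backend_type, query
FROM pg_stat_activity
WHERE backend_type = 'parallel worker';

Each launched worker shows up as one row for the duration of the query.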

In general, we at Cybertec (https://www.cybertec-postgresql.com) tend to set max_worker_processes to the number of CPUs in the server. It seems that using more is usually not beneficial.
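To translate this advice into a configuration, the relevant settings could look as follows in postgresql.conf (a sketch assuming a machine with 16 CPU cores; adjust the numbers to your hardware and workload):

# postgresql.conf
max_worker_processes = 16            # pool of background workers; changing this requires a restart
max_parallel_workers = 16            # share of that pool available to parallel queries
max_parallel_workers_per_gather = 8  # per-query limit below a single gather node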
