Making use of parallel queries

Starting with version 9.6, PostgreSQL supports parallel queries. Support for parallelism has been improved gradually over time, and version 11 has added even more functionality to this important feature. In this section, we will take a look at how parallelism works and what can be done to speed things up.

Before digging into the details, it is necessary to create some sample data, as shown in the following listing:

test=# CREATE TABLE t_parallel AS 
SELECT * FROM generate_series(1, 25000000) AS id;
SELECT 25000000

After loading the initial data, we can run our first parallel query. A simple count shows what a parallel query looks like in general:

test=# explain SELECT count(*) FROM t_parallel;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=258537.40..258537.41 rows=1 width=8)
   ->  Gather  (cost=258537.19..258537.40 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=257537.19..257537.20 rows=1 width=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..228153.75 rows=11753375 width=0)
(5 rows)

Let us take a detailed look at the execution plan of the query. First, PostgreSQL performs a parallel sequential scan. This implies that PostgreSQL will use more than one CPU to process the table (block by block), and that it will create partial aggregates. The job of the gather node is to collect the partial results and pass them on for final aggregation. The gather node is therefore the end of parallelism. It is important to mention that parallelism is (currently) never nested: there can never be a gather node inside a gather node. In my example, PostgreSQL has decided on two worker processes. Why is that?

Let us consider the following variable:

test=# SHOW max_parallel_workers_per_gather;
max_parallel_workers_per_gather
---------------------------------
2
(1 row)

max_parallel_workers_per_gather limits the number of worker processes allowed below a gather node to two. The important thing is that PostgreSQL will never use parallelism if a table is too small: the table has to be at least 8 MB in size, as defined by the following config setting:

test=# SHOW min_parallel_table_scan_size;
min_parallel_table_scan_size
------------------------------
8MB
(1 row)
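
To see this in action, consider a minimal sketch (the table name t_small is just an illustrative example, and the cost figures are elided): a table well below the 8 MB threshold is scanned without any gather node:

test=# CREATE TABLE t_small AS            -- t_small: illustrative example
SELECT * FROM generate_series(1, 1000) AS id;
SELECT 1000
test=# explain SELECT count(*) FROM t_small;
                  QUERY PLAN
-----------------------------------------------
 Aggregate  (cost=... rows=1 width=8)
   ->  Seq Scan on t_small  (cost=...)
(2 rows)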

The rule for parallelism is that the size of the table has to triple for PostgreSQL to add one more worker process. In other words, to get four additional workers, you need at least 81 times as much data. This makes sense because if the size of your database goes up 100 times, the storage system is usually not 100 times faster, so the number of useful cores is somewhat limited.
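
Given the default min_parallel_table_scan_size of 8 MB, the x3 rule yields the following thresholds:

   8 MB -> 1 worker
  24 MB -> 2 workers
  72 MB -> 3 workers
 216 MB -> 4 workers
 648 MB -> 5 workers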

However, our table is fairly large:

test=# \d+
                    List of relations
 Schema |    Name    | Type  | Owner |  Size  | Description
--------+------------+-------+-------+--------+-------------
 public | t_parallel | table | hs    | 864 MB |
(1 row)

At 864 MB, the x3 rule would allow for five workers, but in my example, max_parallel_workers_per_gather limits the number of cores. If we change the setting, PostgreSQL will decide on more cores:

test=# SET max_parallel_workers_per_gather TO 10;
SET
test=# explain SELECT count(*) FROM t_parallel;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=174120.82..174120.83 rows=1 width=8)
   ->  Gather  (cost=174120.30..174120.81 rows=5 width=8)
         Workers Planned: 5
         ->  Partial Aggregate  (cost=173120.30..173120.31 rows=1 width=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..160620.24 rows=5000024 width=0)
(5 rows)

In this case, we get five workers (just as expected).

However, there are cases in which you want the number of cores used for a certain table to be a lot higher. Just imagine a 200 GB database, 1 TB of RAM, and only a single user. This user could use up all the CPUs without harming anybody else. ALTER TABLE can be used to overrule what we have just discussed:

test=# ALTER TABLE t_parallel SET (parallel_workers = 9);
ALTER TABLE

If you want to overrule the x3 rule that determines the number of desired CPUs, you can use ALTER TABLE to hardwire the number of workers explicitly. Note that max_parallel_workers_per_gather will still be effective and serve as the upper limit.
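
To return to the default behavior later on, the storage parameter can simply be reset:

test=# ALTER TABLE t_parallel RESET (parallel_workers);
ALTER TABLE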

If you look at the plan, you will see that the number of cores will actually be taken into consideration:

test=# explain SELECT count(*) FROM t_parallel;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=146343.32..146343.33 rows=1 width=8)
   ->  Gather  (cost=146342.39..146343.30 rows=9 width=8)
         Workers Planned: 9
         ->  Partial Aggregate  (cost=145342.39..145342.40 rows=1 width=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..138397.91 rows=2777791 width=0)
(5 rows)

However, that does not mean that those cores are actually used as well:

test=# explain analyze SELECT count(*) FROM t_parallel;
                                       QUERY PLAN
-------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=146343.32..146343.33 rows=1 width=8)
     (actual time=1209.441..1209.441 rows=1 loops=1)
   ->  Gather  (cost=146342.39..146343.30 rows=9 width=8)
         (actual time=1209.432..1209.772 rows=8 loops=1)
         Workers Planned: 9
         Workers Launched: 7
         ->  Partial Aggregate  (cost=145342.39..145342.40 rows=1 width=8)
               (actual time=1200.437..1200.438 rows=1 loops=8)
               ->  Parallel Seq Scan on t_parallel  (cost=0.00..138397.91 ...)
                     (actual time=0.038..817.430 rows=3125000 loops=8)
 Planning Time: 0.091 ms
 Execution Time: 1209.827 ms
(8 rows)

As you can see, only seven workers were launched, despite the fact that nine processes were planned. By default, the leader process also takes part in executing the query, which is why the plan shows loops=8 (seven workers plus the leader). But why were fewer workers launched than planned? In this example, two more variables come into play:

test=# SHOW max_worker_processes;
max_worker_processes
----------------------
8
(1 row)

test=# SHOW max_parallel_workers;
max_parallel_workers
----------------------
8
(1 row)

The first setting, max_worker_processes, tells PostgreSQL how many worker processes are generally available. max_parallel_workers says how many of those workers are available for parallel queries. Why are there two parameters? Background processes are not only used by the parallel query infrastructure; they can also be used for other purposes, and therefore the developers have decided to use two parameters.
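
To see the cap in action, here is a minimal sketch (the plan output is shortened to the relevant lines; costs and timings are elided): lowering max_parallel_workers in the current session reduces the number of workers that can actually be launched, even though the planner still plans nine:

test=# SET max_parallel_workers TO 2;
SET
test=# explain analyze SELECT count(*) FROM t_parallel;
...
         Workers Planned: 9
         Workers Launched: 2
...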

In general, we at Cybertec (https://www.cybertec-postgresql.com) tend to set max_worker_processes to the number of CPUs in the server. It seems that using more is usually not beneficial.
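
As an illustrative starting point (not an authoritative recommendation) for a server with eight CPU cores, the relevant postgresql.conf entries could look as follows; the per-gather value of 4 is just an example and should be tuned to your workload:

# postgresql.conf -- illustrative values for a server with 8 CPU cores
max_worker_processes = 8             # changing this requires a restart
max_parallel_workers = 8             # pool available to parallel queries
max_parallel_workers_per_gather = 4  # example per-query limit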
