Enabling and disabling optimizer settings

So far, the most important optimizations performed by the planner have been discussed in detail. PostgreSQL has improved a lot over the years. Still, it can happen that something goes south and users have to convince the planner to do the right thing.

To modify plans, PostgreSQL offers a couple of runtime variables that will have a significant impact on planning. The idea is to give the end user the chance to make certain types of nodes in the plan more expensive than others. What does that mean in practice? Here is a simple plan:

test=# explain SELECT * 
FROM generate_series(1, 100) AS a,
generate_series(1, 100) AS b
WHERE a = b;
QUERY PLAN
-----------------------------------------------------------------
Merge Join (cost=119.66..199.66 rows=5000 width=8)
Merge Cond: (a.a = b.b)
-> Sort (cost=59.83..62.33 rows=1000 width=4)
Sort Key: a.a
-> Function Scan on generate_series a
(cost=0.00..10.00 rows=1000 width=4)
-> Sort (cost=59.83..62.33 rows=1000 width=4)
Sort Key: b.b
-> Function Scan on generate_series b
(cost=0.00..10.00 rows=1000 width=4)
(8 rows)

The plan shows that PostgreSQL reads the data from the function and sorts both results. Then, a merge join is performed.

However, what if a merge join is not the fastest way to run the query? In PostgreSQL, there is no way to put planner hints into comments as you could do in Oracle. Instead, you can ensure that certain operations are simply considered to be expensive. The SET enable_mergejoin TO off command will simply make merging too expensive:

test=# SET enable_mergejoin TO off;
SET
test=# explain SELECT *
FROM generate_series(1, 100) AS a,
generate_series(1, 100) AS b
WHERE a = b;
QUERY PLAN
-----------------------------------------------------------------
Hash Join (cost=22.50..210.00 rows=5000 width=8)
Hash Cond: (a.a = b.b)
-> Function Scan on generate_series a
(cost=0.00..10.00 rows=1000 width=4)
-> Hash (cost=10.00..10.00 rows=1000 width=4)
-> Function Scan on generate_series b
(cost=0.00..10.00 rows=1000 width=4)
(5 rows)

Because merging is too expensive, PostgreSQL decided to try a hash join. As you can see, the costs are a bit higher, but the plan is still taken as merging is not desired anymore.

What happens if hash joins are turned off as well?

test=# SET enable_hashjoin TO off;
SET
test=# explain SELECT *
FROM generate_series(1, 100) AS a,
generate_series(1, 100) AS b
WHERE a = b;
QUERY PLAN
-----------------------------------------------------------------
Nested Loop (cost=0.01..22510.01 rows=5000 width=8)
Join Filter: (a.a = b.b)
-> Function Scan on generate_series a
(cost=0.00..10.00 rows=1000 width=4)
-> Function Scan on generate_series b
(cost=0.00..10.00 rows=1000 width=4)
(4 rows)

PostgreSQL will again try something else and come up with a nested loop. The costs of a nested loop are already staggering, but the planner starts to run out of options.

What happens if nested loops are turned off as well?

test=# SET enable_nestloop TO off;
SET
test=# explain SELECT *
FROM generate_series(1, 100) AS a,
generate_series(1, 100) AS b
WHERE a = b;
QUERY PLAN
-----------------------------------------------------------------
Nested Loop (cost=10000000000.00..10000022510.00
rows=5000 width=8)
Join Filter: (a.a = b.b)
-> Function Scan on generate_series a
(cost=0.00..10.00 rows=1000 width=4)
-> Function Scan on generate_series b
(cost=0.00..10.00 rows=1000 width=4)
(4 rows)

PostgreSQL will still perform a nested loop. The important part here is that off does not really mean off—it just means treat it as a very expensive thing. This is important because otherwise, the query could not be performed.

What settings influence the planner? The following switches are available:

  • enable_bitmapscan = on
  • enable_hashagg = on
  • enable_hashjoin = on
  • enable_indexscan = on
  • enable_indexonlyscan = on
  • enable_material = on
  • enable_mergejoin = on
  • enable_nestloop = on
  • enable_seqscan = on
  • enable_sort = on
  • enable_tidscan = on

While these settings can definitely be beneficial, let's understand that these tweaks should be handled with care. They should only be used to speed up individual queries and not turn off things globally. Switching off options can turn against you fairly quickly and destroy performance. Therefore, it really makes sense to think twice before changing these parameters.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset