Configuring VACUUM and autovacuum

Back in the early days of PostgreSQL projects, people had to run VACUUM manually. Fortunately, those days are long gone. Nowadays, administrators can rely on a tool called autovacuum, which is part of the PostgreSQL server infrastructure. It automatically takes care of cleanup and works in the background. It wakes up once per minute (see autovacuum_naptime = 1 in postgresql.conf) and checks whether there is work to do. If there is work, autovacuum will fork up to three worker processes (see autovacuum_max_workers in postgresql.conf).

The main question is, when does autovacuum trigger the creation of a worker process?

Actually, the autovacuum process doesn't fork processes itself. Instead, it tells the main process to do so. This is done to avoid zombie processes in the case of failure and to improve robustness.

The answer to this question can, again, be found in postgresql.conf, as shown in the following code:

autovacuum_vacuum_threshold = 50  
autovacuum_analyze_threshold = 50  
autovacuum_vacuum_scale_factor = 0.2  
autovacuum_analyze_scale_factor = 0.1 

The autovacuum_vacuum_scale_factor command tells PostgreSQL that a table is worth vacuuming if 20% of its data has been changed. The trouble is that if a table consists of one row, one change is already 100%. It makes absolutely no sense to fork a complete process to clean up just one row. Therefore, autovacuum_vacuuum_threshold says that we need 20%, and this 20% must be at least 50 rows. Otherwise, VACUUM won't kick in. The same mechanism is used when it comes to the creation of optimizer statistics. We need 10% and at least 50 rows to justify new optimizer statistics. Ideally, autovacuum creates new statistics during a normal VACUUM to avoid unnecessary trips to the table.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset