After inspecting active queries and checking for I/O and locking problems, it makes sense to find out which queries are actually causing most of the trouble. Without knowing the most time-consuming queries, it is hard to improve things in the long run. Fortunately, PostgreSQL provides a module called pg_stat_statements that is capable of tracking queries and providing administrators with statistical information about them.
To use this module, it has to be enabled:
test=# CREATE EXTENSION pg_stat_statements;
CREATE EXTENSION
Note that the extension has to be created inside a database; the system views provided by the module will only be available in that database for now.
However, we are not done yet:
test=# SELECT * FROM pg_stat_statements;
ERROR:  pg_stat_statements must be loaded via shared_preload_libraries
The module itself is loaded when the postmaster starts up. It cannot be loaded on the fly, because the information it collects has to survive individual disconnections and is therefore kept in shared memory, which is allocated at server startup.
To load the module at startup, change shared_preload_libraries in postgresql.conf as follows:
shared_preload_libraries = 'pg_stat_statements'
Then the database has to be restarted. Now the system will automatically collect all of the information and present it in a system view:
test=# \d pg_stat_statements
          View "public.pg_stat_statements"
       Column        |       Type       | Modifiers
---------------------+------------------+-----------
 userid              | oid              |
 dbid                | oid              |
 queryid             | bigint           |
 query               | text             |
 calls               | bigint           |
 total_time          | double precision |
 rows                | bigint           |
 shared_blks_hit     | bigint           |
 shared_blks_read    | bigint           |
 shared_blks_dirtied | bigint          |
 shared_blks_written | bigint           |
 local_blks_hit      | bigint           |
 local_blks_read     | bigint           |
 local_blks_dirtied  | bigint           |
 local_blks_written  | bigint           |
 temp_blks_read      | bigint           |
 temp_blks_written   | bigint           |
 blk_read_time       | double precision |
 blk_write_time      | double precision |
For each user (userid) and each database (dbid), we will see the query and the number of times it was called (calls). In addition to that, there is the total time used up by each query (total_time), as well as the total number of rows returned (rows). In many cases, this information is enough:
SELECT query, total_time, calls
FROM pg_stat_statements
ORDER BY 2 DESC;
This simple query will reveal the most expensive queries and their total runtime, but there is more. The next couple of columns tell us about the I/O behavior of each query. It is possible to detect issues related to shared buffers, as well as to local buffers allocated by the backend itself.
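The shared-buffer columns can be combined into a simple cache hit ratio per query. The following query is merely an illustration of how those columns can be used, not part of the module itself:

```sql
-- percentage of shared-buffer reads served from cache, per query
SELECT query,
       shared_blks_hit,
       shared_blks_read,
       round(100.0 * shared_blks_hit /
             nullif(shared_blks_hit + shared_blks_read, 0), 2) AS hit_percent
FROM   pg_stat_statements
ORDER  BY shared_blks_read DESC
LIMIT  10;
```

A low hit_percent combined with a high shared_blks_read count points to queries that keep going to disk and might benefit from more memory or better indexing.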
Then there are temp_blks_read and temp_blks_written. These two fields tell us whether temporary data has to be written to disk or not.
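To spot queries that spill to disk, the temporary-block columns can be filtered directly. Again, this query is only a sketch of how to use the view:

```sql
-- queries that had to write temporary data to disk
SELECT query, temp_blks_read, temp_blks_written
FROM   pg_stat_statements
WHERE  temp_blks_written > 0
ORDER  BY temp_blks_written DESC;
```

Queries showing up here are candidates for a higher work_mem setting or a rewrite that avoids large sorts and hashes.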
Finally, there is information about I/O timing. Note that track_io_timing has to be turned on to gather this information. It then becomes possible to check, for each query, how much time was spent on input as well as on output.
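Assuming track_io_timing is enabled, a query along the following lines (an illustration, not a fixed recipe) will show which statements spend the most time waiting for the disk:

```sql
-- queries ranked by time spent on block-level I/O (in milliseconds)
SELECT query,
       blk_read_time,
       blk_write_time
FROM   pg_stat_statements
ORDER  BY blk_read_time + blk_write_time DESC
LIMIT  10;
```

Comparing these figures against total_time reveals whether a query is dominated by I/O or by CPU work.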
Many people ask about the overhead of this module. It has turned out to be hard to measure precisely because it is so low; in typical workloads it is close to zero, so there is no need to fear serious performance degradation when the module is in use.
In general, it is useful to have pg_stat_statements enabled at all times to track slow queries. It gives you valuable insight into what is happening on your server.
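Since the statistics are cumulative, it can be handy to clear them before starting a fresh measurement. The module ships with a function for exactly that purpose:

```sql
-- discard all statistics gathered so far
SELECT pg_stat_statements_reset();
```

After a reset, the view starts filling up again from scratch, which makes it easy to attribute numbers to a specific time window.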