Inspecting databases

Once you have inspected the active database connections, you can dig deeper and inspect database-level statistics. pg_stat_database will return one line per database inside your PostgreSQL instance.

This is what you will find there:

test=# \d pg_stat_database
                          View "pg_catalog.pg_stat_database"
        Column         |           Type           | Collation | Nullable | Default
-----------------------+--------------------------+-----------+----------+---------
 datid                 | oid                      |           |          |
 datname               | name                     |           |          |
 numbackends           | integer                  |           |          |
 xact_commit           | bigint                   |           |          |
 xact_rollback         | bigint                   |           |          |
 blks_read             | bigint                   |           |          |
 blks_hit              | bigint                   |           |          |
 tup_returned          | bigint                   |           |          |
 tup_fetched           | bigint                   |           |          |
 tup_inserted          | bigint                   |           |          |
 tup_updated           | bigint                   |           |          |
 tup_deleted           | bigint                   |           |          |
 conflicts             | bigint                   |           |          |
 temp_files            | bigint                   |           |          |
 temp_bytes            | bigint                   |           |          |
 deadlocks             | bigint                   |           |          |
 checksum_failures     | bigint                   |           |          |
 checksum_last_failure | timestamp with time zone |           |          |
 blk_read_time         | double precision         |           |          |
 blk_write_time        | double precision         |           |          |
 stats_reset           | timestamp with time zone |           |          |

Next to the database ID and the database name is a column called numbackends, which shows the number of database connections that are currently open.
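If you want to see at a glance which databases are handling the most connections, you can query the view directly. The following is a minimal sketch; the WHERE clause skips the row for shared objects that newer PostgreSQL versions report with a NULL datname:

SELECT datname, numbackends
FROM   pg_stat_database
WHERE  datname IS NOT NULL   -- skip the shared-objects row on newer versions
ORDER  BY numbackends DESC;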

Then, there's xact_commit and xact_rollback. These two columns indicate whether your application tends to commit or roll back. blks_hit and blks_read will tell you about cache hits and cache misses. When inspecting these two columns, keep in mind that we are mostly talking about shared buffer hits and shared buffer misses; at the database level, there is no reasonable way to distinguish between filesystem cache hits and real disk hits. At Cybertec (https://www.cybertec-postgresql.com), we like to check whether disk waits and cache misses show up in pg_stat_database at the same time to get an idea of what is really going on in the system.
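To condense these counters into something more readable, a query along the following lines can be used. This is just a sketch; nullif protects against division by zero on databases that have not seen any block access yet:

SELECT datname,
       xact_commit,
       xact_rollback,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2)
           AS cache_hit_pct  -- NULL until the database has read any blocks
FROM   pg_stat_database;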

The tup_ columns will tell you whether there is a lot of reading or a lot of writing going on in your system.
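A rough way to quantify that read/write mix is to simply sum up the counters, as in the following sketch:

SELECT datname,
       tup_returned + tup_fetched               AS rows_read,    -- read activity
       tup_inserted + tup_updated + tup_deleted AS rows_written  -- write activity
FROM   pg_stat_database;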

Then, we have temp_files and temp_bytes. These two columns are of incredible importance because they will tell you whether your database has to write temporary files to disk, which will inevitably slow down operations. What can be the reasons for high temporary file usage? The major reasons are as follows:

  • Poor settings: If your work_mem setting is too low, sorts, hashes, and similar operations cannot be done in RAM, and therefore PostgreSQL will go to disk.
  • Stupid operations: It happens quite frequently that people torture their system with fairly expensive and pointless queries. If you see many temporary files on an OLTP system, consider checking for expensive queries.
  • Indexing and other administrative tasks: Once in a while, indexes may be created or people may run DDLs. These operations can lead to temporary file I/O but are not necessarily a problem.

In short, temporary files can occur, even if your system is perfectly fine. However, it definitely makes sense to keep an eye on them and ensure that temporary files are not needed frequently.
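A simple way to keep that eye on them is a query such as the following sketch; pg_size_pretty merely turns the raw byte count into a human-readable figure:

SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_bytes_written
FROM   pg_stat_database
ORDER  BY temp_bytes DESC;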

Finally, there are two more important fields: blk_read_time and blk_write_time. By default, these two fields are empty and no data is collected, because track_io_timing is off by default. The idea behind these fields is to give you a way of seeing how much time was spent on I/O. Timing is off for a good reason: imagine that you want to check how long it takes to read 1 million blocks. To do that, PostgreSQL has to call the time function in your C library twice per block, which leads to 2 million additional function calls just to read 8 GB of data (1 million blocks of 8 KB each). Whether this leads to a lot of overhead depends on the speed of your system.

Fortunately, there is a tool that helps you determine how expensive the timing is, as shown in the following code block:

[hs@zenbook ~]$ pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 23.16 nsec
Histogram of timing durations:
  < usec   % of total      count
       1     97.70300  126549189
       2      2.29506    2972668
       4      0.00024        317
       8      0.00008        101
      16      0.00160       2072
      32      0.00000          5
      64      0.00000          6
     128      0.00000          4
     256      0.00000          0
     512      0.00000          0
    1024      0.00000          4
    2048      0.00000          2

In my case, the overhead of turning track_io_timing on in the postgresql.conf file is around 23 nanoseconds per call, which is fine. Professional high-end servers can provide you with numbers as low as 14 nanoseconds, while really bad virtualization can return values of up to 1,400 or even 1,900 nanoseconds. If you are using a cloud service, you can expect around 100-120 nanoseconds in most cases. If you are ever confronted with four-digit values, measuring the I/O timing may lead to real, measurable overhead that slows down your system. The general rule is this: on real hardware, timing is not an issue; on virtual systems, check it out before you turn it on.

It is also possible to turn things on selectively by using ALTER DATABASE, ALTER USER, and so on.
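As a sketch, assuming a database called test and a user called hs (both placeholder names), this might look as follows:

-- requires superuser privileges
ALTER DATABASE test SET track_io_timing = on;
ALTER ROLE hs SET track_io_timing = on;

Once timing has been enabled and some I/O has happened, blk_read_time and blk_write_time will start to accumulate the time spent reading and writing blocks, measured in milliseconds.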