Inspecting databases

Once you have inspected active database connections, you can dig deeper and inspect database-level statistics. pg_stat_database will return one line per database inside your PostgreSQL instance.

This is what you can find there:

test=# \d pg_stat_database
                     View "pg_catalog.pg_stat_database"
     Column     |           Type           | Collation | Nullable | Default
----------------+--------------------------+-----------+----------+---------
 datid          | oid                      |           |          |
 datname        | name                     |           |          |
 numbackends    | integer                  |           |          |
 xact_commit    | bigint                   |           |          |
 xact_rollback  | bigint                   |           |          |
 blks_read      | bigint                   |           |          |
 blks_hit       | bigint                   |           |          |
 tup_returned   | bigint                   |           |          |
 tup_fetched    | bigint                   |           |          |
 tup_inserted   | bigint                   |           |          |
 tup_updated    | bigint                   |           |          |
 tup_deleted    | bigint                   |           |          |
 conflicts      | bigint                   |           |          |
 temp_files     | bigint                   |           |          |
 temp_bytes     | bigint                   |           |          |
 deadlocks      | bigint                   |           |          |
 blk_read_time  | double precision         |           |          |
 blk_write_time | double precision         |           |          |
 stats_reset    | timestamp with time zone |           |          |

Next to the database ID and the database name, there is a column called numbackends that shows the number of database connections that are currently open.
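If you want to see how those connections are distributed across your databases, a simple query on pg_stat_database will do. The following statement is only a sketch; adapt it to your needs:

SELECT datname, numbackends
FROM pg_stat_database
ORDER BY numbackends DESC;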

Then, there are xact_commit and xact_rollback. These two columns indicate whether your application tends to commit or roll back. blks_hit and blks_read will tell you about cache hits and cache misses. When inspecting these two columns, keep in mind that we are mostly talking about shared buffer hits and shared buffer misses; on the database level, there is no reasonable way to distinguish between filesystem cache hits and real disk hits. At Cybertec (https://www.cybertec-postgresql.com), we like to check whether disk waits and cache misses show up at the same time in pg_stat_database to get an idea of what really goes on in the system.
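To turn blks_hit and blks_read into a readable cache hit ratio, a query such as the following can be used. This is only a sketch, and remember that the ratio says nothing about the filesystem cache, as discussed previously:

SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit * 100.0 / nullif(blks_hit + blks_read, 0), 2) AS hit_ratio_pct
FROM pg_stat_database
ORDER BY blks_read DESC;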

The tup_ columns will tell you whether there is a lot of reading or a lot of writing going on in your system.
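To get a rough feeling for the read/write mix, you can sum up the tup_ columns per database. Again, this is just one possible way to do it:

SELECT datname,
       tup_returned + tup_fetched AS rows_read,
       tup_inserted + tup_updated + tup_deleted AS rows_written
FROM pg_stat_database;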

Then, we have temp_files and temp_bytes. These two columns are of incredible importance because they will tell you whether your database has to write temporary files to disk, which will inevitably slow down operations. What can be the reasons for high temporary file usage? The major reasons are as follows:

  • Poor settings: If your work_mem settings are too low, there is no way to do anything in RAM, and therefore PostgreSQL will go to disk.
  • Stupid operations: It happens quite frequently that people torture their system with fairly expensive, pointless queries. If you see many temporary files on an OLTP system, consider checking for expensive queries.
  • Indexing and other administrative tasks: Once in a while, indexes might be created or people might run DDLs. These operations can lead to temporary file I/O but are not necessarily considered a problem (in many cases).

In short, temporary files can occur, even if your system is perfectly fine. However, it definitely makes sense to keep an eye on them and ensure that temporary files are not needed frequently.
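To see which databases are producing temporary files, the following sketch of a query sorts them by the amount of temporary data written:

SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_written
FROM pg_stat_database
ORDER BY temp_bytes DESC;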

Finally, there are two more important fields: blk_read_time and blk_write_time. By default, these two fields are empty and no data is collected. The idea behind these fields is to give you a way of seeing how much time was spent on I/O. The reason these fields are empty is that track_io_timing is off by default. This is for good reason. Imagine that you want to check how long it takes to read 1 million blocks. To do that, you have to call the time function in your C library twice, which leads to 2 million additional function calls just to read 8 GB of data. It really depends on the speed of your system as to whether this will lead to a lot of overhead or not.

Fortunately, there is a tool that helps you determine how expensive the timing is:

[hs@zenbook ~]$ pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 23.16 nsec
Histogram of timing durations:
  < usec   % of total      count
       1     97.70300  126549189
       2      2.29506    2972668
       4      0.00024        317
       8      0.00008        101
      16      0.00160       2072
      32      0.00000          5
      64      0.00000          6
     128      0.00000          4
     256      0.00000          0
     512      0.00000          0
    1024      0.00000          4
    2048      0.00000          2

In my case, the overhead of turning track_io_timing on for a session or in the postgresql.conf file is around 23 nanoseconds, which is fine. Professional high-end servers can provide you with numbers as low as 14 nanoseconds, while really bad virtualization can return values of up to 1,400 or even 1,900 nanoseconds. If you are using a cloud service, you can expect around 100-120 nanoseconds in most cases. If you are confronted with four-digit values, measuring the I/O timing may well cause real, measurable overhead, which will slow down your system. The general rule is this: on real hardware, timing is not an issue; on virtual systems, check it out before you turn it on.

It is also possible to turn things on selectively by using ALTER DATABASE, ALTER USER, or the like.
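As an example, the following statements enable track_io_timing for a single database and for a single user, and then read the timing columns from pg_stat_database. The database name test and the user name hs are just placeholders taken from the examples shown earlier; replace them with your own names:

ALTER DATABASE test SET track_io_timing = on;
ALTER USER hs SET track_io_timing = on;

-- new connections will pick up the setting; the times are reported in milliseconds
SELECT datname, blk_read_time, blk_write_time
FROM pg_stat_database;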