Inspecting buffer usage

However, the plan itself is not the only thing that can cause issues. In many cases, the real problem hides at a different level: memory and caching can lead to erratic runtimes, which are often hard to understand for end users who are not trained to spot the problem described in this section.

Here is an example that simulates randomly inserted data. The query generates 10 million rows in random order and stores them in a new table:

test=# CREATE TABLE t_random AS 
SELECT * FROM generate_series(1, 10000000) AS id ORDER BY random();
SELECT 10000000
test=# ANALYZE t_random;
ANALYZE

Now, we have generated a simple table containing 10 million rows and created the optimizer statistics.
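
To double-check that the rows really ended up on disk in random order, you can inspect the physical correlation that ANALYZE has just gathered. The pg_stats system view exposes it per column; a value close to 0 indicates an essentially random on-disk order, while a value close to 1 would indicate fully sorted data:

test=# SELECT tablename, attname, correlation 
FROM pg_stats 
WHERE tablename = 't_random';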

In the next step, a simple query retrieving only a handful of rows is executed:

test=# EXPLAIN (analyze true, buffers true, costs true, timing true)   
SELECT *
FROM t_random WHERE id < 1000;
QUERY PLAN
-----------------------------------------------------------------
Seq Scan on t_random (cost=0.00..169248.60 rows=1000 width=4)
(actual time=1.068..685.410 rows=999 loops=1)
Filter: (id < 1000)
Rows Removed by Filter: 9999001
Buffers: shared hit=2112 read=42136
Planning time: 0.035 ms
Execution time: 685.551 ms
(6 rows)

Before inspecting the data, make sure that you have executed the query twice so that the caches have had a chance to warm up. Of course, it would make sense to use an index here, but the point of this example is the buffer usage. In this run, PostgreSQL found 2112 buffers inside its shared buffer cache (shared hit) and had to ask the operating system for 42136 buffers (read). Now, two things can happen: if you are lucky, the operating system serves those reads from the filesystem cache and the query is fast. If the filesystem cache cannot help, those blocks have to be fetched from disk. This may seem obvious; it can, however, lead to wild swings in the execution time. A query that runs entirely in the cache can be 100 times faster than a query that has to slowly collect random blocks from disk.
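
If you want cumulative numbers instead of per-query output, the pg_statio_user_tables system view tracks the same kind of information over time. Here is a small sketch that computes the cache hit ratio for our test table:

test=# SELECT relname, heap_blks_read, heap_blks_hit, 
    round(heap_blks_hit::numeric / 
        nullif(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio 
FROM pg_statio_user_tables 
WHERE relname = 't_random';

Keep in mind that these counters only know about the PostgreSQL shared buffer cache; a block counted as heap_blks_read may still have been a filesystem cache hit on the operating system side.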

Let's outline this problem using a simple example. Suppose we have a phone system that stores 10 billion rows (not uncommon for a large phone carrier). Call records flow in at a rapid rate, and users want to query this data. With 10 billion rows, the data will only partially fit into memory, and therefore many blocks will naturally have to come from disk.

Let's run a simple query to learn how PostgreSQL looks up a phone number:

SELECT * FROM data WHERE phone_number = '+12345678';

Even while you are on the phone, your data will be spread all over the table. If you end one call and start the next, thousands of other people will do the same in between, so the odds that two of your calls end up in the very same 8 KB block are close to zero. Just imagine that 100,000 calls are going on at the same time: on disk, your rows will be randomly distributed. If your phone number shows up often, this means that for each row, at least one block has to be fetched from disk (assuming a very low cache hit rate). Say 5,000 rows are returned: going to disk 5,000 times at roughly 5 milliseconds per random read adds up to 5,000 x 5 milliseconds = 25 seconds of execution time. Note that the execution time of this query may therefore vary between a few milliseconds and, say, 30 seconds, depending on how much of the data has been cached by the operating system or by PostgreSQL.
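
To find out how much of the execution time is really spent waiting for the disk, you can turn on track_io_timing; EXPLAIN will then report I/O timings alongside the buffer counts. A quick sketch, using the hypothetical data table from above:

test=# SET track_io_timing = on;
SET
test=# EXPLAIN (analyze true, buffers true) 
SELECT * FROM data WHERE phone_number = '+12345678';

With the setting enabled, the plan contains an additional I/O Timings line that tells you directly how many milliseconds were spent fetching blocks rather than processing rows.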

Keep in mind that every server restart will naturally clean out the PostgreSQL and filesystem caches, which can lead to real trouble after a node failure.
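
If cold caches after a restart are a real concern, the pg_prewarm extension, which ships with PostgreSQL as a contrib module, can explicitly load a relation back into the cache. A minimal sketch using our t_random table:

test=# CREATE EXTENSION pg_prewarm;
CREATE EXTENSION
test=# SELECT pg_prewarm('t_random');

After the call, the table's blocks are back in shared buffers (as far as shared_buffers is big enough to hold them), so the first queries after a restart behave like warmed-up ones.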
