Managing corrupted data pages

PostgreSQL is a very stable database system. It protects data as much as possible, and it has proven its worth over the years. However, PostgreSQL relies on solid hardware and a filesystem that is working properly. If storage breaks, so will PostgreSQL; there isn't much that we can do about it, apart from adding replicas to make things more fail-safe.

Once in a while, the filesystem or the disk fails. In many cases, not everything goes south at once; just a couple of blocks become corrupted for whatever reason. Recently, we have seen this happening in virtual environments: some virtual machines don't flush to disk by default, which means that PostgreSQL cannot rely on data actually being written to disk. This kind of behavior can lead to random problems that are hard to predict.

When a block cannot be read anymore, you might face an error message such as the following:

"could not read block %u in file "%s": %m"

The query that you are about to run will error out and stop working. Fortunately, PostgreSQL has a means of dealing with these things:

test=# SET zero_damaged_pages TO on; 
SET

test=# SHOW zero_damaged_pages;
 zero_damaged_pages
--------------------
 on
(1 row)

The zero_damaged_pages setting allows us to deal with broken pages. Instead of throwing an error and aborting the query, PostgreSQL will report a warning, zero out the damaged page in memory, and continue processing.

Note that this will definitely lead to data loss. But remember, the data in those pages was broken or lost before anyway, so this is simply a way to deal with corruption caused by a misbehaving storage system.

I would advise everybody to handle the zero_damaged_pages variable with care: be aware of what you are doing when you use it, and switch it off again as soon as the damaged data has been salvaged.
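As a practical illustration, here is a minimal salvage sketch. It assumes a hypothetical table called broken_table that contains the damaged blocks; the table name is only a placeholder, and a plain SET affects the current session only:

-- enable the setting for this session only
SET zero_damaged_pages TO on;

-- a full sequential scan reads every page of the table;
-- damaged pages are zeroed in memory instead of aborting the query
SELECT count(*) FROM broken_table;

-- turn the setting off again as soon as the data has been read
RESET zero_damaged_pages;

Once the scan has gone through, whatever rows survived can be secured, for example with pg_dump -t broken_table or CREATE TABLE ... AS SELECT, before the underlying hardware problem is addressed.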
