Managing corrupted data pages

PostgreSQL is a very stable database system. It protects data as much as possible and has proven its worth over the years. However, PostgreSQL relies on hardware and a properly working filesystem. If storage breaks, so will PostgreSQL—there is not much we can do about it apart from adding replicas to make things more fail-safe.

Once in a while, it happens that the filesystem or the disk fails. But in many cases, the entire thing will not go south; just a couple of blocks are corrupted for whatever reason. Recently, we have seen that happening in virtual environments. Some virtual machines don't flush to the disk by default, which means that PostgreSQL cannot rely on things being written to the disk. This kind of behavior can lead to random problems which are hard to predict.

When a block cannot be read anymore, you might face an error message such as the following one:

"could not read block %u in file "%s": %m"

The query you are about to run will error out and stop working.

Fortunately, PostgreSQL has a means of dealing with these things:

test=# SET zero_damaged_pages TO on; 
SET
test=# SHOW zero_damaged_pages;
 zero_damaged_pages
-------------------- 
 on
(1 row)

The zero_damaged_pages variable is a config variable that allows us to deal with broken pages. Instead of throwing an error, PostgreSQL will take the block and simply fill it with zeros.

Note that this will definitely lead to data loss. But remember, data was broken or lost before anyway, so this is simply a way to deal with corruption caused by bad things happening in our storage system.

I would advise everybody to handle the zero_damaged_pages variable with care—be aware of what you are doing when you call it.

Table of Contents for Managing corrupted data pages

Create new playlist

Sign In

Sign Up

Table of Contents for
Managing corrupted data pages