Zeroing out damaged pages

Once in a while, things do go wrong even if all precautions have been taken. A filesystem might lose some blocks here and there or a disk might simply lose a couple of sectors. The following might happen on the PostgreSQL side in this case:

test=# SELECT count(*) FROM t_test;
ERROR:  invalid page in block 535 of relation base/16384/16436

If a block (or a couple of blocks) has fallen victim to a problem in the filesystem, PostgreSQL will error out and tell the end user that the query cannot be completed anymore.
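
The path in the error message can be translated back into names, which helps in judging what is affected. The following is a minimal sketch, assuming the numbers from the error shown above: in base/16384/16436, 16384 is the OID of the database and 16436 is the filenode of the relation. The base directory belongs to the default tablespace, which is why 0 is passed as the tablespace OID. Note that pg_filenode_relation is only available in PostgreSQL 9.4 and later:

```sql
-- Which database does the directory base/16384 belong to?
SELECT datname FROM pg_database WHERE oid = 16384;

-- Run this while connected to that database.
-- 0 stands for the database's default tablespace; 16436 is the filenode.
SELECT pg_filenode_relation(0, 16436);
```

The second query returns the name of the table or index stored in that file, or NULL if no relation in the current database uses that filenode.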

In the scenario outlined here, you can be certain of one thing: some data has been lost in the storage system. It is important to point out that loss of data is virtually never caused by PostgreSQL itself. In most cases, we are talking about broken hardware, broken filesystems, or a memory-related problem that has spread its blessings to your data files.

If storage-related problems arise, the general rule is, "don't touch stuff; back up stuff!" Before anything is done that might make the problem even worse, try to create a filesystem snapshot, a tar archive, or any kind of backup. In many cases, the good old "dd" can do a good job, and it can definitely be worthwhile to attempt to create a clone of the underlying device on your system.

Once everything has been backed up, the next step can be performed. PostgreSQL provides a feature that allows you to zero out blocks that are faulty. In the preceding example, the query could not be completed because the file contained some bad data. By setting zero_damaged_pages to on, it is possible to tell PostgreSQL to zero out all broken blocks. Of course, this does not rescue lost data, but it at least helps us retrieve something from the system:

test=# SET zero_damaged_pages TO on;
SET

Once the value has been set, the query can be executed again:

test=# SELECT count(*) FROM t_test;
WARNING:  invalid page in block 535 of relation base/16384/16436; zeroing out page
 count 
-------
 42342
(1 row)

This time, warnings will pop up for each broken block. Luckily, just one block is broken in this case. It is quickly replaced by some binary zeroes, and the query moves on to the next (proper) blocks.

Of course, zero_damaged_pages is dangerous because it eliminates bad data. So what choice do you have? When something is so broken, it is important to rescue at least what is there.
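
One way to put this into practice is to copy whatever is still readable into a fresh table while the setting is active. This is only a sketch: t_test_rescue is a made-up name, and changing zero_damaged_pages normally requires superuser privileges. According to the PostgreSQL documentation, zeroed-out pages are not forced to disk, so recreating the table before switching the parameter off again is the recommended approach:

```sql
-- Assumption: t_test_rescue is a hypothetical name for the rescue copy
SET zero_damaged_pages TO on;

-- Broken blocks are zeroed with a WARNING; all readable rows are copied
CREATE TABLE t_test_rescue AS SELECT * FROM t_test;

-- Switch the dangerous setting off again for this session
RESET zero_damaged_pages;
```

Keep in mind that CREATE TABLE AS does not carry over indexes, constraints, or permissions, so these have to be recreated on the copy.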
