Adjusting durability

In this chapter, we have seen that data is either replicated synchronously or asynchronously. However, this is not a global thing. To ensure good performance, PostgreSQL allows us to configure things in a very flexible way. It is possible to replicate everything synchronously or asynchronously, but in many cases we might want to do things in a more fine-grained way. This is exactly when the synchronous_commit setting is needed.

Assuming that synchronous replication, the application_name setting in the recovery.conf file, as well as the synchronous_standby_names setting in the postgresql.conf file have been configured, the synchronous_commit setting will offer the following options:

  • off: This is basically a asynchronous replication. WAL won't be flushed to disk on the master instantly and the master won't wait for the slave to write everything to disk. If the master fails, some data might be lost (up to three times—wal_writer_delay).
  • local: The transaction log is flushed to disk on commit on the master. However, the master does not wait on the slave (asynchronous replication).
  • remote_write: The remote_write setting already makes PostgreSQL replicate synchronously. However, only the master saves data to disk. For the slave, it is enough to send the data to the operating system. The idea is to not wait for the second disk flush to speed things up. It is very unlikely that both storage systems crash at exactly the same time. Therefore, the risk of data loss is close to zero.
  • on: In this case, a transaction is okay if the master and the slaves have successfully flushed the transaction to disk. The application will not receive a commit unless data is safely stored on two servers (or more, depending on the configuration).
  • remote_apply: While on ensures that data is safely stored on two nodes, it does not guarantee that we can simply load balance right away. The fact that data is flushed on the disk does not ensure that the user can already see the data. For example, if there is a conflict, a slave will halt transaction replay—however, a transaction log is still sent to the slave during a conflict and flushed to disk. In short, it can occur that data is flushed on the slave, even if it is not visible to the end user yet. The remote_apply option fixes this problem. It ensures that data must be visible on the replica so that the next read request can be safely executed on the slave, which can already see the changes made to the master and expose them to the end user. The remote_apply option is, of course, the slowest way to replicate data because it requires the slave to already expose the data to the end user.

In PostgreSQL, the synchronous_commit parameter is not a global value. It can be adjusted on various levels, just like many other settings. We might want to do something as follows:

test=# ALTER DATABASE test SET synchronous_commit TO off;
ALTER DATABASE

Sometimes, only a single database should replicate in a certain way. It is also possible to just synchronously replicate if we are connected as a specific user. Last but not least, it is also possible to tell a single transaction how to commit. By adjusting the synchronous_commit parameter on the fly, it is even possible to control things on a per-transaction level.

For example, consider the following two scenarios:

  • Writing to a log table where we might want to use an asynchronous commit because we want to be quick
  • Storing a credit card payment where we want to be safe so a synchronous transaction might be the desired thing

As we can see, the very same database might have different requirements, depending on which data is modified. Therefore, changing data at the transaction level is very useful and helps to improve speed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset