Transactions are a core technique of every professional relational database system. In fact, it is pretty hard to imagine a world without transactions these days. Atomic deletions, proper locking, and all the functionalities provided by a modern relational system are simple and expected from a modern system, and many applications rely on them. Therefore, this chapter is all about transactions and locking.
The following topics will be covered:
FOR UPDATE
, FOR SHARE
, and NOWAIT
The PostgreSQL transaction model differs quite a lot from other systems. In many cases, these differences are a root source of trouble. In this section, some basic examples have been included to make you aware of the way PostgreSQL works.
The first thing to take into account is that a transaction block has to be started with BEGIN
statement and finished with a simple COMMIT
statement All statements inside the transaction block have to be correct:
test=# BEGIN; BEGIN test=# SELECT now(); now ------------------------------- 2014-09-22 13:27:24.952578+02 (1 row) test=# SELECT now(); now ------------------------------- 2014-09-22 13:27:24.952578+02 (1 row) test=# COMMIT; COMMIT
Note that now()
will return the transaction time. Therefore it is always the same, regardless of how often it is used. Many people like to use DEFAULT now()
in their column definitions. Always keep in mind that DEFAULT now()
sets the column to transaction time, and it stays constant even in a long transaction. This is a handy way to determine whether all rows have been written in the same transaction.
The second important thing to note is that a transaction must be free of mistakes:
test=# BEGIN; BEGIN test=# SELECT 1; ?column? ---------- 1 (1 row) test=# SELECT 1 / 0; ERROR: division by zero test=# SELECT 1; ERROR: current transaction is aborted, commands ignored until end of transaction block test=# SELECT 1; ERROR: current transaction is aborted, commands ignored until end of transaction block test=# COMMIT; ROLLBACK
In this example, an error occurs after a successful start of the transaction. The main issue here is that the transaction will never recover, even if properly working statements are issued after the mistake! Remember that a transaction has to be correct from the beginning to the end. Otherwise, all commands after the error will be ignored by the system.
If the conditions we just mentioned cannot be achieved, it is wise to consider using savepoints. Basically, a savepoint is a mechanism used to jump back inside a transaction. It can be used effectively to avoid errors and to ensure success. The following example shows how it works:
test=# BEGIN; BEGIN test=# SELECT 1; ?column? ---------- 1 (1 row) test=# SAVEPOINT s1; SAVEPOINT test=# SELECT 1 / 0; ERROR: division by zero test=# ROLLBACK TO SAVEPOINT s1; ROLLBACK test=# COMMIT; COMMIT
After the initial SELECT
statement, a savepoint is created. Note that the savepoint, as the name suggests, is used later on to identify the spot to return to. After an error occurs, it is still possible to jump to a specific savepoint. The main advantage is that the transaction can be committed nicely in this case.
Now that you have learned about the basic PostgreSQL model followed by the PostgreSQL system, it is time to dig into locking. In many cases, locking turns out to be a bottleneck. So, it is definitely worth inspecting things.
To get started, a simple demo table can be created like this:
test=# CREATE TABLE t_test AS SELECT 1 AS id; SELECT 1
The table contains only one row, which is enough to demonstrate some basic concepts of locking. In the first example, the goal is to demonstrate what happens when two users try to increment the value in the table at the same time:
User 1 |
User 2 |
---|---|
BEGIN;
| |
BEGIN;
| |
UPDATE t_test SET id = id +1 RETURNING *;
| |
UPDATE t_test SET id = id +1 RETURNING *;
| |
wait ... | |
COMMIT;
|
wait ... |
COMMIT;
|
The goal is to allow two users to increment the value in the table at the same time. The important thing to observe is that the second UPDATE
statement has to wait until the first UPDATE
statement has committed. It is important to mention that we are talking about a row level lock and not a table lock here. PostgreSQL only locks those rows affected by the UPDATE
statement, ensuring that other people are not facing serious bottlenecks.
The second important observation is the result; it is ensured that the outcome at the end of the example is always 3
. As soon as the first transaction commits, the second UPDATE
rereads the now-committed row and increments it. While this may seem like a logical issue, it is not widely known to many people in spite of its wide implications.
Let's expand the example a little:
test=# INSERT INTO t_test VALUES (5); INSERT 0 1 test=# SELECT * FROM t_test; id ---- 3 5 (2 rows)
Let's perform two updates now:
User 1: |
User 2: |
---|---|
BEGIN;
|
BEGIN;
|
UPDATE t_test SET id = 4 WHERE id = 3;
|
UPDATE t_test SET id = 6 WHERE id = 4;
|
UPDATE t_test SET id = 4 WHERE id = 3;
| |
UPDATE t_test SET id = 6 WHERE id = 5;
| |
… deadlock detected ... | |
COMMIT;
|
The first UPDATE
statement of both the users works as expected. The second UPDATE
statement of User 2 has to wait for User 1 because the same row is changed. The important part, however, is the final UPDATE
. Both users wait on each other, so nobody can win this game. Therefore, PostgreSQL steps in and resolves the hopeless situation. The exact error message is as follows:
test=# UPDATE t_test SET id = 6 WHERE id = 5; ERROR: deadlock detected DETAIL: Process 27988 waits for ShareLock on transaction 1131992; blocked by process 28084. Process 28084 waits for ShareLock on transaction 1131990; blocked by process 27988. HINT: See server log for query details. CONTEXT: while updating tuple (0,5) in relation "t_test"
The deadlock is resolved automatically. All that the user has to do is to catch the error and try again.
How can deadlocks be avoided? Actually, there is no general rule. Personally, I found it useful to try to update data in a sorted order. In many cases, this works well and cures the reasons for a deadlock (which is out of order changes). Besides this, there is not much that can be done to get around this issue.
When talking about deadlocks, many people instantly ask about a PostgreSQL parameter called deadlock_timeout
:
test=# SHOW deadlock_timeout; deadlock_timeout ------------------ 1s (1 row)
There is a general misconception regarding this parameter in the heads of many. The parameter actually tells us how long the system waits before checking for a deadlock. Deadlock detection is fairly expensive, and therefore, PostgreSQL waits for one second before it initializes the check.
In this section you will be introduced to a problem that has bugged generations of database developers—wrong or missing locking.
Let's assume you are reading some data and you definitely intend to change it later on, once you have read it. But what if the data was already gone by the time you were performing your UPDATE
? I assume it would not do you any good.
In PostgreSQL, reading can occur concurrently, and many people can read the same piece of data at the same time without interfering with each other. Therefore, reading does not provide sufficient protection against other folk who are planning to modify data as well. The result can be nasty. Consider the following example:
BEGIN; SELECT * FROM tab WHERE foo = bar; UPDATE tab SET foo = "value made in the application"; COMMIT;
A transaction like what is shown here is not safe. The reason is simple: What if people execute the same thing concurrently? It may happen that they overwrite each other's changes instantly. Clearly, this would cause major issues.
To prevent this from happening, SELECT … FOR UPDATE
comes to the rescue.
Consider the following example:
User 1: |
User 2: |
---|---|
BEGIN;
|
BEGIN;
|
SELECT * FROM t_test WHERE 1 > 0 FOR UPDATE;
| |
… do some work ... |
SELECT * FROM t_test WHERE 1 > 0 FOR UPDATE;
|
… do some work ... |
wait ... |
COMMIT;
| |
… returns latest data ... | |
… do some work and commit ... |
In this example, SELECT … FOR UPDATE
locks the rows returned by the query, just like UPDATE
would've locked those rows. The beauty here is that the second transaction has to wait until the first transaction either issues a COMMIT
statement or exits. The advantage here is that the second transaction can already build on the changes of the first query, ensuring that nothing is lost.
However, if used carelessly, SELECT … FOR UPDATE
can cause serious performance issues. Consider the following example:
SELECT … FROM a, b, c, d, e, f, g WHERE … FOR UPDATE;
Let's assume all you want to do is to modify a
. The rest of the tables are just used as lookup tables. PostgreSQL has no way to figure out that you are planning to modify just one table. Therefore, it has to lock all rows returned by the query, in all tables. While this is a necessary thing for PostgreSQL to do, it is likely to cause nasty contentions involving lookup tables.
There are two approaches to fix this problem. The first is to use SELECT … FOR UPDATE NOWAIT
. This will make a query stop if a lock cannot be obtained. It can come in handy if you want to avoid waiting indefinitely. The second option is even more sophisticated. Consider the following example:
SELECT … FROM a, b, c, d, e, f, g WHERE … FOR UPDATE OF a, b;
In this case, PostgreSQL knows that only two tables will most likely be updated in the next couple of steps, which will improve locking substantially.
In some cases, SELECT … FOR UPDATE
is not enough, and locking an entire table is necessary. To create a table lock, PostgreSQL provides a simple command:
test=# h LOCK Command: LOCK Description: lock a table Syntax: LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ] where lockmode is one of: ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
Note that PostgreSQL has eight lock levels here, ranging from ACCESS SHARE
all the way up to ACCESS EXCLUSIVE
. These allow you to define locking in a very fine-grained way. Suppose you want to make sure that you are the only person allowed to read and write a table. Then the lock mode needed is ACCESS EXCLUSIVE
. It perfectly ensures that nobody else can even look at that table. If nobody is supposed to modify a table but reading is perfectly fine, EXCLUSIVE
is the option of choice.
On the other end of the spectrum, there is a lock held by a simple read called ACCESS SHARE
. It can nicely coexist with other reads and does not harm others. It only conflicts with ACCESS EXCLUSIVE
, which is needed by DROP TABLE
and similar commands at http://www.postgresql.org/docs/9.4/static.
Locking a table is sometimes necessary, but it should never be done in a careless way because it can have nasty side effects and block other transactions.
More information about locking and a detailed overview of all potential conflicts can be found at:
http://www.postgresql.org/docs/9.4/static/explicit-locking.html.