Getting joins right

Before we dive into optimizing joins, it is important to take a look at some of the most common problems that arise with joins and which of them should ring alarm bells for you.

Here is an example of a simple table structure to demonstrate how joins work:

test=# CREATE TABLE a (aid int);
CREATE TABLE
test=# CREATE TABLE b (bid int);
CREATE TABLE
test=# INSERT INTO a VALUES (1), (2), (3);
INSERT 0 3
test=# INSERT INTO b VALUES (2), (3), (4);
INSERT 0 3

In the following example, you will see a simple outer join:

test=# SELECT * FROM a LEFT JOIN b ON (aid = bid);
aid | bid
-----+-----
1 |
2 | 2
3 | 3
(3 rows)

You can see that PostgreSQL will take all rows from the left-hand side and only list the ones fitting the join.

The following example might come as a surprise to many people:

test=# SELECT * FROM a LEFT JOIN b ON (aid = bid AND bid = 2);
aid | bid
-----+-----
1 |
2 | 2
3 |
(3 rows)

No, the number of rows does not decrease—it will stay constant. Most people assume that there will only be one row in the join, but this is not true, and will lead to some hidden issues.

Consider the following query, which performs a simple join:

test=# SELECT avg(aid), avg(bid) 
FROM a LEFT JOIN b
ON (aid = bid AND bid = 2);
avg | avg
--------------------+--------------------
2.0000000000000000 | 2.0000000000000000
(1 row)

Most people assume that the average is calculated based on a single row. However, as stated earlier, this is not the case and therefore queries such as that are often considered to be a performance problem because, for some reason, PostgreSQL does not index the table on the left-hand side of the join. Of course, we are not looking at a performance problem here—we are definitely looking at a semantic issue. It happens on a regular basis that people writing outer joins don't really know what they ask PostgreSQL to do. So, my personal advice is to always question the semantic correctness of an outer join before attacking the performance problem reported by the client.

I cannot stress enough how important this kind of work is to ensure that your queries are correct and do exactly what is needed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset