Understanding the subtle difference between ROWS and RANGE

So far, you have seen sliding windows using OVER ... ROWS. However, there is more. Let's take a look at the SQL specification taken directly from the PostgreSQL documentation:

{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

There is more than just ROWS. In real life, we have seen that many people are struggling to understand the difference between RANGE and ROWS. In many cases, the result is the same, which adds even more to the confusion. To understand the problem, let's first create some simple data:

test=# SELECT *, x / 3 AS y FROM generate_series(1, 15) AS x;
x | y
----+---
1 | 0
2 | 0
3 | 1
4 | 1
5 | 1
6 | 2
7 | 2
8 | 2
9 | 3
10 | 3
11 | 3
12 | 4
13 | 4
14 | 4
15 | 5
(15 rows)

This is a simple dataset. Be particularly aware of the second column, which contains a couple of duplicates. Those will be relevant in a minute:

test=# SELECT *, x / 3 AS y, 
array_agg(x) OVER (ORDER BY x
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_1,
array_agg(x) OVER (ORDER BY x
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS range_1,
array_agg(x/3) OVER (ORDER BY (x/3)
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS rows_2,
array_agg(x/3) OVER (ORDER BY (x/3)
RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS range_2
FROM generate_series(1, 15) AS x;
x | y | rows_1 | range_1 | rows_2 | range_2
----+---+------------+------------+---------+---------------------
1 | 0 | {1,2} | {1,2} | {0,0} | {0,0,1,1,1}
2 | 0 | {1,2,3} | {1,2,3} | {0,0,1} | {0,0,1,1,1}
3 | 1 | {2,3,4} | {2,3,4} | {0,1,1} | {0,0,1,1,1,2,2,2}
4 | 1 | {3,4,5} | {3,4,5} | {1,1,1} | {0,0,1,1,1,2,2,2}
5 | 1 | {4,5,6} | {4,5,6} | {1,1,2} | {0,0,1,1,1,2,2,2}
6 | 2 | {5,6,7} | {5,6,7} | {1,2,2} | {1,1,1,2,2,2,3,3,3}
7 | 2 | {6,7,8} | {6,7,8} | {2,2,2} | {1,1,1,2,2,2,3,3,3}
8 | 2 | {7,8,9} | {7,8,9} | {2,2,3} | {1,1,1,2,2,2,3,3,3}
9 | 3 | {8,9,10} | {8,9,10} | {2,3,3} | {2,2,2,3,3,3,4,4,4}
10 | 3 | {9,10,11} | {9,10,11} | {3,3,3} | {2,2,2,3,3,3,4,4,4}
11 | 3 | {10,11,12} | {10,11,12} | {3,3,4} | {2,2,2,3,3,3,4,4,4}
12 | 4 | {11,12,13} | {11,12,13} | {3,4,4} | {3,3,3,4,4,4,5}
13 | 4 | {12,13,14} | {12,13,14} | {4,4,4} | {3,3,3,4,4,4,5}
14 | 4 | {13,14,15} | {13,14,15} | {4,4,5} | {3,3,3,4,4,4,5}
15 | 5 | {14,15} | {14,15} | {4,5} | {4,4,4,5}
(15 rows)

After listing the x and y columns, I have applied windowing functions on x. As you can see, the results are the same for both columns. rows_1 and range_1 are absolutely identical. The situation changes if we start to use the column containing those duplicates. In the case of ROWS, PostgreSQL simply takes the previous and the next rows. In the case of RANGE, it takes the entire group of duplicates. Hence, the array is a lot longer. The entire group of identical values is taken.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset