Using window functions

All aggregating functions can be used as window functions, with the exception of ordered-set and hypothetical-set aggregates. User-defined aggregating functions can also be used as window functions. The presence of the OVER clause indicates that the function is used as a window function.

When the aggregating function is used as a window function, it will aggregate the rows that belong to the window frame of a current row. The typical use cases for the window functions are computing statistical values of different kinds. Take the car portal database for example. There is a table called advertisement that contains information about the advertisements that users create.
Suppose it is required to analyze the quantity of advertisements that the users create over a period of time. The query that generates the report would be as follows:

car_portal=> WITH monthly_data AS (
SELECT date_trunc('month', advertisement_date) AS month, count(*) as cnt
FROM car_portal_app.advertisement GROUP BY date_trunc('month', advertisement_date)
)
SELECT to_char(month,'YYYY-MM') as month, cnt,
sum(cnt) OVER (w ORDER BY month) AS cnt_year,
round(avg(cnt) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING), 1) AS mov_avg,
round(cnt / sum(cnt) OVER w * 100,1) AS ratio_year
FROM monthly_data WINDOW w AS (PARTITION BY date_trunc('year',month));
month | cnt | cnt_year | mov_avg | ratio_year
---------+-----+----------+---------+------------
2014-01 | 42 | 42 | 40.3 | 5.8
2014-02 | 49 | 91 | 44.5 | 6.7
2014-03 | 30 | 121 | 56.8 | 4.1
2014-04 | 57 | 178 | 69.0 | 7.8
2014-05 | 106 | 284 | 73.0 | 14.6
2014-06 | 103 | 387 | 81.0 | 14.2
2014-07 | 69 | 456 | 86.0 | 9.5
2014-08 | 70 | 526 | 74.0 | 9.6
2014-09 | 82 | 608 | 60.6 | 11.3
2014-10 | 46 | 654 | 54.2 | 6.3
2014-11 | 36 | 690 | 49.8 | 5.0
2014-12 | 37 | 727 | 35.2 | 5.1
2015-01 | 48 | 48 | 32.5 | 84.2
2015-02 | 9 | 57 | 31.3 | 15.8

(14 rows)

In the WITH clause, the data is aggregated on a monthly basis. In the main query, the window w is defined, implying partitioning by year. This means that every window function that uses the window w will work with the records of the same year as the current record.

The first window function, sum, uses the window w. As ORDER BY is specified, each record has its place in the partition. The frame clause is omitted, therefore the frame, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, is applied. This means that the function calculates the sum of the values for the records from the beginning of each year till the current month. It is the cumulative total on a yearly basis.

The second function, avg, calculates the moving average. For each record, it calculates the average value of five records—ranging from the second preceding the current record to the second following the current one. It does not use a predefined window, because the moving average does not take the year into account. Only the order of the values matters.

The third window function, sum, uses the same window definition again. It calculates the total sum of the values for the whole year. This value is used as denominator in the expression calculating the share of the current month over the year.

There are several window functions that are not aggregating functions. They are used to get the values of other records within the partition, calculate the rank of the current row among all rows, and generate row numbers.

For example, let's change the report from the previous example. Suppose it is necessary to calculate the difference in the quantity of advertisements for each month against the previous months and against the same month of the previous year. Suppose it is also required to get the rank of the current month. The query would be as follows:

car_portal=> WITH monthly_data AS (
SELECT date_trunc('month', advertisement_date) AS month, count(*) as cnt
FROM car_portal_app.advertisement GROUP BY date_trunc('month', advertisement_date)
)
SELECT to_char(month,'YYYY-MM') as month, cnt,
cnt - lag(cnt) OVER (ORDER BY month) as prev_m,
cnt - lag(cnt, 12) OVER (ORDER BY month) as prev_y,
rank() OVER (w ORDER BY cnt DESC) as rank
FROM monthly_data
WINDOW w AS (PARTITION BY date_trunc('year',month))
ORDER BY month DESC;
month | cnt | prev_m | prev_y | rank
---------+-----+--------+--------+------
2015-02 | 9 | -39 | -40 | 2
2015-01 | 48 | 11 | 6 | 1
2014-12 | 37 | 1 | | 10
2014-11 | 36 | -10 | | 11
2014-10 | 46 | -36 | | 8
2014-09 | 82 | 12 | | 3
2014-08 | 70 | 1 | | 4
2014-07 | 69 | -34 | | 5
2014-06 | 103 | -3 | | 2
2014-05 | 106 | 49 | | 1
2014-04 | 57 | 27 | | 6
2014-03 | 30 | -19 | | 12
2014-02 | 49 | 7 | | 7
2014-01 | 42 | | | 9

(14 rows)

The lag function returns the value of a given expression for the record, which is the given number of records before the current one (default is 1). In the first occurrence of the function in the example, it returns the value of the field cnt from the previous record, which corresponds to the previous month. You may see that the number of advertisements for February 2015 is 9, which is lower than in January 2015 by 39.

The second lag returns the value of cnt for the record that is 12 records before the current one--meaning a year ago. The number for February 2015 is lower than the number for February 2014 by 42.

The rank function returns the rank of the current row within the partition. It returns the rank with gaps. This means that if two records have the same position according to the ORDER BY clause, both of them will get the same rank. The next record will get the rank after the next rank. For example, two first records followed by a third one.

Other window functions are as follows:

  • lead: Similar to lag, but returning the value of a given expression evaluated for the record that is the given number of records after the current row.
  • first_value, last_value, nth_value: These return the value of a given expression evaluated for the first record, last record, or nth record of the frame respectively.
  • row_number: Returns the number of the current row within the partition.
  • dense_rank: Returns the rank of the current row without gaps.
  • percent_rank and cume_dist: Return the relative rank of the current row. The difference is that the first function uses rank and the second uses row_number as a numerator for the calculations.
  • ntile: Divides the partition into the given number of equal parts and returns the number of the part where the current record belongs.

A more detailed description of these functions is available in the documentation at http://www.postgresql.org/docs/current/static/functions-window.html.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset