© Daniel Bartholomew 2017
Daniel BartholomewMariaDB and MySQL Common Table Expressions and Window Functions Revealedhttps://doi.org/10.1007/978-1-4842-3120-3_5

5. Recognizing Opportunities for Window Functions

Daniel Bartholomew
(1)
Raleigh, North Carolina, USA
 
The previous chapter was an overview of what Window Functions are, with details on the syntax. It’s time to put that knowledge into practice. This chapter expands upon that with some simple yet practical examples that illustrate some of the types of problems Window Functions are good at solving. We’ll cover organizing results, maintaining running totals, and ranking results.
Caution
If you are using MySQL 8.0, you must be on at least version 8.0.2. This was the version that introduced Window Functions. Previous versions of MySQL, including MySQL 8.0.0 and 8.0.1, do not have Window Functions.
All versions of MariaDB 10.2 and higher have Window Functions.

Partitioning and Ordering Results

One of the most important purposes of a database is to organize the mountains of data that surround us. The <partition_definition> and <order_definition> sections of the OVER clause exist to help us do just that. We covered their syntax in the previous chapter, but some practical demonstrations of how these sections work is probably more useful than a dry syntax diagram on the path to mastering Window Functions.
Using the employees table from Chapter 1, here is a simple example of the ROW_NUMBER Window Function. First, we’ll call the function with none of the optional sections in the OVER clause:
SELECT
  ROW_NUMBER() OVER() AS rnum,
  name, title, office
FROM employees
WHERE office='Cleveland' OR office='Memphis'
ORDER BY title;
The result is a little unexpected:
+------+------------------+-------------+-----------+
| rnum | name             | title       | office    |
+------+------------------+-------------+-----------+
|    1 | Eileen Morrow    | dba         | Cleveland |
|    3 | Douglas Williams | dba         | Memphis   |
|    8 | Carol Monreal    | dba         | Cleveland |
|   14 | Elva Garcia      | dba         | Memphis   |
|    5 | Rosemary Bowers  | manager     | Cleveland |
|   12 | Richard Delgado  | manager     | Memphis   |
|    2 | Julius Ramos     | salesperson | Cleveland |
|    7 | Tammy Castro     | salesperson | Memphis   |
|    9 | Joyce Beck       | salesperson | Memphis   |
|   10 | Alonzo Page      | salesperson | Cleveland |
|   11 | Tina Jefferson   | salesperson | Cleveland |
|   13 | Leo Gutierrez    | salesperson | Cleveland |
|   15 | Joann Smith      | salesperson | Memphis   |
|    4 | Janet Edwards    | support     | Memphis   |
|    6 | Louise Lewis     | support     | Cleveland |
+------+------------------+-------------+-----------+
Looking at this result, you might be confused as to why the rnum column is out of order. In fact, your result may have a different rnum column entirely. It may be in order from 1 to 15, or it may have a completely different order. What is going on? The out-of-order rnum column happens because Window Functions are computed after the other parts of the SELECT statement have been fetched (the name, title, and office columns in our example) and after any WHERE, HAVING, or GROUP BY clauses, but before the final ORDER BY title clause. In fact, without an <order_definition> section in the OVER clause, Window Functions do not guarantee any particular ordering.
Ideally, we want the rnum column to always match the end output of the ORDER BY title clause. To ensure this happens, we add an <order_definition> section to the OVER clause like so:
SELECT
  ROW_NUMBER()
    OVER (
      ORDER BY title
    ) AS rnum,
  name, title, office
FROM employees
WHERE office='Cleveland' OR office='Memphis'
ORDER BY title;
Now both the rnum column and the final output are guaranteed to be ordered by the contents of the title column, so the result will always make logical sense:
+------+------------------+-------------+-----------+
| rnum | name             | title       | office    |
+------+------------------+-------------+-----------+
|    1 | Eileen Morrow    | dba         | Cleveland |
|    2 | Douglas Williams | dba         | Memphis   |
|    3 | Carol Monreal    | dba         | Cleveland |
|    4 | Elva Garcia      | dba         | Memphis   |
|    5 | Rosemary Bowers  | manager     | Cleveland |
|    6 | Richard Delgado  | manager     | Memphis   |
|    7 | Julius Ramos     | salesperson | Cleveland |
|    8 | Tammy Castro     | salesperson | Memphis   |
|    9 | Joyce Beck       | salesperson | Memphis   |
|   10 | Alonzo Page      | salesperson | Cleveland |
|   11 | Tina Jefferson   | salesperson | Cleveland |
|   12 | Leo Gutierrez    | salesperson | Cleveland |
|   13 | Joann Smith      | salesperson | Memphis   |
|   14 | Janet Edwards    | support     | Memphis   |
|   15 | Louise Lewis     | support     | Cleveland |
+------+------------------+-------------+-----------+
Between the Cleveland and Memphis offices there are 15 employees. What if we want to number employees in separate offices, well, separately?
This is the purpose of the <partition_definition> section. It allows us to group our results. With a <partition_definition> section in the OVER clause, the ROW_NUMBER function will do its thing to each group independently. So, we’ll add PARTITION BY office to the OVER clause like so:
SELECT
  ROW_NUMBER() OVER (
    PARTITION by office
    ORDER BY title
  ) AS rnum,
  name, title, office
FROM employees
WHERE office='Cleveland' OR office='Memphis'
ORDER BY title;
Unfortunately, the output is not as useful as we might have supposed:
+------+------------------+-------------+-----------+
| rnum | name             | title       | office    |
+------+------------------+-------------+-----------+
|    1 | Eileen Morrow    | dba         | Cleveland |
|    1 | Douglas Williams | dba         | Memphis   |
|    2 | Carol Monreal    | dba         | Cleveland |
|    2 | Elva Garcia      | dba         | Memphis   |
|    3 | Rosemary Bowers  | manager     | Cleveland |
|    3 | Richard Delgado  | manager     | Memphis   |
|    4 | Julius Ramos     | salesperson | Cleveland |
|    4 | Tammy Castro     | salesperson | Memphis   |
|    5 | Joyce Beck       | salesperson | Memphis   |
|    5 | Alonzo Page      | salesperson | Cleveland |
|    6 | Tina Jefferson   | salesperson | Cleveland |
|    7 | Leo Gutierrez    | salesperson | Cleveland |
|    6 | Joann Smith      | salesperson | Memphis   |
|    7 | Janet Edwards    | support     | Memphis   |
|    8 | Louise Lewis     | support     | Cleveland |
+------+------------------+-------------+-----------+
As with other results in this chapter, yours may be slightly different from this. But the <partition_definition> section is working perfectly here. The first Cleveland and first Memphis employees in the result are both given a row number of 1. The second ones 2, and so on. There’s a little confusion further down in the results because Cleveland has more salespersons than Memphis, but the correct row numbers are all there. We just need to fix the output so that it is more understandable.
This is another instance that plainly shows that Window Functions do not guarantee ordering. The solution in our case is to just add the office column, not in the OVER clause, but to the final ORDER BY clause, before title, like so:
SELECT
  ROW_NUMBER() OVER (
    PARTITION by office
    ORDER BY title
  ) AS rnum,
  name, title, office
FROM employees
WHERE office='Cleveland' OR office='Memphis'
ORDER BY office,title;
The output now looks like this:
+------+------------------+-------------+-----------+
| rnum | name             | title       | office    |
+------+------------------+-------------+-----------+
|    1 | Eileen Morrow    | dba         | Cleveland |
|    2 | Carol Monreal    | dba         | Cleveland |
|    3 | Rosemary Bowers  | manager     | Cleveland |
|    4 | Julius Ramos     | salesperson | Cleveland |
|    5 | Alonzo Page      | salesperson | Cleveland |
|    6 | Tina Jefferson   | salesperson | Cleveland |
|    7 | Leo Gutierrez    | salesperson | Cleveland |
|    8 | Louise Lewis     | support     | Cleveland |
|    1 | Douglas Williams | dba         | Memphis   |
|    2 | Elva Garcia      | dba         | Memphis   |
|    3 | Richard Delgado  | manager     | Memphis   |
|    4 | Tammy Castro     | salesperson | Memphis   |
|    5 | Joyce Beck       | salesperson | Memphis   |
|    6 | Joann Smith      | salesperson | Memphis   |
|    7 | Janet Edwards    | support     | Memphis   |
+------+------------------+-------------+-----------+
Success! Each office’s employees are numbered independently, and the numbering is all in order.

Maintaining a Running Total

Another common database task is to maintain a running total of something. This could be an account balance, the number of items sold over a period of months, or a host of other numeric values.
The commissions table we used back in Chapter 2 can be used to demonstrate this. It tracks commissions from salespersons in our fictional company. The table records the salesperson’s ID number, an ID number for the commission, the commission amount, and the date of the commission.
A brief sample of the data can be seen with the following query:
SELECT
  commission_date AS date,
  salesperson_id as sp,
  commission_id as id,
  commission_amount as amount
FROM commissions
ORDER BY sp,date;
The complete output is quite long, but here’s the first ten rows of output:
+------------+----+-------+--------+
| date       | sp | id    | amount |
+------------+----+-------+--------+
| 2016-01-21 |  3 | 15165 | 429.50 |
| 2016-02-09 |  3 | 15231 | 142.37 |
| 2016-02-12 |  3 | 15253 | 184.74 |
| 2016-03-22 |  3 | 15428 | 169.62 |
| 2016-04-01 |  3 | 15476 | 363.53 |
| 2016-05-10 |  3 | 15644 | 358.49 |
| 2016-07-11 |  3 | 15901 | 149.64 |
| 2016-11-25 |  3 | 16465 | 452.04 |
| 2017-01-25 |  3 | 16726 | 145.68 |
| 2017-03-10 |  3 | 16927 | 216.16 |
...
+------------+----+-------+--------+
To create a running commissions total, we need to take the first amount for a row matching a given salesperson and output it in a column. We’ll call this column total. Then, for each subsequent row we add the previous total to the current row’s commission and make that the new total and so on until we come to a new salesperson_id, whereupon we will start the process over. At least, that’s my way of thinking through the problem. Actually solving this using traditional SQL is a bit different. For example, one traditional SQL way of solving the problem is to use a self-join and the SUM function like so:
SELECT
  commission_date AS date, salesperson_id as sp,
  commission_id as id, commission_amount as amount,
  (SELECT SUM(commission_amount)
    FROM commissions AS c2
    WHERE c2.salesperson_id = c1.salesperson_id AND
          c2.commission_date <= c1.commission_date) AS total
FROM commissions AS c1
ORDER BY sp,date;
Inside our main query we have a subquery that looks for every row where the salesperson matches and the date is less than or equal to the date of the current row. For those rows that match the criteria, it sums them all up and outputs the answer in the total column. So, for the row with the oldest date stamp the only other row that it will match is itself, so the total column is the same as the commission_amount column for that row. For the row with the second oldest date stamp, the rows that match will be the oldest row and itself, so those are what are summed together into the total column. The process continues until all the rows that match a given salesperson_id have been fetched, and then the process restarts with the next salesperson.
The result is several thousand lines long, but here’s the first few rows:
+------------+-----+-------+--------+----------+
| date       | sp  | id    | amount | total    |
+------------+-----+-------+--------+----------+
| 2016-01-21 |   3 | 15165 | 429.50 | 429.50   |
| 2016-02-09 |   3 | 15231 | 142.37 | 571.87   |
| 2016-02-12 |   3 | 15253 | 184.74 | 756.61   |
| 2016-03-22 |   3 | 15428 | 169.62 | 926.23   |
| 2016-04-01 |   3 | 15476 | 363.53 | 1289.76  |
| 2016-05-10 |   3 | 15644 | 358.49 | 1648.25  |
| 2016-07-11 |   3 | 15901 | 149.64 | 1797.89  |
| 2016-11-25 |   3 | 16465 | 452.04 | 2249.93  |
| 2017-01-25 |   3 | 16726 | 145.68 | 2395.61  |
| 2017-03-10 |   3 | 16927 | 216.16 | 2611.77  |
| 2017-04-05 |   3 | 17046 | 277.36 | 2889.13  |
| 2017-04-11 |   3 | 17072 | 151.36 | 3040.49  |
| 2017-05-29 |   3 | 17272 | 368.20 | 3408.69  |
...
+------------+-----+-------+--------+----------+
All in all, it works, and you could even say it works well. It’s easy to confirm that the total column is accurately keeping a running total and that when the salesperson_id changes the count starts over.
But the process for how we get at the total is awkward. If this were a paper ledger, we wouldn’t continuously re-add everything that came before; we would simply add the new value to the old total.
Also, all of the adding and re-adding our query is doing means we’re constantly going out and fetching new groups of rows to sum together. All of this fetching and re-fetching takes time. Indexes can help, especially on large tables, but using a subquery like this doesn’t seem like the right way to go about solving our original task of displaying a running total.
A Window Function can do this job much better, and it does it in a more natural way. First off, like our previous Window Function example where we were numbering employees, we need to use both the <partition_definition> and <order_definition> sections in our OVER clause. We’ll PARTITION BY salesperson_id and ORDER BY commission_date to match the final ORDER BY clause of the query.
We then need to tell the Window Function what rows to add together for our total column. Aggregate Window Functions, like SUM, can use moving window frames to quickly identify the data they need to process. For this we’ll use a <frame_definition> that goes from the beginning of our result set, or UNBOUNDED PRECEDING, to the CURRENT ROW, so our full <frame_definition> section will be:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
To put all of this in place, we take the previous query and replace the subquery part with our SUM Window Function, like so:
SELECT
  commission_date AS date, salesperson_id as sp,
  commission_id as id, commission_amount as amount,
  SUM(commission_amount)
    OVER (
      PARTITION BY salesperson_id
      ORDER BY commission_date
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS total
FROM commissions AS c1
ORDER BY sp,date;
The result of this query is the same as we saw for the subquery version, so there’s no need to show it again.
When this query is run, on the first row the window frame is just the first row. As the query continues, the frame expands, and unlike subqueries, it doesn’t need to refetch any data; it just looks at the data that is already there. Then, when the frame crosses our partition boundary, it resets to just the current row and starts growing again. Instead of hundreds or thousands of trips back and forth to the database to fetch the information, which is what happens with a subquery, the Window Function only needs to make one pass.
This performance difference is so dramatic it can even be seen on the relatively small commissions table. On my laptop, processing the whole 3,000+ rows in the table takes 2.56 seconds to run the self-join version of the query. Contrast that with the Window Function version, which completes instantly. Two and a half seconds isn’t much, but as table size increases the speed advantage of Window Functions keeps growing. One of the MariaDB engineers performed some tests on similar, but much larger, tables using a similar query. Table 5-1 shows the results. All times are in seconds.
Table 5-1.
Window Functions Versus Self-joins
# Rows in Table
Self-join
Self-join with Index
Window Function
10,000
0.29
0.01
0.02
100,000
2.91
0.09
0.16
1,000,000
29.10
2.86
3.04
10,000,000
346.30
90.97
43.17
100,000,000
4357.20
813.20
514.24
Without an index on the table, the Window Function version of the query always wins by an order of magnitude over the self-join version. With an index on the table, the self-join version of the query can keep up with the Window Function version for a while, but after the table gets above a million rows, the Window Function takes the lead and is almost twice as fast.
Note
Window Functions, because of how they operate, don’t use indexes at all. They don’t even look for them. This can be considered a side benefit to using Window Functions. If you can eliminate expensive self-joins and subqueries by using Window Functions, you may not need to go through the time, effort, and overhead of creating and maintaining indexes.

Ranking Rows in a Result Set

Another common analytical query is to find the top N numbers of something, be it the top five selling items, the top ten salaries, or the top three scorers in the youth basketball league.
Using our commissions table again, it is trivial to use SQL to find the top five commissions earned. One way to do it is with the following SQL (with LIMIT 5 to keep the output small):
SELECT
  id, salesperson_id AS sid,
  commission_id AS cid,
  commission_amount AS amount,
  commission_date AS date
FROM commissions
ORDER BY commission_amount DESC
LIMIT 5;
The result of this query looks like this:
+------+-----+-------+--------+------------+
| id   | sid | cid   | amount | date       |
+------+-----+-------+--------+------------+
| 2897 | 121 | 17970 | 499.97 | 2017-11-03 |
| 2269 | 105 | 17340 | 499.90 | 2017-06-15 |
| 1916 | 131 | 16987 | 499.55 | 2017-03-22 |
| 2680 |  66 | 17756 | 499.43 | 2017-09-18 |
| 1610 |  40 | 16688 | 499.41 | 2017-01-16 |
+------+-----+-------+--------+------------+
We could improve upon this by doing a JOIN with the employees table to display the name of the employee and their office alongside the commission amount. The following SQL does just that:
SELECT
  commissions.commission_date AS date,
  commissions.commission_id AS cid,
  employees.name AS salesperson,
  employees.office AS office,
  commissions.commission_amount AS amount
FROM commissions LEFT JOIN employees
  ON (commissions.salesperson_id = employees.id)
ORDER BY amount DESC
LIMIT 5;
The result looks like the following:
+------------+-------+-----------------+-----------+--------+
| date       | cid   | salesperson     | office    | amount |
+------------+-------+-----------------+-----------+--------+
| 2017-11-03 | 17970 | Christina Terry | Wichita   | 499.97 |
| 2017-06-15 | 17340 | Alonzo Page     | Cleveland | 499.90 |
| 2017-03-22 | 16987 | Rene Gibbs      | Dallas    | 499.55 |
| 2017-09-18 | 17756 | Kathryn Barnes  | Dallas    | 499.43 |
| 2017-01-16 | 16688 | Joyce Beck      | Memphis   | 499.41 |
+------------+-------+-----------------+-----------+--------+
This is all well and good, and we now know who received the top commissions company-wide, but what if we want to find out the top two commissions from each office? It seems like a reasonable extension of the previous query, but it causes the complexity of our query to jump by quite a bit. There are probably other ways to do this, but here’s a query that does it:
SELECT * FROM (
  SELECT
    commissions.commission_date AS date,
    commissions.commission_id AS cid,
    employees.name AS salesperson,
    employees.office AS office,
    commissions.commission_amount AS amount
  FROM commissions LEFT JOIN employees
    ON (commissions.salesperson_id = employees.id)
) AS c1
WHERE (
  SELECT count(c2.amount)
  FROM (
    SELECT
      commissions.commission_id AS cid,
      commissions.salesperson_id AS sp_id,
      employees.office AS office,
      commissions.commission_amount AS amount
    FROM commissions LEFT JOIN employees
      ON (commissions.salesperson_id = employees.id)
    ) AS c2
  WHERE
    c1.cid != c2.cid
    AND
    c1.office = c2.office
    AND
    c2.amount > c1.amount) < 2
ORDER BY office,amount desc;
There’s a lot going on here, but basically we have our original query, some derived tables, and a subquery in our WHERE clause that uses the COUNT function to count the number of commissions in the commissions table that are from the same office AND where the commission amount is greater than the current row, and then limits those results to the top two. That probably stretches the definition of the word basically a bit too far. It’s quite frankly a bit of a mess, and it’s hard for mere humans to read and parse it. On the positive side, it does work. The result of this convoluted query is as follows:
+------------+-------+------------------+-------------+--------+
| date       | cid   | salesperson      | office      | amount |
+------------+-------+------------------+-------------+--------+
| 2017-09-22 | 17776 | Jack Green       | Chicago     | 497.83 |
| 2017-05-12 | 17215 | Deborah Peterson | Chicago     | 497.29 |
| 2017-06-15 | 17340 | Alonzo Page      | Cleveland   | 499.90 |
| 2017-05-25 | 17261 | Alonzo Page      | Cleveland   | 499.29 |
| 2017-03-22 | 16987 | Rene Gibbs       | Dallas      | 499.55 |
| 2017-09-18 | 17756 | Kathryn Barnes   | Dallas      | 499.43 |
| 2017-01-16 | 16688 | Joyce Beck       | Memphis     | 499.41 |
| 2017-03-02 | 16890 | Tammy Castro     | Memphis     | 497.58 |
| 2017-12-11 | 18116 | Terrance Reese   | Minneapolis | 499.31 |
| 2017-04-05 | 17047 | Ruby Boyd        | Minneapolis | 495.98 |
| 2017-08-30 | 17674 | Dorothy Anderson | Nauvoo      | 497.91 |
| 2017-12-11 | 18115 | Dorothy Anderson | Nauvoo      | 468.99 |
| 2017-01-20 | 16706 | John Conner      | Raleigh     | 499.33 |
| 2016-08-26 | 16082 | Randal Hogan     | Raleigh     | 499.07 |
| 2017-11-03 | 17970 | Christina Terry  | Wichita     | 499.97 |
| 2016-10-31 | 16345 | Christina Terry  | Wichita     | 497.83 |
+------------+-------+------------------+-------------+--------+
Apart from the aforementioned readability issues, the primary downside to this query is that it is slow, especially if there are no indexes on the table. On my laptop, for example, this query takes over ten seconds to run. And even if there were indexes in place, there would be the overhead of creating and maintaining the indexes, which could be problematic if our tables are updated frequently.
A much better and faster way to get at the result we’re after is to use Window Functions, specifically the RANK function. To start things off, we can simply take our original query with the JOIN and add the RANK function to it.
Because we want to find out the top two commissions per office, inside the OVER clause we will ORDER BY the commission_amount column from the commissions table and PARTITION BY the office column from the employees table. We’ll call the result of the RANK function rnk, to keep things simple.
Lastly, to get the ordering correct, we’ll ORDER BY office and our new rnk column.
Putting all of that together, we end up with some SQL that looks like the following:
SELECT
  RANK() OVER (
    PARTITION BY employees.office
    ORDER BY commissions.commission_amount DESC
  ) AS rnk,
  commissions.commission_date AS date,
  commissions.commission_id AS cid,
  employees.name AS salesperson,
  employees.office AS office,
  commissions.commission_amount AS amount
FROM commissions LEFT JOIN employees
  ON (commissions.salesperson_id = employees.id)
ORDER BY office,rnk;
I’ve removed the LIMIT from this query, so it will output every row in the commissions table, which isn’t what we want, but by showing everything we can verify that each row is ranked and partitioned by office correctly. So, we’re close.
Here’s what the first few rows of the output look like:
+-----+------------+-------+------------------+---------+--------+
| rnk | date       | cid   | salesperson      | office  | amount |
+-----+------------+-------+------------------+---------+--------+
|   1 | 2017-09-22 | 17776 | Jack Green       | Chicago | 497.83 |
|   2 | 2017-05-12 | 17215 | Deborah Peterson | Chicago | 497.29 |
|   3 | 2016-11-09 | 16394 | Jack Green       | Chicago | 496.97 |
|   4 | 2016-04-05 | 15486 | Donald Carter    | Chicago | 496.60 |
|   5 | 2017-08-03 | 17566 | Jason Wright     | Chicago | 494.83 |
|   6 | 2016-04-19 | 15549 | Deborah Peterson | Chicago | 494.42 |
|   7 | 2016-10-18 | 16285 | Jack Green       | Chicago | 493.74 |
|   8 | 2017-06-13 | 17324 | Jason Wright     | Chicago | 490.94 |
|   9 | 2016-03-24 | 15441 | Frances Griffin  | Chicago | 488.31 |
|  10 | 2017-10-19 | 17900 | Evelyn Alexander | Chicago | 488.17 |
...
+-----+------------+-------+------------------+---------+--------+
Our task now is to limit the output to the top two results from each office. A simple solution would be to add a WHERE rnk <=2 clause that looks for a rank of 2 or less. A quick modification of our query and we get:
SELECT
  RANK() OVER (
    PARTITION BY employees.office
    ORDER BY commissions.commission_amount DESC
  ) AS rnk,
  commissions.commission_date AS date,
  commissions.commission_id AS cid,
  employees.name AS salesperson,
  employees.office AS office,
  commissions.commission_amount AS amount
FROM commissions LEFT JOIN employees
  ON (commissions.salesperson_id = employees.id)
WHERE rnk <= 2
ORDER BY office,rnk;
However, when we try to run this new query we get the following error:
ERROR 1054 (42S22): Unknown column 'rnk' in 'where clause'
On the surface, this error is very confusing. We are referencing the rnk column in both the WHERE and ORDER BY clauses, and they’re right next to each other. So, why does it work in the ORDER BY clause and not in the WHERE clause? The reason is the same one we saw before in the ROW_NUMBER example. It bears repeating here: Window Functions aren’t computed until after any and all WHERE, HAVING, and GROUP BY clauses are finished. Once they are, the function will run, and only then will the rnk column exist, so with the way our query is written, only the ORDER BY clause can see it.
To solve our problem, we want the WHERE clause to be able to see the rnk column. So, we need to somehow force the RANK function to run prior to our WHERE clause. A simple way to do this is to use a derived table by taking our original query, everything from the initial SELECT down to just before the WHERE clause, and stuff it all into a simple SELECT like so:
SELECT * FROM (
<original_query>
) AS ranks
For the AS ranks part, we could have used any name, since we don’t use or refer to our derived table anywhere else. In this case, the name ranks seemed logical enough.
After adding our derived table wrapper, here’s our final query:
SELECT * FROM (
  SELECT
    RANK() OVER (
      PARTITION BY employees.office
      ORDER BY commissions.commission_amount DESC
    ) AS rnk,
    commissions.commission_date AS date,
    commissions.commission_id AS cid,
    employees.name AS salesperson,
    employees.office AS office,
    commissions.commission_amount AS amount
  FROM commissions LEFT JOIN employees
    ON (commissions.salesperson_id = employees.id)
) AS ranks
WHERE rnk <= 2
ORDER BY office,rnk;
Now that our main query is operating as a derived table, the WHERE clause is able to see the rnk column, and our output is as follows:
+-----+------------+-------+------------------+-------------+--------+
| rnk | date       | cid   | salesperson      | office      | amount |
+-----+------------+-------+------------------+-------------+--------+
|   1 | 2017-09-22 | 17776 | Jack Green       | Chicago     | 497.83 |
|   2 | 2017-05-12 | 17215 | Deborah Peterson | Chicago     | 497.29 |
|   1 | 2017-06-15 | 17340 | Alonzo Page      | Cleveland   | 499.90 |
|   2 | 2017-05-25 | 17261 | Alonzo Page      | Cleveland   | 499.29 |
|   1 | 2017-03-22 | 16987 | Rene Gibbs       | Dallas      | 499.55 |
|   2 | 2017-09-18 | 17756 | Kathryn Barnes   | Dallas      | 499.43 |
|   1 | 2017-01-16 | 16688 | Joyce Beck       | Memphis     | 499.41 |
|   2 | 2017-03-02 | 16890 | Tammy Castro     | Memphis     | 497.58 |
|   1 | 2017-12-11 | 18116 | Terrance Reese   | Minneapolis | 499.31 |
|   2 | 2017-04-05 | 17047 | Ruby Boyd        | Minneapolis | 495.98 |
|   1 | 2017-08-30 | 17674 | Dorothy Anderson | Nauvoo      | 497.91 |
|   2 | 2017-12-11 | 18115 | Dorothy Anderson | Nauvoo      | 468.99 |
|   1 | 2017-01-20 | 16706 | John Conner      | Raleigh     | 499.33 |
|   2 | 2016-08-26 | 16082 | Randal Hogan     | Raleigh     | 499.07 |
|   1 | 2017-11-03 | 17970 | Christina Terry  | Wichita     | 499.97 |
|   2 | 2016-10-31 | 16345 | Christina Terry  | Wichita     | 497.83 |
+-----+------------+-------+------------------+-------------+--------+
The only task left now is to try to find out how best to reward these hardworking salespeople. Maybe a gift certificate?

Summary

In this chapter, we explored how <partition_definition>, <order_definition>, and <frame_definition> sections are often used in the OVER clause. We also got to see the ROW_NUMBER and RANK Window Functions in practice, including how to work around some common issues that arise when using them and other Window Functions.
In the next chapter, we’ll continue our exploration of Window Functions with a deeper dive into using them to parse and generate graphs of real-world time-series temperature data and analyze fruit sales across a chain of stores.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset