© Daniel Bartholomew 2017
Daniel BartholomewMariaDB and MySQL Common Table Expressions and Window Functions Revealedhttps://doi.org/10.1007/978-1-4842-3120-3_2

2. Non-recursive Common Table Expressions

Daniel Bartholomew
(1)
Raleigh, North Carolina, USA
 
You already got a taste of non-recursive CTEs in the previous chapter. This chapter will expand upon the previous examples and show more of the things you can do with non-recursive CTEs. In this chapter, we’ll cover some common uses of CTEs and finish with how to convert existing queries that use subqueries into queries that use CTEs.

Before We Begin

As with the previous chapter, the examples in this chapter utilize sample data . In addition to the employees table we used previously, in this chapter we’ll use a table called commissions . This table can be created with the following query :
CREATE TABLE commissions (
  id serial primary key,
  salesperson_id BIGINT(20) NOT NULL,
  commission_id BIGINT(20) NOT NULL,
  commission_amount DECIMAL(12,2) NOT NULL,
  commission_date DATE NOT NULL
);
The data is in a CSV file called bartholomew-ch02.csv . It can be loaded with a query similar to the following (assuming the file is on the computer running MariaDB or MySQL server in the /tmp/ folder):
LOAD DATA INFILE '/tmp/bartholomew-ch02.csv'
  INTO TABLE commissions
  FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '"';
Note
See the “Before We Begin” section of Chapter 1 for extra information about loading the files on Windows and working around issues with secure_file_priv.
We’re now ready to begin.

Using CTEs for Year-over-Year Comparisons

In the previous chapter, we introduced the ability to refer to a CTE multiple times in a single query. Let’s explore a more substantial example.
One thing many companies like to track is how sales improve (or not) year-over-year. In our sample commissions table , we track the commissions each of the company’s 55 salespersons have earned and when they earned them. One day, the CEO comes to us and says that he wants to compare how the salespersons are doing from one year to the next. A fairly straightforward traditional SQL query can easily get us the data we want, grouped by salesperson and year:
SELECT
  salesperson_id,
  YEAR(commission_date) AS year,
  SUM(commission_amount) AS total
FROM
  commissions
GROUP BY
  salesperson_id, year;
Our sample data contains commissions data for the years 2016 and 2017, so this query gives us 110 rows—two rows for each of our 55 salespersons. Here’s the truncated result:
+----------------+------+----------+
| salesperson_id | year | total    |
+----------------+------+----------+
|              3 | 2016 | 2249.93  |
|              3 | 2017 | 3449.67  |
|              7 | 2016 | 1088.32  |
|              7 | 2017 | 3197.25  |
|              8 | 2016 | 4514.73  |
|              8 | 2017 | 5178.19  |
|             10 | 2016 | 9433.58  |
|             10 | 2017 | 8479.05  |
...
+----------------+------+----------+
We could call this complete and send the data to the CEO, but by using a CTE we can do much better.
Using our initial query as our <cte_body>, we can select from the data twice, using a WHERE clause to set up the condition to select a given year and the previous year together. Here’s what it could look like:
WITH commissions_year AS (
  SELECT
    salesperson_id,
    YEAR(commission_date) AS year,
    SUM(commission_amount) AS total
  FROM
    commissions
  GROUP BY
    salesperson_id, year
)
SELECT *
FROM
  commissions_year CUR,
  commissions_year PREV
WHERE
  CUR.salesperson_id=PREV.salesperson_id AND
  CUR.year=PREV.year + 1;
After setting up the CTE, which we called commissions_year , we selected from it twice—once as CUR and once as PREV. The WHERE clause is where we match the salesperson_id fields from both and set up the condition that we’re comparing a year with another year that is one more (+1) from it.
This time, the output looks like the following:
+----------------+------+----------+----------------+------+----------+
| salesperson_id | year | total    | salesperson_id | year | total    |
+----------------+------+----------+----------------+------+----------+
|              3 | 2017 | 3449.67  |              3 | 2016 | 2249.93  |
|              7 | 2017 | 3197.25  |              7 | 2016 | 1088.32  |
|              8 | 2017 | 5178.19  |              8 | 2016 | 4514.73  |
|             10 | 2017 | 8479.05  |             10 | 2016 | 9433.58  |
...
+----------------+------+----------+----------------+------+----------+
This presentation is better, but we could do without the duplicated salesperson_id columns , and while we’re at it, we should JOIN with the employees table to get the employee name in the output (something our CEO would appreciate).
We’ll add the JOIN to the FROM clause in our <cte_body>, then we’ll select just the columns we want from the <cte_query> section to be in the output. With those modifications , our complete CTE now looks like this:
WITH commissions_year AS (
  SELECT
    employees.id AS sp_id,
    employees.name AS salesperson,
    YEAR(commission_date) AS year,
    SUM(commission_amount) AS total
  FROM
    commissions LEFT JOIN employees
      ON commissions.salesperson_id = employees.id
  GROUP BY
    sp_id, year
)
SELECT CUR.sp_id, CUR.salesperson, PREV.year, PREV.total, CUR.year, CUR.total
FROM
  commissions_year CUR,
  commissions_year PREV
WHERE
  CUR.sp_id=PREV.sp_id AND
  CUR.year=PREV.year + 1;
And our output now looks like the following:
+-------+---------------------+------+----------+------+----------+
| sp_id | salesperson         | year | total    | year | total    |
+-------+---------------------+------+----------+------+----------+
|     3 | Evelyn Alexander    | 2016 | 2249.93  | 2017 | 3449.67  |
|     7 | John Conner         | 2016 | 1088.32  | 2017 | 3197.25  |
|     8 | Leo Gutierrez       | 2016 | 4514.73  | 2017 | 5178.19  |
|    10 | Ryan Fletcher       | 2016 | 9433.58  | 2017 | 8479.05  |
...
+-------+---------------------+------+----------+------+----------+
Now it is very easy for the CEO to see at a glance that while Evelyn, John, and Leo increased their commissions from 2016 to 2017, Ryan’s commissions are down by around $1,000 from 2016 to 2017. Maybe he needs some coaching from his manager?
And if the CEO comes back and wants a filtered list showing just those salespersons whose sales went down from year to year, we can simply add the following to the end of our <cte_query> section:
AND CUR.total < PREV.total;
An equivalent analytical query using derived tables would be much larger and not nearly as readable and concise.

Comparing Individuals Against Their Group

One annoying issue with using subqueries is when you have to copy and paste them multiple times in your query. These duplicated FROM (SELECT ...) statements are prime locations for errors, especially when something changes and you need to update every single one of them. CTEs provide a way to eliminate this duplication. A given <cte_body> is defined once and tied to a single <cte_name>. Whenever you need it, you just reference the <cte_name>, and if something needs to be updated, you just have to update the <cte_body> in one place.
Using the same base CTE as from the previous example, we can modify the SELECT statement after it to easily perform a different kind of analytical query, one that traditionally would have used duplicated FROM (SELECT...) statements. This time, instead of comparing salespersons to their own performance from one year to the next, we’ll compare them to all salespersons. In particular, in his next company-wide email, the CEO wants to give a shout-out to all of the salespersons who made at least 2 percent of the total commissions earned by all salespersons in the entire company during 2017.
WITH commissions_year AS (
  SELECT
    employees.id AS sp_id,
    employees.name AS salesperson,
    YEAR(commission_date) AS year,
    SUM(commission_amount) AS total
  FROM
    commissions LEFT JOIN employees
      ON commissions.salesperson_id = employees.id
  GROUP BY
    sp_id, year
)
SELECT *
FROM
  commissions_year C1
WHERE
  total > ( SELECT
              0.02*SUM(total)
            FROM
              commissions_year C2
            WHERE
              C2.year = C1.year
              AND C2.year = 2017)
ORDER BY
  total DESC;
The <cte_body> for commissions_ year is unchanged from our previous example, so there’s no need to go over it. The difference is all in the <cte_query> section. Our SELECT statement looks at individual totals that make up at least 2 percent of the total of all commissions for the 2017 year, then orders everything DESC so the top earner is on top.
Doing this sort of query in the traditional way would mean each FROM in the <cte_query> section would be a copy-pasted FROM (SELECT...) statement.
If you’ve been following along with the examples, the output of the preceding code looks like the following:
+-------+------------------+------+----------+
| sp_id | salesperson      | year | total    |
+-------+------------------+------+----------+
|   116 | Christian Reeves | 2017 | 13856.74 |
|    69 | Luis Vaughn      | 2017 | 12570.95 |
|   128 | Stephanie Dawson | 2017 | 12253.44 |
|    38 | Dorothy Anderson | 2017 | 12010.91 |
|    78 | Louis Santiago   | 2017 | 11423.48 |
|   131 | Rene Gibbs       | 2017 | 11147.38 |
|   121 | Christina Terry  | 2017 | 10979.07 |
|    53 | Jennifer Moore   | 2017 | 10967.64 |
|   114 | Veronica Boone   | 2017 | 10651.10 |
|    41 | Terrance Reese   | 2017 | 10219.33 |
|   132 | Alan Carroll     | 2017 | 10066.97 |
|    34 | Bobby French     | 2017 | 9928.69  |
|   105 | Alonzo Page      | 2017 | 9782.69  |
|    66 | Kathryn Barnes   | 2017 | 9433.56  |
|   106 | Bradley Black    | 2017 | 9387.77  |
|   118 | Deborah Peterson | 2017 | 9265.96  |
|    79 | Rafael Sandoval  | 2017 | 9055.54  |
+-------+------------------+------+----------+
Let’s give a round of applause to the members of the 2017 2 Percent Club !

Translating Subqueries into CTEs

Let’s switch gears and talk about the process for taking an existing query and translating it into a CTE. It is fairly simple. To illustrate this, here’s a query that is equivalent to our query from the previous example, but instead of using a CTE, it uses subqueries in the form of two identical FROM (SELECT...) statements:
SELECT *
FROM (
  SELECT
    employees.id AS sp_id,
    employees.name AS salesperson,
    YEAR(commission_date) AS year,
    SUM(commission_amount) AS total
  FROM
    commissions LEFT JOIN employees
      ON commissions.salesperson_id = employees.id
  GROUP BY
    sp_id, year
   ) AS C1
WHERE
  total > ( SELECT
              0.02*SUM(total)
            FROM (
              SELECT
                employees.id AS sp_id,
                employees.name AS salesperson,
                YEAR(commission_date) AS year,
                SUM(commission_amount) AS total
              FROM
                commissions LEFT JOIN employees
                  ON commissions.salesperson_id = employees.id
              GROUP BY
                sp_id, year
              ) AS C2
            WHERE
              C2.year = C1.year
              AND C2.year = 2017)
ORDER BY
  total DESC;
Because of the duplicated FROM (SELECT...) statements, this query is 33 lines long, as opposed to the 25-line CTE that does the same thing. An eight-line difference isn’t much, but more-complex queries could include five, seven, eleven, or even more duplicated subqueries, causing an almost exponential increase in the size of the query. This could quickly turn into a maintenance nightmare if something in the underlying table(s) changes and we need to update the query.
To convert a query with duplicated subqueries into a CTE, there are only three steps (four if you need to repeat the process for additional duplicated subqueries ):
  1. 1.
    Locate the first occurrence of the derived table query and copy it above the SELECT line, wrapping it in WITH <cte_name> AS (and then an ending).
     
  2. 2.
    Replace that first occurrence of the subquery with whatever we put as the <cte_name>.
     
  3. 3.
    Go through the rest of the query and find the additional identical subqueries and replace them with <cte_name> as well.
     
  4. 4.
    (Optional) Repeat the process for any other duplicated subqueries in your query, if any.
     
Not every query will be a perfect fit for this process, but many queries are.

Summary

In this chapter, we expanded upon the examples from the previous chapter and went through a couple more-substantial examples that illustrated how CTEs are superior to subqueries, particularly in how they help us avoid duplicated subqueries that are hard to modify and maintain. We then finished with a discussion of how to take a query that uses subqueries and convert it into a query that uses CTEs instead.
To round out the CTE section of this book, the next chapter will cover what I think is the most exciting part of CTEs—recursion.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset