© Daniel Bartholomew 2017
Daniel BartholomewMariaDB and MySQL Common Table Expressions and Window Functions Revealedhttps://doi.org/10.1007/978-1-4842-3120-3_1

1. Basics of Common Table Expressions

Daniel Bartholomew
(1)
Raleigh, North Carolina, USA
 
Common Table Expressions (CTEs) are one of the new SQL features introduced in MariaDB 10.2 and MySQL 8.0. This chapter will introduce CTEs, describe the two types, and explain the basic syntax. CTEs are named temporary result sets that only last for the duration of the query they are in. In some respects, they are similar to derived tables, but they are more powerful. They can refer to themselves recursively and can be referenced multiple times in the same query. They also enable column grouping and can be used as an alternative to views without our needing the CREATE VIEW permission. CTEs were first introduced as part of the SQL99 standard.

Before We Begin

Before we get into more detail on what CTEs are and what they can do, the examples in this chapter utilize sample data you can use to follow along with the text and experiment with CTEs yourself. The table used in this chapter is called employees and it can be created with the following query:
CREATE TABLE employees (
  id serial primary key,
  name VARCHAR(150) NOT NULL,
  title VARCHAR(100),
  office VARCHAR(100)
);
The data itself is in a CSV file called bartholomew-ch01.csv . It can be loaded with a query similar to the following (assuming the file is on the computer running MariaDB or MySQL server in the /tmp/ folder):
LOAD DATA INFILE '/tmp/bartholomew-ch01.csv'
  INTO TABLE employees
  FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '"';
If you are using MySQL 8.0 , then the secure_file_priv setting is on by default. In this case, you will either need to move the file to the location specified in your config or turn off the setting in your my.cnf or my.ini file.
On Linux, the default location for secure_file_priv is /var/lib/mysql-files/, and on Windows it is C:ProgramDataMySQLMySQLServer 8.0Uploads; you’ll need to move the files to that location before running the LOAD DATA command and then modify the command to point at that location instead of at the /tmp/ folder.
You can find out what your local MySQL installation’s secure_file_priv setting is with the following command:
SHOW VARIABLES LIKE 'secure_file_priv';
We’re now ready to begin.
Tip
When using the MySQL command-line client on Windows, you can use Linux-style paths with the LOAD DATA command. It’s also worth mentioning that if you instead choose to use Windows-style paths, you will need to use double backslashes (\) because the backslash character is used to escape other characters. For example, the following are equivalent:
LOAD DATA INFILE '/ProgramData/MySQL/MySQL Server 8.0/Uploads/file.csv'
LOAD DATA INFILE 'C:\ProgramData\MySQL\MySQL Server 8.0\Uploads\file.csv'

What Are Common Table Expressions?

Common Table Expressions are commonly referred to as CTEs. Think of them as the result of a query that has a name you can refer to later on in your query. If a named result set sounds a little like a view or a derived table, that’s because it is, but with some significant differences. We’ll get to those in a second. First, what does a CTE look like?

Basic CTE Syntax

The general syntax for a Common Table Expression is as follows:
WITH <cte_name> AS (
  <cte_body>
)
<cte_query>
The WITH and AS keywords are what distinguish a CTE from a normal query. If you see a query that begins with WITH ... AS then you are looking at a CTE. The parts in angle brackets <> are what you provide. Let’s go over the different parts now:
  • <cte_name> is the name we will use to refer to the CTE later in our query; it can be any valid name, i.e., not a reserved word or function name.
  • <cte_body> is just a SELECT statement that produces a result. This part is wrapped in parentheses ().
  • <cte_query> is where we reference the <cte_name> in a SQL query. For example:
    SELECT <select_criteria> FROM <cte_name> [WHERE ...]
  • <select _criteria> is your normal SELECT query criteria with optional WHERE and other clauses.
This might be a little difficult to visualize, so here’s a basic valid CTE where we define a single <cte_name> and then SELECT from it. This CTE uses the sample data we loaded at the beginning of the chapter, so feel free to run it on your MariaDB or MySQL server.
WITH emp_raleigh AS (
  SELECT * FROM employees
    WHERE office='Raleigh'
)
SELECT * FROM emp_raleigh
  WHERE title != 'salesperson'
  ORDER BY title;
Let’s break down what is happening here. Our <cte_name> is emp_raleigh, and it is a simple SELECT statement that selects every row in the employees table WHERE the office is Raleigh. You can think of this CTE as a view or filter of the employees table. Then, in the <cte_query> section, we use the <cte_name> as part of a simple query that looks for every entry where the employee is not a salesperson, and lastly it orders the results by their job title. Because the <cte_query> uses our emp_raleigh <cte_name>, our results will only come from records in the employees table WHERE office='Raleigh'.
Using our sample data, the result is:
+-----+-----------------+------------+---------+
| id  | name            | title      | office  |
+-----+-----------------+------------+---------+
|  73 | Mark Hamilton   | dba        | Raleigh |
|  77 | Nancy Porter    | dba        | Raleigh |
| 135 | Pauline Neal    | dba        | Raleigh |
|  68 | Edmund Hines    | manager    | Raleigh |
|  28 | Marc Greene     | programmer | Raleigh |
|  96 | Mary Walker     | programmer | Raleigh |
| 100 | Freida Duchesne | programmer | Raleigh |
+-----+-----------------+------------+---------+
So, apart from any salespeople there might be, our Raleigh office looks to be quite technical , just DBAs and programmers, apart from a single manager.

The Motivation for CTEs

The result from our simple example looks a lot like something you might use a derived table (AKA an inline view) or a view to get, both of which have existed in MariaDB and MySQL for years. Why would anyone ever want to use CTEs instead of the more familiar views or derived tables? Here are some reasons.

Temporary

First off, CTEs are temporary. A CTE is defined and used in the same query. A view, on the other hand, is more permanent and can be thought of as a somewhat permanent virtual table. This temporary nature of CTEs can be a good thing. Because the CTE and the query that uses it are all defined together, modifying it to keep up with updated business requirements is easy. Contrast that with a view, which needs to be updated separately from the queries that use it.
This temporary nature is part of the reason why derived tables are so popular; they let you quickly generate a useful temporary result set that you can perform operations on. CTEs build upon that usefulness with a more powerful set of features.

Readable

One big reason to use CTEs is because they are often more readable. Complex views or nested derived table queries can be hard for mere mortals to parse, often requiring reading the query inside-out, back-to-front, or some other unnatural order. Contrast that with CTEs, which generally can be read from top to bottom.
For example, here’s our simple CTE example rewritten as a derived table:
SELECT * FROM (
  SELECT * FROM employees
    WHERE office='Raleigh'
) AS emp_raleigh
WHERE title != 'salesperson'
  ORDER BY title;
And so you don’t have to go back and find it, here is the CTE version again:
WITH emp_raleigh AS (
  SELECT * FROM employees
    WHERE office='Raleigh'
)
SELECT * FROM emp_raleigh
  WHERE title != 'salesperson'
  ORDER BY title;
Unsurprisingly, the output of both queries, using our sample data, is the same. However, when you compare them, the CTE can be understood by simply reading it from beginning to end. To understand the derived table, on the other hand, you need to first read the inner SELECT statement and then jump up to the outer SELECT statement , and then back down to the end. On a simple example like this, the extra difficulty compared to the CTE is minimal, but as a derived table query becomes more complex, the difficulty in reading it goes up exponentially. For a CTE, however, the difficulty goes up in a more linear fashion because you can always just read from beginning to end, naturally.

Using in One or Many Places

Following on from the previous section, another reason to use CTEs is if you need something complex for just one query as opposed to something that will be used many times in many different queries. For example, if your underlying sales table stores invoice dates using a Unix timestamp , but several of your applications expect YYYY-MM-DD whenever they query the table, a view would be an excellent solution; just define the view and have your applications call that. On the other hand, a complex view only used once in a single application might be more maintainable if rewritten as an easier-to-read CTE.

Permissions

When working on your own personal databases on your own workstation or server, your database user generally has the ALL PRIVILEGES WITH GRANT OPTION permissions, which means you can do anything you need or want to your tables and databases, including CREATE, UPDATE, INSERT, DELETE, and so on. Or, you might regularly just log in as the root database user, which automatically has all permissions. Databases used in production, however, generally have more granular access defined. Some users are only able to SELECT from tables in certain databases, while others can make inserts in some tables but not in others, and yet other users are given more or fewer grants depending on their various job functions.
You may find yourself in need of something like a view on a table that you do not have the CREATE VIEW permission on. CTEs only require the SELECT permission, so in this case using a CTE is a great way to get what you need without having to pester one of the DBAs to either create the view you need for you or asking them to grant you the CREATE VIEW permission on the table you need it on, which they may be unable to do because of company policies.

Nesting

CTEs bring several new tricks to our DBA toolbox, one of which is that in each individual <cte_body> we can refer to other CTEs. This solves a big problem with nested derived tables where every level of nesting  greatly increases the complexity.
For example, let’s expand upon our simple CTE example and drill further into our data by defining a second <cte_name> with its accompanying <cte_body> that selects just the DBAs in the Raleigh office:
WITH emp_raleigh AS (
  SELECT * FROM employees
    WHERE office='Raleigh'
),
emp_raleigh_dbas AS (
  SELECT * from emp_raleigh
    WHERE title='dba'
)
SELECT * FROM emp_raleigh_dbas;
Looking at this code, we have our original <cte_name>, emp_ raleigh , and its <cte_body>. We then define a second <cte_name>, emp_raleigh_dbas, and its <cte_body>. emp_raleigh_dbas builds upon emp_raleigh by only looking for records WHERE title='dba'. Lastly, in the <cte_query> section we SELECT everything from emp_raleigh_dbas. Syntactically, this is much more readable than the equivalent query written using a nested derived table.
Using our sample data, the result is as follows:
+-----+---------------+-------+---------+
| id  | name          | title | office  |
+-----+---------------+-------+---------+
|  73 | Mark Hamilton | dba   | Raleigh |
|  77 | Nancy Porter  | dba   | Raleigh |
| 135 | Pauline Neal  | dba   | Raleigh |
+-----+---------------+-------+---------+

Multiplexing

Building upon the ability to define multiple <cte_name>s in the same query with their corresponding <cte_body>s, we have the ability to refer to a given <cte_name> multiple times, either in a following <cte_body> section or in the <cte_query> section. As an example of referring to a single <cte_name> multiple times, here is an anti-self-join that looks for DBAs that are the only DBA at their particular office:
WITH dbas AS (
  SELECT * FROM employees
    WHERE title='dba'
)
SELECT * FROM dbas A1
  WHERE NOT EXISTS (
    SELECT 1 FROM dbas A2
      WHERE
        A2.office=A1.office
      AND
        A2.name <> A1.name
  );
Here, our dbas <cte_name> simply selects all of the DBAs in the company, then in our ending <cte_query> we refer to dbas two times in order to filter out all DBAs except those we are interested in.
Using our sample data, the result is as follows:
+----+---------------+-------+---------+
| id | name          | title | office  |
+----+---------------+-------+---------+
|  6 | Toby Lucas    | dba   | Wichita |
| 16 | Susan Charles | dba   | Nauvoo  |
+----+---------------+-------+---------+
I think management should make sure Toby and Susan visit other offices in the company a few times every year, just so they don’t feel isolated from the other DBAs in the company.

Recursion

The last reason why CTEs are so useful is that they can be recursive. Within their own <cte_body> they can call themselves. This technique provides a lot of power. But let’s not get ahead of ourselves—there’s a whole chapter devoted to these types of CTE queries, so we won’t talk about it more here.

Summary

In this chapter, we covered the basic syntax of CTEs as well as some of the reasons why this feature was added to the SQL standard and is now being added to MariaDB and MySQL. We also went through a couple of simple examples of non-recursive CTEs.
We’ll dive deeper into non-recursive CTEs in the next chapter, then look at recursive CTEs in Chapter 3.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset