The EXPLAIN
statement is the main tool to understand how a statement is executed within a server. In MariaDB 10, this works not only with the SELECT
statements, but also with the UPDATE
and DELETE
statements. The syntax of the EXPLAIN
statement is:
EXPLAIN [EXTENDED] <statement>;
The EXTENDED
clause adds a column to the output and generates a note (which can be seen with the SHOW WARNINGS
command) containing statement
as it has been internally rewritten by the optimizer.
After MariaDB 10, another property of this command was added, shown as follows:
SHOW EXPLAIN FOR <thread_id>;
This command allows us to obtain the execution plan from a running statement. This is useful when a statement is taking a lot of time and we want to know the reason. To see the running queries and related thread IDs, SHOW PROCESSLIST
can be used.
For example, to check how a query is executed and then see how it is rewritten, we can run the following code:
MariaDB [test]> EXPLAIN EXTENDED SELECT a, b FROM t WHERE a = 1 ORDER BY a DESC G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t type: ref possible_keys: idx_a key: idx_a key_len: 5 ref: const rows: 1165569 filtered: 100.00 Extra: 1 row in set, 1 warning (0.00 sec) MariaDB [test]> SHOW WARNINGS G *************************** 1. row *************************** Level: Note Code: 1003 Message: select 'test'.'t'.'a' AS 'a','test'.'t'.'b' AS 'b' from 'test'.'t' where ('test'.'t'.'a' = 1) order by 'test'.'t'.'a' desc 1 row in set (0.00 sec) To see which queries are running, and then check how one of them is executed: MariaDB [(none)]> SHOW PROCESSLIST G *************************** 1. row *************************** Id: 12 User: root Host: localhost db: NULL Command: Query Time: 0 State: Table lock Info: SHOW PROCESSLIST Progress: 0.000 *************************** 2. row *************************** Id: 37 User: root Host: localhost db: test Command: Query Time: 1 State: Sending data Info: SELECT a, b FROM t WHERE a = 1 ORDER BY a DESC Progress: 0.000 3 rows in set (0.00 sec) MariaDB [(none)]> SHOW EXPLAIN FOR 37 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t type: ref possible_keys: idx_a key: idx_a key_len: 5 ref: const rows: 1165569 Extra: 1 row in set, 1 warning (0.00 sec) MariaDB [(none)]> SHOW WARNINGS; +-------+------+------------------------------------------------+ | Level | Code | Message | +-------+------+------------------------------------------------+ | Note | 1003 | SELECT a, b FROM t WHERE a = 1 ORDER BY a DESC | +-------+------+------------------------------------------------+ 1 row in set (0.00 sec)
If the statement execution ends after we send the SHOW EXPLAIN
command, we will see an error similar to the following:
MariaDB [(none)]> SHOW EXPLAIN FOR 37 G ERROR 1933 (HY000): Target is not running an EXPLAINable command
This is worth mentioning because it may be confusing. The SELECT
, UPDATE
, and DELETE
statements are always EXPLAINable
. The INSERT
statements do not cause this error, but they do not produce a useful result.
In this section, we will analyze the output of the EXPLAIN
command. In the meantime, we will also discuss how some queries are executed, and the most important optimization strategies used by MariaDB.
For the JOIN
and UNION
queries, the output of EXPLAIN
contains one row for each simple SELECT
or UNION
operation. It consists of the following columns:
id
: This column defines a unique identifier for each row.select_type
: This column defines the type of the SELECT
command.table
: This column defines the table read by the SELECT
command.partitions
: This column defines a list of partitions that will be accessed.type
: This column defines the type of JOIN
.possible_key
: This column defines a list of keys that can be used to execute a statement.key
: This column defines the key that the optimizer decides to use. If this value is NULL
, a full table scan is done.key_len
: This column defines the length (size) of the selected key, in bytes.ref
: This column defines the columns used to join two tables.rows
: This column gives an estimate of how many rows will be examined.filtered
: This column gives an estimate of the percentage of rows that will not be returned (shown only with EXPLAIN EXTENDED
).Extra
: This column defines some additional information.Now, let's see how to interpret these columns in some practical cases.
Let's start our series of practical examples with a simple SELECT
statement that involves only one table. Here is the table that we will use:
CREATE TABLE 'user' ( 'email' char(100) NOT NULL, 'username' char(20) NOT NULL, 'password_md5' binary(32) NOT NULL, 'first_name' char(30) NOT NULL, 'last_name' char(30) NOT NULL, 'birth_date' date DEFAULT NULL, 'id' smallint(6) NOT NULL AUTO_INCREMENT, 'sex' char(1) NOT NULL, PRIMARY KEY ('id'), UNIQUE KEY 'email' ('email'), UNIQUE KEY 'username' ('username'), KEY 'idx_birth' ('birth_date'), KEY 'idx_birth_sex' ('birth_date','sex') ) ENGINE=InnoDB
After creating the table, we will run a simple query shown as follows:
MariaDB [test]> EXPLAIN SELECT birth_date, sex, COUNT(*) FROM user GROUP BY birth_date, sexG *************************** 1. row *************************** id: 1 select_type: SIMPLE table: user type: index possible_keys: NULL key: idx_birth_sex key_len: 7 ref: NULL rows: 6 Extra: Using index 1 row in set (0.00 sec)
The value of type
is index
, which means that an index is used to execute the query. The used index is idx_birth_sex
, as showed in the key
column. The possible_keys
column tells us that there were no alternatives. The used index contains all the columns that need to be read to execute the query, so it is a covering index; this is what the Extra
column tells us.
The idx_birth
index was not considered useful here, because it does not contain all the columns referenced in the GROUP BY
column. Since all the data contained in idx_birth
is also contained in idx_birth_sex
, we should probably drop idx_birth_date
. Remember that each index causes an overhead for the write
operations and for the query optimization stage; so, they should not exist if they are useless.
Sometimes, MariaDB needs to transparently create an internal temporary table to execute a query. If data needs to be read more than once, copying it into a temporary table allows the release of table locks immediately after the copy operation. However, if the data to be copied is too big, this can cost a lot, which MariaDB tries to avoid.
Internal temporary tables are used in the following cases:
TEMPTABLE
algorithmUNION
operationsORDER BY
and GROUP BY
clauses are specified and they are not equalJOIN
clause is used and the ORDER BY
or GROUP BY
clause contains columns that are not located in the first read tableDISTINCT
and ORDER BY
are presentIf a temporary table is used, the Extra
column of the EXPLAIN
output contains Using filesort
or Using temporary
.
By default, temporary tables use the Aria storage engine; this usually speeds up the GROUP BY
and DISTINCT
operations. If the compile option of aria_used_for_temp_tables
is set to 0
, the temporary tables use the MyISAM storage engine. We may want to do so if an Aria bug affects us, or if this storage engine is not efficient enough for some of our queries. But these cases will be rare.
Usually, temporary tables are stored in memory. If its size exceeds the tmp_table_size
or max_heap_table_size
server variables, the temporary table is written on disk. There are also some cases when the internal temporary tables are always stored on disk, as follows:
TEXT
or BLOB
column is read.GROUP BY
or DISTINCT
clause contains a column larger than 512 bytes (not characters). This also applies to UNION DISTINCT
.UNION ALL
operations.The
UNION
queries are not very different from the simple SELECT
commands that we discussed earlier. Each SELECT
statement in the UNION
query is optimized independently. Let's see how the UNION
queries are explained, as shown in the following code:
MariaDB [test]> EXPLAIN (SELECT first_name, last_name, birth_date FROM user WHERE last_name LIKE 'A%') UNION (SELECT first_name, last_name, birth_date FROM user WHERE last_name LIKE 'C%'), +------+--------------+------------+-------+---------------+----------+---------+------+------+-----------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------+------------+-------+---------------+----------+---------+------+------+-----------------------+ | 1 | PRIMARY | user | range | idx_last | idx_last | 90 | NULL | 1 | Using index condition | | 2 | UNION | user | range | idx_last | idx_last | 90 | NULL | 1 | Using index condition | | NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | | +------+--------------+------------+-------+---------------+----------+---------+------+------+-----------------------+ 3 rows in set (0.00 sec)
We see three records in this output. The first row is of type PRIMARY
; this is the first executed SELECT
command. Then, we have a row of type UNION
; this type is used for all the subsequent queries. The last row is of type UNION RESULT
; this refers to the UNION
operation.
MariaDB can use several index access methods. One of them is called range. It is used to extract intervals of values from an index. The following two examples use the range access method:
MariaDB [test]> EXPLAIN SELECT first_name, last_name, birth_date FROM user WHERE birth_date BETWEEN '1994-01-01' AND '1994-12-31' G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: user type: range possible_keys: idx_birth,idx_birth_sex key: idx_birth key_len: 4 ref: NULL rows: 2 Extra: Using index condition 1 row in set (0.00 sec) MariaDB [test]> EXPLAIN SELECT first_name, last_name, birth_date FROM user WHERE last_name LIKE 'B%' G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: user type: range possible_keys: idx_last key: idx_last key_len: 90 ref: NULL rows: 2 Extra: Using index condition 1 row in set (0.00 sec)
As we can see, the
type
column tells us if the range method is used.
The HASH
indexes allow the range access method too, but each range of values can only consist of a single value, and the comparison must be expressed with the =
or IN
operators.
Another index access method is index_merge
. It is used to scan ranges of values from more than one index. When this method is used, index_merge
is displayed in the type
column.
When performing a JOIN
clause between two or more tables, the order in which the tables are read is often important. For example, if a table contains 10,000 rows, and another table consists of only 1,000 rows, it is preferable that the smaller table be read first. If the WHERE
clause excludes some rows, the matching rows on the greater table will not be searched. The order in which tables are read is the same in which they appear in the output of EXPLAIN
.
Engine-independent statistics can help the optimizer to choose the best order. MariaDB stores histograms representing the distribution of values of indexed and nonindexed columns. Thus, sometimes, the optimizer knows how much a certain comparison is selective; for example, qty=1
may be much more selective than qty=100
, and the histograms are stored to detect this.
It is also possible to force MariaDB to read the tables in the same order they appear in a JOIN
clause by using the STRAIGHT_JOIN
clause.
In relational theory, a semi-join is a SELECT
statement that contains a subquery in the WHERE
clause. Usually, such subqueries are better executed as JOIN
statements, which extract only columns from a table. For example, if we want to know the average of the price of products in a certain category, the following syntax is probably the most intuitive:
SELECT AVG(price) FROM product WHERE cat_id = (SELECT id FROM category WHERE id = 42);
The same result can be obtained with a JOIN
query:
SELECT AVG(p.price) FROM product p LEFT JOIN category c ON p.cat_id = c.id WHERE c.id = 42;
A MariaDB optimization known as table pullout consists of translating semijoins into a complete JOIN
. If this optimization is not applied, the query can be very slow. Table pullout was not supported in MariaDB versions older than 5.3. In modern MariaDB versions, if a subquery is slow, we should check whether the optimizer rewrites it as a JOIN
. If this does not happen, we should try to do it ourselves.
Another important optimization is the
FirstMatch strategy. It is used for IN
subqueries, where the subquery execution may stop as soon as a record is found. This optimization is used when the EXPLAIN
command shows FirstMatch(tableNumber)
in the Extra
column.
LooseScan is a strategy used for particular IN
subqueries, used to read groups of records from a table. For example:
SELECT * FROM Country WHERE Country.code in (SELECT country_code FROM Satellite);
This subquery retrieves all the values of country_code
from Satellite
, many of which are probably identical. This optimization consists of grouping such values, and joining them with the left table. This avoids the production of duplicate record combinations. If this strategy is applied, the Extra
column of the EXPLAIN
output contains LooseScan
.
Older versions of MariaDB as well as MySQL are used to materialize all the derived tables, that is, the subqueries used in a FROM
clause. This may be very slow, especially if the materialized data is large. This is generally avoided in MariaDB whenever possible. In this case, the query is transformed to read the same data without using any derived tables. If the optimization is not possible, the Extra
column of the EXPLAIN
output shows Using temporary
as a result. Even in this case, another optimization is still possible in recent versions of MariaDB by adding a key to the materialized table.