Working with the EXPLAIN statement

The EXPLAIN statement is the main tool to understand how a statement is executed within a server. In MariaDB 10, this works not only with the SELECT statements, but also with the UPDATE and DELETE statements. The syntax of the EXPLAIN statement is:

EXPLAIN [EXTENDED] <statement>;

The EXTENDED clause adds a column to the output and generates a note (which can be seen with the SHOW WARNINGS command) containing statement as it has been internally rewritten by the optimizer.

After MariaDB 10, another property of this command was added, shown as follows:

SHOW EXPLAIN FOR <thread_id>;

This command allows us to obtain the execution plan from a running statement. This is useful when a statement is taking a lot of time and we want to know the reason. To see the running queries and related thread IDs, SHOW PROCESSLIST can be used.

For example, to check how a query is executed and then see how it is rewritten, we can run the following code:

MariaDB [test]> EXPLAIN EXTENDED SELECT a, b FROM t WHERE a = 1 ORDER BY a DESC G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: idx_a
          key: idx_a
      key_len: 5
          ref: const
         rows: 1165569
     filtered: 100.00
        Extra:
1 row in set, 1 warning (0.00 sec)

MariaDB [test]> SHOW WARNINGS G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select 'test'.'t'.'a' AS 'a','test'.'t'.'b' AS 'b' from 'test'.'t' where ('test'.'t'.'a' = 1) order by 'test'.'t'.'a' desc
1 row in set (0.00 sec)
To see which queries are running, and then check how one of them is executed:
MariaDB [(none)]> SHOW PROCESSLIST G
*************************** 1. row ***************************
      Id: 12
    User: root
    Host: localhost
      db: NULL
 Command: Query
    Time: 0
   State: Table lock
    Info: SHOW PROCESSLIST
Progress: 0.000
*************************** 2. row ***************************
      Id: 37
    User: root
    Host: localhost
      db: test
 Command: Query
    Time: 1
   State: Sending data
    Info: SELECT a, b FROM t WHERE a = 1 ORDER BY a DESC
Progress: 0.000
3 rows in set (0.00 sec)

MariaDB [(none)]> SHOW EXPLAIN FOR 37 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t
         type: ref
possible_keys: idx_a
          key: idx_a
      key_len: 5
          ref: const
         rows: 1165569
        Extra:
1 row in set, 1 warning (0.00 sec)

MariaDB [(none)]> SHOW WARNINGS;
+-------+------+------------------------------------------------+
| Level | Code | Message                                        |
+-------+------+------------------------------------------------+
| Note  | 1003 | SELECT a, b FROM t WHERE a = 1 ORDER BY a DESC |
+-------+------+------------------------------------------------+
1 row in set (0.00 sec)

If the statement execution ends after we send the SHOW EXPLAIN command, we will see an error similar to the following:

MariaDB [(none)]> SHOW EXPLAIN FOR 37 G
ERROR 1933 (HY000): Target is not running an EXPLAINable command

This is worth mentioning because it may be confusing. The SELECT, UPDATE, and DELETE statements are always EXPLAINable. The INSERT statements do not cause this error, but they do not produce a useful result.

Understanding the output of EXPLAIN

In this section, we will analyze the output of the EXPLAIN command. In the meantime, we will also discuss how some queries are executed, and the most important optimization strategies used by MariaDB.

For the JOIN and UNION queries, the output of EXPLAIN contains one row for each simple SELECT or UNION operation. It consists of the following columns:

  • id: This column defines a unique identifier for each row.
  • select_type: This column defines the type of the SELECT command.
  • table: This column defines the table read by the SELECT command.
  • partitions: This column defines a list of partitions that will be accessed.
  • type: This column defines the type of JOIN.
  • possible_key: This column defines a list of keys that can be used to execute a statement.
  • key: This column defines the key that the optimizer decides to use. If this value is NULL, a full table scan is done.
  • key_len: This column defines the length (size) of the selected key, in bytes.
  • ref: This column defines the columns used to join two tables.
  • rows: This column gives an estimate of how many rows will be examined.
  • filtered: This column gives an estimate of the percentage of rows that will not be returned (shown only with EXPLAIN EXTENDED).
  • Extra: This column defines some additional information.

Now, let's see how to interpret these columns in some practical cases.

Simple SELECT statements

Let's start our series of practical examples with a simple SELECT statement that involves only one table. Here is the table that we will use:

CREATE TABLE 'user' (
  'email' char(100) NOT NULL,
  'username' char(20) NOT NULL,
  'password_md5' binary(32) NOT NULL,
  'first_name' char(30) NOT NULL,
  'last_name' char(30) NOT NULL,
  'birth_date' date DEFAULT NULL,
  'id' smallint(6) NOT NULL AUTO_INCREMENT,
  'sex' char(1) NOT NULL,
  PRIMARY KEY ('id'),
  UNIQUE KEY 'email' ('email'),
  UNIQUE KEY 'username' ('username'),
  KEY 'idx_birth' ('birth_date'),
  KEY 'idx_birth_sex' ('birth_date','sex')
) ENGINE=InnoDB

After creating the table, we will run a simple query shown as follows:

MariaDB [test]> EXPLAIN SELECT birth_date, sex, COUNT(*) FROM user GROUP BY birth_date, sexG
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: index
possible_keys: NULL
          key: idx_birth_sex
      key_len: 7
          ref: NULL
         rows: 6
        Extra: Using index
1 row in set (0.00 sec)

The value of type is index, which means that an index is used to execute the query. The used index is idx_birth_sex, as showed in the key column. The possible_keys column tells us that there were no alternatives. The used index contains all the columns that need to be read to execute the query, so it is a covering index; this is what the Extra column tells us.

The idx_birth index was not considered useful here, because it does not contain all the columns referenced in the GROUP BY column. Since all the data contained in idx_birth is also contained in idx_birth_sex, we should probably drop idx_birth_date. Remember that each index causes an overhead for the write operations and for the query optimization stage; so, they should not exist if they are useless.

Internal temporary tables or files

Sometimes, MariaDB needs to transparently create an internal temporary table to execute a query. If data needs to be read more than once, copying it into a temporary table allows the release of table locks immediately after the copy operation. However, if the data to be copied is too big, this can cost a lot, which MariaDB tries to avoid.

Internal temporary tables are used in the following cases:

  • For views that aggregate data, or are defined with the TEMPTABLE algorithm
  • For UNION operations
  • If both the ORDER BY and GROUP BY clauses are specified and they are not equal
  • If a JOIN clause is used and the ORDER BY or GROUP BY clause contains columns that are not located in the first read table
  • If both DISTINCT and ORDER BY are present
  • When a subquery or derived table needs to be materialized

If a temporary table is used, the Extra column of the EXPLAIN output contains Using filesort or Using temporary.

By default, temporary tables use the Aria storage engine; this usually speeds up the GROUP BY and DISTINCT operations. If the compile option of aria_used_for_temp_tables is set to 0, the temporary tables use the MyISAM storage engine. We may want to do so if an Aria bug affects us, or if this storage engine is not efficient enough for some of our queries. But these cases will be rare.

Usually, temporary tables are stored in memory. If its size exceeds the tmp_table_size or max_heap_table_size server variables, the temporary table is written on disk. There are also some cases when the internal temporary tables are always stored on disk, as follows:

  • When a TEXT or BLOB column is read.
  • When a GROUP BY or DISTINCT clause contains a column larger than 512 bytes (not characters). This also applies to UNION DISTINCT.
  • For UNION ALL operations.

The UNION queries

The UNION queries are not very different from the simple SELECT commands that we discussed earlier. Each SELECT statement in the UNION query is optimized independently. Let's see how the UNION queries are explained, as shown in the following code:

MariaDB [test]> EXPLAIN (SELECT first_name, last_name, birth_date FROM user WHERE last_name LIKE 'A%') UNION (SELECT first_name, last_name, birth_date FROM user WHERE last_name LIKE 'C%'),
+------+--------------+------------+-------+---------------+----------+---------+------+------+-----------------------+
| id   | select_type  | table      | type  | possible_keys | key      | key_len | ref  | rows | Extra                 |
+------+--------------+------------+-------+---------------+----------+---------+------+------+-----------------------+
|    1 | PRIMARY      | user       | range | idx_last      | idx_last | 90      | NULL |    1 | Using index condition |
|    2 | UNION        | user       | range | idx_last      | idx_last | 90      | NULL |    1 | Using index condition |
| NULL | UNION RESULT | <union1,2> | ALL   | NULL          | NULL     | NULL    | NULL | NULL |                       |
+------+--------------+------------+-------+---------------+----------+---------+------+------+-----------------------+
3 rows in set (0.00 sec)

We see three records in this output. The first row is of type PRIMARY; this is the first executed SELECT command. Then, we have a row of type UNION; this type is used for all the subsequent queries. The last row is of type UNION RESULT; this refers to the UNION operation.

Simple index access methods

MariaDB can use several index access methods. One of them is called range. It is used to extract intervals of values from an index. The following two examples use the range access method:

MariaDB [test]> EXPLAIN SELECT first_name, last_name, birth_date FROM user WHERE birth_date BETWEEN '1994-01-01' AND '1994-12-31' G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: range
possible_keys: idx_birth,idx_birth_sex
          key: idx_birth
      key_len: 4
          ref: NULL
         rows: 2
        Extra: Using index condition
1 row in set (0.00 sec)
MariaDB [test]> EXPLAIN SELECT first_name, last_name, birth_date FROM user WHERE last_name LIKE 'B%' G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: user
         type: range
possible_keys: idx_last
          key: idx_last
      key_len: 90
          ref: NULL
         rows: 2
        Extra: Using index condition
1 row in set (0.00 sec)

As we can see, the type column tells us if the range method is used.

The HASH indexes allow the range access method too, but each range of values can only consist of a single value, and the comparison must be expressed with the = or IN operators.

Another index access method is index_merge. It is used to scan ranges of values from more than one index. When this method is used, index_merge is displayed in the type column.

Index optimizations of the JOIN clause

When performing a JOIN clause between two or more tables, the order in which the tables are read is often important. For example, if a table contains 10,000 rows, and another table consists of only 1,000 rows, it is preferable that the smaller table be read first. If the WHERE clause excludes some rows, the matching rows on the greater table will not be searched. The order in which tables are read is the same in which they appear in the output of EXPLAIN.

Engine-independent statistics can help the optimizer to choose the best order. MariaDB stores histograms representing the distribution of values of indexed and nonindexed columns. Thus, sometimes, the optimizer knows how much a certain comparison is selective; for example, qty=1 may be much more selective than qty=100, and the histograms are stored to detect this.

It is also possible to force MariaDB to read the tables in the same order they appear in a JOIN clause by using the STRAIGHT_JOIN clause.

Optimization of subqueries

In relational theory, a semi-join is a SELECT statement that contains a subquery in the WHERE clause. Usually, such subqueries are better executed as JOIN statements, which extract only columns from a table. For example, if we want to know the average of the price of products in a certain category, the following syntax is probably the most intuitive:

SELECT AVG(price) FROM product WHERE cat_id =
  (SELECT id FROM category WHERE id = 42);

The same result can be obtained with a JOIN query:

SELECT AVG(p.price)
  FROM product p
  LEFT JOIN category c
  ON p.cat_id = c.id
  WHERE c.id = 42;

A MariaDB optimization known as table pullout consists of translating semijoins into a complete JOIN. If this optimization is not applied, the query can be very slow. Table pullout was not supported in MariaDB versions older than 5.3. In modern MariaDB versions, if a subquery is slow, we should check whether the optimizer rewrites it as a JOIN. If this does not happen, we should try to do it ourselves.

Another important optimization is the FirstMatch strategy. It is used for IN subqueries, where the subquery execution may stop as soon as a record is found. This optimization is used when the EXPLAIN command shows FirstMatch(tableNumber) in the Extra column.

LooseScan is a strategy used for particular IN subqueries, used to read groups of records from a table. For example:

SELECT * FROM Country
WHERE Country.code in (SELECT country_code FROM Satellite);

This subquery retrieves all the values of country_code from Satellite, many of which are probably identical. This optimization consists of grouping such values, and joining them with the left table. This avoids the production of duplicate record combinations. If this strategy is applied, the Extra column of the EXPLAIN output contains LooseScan.

Older versions of MariaDB as well as MySQL are used to materialize all the derived tables, that is, the subqueries used in a FROM clause. This may be very slow, especially if the materialized data is large. This is generally avoided in MariaDB whenever possible. In this case, the query is transformed to read the same data without using any derived tables. If the optimization is not possible, the Extra column of the EXPLAIN output shows Using temporary as a result. Even in this case, another optimization is still possible in recent versions of MariaDB by adding a key to the materialized table.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset