If the partitioning expression and the partition type have been carefully selected, most queries will only involve one partition or a set of partitions.
In many cases, the optimizer will find out which partitions are not relevant for the current query. This optimization is called partition pruning.
Also, the user can use a SQL clause to specify the list of partitions that must be used. This is called partition selecting.
However, in MariaDB, queries are never parallelized. Even if the optimizer knows that two partitions must be read, and those partitions are on different disks, the same thread will read them sequentially. In particular, one can expect full table scans and full index scans on partitioned tables to run much faster because of parallelization, but this is not the case.
In MariaDB, partition pruning is possible with RANGE
and LIST
partitioning, but not with RANGE COLUMNS
or LIST COLUMNS
. When a statement references the columns used by the partitioning expression, the optimizer is usually able to determine whether one or more partitions can be excluded by the query plan. Often, it can exclude all partitions but one. In some cases, it can even detect an Impossible WHERE
and avoid executing the query at all.
The optimizer examines the WHERE
clause in the following statements to determine whether partition pruning can be applied:
SELECT
INSERT … SELECT
REPLACE … SELECT
DELETE
UPDATE
For the INSERT
statements, the optimizer examines the inserted values. For the REPLACE
statements, both the WHERE
clause and the new values are examined.
When the optimizer analyzes a WHERE
clause, it can use the following operators for pruning:
=
!=
<
>
<=
>=
BETWEEN
IN
If a partition only contains rows for which the partitioning expression returns NULL
values, IS NULL
and IS NOT NULL
can also be used for pruning.
To obtain a statement execution plan, the EXPLAIN
command can be used. We discussed this in Chapter 3, Optimizing Queries. Now that we are dealing with partition, we need to use an EXPLAIN
extension: the PARTITIONS
option. It adds the partitions' columns to the output of EXPLAIN
. This column contains a list of the partitions that will be used to execute the statement. This extension has been specifically added to check whether and how partition pruning is applied. We will use it in the following examples.
For the following examples, we will use the article table, with a RANGE
partitioning based on the column ID:
CREATE TABLE article ( id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT, date DATE NOT NULL, author VARCHAR(100), language TINYINT UNSIGNED, text TEXT, PRIMARY KEY (id, date) ) ENGINE = InnoDB PARTITION BY RANGE (id) ( PARTITION p0 VALUES LESS THAN (5000), PARTITION p1 VALUES LESS THAN (10000), PARTITION p2 VALUES LESS THAN (15000), PARTITION p3 VALUES LESS THAN (20000) );
Queries that contain a WHERE
condition based on the ID column are very likely to take advantage of partition pruning. Consider the following example:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE id > 5000 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: article partitions: p1,p2,p3 … 1 row in set (0.00 sec)
Only the relevant columns are shown here. Since the first partition only contains VALUES LESS THAN (5000)
, the optimizer understands that it cannot contain any relevant row. Thus, p0
is not accessed, which is shown as follows:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE id BETWEEN 8000 AND 13000 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: article partitions: p1,p2 … 1 row in set (0.00 sec)
This technique works perfectly with a range of values that are spread over multiple partitions, as shown in this example where p0
and p4
are not accessed:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE id = 11000 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: article partitions: p2 … 1 row in set (0.00 sec)
Queries that retrieve a single row, or at least search for a single value, should always be able to only access one partition. For common cases, this is the best case, shown as follows:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE id = 9999999 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NULL partitions: NULL … Extra: Impossible WHERE noticed after reading const tables 1 row in set (0.00 sec)
In this case, no partition can contain the value we are looking for. As clearly stated in the Extra
column, an impossible WHERE
was detected after examining the partitions' definitions. Thus, the query is not executed at all, which is shown as follows:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE id = 18000 OR id < 10000 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: article partitions: p0,p1,p3 … 1 row in set (0.00 sec)
This example merely shows that partition pruning also works when more value ranges are queried, even if different operators are used.
We will try some queries on LIST
partitioned tables too, just to demonstrate that partition pruning works with this partitioning type. The table definition we will use is the following:
CREATE TABLE article ( id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT, date DATE NOT NULL, author VARCHAR(100), language TINYINT UNSIGNED, text TEXT, PRIMARY KEY (id, language) ) ENGINE = InnoDB PARTITION BY LIST (language) ( PARTITION pn VALUES IN (NULL), PARTITION p0 VALUES IN (1), PARTITION p1 VALUES IN (2,3), PARTITION p2 VALUES IN (4,5,6,7), PARTITION p3 VALUES IN (8,9,10,11) );
Now, let's try some queries, shown as follows:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE language = 1 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: article partitions: p0 … 1 row in set (0.00 sec)
Only the first query contains the value 1
, and the optimizer prunes other partitions away:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE language IN (1, 10) G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: article partitions: p0,p3 … 1 row in set (0.00 sec)
This time we queried the table for noncontiguous values that cannot be stored on contiguous partitions, shown as follows:
MariaDB [test]> EXPLAIN PARTITIONS -> SELECT * FROM article WHERE language BETWEEN 2 AND 5 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: article partitions: p1,p2 … 1 row in set (0.00 sec)
Looking for a range of values across multiple partitions also works as expected.
With partition pruning, the optimizer automatically determines which partitions must be accessed to execute a statement. MariaDB 10.0 also provides a way for the user to explicitly declare which partitions must be involved in the query: the PARTITION
clause.
This clause can be specified after any table name, in any of the following SQL statements:
SELECT, including JOIN and UNION queries; CREATE TABLE … SELECT; INSERT … SELECT; INSERT; UPDATE; DELETE; REPLACE; LOAD DATA INFILE; LOAD XML INFILE;
The syntax of the PARTITION
clause is the following:
<table_name> PARTITION (<partition_list>)
The partition_list
variable is a list of one or more partitions, separated by a comma. Subpartitions can also be included in the list, concatenating their name with their main partition's name.
For example, if we have a tab table, the following statement returns all rows from the partition p0
, and the s3
subpartition in the partition p1
:
SELECT * FROM tab PARTITION (p0, p1s3);
If any of the specified partitions or subpartitions do not exist, the whole statement fails with an error similar to the following:
ERROR 1735 (HY000): Unknown partition <partition_name> in table <table_name>
By contrast, existing partitions can be named more than once in any order, and the subpartitions can be named even if their main partition is included in the list.
Since partition pruning is automatic and it was implemented in MariaDB even before partition selection, this feature can seem useless at first glance. However, it can be useful in several ways. For example:
WHERE
clause.DELETE
and UPDATE
, can cause a data loss if the WHERE
clause is not correct. An additional PARTITION
clause can reduce the risks when similar damages occur.PARTITION
clause can be added to statements that add a huge amount of data to prevent the writing of incorrect values. This technique causes the whole bulk insertion to fail if at least one row does not fit one of the specified partitions. The IGNORE
clause does not affect this behavior.