Like the FEDERATEDX
storage engine and the CONNECT MYSQL
table type, SPIDER
allows remote access to tables as if they were on the local server. However, the SPIDER
storage engine has been specifically designed for data sharding. Its main function is to access data from a multitude of servers by querying a single local table.
Data sharding is implemented in SPIDER
using table partitioning. If a SPIDER
table is partitioned, each partition can be linked to a different remote table. The SPIDER
storage engine is suitable with the RANGE
and LIST
partitioning methods, including RANGE COLUMNS
and LIST COLUMNS
.
The SPIDER
storage engine supports both regular SQL transactions and XA transactions, if the remote tables support them too.
The SPIDER
storage engine has been originally designed for MySQL. The version distributed along with MariaDB is slightly modified to take advantage of the MariaDB-specific features.
The original author of SPIDER
is Kentoku Shiba. The project's official site is http://www.spiderformysql.com/.
The SPIDER storage engine is essentially a storage engine that communicates with the local server's optimizer on one side and the remote server on the other side. When the optimizer chooses an execution plan for a query, which involves SPIDER
tables, SPIDER
translates this plan into calls to one or more remote servers, acting like a MariaDB client.
When SPIDER
is asked to insert data into more than one remote server, it internally uses a transaction with a two-phase commit. The problem with a single-phase commit is that it guarantees data integrity only if it involves one server. When all the modifications have been requested, the commit makes them effective. However, imagine that the modifications involve two servers. We send the commands to both servers, and we get no error. Then, we issue a commit on server 1 and it succeeds. Finally, we send the commit to server 2. If this commit fails, we have created an inconsistency. In fact, the changes we requested are already effective on server 1 and they cannot be undone anymore. For this reason, a single-phase commit is not suitable to execute a transaction between multiple servers.
The two-phase commit transaction model is the same as the model used for the XA transactions. XA commands can be sent to SPIDER
by the user because they are fully supported. However, even if a normal transaction is used, SPIDER
uses a two-phase commit to make the changes effective across multiple servers. With this technique, when a server receives a commit, it does not immediately apply the changes. It knows that the transaction is finished, but it waits for a second commit. If any of the remote servers returns an error or is not reachable, SPIDER
rolls back the transaction on each involved server, avoiding data corruption. Only if the first commit succeeds on all the remote servers, SPIDER
sends a second commit to each server. The second commit makes data effective.
When a query involves multiple SPIDER
partitions, or multiple unpartitioned SPIDER
tables, they are broken into multiple threads. A separate thread is used for each remote server that needs to be accessed by the query. Keeping this in mind, the DBA can augment the parallelization of queries by adding more partitions that point to a remote server.
Query results are buffered by SPIDER
until they can be sent to the clients. The incomplete result sets can be stored on the remote servers or on the local server.
The SPIDER
storage engine maintains in-memory statistics on remote tables and indexes. These statistics are updated at regular intervals of time. Like other storage engines, SPIDER
communicates these values to the optimizer so that it can use them to choose good query plans.
The SPIDER
storage engine is compiled in official MariaDB distributions, but it is not enabled by default. Before we start using it, the following steps are necessary:
install_spider.sql
Like all plugins, the SPIDER
storage engine can be installed while the server is running, with the SQL INSTALL
statement.
The exact location of the install_spider.sql
file depends on the MariaDB distribution and the operating system we are using. It is generally located in the share
subdirectory. This file creates the system tables used by SPIDER
, in the mysql
database.
The following example shows a successful installation of SPIDER
:
MariaDB [(none)]> INSTALL SONAME 'ha_spider'; Query OK, 0 rows affected (0.07 sec) MariaDB [(none)]> SOURCE /usr/local/mariadb/share/install_spider.sql;
Then, we can check that the SPIDER
system tables have been installed:
MariaDB [(none)]> SHOW TABLES FROM mysql LIKE '%spider%'; +----------------------------+ | Tables_in_mysql (%spider%) | +----------------------------+ | spider_link_failed_log | | spider_link_mon_servers | | spider_tables | | spider_xa | | spider_xa_failed_log | | spider_xa_member | +----------------------------+ 6 rows in set (0.01 sec)
The SPIDER
storage engine supports a special syntax to specify where the remote tables are located. This syntax is different from the syntaxes used for FEDERATEDX
and CONNECT
, and it must be used in the COMMENT
table option.
The following example shows how to create a simple, unpartitioned SPIDER
table, which connects to a remote table. We will use the user
table, which we already used for some FEDERATEDX
and CONNECT
examples:
CREATE TABLE test.user ( id INTEGER UNSIGNED NOT NULL, username VARCHAR(50), password VARCHAR(50) ) ENGINE = SPIDER, COMMENT = 'user "user1", password "pwd" host "remote_server", port "3306", database "db1", table "user"';
If table
is not specified, it will be the same as the local table name. If database
is not specified, it will be the same database as the one that contains the local table.
We can also provide a defined server name, shown as follows:
CREATE TABLE test.user ( id INTEGER UNSIGNED NOT NULL, username VARCHAR(50), password VARCHAR(50) ) ENGINE = SPIDER, COMMENT = 'srv1';
The SPIDER
storage engine can automatically discover the structure of the remote table and create an identical local table. Thus, we can simply write:
CREATE TABLE test.user ENGINE = SPIDER, COMMENT = 'server "srv1"';
The SPIDER
storage engine tables are really useful when they are partitioned. Each partition can point to a different remote table. The following example shows how to create a SPIDER
partitioned table:
CREATE TABLE test.user ( id INTEGER UNSIGNED NOT NULL, username VARCHAR(50), password VARCHAR(50) ) ENGINE = SPIDER PARTITION BY RANGE (id) ( PARTITION p0 VALUES LESS THAN (1000000) COMMENT = 'server "srv1"', PARTITION p1 VALUES LESS THAN (2000000) COMMENT = 'server "srv2"', PARTITION p2 VALUES LESS THAN (3000000) COMMENT = 'server "srv3"' );
The following table options can be used to create a SPIDER
table, or partition, which connects to a remote server using SSL:
Option name |
Description |
---|---|
|
This is the name or path of the authority certificate |
|
This is the path of the authority certificate's directory |
|
This is the name or path of the certificate |
|
This is a comma-separated list of the encryption algorithms that can be used |
|
This is the path of the public key |
|
If set to |
For further details about SSL connections in MariaDB, see Chapter 5, Users and Connections.
Statements sent by the users to a SPIDER
table can be logged into the general query log, similar to all the other statements. And, of course, if a command generated by SPIDER
returns an error, the remote server can record this error in the error log. This behavior depends on the @@general_log
and @@log_error
server variables, as explained in Chapter 3, Optimizing Queries.
However, when a SPIDER
table queries a remote table, the remote server does not log the command by default. It is possible to log such commands by setting the @@spider_general_log
server variable to ON
in the remote servers. The SPIDER
commands will be written in the general query log.
When a command generated by SPIDER
returns an error on the remote server, the error can also be logged into the local servers by setting the @@spider_log_result_errors
server variable to ON
.
Some
User Defined Functions (UDFs) provide an easy way to execute arbitrary SQL statements against remote servers. Such UDFs are automatically installed with SPIDER
.
Unlike MYSQL CONNECT
tables with a SRCDEF
clause, these functions can return a result set.
The
spider_direct_sql()
function allows us to execute arbitrary SQL statements against a remote MariaDB or MySQL server. The results of the query are copied into a temporary table that needs to be explicitly created before calling this function. Note that the table needs to be temporary.
Consider the following example:
MariaDB [test]> CREATE TEMPORARY TABLE output ( -> v VARCHAR(255) -> ) ENGINE = InnoDB; Query OK, 0 rows affected (0.21 sec) MariaDB [test]> SELECT spider_direct_sql( -> 'SELECT VERSION() AS v', -> 'output', -> 'user "user1", password "pwd", host "remote_server", port "3306"' -> ) AS v; +---+ | v | +---+ | 1 | +---+ 1 row in set (0.06 sec) MariaDB [test]> SELECT * FROM output; +---------------------+ | v | +---------------------+ | 10.0.11-MariaDB-log | +---------------------+ 1 row in set (0.01 sec)
Let's examine the spider_direct_sql()
call. It has three parameters:
SPIDER
tables. It is also possible to specify a defined server name.When
spider_direct_sql()
is called, the current connection stays on hold until the remote query execution is completed and its result set is stored into the specified temporary table. However, sometimes we need to execute a long-running statement, and we do not want the current connection to wait until it is completed. In these cases, we can use the spider_bg_direct_sql()
function. As the name suggests, it executes the query in the background.
The syntax of spider_bg_direct_sql()
is identical to the syntax of spider_direct_sql()
.