MariaDB supports built-in replication; it can be used for several purposes. The most common reason to build a replication environment is to increase data redundancy for improving the fault tolerance. Also, while replication does not replace a good backup plan, a slave data can sometimes be used as a backup for the master in case of data loss. Another use of replication is writing data into the master and spreading the queries through two or more slaves, to improve performance.
In the previous chapter, we discussed backups. Knowing this topic is very important now, because replication, just as with some backup types, is based on the binary log.
In this chapter we will learn:
MariaDB supports built-in replication. This feature is one of the most ancient and an advanced MariaDB feature. The first version of the code saw the light in MySQL 3.23.15, in May 2000. At that time, MySQL did not even include InnoDB, and did not support important features such as views or the UNION
statement. Of course, the first version of replication was quite poor. Basically, a master just logged SQL statements and sent the log entries to the slaves. However, the age of this feature reveals how stable it is nowadays.
MariaDB replication is based on the binary log. The binary log keeps track of the events that modify the databases. The binary log supports three formats:
STATEMENT
ROW
MIXED
With the STATEMENT
format, events are all SQL statements that do or could modify some data. With the ROW
format, events are all modifications that occur as a consequence of such statements. The MIXED
format records the statements when possible, but it can also record the modifications. Depending on the format, we commonly define the replication as
statement-based or row-based. The binary log will be explained in detail in this chapter. It has already been mentioned in Chapter 8, Backup and Disaster Recovery.
MariaDB's built-in replication is called asynchronous replication. This means that there is no need for a permanent connection between the slaves and the master.
It is possible to stop the replication at any moment to obtain a snapshot of the master's data. This can be done, for example, to perform a fast backup. When the slave is started again, it will receive all the events that occurred while the replication was not working. The same happens if the slave had crashed for some reason.
The slave can be queried by clients. Read-heavy workloads can greatly benefit from this feature. Connecting several slaves to one master allows us to distribute queries on the slaves.
Each slave can also be a master. For example, server A can be the master for server B, while B can also be the master for server C. If B crashes, replication to C will be temporarily stopped; however, if B loses data, C can be used as a backup. Also, in our example, C could be a master of A. This kind of configuration forms a ring, and it is called circular replication. It allows modifying or reading data on any server. Data consistency is guaranteed, because each modification will eventually be replicated by all servers in the ring. The main disadvantage is that none of the servers contain a version of the data that is always up-to-date.
MariaDB 10.0 supports multisource replication. This feature allows each slave to replicate data from multiple masters. There is no conflict handling in MariaDB. Thus, the master must contain different data. It is not possible to replicate the same database from two or more masters. Multisource replication allows you to use one machine, or a limited number of machines, to replicate data from several masters. The cost of the hardware can be reduced with this technique.
Ideally, in a replication environment all the masters and slaves should use the same MariaDB version. Replication from an older master to a newer slave does not work. For example, a 5.5 master cannot replicate to a 10.0 slave. Replication from a newer master to an older slave is generally supported, but it may cause problems. MySQL servers can be present in the replication topology.
In this section, we will be provided with information on how replication is implemented in MariaDB. In particular, the read operation will learn which threads are used and which logs are kept. This is necessary before proceeding with the following section, which explains how to set up the master and slave servers in a replication environment, and how to maintain them.
In MariaDB replication, three kinds of threads are used, as shown in the following table:
Where it runs |
Thread name |
---|---|
master |
Binlog dump thread |
slave |
SQL I/O thread |
slave |
Slave SQL thread |
The connections between each slave and a master are requested by the slaves. When a slave is started, it creates the SQL I/O
thread. This thread connects to the master and requests events that must be replicated.
On the master, a
Binlog dump
thread runs. This thread is a daemon that accepts requests from the slave's SQL I/O
threads and sends them the binary log events. In the output of SHOW SLAVE STATUS
, this thread is called Slave_IO_running
. The output of SHOW PROCESSLIST
shows this thread as Binlog Dump
.
The
SQL I/O
thread does not execute events directly. It just writes them in a log on the slave called the
slave relay log.
The
Slave SQL
thread reads the relay log and executes the events in the database.
Before MariaDB 10.0, only one slave SQL thread was started for each slave server. This could lead to poor performance, because normally a master executes the same write operations using several parallel threads. Sometimes a single thread is not sufficient to replicate the master's workload with acceptable performance.
In MariaDB 10.0, a feature called
parallel replication has been introduced. Oracle introduced a similar feature in MySQL 5.6. However, MySQL users should note that MariaDB and MySQL use different implementations of parallel replication, configured in different ways. For example, the most important server variable for MariaDB parallel replication is
@@slave_parallel_threads
, that is not present in MySQL; and the most important server startup option for MySQL parallel replication is --slave-parallel-workers
, which is not present in MariaDB.
Parallel replication consists of starting a pool of threads that is able to apply many events in a parallel way. Each thread in this pool is called a worker thread. Note that not all the events can be applied by parallel threads. MariaDB will still execute some operations sequentially, when this is necessary to correctly replicate the data.
This feature is optional and is not enabled by default. To use it, it is necessary to configure it on the master, by setting the @@slave_parallel_threads
server variable. This value is the number of worker threads that will be started on each slave. As a consequence, all the slaves replicating data from the same master will have the same number of worker threads. Also, if a slave replicates multiple masters, the same number of worker threads must be configured on all the masters.
Slaves need to record information about the replication configuration and current progress. This information must not be lost, even in the event of a crash. So, each slave maintains three logs:
relay
log contains the events that were received by the master's binary logs. As explained previously, this log is written by the slave I/O thread and read by the slave SQL thread, or by pool worker threads if parallel replication is used.master
log stores the information that is necessary to connect to the master, as well as the master's binary logs coordinates. The coordinates consist of a logfile name and the position of the last binary log event that has been received.relay
log info log stores information about the last relay log event that has been applied by the slave SQL thread or the worker threads.The relay
log is always written into files. The master
log information and the relay
log information can be written in files or into system tables in the mysql
database.
Even in multisource replication, each slave has only one log for each type.