
Understanding MySQL replication coordinates
Apr 15, 2018

Basic MySQL replication (as implemented in MySQL 5.5) is conceptually very simple:
the master server writes data changes (row based, statement based or mixed, depending
on settings) into its binary log, and a binlog dump thread sends them to the slave,
where the I/O thread copies the events from the binary log into the relay log. Another
slave thread, the SQL thread, reads events from the relay log and applies them (so that
changes made on the master server appear on the slave server too). This basic MySQL
replication design is shown in the picture below.

SHOW SLAVE STATUS is the main command for checking the state of MySQL replication,
as it shows information about the slave threads and other parameters. Here is an example
from an actual running slave server (information not related to the slave threads is not
shown) with explanation:

Master_Log_File: mysql-bin.001363 - master binlog filename from which the I/O thread
is currently reading.

Read_Master_Log_Pos: 867649780 - position in master binlog file up to which the I/O
thread has read.

Relay_Log_File: slave-relay.000453 - relay log filename from which the SQL thread is
currently reading and executing.

Relay_Log_Pos: 867649926 - position in relay log file up to which the SQL thread has
read and executed.

Relay_Master_Log_File: mysql-bin.001363 - binary log filename containing the most
recent event executed by the SQL thread.

Exec_Master_Log_Pos: 867649780 - position in master binlog file up to which the SQL
thread has read and executed.

The picture below shows graphically how these parameters fit into a basic MySQL
replication setup.

These 6 parameters can logically be grouped into three (filename, position) pairs,
and these 3 pairs give us a nice overview of the slave threads' progress:

• (Master_Log_File, Read_Master_Log_Pos) - this pair of coordinates shows
information about the slave I/O thread state. The slave I/O thread is reading from
binlog mysql-bin.001363 and has read up to position 867649780 in that file.
• (Relay_Log_File, Relay_Log_Pos) - this pair of coordinates shows
information about the slave SQL thread state from the relay log perspective. The slave
SQL thread is reading from relay file slave-relay.000453 and has read and
executed statements up to position 867649926 in that file.
• (Relay_Master_Log_File, Exec_Master_Log_Pos) - this pair of coordinates
shows information about the slave SQL thread state from the master binlog
perspective. The relay log position reached by the SQL thread (slave-relay.000453,
position 867649926) corresponds to binlog file mysql-bin.001363 and position
867649780 on the master server. That is, if we start reading the master binlog
file named by Relay_Master_Log_File from position Exec_Master_Log_Pos, and we
start reading the slave relay log file named by Relay_Log_File from position
Relay_Log_Pos, we should get the same information.
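
The grouping above can be sketched in a few lines of Python. This is only an illustration: it assumes the status text has already been captured from SHOW SLAVE STATUS\G, and the helper names parse_status and coordinates are hypothetical, not part of any MySQL tool. The sample values are the ones quoted above.

```python
import re

# Sample text built from the values quoted above; a real \G dump has many more fields.
SAMPLE = """\
              Master_Log_File: mysql-bin.001363
          Read_Master_Log_Pos: 867649780
               Relay_Log_File: slave-relay.000453
                Relay_Log_Pos: 867649926
        Relay_Master_Log_File: mysql-bin.001363
          Exec_Master_Log_Pos: 867649780
"""


def parse_status(text):
    """Turn 'Key: value' lines into a dict of strings (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def coordinates(fields):
    """Group the six fields into three (filename, position) pairs."""
    return {
        "io_thread": (fields["Master_Log_File"],
                      int(fields["Read_Master_Log_Pos"])),
        "sql_thread_relay": (fields["Relay_Log_File"],
                             int(fields["Relay_Log_Pos"])),
        "sql_thread_master": (fields["Relay_Master_Log_File"],
                              int(fields["Exec_Master_Log_Pos"])),
    }


coords = coordinates(parse_status(SAMPLE))
for name, (log_file, pos) in coords.items():
    print(f"{name}: {log_file} @ {pos}")
```

Keeping each pair as a tuple makes it harder to compare a position from one file with a position from another by accident.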

The MySQL replication process works as follows:

• Transactions are executed on the Master
• All transactions are written to the Binary Log files (mysql-bin.xxxxx) as "events" on the
Master
• On a Slave, two threads are dedicated to the replication process:
  • The I/O thread reads the Master's Binary Log events and writes them to
  the local relay log (mysqld-relay-bin.xxxxx)
  • The SQL thread reads the Relay Log events and executes them on the Slave
In the SHOW SLAVE STATUS\G output the important fields are:
What is the current Master position (as read by the I/O thread)?
• Master_Log_File: the Master's Binary Log that the I/O thread is currently reading
• Read_Master_Log_Pos: the position in that Binary Log up to which the I/O thread has read
What is the current Slave position?
• Relay_Log_File: the current Relay Log of the Slave
• Relay_Log_Pos: the current Relay Log position of the Slave
What is the current Slave position with regard to the Master's Binary Logs?
• Relay_Master_Log_File: the Binary Log (on the Master) which corresponds to
the current position of the Slave
• Exec_Master_Log_Pos: the Binary Log position (on the Master) which
corresponds to the current position of the Slave
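
As a rough illustration of how these fields can be compared, the Python sketch below estimates how far the SQL thread is behind the I/O thread. It assumes the status has already been loaded into a dict keyed by field name and that binlog names end in a numeric suffix; the function names are hypothetical. It says nothing about events the I/O thread has not yet fetched from the Master.

```python
def binlog_index(log_file):
    """Numeric suffix of a binlog name, e.g. 'mysql-bin.001363' -> 1363."""
    return int(log_file.rsplit(".", 1)[1])


def sql_thread_backlog(status):
    """Return (files_behind, bytes_behind) for the SQL thread.

    bytes_behind is None when the two threads are in different binlog
    files, because positions in different files cannot be subtracted.
    """
    read_file = binlog_index(status["Master_Log_File"])
    read_pos = int(status["Read_Master_Log_Pos"])
    exec_file = binlog_index(status["Relay_Master_Log_File"])
    exec_pos = int(status["Exec_Master_Log_Pos"])

    files_behind = read_file - exec_file
    bytes_behind = read_pos - exec_pos if files_behind == 0 else None
    return files_behind, bytes_behind


# Values taken from the example output discussed earlier.
status = {
    "Master_Log_File": "mysql-bin.001363",
    "Read_Master_Log_Pos": "867649780",
    "Relay_Master_Log_File": "mysql-bin.001363",
    "Exec_Master_Log_Pos": "867649780",
}
print(sql_thread_backlog(status))  # (0, 0): everything fetched has been executed
```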
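Now, if you don't use the Binary Logs on Slaves (for point in time recovery or chained
replication) you can remove them and disable binary logging.
To purge your binary logs (while binary logging is still enabled), use:

PURGE BINARY LOGS BEFORE NOW();

Then disable binary logging by removing or commenting out the log_bin line in my.cnf.
Note that log_bin takes a file base name, not a boolean, so log_bin = 0 would not
disable binary logging; it would create binary logs named 0.000001 and so on:
# log_bin = mysql-bin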


Note regarding log-slave-updates: by default, if you enable Binary Logs on a Slave, the Slave
will only write events executed directly on the Slave; none of the events coming from its
Master will be written to the Slave's Binary Logs. If you want to set up chained
replication (M -> S/M -> S), you need to tell the Slave to log the Master's events in its
Binary Logs so that it can replicate them to its own Slaves. This option is log-slave-updates.
If you need to enable Binary Logs on a Slave, the command to see the current position of
the Slave's Binary Logs is SHOW MASTER STATUS; you will see the position corresponding
to the files in your directory (on the slave).
Note on Binary Logs management: do not forget to set a "purge strategy" for your Binary
Logs if you don't want to fill up your disks. The simplest way is to use the
expire_logs_days variable, which tells MySQL to purge Binary Logs older than that
number of days.

A note on PURGE BINARY LOGS: this is a pretty safe command. According to the
official documentation: "This statement is safe to run while slaves are replicating. You need
not stop them. If you have an active slave that currently is reading one of the log files
you are trying to delete, this statement does nothing and fails with an error."

MySQL Replication functions with two threads:

• IO Thread: Responsible for Maintaining a Connection Back to the Master. It is
used to collect SQL entries recorded in the Master's Binary Logs and store those
entries in the Slave's Relay Logs.
• SQL Thread: Responsible for Reading the Entries from the Slave's Local Relay
Logs, Executing the SQL, and Rotating the Fully Used Relay Logs.
Your problem has to do with the IO Thread. If the IO Thread experienced any
intermittency, it was most likely figuring out the size of the incoming SQL statement
from the Master but never had the chance to actually read the SQL statement. I can say
this with absolute certainty because, in the example posted above, the end_log_pos is read in
the header of the SQL statement before the statement itself is actually read. This usually
happens under these circumstances:

• mysqld was restarted on the Master while replicating
• the Master DB Server crashed while replicating
• packets were possibly dropped during the session between Master and Slave

1. Duplicate server-ids on two or more slaves.
Symptoms: the MySQL error log on a slave shows the slave thread constantly connecting to
and disconnecting from the master.
Solution: check that all nodes in the replication setup have unique server-ids.
2. Dual-master setup, “log_slave_updates” enabled, server-ids changed.
Scenario: you stop MySQL on the first master, then you stop the second one. Afterwards,
you perform some migration or maintenance. Suddenly, you realize that it would be better to
change the server-ids to match the last part of the new IP addresses. You bring everything back
online and notice strange behaviour: some portion of the binlogs cycles between the masters
because log_slave_updates is enabled. But how can that be if the new server-ids are different?
It can happen when some data was written to the second master before its shutdown, while
the first one was already off. After the restart, both nodes treat the cycled events as not
their own, because the server-id recorded in those binlog events before the shutdown no longer
matches either node's new server-id, so each node applies them and passes them on down the
replication chain. The result is an infinite loop.
Solution: simply reset the slave position on one of the masters anew, so that the “ghost”
binlogs finally stop cycling.
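
The loop can be illustrated with a toy Python model. It is a deliberate simplification, not MySQL's actual code: a node drops an event whose server-id equals its own, and otherwise applies it and, with log_slave_updates on, re-logs it under the original server-id for the next node. The server-ids 1, 2, 11 and 12 and the function name are hypothetical.

```python
def hops_until_dropped(event_server_id, ring_server_ids, max_hops=10):
    """Pass one event around a replication ring.

    ring_server_ids lists the nodes in the order the event reaches them.
    Returns how many times the event was applied before its originator
    recognised and dropped it, or None if it is still circulating after
    max_hops (the "ghost" binlog case).
    """
    for hop in range(max_hops):
        node_id = ring_server_ids[hop % len(ring_server_ids)]
        if node_id == event_server_id:
            return hop  # originator sees its own server-id and skips the event
        # otherwise the node applies the event and forwards it unchanged
    return None


# Event written on the second master (old server-id 2) before shutdown.
print(hops_until_dropped(2, [1, 2]))    # 1: applied once on the first master
print(hops_until_dropped(2, [11, 12]))  # None: nobody owns server-id 2 any more
```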
3. MySQL options “sync_relay_log”, “sync_relay_log_info”, “sync_master_info”.
Symptoms: according to the “SHOW SLAVE STATUS” output, at one moment the slave thread
is queueing events and shows some delay, at another it is waiting for the master and shows
0 lag, and so on. Considering the real master position, there should indeed be a delay of
XXXXX. Other symptoms are IO saturation and a high disk IOPS number, although the disk is
only about half busy according to pt-diskstats. In my case I observed 1500 IOPS on the
master and 10x more, 15000 IOPS at 60% busy, on the slave. You may think it could be
row-based replication (binlog_format=ROW) and constan
