MySQL Replication and Caucus
Charles Roth, 25 March 2014
(Techblog top)
Updated 13 September 2014
I. Introduction
MySQL master/slave replication is a fascinating, extremely useful,
but tricky-to-implement bit of business.
This document summarizes my experience in using it to replicate
Caucus sites across multiple servers.
(But the same principles will apply to any application using MySQL.)
II. Paired hosts: a simple architecture
Here's a particular model for using a pair of servers "A" and "B" as
live backups for each other.
A B
--------------- ---------------
| | | |
| MySQL:3306 | | MySQL:3306 |
| (master) ---- --- (master) |
| | \ /| |
| | \ / | |
| | \ / | |
| | \/ | |
| | /\ | |
| MySQL:3307<=====/ \====> MySQL:3307|
| (slave) | | (slave) |
-------------- ---------------
Server A and server B are both running (at least one) Caucus site,
with the corresponding MySQL database(s) in the (master) MySQL server
on the same server (running on port 3306 as usual).
Ideally, either server A or server B is "big enough" to run all of the Caucus
sites simultaneously, or at least for a little while in a pinch, at a lower (slower)
service level.
Both servers are also running a second, completely indendent, instance of
MySQL (the slaves), on port 3307.
The slave instances need less memory, fewer open files, etc. -- all the various parameters
normally set in /etc/my.cnf.
They're using the same MySQL software, but have different processes pointing at
different areas on disk to store the data and configuration.
Once the slave MySQL instances are set up, each is "slaved" to the master
on the other host.
This means that every write (and only the writes) on a master server, also get sent
across the net, and also happen on the slave server on the other host.
Even if the slave server is disconnected or stopped for a while, MySQL replication
automatically knows how to "catch up" as soon as the slave is reconnected.
This also means that if the master server goes down, the slave contains
all of the data on the master, up to within a few seconds of the moment
the master went down.
So there's no need to (as we did in the bad old days) do a mysqldump of the
master server's data every night, and ship it across the net to a backup server.
This model handles live backups of all data in MySQL.
In Caucus and many other apps, there is also a separate (non-MySQL) store for files
that have been uploaded or "attached" to the site.
These will still need to be backed up to the "other" server via rsync
or some other mechanism.
III. Creating the slave server
Setting up the slave MySQL server on each host is simple -- but not always
straight-forward!
Slight differences in the configuration files and "mysqld" service startup scripts
across different Linux distributions can cause some headaches.
Originally I tried to create a shell-script that would automate the process,
but there were just too many variants in different releases of MySQL that
broke the script in too many (some dangerous) ways.
So instead I have a step-by-step description below, that will require
some (minor) adaptation depending on your installation.
It assumes a RedHat or CentOS distribution; you may have to tweak it
for other Linux distros.
- Make sure the slave is stopped. (E.g. in case you are wiping and recreating it.)
("kill -9" it if you have no other way.)
- Remove the old directories and files.
(Assumes the master server script is in /etc/init.d/mysqld.
Some installs may use /etc/init.d/mysql instead.
Adapt accordingly.)
rm -r /var/lib/mysql2
rm /etc/init.d/mysqld2
rm /etc/my2.cnf
rm /usr/local/bin/mysql2
- Create a separate database storage area in /var/lib/mysql2.
mkdir /var/lib/mysql2
chown mysql:mysql /var/lib/mysql2
- Create a new configuration file /etc/my2.cnf.
cp /etc/my.cnf /etc/my2.cnf
Edit /etc/my2.cnf:
- Replace all "/var/lib/mysql" with "/var/lib/mysql2".
- Change the port to 3307. (If no 'port' line, add "port = 3307" underl [mysql].
- Set "innodb-buffer-pool-size" to something reasonable for a slave
(typically much smaller than the master).
- If there's a pid-file specified, make sure it uses "mysql2" instead
of "mysql".
(Some installs place it in /var/run rather than /var/lib/mysql.)
- Similarly, if there are log files specified, make sure they're either
in /var/lib/mysql2 somewhere, or else are named "mysql2-whatever" instead
of "mysql-whatever".
- Create a new service startup file in /etc/init.d/mysqld2.
This script is just a wrapper than runs mysqld_safe, but various installs
set up the necessary arguments in highly variable ways, which complicates
matters.
The goal is to make it start mysqld_safe with the following arguments:
/usr/bin/mysqld_safe --defaults-extra-file=/etc/my2.cnf \
--datadir=/var/lib/mysql2 \
--pid-file=/var/lib/mysql2/mysql.pid \
--port=3307 \
--socket=/var/lib/mysql2/mysql.sock \
--user=mysql
(You may find it easiest to simply script that exactly as shown, modifying
the arguments as necessary to match whatever you put in your /etc/my2.cnf.
It's better to modify the original script if possible, so that you
retain the start/stop options.)
- Initialize the new database storage area, and create the default databases.
mysql_install_db --defaults-file=/etc/my2.cnf --datadir=/var/lib/mysql2 --user=mysql
- Start the slave database server process.
Make sure it runs in the background!
/etc/init.d/mysqld2 start
- Create the MySQL root user with password "password2".
/usr/bin/mysqladmin --no-defaults --socket=/var/lib/mysql2/mysql.sock -u root password 'password2'
- Create a new MySQL client script in /usr/local/bin/mysql2 that talks to
the slave server.
It should contain:
mysql --socket=/var/lib/mysql2/mysql.sock $@
Give it read-execute-all permission, e.g. 755.
So after running the script, you should be able to connect to the slave server
using 'password2' with
mysql2 -u root -p
The most common problem I've encountered is that, on some distributions,
you may have to edit /etc/init.d/mysqld2 to ensure that
mysqld_safe and/or msyql_install_db have the "--defaults-file=/etc/my2.cnf" as
the very first argument.
IV. Set up the master
You will need to stop the master MySQL server entirely for a while, potentially a couple
of hours, in order to set up master-slave replication.
Plan the timing accordingly.
(All of the commands below are performed on the master.)
- Make sure that the firewall on the servers allows the slave to talk to
port 3306 on the master.
- Edit /etc/my.cnf, and enable binary logging for your databases.
Unfortunately, MySQL has only limited capability to specify the
databases you want to replicate.
There are basically two choices: replicate everything except
a specific set of databases, or replicate only a specific set.
The former can be wasteful; the latter inflexible (really hard to add
new replicated databases later on).
- For the first choice (which I recommend),
your /etc/my.cnf should look something like this:
[mysqld]
... (various normal MySQL configuration parameters)
...
server-id=1
log-bin=mysql-bin
binlog-ignore-db=mysql
binlog-ignore-db=information_schema
binlog-ignore-db=(any and all databases for the host itself!)
In this approach, I will normally make all of my to-be-replicated
databases have a common prefix, e.g. "caucus_".
Then I look at all of the other databases that I have on the
master server, and explicitly binlog-ignore-db them.
(This is where I wish MySQL would let me say "only replicate
the databases starting with 'caucus_'".
Lacking that, I have to take this approach to effectively do just that.)
Note that ignoring the mysql database is really important!
Otherwise the next MySQL permissions change on the master is
going to try and replicate itself to the slave, which
can bring the whole relationship to a screeching halt!
- For the second choice, when you're absolutely sure that
you know which databases you want to replicate, your
/etc/my.cnf should look like:
[mysqld]
... (various normal MySQL configuration parameters)
...
server-id=1
log-bin=mysql-bin
binlog-do-db=databaseName1
binlog-do-db=databaseName2
- Make sure all applications that talk to MySQL are stopped!
- Restart MySQL ("service mysqld restart")
- Create the 'replication' userid, and get the MySQL master status:
mysql -u root -p
grant replication slave on *.* to replicator@'slaveserver.com' identified by 'slavepassword';
flush privileges;
show master status;
quit;
- Dump the already-existing databases that you want to replicate.
If you're using IV.2.a ("replicate everything but"),
dump everything except 'mysql' and 'information_schema', and
any database that you listed under binlog-ignore-db.
(If you add new databases later, after replication is running,
the "replicate everything-but" method in IV.2.a will automatically copy them over.)
If you're using IV.2.b ("only replicate specific databases"), just
dump those databases.
In either case, the best way to dump a set of databases is:
mysqldump -u root -p --databases databaseName1 databaseName2 >dump.sql
Transfer dump.sql to the slave server.
(You'll probably want to compress it first.)
V. Set up the Slave
On the slave server:
- Configure the slave. Edit /etc/my2.cnf, and under the [mysqld] section, add:
server-id=2
master-host = masterserver.com
master-user = replicator
master-password = slavepassword
master-connect-retry = 60
Then restart the slave ("service mysqld2 restart").
If the slave fails to start, removes the lines above, and put them into a new file
/var/lib/mysql2/mysql-master.info.
Add a new line (to my2.cnf) that says:
master-info-file = /var/lib/mysql2/mysql-master.info
and make mysql the owner of that file:
chown mysql /var/lib/mysql2/mysql-master.info
and restart the slave.
(Some newer versions of MySQL removed the master-host variable
from my2.cnf entirely, and require the master.info file instead.)
- Load the databases:
mysql2 -u root -p
source ./dump.sql;
quit
- Tell the slave to start listening to the master:
mysql2 -u root -p
slave stop;
reset slave;
change master to master_host='masterhost.com',
master_user='replicator',
master_password='slavepassword',
master_log_file='(as shown in master status)',
master_log_pos=(as shown in master status);
start slave;
show slave status\G;
Most importantly, you should see "Slave_IO_Running: Yes" and
"Slave_SQL_Running: Yes".
Anything else means something is wrong (e.g. the firewall is blocking
the slave, the wrong ports are in use, etc. etc.)
If you don't see a successful connection, you can try connecting directly
to the remote master to see what is wrong.
E.g.
mysql -u replicator -h masterhost.com -p
(Note mysql, not mysql2.)
Examine the error messages, the mysql-error.log file, etc.
If mysql complains about not being able to connect,
but the host it's complaining about is the local box (not
the intended remote master), you may not have set up reverse DNS
for your box properly.
You may be able to work around this using IP addresses instead
of hostnames.
If that doesn't help, see
dev.mysql.com/doc/refman/5.5/en/replication-problems.html.
VI. Restart master applications
Finally, restart the applications (Caucus, etc.) on the master.
It's a good idea to do something in an application on the master,
that will create or update a MySQL table row in a known way.
Then look for the corresponding change on the slave MySQL server
within a few seconds.
VII. Testing
For a critical application, I like to have an automated daily test
that replication is continuing to work properly.
For example, on the master, I set up a cron job that does something like:
/usr/bin/mysql -u someUserid --password=somePassword someDatabase \
-e "update zzz_replication set now=now();"
where zzz_replication is a table with one row, and one DATETIME field called 'now'.
On the slave, I have a similar cron job, that runs a minute or three later,
that looks to see if the 'now' field has been updated properly.
It looks something like this:
addr=myEmail@myServer.com
echo "use someDatabase; select now(); select now from zzz_replication;" | \
/usr/local/bin/mysql2 -u someUserid --password=somePassword | \
( read
read now
now=${now:0:15}
read
read then
then=${then:0:15}
if test "$now" = "$then"; then
echo "master to slave repl ok" | mail -s "master to slave repl ok" $addr
else
echo "FAIL: master to slave repl" | mail -s "FAIL: master to slave repl" $addr
fi
)
Within a 10-minute window, this verifies that the update happened, and emails
me either way.
(This has the bonus of also telling you if the clocks on the two servers drift too far apart.)