MySQL Replication and Caucus

Charles Roth, 25 March 2014       (Techblog top)
Updated 13 September 2014

I. Introduction
MySQL master/slave replication is a fascinating, extremely useful, but tricky-to-implement bit of business.  This document summarizes my experience in using it to replicate Caucus sites across multiple servers.  (But the same principles will apply to any application using MySQL.)

II. Paired hosts: a simple architecture
Here's a particular model for using a pair of servers "A" and "B" as live backups for each other.

             A                       B       
      ---------------         ---------------
      |             |         |             |
      | MySQL:3306  |         |  MySQL:3306 |
      |  (master) ----        --- (master)  |
      |             | \      /|             |
      |             |  \    / |             |
      |             |   \  /  |             |
      |             |    \/   |             |
      |             |    /\   |             |
      | MySQL:3307<=====/  \====> MySQL:3307|
      |   (slave)   |         |    (slave)  |
      --------------          ---------------

Server A and server B are both running (at least one) Caucus site, with the corresponding MySQL database(s) in the (master) MySQL server on the same server (running on port 3306 as usual).

Ideally, either server A or server B is "big enough" to run all of the Caucus sites simultaneously, or at least for a little while in a pinch, at a lower (slower) service level.

Both servers are also running a second, completely indendent, instance of MySQL (the slaves), on port 3307.  The slave instances need less memory, fewer open files, etc. -- all the various parameters normally set in /etc/my.cnf.  They're using the same MySQL software, but have different processes pointing at different areas on disk to store the data and configuration.

Once the slave MySQL instances are set up, each is "slaved" to the master on the other host.  This means that every write (and only the writes) on a master server, also get sent across the net, and also happen on the slave server on the other host.  Even if the slave server is disconnected or stopped for a while, MySQL replication automatically knows how to "catch up" as soon as the slave is reconnected.

This also means that if the master server goes down, the slave contains all of the data on the master, up to within a few seconds of the moment the master went down.  So there's no need to (as we did in the bad old days) do a mysqldump of the master server's data every night, and ship it across the net to a backup server.

This model handles live backups of all data in MySQL.  In Caucus and many other apps, there is also a separate (non-MySQL) store for files that have been uploaded or "attached" to the site.  These will still need to be backed up to the "other" server via rsync or some other mechanism. 

III. Creating the slave server
Setting up the slave MySQL server on each host is simple -- but not always straight-forward!  Slight differences in the configuration files and "mysqld" service startup scripts across different Linux distributions can cause some headaches.

Originally I tried to create a shell-script that would automate the process, but there were just too many variants in different releases of MySQL that broke the script in too many (some dangerous) ways.  So instead I have a step-by-step description below, that will require some (minor) adaptation depending on your installation.  It assumes a RedHat or CentOS distribution; you may have to tweak it for other Linux distros. 

  1. Make sure the slave is stopped.  (E.g. in case you are wiping and recreating it.) ("kill -9" it if you have no other way.)
  2. Remove the old directories and files.  (Assumes the master server script is in /etc/init.d/mysqld.  Some installs may use /etc/init.d/mysql instead.  Adapt accordingly.)
       rm -r /var/lib/mysql2
       rm    /etc/init.d/mysqld2
       rm    /etc/my2.cnf
       rm    /usr/local/bin/mysql2
    
  3. Create a separate database storage area in /var/lib/mysql2.
       mkdir  /var/lib/mysql2
       chown mysql:mysql /var/lib/mysql2
    
  4. Create a new configuration file /etc/my2.cnf.
       cp /etc/my.cnf /etc/my2.cnf
    
    Edit /etc/my2.cnf:

  5. Create a new service startup file in /etc/init.d/mysqld2.  This script is just a wrapper than runs mysqld_safe, but various installs set up the necessary arguments in highly variable ways, which complicates matters.

    The goal is to make it start mysqld_safe with the following arguments:

       /usr/bin/mysqld_safe --defaults-extra-file=/etc/my2.cnf \
              --datadir=/var/lib/mysql2                        \
              --pid-file=/var/lib/mysql2/mysql.pid             \
              --port=3307                                      \
              --socket=/var/lib/mysql2/mysql.sock              \
              --user=mysql
    

    (You may find it easiest to simply script that exactly as shown, modifying the arguments as necessary to match whatever you put in your /etc/my2.cnf.  It's better to modify the original script if possible, so that you retain the start/stop options.)

  6. Initialize the new database storage area, and create the default databases.
       mysql_install_db --defaults-file=/etc/my2.cnf --datadir=/var/lib/mysql2 --user=mysql
    
  7. Start the slave database server process.  Make sure it runs in the background!
       /etc/init.d/mysqld2 start
    
  8. Create the MySQL root user with password "password2".
       /usr/bin/mysqladmin --no-defaults --socket=/var/lib/mysql2/mysql.sock -u root password 'password2'
    
  9. Create a new MySQL client script in /usr/local/bin/mysql2 that talks to the slave server.  It should contain:
       mysql --socket=/var/lib/mysql2/mysql.sock $@
    
    Give it read-execute-all permission, e.g. 755.
So after running the script, you should be able to connect to the slave server using 'password2' with
   mysql2 -u root -p

The most common problem I've encountered is that, on some distributions, you may have to edit /etc/init.d/mysqld2 to ensure that mysqld_safe and/or msyql_install_db have the "--defaults-file=/etc/my2.cnf" as the very first argument.

IV. Set up the master
You will need to stop the master MySQL server entirely for a while, potentially a couple of hours, in order to set up master-slave replication.  Plan the timing accordingly.  (All of the commands below are performed on the master.)

  1. Make sure that the firewall on the servers allows the slave to talk to port 3306 on the master.
  2. Edit /etc/my.cnf, and enable binary logging for your databases. 

    Unfortunately, MySQL has only limited capability to specify the databases you want to replicate.  There are basically two choices: replicate everything except a specific set of databases, or replicate only a specific set.  The former can be wasteful; the latter inflexible (really hard to add new replicated databases later on).

    1. For the first choice (which I recommend), your /etc/my.cnf should look something like this:
          [mysqld]
          ...     (various normal MySQL configuration parameters)
          ...
          server-id=1
          log-bin=mysql-bin
          binlog-ignore-db=mysql
          binlog-ignore-db=information_schema
          binlog-ignore-db=(any and all databases for the host itself!)
      
      In this approach, I will normally make all of my to-be-replicated databases have a common prefix, e.g. "caucus_".  Then I look at all of the other databases that I have on the master server, and explicitly binlog-ignore-db them.  (This is where I wish MySQL would let me say "only replicate the databases starting with 'caucus_'".  Lacking that, I have to take this approach to effectively do just that.)

      Note that ignoring the mysql database is really important!  Otherwise the next MySQL permissions change on the master is going to try and replicate itself to the slave, which can bring the whole relationship to a screeching halt!

    2. For the second choice, when you're absolutely sure that you know which databases you want to replicate, your /etc/my.cnf should look like:
          [mysqld]
          ...     (various normal MySQL configuration parameters)
          ...
          server-id=1
          log-bin=mysql-bin
          binlog-do-db=databaseName1
          binlog-do-db=databaseName2
      
  3. Make sure all applications that talk to MySQL are stopped!
  4. Restart MySQL ("service mysqld restart")
  5. Create the 'replication' userid, and get the MySQL master status:
       mysql -u root -p
       grant replication slave on *.* to replicator@'slaveserver.com' identified by 'slavepassword';
       flush privileges;
       show master status;
       quit;
    
  6. Dump the already-existing databases that you want to replicate. 

    If you're using IV.2.a ("replicate everything but"), dump everything except 'mysql' and 'information_schema', and any database that you listed under binlog-ignore-db.  (If you add new databases later, after replication is running, the "replicate everything-but" method in IV.2.a will automatically copy them over.)

    If you're using IV.2.b ("only replicate specific databases"), just dump those databases.

    In either case, the best way to dump a set of databases is:

       mysqldump -u root -p --databases databaseName1 databaseName2 >dump.sql
    
    Transfer dump.sql to the slave server.  (You'll probably want to compress it first.)

V. Set up the Slave
On the slave server:

  1. Configure the slave.  Edit /etc/my2.cnf, and under the [mysqld] section, add:
       server-id=2
       master-host = masterserver.com
       master-user = replicator
       master-password = slavepassword
       master-connect-retry = 60
    
    Then restart the slave ("service mysqld2 restart").

    If the slave fails to start, removes the lines above, and put them into a new file /var/lib/mysql2/mysql-master.info.  Add a new line (to my2.cnf) that says:

       master-info-file = /var/lib/mysql2/mysql-master.info
    
    and make mysql the owner of that file:
       chown mysql /var/lib/mysql2/mysql-master.info
    
    and restart the slave.

    (Some newer versions of MySQL removed the master-host variable from my2.cnf entirely, and require the master.info file instead.)

  2. Load the databases:
       mysql2 -u root -p
       source ./dump.sql;
       quit
    
  3. Tell the slave to start listening to the master:
       mysql2 -u root -p
       slave stop;
       reset slave;
       change master to master_host='masterhost.com',
       master_user='replicator',
       master_password='slavepassword',
       master_log_file='(as shown in master status)',
       master_log_pos=(as shown in master status);
    
       start slave;
       show slave status\G;
    
    Most importantly, you should see "Slave_IO_Running: Yes" and "Slave_SQL_Running: Yes".  Anything else means something is wrong (e.g. the firewall is blocking the slave, the wrong ports are in use, etc. etc.)

    If you don't see a successful connection, you can try connecting directly to the remote master to see what is wrong.  E.g.

       mysql -u replicator -h masterhost.com -p
    
    (Note mysql, not mysql2.)  Examine the error messages, the mysql-error.log file, etc.  If mysql complains about not being able to connect, but the host it's complaining about is the local box (not the intended remote master), you may not have set up reverse DNS for your box properly.  You may be able to work around this using IP addresses instead of hostnames.

    If that doesn't help, see dev.mysql.com/doc/refman/5.5/en/replication-problems.html.

VI. Restart master applications
Finally, restart the applications (Caucus, etc.) on the master.

It's a good idea to do something in an application on the master, that will create or update a MySQL table row in a known way.  Then look for the corresponding change on the slave MySQL server within a few seconds.

VII. Testing
For a critical application, I like to have an automated daily test that replication is continuing to work properly.

For example, on the master, I set up a cron job that does something like:

   /usr/bin/mysql -u someUserid --password=somePassword someDatabase \
                  -e "update zzz_replication set now=now();"
where zzz_replication is a table with one row, and one DATETIME field called 'now'.

On the slave, I have a similar cron job, that runs a minute or three later, that looks to see if the 'now' field has been updated properly.  It looks something like this:

   addr=myEmail@myServer.com
   
   echo "use someDatabase; select now(); select now from zzz_replication;" | \
        /usr/local/bin/mysql2 -u someUserid --password=somePassword  | \
      ( read
        read now
        now=${now:0:15}
   
        read
        read then
        then=${then:0:15}
   
        if test "$now" = "$then"; then
           echo "master to slave repl ok"    | mail -s "master to slave repl ok"    $addr
        else
           echo "FAIL: master to slave repl" | mail -s "FAIL: master to slave repl" $addr
        fi
      )
Within a 10-minute window, this verifies that the update happened, and emails me either way.  (This has the bonus of also telling you if the clocks on the two servers drift too far apart.)