
Having configured PostgreSQL 9.3 master/slave replication from bits and pieces of documentation on the Internet, I feel like a post is in order documenting my experience for others.
Installing PostgreSQL
I am not going to get into too much detail here. Installation instructions are abundant and are slightly different by operating system. I am assuming you are already past this stage.
Prepare Master
Before we can enable replication, we need to lay some ground work. First, we need a replication user. On the “master” host, log on as the user running postgres (typically postgres
) and do this:
psql -c "CREATE USER replication REPLICATION LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'replication';"
You may select a more secure password if you wish. Now edit pg_hba.conf
:
vi pg_hba.conf
Add the following lines:
host replication replication 1.2.3.4/32 md5 # This is your master CIDR/IP host replication replication 1.2.3.5/32 md5 # This is your slave CIDR/IP
Next edit postgresql.conf
:
vi /var/lib/pgsql/9.3/data/postgresql.conf
Modify the following lines to enable replication:
wal_level = hot_standby max_wal_senders = 5 # Recommended to allocate 5 per slave hot_standby = on wal_keep_segments = 1000 # Each segment is 16Megs.
Some explanation is in order. When I first configured replication, I set max_wal_senders=1
and wal_keep_segments=8
. That was woefully inadequate and needless to say the slave could not keep up with the replication. After some investigation, I decided to raise that to the maximum I can afford in terms of storage and then monitor the backlog.
After you enable replication on master, it needs to be restarted.
Enabling Archiving
As per this Stack Overflow post, you may also want to configure the following:
When the slave can’t pull the WAL segment directly from the master, it will attempt to use the restorecommand to load it. You can configure the slave to automatically remove segments using the archivecleanup_command setting.
# on master archive_mode = on archive_command = 'cp %p /path_to/archive/%f' # on slave restore_command = 'cp /path_to/archive/%f "%p"'
I only learned of this feature recently and I have not explored it in great detail. Fortunately, the size of my master data set is reasonable and the network connection between master and slave is fast enough that I can rebuild a failed master or slave in a matter of minutes. For now, I am leaving exploration of archiving for later and I will update this post.
Prepare Slave
Before you proceed, the slave host must be able to connect to the master. You may want to use telnet
command to make sure you can connect to the right port. I am leaving that as an exercise for the reader.
Before the slave is operational you need to replication the initial database. Make sure the PostgreSQL process is not running before you proceed as it may result in a corrupted initial backup. Log on to the slave machine as postgres
user, go into the postgres data directory and execute the following:
pg_basebackup -h 1.2.3.4 -D . --username=replication --password
This may run for some time depending on the size of your data set. Rember how we agreed above that 1.2.3.4
is your master IP, so replace it with the right numbers.
Now, let’s enable replication:
vi recovery.conf
This file probably does not exist yet, so you will create a blank file that should like following:
standby_mode = 'on' primary_conninfo = 'host=1.2.3.4 port=5432 user=replication password=replication' trigger_file = '/tmp/postgresql.trigger.5432'
Remember to replace 1.2.3.4
and replication
with the IP of your master and the password you’ve actually chosen. trigger_file
is just a file the slave will watch for. Should your master fail and you need to activate the slave as a new master, you simply need to create that file. Immediately, the slave will assume the master is dead and activate itself. We will discuss that topic in a moment.
Test replication
As postgres user on master using psql
command:
CREATE TABLE TEST (test VARCHAR(40)); INSERT INTO test VALUES ('bar'); INSERT INTO test VALUES ('baz'); INSERT INTO test VALUES ('bat');
As postgres user on slave using psql
command:
SELECT * FROM TEST;
You should see bar
, baz
, and bat
just like you inserted them on the master.
Monitoring replication
You can tell how much data is pending to be sent to the slave by running this query against the master:
SELECT application_name, client_addr, pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), sent_location)) FROM pg_stat_replication;
You can tell how far behind the slave is by running this query against the slave:
select now() - pg_last_xact_replay_timestamp() AS replication_delay;
How the failover and recovery work
The way failover works is this:
- Master failure is detected.
- Trigger file on the slave is created. This can be automated, but there appears to be no standard mechanism for this.
- The slave is now the new master.
- Your application needs to be aware that the configuration has changed and must switch to use the new master. Unfortunately, standard JDBC driver does not support this. I personally think that failover is a sensitive activity and there is no generic scenario for this.
- Your old failed master is now the new slave. You must rebuild it by performing
pg_basebackup
as if you are configuring a new slave.
So eventually, after this tutorial Slave become Master and Master become Slave. Is that correct understanding?
Yes, after failover the old master is the new slave and vice versa