failover with pgpool

Attentive readers (or maybe just me) will recall a post I did a few months ago that outlined how to set up streaming replication with postgresql 9.0. Even more attentive readers will recall a big caveat in the post, around the need for additional software to trigger the failover from the Primary database server to the Secondary, and the necessary changes to IP addresses and such like.

I’m now going to introduce one method of handling this, in such a way that the end users hopefully don’t notice, or need to change their database connections. The package I’m going to talk about is PgPool, which is a very powerful middleware package for PostgreSQL, doing load balancing and connection pooling as well as handling failover. It is only available for linux at present, unfortunately.

The basic premise for using PgPool for failover is that your client software packages connect to PgPool as if it was the database. They connect in exactly the same way, in other words using an IP address, Port, Username, Password and Database name. The only difference is that the IP address and Port are the ones for PgPool rather than the database. In the event that the primary database server fails, PgPool handles the automatic promotion of the secondary server, and because the IP address hasn’t changed, the client (hopefully) doesn’t notice.

You can find some extremely comprehensive instructions for setting up PgPool in a number of different configurations on the website so I won’t go through them here. I went for this approach, which outlines setting up three servers- one to hold pgpool, one to be the primary database server, and one to be the secondary. The install is fairly complex, for example you need to be able to setup passwordless ssh between the servers so that they can communicate securely with each other.

I’ve noted some of the gotchas that caused me some issues below. They may not all be relevant to your setup, and will not make the blindest bit of sense unless you read the instructions first:

You will need to configure your database servers to use Hot Standby and Streaming Replication rather than Warm Standby.
Install Postgresql-server-dev-9.x (substitute the correct version of postgresql) on each of your servers to ensure that PgPool picks up the correct version of all the environment variables it needs.
You need to run install_functions.sh in each of your databases. To do this properly you may need to create a postgresql user called “root” so that when you run the command ‘sudo ./install_functions.sh’ it runs the enclosed psql commands without problems. Alternatively edit install_functions.sh to use the correct psql credentials for your environment.
You need to be able to use passwordless ssh between your three servers as either the database user or the user that your webserver runs as (if you’re using the web interface PgPoolAdmin). In my case, the webserver is apache, which tends to use the username ‘www-data’. The easiest approach is to create ‘www-data’ users on each of the servers, but you can configure ssh to use a different user instead. Whatever you do, test to make sure that the servers can all communicate with each other!
Test out the shell scripts for creating the basebackup and failover manually- this will show you any errors relating to permissions, usernames and such like. These will need to be tested (and possibly edited) on both database servers. I had also had to edit pgpool_remote_start to include the correct location of the pg_ctl binary- this appears to be distribution-dependent in linux.
The log files are your friends! When testing the web interface for connecting and disconnecting to servers, promoting to primary, or recovering, watch the log files on each database server using ‘tail -f’ to see what actually happens.
PgPool is set up by default to use port 9999, but you should change that to the default 5432, if you have no other install of postgresql on your PgPool server. This will avoid problems if you have client software that is not configured to use a non-default port.

Having set all of this up, my next plan is to include some email functionality in failover.sh so that it fires out an email to the database administrator when it’s triggered. Whilst it’s nice to know that all of this happens silently, it would also be useful to know there’s a problem! Note that I have seen this approach outlined in great detail somewhere out there on the internets but I can’t find the post right now. If and when I do, I’ll edit this post to include the link.