Read Replicas with containers
Read replicas are a really cool thing databases can do. Not only do you get high availability, but you also get an active spare you know is in good working condition, most of the time…maybe.
This next part is super useful for testing and learning.
The docker-compose yaml below will spin up two Postgres instances on your machine and create two bind-mounted volumes, one for each db cluster. So at that stage you’ve got two empty db clusters that are doing nothing.
There’s a bit of work with the initialization for each container volume. On first start, each container expects to run as root (user 0) so it can set up the volume area. Which is totally cool. However, if you want to monkey around with the postgresql.conf file then you need to stop both containers,
chown -R your_username:your_group
those volume locations, and then update the user entry in the compose file to your own user id (keeping group 0). That way the container runs as the owner of those files, and on your local machine you can go poking about the pgdata directory without any issues or needing to sudo in.
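As a rough sketch of that ownership shuffle: this assumes the two volume directories are ./timescale/home and ./replica/home next to the compose file (as in the yaml in this post), that the services are called db and db-replica, and that you have sudo on the host. fix_ownership is just an illustrative name.

```shell
#!/bin/sh
# Sketch only: paths and service names are taken from the compose file in this post.
uid="$(id -u)"
gid="$(id -g)"

# The value to put under `user:` for both services after the first start.
compose_user="${uid}:0"
echo "user: ${compose_user}"

# Hypothetical helper: stop both containers, then take ownership of the data dirs.
fix_ownership() {
    docker compose stop db db-replica
    sudo chown -R "${uid}:${gid}" ./timescale/home ./replica/home
}
# Call fix_ownership by hand once both containers have been started at least once.
```

The echoed line is what replaces the {your id}:0 placeholder in the compose file.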
Also, I’m using Timescale here and it only does streaming replication. There isn’t logical replication due to hypertable partitions; instead you get the whole pig…as they say in the DB world, I’m told.
Replicating docker timescale / postgres containers
version: '3.9'
services:
  # This is the primary db.
  db:
    image: timescale/timescaledb-ha:pg16
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    networks:
      - bridge
    # id here, after first start,
    # start with 0
    user: {your id}:0
    volumes:
      - $PWD/timescale/home/:/home/postgres/pgdata/data
    ports:
      - 5432:5432
  db-replica:
    image: timescale/timescaledb-ha:pg16
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    networks:
      - bridge
    # id here, after first start
    # start with 0
    user: {your id}:0
    # Really important trick!
    # entrypoint: /bin/bash
    # tty: true
    # stdin_open: true
    volumes:
      - $PWD/replica/home/:/home/postgres/pgdata/data
    ports:
      - 5433:5432 #<5433!!!!!>
networks:
  bridge:
    driver: bridge
So at this point you have two databases which you can stuff around with but they don’t know anything about each other.
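If you want to convince yourself that both instances are up and that neither is a replica yet, something like this should do it. It’s a sketch that assumes psql is installed on the host and uses the ports and credentials from the compose file; conn_uri and check_instances are made-up helper names, and the sleep is a crude stand-in for a proper readiness check.

```shell
#!/bin/sh
# Connection URI for an instance published on the given host port.
conn_uri() {
    printf 'postgresql://postgres:postgres@localhost:%s/postgres\n' "$1"
}

# Hypothetical helper: start both containers and ask each one about its role.
check_instances() {
    docker compose up -d
    sleep 10  # crude wait for both servers to accept connections
    for port in 5432 5433; do
        # Prints "f" for a server that is not in recovery, i.e. not a replica.
        psql "$(conn_uri "$port")" -Atc 'SELECT pg_is_in_recovery();'
    done
}
```

At this stage both answers should be f, since each instance is its own standalone primary.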
The next thing to do is establish the replica. Which is a little tricky.
Follow these instructions until you get to the replica section.
Timescale DB - configure replication
Then, before you start working on the replica, start the db-replica instance with this bit of yaml included.
entrypoint: /bin/bash
tty: true
stdin_open: true
This bit of yaml in the replica container definition stops Postgres from starting by running bash instead, which just keeps the container chugging along. Then from another terminal you can log in with docker exec -it db-replica-1 /bin/bash (the container name may carry your compose project name as a prefix; docker ps will show the exact name).
From here you can now delete the db cluster at /home/postgres/pgdata with a rm -rf ./data
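If you’d rather not hop into the container by hand, something like this should do the same job from the host. It’s a sketch that assumes the bash entrypoint override is in place (so Postgres isn’t running) and that the compose service is called db-replica. It empties the data directory rather than removing it, because that directory is a bind mount point and removing the mount point itself tends to fail with a "busy" error once the contents are gone. wipe_dir and wipe_replica_cluster are made-up names.

```shell
#!/bin/sh
# Empty a directory (hidden files included) but keep the directory itself.
wipe_dir() {
    find "$1" -mindepth 1 -delete
}

# Hypothetical helper: run the same find inside the replica container.
wipe_replica_cluster() {
    docker compose up -d db-replica
    docker compose exec db-replica find /home/postgres/pgdata/data -mindepth 1 -delete
}
```

An empty directory is all pg_basebackup needs as a target, so leaving the directory in place is fine.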
Then go back to the replica instructions and you’ll be able to do the magic.
pg_basebackup -h db \
-D <DATA_DIRECTORY on the container!> \
-U repuser -vP -W
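Once the rest of the Timescale instructions are done and db-replica has been restarted without the bash entrypoint, a quick way to see whether it worked is to ask each instance whether it’s in recovery. This is a sketch that assumes psql is installed on the host and uses the ports and credentials from the compose file; conn_uri, role_of and check_replication are illustrative names.

```shell
#!/bin/sh
# Connection URI for an instance published on the given host port.
conn_uri() {
    printf 'postgresql://postgres:postgres@localhost:%s/postgres\n' "$1"
}

# Translate psql's terse answer to pg_is_in_recovery() into a role name.
role_of() {
    case "$1" in
        t) echo "replica" ;;
        f) echo "primary" ;;
        *) echo "unknown" ;;
    esac
}

# Hypothetical helper: report each instance's role, then list connected standbys.
check_replication() {
    for port in 5432 5433; do
        answer="$(psql "$(conn_uri "$port")" -Atc 'SELECT pg_is_in_recovery();')"
        echo "port ${port}: $(role_of "$answer")"
    done
    # On the primary: one row per connected standby, ideally in state "streaming".
    psql "$(conn_uri 5432)" -c 'SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;'
}
```

If the replica is doing its job, port 5432 reports primary, port 5433 reports replica, and the last query shows a row for the standby.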
Anyhow, this is a rad little trick because it gives you two little dbs that replicate without having to bother with a complex db setup or running VMs etc.
What would be nice is if you could get a docker postgres/timescale container that allowed you to start it as a replica from the get-go, and you could just pass it all the config needed via a set of environment variables.
Anyway, it’s winter and I didn’t have the energy for that. But all of this is really useful.