
Copycat DB

Copycat DB is a service that backs up our database. It uses the Scaleway CLI to take the backups, and uploads them to an offsite bucket.

This bucket has an object lock configured, so backups cannot be deleted before expiry. Conversely, the service also deletes backups older than some threshold when it creates a new one to avoid indefinite retention.
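As a rough illustration of the pruning step, the sketch below filters a list of backup names down to the ones older than a cutoff date. It is not the service's actual code: the function name `expired_backups` and the assumption that each backup name starts with an ISO date (for example `2024-03-14.sql`) are ours, chosen so that plain string order matches date order.

```shell
#!/bin/sh
# Hypothetical sketch, not the service's real implementation.
# Assumes each backup name begins with a YYYY-MM-DD date.

# Read backup names on stdin and print those dated before the cutoff.
# $1: cutoff date in YYYY-MM-DD form.
expired_backups() {
    # Both operands are non-numeric strings, so awk compares them
    # lexicographically, which matches chronological order for ISO dates.
    awk -v cutoff="$1" 'substr($0, 1, 10) < cutoff'
}
```

With GNU date, a cutoff could be computed as `date -d '30 days ago' +%F`. The bucket's object lock still applies, so a delete issued before a backup's retention expires would be rejected by the bucket regardless of what this filter selects.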

In production the service runs as a cron job, scheduled using a systemd timer.

These backups are in addition to the regular snapshots that we take, and are meant as a second layer of replication. For more details, see our Reliability and Replication Specification.

Quick help

View service status (it gets invoked as a timer automatically, doesn't need to be started/stopped manually):

sudo systemctl status copycat-db

View logs locally (they'll also be available on Grafana):

sudo tail /root/var/logs/copycat-db.log

Name

The name copycat-db is a riff on "copycat", which is what we call our museum instance that does the object replication. This one replicates the DB, so, copycat-db.

Required environment variables

SCW_CONFIG_PATH

Path to the config.yaml used by Scaleway CLI.

This contains the credentials and the default region to use when trying to create and download the database dump.

If needed, this config file can be generated by running the following commands on a shell prompt in the container (using ./test.sh sh):

scw init
scw config dump

SCW_RDB_INSTANCE_ID

The UUID of the Scaleway RDB instance that we wish to back up. If this is missing, then the Docker image falls back to using pg_dump (as outlined next).
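The fallback can be pictured with a small sketch like the one below. This is only an illustration of the rule described above, not the image's real entrypoint; the function name `backup_method` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the tool selection described above.

# Print "scw" when SCW_RDB_INSTANCE_ID is set and non-empty,
# otherwise print "pg_dump".
backup_method() {
    if [ -n "${SCW_RDB_INSTANCE_ID:-}" ]; then
        echo "scw"
    else
        echo "pg_dump"
    fi
}
```

Keying the decision off a single variable means a local test run only needs the PG* variables, while production only needs the Scaleway configuration.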

PGUSER, PGPASSWORD, PGHOST

Not needed in production when taking a backup (since we use the Scaleway CLI to take backups in production).

These are used when testing a backup using pg_dump, and when restoring backups.

RCLONE_CONFIG

Location of the rclone config file. It contains the destination bucket where the backups are saved, and the credentials to access it.

Specifically, the config file contains two remotes:

  • The bucket itself, where data will be stored.

  • A "crypt" remote that wraps the bucket by applying client side encryption.

The configuration file will contain (lightly) obfuscated versions of the password, and as long as we have the configuration file we can continue using rclone to download and decrypt the plaintext. Still, it is helpful to retain the original password too separately so that the file can be recreated if needed.

A config file can be generated by running the following commands in the container (using ./test.sh sh):

rclone config
rclone config show

When generating the config, we keep file (and directory) name encryption off.

Note that rclone creates a backup of the config file, so Docker needs to have write access to the directory where it is mounted.
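For orientation, a config with the two remotes might look roughly like the output of the sketch below. Everything here is an assumption for illustration: we assume an S3-compatible bucket, the remote name `db-backup` and the function name `sample_rclone_config` are made up, and every value in angle brackets is a placeholder. Only the crypt remote name `db-backup-crypt` and the decision to keep name encryption off come from this README.

```shell
#!/bin/sh
# Illustrative only: prints a sample rclone config with a bucket remote
# and a crypt remote wrapping it. Values in <...> are placeholders.
sample_rclone_config() {
    cat <<'EOF'
[db-backup]
type = s3
provider = Other
access_key_id = <access-key>
secret_access_key = <secret-key>
endpoint = <s3-endpoint>

[db-backup-crypt]
type = crypt
remote = db-backup:<bucket-name>
filename_encryption = off
directory_name_encryption = false
password = <obscured-password>
EOF
}
```

The bucket name lives in the `remote` line of the crypt remote, so the destination passed to rclone can be just the crypt remote name followed by a colon.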

RCLONE_DESTINATION

Name of the (crypt) remote to which the dump should be saved. Example: db-backup-crypt:.

Note that this will not include the bucket - the bucket name will be part of the remote that the crypt remote wraps.
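Before a production run it can help to confirm that none of the variables above were left out. The following is a minimal sketch of such a check, not something the image is known to ship; the function name `missing_env` is hypothetical, and the list covers only the variables used for production backups (the PG* variables are left out because they are only needed for pg_dump and restores).

```shell
#!/bin/sh
# Hypothetical pre-flight check for the production variables listed above.

# Print the name of each required variable that is unset or empty.
missing_env() {
    for name in SCW_CONFIG_PATH SCW_RDB_INSTANCE_ID RCLONE_CONFIG RCLONE_DESTINATION; do
        # Indirectly read the variable called $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}
```

An empty output means all four variables are present; any printed name points at the entry to fix in the env file.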

Logging

The service logs to its standard out/error. The systemd unit is configured to route these to /root/var/logs/copycat-db.log.

Local testing

The provided test.sh script can be used to do a smoke test for building and running the image. For example,

./test.sh bin/bash

gives us a shell prompt inside the built and running container.

For more thorough testing, run this service as part of a local test-cluster.

Restoring

The service also knows how to restore the latest backup into a Postgres instance. This functionality is used by a separate service (Phoenix) to periodically verify that the backups are restorable.

To invoke this, use "./restore.sh" as the command when running the container (e.g. ./test.sh ./restore.sh). This will restore the latest backup into the Postgres instance whose credentials are provided via the various PG* environment variables.

Preparing the bucket

The database dumps are stored in a bucket that has object lock enabled (compliance mode), and has a default bucket level retention time of 30 days.
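On an S3-compatible provider, a default retention like this is usually expressed as an object lock configuration document. The sketch below prints one for a given number of days. It assumes the bucket speaks the S3 object lock API and was created with object lock enabled; the function name `object_lock_config`, the endpoint, and the bucket name are placeholders, and the AWS CLI call is left commented out as one possible way to apply it.

```shell
#!/bin/sh
# Hypothetical helper: print an S3 object lock configuration with a
# default compliance-mode retention of $1 days.
object_lock_config() {
    days="$1"
    printf '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":%d}}}\n' "$days"
}

# Possible usage with the AWS CLI (placeholders, not run here):
# aws s3api put-object-lock-configuration \
#     --endpoint-url "<s3-endpoint>" \
#     --bucket "<bucket-name>" \
#     --object-lock-configuration "$(object_lock_config 30)"
```

In compliance mode the retention applies to every user, which is what prevents a backup from being deleted before its 30 days are up.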

Deploying

Ensure that promtail is running, and is configured to scrape /root/var/logs/copycat-db.log.

Create the config and log destination directories

sudo mkdir -p /root/var/config/scw
sudo mkdir -p /root/var/config/rclone
sudo mkdir -p /root/var/logs

Create the env, scw and rclone configuration files

sudo tee /root/copycat-db.env
sudo tee /root/var/config/scw/copycat-db-config.yaml
sudo tee /root/var/config/rclone/copycat-db-rclone.conf

Add the service definition, and start the service

scp copycat-db.{service,timer} instance:

sudo mv copycat-db.{service,timer} /etc/systemd/system
sudo systemctl daemon-reload

To start the cron job

sudo systemctl start copycat-db.timer

The timer will trigger the service on the specified schedule. In addition, if you wish to force the job to run immediately

sudo systemctl start copycat-db.service

Updating

To update, run the GitHub workflow to build and push the latest image to our Docker Registry, then restart the systemd service on the instance

sudo systemctl restart copycat-db