Import the code for copycat-db

Manav Rathi 2024-03-14 17:15:37 +05:30
parent 6e9893e5c5
commit 3e38d56579
11 changed files with 370 additions and 0 deletions


@ -0,0 +1,29 @@
name: Build and push Docker image
# Enable manual run
- release
# Sequence of patterns matched against refs/tags
- "v*" # Push events to matching v*, i.e. v4.2.0
runs-on: ubuntu-latest
- uses: actions/checkout@v4
name: Check out code
- uses: mr-smithers-excellent/docker-build-push@v6
name: Build & Push
image: ente/copycat-db
tags: ${GITHUB_SHA}, latest
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}

infra/copycat-db/.gitignore vendored Normal file

@ -0,0 +1,2 @@


@ -0,0 +1,34 @@
FROM ubuntu:latest
RUN apt-get update && apt-get install -y curl gnupg
RUN apt-get install -y tini
# Install pg_dump (via Postgres client)
# We don't need it for production backups, but this is useful for local testing.
RUN apt-get install -y lsb-release && \
sh -c 'echo "deb $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \
curl | apt-key add - && \
apt-get update && \
apt-get -y install postgresql-client-12
# Install SCW CLI
# Latest release:
RUN export VERSION="2.26.0" && \
curl -o /usr/local/bin/scw -L "${VERSION}/scaleway-cli_${VERSION}_linux_amd64" && \
chmod +x /usr/local/bin/scw
RUN apt-get install -y jq
# Install rclone
RUN apt-get install -y unzip
RUN curl | bash
COPY src /
ENTRYPOINT ["tini", "--"]
CMD [ "/" ]

infra/copycat-db/ Normal file

@ -0,0 +1,144 @@
## Introduction
Copycat DB is a service that takes backups of our database. It uses the Scaleway
CLI to create the backups, and uploads them to an offsite bucket.
This bucket has an object lock configured, so backups cannot be deleted before
expiry. Conversely, the service also deletes backups older than some threshold
when it creates a new one to avoid indefinite retention.
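
The retention rule can be illustrated with a small age check. The service itself delegates this to `rclone delete --min-age 30d`; the helper below (`is_older_than`, a hypothetical name) only sketches the comparison, assuming ages are measured in epoch seconds.

```shell
# Hypothetical helper: succeed if a backup created at epoch $1 is older
# than $3 days as of epoch $2. Illustrative only; not part of the service.
is_older_than() {
    created="$1"; now="$2"; days="$3"
    threshold=$(( days * 24 * 60 * 60 ))
    [ $(( now - created )) -gt "$threshold" ]
}

# Example: a backup taken 31 days ago is past a 30 day threshold.
now="$(date +%s)"
if is_older_than "$(( now - 31 * 24 * 60 * 60 ))" "$now" 30; then
    echo "eligible for deletion"
fi
```

Since the bucket's object lock retention is also 30 days, a backup only becomes deletable around the time this check starts to succeed.
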
In production the service runs as a cron job, scheduled using a systemd timer.
## Required environment variables
Path to the `config.yaml` used by Scaleway CLI.
This contains the credentials and the default region to use when trying to
create and download the database dump.
If needed, this config file can be generated by running the following commands
on a shell prompt in the container (using `./ sh`):

    scw init
    scw config dump

The UUID of the Scaleway RDB instance that we wish to back up, read from
`SCW_RDB_INSTANCE_ID`. If this is missing, then the Docker image falls back to
using `pg_dump` (as outlined next).
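
The fallback can be sketched as a small function. This is a minimal illustration rather than the service's actual script: the function name `choose_backup_method` is hypothetical, and only the `SCW_RDB_INSTANCE_ID` variable comes from the backup script.

```shell
# Hypothetical helper: report which backup path would be taken.
# Only SCW_RDB_INSTANCE_ID is taken from the service; the rest is illustrative.
choose_backup_method() {
    if test -z "${SCW_RDB_INSTANCE_ID:-}"; then
        # No instance ID: local testing, so fall back to pg_dump.
        echo "pg_dump"
    else
        # Instance ID present: production path via the Scaleway CLI.
        echo "scw"
    fi
}

choose_backup_method
```
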
Not needed in production when taking a backup (since we use the Scaleway CLI to
take backups in production).
These are used when testing a backup using `pg_dump`, and when restoring backups.
Location of the config file that contains the destination bucket where the
backups are saved, and the credentials to access it.
Specifically, the config file contains two remotes:
* The bucket itself, where data will be stored.
* A "crypt" remote that wraps the bucket by applying client side encryption.
The configuration file will contain (lightly) obfuscated versions of the
password, and as long as we have the configuration file we can continue using
rclone to download and decrypt the plaintext. Still, it is helpful to retain the
original password too separately so that the file can be recreated if needed.
A config file can be generated using `./ sh`:

    rclone config
    rclone config show

When generating the config, we keep file (and directory) name encryption off.
Note that rclone creates a backup of the config file, so Docker needs to have
write access to the directory where it is mounted.
Name of the (crypt) remote to which the dump should be saved. Example:
Note that this will not include the bucket - the bucket name will be part of the
remote that the crypt remote wraps.
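
The restore script builds the remote path by concatenating the destination and the file name directly, with no separator, so the value is expected to already end with `:` (or with a trailing `/` when it points at a subdirectory). A minimal sketch of that composition follows. The remote name `example-crypt`, the file name, and the helper `remote_path` are all hypothetical and used only for illustration.

```shell
# Hypothetical helper: join an rclone destination and a file name the same
# way the restore script does, i.e. by plain concatenation.
remote_path() {
    destination="$1"; file="$2"
    printf '%s%s\n' "$destination" "$file"
}

# "example-crypt:" is a made-up crypt remote name, and the file name is
# an assumed example matching the db-*.custom pattern.
remote_path "example-crypt:" "db-1710400000.custom"
```
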
##### Logging
The service logs to its standard out/error. The systemd unit is configured to
route these to `/var/logs/copycat-db.log`.
## Local testing
The provided `` script can be used to do a smoke test for building and
running the image. For example,

    ./ bin/bash

gives us a shell prompt inside the built and running container.
For more thorough testing, run this service as part of a local test-cluster.
## Restoring
The service also knows how to restore the latest backup into a Postgres
instance. This functionality is used to periodically verify that the backups are
To invoke this, use "./" as the command when running the container
(e.g. `./ ./`). This will restore the latest backup into the
Postgres instance whose credentials are provided via the various `PG*`
environment variables.
## Preparing the bucket
The database dumps are stored in a bucket that has object lock enabled
(Compliance mode), and has a default bucket level retention time of 30 days.
## Deploying
Ensure that promtail is running, and is configured to scrape
Create the config and log destination directories:

    sudo mkdir -p /root/var/config/scw
    sudo mkdir -p /root/var/config/rclone
    sudo mkdir -p /root/var/logs

Create the env, scw and rclone configuration files:

    sudo tee /root/copycat-db.env
    sudo tee /root/var/config/scw/copycat-db-config.yaml
    sudo tee /root/var/config/rclone/copycat-db-rclone.conf

Add the service definition, and start the service:

    scp copycat-db.{service,timer} instance:
    sudo mv copycat-db.{service,timer} /etc/systemd/system
    sudo systemctl daemon-reload

To enable the cron job:

    sudo systemctl enable --now copycat-db.timer

The timer will trigger the service on the specified schedule. In addition, if
you wish to force the job to run immediately:

    sudo systemctl start copycat-db.service

## Updating
To update, run the [GitHub action](.github/workflows/ci.yaml) to push the latest
image to our Docker Registry, then restart the systemd service on the instance:

    sudo systemctl restart copycat-db


@ -0,0 +1,13 @@
### Service logs

    tail -f -n 100 /root/var/logs/copycat-db.log

### Backup timeout
If scw times out while waiting for the backup, stopping the [service](./copycat-db.service) and letting the [daily timer](./copycat-db.timer) restart it later usually works:

    sudo systemctl stop copycat-db.service


@ -0,0 +1,8 @@


@ -0,0 +1,20 @@
# Don't automatically restart if it fails more than 6 times in 24 hours.
ExecStartPre=docker pull
ExecStartPre=-docker stop copycat-db
ExecStartPre=-docker rm copycat-db
ExecStart=docker run --name copycat-db \
--env-file /root/copycat-db.env \
-v /root/var:/var \


@ -0,0 +1,8 @@
Description=Schedule copycat-db

infra/copycat-db/src/ Executable file

@ -0,0 +1,50 @@
set -o errexit
set -o xtrace
NOWS="$(date +%s)"
# Scaleway backup names cannot contain dots
# Calculate an expiry time 1 month from now
EXPIRYS="$(( 30 * 24 * 60 * 60 + $NOWS ))"
# Convert it to the ISO 8601 format that SCW CLI understands
# Note that GNU date uses "-d" and an "@" to pass an epoch (macOS uses "-r").
EXPIRY="$(date -Iseconds --utc --date "@$EXPIRYS")"
if test -z "$SCW_RDB_INSTANCE_ID"
then
    # A required SCW related environment variable hasn't been specified. This is
    # expected when running the script locally for testing. Fall back to using
    # pg_dump for creating the backup.
    pg_dump -Fc ente_db > "$BACKUP_FILE"
else
    # We need to export a backup first after creating it, before it can be
    # downloaded.
    # Further, our backups currently take longer than the default 20 minute
    # timeout for the export set by Scaleway, and end up failing:
    # {"error":"scaleway-sdk-go: waiting for database backup failed: timeout after 20m0s"}
    # To avoid this we need to add a custom wait here ourselves instead of using
    # the convenience `--wait` flag for the export command provided by Scaleway.
    BACKUP_ID=$(scw rdb backup create instance-id=$SCW_RDB_INSTANCE_ID \
        name=$BACKUP_NAME expires-at=$EXPIRY \
        database-name=ente_db -o json | jq -r '.id')
    scw rdb backup wait $BACKUP_ID timeout=5h
    scw rdb backup download output=$BACKUP_FILE \
        $(scw rdb backup export $BACKUP_ID --wait -o json | jq -r '.id')
fi
# Delete older backups
rclone delete --log-level INFO --min-age 30d $RCLONE_DESTINATION
set +o xtrace
echo "copycat-db: backup complete: $BACKUP_FILE"
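
The expiry arithmetic used near the top of this script can be tried in isolation. The sketch below is an illustration under assumptions, not the script itself: `expiry_after_days` is a hypothetical helper, it assumes a `date` that accepts `-d "@N"` (GNU date does), and it prints a fixed `Z`-suffixed format instead of the `-Iseconds` form the script uses, so that the output is the same across `date` versions.

```shell
# Hypothetical helper: print an ISO 8601 UTC timestamp that lies $2 days
# after the epoch seconds given in $1. Assumes a date that supports -d "@N".
expiry_after_days() {
    start="$1"; days="$2"
    expiry=$(( days * 24 * 60 * 60 + start ))
    date -u -d "@$expiry" +%Y-%m-%dT%H:%M:%SZ
}

# Expiry 30 days from now, mirroring the 30 day window used by the script.
expiry_after_days "$(date +%s)" 30
```
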

infra/copycat-db/src/ Executable file

@ -0,0 +1,42 @@
set -o errexit
set -o xtrace
# Find the name of the latest backup
# The backup file name contains the epoch, so we can just sort.
BACKUP_FILE=$(rclone lsf --include 'db-*.custom' --files-only $RCLONE_DESTINATION | sort | tail -1)
# Download it
rclone copy --log-level INFO "${RCLONE_DESTINATION}${BACKUP_FILE}" .
# Restore from it
# This creates a database named rdb on Postgres - this is only used for the
# initial connection, the actual ente_db database will be created once the
# restore starts.
# Flags:
# * no-owner: recreates the schema using the current user, not the one that was
# used for the export.
# * no-privileges: skip the assignment of roles (this way we do not have to
# recreate all the users from the original database before proceeding with the
# restore)
createdb rdb || true
pg_restore -d rdb --create --no-privileges --no-owner --exit-on-error "$BACKUP_FILE"
# Delete any tokens that were in the backup
psql -d ente_db -c 'delete from tokens'
# Delete any push tokens that were in the backup
psql -d ente_db -c 'delete from push_tokens'
# Delete some more temporary data that might've come up in the backup
psql -d ente_db -c 'delete from queue'
psql -d ente_db -c 'delete from temp_objects'
set +o xtrace
echo "copycat-db: restore complete: $BACKUP_FILE"
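
The "latest backup" selection at the top of this script relies on plain sorting of file names. A minimal sketch of that step, runnable without rclone, is shown below. The helper `latest_backup` is hypothetical, and the exact `db-<epoch>.custom` naming is an assumption: the script only shows the `db-*.custom` pattern and states that the name contains the epoch.

```shell
# Hypothetical helper: read file names on stdin, keep those matching
# db-*.custom, and print the one that sorts last.
latest_backup() {
    grep -E '^db-.*\.custom$' | sort | tail -n 1
}

# Assumed example names; "notes.txt" is filtered out by the pattern.
printf '%s\n' db-1710000000.custom db-1710400000.custom notes.txt | latest_backup
```

Plain `sort` compares names as text, which agrees with numeric order only while all epochs have the same number of digits. Epoch seconds stay at ten digits until the year 2286, so this holds in practice.
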

infra/copycat-db/ Executable file

@ -0,0 +1,20 @@
set -o xtrace
set -o errexit
docker rmi "ente/$PROJECT" || true
docker build --tag "ente/$PROJECT" .
# Interactively run the container.
# By passing "$@", we allow any arguments passed to to be forwarded to
# the image (useful for testing out things, e.g. `./ sh`).
docker run \
--interactive --tty --rm \
--env-file copycat-db.env \
--name "$PROJECT" \
"ente/$PROJECT" \
"$@"