
Restoring from backup

Audience

PLANA staff. Customer-facing recovery information is at Plana extras → Backups.

PLANA takes a daily logical pg_dump of every tenant DB plus a tarball of its filestore, both to Exoscale SOS. This runbook covers restoring a tenant from one of those backups.

When to use this

  • Data corruption on the live tenant
  • Accidental deletion that the customer can't recover via Odoo's undo
  • Pre-upgrade smoke test on a clone
  • Forensic investigation after an incident

Backup layout in SOS

s3://plana-pulse-backups/
└── {subdomain}/
    ├── backup-YYYYMMDDTHHMMSS.dump.gz      # logical pg_dump
    ├── filestore-YYYYMMDDTHHMMSS.tar.gz    # filestore tarball
    └── insurance/                          # pre-destructive operations
        └── backup-YYYYMMDDTHHMMSS.dump.gz

Retention:

| Tier | Daily backups | Insurance backups |
|---|---|---|
| Starter | 30 days | 7 days minimum |
| Pro | 90 days | 7 days minimum |
| Enterprise | 1 year | 7 days minimum |
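Before promising a restore point, it can help to check whether a daily backup of a given age should still exist for the tenant's tier. The following is a minimal sketch, not PLANA tooling: the function names are hypothetical, "1 year" is approximated as 365 days, backup timestamps are assumed to be UTC, and GNU `date` is assumed.

```bash
# Map a tier name to its daily-backup retention in days (from the table above).
retention_days() {
  case "$1" in
    starter)    echo 30 ;;
    pro)        echo 90 ;;
    enterprise) echo 365 ;;   # assumption: "1 year" treated as 365 days
    *)          return 1 ;;
  esac
}

# Succeeds if a daily backup stamped $2 (YYYYMMDDTHHMMSS) is still within
# retention for tier $1 as of date $3 (YYYY-MM-DD). Requires GNU date.
within_retention() {
  local days taken_day taken now
  days="$(retention_days "$1")" || return 2
  # Turn the leading YYYYMMDD into YYYY-MM-DD so date parses it unambiguously.
  taken_day="$(printf '%s\n' "$2" | sed -E 's/^([0-9]{4})([0-9]{2})([0-9]{2}).*/\1-\2-\3/')"
  taken="$(date -u -d "$taken_day" +%s)" || return 2
  now="$(date -u -d "$3" +%s)" || return 2
  [ $(( (now - taken) / 86400 )) -le "$days" ]
}
```

For example, `within_retention starter 20260529T020000 "$(date -u +%F)"` succeeds only while that backup is at most 30 days old.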

List available backups for a tenant:

bash
aws s3 ls "s3://plana-pulse-backups/acme/" \
  --endpoint-url=https://sos-bg-sof-1.exo.io

Two restore paths

| Path | Use when | DB target |
|---|---|---|
| In-place restore | Tenant is broken; replace its data | The tenant's own DB on pg01 |
| Side-by-side clone | Investigation, pre-upgrade staging, no downtime acceptable | A new staging DB on pg01 |

In-place is destructive. Always start with side-by-side unless the customer has explicitly accepted the data-loss window between "last good backup" and "now".

Side-by-side clone (the default)

1. Pick the backup

bash
# List backups newest first
aws s3 ls "s3://plana-pulse-backups/acme/" \
  --endpoint-url=https://sos-bg-sof-1.exo.io | sort -r | head -10

Pick the timestamp you want to restore from. Example: backup-20260529T020000.dump.gz.
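When the customer gives you a point in time ("restore to before Friday noon"), the choice can be made mechanically from the listing. A minimal sketch, assuming the usual `aws s3 ls` output with the object name in the last column; `latest_backup_before` is a hypothetical helper, not an existing PLANA script.

```bash
# Read `aws s3 ls` output on stdin and print the newest daily dump whose
# timestamp is at or before the cutoff (YYYYMMDDTHHMMSS). Exits non-zero
# if no dump qualifies. Names are fixed-width, so lexical order is
# chronological order.
latest_backup_before() {
  grep -oE 'backup-[0-9]{8}T[0-9]{6}\.dump\.gz$' |
    sort |
    awk -v cutoff="backup-$1.dump.gz" '
      $0 <= cutoff { best = $0 }
      END { if (best == "") exit 1; print best }
    '
}
```

Usage: pipe the listing command above into `latest_backup_before 20260529T120000`. The `insurance/` prefix line and the filestore tarballs are ignored by the filter.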

2. Apply the EnvironmentRestore CR

PLANA's restore path is declarative — Crossplane's EnvironmentRestore XR emits a Job that streams the backup from SOS to pg01.

yaml
apiVersion: planapulse.com/v1alpha1
kind: EnvironmentRestore
metadata:
  name: acme-staging-20260529
spec:
  sourceSlug: acme
  sourceBackup: backup-20260529T020000.dump.gz
  targetDb: acme-staging.planapulse.app
  targetNamespace: plana-odoo-18
  restoreFilestore: true

bash
kubectl apply -f acme-staging-restore.yaml
kubectl get environmentrestore acme-staging-20260529 -w
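To avoid hand-editing the manifest for each tenant, the CR can be rendered from its parameters. This is a sketch only: `render_restore` is a hypothetical helper, the `<slug>-staging-<date>` name simply mirrors the example above, and the spec fields are exactly the ones shown in this runbook.

```bash
# Print an EnvironmentRestore manifest for a side-by-side clone.
# Args: slug, backup object name, target DB, [namespace].
render_restore() {
  local slug="$1" backup="$2" target_db="$3" namespace="${4:-plana-odoo-18}"
  local day
  # Reuse the YYYYMMDD part of the backup name in the CR name.
  day="$(printf '%s\n' "$backup" | sed -E 's/^backup-([0-9]{8})T.*/\1/')"
  cat <<EOF
apiVersion: planapulse.com/v1alpha1
kind: EnvironmentRestore
metadata:
  name: ${slug}-staging-${day}
spec:
  sourceSlug: ${slug}
  sourceBackup: ${backup}
  targetDb: ${target_db}
  targetNamespace: ${namespace}
  restoreFilestore: true
EOF
}
```

Usage: `render_restore acme backup-20260529T020000.dump.gz acme-staging.planapulse.app | kubectl apply -f -`.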

3. Watch the Job

bash
kubectl -n plana-odoo-18 get job -l environmentrestore=acme-staging-20260529
kubectl -n plana-odoo-18 logs -l environmentrestore=acme-staging-20260529 -f

The Job streams the dump directly from SOS into psql --create, so nothing is staged on intermediate disk; this matters for tenants whose dump is larger than the worker's /tmp. Typical runtime: 1–3 minutes per GB of dump.
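The runtime rule of thumb can be turned into a quick estimate from the object size in the `aws s3 ls` listing. A rough sketch, assuming the size is the compressed dump in bytes and treating 1 GB as 2^30 bytes; `estimate_restore_minutes` is a hypothetical helper.

```bash
# Print an estimated restore window ("<low>-<high> min") for a dump of the
# given size in bytes, using the 1-3 minutes per GB rule of thumb.
estimate_restore_minutes() {
  awk -v bytes="$1" 'BEGIN {
    gb = bytes / (1024 * 1024 * 1024)
    printf "%.0f-%.0f min\n", gb * 1, gb * 3
  }'
}
```

A 5 GB dump (`estimate_restore_minutes 5368709120`) gives a window of 5 to 15 minutes, which is useful when telling the customer how long to expect.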

4. Bring up a temporary HTTPRoute

Apply a one-off HTTPRoute pointing acme-staging.planapulse.app at worker-odoo:8069 in the same namespace. Once Crossplane has reconciled, the staging tenant is reachable at https://acme-staging.planapulse.app/web/login.
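A one-off route of this kind might look like the following. This is a sketch under assumptions: it uses the standard Gateway API `HTTPRoute` schema, and the gateway name and namespace passed as the third and fourth arguments are placeholders you must supply, since this runbook does not name the gateway. `render_staging_route` is a hypothetical helper.

```bash
# Print an HTTPRoute that sends one hostname to worker-odoo:8069.
# Args: hostname, namespace, gateway name, gateway namespace.
render_staging_route() {
  local host="$1" namespace="$2" gateway="$3" gateway_ns="$4"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${host%%.*}
  namespace: ${namespace}
spec:
  parentRefs:
    - name: ${gateway}
      namespace: ${gateway_ns}
  hostnames:
    - ${host}
  rules:
    - backendRefs:
        - name: worker-odoo
          port: 8069
EOF
}
```

Usage: `render_staging_route acme-staging.planapulse.app plana-odoo-18 <gateway> <gateway-namespace> | kubectl apply -f -`. The route name is derived from the first label of the hostname (`acme-staging`).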

5. Smoke and hand off

Test the restored tenant. When done with the staging clone:

bash
kubectl delete environmentrestore acme-staging-20260529
# The Composition cleans up: drops the staging DB, removes the HTTPRoute,
# clears the filestore subdirectory.

In-place restore

This is the dangerous one. Always take an insurance backup first, even if you are restoring because the live tenant is broken.

1. Insurance backup of the live (possibly broken) DB

bash
# Capture the Job name once so the logs command targets the same Job
JOB="acme-pre-restore-$(date +%Y%m%d%H%M%S)"
kubectl -n backup create job --from=cronjob/acme-backup "$JOB"
kubectl -n backup logs "job/$JOB" -f

Confirm it landed in s3://plana-pulse-backups/acme/insurance/.

2. Freeze the tenant

bash
kubectl -n plana-odoo-18 scale deploy worker-odoo --replicas=0

Wait for pods to terminate. This stops writes during the restore. Note that this affects every tenant in the namespace — coordinate the maintenance window accordingly. For a single-tenant restore on a shared worker pool, prefer the side-by-side clone instead.
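It can be worth recording the replica count before scaling down, so that the unfreeze step restores whatever the deployment was actually running rather than assuming a fixed number. A minimal sketch; `freeze_workers` is a hypothetical helper that only wraps standard `kubectl get` and `kubectl scale` calls.

```bash
# Print the deployment's current replica count on stdout, then scale it
# to zero. kubectl's own output goes to stderr so stdout stays clean.
# Args: namespace, deployment name.
freeze_workers() {
  local ns="$1" deploy="$2" replicas
  replicas="$(kubectl -n "$ns" get deploy "$deploy" -o jsonpath='{.spec.replicas}')" || return 1
  kubectl -n "$ns" scale deploy "$deploy" --replicas=0 >&2 || return 1
  printf '%s\n' "$replicas"
}
```

Usage: `REPLICAS="$(freeze_workers plana-odoo-18 worker-odoo)"`, then pass `--replicas="$REPLICAS"` when unfreezing.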

3. Drop and recreate the DB

bash
psql -h pg01.planapulse.com -U plana -c \
  "DROP DATABASE IF EXISTS \"acme.planapulse.app\" WITH (FORCE)"

4. Restore from SOS

Use the same EnvironmentRestore CR pattern, but with the target DB equal to the live DB:

yaml
spec:
  sourceSlug: acme
  sourceBackup: backup-20260529T020000.dump.gz
  targetDb: acme.planapulse.app    # the live name
  targetNamespace: plana-odoo-18
  restoreFilestore: true

bash
kubectl apply -f acme-inplace-restore.yaml

5. Clear the live filestore subdirectory

bash
kubectl -n plana-odoo-18 exec deploy/worker-odoo -- \
  rm -rf /var/lib/odoo/filestore/acme.planapulse.app/*

Then let the EnvironmentRestore Composition extract the filestore tarball back into place.

6. Verify and unfreeze

bash
# Confirm DB is back
psql -h pg01 -U plana -d acme.planapulse.app -c "SELECT COUNT(*) FROM res_users"

# Confirm filestore is back
kubectl -n plana-odoo-18 exec deploy/worker-odoo -- \
  ls /var/lib/odoo/filestore/acme.planapulse.app | wc -l

# Unfreeze
kubectl -n plana-odoo-18 scale deploy worker-odoo --replicas=3
kubectl -n plana-odoo-18 rollout status deploy worker-odoo

7. Smoke

bash
curl -sI "https://acme.planapulse.app/web/health"
curl -s  "https://acme.planapulse.app/web/login" | grep -q 'Log in' && echo OK

8. Notify the customer

Use the workspace's Matrix room. Include:

  • The backup timestamp used (so they know what window of data was lost)
  • The window of downtime (start time, end time)
  • The retention of the insurance backup (7 days minimum)

Restoring a deleted tenant

If the tenant's PLANAClient CR has been deleted (rare; we usually soft-delete), the restore takes three steps:

  1. Recreate the PLANAClient CR (or TenantEnvironment if more fine-grained).
  2. Wait for the new tenant DB to be created (it will be empty).
  3. Apply an EnvironmentRestore CR pointing at the SOS backup with target = the new live DB name.

Common pitfalls

1. Restoring to a DB that already exists

Forgetting to DROP before the in-place restore causes psql --create to fail or to import into an existing schema, leaving you with a hybrid broken state. Always DROP (or restore side-by-side first).
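A pre-flight check can catch this before the restore CR is applied. The sketch below assumes a `postgres` maintenance database is reachable on pg01 with the `plana` role; `db_exists` is a hypothetical helper. The query is fed on stdin because psql does not interpolate `:'var'` variables in `-c` commands.

```bash
# Succeeds if a database with the given name exists on the server.
# Args: database name, [host], [user]. Returns 2 if psql itself fails.
db_exists() {
  local db="$1" host="${2:-pg01}" user="${3:-plana}" found
  found="$(printf '%s\n' "SELECT 1 FROM pg_database WHERE datname = :'db';" |
    psql -h "$host" -U "$user" -d postgres -v db="$db" -tA)" || return 2
  [ "$found" = "1" ]
}
```

Usage: `db_exists acme.planapulse.app && echo "target DB still exists; DROP it before restoring in place"`.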

2. Filestore tarball missing

Older backups may not have a filestore tarball (the filestore-backup CronJob was added later than the DB-backup CronJob). If restoreFilestore: true is set and the tarball is missing, the Job will fail. Drop the flag, or recover the filestore from the most recent snapshot you do have.
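Whether a tarball exists for a given dump can be checked from the listing before choosing the flag. This sketch rests on an assumption not stated in this runbook: that the paired tarball carries exactly the same timestamp as the dump. If the two CronJobs stamp their objects independently, match on the date part instead. `has_filestore_for` is a hypothetical helper.

```bash
# Read `aws s3 ls` output on stdin; succeed if it contains the filestore
# tarball with the same timestamp as the given dump name.
# Returns 2 if the dump name does not match backup-YYYYMMDDTHHMMSS.dump.gz.
has_filestore_for() {
  local dump="$1" ts
  ts="$(printf '%s\n' "${dump##*/}" |
    sed -n -E 's/^backup-([0-9]{8}T[0-9]{6})\.dump\.gz$/\1/p')"
  [ -n "$ts" ] || return 2
  grep -qE "(^|[[:space:]])filestore-${ts}\.tar\.gz\$"
}
```

Usage: pipe the tenant listing into `has_filestore_for backup-20260529T020000.dump.gz`; a non-zero exit suggests omitting the filestore restore for that backup.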

3. Asset bundle 500s after restore

If you restore a v18 backup but the worker is running v18 with a newer base image, asset hashes diverge and /web/assets/* returns 500. Force asset regeneration:

bash
# Generated asset bundles are stored with url = '/web/assets/...'; name holds the bundle filename
psql -h pg01 -U plana -d acme.planapulse.app -c \
  "DELETE FROM ir_attachment WHERE url LIKE '/web/assets/%'"
kubectl -n plana-odoo-18 rollout restart deploy worker-odoo

The worker rebuilds the assets on first request.

© PLANA Digital Ltd.