Restoring from backup
Audience
PLANA staff. Customer-facing recovery information is at Plana extras → Backups.
PLANA takes a daily logical pg_dump of every tenant DB plus a tarball of its filestore, and uploads both to Exoscale SOS. This runbook covers restoring a tenant from one of those backups.
When to use this
- Data corruption on the live tenant
- Accidental deletion that the customer can't recover via Odoo's undo
- Pre-upgrade smoke test on a clone
- Forensic investigation after an incident
Backup layout in SOS
s3://plana-pulse-backups/
└── {subdomain}/
    ├── backup-YYYYMMDDTHHMMSS.dump.gz       # logical pg_dump
    ├── filestore-YYYYMMDDTHHMMSS.tar.gz     # filestore tarball
    └── insurance/                           # pre-destructive operations
        └── backup-YYYYMMDDTHHMMSS.dump.gz

Retention:
| Tier | Daily backups | Insurance backups |
|---|---|---|
| Starter | 30 days | 7 days minimum |
| Pro | 90 days | 7 days minimum |
| Enterprise | 1 year | 7 days minimum |
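The layout and retention table above can be captured in two small helpers. This is a minimal sketch, assuming object keys follow the naming pattern shown in the tree; the function names `backup_key` and `retention_days` are hypothetical and not part of any PLANA tooling, and "1 year" is taken as 365 days.

```shell
# Hypothetical helpers; names are illustrative only.

# Build the S3 URI of a daily DB dump from a tenant subdomain and a
# timestamp in the YYYYMMDDTHHMMSS form used by the layout above.
backup_key() {
  local subdomain="$1"
  local stamp="$2"
  printf 's3://plana-pulse-backups/%s/backup-%s.dump.gz\n' "$subdomain" "$stamp"
}

# Map a tier name to its daily-backup retention in days (from the table).
# Assumption: "1 year" is treated as 365 days.
retention_days() {
  case "$1" in
    Starter)    echo 30 ;;
    Pro)        echo 90 ;;
    Enterprise) echo 365 ;;
    *)          echo "unknown tier: $1" >&2; return 1 ;;
  esac
}
```

For example, `backup_key acme 20260529T020000` prints the full URI of that tenant's dump.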
List available backups for a tenant:
aws s3 ls "s3://plana-pulse-backups/acme/" \
  --endpoint-url=https://sos-bg-sof-1.exo.io

Two restore paths
| Path | Use when | DB target |
|---|---|---|
| In-place restore | Tenant is broken; replace its data | The tenant's own DB on pg01 |
| Side-by-side clone | Investigation, pre-upgrade staging, no downtime acceptable | A new staging DB on pg01 |
In-place is destructive. Always start with side-by-side unless the customer has explicitly accepted the data-loss window between "last good backup" and "now".
Side-by-side clone (the default)
1. Pick the backup
# List backups newest first
aws s3 ls "s3://plana-pulse-backups/acme/" \
  --endpoint-url=https://sos-bg-sof-1.exo.io | sort -r | head -10

Pick the timestamp you want to restore from. Example: backup-20260529T020000.dump.gz.
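When the incident time is known, picking the timestamp can be scripted. The sketch below is one possible approach, assuming the listing is the standard `aws s3 ls` output with the object name in the last column; `latest_backup_before` is a hypothetical helper name.

```shell
# Hypothetical helper: read `aws s3 ls` output on stdin and print the newest
# daily dump whose timestamp is at or before the cutoff (YYYYMMDDTHHMMSS).
latest_backup_before() {
  local cutoff="backup-$1.dump.gz"
  # Keep only the object name, drop everything that is not a daily dump
  # (filestore tarballs, the insurance/ prefix), then compare names as
  # strings: the fixed-width timestamp makes string order match time order.
  awk '{ print $NF }' \
    | grep -E '^backup-[0-9]{8}T[0-9]{6}\.dump\.gz$' \
    | LC_ALL=C sort \
    | awk -v cutoff="$cutoff" '$0 <= cutoff' \
    | tail -n 1
}
```

Pipe the listing command above into it, for example `... | latest_backup_before 20260529T120000`, to get the last dump taken before midday on 29 May 2026.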
2. Apply the EnvironmentRestore CR
PLANA's restore path is declarative — Crossplane's EnvironmentRestore XR emits a Job that streams the backup from SOS to pg01.
apiVersion: planapulse.com/v1alpha1
kind: EnvironmentRestore
metadata:
  name: acme-staging-20260529
spec:
  sourceSlug: acme
  sourceBackup: backup-20260529T020000.dump.gz
  targetDb: acme-staging.planapulse.app
  targetNamespace: plana-odoo-18
  restoreFilestore: true

kubectl apply -f acme-staging-restore.yaml
kubectl get environmentrestore acme-staging-20260529 -w

3. Watch the Job
kubectl -n plana-odoo-18 get job -l environmentrestore=acme-staging-20260529
kubectl -n plana-odoo-18 logs -l environmentrestore=acme-staging-20260529 -f

The Job streams the dump directly from SOS to psql --create, so no intermediate disk is needed. This matters for tenants whose dump is larger than the worker's /tmp. Typical runtime: 1–3 minutes per GB of dump.
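To size a maintenance window, the rule of thumb above can be turned into a quick estimate. This is a rough sketch only: `estimate_restore_minutes` is a hypothetical helper, it assumes the figure applies to the size of the stored dump object, and it rounds up to whole GiB.

```shell
# Hypothetical helper: estimate the restore window from the dump size in
# bytes using the "1 to 3 minutes per GB" rule of thumb.
# Prints "<min>-<max>" in minutes. Sizes are rounded up to whole GiB.
estimate_restore_minutes() {
  local bytes="$1"
  local gib=$(( (bytes + 1073741823) / 1073741824 ))
  echo "${gib}-$(( gib * 3 ))"
}
```

The size in bytes is the third column of the `aws s3 ls` listing for the chosen dump.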
4. Bring up a temporary HTTPRoute
Apply a one-off HTTPRoute pointing acme-staging.planapulse.app at worker-odoo:8069 in the same namespace. Once Crossplane has reconciled, the staging tenant is reachable at https://acme-staging.planapulse.app/web/login.
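The one-off route can be generated rather than hand-written. The sketch below prints a Gateway API HTTPRoute manifest; it assumes a Service named worker-odoo exposes port 8069 in the namespace, and the Gateway name `plana-gateway` is a placeholder that does not come from this runbook. `render_staging_route` is a hypothetical helper name.

```shell
# Hypothetical helper: print an HTTPRoute manifest for a staging hostname.
# Assumption: "plana-gateway" is a placeholder; substitute the real Gateway.
render_staging_route() {
  local host="$1"
  local namespace="${2:-plana-odoo-18}"
  # The route name is the first DNS label of the hostname, e.g. acme-staging.
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${host%%.*}
  namespace: ${namespace}
spec:
  parentRefs:
    - name: plana-gateway
  hostnames:
    - ${host}
  rules:
    - backendRefs:
        - name: worker-odoo
          port: 8069
EOF
}
```

Review the output first, then pipe it to `kubectl apply -f -`, for example `render_staging_route acme-staging.planapulse.app | kubectl apply -f -`.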
5. Smoke and hand off
Test the restored tenant. When done with the staging clone:
kubectl delete environmentrestore acme-staging-20260529
# The Composition cleans up: drops the staging DB, removes the HTTPRoute,
# clears the filestore subdirectory.

In-place restore
This is the dangerous one. Always take an insurance backup first, even if you are restoring because the live tenant is broken.
1. Insurance backup of the live (possibly broken) DB
# Keep the generated Job name so the logs command targets the same Job
JOB="acme-pre-restore-$(date +%Y%m%d%H%M%S)"
kubectl -n backup create job --from=cronjob/acme-backup "$JOB"
kubectl -n backup logs "job/$JOB" -f

Confirm it landed in s3://plana-pulse-backups/acme/insurance/.
2. Freeze the tenant
kubectl -n plana-odoo-18 scale deploy worker-odoo --replicas=0

Wait for pods to terminate. This stops writes during the restore. Note that this affects every tenant in the namespace, so coordinate the maintenance window accordingly. For a single-tenant restore on a shared worker pool, prefer the side-by-side clone instead.
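Recording the replica count before the freeze lets the unfreeze step restore the same value instead of a hard-coded one. A minimal sketch, assuming `kubectl` is already pointed at the right cluster; `freeze_workers` is a hypothetical helper name.

```shell
# Hypothetical helper: print the Deployment's current replica count on
# stdout, then scale it to zero. kubectl's own output goes to stderr so the
# caller can capture just the number.
freeze_workers() {
  local namespace="$1"
  local deploy="$2"
  local replicas
  replicas=$(kubectl -n "$namespace" get deploy "$deploy" \
    -o jsonpath='{.spec.replicas}') || return 1
  kubectl -n "$namespace" scale deploy "$deploy" --replicas=0 >&2 || return 1
  echo "$replicas"
}
```

Usage: `REPLICAS=$(freeze_workers plana-odoo-18 worker-odoo)`, then pass `--replicas="$REPLICAS"` when unfreezing.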
3. Drop and recreate the DB
psql -h pg01.planapulse.com -U plana -c \
  "DROP DATABASE IF EXISTS \"acme.planapulse.app\" WITH (FORCE)"

This command only drops the database. The restore in the next step recreates it.

4. Restore from SOS
Use the same EnvironmentRestore CR pattern, but with the target DB equal to the live DB:
spec:
  sourceSlug: acme
  sourceBackup: backup-20260529T020000.dump.gz
  targetDb: acme.planapulse.app   # the live name
  targetNamespace: plana-odoo-18
  restoreFilestore: true

kubectl apply -f acme-inplace-restore.yaml

5. Clear the live filestore subdirectory
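# sh -c makes the glob expand inside the container, not in the local shell
kubectl -n plana-odoo-18 exec deploy/worker-odoo -- \
  sh -c 'rm -rf /var/lib/odoo/filestore/acme.planapulse.app/*'

kubectl exec needs a running pod. With worker-odoo scaled to zero in step 2, run the command from a pod that mounts the filestore volume. Then let the EnvironmentRestore Composition extract the filestore tarball back into place.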
6. Verify and unfreeze
# Confirm DB is back
psql -h pg01 -U plana -d acme.planapulse.app -c "SELECT COUNT(*) FROM res_users"
# Confirm filestore is back
kubectl -n plana-odoo-18 exec deploy/worker-odoo -- \
ls /var/lib/odoo/filestore/acme.planapulse.app | wc -l
# Unfreeze
kubectl -n plana-odoo-18 scale deploy worker-odoo --replicas=3
kubectl -n plana-odoo-18 rollout status deploy worker-odoo

7. Smoke
curl -sI "https://acme.planapulse.app/web/health"
curl -s "https://acme.planapulse.app/web/login" | grep -q 'Log in' && echo OK

8. Notify the customer
Use the workspace's Matrix room. Include:
- The backup timestamp used (so they know what window of data was lost)
- The window of downtime (start time, end time)
- The retention of the insurance backup (7 days minimum)
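The three items above can be filled into a fixed template so that no field is forgotten. This is only a sketch: `restore_notice` is a hypothetical helper and the wording is illustrative, not an approved customer message.

```shell
# Hypothetical helper: format the customer notification text.
# Arguments: backup timestamp used, downtime start, downtime end.
restore_notice() {
  local backup_stamp="$1"
  local down_start="$2"
  local down_end="$3"
  cat <<EOF
Restore complete.
- Backup used: ${backup_stamp} (changes made after this point are not included)
- Downtime: ${down_start} to ${down_end}
- Insurance backup of the pre-restore state: kept for at least 7 days
EOF
}
```

For example, `restore_notice 20260529T020000 "10:15 UTC" "10:42 UTC"` prints a message ready to paste into the room.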
Restoring a deleted tenant
If the tenant's PLANAClient CR has been deleted (rare, since we usually soft-delete), the restore takes three steps:
- Recreate the PLANAClient CR (or TenantEnvironment if more fine-grained).
- Wait for the new tenant DB to be created (it will be empty).
- Apply an EnvironmentRestore CR pointing at the SOS backup, with the target set to the new live DB name.
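The EnvironmentRestore manifest for the last step can be rendered from three values. The sketch below reuses only the fields shown in the examples earlier in this runbook; the `metadata.name` pattern and the helper name `render_environment_restore` are assumptions made for illustration.

```shell
# Hypothetical helper: print an EnvironmentRestore manifest.
# Arguments: tenant slug, backup object name, target DB name.
# Assumption: the metadata.name pattern "<slug>-restore-<date>" is
# illustrative and not a documented naming rule.
render_environment_restore() {
  local slug="$1"
  local backup="$2"
  local target_db="$3"
  # backup-20260529T020000.dump.gz -> 20260529
  local stamp="${backup#backup-}"
  stamp="${stamp%%T*}"
  cat <<EOF
apiVersion: planapulse.com/v1alpha1
kind: EnvironmentRestore
metadata:
  name: ${slug}-restore-${stamp}
spec:
  sourceSlug: ${slug}
  sourceBackup: ${backup}
  targetDb: ${target_db}
  targetNamespace: plana-odoo-18
  restoreFilestore: true
EOF
}
```

Example: `render_environment_restore acme backup-20260529T020000.dump.gz acme.planapulse.app > acme-restore.yaml`, then review the file before applying it.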
Common pitfalls
1. Restoring to a DB that already exists
Forgetting to DROP before the in-place restore causes psql --create to fail or to import into an existing schema, leaving you with a hybrid broken state. Always DROP (or restore side-by-side first).
2. Filestore tarball missing
Older backups may not have a filestore tarball (the filestore-backup CronJob was added later than the DB-backup CronJob). If restoreFilestore: true and the tarball is missing, the Job will fail. Drop the flag or recover the filestore from the latest snapshot you DO have.
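A quick pre-check avoids a failed Job. The sketch below assumes the filestore tarball carries the same timestamp as the DB dump it pairs with, which the layout suggests but this runbook does not state outright; both helper names are hypothetical.

```shell
# Hypothetical helper: derive the filestore tarball name that pairs with a
# dump. Assumption: both objects share the same timestamp.
filestore_for_backup() {
  local stamp="${1#backup-}"
  stamp="${stamp%.dump.gz}"
  echo "filestore-${stamp}.tar.gz"
}

# Read `aws s3 ls` output on stdin; succeed only if the paired tarball
# appears in the listing as an exact object name.
has_filestore() {
  local wanted
  wanted="$(filestore_for_backup "$1")"
  awk '{ print $NF }' | grep -Fxq "$wanted"
}
```

Pipe the tenant listing into `has_filestore backup-20260529T020000.dump.gz`; if it fails, set `restoreFilestore: false` or pick a backup that has a tarball.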
3. Asset bundle 500s after restore
If you restore a v18 backup but the worker is running v18 with a newer base image, asset hashes diverge and /web/assets/* returns 500. Force asset regeneration:
# Asset bundle attachments store their path in the url column
psql -h pg01 -U plana -d acme.planapulse.app -c \
  "DELETE FROM ir_attachment WHERE url LIKE '/web/assets/%'"
kubectl -n plana-odoo-18 rollout restart deploy worker-odoo

The worker rebuilds the assets on first request.
Where to read more
- Provisioning a tenant
- Upgrading a tenant
- Architecture → Data stores — backup locations and retention
- Plana extras → Backups — customer-facing version of this content