Audit logging
Audience
PLANA staff investigating incidents, and customer security reviewers asking "what can you tell me about access?".
PLANA records state-changing Kubernetes API calls to Loki through an in-cluster ValidatingAdmissionWebhook. The audit log covers requests that modify cluster state: who made the change, what changed, and when.
What gets captured
| Capture | Detail |
|---|---|
| Who | Kubernetes user / ServiceAccount that made the call |
| What | Resource type + name + namespace |
| Action | Create / Update / Delete |
| Diff | The before/after of the resource (for updates) |
| Source IP | Where the API request came from |
| Timestamp | RFC3339 with millisecond precision |
| Object UID | For correlation across events |
Read-only operations (get, list, watch) are not captured — the volume would be unmanageable and reads don't change state.
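As a rough illustration of how the table above could map onto a Kubernetes AdmissionReview request, here is a minimal Python sketch. The output keys user, resource and namespace mirror the fields used in the queries on this page; action, source_ip, timestamp, object_uid and diff are hypothetical names, and the real audit-webhook schema may differ. The AdmissionReview request body does not carry a client IP, so the sketch takes it as a plain parameter and makes no claim about where the real webhook gets it.

```python
from datetime import datetime, timezone
from typing import Optional


def extract_audit_fields(request: dict, source_ip: Optional[str] = None) -> dict:
    """Map an AdmissionReview `request` object onto the captured fields.

    Output key names are illustrative, not the real audit-webhook schema.
    """
    new_obj = request.get("object")     # null on DELETE
    old_obj = request.get("oldObject")  # null on CREATE
    current = new_obj or old_obj or {}

    # RFC3339 in UTC with millisecond precision, e.g. 2024-01-01T00:00:00.000Z
    timestamp = (
        datetime.now(timezone.utc)
        .isoformat(timespec="milliseconds")
        .replace("+00:00", "Z")
    )

    record = {
        "user": request.get("userInfo", {}).get("username"),
        "resource": request.get("resource", {}).get("resource"),
        "name": request.get("name"),
        "namespace": request.get("namespace"),
        "action": request.get("operation"),
        "source_ip": source_ip,  # assumption: supplied by the caller
        "timestamp": timestamp,
        "object_uid": current.get("metadata", {}).get("uid"),
    }
    # Before/after is only meaningful for updates.
    if request.get("operation") == "UPDATE":
        record["diff"] = {"before": old_obj, "after": new_obj}
    return record
```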
The mechanism
An in-cluster Python webhook (audit-webhook) is registered as a ValidatingAdmissionWebhook matching all CREATE, UPDATE, DELETE operations on every resource. The webhook:
- Receives the admission request
- Logs the relevant fields to stdout in structured JSON
- Returns allowed: true (it's an audit hook, not an admission gate)
- Promtail picks up the stdout log and ships to Loki
The webhook lives in the audit-webhook namespace, is reconciled by Flux, and filters heavily: uninteresting noise (e.g. Lease updates from the scheduler) is dropped at the webhook before reaching Loki.
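The request flow described above can be sketched as a single pure function. This is an illustration under stated assumptions, not the actual audit-webhook code: handle_review and NOISY_KINDS are hypothetical names, the filter only knows about the Lease example mentioned above, and the logged keys are a reduced, illustrative set. The response shape (apiVersion, kind, response.uid, response.allowed) follows the Kubernetes admission.k8s.io/v1 AdmissionReview contract.

```python
import json
from typing import Optional, Tuple

# Hypothetical noise filter; only Lease updates are named in the text.
NOISY_KINDS = {"Lease"}


def handle_review(review: dict) -> Tuple[dict, Optional[str]]:
    """Return (AdmissionReview response, JSON log line or None if filtered)."""
    request = review["request"]
    kind = request.get("kind", {}).get("kind")

    log_line = None
    if kind not in NOISY_KINDS:
        # One structured JSON object per line, suitable for stdout.
        log_line = json.dumps(
            {
                "user": request.get("userInfo", {}).get("username"),
                "kind": kind,
                "name": request.get("name"),
                "namespace": request.get("namespace"),
                "operation": request.get("operation"),
            },
            sort_keys=True,
        )

    # Always allow: the hook observes requests, it never rejects them.
    response = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {"uid": request["uid"], "allowed": True},
    }
    return response, log_line
```

An HTTP handler wrapping this would write the log line to stdout when it is not None (for example with print(log_line, flush=True)) and send the response dict back as the JSON body, leaving shipping to Promtail.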
Retention
| Tier | Loki retention |
|---|---|
| All clusters | 90 days rolling |
90 days is the regulatory minimum we've committed to. Longer retention is available on request for Enterprise customers (archive to SOS as JSONL).
Querying
LogQL queries against the audit-webhook log stream:
# All deletes in plana-odoo-18 in the last hour
{namespace="audit-webhook"} |= "DELETE" | json | namespace="plana-odoo-18" | __error__=""
# Everything chudomir did in the last day
{namespace="audit-webhook"} | json | user="chudomir@plana.solutions"
# All ConfigMap changes in crossplane-system
{namespace="audit-webhook"} | json | resource="configmaps" | namespace="crossplane-system"
Available in Grafana → Explore → Loki.
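For scripted access, the same LogQL can be sent to Loki's HTTP API instead of Grafana. The sketch below only builds the request URL for the /loki/api/v1/query_range endpoint, which takes start and end as Unix epoch nanoseconds. The base URL in the usage note is a placeholder, and the helper names are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_unix_ns(ts: datetime) -> int:
    """Convert a timezone-aware datetime to Unix epoch nanoseconds."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Integer arithmetic avoids float rounding at nanosecond scale.
    whole_seconds = int(ts.replace(microsecond=0).timestamp())
    return whole_seconds * 1_000_000_000 + ts.microsecond * 1_000


def build_query_range_url(base_url: str, logql: str,
                          start: datetime, end: datetime,
                          limit: int = 1000) -> str:
    """Build a GET URL for Loki's query_range endpoint."""
    params = {
        "query": logql,
        "start": to_unix_ns(start),
        "end": to_unix_ns(end),
        "limit": limit,
        "direction": "forward",  # oldest entries first
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"
```

For example, build_query_range_url("http://loki.example.internal:3100", '{namespace="audit-webhook"} | json | user="chudomir@plana.solutions"', start, end) yields a URL that can be fetched with any HTTP client; authentication and tenant headers, if any, are deployment-specific and not shown.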
What we look at routinely
| Query | Frequency | Purpose |
|---|---|---|
| All DELETEs of HTTPRoutes | Every alert | Catch accidental tenant route removal |
| ClusterRoleBinding changes | Daily review | RBAC drift detection |
| Secret modifications outside SOPS reconcile | Daily review | Detect manual secret tampering |
| Compositions modified outside Flux | Hourly | Crossplane Composition drift |
| Failed admission webhook calls | Alert | Webhook reliability |
When customers ask for an audit excerpt
Procedure:
- Customer requests audit of actions on their resources for a date range via Matrix
- PLANA engineer queries Loki:
{namespace="audit-webhook"} | json | namespace=~"plana-odoo.*" | subject=~"<customer-slug>.*"
- Export to JSONL
- Hand to customer via Matrix
Typical turnaround: 1 business day.
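The "Export to JSONL" step could look roughly like the following, assuming the excerpt was fetched through Loki's query API, which returns a streams result where each stream carries a list of [nanosecond timestamp, log line] pairs. The function name and the output keys loki_ts_ns and event are hypothetical; the actual export tooling is not described here.

```python
import json


def loki_response_to_jsonl(response: dict) -> str:
    """Flatten a Loki streams response into time-ordered JSONL."""
    entries = []
    for stream in response.get("data", {}).get("result", []):
        for ts_ns, line in stream.get("values", []):
            entries.append((int(ts_ns), line))

    # Results arrive grouped per stream; merge them into one timeline.
    entries.sort(key=lambda entry: entry[0])

    out = []
    for ts_ns, line in entries:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            # Keep lines that are not valid JSON rather than dropping evidence.
            event = {"raw": line}
        out.append(json.dumps({"loki_ts_ns": ts_ns, "event": event}, sort_keys=True))
    return "\n".join(out) + ("\n" if out else "")
```

Each output line is a standalone JSON object, so the file can be handed over as-is and filtered by the customer with standard tools.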
What's NOT in this audit log
- Read operations: get, list, watch are not logged
- Customer data inside their tenant: that's logged by their own PLANA Business Cloud audit (Odoo's mail.message log on each record)
- BOS user actions: logged in PLANA:executions:{workspace} with 30-day retention; mirrored to Loki for 90 days
- Network flow data: Calico would log it; we don't ship those logs to Loki today (too high-volume)
Where to read more
- Threat model
- Compliance — what regulators expect
- Secrets management — secret changes are audited
- Operations → Incident postmortems — when audit logs become evidence