Flux GitOps
Audience
PLANA staff. The single most important workflow rule on the platform: manifests in infra/k8s/ are owned by Flux. Edit them in git, not in the cluster.
PLANA uses Flux v2 to reconcile a designated subtree of the plana-pulse/infra repo into the cluster. Anything declared there is the authoritative source of truth; the cluster is just a rendered copy.
What Flux watches
| Source | Path | Reconcile interval |
|---|---|---|
| git.planapulse.com/plana-pulse/infra (branch main) | infra/k8s/ | 1 minute |
Inside infra/k8s/, several Kustomization resources own their own subtree:
| Kustomization | Watches | Owns |
|---|---|---|
| rbac-system | infra/k8s/rbac-system/ | Cluster-level RBAC (ClusterRoles, ClusterRoleBindings, ServiceAccounts) |
| network-policies | infra/k8s/network-policies/ | All cluster-wide NetworkPolicies |
| platform-routes | infra/k8s/platform-routes/ | HTTPRoutes for shared services |
| crossplane | infra/k8s/crossplane/ | XRDs + Compositions (NOT XRs) |
| monitoring | infra/k8s/monitoring/ | Prometheus, Grafana, Alertmanager values |
| docs | infra/k8s/docs/ | docs-portal Deployment |
| (and more, one per product family) | | |
Anything outside infra/k8s/ — including ad-hoc Jobs, debug pods, the XRs that operators apply day-to-day — is not managed by Flux. Flux will ignore it.
The rule
Never kubectl apply a resource that lives under infra/k8s/. Edit the manifest in git, merge to main, let Flux reconcile.
The reconciliation interval is 1 minute, so an out-of-band edit will be reverted within ~60 seconds. We have lost work this way; we have generated production incidents this way. The rule is rigid for good reason.
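One way to make the rule harder to break by accident is a small shell guard in front of kubectl apply. The sketch below is illustrative and not an existing PLANA tool: the names is_flux_owned_path, safe_apply and INFRA_K8S_DIR are hypothetical, and it assumes the infra checkout lives at ~/projects/plana-pulse/infra.

```shell
# Hypothetical guard: refuse to apply manifest files from the Flux-owned tree.
# Assumption: the Flux-owned subtree is the k8s/ directory of the local
# infra checkout. Override INFRA_K8S_DIR if your checkout lives elsewhere.
INFRA_K8S_DIR="${INFRA_K8S_DIR:-$HOME/projects/plana-pulse/infra/k8s}"

is_flux_owned_path() {
  # Make relative arguments absolute. Note: ".." segments are not normalized.
  case "$1" in
    /*) abs="$1" ;;
    *)  abs="$PWD/$1" ;;
  esac
  case "$abs" in
    "$INFRA_K8S_DIR"/*) return 0 ;;
    *) return 1 ;;
  esac
}

safe_apply() {
  if is_flux_owned_path "$1"; then
    echo "refusing: $1 is under $INFRA_K8S_DIR; open a PR instead" >&2
    return 1
  fi
  kubectl apply -f "$1"
}
```

A guard like this only inspects the file path, so it will not catch a copy of a manifest applied from another directory. It is a reminder, not an enforcement mechanism.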
Three real cases where the rule was broken
| Date | What was applied directly | Result |
|---|---|---|
| 2026-05-13 (Wave 5 cleanup) | Cluster-scoped resources deleted alongside namespaced ones via the --all flag | 10 ClusterRoleBindings lost; the SKS konnectivity outage that followed took 7 days to root-cause |
| 2026-05-21 | A YAML file declaring worker-odoo replicas: 0 was applied via kubectl apply to bump a quota | Tenants returned 503; pinned to replicas: 3 afterwards (PR #11) |
| 2026-05-22 (v18 cutover) | HTTPRoute backendRefs.namespace edited directly | Crossplane reverted within seconds; the right fix was to patch the PLANAClient XR |
How to make changes the right way
1. Clone or pull infra
cd ~/projects/plana-pulse/infra
git pull
2. Edit the manifest
$EDITOR k8s/<area>/<file>.yaml
For Crossplane Compositions and XRDs, prefer narrow, well-scoped edits. For NetworkPolicies, run them by the security reviewer.
3. Validate locally
# Render the Kustomization
kustomize build k8s/<area>/
# Compare against the cluster
kustomize build k8s/<area>/ | kubectl diff -f -
kubectl diff exits with status 1 when the rendered manifests differ from the cluster, and with a status greater than 1 when the command itself fails. A status of 1 is expected when you intend a change. Read the diff to confirm it contains only what you want.
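The validation step can be wrapped so that the exit status is interpreted explicitly. This is a minimal sketch, assuming kustomize and kubectl are on the PATH; check_area is a hypothetical name. It renders to a temporary file first so that a failed render is reported as an error and not mistaken for a diff result.

```shell
# Sketch: render one area, then interpret kubectl diff's exit status.
# kubectl diff exits 0 for "no differences", 1 for "differences found",
# and greater than 1 when kubectl or the diff program failed.
check_area() {
  area_dir="$1"
  rendered="$(mktemp)" || return 2
  if ! kustomize build "$area_dir" > "$rendered"; then
    rm -f "$rendered"
    echo "render failed: $area_dir" >&2
    return 2
  fi
  status=0
  kubectl diff -f "$rendered" || status=$?
  rm -f "$rendered"
  case "$status" in
    0) echo "in sync: $area_dir" ;;
    1) echo "changes pending: $area_dir (review the diff above)" ;;
    *) echo "diff failed: $area_dir" >&2; return 2 ;;
  esac
  return "$status"
}
```

Usage: check_area k8s/<area>/. A return value of 1 means there is a diff to review, which is the normal case before opening a PR.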
4. PR
git checkout -b feat/<short-name>
git add k8s/<area>/
git commit -m "<area>: <one-line summary>

<rationale>

Authored-by: PLANA Digital <dev@planapulse.ai>"
git push -u origin feat/<short-name>
Then open a PR on Forgejo. Tag a reviewer. Merge to main.
5. Wait for reconcile
After merge:
flux get kustomization <kustomization-name>
flux logs --kind=Kustomization --name=<kustomization-name> --tail=50
You should see Applied revision: main@sha1:<sha> within ~60 seconds.
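If you would rather not re-run the commands by hand, the wait can be scripted. The sketch below polls the Kustomization's status.lastAppliedRevision field until it contains the merged commit. It assumes Flux runs in the flux-system namespace (the Flux default, not confirmed for this cluster) and that revisions look like main@sha1:<sha>; wait_for_revision is a hypothetical name.

```shell
# Sketch: poll until the Kustomization reports the merged commit as applied.
# Arguments: <kustomization-name> <sha-or-prefix> [tries] [delay-seconds]
wait_for_revision() {
  name="$1"; sha="$2"; tries="${3:-12}"; delay="${4:-10}"
  i=0
  applied=""
  while [ "$i" -lt "$tries" ]; do
    # Assumption: Flux objects live in the flux-system namespace.
    applied="$(kubectl get kustomizations.kustomize.toolkit.fluxcd.io "$name" \
      -n flux-system -o jsonpath='{.status.lastAppliedRevision}')" || applied=""
    case "$applied" in
      *"sha1:$sha"*) echo "applied: $applied"; return 0 ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $sha on $name (last: $applied)" >&2
  return 1
}
```

Usage: wait_for_revision <kustomization-name> "$(git rev-parse main)". With the defaults it gives up after roughly two minutes, which is comfortably longer than the 1 minute reconcile interval.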
6. Verify in the cluster
kubectl describe <kind>/<name> | head -30
When Flux is paused
Sometimes you need to test a Composition change in the cluster without Flux reverting it. The pattern:
# Pause the Kustomization
flux suspend kustomization crossplane
# Apply your test change
kubectl apply -f my-test-composition.yaml
# ... test ...
# Resume — Flux will revert your test
flux resume kustomization crossplane
Never leave Flux suspended. When you're done, resume immediately, even at the end of the day. A suspended Kustomization means the cluster drifts silently from the git source of truth.
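Because forgetting to resume is the main risk here, it can help to tie suspend and resume together. The following is a sketch under the assumption that your test can be expressed as a single command; with_flux_suspended is a hypothetical helper, not an existing tool. It runs in a subshell and resumes the Kustomization from an EXIT trap, so the resume happens even if the test command fails or is interrupted.

```shell
# Sketch: suspend a Kustomization, run a command, and always resume.
# The function body is a subshell, so the traps do not leak into the caller.
with_flux_suspended() (
  ks="$1"; shift
  flux suspend kustomization "$ks" || exit 1
  # Resume on any exit path; turn INT/TERM into a normal exit so the
  # EXIT trap fires in shells that would otherwise skip it.
  trap 'flux resume kustomization "$ks"' EXIT
  trap 'exit 130' INT TERM
  "$@"
)
```

Usage: with_flux_suspended crossplane kubectl apply -f my-test-composition.yaml. The function returns the exit status of the wrapped command. For a longer interactive test you could wrap a shell instead, for example with_flux_suspended crossplane bash, and exit that shell when finished.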
Resource ownership
The kustomize rendering applies a label app.kubernetes.io/managed-by=flux to every reconciled resource. This is the easiest way to tell at a glance whether a resource is Flux-owned:
kubectl get <resource> <name> -o yaml | grep managed-by
If it's labelled flux, edit it in git. If it's not labelled (or labelled differently, e.g. crossplane for Composition-emitted resources), the appropriate workflow depends on the owner.
| managed-by label | Edit via |
|---|---|
| flux | Git PR to infra/k8s/ |
| crossplane | Patch the parent XR |
| (none) | Direct kubectl edit (rare, ad-hoc resources) |
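The lookup in the table above can be sketched as a small helper that reads the label and prints the matching workflow. The function names edit_workflow_for and how_to_edit are hypothetical, and the fallback message for unrecognised label values is an assumption, since the table only covers three cases.

```shell
# Sketch: map the app.kubernetes.io/managed-by label to an edit workflow.
edit_workflow_for() {
  case "$1" in
    flux)       echo "Git PR to infra/k8s/" ;;
    crossplane) echo "Patch the parent XR" ;;
    "")         echo "Direct kubectl edit (rare, ad-hoc resources)" ;;
    # Assumption: any other value means another controller owns the resource.
    *)          echo "Unknown owner '$1': identify the owning controller first" ;;
  esac
}

how_to_edit() {
  # Arguments are passed straight to kubectl get, so "-n <namespace>" works.
  # Dots inside the label key must be escaped in jsonpath.
  owner="$(kubectl get "$@" \
    -o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by}')" || return 1
  edit_workflow_for "$owner"
}
```

Usage: how_to_edit <resource> <name>, optionally followed by -n <namespace>.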
Drift detection
A nightly drift-check Job compares the rendered Kustomizations against the cluster state and emits an alert to the Matrix #alerts room if it finds an unexplained delta. This catches:
- Manual kubectl apply someone forgot to revert
- Manual kubectl edit (the change is silent until drift-check)
- Compositions emitting a resource that conflicts with a Flux-managed one
When the alert fires, the resolution is almost always "revert the manual change, recreate it via git".
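To check for drift yourself before the nightly run, a comparison along the same lines can be sketched as below. This is not the actual drift-check Job, whose implementation is not shown here; drifted_areas is a hypothetical name, and the sketch assumes each area is a directory containing a kustomization.yaml.

```shell
# Sketch only: print the name of every area whose rendered manifests
# differ from the live cluster. Argument: path to the k8s/ directory.
drifted_areas() {
  root="$1"
  for dir in "$root"/*/; do
    # Skip directories that are not kustomize roots.
    [ -f "${dir}kustomization.yaml" ] || continue
    status=0
    kustomize build "$dir" | kubectl diff -f - >/dev/null 2>&1 || status=$?
    if [ "$status" -eq 1 ]; then
      # kubectl diff exit status 1: rendered and live state differ.
      basename "$dir"
    elif [ "$status" -gt 1 ]; then
      echo "diff failed for $dir (status $status)" >&2
    fi
  done
  return 0
}
```

Usage: drifted_areas ~/projects/plana-pulse/infra/k8s. An empty result means no area reported a difference. Run it from an up-to-date main, otherwise unmerged local edits will show up as drift.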
Bootstrapping Flux from scratch
In the worst-case scenario (cluster rebuilt from nothing), the bootstrap procedure is in infra/docs/runbooks/flux-bootstrap.md. Summary:
- Build SKS cluster
- flux install
- flux create source git plana-pulse-infra --url=ssh://git@git.planapulse.com:22/plana-pulse/infra --branch=main --secret-ref=flux-ssh
- flux create kustomization rbac-system --source=GitRepository/plana-pulse-infra --path=./k8s/rbac-system --prune=true --interval=1m
- Repeat for each Kustomization
After bootstrap, Flux pulls every resource into the cluster autonomously.
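The "repeat for each Kustomization" step can be sketched as a loop that reuses the flags from the summary. It assumes every Kustomization's path is ./k8s/<name>, which holds for the ones listed on this page but should be checked against the runbook; create_kustomizations is a hypothetical name, and the runbook remains the authoritative procedure.

```shell
# Sketch: create one Flux Kustomization per name given on the command line.
# Assumption: the path of each Kustomization is ./k8s/<name>.
create_kustomizations() {
  for name in "$@"; do
    flux create kustomization "$name" \
      --source=GitRepository/plana-pulse-infra \
      --path="./k8s/$name" \
      --prune=true \
      --interval=1m || return 1
  done
}
```

Usage: create_kustomizations rbac-system network-policies platform-routes crossplane monitoring docs. The loop stops at the first failure so a broken step is not buried under later output.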
What Flux does NOT do
- It does not run XRs (PLANAClient, TenantUpgrade, etc.) — those are applied by operators or CI. Flux owns the Compositions; operators own the instances.
- It does not provision the SKS cluster itself; that's OpenTofu in infra/terraform/.
- It does not manage SOPS secrets directly; secrets are decrypted into the cluster via the SOPS-secrets-operator, which reads the encrypted YAML and produces a Secret.
- It does not deploy docs-portal images; that's the .forgejo/workflows/pipeline.yml workflow with a kubectl set image step. (This is one of the few legacy push-deploy paths still in use; the long-term plan is to flip it to GitOps via Flux's ImageUpdateAutomation.)
Where to read more
- Crossplane — why XRs are not managed by Flux even though Compositions are
- Operations → CI/CD — the push-deploy paths that complement Flux
- Operations → Incident postmortems — the cluster-wide outages where breaking the Flux rule was the root cause