Flux GitOps

Audience

PLANA staff. The single most important workflow rule on the platform: manifests in infra/k8s/ are owned by Flux. Edit them in git, not in the cluster.

PLANA uses Flux v2 to reconcile a designated subtree of the plana-pulse/infra repo into the cluster. Anything declared there is the authoritative source of truth; the cluster is just a rendered copy.

What Flux watches

| Source | Path | Reconcile interval |
| --- | --- | --- |
| git.planapulse.com/plana-pulse/infra (branch main) | infra/k8s/ | 1 minute |

Inside infra/k8s/, several Kustomization resources own their own subtree:

| Kustomization | Watches | Owns |
| --- | --- | --- |
| rbac-system | infra/k8s/rbac-system/ | Cluster-level RBAC (ClusterRoles, ClusterRoleBindings, ServiceAccounts) |
| network-policies | infra/k8s/network-policies/ | All cluster-wide NetworkPolicies |
| platform-routes | infra/k8s/platform-routes/ | HTTPRoutes for shared services |
| crossplane | infra/k8s/crossplane/ | XRDs + Compositions (NOT XRs) |
| monitoring | infra/k8s/monitoring/ | Prometheus, Grafana, Alertmanager values |
| docs | infra/k8s/docs/ | docs-portal Deployment |

(and more, one per product family)

Anything outside infra/k8s/ — including ad-hoc Jobs, debug pods, the XRs that operators apply day-to-day — is not managed by Flux. Flux will ignore it.
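To see which Kustomization a changed file falls under, one option is to map its repo-relative path to the first directory below k8s/. The sketch below assumes each Kustomization is named after its directory, as in the table above. The function names `owning_kustomization` and `changed_kustomizations` are illustrative and are not part of Flux or the infra repo.

```shell
# Print the Kustomization that owns a repo-relative path, assuming the
# Kustomization name equals the directory name directly under k8s/.
# Returns 1 for paths that Flux does not manage.
owning_kustomization() {
  local path="${1#./}"
  case "$path" in
    k8s/*/*)
      path="${path#k8s/}"
      printf '%s\n' "${path%%/*}"
      ;;
    *)
      return 1
      ;;
  esac
}

# List the Kustomizations touched by the current branch relative to main.
# Run from the root of the infra checkout.
changed_kustomizations() {
  git diff --name-only main...HEAD | while IFS= read -r file; do
    owning_kustomization "$file" || true
  done | sort -u
}
```

Running `changed_kustomizations` before opening a PR gives a quick list of the Kustomizations to watch after the merge.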

The rule

Never kubectl apply a resource that lives under infra/k8s/. Edit the manifest in git, merge to main, let Flux reconcile.

The reconciliation interval is 1 minute, so an out-of-band edit will be reverted within ~60 seconds. We have lost work this way; we have generated production incidents this way. The rule is rigid for good reason.
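A local guard can make the rule harder to break by accident. The following is a minimal sketch, not an existing PLANA tool: it assumes the infra checkout lives at ~/projects/plana-pulse/infra, and the names `is_flux_owned_path` and `guarded_apply` are hypothetical.

```shell
# Succeed (0) if the file sits under k8s/ of the infra checkout, fail (1)
# if it does not, and return 2 if either path cannot be resolved.
# The default checkout location is an assumption; pass another as $2.
is_flux_owned_path() {
  local file="$1"
  local repo_root="${2:-$HOME/projects/plana-pulse/infra}"
  local dir root
  dir=$(cd "$(dirname "$file")" 2>/dev/null && pwd -P) || return 2
  root=$(cd "$repo_root" 2>/dev/null && pwd -P) || return 2
  case "$dir/$(basename "$file")" in
    "$root"/k8s/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Wrapper around kubectl apply that refuses Flux-owned manifests.
guarded_apply() {
  local file="$1"
  if is_flux_owned_path "$file"; then
    echo "refusing: $file is under k8s/ and owned by Flux; open a PR instead" >&2
    return 1
  fi
  kubectl apply -f "$file"
}
```

The guard only looks at where the file lives on disk. It cannot detect a copy of a Flux-owned manifest saved elsewhere, so it supplements the rule rather than replacing it.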

Three real cases where the rule was broken

| Date | What was applied directly | Result |
| --- | --- | --- |
| 2026-05-13 (Wave 5 cleanup) | Cluster-scoped resources deleted alongside namespaced via --all flag | 10 ClusterRoleBindings lost; the SKS konnectivity outage that followed took 7 days to root-cause |
| 2026-05-21 | worker-odoo replicas: 0 was the source-of-truth in a YAML file applied via kubectl apply to bump a quota; tenants went 503 | Pinned to replicas: 3 afterwards, PR #11 |
| 2026-05-22 (v18 cutover) | HTTPRoute backendRefs.namespace edited directly | Crossplane reverted within seconds; the right fix was to patch the PLANAClient XR |

How to make changes the right way

1. Clone or pull infra

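```bash
# First time only: git clone ssh://git@git.planapulse.com:22/plana-pulse/infra ~/projects/plana-pulse/infra
cd ~/projects/plana-pulse/infra
git pull
```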

2. Edit the manifest

```bash
$EDITOR k8s/<area>/<file>.yaml
```

For Crossplane Compositions and XRDs, prefer narrow, well-scoped edits. For NetworkPolicies, run them by the security reviewer.

3. Validate locally

```bash
# Render the Kustomization
kustomize build k8s/<area>/

# Compare against the cluster
kustomize build k8s/<area>/ | kubectl diff -f -
```

kubectl diff exits with status 1 when the rendered manifests differ from the cluster, which is expected when you intend a change. A status greater than 1 means kubectl or diff itself failed. Read the diff to confirm it contains only the change you want.
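Because kubectl diff uses its exit status to signal both "differences found" and "something went wrong", it can help to make the three outcomes explicit. This is a minimal sketch; `explain_diff_status` and `validate_area` are illustrative names, and `validate_area` assumes it is run from the root of the infra checkout.

```shell
# Translate a kubectl diff exit status into a human-readable verdict.
# 0 = no differences, 1 = differences found, >1 = kubectl or diff failed.
explain_diff_status() {
  case "$1" in
    0) echo "no differences" ;;
    1) echo "differences found" ;;
    *) echo "diff failed with status $1"; return 1 ;;
  esac
}

# Render one area, diff it against the cluster, and report the outcome.
# The build runs first so that a failed render is not mistaken for drift.
validate_area() {
  local area="$1" rendered status
  rendered=$(kustomize build "k8s/$area/") || return 2
  if printf '%s\n' "$rendered" | kubectl diff -f -; then
    status=0
  else
    status=$?
  fi
  explain_diff_status "$status"
}
```

For example, `validate_area monitoring` prints the diff followed by one of the three verdicts.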

4. PR

```bash
git checkout -b feat/<short-name>
git add k8s/<area>/
git commit -m "<area>: <one-line summary>

<rationale>

Authored-by: PLANA Digital <dev@planapulse.ai>"
git push -u origin feat/<short-name>
```

Then open a PR on Forgejo. Tag a reviewer. Merge to main.

5. Wait for reconcile

After merge:

```bash
flux get kustomization <kustomization-name>
flux logs --kind=Kustomization --name=<kustomization-name> --tail=50
```

You should see Applied revision: main@sha1:<sha> within ~60 seconds.
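Rather than re-running the commands by hand, the wait can be scripted by polling the Kustomization's `status.lastAppliedRevision` until it contains the merged commit. The sketch below assumes the Kustomizations live in the `flux-system` namespace (the Flux default, not confirmed for this cluster); the function names and the 180-second default timeout are illustrative.

```shell
# Succeed if a revision string such as main@sha1:<sha> refers to the given
# full or abbreviated commit SHA.
revision_has_sha() {
  [ -n "$2" ] || return 1
  case "$1" in
    *"@sha1:$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll every 5 seconds until the Kustomization reports the expected commit.
# usage: wait_for_revision <kustomization-name> <sha> [timeout-seconds]
wait_for_revision() {
  local name="$1" sha="$2" timeout="${3:-180}" waited=0 revision
  while [ "$waited" -lt "$timeout" ]; do
    # Namespace flux-system is an assumption; adjust if Flux runs elsewhere.
    revision=$(kubectl -n flux-system get kustomizations.kustomize.toolkit.fluxcd.io "$name" \
      -o jsonpath='{.status.lastAppliedRevision}' 2>/dev/null) || revision=""
    if revision_has_sha "$revision" "$sha"; then
      echo "applied: $revision"
      return 0
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "timed out waiting for $name to apply $sha" >&2
  return 1
}
```

A typical call after merging would be `wait_for_revision rbac-system "$(git rev-parse origin/main)"`. If waiting for the next interval is not acceptable, `flux reconcile kustomization <kustomization-name> --with-source` requests an immediate reconcile.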

6. Verify in the cluster

```bash
kubectl describe <kind>/<name> | head -30
```

When Flux is paused

Sometimes you need to test a Composition change in the cluster without Flux reverting it. The pattern:

```bash
# Pause the Kustomization
flux suspend kustomization crossplane

# Apply your test change
kubectl apply -f my-test-composition.yaml

# ... test ...

# Resume; Flux will revert your test
flux resume kustomization crossplane
```

Never leave Flux suspended. Resume as soon as you finish testing, and in any case before you stop for the day. A suspended Kustomization means the cluster drifts silently from the git source of truth.
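One way to avoid a forgotten resume is to tie it to the lifetime of the test command. The wrapper below is a sketch under that assumption; `with_suspended` is a hypothetical helper, not a Flux command. It runs in a subshell so that the EXIT trap fires whether the test command succeeds or fails.

```shell
# Suspend a Kustomization, run a command, and always resume afterwards.
# usage: with_suspended <kustomization-name> <command> [args...]
with_suspended() (
  name="$1"
  shift
  flux suspend kustomization "$name" || exit 1
  # The trap runs when this subshell exits, whatever the command's result.
  trap 'flux resume kustomization "$name"' EXIT
  "$@"
)
```

For example, `with_suspended crossplane kubectl apply -f my-test-composition.yaml` applies the test change and resumes straight away, so it suits quick apply-and-observe checks. For a longer manual session, pass a script that contains the whole test.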

Resource ownership

The kustomize rendering applies a label app.kubernetes.io/managed-by=flux to every reconciled resource. This is the easiest way to tell at a glance whether a resource is Flux-owned:

```bash
kubectl get <resource> <name> -o yaml | grep managed-by
```

If it's labelled flux, edit it in git. If it's not labelled (or labelled differently — crossplane for Composition-emitted resources), the appropriate workflow depends on the owner.

| managed-by label | Edit via |
| --- | --- |
| flux | Git PR to infra/k8s/ |
| crossplane | Patch the parent XR |
| (none) | Direct kubectl edit (rare; ad-hoc resources) |
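The lookup in the table can be scripted as a small helper that reads the label and prints the matching workflow. This is an illustrative sketch: `managed_by_label` and `edit_workflow_for` are hypothetical names, and any label value other than the two listed is treated as an unknown owner.

```shell
# Print the app.kubernetes.io/managed-by label of a resource (empty if unset).
# usage: managed_by_label <resource> <name> [extra kubectl args, e.g. -n <namespace>]
managed_by_label() {
  kubectl get "$@" -o jsonpath='{.metadata.labels.app\.kubernetes\.io/managed-by}'
}

# Map a managed-by label value to the workflow from the table above.
edit_workflow_for() {
  case "$1" in
    flux) echo "Git PR to infra/k8s/" ;;
    crossplane) echo "Patch the parent XR" ;;
    "") echo "Direct kubectl edit (rare; ad-hoc resources)" ;;
    *) echo "Unknown owner '$1': identify the owner before editing"; return 1 ;;
  esac
}
```

Usage: `edit_workflow_for "$(managed_by_label <resource> <name>)"`.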

Drift detection

A nightly drift-check Job compares the rendered Kustomizations against the cluster state and emits an alert to the Matrix #alerts room if it finds an unexplained delta. This catches:

  • Manual kubectl apply someone forgot to revert
  • Manual kubectl edit (the change is silent until drift-check)
  • Compositions emitting a resource that conflicts with a Flux-managed one

When the alert fires, the resolution is almost always "revert the manual change, recreate it via git".
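The same comparison can be run by hand when investigating an alert. The sketch below is not the drift-check Job itself, whose implementation is not shown here. It assumes one directory per Kustomization under k8s/ and relies on the kubectl diff exit status (1 means differences); `drifted_areas` is a hypothetical name.

```shell
# Print the name of every area under the given root whose rendered
# manifests differ from the cluster. Build or diff errors go to stderr.
# usage: drifted_areas k8s
drifted_areas() {
  local root="$1" dir area rendered status
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue
    area=$(basename "$dir")
    if ! rendered=$(kustomize build "$dir"); then
      echo "error: kustomize build failed for $area" >&2
      continue
    fi
    if printf '%s\n' "$rendered" | kubectl diff -f - >/dev/null 2>&1; then
      status=0
    else
      status=$?
    fi
    case "$status" in
      0) ;;
      1) echo "$area" ;;
      *) echo "error: kubectl diff failed for $area (status $status)" >&2 ;;
    esac
  done
}
```

Running `drifted_areas k8s` from the root of the infra checkout lists the areas to inspect with a full `kubectl diff`.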

Bootstrapping Flux from scratch

In the worst-case scenario (cluster rebuilt from nothing), the bootstrap procedure is in infra/docs/runbooks/flux-bootstrap.md. Summary:

  1. Build SKS cluster
  2. flux install
  3. flux create source git plana-pulse-infra --url=ssh://git@git.planapulse.com:22/plana-pulse/infra --branch=main --secret-ref=flux-ssh
  4. flux create kustomization rbac-system --source=GitRepository/plana-pulse-infra --path=./k8s/rbac-system --prune=true --interval=1m
  5. Repeat for each Kustomization

After bootstrap, Flux pulls every resource into the cluster autonomously.
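Step 5 can be scripted by generating the step 4 command once per Kustomization. The sketch below only prints the commands so they can be reviewed before running. It assumes each Kustomization's path mirrors its name, as in step 4, and `print_bootstrap_commands` is a hypothetical helper; the runbook remains the authoritative procedure.

```shell
# Print one "flux create kustomization" command per name, using the same
# source, prune, and interval settings as step 4.
print_bootstrap_commands() {
  local name
  for name in "$@"; do
    printf 'flux create kustomization %s --source=GitRepository/plana-pulse-infra --path=./k8s/%s --prune=true --interval=1m\n' \
      "$name" "$name"
  done
}
```

For example, `print_bootstrap_commands rbac-system network-policies platform-routes` prints three commands; after review the output can be piped to `sh`.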

What Flux does NOT do

  • It does not apply XRs (PLANAClient, TenantUpgrade, etc.); operators or CI apply those. Flux owns the Compositions; operators own the instances.
  • It does not provision the SKS cluster itself — that's OpenTofu in infra/terraform/.
  • It does not manage SOPS secrets directly; secrets are decrypted into the cluster via the SOPS-secrets-operator that reads the encrypted YAML and produces a Secret.
  • It does not deploy docs-portal images; the .forgejo/workflows/pipeline.yml workflow does that with a kubectl set image step. (This is one of the few legacy push-deploy paths still in use; the long-term plan is to move it to GitOps via Flux image update automation.)

© PLANA Digital Ltd.