Kubernetes Cluster

How our GKE clusters are set up: one Autopilot cluster per environment, namespace-based tenancy, Workload Identity for credentials, and a GitOps split between Config Sync (platform config) and Cloud Deploy (application rollouts).

Cluster model

One Autopilot cluster per environment

The two clusters are u2i-nonprod and u2i-prod, both in europe-west1. Autopilot manages the nodes for us: there are no node pools to size or patch, and GKE provisions nodes sized to the pods' resource requests.

Shared, not per-app

Every application runs as a namespace on the shared cluster for its environment, rather than getting a dedicated cluster. This keeps cluster count low and makes cross-cutting policy (network, RBAC, Config Sync) apply uniformly.

Private nodes, restricted control plane

Nodes have no public IPs and reach the internet through Cloud NAT. The Kubernetes API server accepts connections only from networks on the control plane's authorized networks list.

Namespaces & isolation

Naming convention

{app}-dev, {app}-qa, {app}-prod for the standard lifecycle stages, and {app}-pr-{N} for ephemeral pull-request preview environments.
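As a minimal sketch of the convention, the helpers below build namespace names and reject anything Kubernetes would refuse. The helper names, the example app name retrotool, and the reading of N as the pull-request number are illustrative assumptions.

```python
import re

# Kubernetes namespace names must be DNS-1123 labels: lowercase alphanumerics
# and '-', starting and ending with an alphanumeric, at most 63 characters.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
STAGES = ("dev", "qa", "prod")


def _validated(name: str) -> str:
    """Reject names Kubernetes would refuse as a namespace."""
    if len(name) > 63 or not _DNS1123_LABEL.fullmatch(name):
        raise ValueError(f"{name!r} is not a valid namespace name")
    return name


def namespace_for(app: str, stage: str) -> str:
    """Standard lifecycle namespace, e.g. ("retrotool", "qa") -> "retrotool-qa"."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    return _validated(f"{app}-{stage}")


def preview_namespace_for(app: str, pr_number: int) -> str:
    """Ephemeral preview namespace (hypothetical helper), e.g. ("retrotool", 42) -> "retrotool-pr-42"."""
    if pr_number < 1:
        raise ValueError("pull-request number must be positive")
    return _validated(f"{app}-pr-{pr_number}")


print(namespace_for("retrotool", "qa"))        # retrotool-qa
print(preview_namespace_for("retrotool", 42))  # retrotool-pr-42
```

The length check matters mostly for preview namespaces: a long app name plus the -pr-{N} suffix can exceed the 63-character limit.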

Namespace as the tenancy boundary

RBAC, resource quotas, and network policy are all scoped per namespace. An app team only ever gets access to its own app's namespaces, not the cluster at large.

GitOps reconciliation

Namespaces and their baseline resources (RBAC, ServiceAccounts, quotas) are declared in the u2i-tenant-infra repo and reconciled onto the cluster by Config Sync — nobody kubectl-applies namespace scaffolding by hand.
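To illustrate what "baseline resources" per tenant namespace can look like, here is a sketch that emits a Namespace, a ResourceQuota, and a RoleBinding for one app stage. The exact contents of u2i-tenant-infra are not described here, so the object names, quota values, and the Google Group email (which assumes Google Groups for RBAC is enabled on the cluster) are all hypothetical.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def tenant_baseline(app: str, stage: str, team_group: str) -> list[dict]:
    """Hypothetical baseline objects for one tenant namespace."""
    ns = f"{app}-{stage}"
    return [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": ns}},
        {
            "apiVersion": "v1",
            "kind": "ResourceQuota",
            "metadata": {"name": "compute", "namespace": ns},
            # Placeholder limits; real values would be tuned per app and stage.
            "spec": {"hard": {"requests.cpu": "4", "requests.memory": "8Gi"}},
        },
        {
            "apiVersion": f"{RBAC_GROUP}/v1",
            "kind": "RoleBinding",  # namespaced, so it grants nothing outside ns
            "metadata": {"name": f"{app}-team-edit", "namespace": ns},
            "subjects": [{"kind": "Group", "apiGroup": RBAC_GROUP, "name": team_group}],
            # "edit" is a built-in ClusterRole; binding it through a RoleBinding
            # limits its effect to this one namespace.
            "roleRef": {"kind": "ClusterRole", "apiGroup": RBAC_GROUP, "name": "edit"},
        },
    ]


manifests = tenant_baseline("retrotool", "dev", "retrotool-team@example.com")
print(json.dumps(manifests, indent=2))
```

Using a RoleBinding (rather than a ClusterRoleBinding) is what keeps the app team's access confined to its own namespaces.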

Identity: Workload Identity

No service account keys on the cluster

Every pod that needs GCP access runs under a Kubernetes ServiceAccount (KSA) bound to a Google Cloud service account (GSA) via Workload Identity Federation for GKE. The pod authenticates as that GSA using short-lived tokens, with no key material stored anywhere.

Per-app GSAs

Application workloads use a GSA scoped to that app (e.g. permissions to read its own Secret Manager secrets or write to its own GCS bucket) rather than a broad, shared identity.
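The two halves of a Workload Identity binding can be sketched as follows: an annotation on the KSA naming the GSA, and a roles/iam.workloadIdentityUser grant on the GSA for that specific namespace/KSA pair. The project ID, namespace, and account names are hypothetical placeholders.

```python
def workload_identity(project_id: str, namespace: str, ksa: str,
                      gsa_email: str) -> tuple[dict, dict]:
    """Return the annotated KSA and the IAM binding to grant on the GSA."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa,
            "namespace": namespace,
            # Tells GKE which GSA this KSA impersonates.
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }
    # Granted on the GSA itself: lets exactly this namespace/KSA pair act as it.
    # The workload identity pool is "<cluster project>.svc.id.goog".
    iam_binding = {
        "role": "roles/iam.workloadIdentityUser",
        "members": [f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"],
    }
    return service_account, iam_binding


ksa, binding = workload_identity(
    "example-project", "retrotool-dev", "retrotool",
    "retrotool-dev@example-project.iam.gserviceaccount.com",
)
print(binding["members"][0])  # serviceAccount:example-project.svc.id.goog[retrotool-dev/retrotool]
```

Pods opt in by setting serviceAccountName to the KSA. Per-app scoping then comes from where the GSA's own roles are granted: for example roles/secretmanager.secretAccessor on the app's individual secrets and a storage role on its own bucket, rather than project-wide grants.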

GitOps delivery: Config Sync & Cloud Deploy

Fleet + Config Sync

Both clusters are Fleet members. Config Sync continuously reconciles cluster-wide and per-tenant configuration (namespaces, RBAC, quotas) from an OCI artifact published out of the u2i-tenant-infra repo.
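For reference, a Config Sync source backed by an OCI artifact is expressed as a RootSync with sourceType oci. Whether the platform declares this object directly or lets the Fleet-level Config Sync settings create it is not stated here; the image path and service account below are hypothetical.

```python
import json


def oci_root_sync(image: str, gsa_email: str, directory: str = ".") -> dict:
    """RootSync pulling cluster-wide config from an OCI artifact (sketch)."""
    return {
        "apiVersion": "configsync.gke.io/v1beta1",
        "kind": "RootSync",
        "metadata": {"name": "root-sync", "namespace": "config-management-system"},
        "spec": {
            "sourceFormat": "unstructured",
            "sourceType": "oci",
            "oci": {
                "image": image,    # artifact published from u2i-tenant-infra
                "dir": directory,  # path inside the artifact to reconcile
                # Pull the artifact as a GSA via Workload Identity, no keys.
                "auth": "gcpserviceaccount",
                "gcpServiceAccountEmail": gsa_email,
            },
        },
    }


root = oci_root_sync(
    "europe-west1-docker.pkg.dev/example-project/platform/u2i-tenant-infra:latest",
    "config-sync@example-project.iam.gserviceaccount.com",
)
print(json.dumps(root, indent=2))
```

With this auth mode, the GSA needs read access to the Artifact Registry repository (for example roles/artifactregistry.reader).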

Cloud Deploy for application workloads

Application Deployments themselves are rolled out by Cloud Deploy rather than Config Sync. Cloud Deploy renders each app's Helm chart via Skaffold and applies the result using a per-app {app}-cloud-deploy service account. Nonprod rolls out automatically; production requires a manual approval gate.
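A sketch of the pipeline shape described above: a serial dev, qa, prod DeliveryPipeline, plus Targets where only prod sets requireApproval and both render and deploy run as the per-app service account. The one-Skaffold-profile-per-stage mapping, the project ID, and the cluster path are assumptions for illustration.

```python
STAGES = ("dev", "qa", "prod")


def delivery_pipeline(app: str) -> dict:
    """Serial dev -> qa -> prod pipeline for one app."""
    return {
        "apiVersion": "deploy.cloud.google.com/v1",
        "kind": "DeliveryPipeline",
        "metadata": {"name": app},
        "serialPipeline": {
            # Assumes one Skaffold profile per stage, named after the stage.
            "stages": [{"targetId": f"{app}-{s}", "profiles": [s]} for s in STAGES],
        },
    }


def target(app: str, stage: str, cluster: str, project_id: str) -> dict:
    """Target for one stage; cluster is a projects/.../clusters/... path."""
    return {
        "apiVersion": "deploy.cloud.google.com/v1",
        "kind": "Target",
        "metadata": {"name": f"{app}-{stage}"},
        # Only production rollouts wait for a human approval.
        "requireApproval": stage == "prod",
        "gke": {"cluster": cluster},
        "executionConfigs": [{
            "usages": ["RENDER", "DEPLOY"],
            # Per-app execution identity, assumed to live in project_id.
            "serviceAccount": f"{app}-cloud-deploy@{project_id}.iam.gserviceaccount.com",
        }],
    }


pipeline = delivery_pipeline("retrotool")
prod = target("retrotool", "prod",
              "projects/example-project/locations/europe-west1/clusters/u2i-prod",
              "example-project")
print([s["targetId"] for s in pipeline["serialPipeline"]["stages"]])  # ['retrotool-dev', 'retrotool-qa', 'retrotool-prod']
print(prod["requireApproval"])  # True
```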

Binary Authorization

Enforced at the Fleet level — only container images that pass the configured attestation policy can be admitted onto the cluster.
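The shape of an attestation-based policy, plus a toy model of the admission decision, can be sketched as below. The attestor name is hypothetical, and the would_admit helper is a simplification for illustration only: real Binary Authorization also verifies each attestation's signature against the image digest.

```python
def binauthz_policy(attestor: str) -> dict:
    """Policy requiring one attestor's attestation before any image is admitted."""
    return {
        # Exempts Google-maintained system images from evaluation.
        "globalPolicyEvaluationMode": "ENABLE",
        "defaultAdmissionRule": {
            "evaluationMode": "REQUIRE_ATTESTATION",
            "enforcementMode": "ENFORCED_BLOCK_AND_AUDIT_LOG",
            "requireAttestationsBy": [attestor],
        },
    }


def would_admit(policy: dict, attested_by: set[str]) -> bool:
    """Toy model of the default rule: admit only if every required attestor
    has attested the image (signature checks are not modeled)."""
    rule = policy["defaultAdmissionRule"]
    if rule["evaluationMode"] == "ALWAYS_ALLOW":
        return True
    if rule["evaluationMode"] == "ALWAYS_DENY":
        return False
    return set(rule["requireAttestationsBy"]) <= attested_by


policy = binauthz_policy("projects/example-project/attestors/built-by-ci")
print(would_admit(policy, {"projects/example-project/attestors/built-by-ci"}))  # True
print(would_admit(policy, set()))  # False
```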

Traffic & scaling

Gateway API for ingress

Apps don't each manage their own Ingress or load balancer; instead they attach a route (such as an HTTPRoute) to a shared, platform-owned Gateway. The app's Helm chart declares a parentRef pointing at that Gateway.
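A sketch of the route an app chart might render, assuming an HTTPRoute; the Gateway name, its namespace, the hostname, and the port are hypothetical.

```python
import json


def http_route(app: str, namespace: str, hostname: str, service: str, port: int,
               gateway: str, gateway_namespace: str) -> dict:
    """HTTPRoute attaching an app's Service to the shared platform Gateway."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            # Cross-namespace reference into the platform-owned Gateway.
            "parentRefs": [{"name": gateway, "namespace": gateway_namespace}],
            "hostnames": [hostname],
            # Send all matching traffic to the app's Service.
            "rules": [{"backendRefs": [{"name": service, "port": port}]}],
        },
    }


route = http_route("retrotool", "retrotool-dev", "retrotool-dev.example.com",
                   "retrotool", 8080, "shared-gateway", "gateway-system")
print(json.dumps(route, indent=2))
```

Because the route lives in the app namespace and the Gateway does not, the Gateway's listener must allow routes from that namespace (its allowedRoutes setting); otherwise the attachment is rejected.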

KEDA for autoscaling

Horizontal scaling beyond plain CPU/memory (e.g. queue depth, custom metrics) is handled by KEDA ScaledObjects, templated from the shared workload chart.
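The arithmetic behind queue-depth scaling can be sketched as follows, assuming a trigger reported with KEDA's default AverageValue metric type. This is a simplification: it ignores the HPA's tolerance band, stabilization windows, and KEDA's scale-to/from-zero activation logic.

```python
import math


def desired_replicas(metric_total: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified replica count for an AverageValue external metric.

    The HPA that KEDA drives computes ceil(current * currentAvg / target);
    with currentAvg = metric_total / current this reduces to
    ceil(metric_total / target).
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(metric_total / target_per_replica)
    # Clamp to the ScaledObject's minReplicaCount / maxReplicaCount.
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(95, 10, 1, 20))    # 10
print(desired_replicas(1000, 10, 1, 20))  # 20 (capped at max)
print(desired_replicas(0, 10, 1, 20))     # 1 (held at min)
```

For example, 95 queued messages with a target of 10 per replica yields 10 replicas, regardless of how many are running now.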

Secrets & storage on the cluster

External Secrets Operator (ESO)

Apps declare an ExternalSecret resource. ESO fetches the actual value from GCP Secret Manager on each sync, authenticating via Workload Identity, and writes it into a Kubernetes Secret in the app's namespace. No secret value is ever committed to Git or kept in a hand-written Kubernetes Secret manifest.
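A sketch of the two ESO objects involved, assuming a namespaced SecretStore that authenticates through a KSA's Workload Identity (one of the auth options ESO's GCP provider supports). Object names, the secret key, and the refresh interval are hypothetical; use whichever external-secrets.io API version your installed ESO release serves.

```python
import json

# Newer ESO releases also serve external-secrets.io/v1.
ESO_API = "external-secrets.io/v1beta1"


def secret_store(namespace: str, project_id: str, ksa: str,
                 cluster: str = "u2i-nonprod", location: str = "europe-west1") -> dict:
    """SecretStore pointing at Secret Manager, authenticating as a KSA via WI."""
    return {
        "apiVersion": ESO_API,
        "kind": "SecretStore",
        "metadata": {"name": "gcp-secret-manager", "namespace": namespace},
        "spec": {"provider": {"gcpsm": {
            "projectID": project_id,
            "auth": {"workloadIdentity": {
                "clusterLocation": location,
                "clusterName": cluster,
                "serviceAccountRef": {"name": ksa},
            }},
        }}},
    }


def external_secret(namespace: str, k8s_key: str, gcp_secret: str) -> dict:
    """ExternalSecret: Git holds only this reference, never the value."""
    return {
        "apiVersion": ESO_API,
        "kind": "ExternalSecret",
        "metadata": {"name": "app-secrets", "namespace": namespace},
        "spec": {
            "refreshInterval": "1h",  # how often ESO re-reads Secret Manager
            "secretStoreRef": {"name": "gcp-secret-manager", "kind": "SecretStore"},
            "target": {"name": "app-secrets"},  # Secret ESO creates in-cluster
            "data": [{"secretKey": k8s_key, "remoteRef": {"key": gcp_secret}}],
        },
    }


print(json.dumps([secret_store("retrotool-dev", "example-project", "retrotool"),
                  external_secret("retrotool-dev", "DATABASE_URL", "retrotool-database-url")],
                 indent=2))
```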

GCS via the GCSFuse CSI driver

Apps that need bucket-backed storage (e.g. RetroTool's uploaded board backgrounds) mount a GCS bucket directly into the pod using GKE's native gcsfuse.csi.storage.gke.io driver. It is enabled by an annotation on the pod template; GKE injects the FUSE sidecar container automatically, so the app's chart doesn't define or manage one. Cache sizes are tuned per environment in Helm values.
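A sketch of the pod template fields involved: the gke-gcsfuse/volumes annotation opts the pod in, and a CSI volume names the bucket. The bucket, mount path, and mount options are hypothetical, and passing the cache size as fileCacheCapacity assumes a GKE version whose driver supports that volume attribute.

```python
import json


def pod_template_with_bucket(app: str, image: str, bucket: str, ksa: str,
                             mount_path: str, file_cache: str) -> dict:
    """Pod template mounting a GCS bucket through the GCSFuse CSI driver."""
    return {
        "metadata": {
            "labels": {"app": app},
            # Opts the pod in; GKE adds the driver's helper container itself.
            "annotations": {"gke-gcsfuse/volumes": "true"},
        },
        "spec": {
            # KSA whose Workload Identity is granted access to the bucket.
            "serviceAccountName": ksa,
            "containers": [{
                "name": app,
                "image": image,
                "volumeMounts": [{"name": "gcs-data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "gcs-data",
                "csi": {
                    "driver": "gcsfuse.csi.storage.gke.io",
                    "volumeAttributes": {
                        "bucketName": bucket,
                        "mountOptions": "implicit-dirs",
                        # Per-environment value, e.g. fed from Helm values.
                        "fileCacheCapacity": file_cache,
                    },
                },
            }],
        },
    }


print(json.dumps(pod_template_with_bucket(
    "retrotool", "example.com/retrotool:1.0", "retrotool-dev-uploads",
    "retrotool", "/data/uploads", "1Gi"), indent=2))
```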

Config Connector for GCP resources

Buckets and other GCP resources an app needs are declared as Kubernetes-native CRDs (e.g. StorageBucket) and reconciled into real GCP resources by Config Connector, keeping infra declarations next to the workloads that use them.
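A sketch of what such declarations can look like for one app bucket: a Config Connector StorageBucket plus an IAMPolicyMember granting the app's own GSA access to that bucket only. The bucket name, project, GSA, and the choice of roles/storage.objectAdmin are hypothetical.

```python
import json

CNRM_STORAGE = "storage.cnrm.cloud.google.com/v1beta1"


def app_bucket(name: str, namespace: str, project_id: str, gsa_email: str) -> list[dict]:
    """StorageBucket plus a bucket-scoped IAM grant for the app's GSA."""
    bucket = {
        "apiVersion": CNRM_STORAGE,
        "kind": "StorageBucket",
        "metadata": {
            "name": name,  # becomes the GCS bucket name, so must be globally unique
            "namespace": namespace,
            "annotations": {"cnrm.cloud.google.com/project-id": project_id},
        },
        "spec": {"location": "EUROPE-WEST1", "uniformBucketLevelAccess": True},
    }
    # Grants the app's own GSA object access on this one bucket only.
    access = {
        "apiVersion": "iam.cnrm.cloud.google.com/v1beta1",
        "kind": "IAMPolicyMember",
        "metadata": {"name": f"{name}-app-access", "namespace": namespace},
        "spec": {
            "member": f"serviceAccount:{gsa_email}",
            "role": "roles/storage.objectAdmin",
            "resourceRef": {"apiVersion": CNRM_STORAGE, "kind": "StorageBucket", "name": name},
        },
    }
    return [bucket, access]


print(json.dumps(app_bucket("retrotool-dev-uploads", "retrotool-dev", "example-project",
                            "retrotool-dev@example-project.iam.gserviceaccount.com"), indent=2))
```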

Glossary

GKE Autopilot: Google's fully-managed GKE mode. Google handles node provisioning, sizing, and patching; you only declare workloads and their resource requests.
Namespace: A Kubernetes-native partition inside a cluster. Used here as the per-app, per-environment isolation boundary ({app}-dev, {app}-prod, etc.) instead of separate clusters.
Workload Identity (Federation): The mechanism that lets a Kubernetes ServiceAccount authenticate as a GCP service account without any key file — GKE issues short-lived tokens tied to the pod's identity.
Fleet: Google Cloud's grouping of GKE clusters for centrally-managed features — used here to enable Config Sync and Binary Authorization consistently across u2i-nonprod and u2i-prod.
Config Sync: A GitOps operator that continuously reconciles cluster configuration from a Git-backed source (published as an OCI artifact) — owns namespace scaffolding and tenant-wide policy, not individual app Deployments.
Cloud Deploy: Google's managed continuous-delivery service. Executes the actual rollout of an app's rendered manifests to a target cluster, with staged promotion (dev → qa → prod) and manual approval gates.
Skaffold: A CLI/config tool that renders a Helm chart (or raw manifests) with environment-specific values. Cloud Deploy calls it during the "render" phase of a rollout.
Helm chart: A packaged, templated set of Kubernetes manifests. Each app owns a chart under helm/{app-name}/, with per-environment values files.
Gateway API: The successor to Ingress for routing external traffic into a cluster. Apps attach to a shared Gateway via a parentRef instead of provisioning their own load balancer.
KEDA: Kubernetes Event-Driven Autoscaling. Scales workloads on external metrics (queue length, custom metrics) by feeding them to a Horizontal Pod Autoscaler it manages, going beyond plain CPU/memory-based HPA scaling.
External Secrets Operator (ESO): An operator that syncs values from an external secret store (GCP Secret Manager) into Kubernetes Secrets in the cluster, driven by an ExternalSecret resource, so no plaintext secrets are kept in Git.
GCSFuse CSI driver: GKE's native Container Storage Interface driver (gcsfuse.csi.storage.gke.io) for mounting a GCS bucket as a filesystem directly into a pod. GKE injects the required sidecar container automatically, so workloads don't define one themselves.
Config Connector: A Kubernetes operator that lets you declare GCP resources (e.g. a GCS bucket) as Kubernetes CRDs, reconciling them into real cloud resources — infra-as-Kubernetes-manifests instead of separate Terraform for per-app resources.
Binary Authorization: An admission control policy that only allows container images meeting a defined attestation policy to be deployed onto the cluster.