Deploying a New App
Everything is Git-ops driven: infrastructure changes and production deploys both go through a Cloud Deploy approval gate. This walks through onboarding a brand-new application end to end.
Prefer a filled-in example? See Deployment Example: RetroTool for a real deployment with actual repos, files, resource names, and commands.
1. Containerize the app
Start on the app side: make it a container the platform can run on GKE Autopilot. The image is built by the platform's Cloud Build (so it is automatically Binary Authorization-attested) and pushed to {app}-images. Autopilot enforces a hardened runtime contract, and the chart's Deployment pins a non-root securityContext — so the image has to actually run as UID 1001.
- Run as non-root UID/GID 1001 — the chart sets runAsNonRoot: true, runAsUser/runAsGroup/fsGroup: 1001. Create that user in the Dockerfile (e.g. adduser -D -u 1001 app) and USER it.
- No elevated privileges: the container runs with allowPrivilegeEscalation: false, all Linux capabilities dropped, and seccompProfile RuntimeDefault. Don't rely on root-only paths or extra caps.
- Listen on the port the chart targets — 3000 for web apps (containerPort name: http) — and bind to all interfaces (0.0.0.0, e.g. via the HOSTNAME env var), not just localhost, or the cluster can't reach the pod. Honor the PORT env var.
- Use tini (or another real init) as PID 1 for correct signal handling and zombie reaping.
- Set resource requests/limits — Autopilot bills and schedules on requests; they are required, not optional.
- Keep it small and production-only (multi-stage build, slim base such as node:22-alpine, NODE_ENV=production).
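As a rough pre-flight check for the contract above, a small script can scan a Dockerfile for the directives the list calls for. This is only a sketch: `lint_dockerfile` is a hypothetical helper, not platform tooling, and it assumes tini is the chosen init. Text matching cannot prove the image really runs as UID 1001; running `id -u` inside the container remains the real test.

```shell
# Hypothetical pre-flight check (not part of the platform tooling).
# Scans a Dockerfile for the directives the runtime contract relies on.
lint_dockerfile() {
  file="$1"
  rc=0
  # A USER directive must be present so the image does not default to root.
  grep -Eq '^USER[[:space:]]+[^[:space:]]+' "$file" || { echo "missing USER directive"; rc=1; }
  # The chart pins UID 1001, so the Dockerfile should create that UID.
  grep -q '1001' "$file" || { echo "UID 1001 is never mentioned"; rc=1; }
  # Assumption: tini is the init of choice and is wired in via ENTRYPOINT.
  grep -Eq '^ENTRYPOINT.*tini' "$file" || { echo "tini is not the ENTRYPOINT"; rc=1; }
  # Web apps are expected to listen on 3000.
  grep -Eq '^EXPOSE[[:space:]]+3000' "$file" || { echo "port 3000 is not exposed"; rc=1; }
  return "$rc"
}
```

Run it as `lint_dockerfile Dockerfile`; it prints one line per missing directive and returns non-zero if anything is missing.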
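    FROM node:22-alpine
    WORKDIR /usr/server/app
    # tini becomes PID 1 (signal forwarding, zombie reaping)
    RUN apk add --no-cache tini
    # non-root user matching the chart's runAsUser: 1001
    RUN adduser -D -u 1001 app
    COPY package.json package-lock.json ./
    RUN npm ci
    COPY . .
    RUN npm run build
    ENV NODE_ENV=production
    USER app
    EXPOSE 3000
    # "--" ends tini's own option parsing before the app command
    ENTRYPOINT ["/sbin/tini", "--"]
    CMD ["npm", "run", "start"]

Useful commands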
    docker build -t {app}:local .
    docker run -d --name {app}-test -p {host-port}:3000 {app}:local
    curl -s -o /dev/null -w "%{http_code}\n" localhost:{host-port}/
    docker exec {app}-test id -u   # expect 1001, not 0
    docker rm -f {app}-test

2. Structure the app repository
Alongside the Dockerfile from step 1, add a workload Helm chart (depending on the shared gke-tenant-workload chart) with per-environment value overrides, plus the Cloud Build definitions for the dev, qa, and preview flows.
    {app-repo}/
    ├── Dockerfile
    ├── helm/{app-name}/
    │   ├── Chart.yaml          # depends on gke-tenant-workload
    │   ├── values.yaml
    │   ├── values/{dev,qa,preview,prod}.yaml
    │   └── templates/
    └── deploy/
        ├── cloudbuild/{dev,qa,preview}.yaml
        ├── skaffold.yaml       # used by Cloud Deploy's render phase
        └── k8s/reposync-{env}.yaml

3. Write the Helm chart
A Helm chart is a templated bundle of Kubernetes manifests — it declares how your app runs on the cluster. You don't write it all from scratch: your chart declares a dependency on the shared gke-tenant-workload chart, which supplies the platform-standard pieces (ServiceAccount + Workload Identity, the gateway ListenerSet/Certificate, ExternalSecrets, KEDA autoscaling). You add a thin chart — a Deployment, Service and HTTPRoute plus one values file per environment — and inherit the conventions. Nobody runs `helm install` by hand: at build time Cloud Build packages the chart and pushes it as an OCI artifact to the config-sync registry (oci://us-central1-docker.pkg.dev/u2i-bootstrap/config-sync); the RepoSync manifest points Config Sync at that chart + version, and Config Sync pulls it and applies it into your namespace.
- Declare the gke-tenant-workload dependency (pulled from the u2i-bootstrap/helm-charts OCI registry via `helm dependency build`; CI does the same at build time).
- Set the identity values under global: — appName, namespace, environment, stage, projectId, clusterName, clusterLocation, gatewayMode — in every environment's values file.
- Provide hostnames (the public URL) and serviceAccount.enabled: true, and enable a certificate (clusterIssuer for eg, certMap for managed-lb).
- Choose gatewayMode per environment: eg (in-cluster Envoy, nonprod dev/qa) or managed-lb (Cloud LB, prod).
- Add the templates the subchart doesn't render: a Deployment (with the non-root securityContext matching your image), a ClusterIP Service, and an HTTPRoute using the gke-tenant-workload.parentRef helper.
- Enable KEDA scale-to-zero in nonprod (a workloads entry with keda.maxReplicas + scaledownPeriod) so idle envs cost nothing; leave it off in prod (keda.enabled: false) so it stays always-on.
- When KEDA manages a workload, do NOT also set a static replicas on the Deployment — gate it out, otherwise Config Sync keeps resetting the count and fights the autoscaler.
- Keep one values file per environment under values/ (dev, qa, preview, prod).
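The per-environment choices in the list above (gateway mode, certificate field, KEDA on or off) can be summarized as a lookup. The sketch below is illustrative only: `gateway_for_env` is a hypothetical helper, and because the text does not say which mode preview uses, anything other than dev, qa, and prod is reported as unknown.

```shell
# Hypothetical helper: maps an environment to the settings described above.
# Only dev, qa and prod are covered; other names are not specified in the text.
gateway_for_env() {
  case "$1" in
    dev|qa) echo "gatewayMode=eg certificate=clusterIssuer keda=on" ;;
    prod)   echo "gatewayMode=managed-lb certificate=certMap keda=off" ;;
    *)      echo "unknown environment: $1"; return 1 ;;
  esac
}
```

For example, `gateway_for_env prod` prints the managed-lb combination, which is what values/prod.yaml would be expected to carry.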
    # Chart.yaml: depend on the shared platform chart
    dependencies:
      - name: gke-tenant-workload
        version: 0.11.2
        repository: oci://europe-west1-docker.pkg.dev/u2i-bootstrap/helm-charts

    # values/dev.yaml: the platform contract lives under the subchart key
    gke-tenant-workload:
      global:
        appName: {app}
        namespace: {app}-dev
        environment: dev
        stage: dev
        gatewayMode: eg            # managed-lb in prod
        projectId: c-u2i-nonprod
        clusterName: u2i-nonprod
        clusterLocation: europe-west1
      hostnames: [dev.{app}.u2i.dev]
      serviceAccount: { enabled: true }
      certificate: { enabled: true, clusterIssuer: letsencrypt-prod }
      workloads:                   # KEDA scale-to-zero (nonprod only)
        - name: {app}
          serviceName: {app}
          port: 80
          keda: { maxReplicas: 1, scaledownPeriod: 600 }

Useful commands
    gcloud auth print-access-token | \
      helm registry login europe-west1-docker.pkg.dev -u oauth2accesstoken --password-stdin
    helm dependency build helm/{app}
    helm template {app} helm/{app} -f helm/{app}/values/{env}.yaml

4. Register the app in the foundation config
Now switch to the infrastructure side. Add an entry for the app under the tenant's apps map in u2i-gcp-infrastructure/foundation/4-tenants/terramate.tm.hcl, then run `terramate generate` to regenerate the derived terraform.tfvars — commit both files. Only set approver_group if that Google group already exists; otherwise use null and wire the prod gate later (binding a non-existent group breaks the IAM apply) — the same pattern compensation and u2i-comp-portal use.
- App key: max 20 characters. Foundation generates a {key}-deploy-sa service account, and GCP caps service-account IDs at 30 chars, so 30 − len("-deploy-sa") = 20. (Example: u2i-gcp-platform-docs is 21 → one over; use gcp-platform-docs instead.)
- The app key need NOT equal the GitHub repo name — github_repo is a separate field (e.g. key retrotool-app, repo retrotool). Keep the key short and DNS-safe; it also becomes namespaces, registry names, and the hostname base.
- While you're in this repo, also add a google_artifact_registry_repository_iam_member granting {app}-ci reader on shared-images-nonprod's docker-images repo, in foundation/4-tenants/artifact-registry-access.tf. This isn't created automatically by the app entry above — every app that actually builds has its own hand-added copy of this grant. Skip it and the first build fails at step 0 pulling the shared build-lib image with an 'artifactregistry.repositories.downloadArtifacts denied' error — easy to add now, annoying to debug later.
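The 20-character limit on the app key is easy to check before opening the PR. The following is a minimal sketch of that arithmetic; `check_app_key` is a hypothetical helper, not something the foundation repo provides, and it checks length only (not DNS-safety).

```shell
# Hypothetical helper: checks an app key against the limit derived from the
# 30-character service-account ID cap and the "-deploy-sa" suffix.
check_app_key() {
  key="$1"
  suffix="-deploy-sa"
  max=$((30 - ${#suffix}))   # 30 - 10 = 20
  if [ "${#key}" -gt "$max" ]; then
    echo "too long: ${#key} characters, limit $max"
    return 1
  fi
  echo "ok: ${#key} characters, limit $max"
}
```

With the example keys from the list, `check_app_key u2i-gcp-platform-docs` reports 21 characters and fails, while `check_app_key gcp-platform-docs` reports 17 and passes.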
    {app-key} = {
      display_name    = "..."
      github_repo     = "{repo-name}"
      branch          = "main"
      create_ar_repos = true
      # gcp-{app}-prodsupport@u2i.com once that group exists; else null
      approver_group  = null
    }

    # foundation/4-tenants/artifact-registry-access.tf: not created by the
    # entry above; add it in the same PR or the first build fails at step 0.
    resource "google_artifact_registry_repository_iam_member" "shared_images_{app}_ci_reader" {
      project    = local.shared_images_project_id
      location   = local.shared_images_region
      repository = local.shared_images_repo
      role       = "roles/artifactregistry.reader"
      member     = "serviceAccount:${google_service_account.app_ci["{app-key}"].email}"
    }

Useful commands
    terramate generate   # regenerate terraform.tfvars
    cd foundation/4-tenants
    terraform init -backend=false -input=false && terraform validate

5. Apply the foundation change
Changes are applied through two separate Cloud Build runs, never from a local machine. It helps to know which build does what:
- PR opened → pr-terraform-validation fires and runs terraform plan, posting a terraform-plan/foundation check. It shows the diff and catches errors (this is what flags a too-long service-account name, an invalid value, etc.) — but it does NOT apply anything.
- Merge to main → a second build (terramate-docker-images) fires and creates a release on the terramate-infrastructure Cloud Deploy delivery pipeline.
- That release waits at a manual approval gate; approving the rollout is what actually runs terraform apply. So a green PR check means "the plan is valid", not "it's live".
- The apply then provisions, via foundation/4-tenants/app-cicd.tf: three Cloud Build triggers ({app}-dev-trigger, {app}-qa-deployment, {app}-preview-deployment), the {app}-ci and {app}-deploy-sa service accounts, Artifact Registry repos ({app}-images, {app}-cache) in nonprod and prod, and two Cloud Deploy pipelines ({app}-dev-pipeline, {app}-qa-prod-pipeline).
- A first-time apply for a brand-new app can fail on the preview trigger with "user does not have impersonation permission on ... {app}-ci@..." — this is IAM propagation lag, not a real permissions gap (the {app}-ci service account was created earlier in the same apply and hasn't fully propagated yet). Everything else in the apply has already been created; just retry.
- Retrying the failed job (gcloud deploy rollouts retry-job) needs the clouddeploy.operator role — approver alone (what you use to approve the rollout) is not enough. If you don't have it via PAM, the simplest fix is to re-run the merge trigger instead (gcloud builds triggers run terramate-docker-images --branch=main) to create a fresh release and approve that one; Terraform is idempotent so it only creates what's still missing.
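When a first apply fails partway, it helps to have the expected resource names written out as a checklist. The sketch below only formats the names listed above for a given app key; `foundation_resources` is a hypothetical helper and makes no GCP calls.

```shell
# Hypothetical helper: lists the names the foundation apply is described as
# creating for one app key. It only formats strings; it does not call GCP.
foundation_resources() {
  app="$1"
  printf '%s\n' \
    "trigger ${app}-dev-trigger" \
    "trigger ${app}-qa-deployment" \
    "trigger ${app}-preview-deployment" \
    "service-account ${app}-ci" \
    "service-account ${app}-deploy-sa" \
    "artifact-registry ${app}-images" \
    "artifact-registry ${app}-cache" \
    "pipeline ${app}-dev-pipeline" \
    "pipeline ${app}-qa-prod-pipeline"
}
```

`foundation_resources retrotool-app` prints nine lines, one per resource, which can be compared against what the console or gcloud shows after the rollout.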
Useful commands
    # PR status and the terraform-plan check
    gh pr checks {pr-number} --repo {org}/{repo}
    gh pr view {pr-number} --repo {org}/{repo} --json state,mergeable,mergeStateStatus

    # after merge: find the release + rollout it created
    gcloud deploy releases list --delivery-pipeline={pipeline} --region={region} --project={project} --limit=3
    gcloud deploy rollouts list --release={release} --delivery-pipeline={pipeline} --region={region} --project={project}
    gcloud deploy rollouts describe {rollout} --release={release} --delivery-pipeline={pipeline} --region={region} --project={project}

    # inspect the underlying Cloud Build run
    gcloud builds describe {build-id} --project={project} --region={region}

    # retry a failed job (needs clouddeploy.operator, not just approver)
    gcloud deploy rollouts retry-job {rollout} --release={release} --delivery-pipeline={pipeline} \
      --region={region} --project={project} --phase-id={phase} --job-id={job}

6. Declare namespaces, identity & buckets in u2i-tenant-infra
The namespaces the app runs in, its Kubernetes-to-GCP Workload Identity bindings, and any GCS buckets are declared in the separate u2i-tenant-infra repo (reconciled by Config Sync), not by the app's own pipeline. Add a subchart under chart/charts/apps/ depending on gke-tenant-foundation (+ gke-tenant-storage if it needs buckets), and enable it in k8s/{env}/rootsync.yaml. If the app needs its own public hostname, this is also where its DNS subzone gets created (gke-tenant-foundation.dns.publicZone) — see the next step.
Useful commands
    # no kubectl access to a private cluster? verify via the real GCP resources instead
    gcloud iam service-accounts describe {app}-k8s@{project}.iam.gserviceaccount.com --project={project}
    gcloud iam service-accounts get-iam-policy {app}-k8s@{project}.iam.gserviceaccount.com --project={project}
    gcloud dns managed-zones describe {app}-u2i-dev --project={project}

    # if you do have cluster access
    gcloud container clusters get-credentials {cluster} --region {region} --project {project}
    kubectl get namespace {app} {app}-dev {app}-qa
    kubectl get rootsync u2i-infra -n config-management-system -o jsonpath='{.status.sync.commit}'

7. Delegate the app's DNS subzone from the root zone
A hostname is not automatic: it has to be declared. u2i-gcp-infrastructure's foundation/6-dns/main.tf owns the root u2i.dev zone and delegates each app's subzone via an NS record in that root zone. The delegation reads the nameserver set from the app's live subzone (data.google_dns_managed_zone, not a hardcoded value), so it can only be added after step 6 has actually created the subzone; otherwise the data source fails to resolve. Also add the app's hostname to domainFilters in u2i-tenant-infra's rootsync-external-dns.yaml, so that per-environment A records (dev.*, qa.*) publish automatically from each HTTPRoute. This step only matters for public reachability. Verifying that the pod itself comes up (next step) doesn't need it: kubectl port-forward works without any DNS in place.
Useful commands
    # confirm the delegation record in the root zone
    gcloud dns record-sets list --zone={root-zone} --project={dns-project} --name="{app}.u2i.dev."

    # confirm it resolves publicly (not just internally)
    dig +short NS {app}.u2i.dev
    dig +short dev.{app}.u2i.dev @8.8.8.8   # empty until a real HTTPRoute exists (step 8)

8. Build & release: branch, tag, or PR
A push to main fires {app}-dev-trigger, which builds the image, pushes it to {app}-images, and creates a release in {app}-dev-pipeline (deploying to the {app}-dev namespace). Cutting a version tag (v*) fires {app}-qa-deployment, which creates a release in {app}-qa-prod-pipeline.
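The routing from Git event to trigger can be written down as a small lookup. This is a sketch for orientation, not how Cloud Build evaluates triggers: `trigger_for_event` is a hypothetical helper, the event labels `push` and `pull_request` are illustrative names chosen here, and the PR case reflects the preview flow described in step 11.

```shell
# Hypothetical helper: names the trigger that a Git event is expected to fire.
# Arguments: app key, event label (push | pull_request), full Git ref.
trigger_for_event() {
  app="$1"
  event="$2"
  ref="$3"
  case "$event:$ref" in
    push:refs/heads/main) echo "${app}-dev-trigger" ;;         # deploys to {app}-dev
    push:refs/tags/v*)    echo "${app}-qa-deployment" ;;       # release in the qa-prod pipeline
    pull_request:*)       echo "${app}-preview-deployment" ;;  # per-PR preview
    *)                    echo "none" ;;
  esac
}
```

For example, `trigger_for_event retrotool-app push refs/tags/v1.2.0` prints `retrotool-app-qa-deployment`, while a push to any other branch prints `none`.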
Useful commands
    # watch the build the trigger just fired
    gcloud builds list --project={project} --filter="substitutions.TRIGGER_NAME={app}-dev-trigger" --limit=3
    gcloud builds log {build-id} --project={project}

    # verify the pod without needing DNS
    gcloud container clusters get-credentials {cluster} --region {region} --project {project}
    kubectl get pods -n {app}-dev
    kubectl logs -n {app}-dev deploy/{app} -f
    kubectl port-forward -n {app}-dev svc/{app} {local-port}:80

9. Automatic qa/nonprod rollout, manual prod approval
Cloud Deploy rolls the release out to the qa (nonprod) target automatically. Promotion to the prod target is gated behind a manual approval from the approver_group configured in step 4. The {app}-deploy-sa service account executes the rollout against the target cluster.
Useful commands
    gcloud deploy rollouts list --release={release} --delivery-pipeline={app}-qa-prod-pipeline \
      --region={region} --project={project}
    gcloud deploy rollouts approve {rollout} --release={release} --delivery-pipeline={app}-qa-prod-pipeline \
      --region={region} --project={project}
    kubectl get pods -n {app}-prod

10. Wire up secrets and storage as needed
Secrets are pulled from GCP Secret Manager via the External Secrets Operator, using Workload Identity — no keys are checked in. If the app needs a GCS bucket (e.g. for user uploads), provision it via gke-tenant-storage (step 6); the pod mounts it directly through the gcsfuse.csi.storage.gke.io CSI driver.
11. (Optional) PR preview environments
Opening a PR fires {app}-preview-deployment (deploy/cloudbuild/preview.yaml), which builds a preview image, spins up a dynamic {app}-pr-{N} namespace, and applies the manifests directly for a per-PR review URL. The preview values enable KEDA with a shorter scaledown (previews are idle almost always) and render a cross-namespace ReferenceGrant so the dynamic namespace's route can reach the KEDA interceptor — static envs get that grant from u2i-tenant-infra, but a per-PR namespace must bring its own. Closing the PR triggers cleanup of the namespace.
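The per-PR names follow a fixed pattern, so they can be derived from the app key and PR number when looking for a preview. A minimal sketch, assuming the namespace pattern {app}-pr-{N} from the text and the hostname pattern pr-{N}.{app}.u2i.dev used by the verification commands for this step; `preview_names` is a hypothetical helper.

```shell
# Hypothetical helper: derives the preview namespace and hostname for a PR.
# Arguments: app key, PR number.
preview_names() {
  app="$1"
  pr="$2"
  echo "namespace=${app}-pr-${pr}"
  echo "hostname=pr-${pr}.${app}.u2i.dev"
}
```

For example, `preview_names retrotool-app 42` prints `namespace=retrotool-app-pr-42` and `hostname=pr-42.retrotool-app.u2i.dev`.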
Useful commands
    gcloud builds list --project={project} --filter="substitutions.TRIGGER_NAME={app}-preview-deployment" --limit=3
    kubectl get pods -n {app}-pr-{N}
    dig +short pr-{N}.{app}.u2i.dev @8.8.8.8