Deploying a New App

Everything is GitOps-driven: infrastructure changes and production deploys both go through a Cloud Deploy approval gate. This guide walks through onboarding a brand-new application end to end.

Prefer a filled-in example? See Deployment Example: RetroTool for a real deployment with actual repos, files, resource names, and commands.

  1. Containerize the app

    Start on the app side: make it a container the platform can run on GKE Autopilot. The image is built by the platform's Cloud Build (so it is automatically Binary Authorization-attested) and pushed to {app}-images. Autopilot enforces a hardened runtime contract, and the chart's Deployment pins a non-root securityContext — so the image has to actually run as UID 1001.

    • Run as non-root UID/GID 1001 — the chart sets runAsNonRoot: true, runAsUser/runAsGroup/fsGroup: 1001. Create that user in the Dockerfile (e.g. adduser -D -u 1001 app) and USER it.
    • No elevated privileges: the container runs with allowPrivilegeEscalation: false, all Linux capabilities dropped, and seccompProfile RuntimeDefault. Don't rely on root-only paths or extra caps.
    • Listen on the port the chart targets — 3000 for web apps (containerPort name: http) — and bind to all interfaces (0.0.0.0, e.g. via the HOSTNAME env var), not just localhost, or the cluster can't reach the pod. Honor the PORT env var.
    • Use tini (or another real init) as PID 1 for correct signal handling and zombie reaping.
    • Set resource requests/limits — Autopilot bills and schedules on requests; they are required, not optional.
    • Keep it small and production-only (multi-stage build, slim base such as node:22-alpine, NODE_ENV=production).
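    # Minimal single-stage example; split build and runtime stages to slim it further.
    FROM node:22-alpine
    WORKDIR /usr/server/app
    # tini becomes PID 1; the app user matches the chart's securityContext (UID 1001)
    RUN apk add --no-cache tini
    RUN adduser -D -u 1001 app
    
    # Copy the manifests first so the dependency layer is cached across source changes
    COPY package.json package-lock.json ./
    # NODE_ENV is not set yet, so npm ci also installs the devDependencies the build needs
    RUN npm ci
    COPY . .
    RUN npm run build
    ENV NODE_ENV=production
    
    USER app
    EXPOSE 3000
    # "--" ends tini's own options; everything after it is the child command
    ENTRYPOINT ["/sbin/tini", "--"]
    CMD ["npm", "run", "start"]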

    Useful commands

    docker build -t {app}:local .
    docker run -d --name {app}-test -p {host-port}:3000 {app}:local
    curl -s -o /dev/null -w "%{http_code}\n" localhost:{host-port}/
    docker exec {app}-test id -u          # expect 1001, not 0
    docker rm -f {app}-test
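
    The commands above can be chained into one local check. This is a minimal sketch, assuming Docker and curl are available; the function names expect_uid and smoke_test, the default host port 8080, and the 5-second startup wait are illustrative assumptions, not platform tooling.

```shell
# Hypothetical local smoke test for the runtime contract described in this step.

# expect_uid UID: succeed only when the container user is the chart's UID 1001.
expect_uid() {
  if [ "$1" = "1001" ]; then
    echo "uid ok"
  else
    echo "uid mismatch: got '$1', chart expects 1001"
    return 1
  fi
}

# smoke_test APP [HOST_PORT]: build, run, probe, and clean up.
smoke_test() {
  app="$1"
  host_port="${2:-8080}"            # assumed default; any free port works
  docker build -t "${app}:local" . || return 1
  docker run -d --name "${app}-test" -p "${host_port}:3000" "${app}:local" >/dev/null || return 1
  sleep 5                           # crude wait for the server to start (assumption)
  rc=0
  # Prints the HTTP status code; fails only if the connection itself fails.
  curl -s -o /dev/null -w "%{http_code}\n" "localhost:${host_port}/" || rc=1
  expect_uid "$(docker exec "${app}-test" id -u)" || rc=1
  docker rm -f "${app}-test" >/dev/null
  return "$rc"
}
```

    Run it from the repo root as `smoke_test {app} {host-port}`; a non-zero exit status means the probe could not connect or the UID check failed.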
  2. Structure the app repository

    Alongside the Dockerfile from step 1, add a workload Helm chart (depending on the shared gke-tenant-workload chart) with per-environment value overrides, plus the Cloud Build definitions for the dev, qa, and preview flows.

    {app-repo}/
    ├── Dockerfile
    ├── helm/{app-name}/
    │   ├── Chart.yaml               # depends on gke-tenant-workload
    │   ├── values.yaml
    │   ├── values/{dev,qa,preview,prod}.yaml
    │   └── templates/
    └── deploy/
        ├── cloudbuild/{dev,qa,preview}.yaml
        ├── skaffold.yaml            # used by Cloud Deploy's render phase
        └── k8s/reposync-{env}.yaml
  3. Write the Helm chart

    A Helm chart is a templated bundle of Kubernetes manifests that declares how your app runs on the cluster. You don't write it all from scratch: your chart declares a dependency on the shared gke-tenant-workload chart, which supplies the platform-standard pieces (ServiceAccount + Workload Identity, the gateway ListenerSet/Certificate, ExternalSecrets, KEDA autoscaling). You add a thin chart (a Deployment, a Service, and an HTTPRoute, plus one values file per environment) and inherit the conventions.

    Nobody runs `helm install` by hand. At build time, Cloud Build packages the chart and pushes it as an OCI artifact to the config-sync registry (oci://europe-west1-docker.pkg.dev/u2i-bootstrap/config-sync). The RepoSync manifest points Config Sync at that chart and version, and Config Sync pulls it and applies it into your namespace.
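
    To make the packaging flow concrete, the build's handling of the chart can be sketched roughly as follows. This is an illustration only: the function name publish_chart, its arguments, and the /tmp/chart-out directory are assumptions, and the real Cloud Build steps may differ. It is not something to run by hand against the shared registry.

```shell
# publish_chart CHART_DIR VERSION REGISTRY
# Hypothetical sketch of a package-and-push flow; CI owns the real steps.
publish_chart() {
  chart_dir="$1"
  version="$2"
  registry="$3"                           # an oci:// URL such as the config-sync registry
  chart_name="$(basename "$chart_dir")"   # assumes the directory is named after the chart
  out_dir="/tmp/chart-out"                # assumed scratch location

  # Pull the gke-tenant-workload dependency, package the chart, then push the archive.
  helm dependency build "$chart_dir" || return 1
  helm package "$chart_dir" --version "$version" --destination "$out_dir" || return 1
  helm push "${out_dir}/${chart_name}-${version}.tgz" "$registry"
}
```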

    • Declare the gke-tenant-workload dependency (pulled from the u2i-bootstrap/helm-charts OCI registry via `helm dependency build`; CI does the same at build time).
    • Set the identity values under global: — appName, namespace, environment, stage, projectId, clusterName, clusterLocation, gatewayMode — in every environment's values file.
    • Provide hostnames (the public URL) and serviceAccount.enabled: true, and enable a certificate (clusterIssuer for eg, certMap for managed-lb).
    • Choose gatewayMode per environment: eg (in-cluster Envoy, nonprod dev/qa) or managed-lb (Cloud LB, prod).
    • Add the templates the subchart doesn't render: a Deployment (with the non-root securityContext matching your image), a ClusterIP Service, and an HTTPRoute using the gke-tenant-workload.parentRef helper.
    • Enable KEDA scale-to-zero in nonprod (a workloads entry with keda.maxReplicas + scaledownPeriod) so idle envs cost nothing; leave it off in prod (keda.enabled: false) so it stays always-on.
    • When KEDA manages a workload, do NOT also set a static replicas on the Deployment — gate it out, otherwise Config Sync keeps resetting the count and fights the autoscaler.
    • Keep one values file per environment under values/ (dev, qa, preview, prod).
    # Chart.yaml — depend on the shared platform chart
    dependencies:
      - name: gke-tenant-workload
        version: 0.11.2
        repository: oci://europe-west1-docker.pkg.dev/u2i-bootstrap/helm-charts
    
    # values/dev.yaml — the platform contract lives under the subchart key
    gke-tenant-workload:
      global:
        appName: {app}
        namespace: {app}-dev
        environment: dev
        stage: dev
        gatewayMode: eg            # managed-lb in prod
        projectId: c-u2i-nonprod
        clusterName: u2i-nonprod
        clusterLocation: europe-west1
      hostnames: [dev.{app}.u2i.dev]
      serviceAccount: { enabled: true }
      certificate: { enabled: true, clusterIssuer: letsencrypt-prod }
      workloads:                   # KEDA scale-to-zero (nonprod only)
        - name: {app}
          serviceName: {app}
          port: 80
          keda: { maxReplicas: 1, scaledownPeriod: 600 }

    Useful commands

    gcloud auth print-access-token | \
      helm registry login europe-west1-docker.pkg.dev -u oauth2accesstoken --password-stdin
    helm dependency build helm/{app}
    helm template {app} helm/{app} -f helm/{app}/values/{env}.yaml
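
    The KEDA rule in this step (no static replicas on a KEDA-managed Deployment) can be checked against rendered output. A rough sketch, assuming the manifest stream comes from `helm template`; find_static_replicas is a hypothetical helper, and it only looks for a two-space-indented `replicas:` key, which is where spec.replicas lands in typical rendered YAML.

```shell
# find_static_replicas: read rendered manifests on stdin and print the name of
# every Deployment that sets a top-level spec.replicas.
# Exit status is 1 if any were found, 0 otherwise.
find_static_replicas() {
  awk '
    # Report the document just finished, then reset per-document state.
    function emit_doc() {
      if (kind == "Deployment" && has_replicas) { print name; found = 1 }
      kind = ""; name = ""; has_replicas = 0; in_meta = 0
    }
    BEGIN                { found = 0 }
    /^---/               { emit_doc(); next }
    /^kind:/             { kind = $2 }
    /^metadata:/         { in_meta = 1; next }
    /^[^ ]/              { in_meta = 0 }
    in_meta && name == "" && /^  name:/ { name = $2 }
    /^  replicas:/       { has_replicas = 1 }
    END                  { emit_doc(); exit found }
  '
}
```

    Typical use would be `helm template {app} helm/{app} -f helm/{app}/values/dev.yaml | find_static_replicas`, expecting no output for any environment where KEDA is enabled.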
  4. Register the app in the foundation config

    Now switch to the infrastructure side. Add an entry for the app under the tenant's apps map in u2i-gcp-infrastructure/foundation/4-tenants/terramate.tm.hcl, then run `terramate generate` to regenerate the derived terraform.tfvars — commit both files. Only set approver_group if that Google group already exists; otherwise use null and wire the prod gate later (binding a non-existent group breaks the IAM apply) — the same pattern compensation and u2i-comp-portal use.

    • App key: max 20 characters. Foundation generates a {key}-deploy-sa service account, and GCP caps service-account IDs at 30 chars, so 30 − len("-deploy-sa") = 20. (Example: u2i-gcp-platform-docs is 21 → one over; use gcp-platform-docs instead.)
    • The app key need NOT equal the GitHub repo name — github_repo is a separate field (e.g. key retrotool-app, repo retrotool). Keep the key short and DNS-safe; it also becomes namespaces, registry names, and the hostname base.
    • While you're in this repo, also add a google_artifact_registry_repository_iam_member granting {app}-ci reader on shared-images-nonprod's docker-images repo, in foundation/4-tenants/artifact-registry-access.tf. This isn't created automatically by the app entry above — every app that actually builds has its own hand-added copy of this grant. Skip it and the first build fails at step 0 pulling the shared build-lib image with an 'artifactregistry.repositories.downloadArtifacts denied' error — easy to add now, annoying to debug later.
    {app-key} = {
      display_name    = "..."
      github_repo     = "{repo-name}"
      branch          = "main"
      create_ar_repos = true
      # gcp-{app}-prodsupport@u2i.com once that group exists; else null
      approver_group  = null
    }
    
    # foundation/4-tenants/artifact-registry-access.tf — not created by the
    # entry above; add it in the same PR or the first build fails at step 0.
    resource "google_artifact_registry_repository_iam_member" "shared_images_{app}_ci_reader" {
      project    = local.shared_images_project_id
      location   = local.shared_images_region
      repository = local.shared_images_repo
      role       = "roles/artifactregistry.reader"
      member     = "serviceAccount:${google_service_account.app_ci["{app-key}"].email}"
    }

    Useful commands

    terramate generate                                  # regenerate terraform.tfvars
    cd foundation/4-tenants
    terraform init -backend=false -input=false && terraform validate
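
    The app-key rules in this step are easy to check before opening the PR. A minimal sketch: check_app_key is a hypothetical helper, and its reading of "DNS-safe" (lowercase letters, digits, and hyphens, starting with a letter and not ending in a hyphen) is an assumption rather than the foundation's own validation.

```shell
# check_app_key KEY: enforce the 20-character cap and a DNS-safe shape.
check_app_key() {
  key="$1"
  suffix="-deploy-sa"
  # 30-character service-account ID cap minus the generated suffix = 20
  max_len=$((30 - ${#suffix}))
  if [ "${#key}" -gt "$max_len" ]; then
    echo "too long: ${#key} chars (max $max_len)"
    return 1
  fi
  case "$key" in
    "" | [!a-z]* | *[!a-z0-9-]* | *-)
      echo "not DNS-safe: use lowercase letters, digits, hyphens"
      return 1
      ;;
  esac
  echo "ok"
}
```

    With the examples from the text, `check_app_key u2i-gcp-platform-docs` reports 21 characters against a maximum of 20, while `check_app_key gcp-platform-docs` passes.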
  5. Apply the foundation change

    Applying runs through two separate Cloud Build runs — never locally. It is worth knowing which build does what:

    • PR opened → pr-terraform-validation fires and runs terraform plan, posting a terraform-plan/foundation check. It shows the diff and catches errors (this is what flags a too-long service-account name, an invalid value, etc.) — but it does NOT apply anything.
    • Merge to main → a second build (terramate-docker-images) fires and creates a release on the terramate-infrastructure Cloud Deploy delivery pipeline.
    • That release waits at a manual approval gate; approving the rollout is what actually runs terraform apply. So a green PR check means "the plan is valid", not "it's live".
    • The apply then provisions, via foundation/4-tenants/app-cicd.tf: three Cloud Build triggers ({app}-dev-trigger, {app}-qa-deployment, {app}-preview-deployment), the {app}-ci and {app}-deploy-sa service accounts, Artifact Registry repos ({app}-images, {app}-cache) in nonprod and prod, and two Cloud Deploy pipelines ({app}-dev-pipeline, {app}-qa-prod-pipeline).
    • A first-time apply for a brand-new app can fail on the preview trigger with "user does not have impersonation permission on ... {app}-ci@..." — this is IAM propagation lag, not a real permissions gap (the {app}-ci service account was created earlier in the same apply and hasn't fully propagated yet). Everything else in the apply has already been created; just retry.
    • Retrying the failed job (gcloud deploy rollouts retry-job) needs the clouddeploy.operator role — approver alone (what you use to approve the rollout) is not enough. If you don't have it via PAM, the simplest fix is to re-run the merge trigger instead (gcloud builds triggers run terramate-docker-images --branch=main) to create a fresh release and approve that one; Terraform is idempotent so it only creates what's still missing.

    Useful commands

    # PR status and the terraform-plan check
    gh pr checks {pr-number} --repo {org}/{repo}
    gh pr view {pr-number} --repo {org}/{repo} --json state,mergeable,mergeStateStatus
    
    # after merge: find the release + rollout it created
    gcloud deploy releases list --delivery-pipeline={pipeline} --region={region} --project={project} --limit=3
    gcloud deploy rollouts list --release={release} --delivery-pipeline={pipeline} --region={region} --project={project}
    gcloud deploy rollouts describe {rollout} --release={release} --delivery-pipeline={pipeline} --region={region} --project={project}
    
    # inspect the underlying Cloud Build run
    gcloud builds describe {build-id} --project={project} --region={region}
    
    # retry a failed job (needs clouddeploy.operator, not just approver)
    gcloud deploy rollouts retry-job {rollout} --release={release} --delivery-pipeline={pipeline} \
      --region={region} --project={project} --phase-id={phase} --job-id={job}
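
    The decision points in this step can be summarized as a small lookup on the rollout state. A sketch only: next_step_for_state and rollout_next_step are hypothetical helpers, and only the states this step discusses are mapped; everything else is treated as "wait".

```shell
# next_step_for_state STATE: map a Cloud Deploy rollout state to the action
# described in this step. Unlisted states fall through to "wait".
next_step_for_state() {
  case "$1" in
    PENDING_APPROVAL) echo "approve the rollout; approval is what runs terraform apply" ;;
    SUCCEEDED)        echo "applied" ;;
    FAILED)           echo "retry the job (needs clouddeploy.operator) or re-run the merge trigger" ;;
    *)                echo "wait" ;;
  esac
}

# rollout_next_step ROLLOUT RELEASE PIPELINE REGION PROJECT
# Reads the rollout's state field and prints the matching hint.
rollout_next_step() {
  state="$(gcloud deploy rollouts describe "$1" --release="$2" \
    --delivery-pipeline="$3" --region="$4" --project="$5" \
    --format='value(state)')" || return 1
  next_step_for_state "$state"
}
```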
  6. Declare namespaces, identity & buckets in u2i-tenant-infra

    The namespaces the app runs in, its Kubernetes-to-GCP Workload Identity bindings, and any GCS buckets are declared in the separate u2i-tenant-infra repo (reconciled by Config Sync), not by the app's own pipeline. Add a subchart under chart/charts/apps/ depending on gke-tenant-foundation (+ gke-tenant-storage if it needs buckets), and enable it in k8s/{env}/rootsync.yaml. If the app needs its own public hostname, this is also where its DNS subzone gets created (gke-tenant-foundation.dns.publicZone) — see the next step.

    Useful commands

    # no kubectl access to a private cluster? verify via the real GCP resources instead
    gcloud iam service-accounts describe {app}-k8s@{project}.iam.gserviceaccount.com --project={project}
    gcloud iam service-accounts get-iam-policy {app}-k8s@{project}.iam.gserviceaccount.com --project={project}
    gcloud dns managed-zones describe {app}-u2i-dev --project={project}
    
    # if you do have cluster access
    gcloud container clusters get-credentials {cluster} --region {region} --project {project}
    kubectl get namespace {app} {app}-dev {app}-qa
    kubectl get rootsync u2i-infra -n config-management-system -o jsonpath='{.status.sync.commit}'
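
    When verifying through GCP rather than kubectl, the thing to look for in the IAM policy is the Workload Identity member for the app's Kubernetes ServiceAccount. A sketch under assumptions: wi_member and has_wi_binding are hypothetical helpers, the Kubernetes ServiceAccount name is something you supply (the text does not state it), and the check confirms only that the member appears in the policy, not which role it holds.

```shell
# wi_member PROJECT NAMESPACE KSA: the IAM member string GKE Workload Identity
# uses for a Kubernetes ServiceAccount.
wi_member() {
  echo "serviceAccount:$1.svc.id.goog[$2/$3]"
}

# has_wi_binding PROJECT APP NAMESPACE KSA: succeed if the {app}-k8s service
# account's IAM policy mentions that member anywhere.
has_wi_binding() {
  gcloud iam service-accounts get-iam-policy \
    "$2-k8s@$1.iam.gserviceaccount.com" --project="$1" --format=json |
    grep -F -q "$(wi_member "$1" "$3" "$4")"
}
```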
  7. Delegate the app's DNS subzone from the root zone

    A hostname is not automatic; it has to be declared. u2i-gcp-infrastructure's foundation/6-dns/main.tf owns the root u2i.dev zone and delegates to each app's subzone via an NS record. That delegation reads the nameserver set from the app's live subzone (data.google_dns_managed_zone, not a hardcoded value), so it can only be added after step 6 has actually created the subzone; otherwise the data source fails to resolve. Also add the app's hostname to domainFilters in u2i-tenant-infra's rootsync-external-dns.yaml, so that per-environment A records (dev.*, qa.*) publish automatically from each HTTPRoute. This step only matters for public reachability: verifying that the pod itself comes up (next step) doesn't need it, because kubectl port-forward works without any DNS in place.

    Useful commands

    # confirm the delegation record in the root zone
    gcloud dns record-sets list --zone={root-zone} --project={dns-project} --name="{app}.u2i.dev."
    
    # confirm it resolves publicly (not just internally)
    dig +short NS {app}.u2i.dev
    dig +short dev.{app}.u2i.dev @8.8.8.8   # empty until a real HTTPRoute exists (step 8)
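
    Delegation can take a while to become visible publicly, so a polling loop around the dig check can save manual retries. A minimal sketch, assuming the subzone is {app}.u2i.dev as in the commands above; delegation_ready and wait_for_delegation are hypothetical helpers, and the defaults of 10 attempts at 30-second intervals are arbitrary.

```shell
# delegation_ready APP: succeed once public DNS returns NS records for the subzone.
delegation_ready() {
  ns="$(dig +short NS "$1.u2i.dev" @8.8.8.8)"
  [ -n "$ns" ]
}

# wait_for_delegation APP [ATTEMPTS] [DELAY_SECONDS]
wait_for_delegation() {
  attempts="${2:-10}"
  delay="${3:-30}"
  i=1
  while :; do
    if delegation_ready "$1"; then
      echo "delegated after $i attempt(s)"
      return 0
    fi
    # Stop without sleeping once the last attempt has failed.
    if [ "$i" -ge "$attempts" ]; then
      break
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no NS records for $1.u2i.dev yet"
  return 1
}
```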
  8. Build & release: branch, tag, or PR

    A push to main fires {app}-dev-trigger, which builds the image, pushes it to {app}-images, and creates a release in {app}-dev-pipeline (deploying to the {app}-dev namespace). Cutting a version tag (v*) fires {app}-qa-deployment, which creates a release in {app}-qa-prod-pipeline. Opening a PR fires {app}-preview-deployment, covered in step 11.

    Useful commands

    # watch the build the trigger just fired
    gcloud builds list --project={project} --filter="substitutions.TRIGGER_NAME={app}-dev-trigger" --limit=3
    gcloud builds log {build-id} --project={project}
    
    # verify the pod without needing DNS
    gcloud container clusters get-credentials {cluster} --region {region} --project {project}
    kubectl get pods -n {app}-dev
    kubectl logs -n {app}-dev deploy/{app} -f
    kubectl port-forward -n {app}-dev svc/{app} {local-port}:80
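
    The mapping from Git event to trigger can be written down as a lookup, which is handy when filtering `gcloud builds list` by TRIGGER_NAME. A sketch: trigger_for_event is a hypothetical helper, and the event labels "push" and "pull_request" are illustrative names, not values Cloud Build itself uses.

```shell
# trigger_for_event APP EVENT REF: name the trigger that a Git event fires.
trigger_for_event() {
  app="$1"
  event="$2"
  ref="$3"
  case "${event}:${ref}" in
    push:refs/heads/main) echo "${app}-dev-trigger" ;;          # push to main
    push:refs/tags/v*)    echo "${app}-qa-deployment" ;;        # version tag
    pull_request:*)       echo "${app}-preview-deployment" ;;   # PR preview
    *)                    echo "none" ;;
  esac
}
```

    For example, `trigger_for_event retrotool push refs/tags/v1.4.0` prints retrotool-qa-deployment, while a push to any other branch prints none.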
  9. Automatic qa/nonprod rollout, manual prod approval

    Cloud Deploy rolls the release out to the qa (nonprod) target automatically. Promotion to the prod target is gated behind a manual approval from the approver_group configured in step 4. The {app}-deploy-sa service account executes the rollout against the target cluster.

    Useful commands

    gcloud deploy rollouts list --release={release} --delivery-pipeline={app}-qa-prod-pipeline \
      --region={region} --project={project}
    gcloud deploy rollouts approve {rollout} --release={release} --delivery-pipeline={app}-qa-prod-pipeline \
      --region={region} --project={project}
    kubectl get pods -n {app}-prod
  10. Wire up secrets and storage as needed

    Secrets are pulled from GCP Secret Manager via the External Secrets Operator, using Workload Identity — no keys are checked in. If the app needs a GCS bucket (e.g. for user uploads), provision it via gke-tenant-storage (step 6); the pod mounts it directly through the gcsfuse.csi.storage.gke.io CSI driver.

  11. (Optional) PR preview environments

    Opening a PR fires {app}-preview-deployment (deploy/cloudbuild/preview.yaml), which builds a preview image, spins up a dynamic {app}-pr-{N} namespace, and applies the manifests directly for a per-PR review URL. The preview values enable KEDA with a shorter scaledown (previews are idle almost always) and render a cross-namespace ReferenceGrant so the dynamic namespace's route can reach the KEDA interceptor — static envs get that grant from u2i-tenant-infra, but a per-PR namespace must bring its own. Closing the PR triggers cleanup of the namespace.

    Useful commands

    gcloud builds list --project={project} --filter="substitutions.TRIGGER_NAME={app}-preview-deployment" --limit=3
    kubectl get pods -n {app}-pr-{N}
    dig +short pr-{N}.{app}.u2i.dev @8.8.8.8
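
    The per-PR names used in these commands follow a fixed pattern, so they can be derived from the app key and PR number. A small sketch: preview_names is a hypothetical helper, and the pr-{N}.{app}.u2i.dev hostname pattern is taken from the dig command above rather than from the preview build definition.

```shell
# preview_names APP PR_NUMBER: print the namespace and hostname for a PR preview.
preview_names() {
  case "$2" in
    "" | *[!0-9]*)
      echo "PR number must be numeric" >&2
      return 1
      ;;
  esac
  ns="${1}-pr-${2}"
  # Kubernetes namespace names are DNS labels, capped at 63 characters.
  if [ "${#ns}" -gt 63 ]; then
    echo "namespace too long: $ns" >&2
    return 1
  fi
  echo "namespace=$ns"
  echo "host=pr-${2}.${1}.u2i.dev"
}
```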