Deploying a New App

Everything is GitOps-driven: infrastructure changes and production deploys both go through a Cloud Deploy approval gate. This guide walks through onboarding a brand-new application end to end.

Prefer a filled-in example? See Deployment Example: RetroTool for a real deployment with actual repos, files, resource names, and commands.

1. Containerize the app
Start on the app side: make it a container the platform can run on GKE Autopilot. The image is built by the platform's Cloud Build (so it is automatically Binary Authorization-attested) and pushed to {app}-images. Autopilot enforces a hardened runtime contract, and the chart's Deployment pins a non-root securityContext — so the image has to actually run as UID 1001.
- Run as non-root UID/GID 1001: the chart sets runAsNonRoot: true and runAsUser/runAsGroup/fsGroup: 1001. Create that user in the Dockerfile (e.g. adduser -D -u 1001 app) and switch to it with USER.
- No elevated privileges: the container runs with allowPrivilegeEscalation: false, all Linux capabilities dropped, and seccompProfile RuntimeDefault. Don't rely on root-only paths or extra caps.
- Listen on the port the chart targets (3000 for web apps, containerPort name: http) and honor the PORT env var. Bind to all interfaces (0.0.0.0), not just localhost, or traffic from the Service never reaches the process; Next.js standalone output, for example, reads the bind address from the HOSTNAME env var.
- Use tini (or another real init) as PID 1 for correct signal handling and zombie reaping.
- Set resource requests/limits on the chart's Deployment (step 3) and treat them as required: Autopilot schedules and bills on requests, and falls back to its own defaults if you omit them.
- Keep it small and production-only (multi-stage build, slim base such as node:22-alpine, NODE_ENV=production).
```
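# Build stage: install all dependencies (including dev) and compile
FROM node:22-alpine AS build
WORKDIR /usr/server/app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Drop devDependencies; anything `npm run start` needs must be in dependencies
RUN npm run build && npm prune --omit=dev

# Runtime stage: build output plus production dependencies only
FROM node:22-alpine
WORKDIR /usr/server/app
# tini as PID 1; user and group 1001 to match the chart's securityContext
RUN apk add --no-cache tini \
 && addgroup -g 1001 app \
 && adduser -D -u 1001 -G app app
# Bind to all interfaces on the port the chart targets
ENV NODE_ENV=production HOSTNAME=0.0.0.0 PORT=3000
COPY --from=build --chown=1001:1001 /usr/server/app ./

USER 1001
EXPOSE 3000
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["npm", "run", "start"]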
```
Useful commands
```
docker build -t {app}:local .
docker run -d --name {app}-test -p {host-port}:3000 {app}:local
curl -s -o /dev/null -w "%{http_code}\n" localhost:{host-port}/
docker exec {app}-test id -u          # expect 1001, not 0
docker rm -f {app}-test
```
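Before building, a quick static check can catch the most common way an image breaks the runtime contract: a final stage that never switches away from root. The sketch below is a hypothetical helper, not part of the platform. It only inspects USER instructions in the final stage (a FROM line starts a new stage) and accepts UID 1001 or the user name app from the example above; adjust the accepted names if your Dockerfile uses a different one.

```shell
# Hypothetical pre-build lint (not part of the platform): checks that the
# final stage of a Dockerfile switches to UID 1001 before running.

final_stage_user() {
  # Print the argument of the last USER instruction in the final stage.
  # A FROM line starts a new stage, so it resets any earlier USER.
  awk '
    toupper($1) == "FROM" { user = "" }
    toupper($1) == "USER" { user = $2 }
    END { if (user != "") print user }
  ' "$1"
}

check_nonroot_user() {
  local user
  user=$(final_stage_user "$1")
  case "$user" in
    1001 | 1001:* | app | app:*)
      echo "ok: final stage runs as ${user}"; return 0 ;;
    "")
      echo "fail: final stage has no USER (image defaults to root)"; return 1 ;;
    *)
      echo "fail: final stage runs as ${user}, expected 1001"; return 1 ;;
  esac
}
```

Run `check_nonroot_user Dockerfile` before `docker build`. The `docker exec {app}-test id -u` check above stays the authoritative test, since a base image or entrypoint can still change the effective user.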

2. Structure the app repository

Alongside the Dockerfile from step 1, add a workload Helm chart (depending on the shared gke-tenant-workload chart) with per-environment value overrides, plus the Cloud Build definitions for the dev, qa, and preview flows.

```
{app-repo}/
├── Dockerfile
├── helm/{app-name}/
│   ├── Chart.yaml               # depends on gke-tenant-workload
│   ├── values.yaml
│   ├── values/{dev,qa,preview,prod}.yaml
│   └── templates/
└── deploy/
    ├── cloudbuild/{dev,qa,preview}.yaml
    ├── skaffold.yaml            # used by Cloud Deploy's render phase
    └── k8s/reposync-{env}.yaml
```
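If you are starting from an empty repository, the layout above can be created in one go. This is a hypothetical convenience function (its name and the empty placeholder files are assumptions). It leaves deploy/k8s empty because which reposync-{env}.yaml files you need depends on your environments, and it never overwrites files that already exist.

```shell
# Hypothetical helper: creates the app repository skeleton with empty
# placeholder files. Existing files are left untouched.
scaffold_app_repo() {
  local app=$1 root=${2:-.}
  local env f
  mkdir -p "$root/helm/$app/values" "$root/helm/$app/templates" \
           "$root/deploy/cloudbuild" "$root/deploy/k8s"
  for f in Dockerfile "helm/$app/Chart.yaml" "helm/$app/values.yaml" deploy/skaffold.yaml; do
    [ -e "$root/$f" ] || : > "$root/$f"
  done
  # One values file per environment, including prod
  for env in dev qa preview prod; do
    [ -e "$root/helm/$app/values/$env.yaml" ] || : > "$root/helm/$app/values/$env.yaml"
  done
  # Cloud Build definitions exist only for the dev, qa and preview flows
  for env in dev qa preview; do
    [ -e "$root/deploy/cloudbuild/$env.yaml" ] || : > "$root/deploy/cloudbuild/$env.yaml"
  done
}
```

Usage: `scaffold_app_repo {app-name}` from the repository root, then fill in each file as described in the following steps.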

3. Write the Helm chart
A Helm chart is a templated bundle of Kubernetes manifests — it declares how your app runs on the cluster. You don't write it all from scratch: your chart declares a dependency on the shared gke-tenant-workload chart, which supplies the platform-standard pieces (ServiceAccount + Workload Identity, the gateway ListenerSet/Certificate, ExternalSecrets, KEDA autoscaling). You add a thin chart — a Deployment, Service and HTTPRoute plus one values file per environment — and inherit the conventions. Nobody runs `helm install` by hand: at build time Cloud Build packages the chart and pushes it as an OCI artifact to the config-sync registry (oci://us-central1-docker.pkg.dev/u2i-bootstrap/config-sync); the RepoSync manifest points Config Sync at that chart + version, and Config Sync pulls it and applies it into your namespace.
- Declare the gke-tenant-workload dependency (pulled from the u2i-bootstrap/helm-charts OCI registry via `helm dependency build`; CI does the same at build time).
- Set the identity values under gke-tenant-workload.global (appName, namespace, environment, stage, projectId, clusterName, clusterLocation, gatewayMode) in every environment's values file.
- Provide hostnames (the public URL) and serviceAccount.enabled: true, and enable a certificate (clusterIssuer for eg, certMap for managed-lb).
- Choose gatewayMode per environment: eg (in-cluster Envoy, nonprod dev/qa) or managed-lb (Cloud LB, prod).
- Add the templates the subchart doesn't render: a Deployment (with the non-root securityContext matching your image), a ClusterIP Service, and an HTTPRoute using the gke-tenant-workload.parentRef helper.
- Enable KEDA scale-to-zero in nonprod (a workloads entry with keda.maxReplicas + scaledownPeriod) so idle envs cost nothing; leave it off in prod (keda.enabled: false) so it stays always-on.
- When KEDA manages a workload, do NOT also set a static replicas on the Deployment — gate it out, otherwise Config Sync keeps resetting the count and fights the autoscaler.
- Keep one values file per environment under values/ (dev, qa, preview, prod).
```
# Chart.yaml: depend on the shared platform chart
apiVersion: v2          # v2 charts declare dependencies here
name: {app}
version: 0.1.0
dependencies:
  - name: gke-tenant-workload
    version: 0.11.2
    repository: oci://europe-west1-docker.pkg.dev/u2i-bootstrap/helm-charts

# values/dev.yaml — the platform contract lives under the subchart key
gke-tenant-workload:
  global:
    appName: {app}
    namespace: {app}-dev
    environment: dev
    stage: dev
    gatewayMode: eg            # managed-lb in prod
    projectId: c-u2i-nonprod
    clusterName: u2i-nonprod
    clusterLocation: europe-west1
  hostnames: [dev.{app}.u2i.dev]
  serviceAccount: { enabled: true }
  certificate: { enabled: true, clusterIssuer: letsencrypt-prod }
  workloads:                   # KEDA scale-to-zero (nonprod only)
    - name: {app}
      serviceName: {app}
      port: 80
      keda: { maxReplicas: 1, scaledownPeriod: 600 }
```
Useful commands
```
gcloud auth print-access-token | \
  helm registry login europe-west1-docker.pkg.dev -u oauth2accesstoken --password-stdin
helm dependency build helm/{app}
helm template {app} helm/{app} -f helm/{app}/values/{env}.yaml
```
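The KEDA rule above is easy to break silently, since the chart still renders and applies. A minimal guard, assuming you pipe `helm template` output for a KEDA-enabled environment into it, is to fail whenever a Deployment carries spec.replicas. It is a line-based heuristic that depends on helm's output layout (documents separated by `---`, a top-level `kind:`, two-space indentation under `spec:`), not a YAML parser, and the function names are hypothetical.

```shell
# Hypothetical CI guard: reads rendered manifests on stdin and exits 0 if any
# Deployment sets spec.replicas. Heuristic, relies on helm's output layout.
deployment_sets_replicas() {
  awk '
    /^---/         { kind = "" }
    /^kind:/       { kind = $2 }
    /^  replicas:/ { if (kind == "Deployment") found = 1 }
    END            { exit (found ? 0 : 1) }
  '
}

check_keda_replicas() {
  if deployment_sets_replicas; then
    echo "fail: a Deployment sets replicas while KEDA should own the count"
    return 1
  fi
  echo "ok: no static replicas on Deployments"
}
```

Usage: `helm template {app} helm/{app} -f helm/{app}/values/dev.yaml | check_keda_replicas`. Skip prod, where KEDA is off and a static count is expected.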
4. Register the app in the foundation config
Now switch to the infrastructure side. Add an entry for the app under the tenant's apps map in u2i-gcp-infrastructure/foundation/4-tenants/terramate.tm.hcl, then run `terramate generate` to regenerate the derived terraform.tfvars, and commit both files. Only set approver_group if that Google group already exists; otherwise set it to null and wire up the prod gate later, because binding a non-existent group breaks the IAM apply. The existing compensation and u2i-comp-portal entries follow the same pattern.
- App key: max 20 characters. Foundation generates a {key}-deploy-sa service account, and GCP caps service-account IDs at 30 chars, so 30 − len("-deploy-sa") = 20. (Example: u2i-gcp-platform-docs is 21 → one over; use gcp-platform-docs instead.)
- The app key need NOT equal the GitHub repo name — github_repo is a separate field (e.g. key retrotool-app, repo retrotool). Keep the key short and DNS-safe; it also becomes namespaces, registry names, and the hostname base.
- While you're in this repo, also add a google_artifact_registry_repository_iam_member granting {app}-ci reader on shared-images-nonprod's docker-images repo, in foundation/4-tenants/artifact-registry-access.tf. This isn't created automatically by the app entry above — every app that actually builds has its own hand-added copy of this grant. Skip it and the first build fails at step 0 pulling the shared build-lib image with an 'artifactregistry.repositories.downloadArtifacts denied' error — easy to add now, annoying to debug later.
```
{app-key} = {
  display_name    = "..."
  github_repo     = "{repo-name}"
  branch          = "main"
  create_ar_repos = true
  # gcp-{app}-prodsupport@u2i.com once that group exists; else null
  approver_group  = null
}

# foundation/4-tenants/artifact-registry-access.tf — not created by the
# entry above; add it in the same PR or the first build fails at step 0.
resource "google_artifact_registry_repository_iam_member" "shared_images_{app}_ci_reader" {
  project    = local.shared_images_project_id
  location   = local.shared_images_region
  repository = local.shared_images_repo
  role       = "roles/artifactregistry.reader"
  member     = "serviceAccount:${google_service_account.app_ci["{app-key}"].email}"
}
```
Useful commands
```
terramate generate                                  # regenerate terraform.tfvars
cd foundation/4-tenants
terraform init -backend=false -input=false && terraform validate
```
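The key rules above can be checked locally before opening the PR, rather than waiting for the terraform-plan check in the next step to reject them. A sketch follows; `check_app_key` is a hypothetical helper, and reading "DNS-safe" as lowercase letters, digits, and hyphens (starting with a letter, not ending with a hyphen) is an assumption based on GCP's service-account ID rules.

```shell
# Hypothetical pre-PR check for the app key. The limit is the 30-char cap on
# service-account IDs minus the "-deploy-sa" suffix (10 chars), i.e. 20.
check_app_key() {
  local key=$1 suffix="-deploy-sa"
  local max=$((30 - ${#suffix}))
  if [ "${#key}" -gt "$max" ]; then
    echo "fail: '${key}' is ${#key} chars; max is ${max} (${key}${suffix} would exceed 30)"
    return 1
  fi
  # Assumed DNS-safe shape: starts with a letter, only [a-z0-9-], no trailing hyphen
  case "$key" in
    "" | [!a-z]* | *[!a-z0-9-]* | *-)
      echo "fail: '${key}' must use lowercase letters, digits and hyphens, start with a letter, and not end with '-'"
      return 1 ;;
  esac
  echo "ok: '${key}' -> ${key}${suffix} (${#key} + ${#suffix} chars)"
}
```

For example, `check_app_key u2i-gcp-platform-docs` fails (21 characters), while `check_app_key gcp-platform-docs` passes.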
5. Apply the foundation change
Applying runs through two separate Cloud Build runs — never locally. It is worth knowing which build does what:
- PR opened → pr-terraform-validation fires and runs terraform plan, posting a terraform-plan/foundation check. It shows the diff and catches errors (this is what flags a too-long service-account name, an invalid value, etc.) — but it does NOT apply anything.
- Merge to main → a second build (terramate-docker-images) fires and creates a release on the terramate-infrastructure Cloud Deploy delivery pipeline.
- That release waits at a manual approval gate; approving the rollout is what actually runs terraform apply. So a green PR check means "the plan is valid", not "it's live".
- The apply then provisions, via foundation/4-tenants/app-cicd.tf: three Cloud Build triggers ({app}-dev-trigger, {app}-qa-deployment, {app}-preview-deployment), the {app}-ci and {app}-deploy-sa service accounts, Artifact Registry repos ({app}-images, {app}-cache) in nonprod and prod, and two Cloud Deploy pipelines ({app}-dev-pipeline, {app}-qa-prod-pipeline).
- A first-time apply for a brand-new app can fail on the preview trigger with "user does not have impersonation permission on ... {app}-ci@..." — this is IAM propagation lag, not a real permissions gap (the {app}-ci service account was created earlier in the same apply and hasn't fully propagated yet). Everything else in the apply has already been created; just retry.
- Retrying the failed job (gcloud deploy rollouts retry-job) needs the clouddeploy.operator role — approver alone (what you use to approve the rollout) is not enough. If you don't have it via PAM, the simplest fix is to re-run the merge trigger instead (gcloud builds triggers run terramate-docker-images --branch=main) to create a fresh release and approve that one; Terraform is idempotent so it only creates what's still missing.
Useful commands
```
# PR status and the terraform-plan check
gh pr checks {pr-number} --repo {org}/{repo}
gh pr view {pr-number} --repo {org}/{repo} --json state,mergeable,mergeStateStatus

# after merge: find the release + rollout it created
gcloud deploy releases list --delivery-pipeline={pipeline} --region={region} --project={project} --limit=3
gcloud deploy rollouts list --release={release} --delivery-pipeline={pipeline} --region={region} --project={project}
gcloud deploy rollouts describe {rollout} --release={release} --delivery-pipeline={pipeline} --region={region} --project={project}

# inspect the underlying Cloud Build run
gcloud builds describe {build-id} --project={project} --region={region}

# retry a failed job (needs clouddeploy.operator, not just approver)
gcloud deploy rollouts retry-job {rollout} --release={release} --delivery-pipeline={pipeline} \
  --region={region} --project={project} --phase-id={phase} --job-id={job}
```
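When an apply job fails, the first question is whether it is the propagation lag described above or a real error. The hypothetical helper below reads a Cloud Build log on stdin and suggests a next action. It only recognizes the impersonation message quoted in this guide and treats any Terraform `Error:` line as needing investigation, so it is a triage aid, not a substitute for reading the log.

```shell
# Hypothetical triage helper: reads a build log on stdin, prints a next action.
classify_apply_failure() {
  local log
  log=$(cat)
  if printf '%s\n' "$log" | grep -q 'does not have impersonation permission'; then
    echo "retry: likely IAM propagation lag on a service account created in this apply"
  elif printf '%s\n' "$log" | grep -q 'Error:'; then
    echo "investigate: the apply failed for another reason"
  else
    echo "no known failure pattern found"
  fi
}
```

Usage: `gcloud builds log {build-id} --project={project} --region={region} | classify_apply_failure`.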

6. Declare namespaces, identity & buckets in u2i-tenant-infra

The namespaces the app runs in, its Kubernetes-to-GCP Workload Identity bindings, and any GCS buckets are declared in the separate u2i-tenant-infra repo (reconciled by Config Sync), not by the app's own pipeline. Add a subchart under chart/charts/apps/ depending on gke-tenant-foundation (+ gke-tenant-storage if it needs buckets), and enable it in k8s/{env}/rootsync.yaml. If the app needs its own public hostname, this is also where its DNS subzone gets created (gke-tenant-foundation.dns.publicZone) — see the next step.

Useful commands

```
# no kubectl access to a private cluster? verify via the real GCP resources instead
gcloud iam service-accounts describe {app}-k8s@{project}.iam.gserviceaccount.com --project={project}
gcloud iam service-accounts get-iam-policy {app}-k8s@{project}.iam.gserviceaccount.com --project={project}
gcloud dns managed-zones describe {app}-u2i-dev --project={project}

# if you do have cluster access
gcloud container clusters get-credentials {cluster} --region {region} --project {project}
kubectl get namespace {app} {app}-dev {app}-qa
kubectl get rootsync u2i-infra -n config-management-system -o jsonpath='{.status.sync.commit}'
```
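Without cluster access, the Workload Identity binding is the part most worth checking. This sketch assumes the binding grants roles/iam.workloadIdentityUser on {app}-k8s to a Kubernetes service account in the cluster project's workload pool ({project}.svc.id.goog). The KSA name is an assumption (use whatever gke-tenant-foundation actually creates), and `has_wi_binding` is a hypothetical helper.

```shell
# Hypothetical check: reads "ROLE<TAB>MEMBER" lines, e.g. from
#   gcloud iam service-accounts get-iam-policy {app}-k8s@{project}.iam.gserviceaccount.com \
#     --project={project} --flatten='bindings[].members' \
#     --format='value(bindings.role,bindings.members)'
# and succeeds if the KSA in the given namespace may impersonate the GSA.
has_wi_binding() {
  local project=$1 ns=$2 ksa=$3
  local want="serviceAccount:${project}.svc.id.goog[${ns}/${ksa}]"
  awk -F '\t' -v want="$want" '
    $1 == "roles/iam.workloadIdentityUser" && $2 == want { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Pipe the gcloud command from the comment into `has_wi_binding {project} {app}-dev {ksa}` once per namespace.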

7. Delegate the app's DNS subzone from the root zone
A hostname is not automatic; it has to be declared. u2i-gcp-infrastructure's foundation/6-dns/main.tf owns the root u2i.dev zone and delegates each app's subzone via an NS record in that root zone. The delegation reads the nameserver set from the app's live subzone (a data.google_dns_managed_zone data source, not a hardcoded value), so it can only be added after step 6 has actually created the subzone; otherwise the data source fails to resolve. Also add the app's domain ({app}.u2i.dev) to domainFilters in u2i-tenant-infra's rootsync-external-dns.yaml, so the per-environment A records (dev.{app}.u2i.dev, qa.{app}.u2i.dev) are published automatically from each HTTPRoute. This step only matters for public reachability: verifying that the pod comes up (next step) doesn't need it, since kubectl port-forward works without any DNS in place.
Useful commands
```
# confirm the delegation record in the root zone
gcloud dns record-sets list --zone={root-zone} --project={dns-project} --name="{app}.u2i.dev."

# confirm it resolves publicly (not just internally)
dig +short NS {app}.u2i.dev
dig +short dev.{app}.u2i.dev @8.8.8.8   # empty until a real HTTPRoute exists (step 8)
```
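Because the delegation copies the subzone's nameservers, a quick consistency check compares the NS record in the root zone with the live subzone. The sketch assumes gcloud's `value()` format, which joins list fields with semicolons; `normalize_ns` and `ns_sets_match` are hypothetical helpers.

```shell
# Hypothetical delegation check. Inputs are nameserver lists such as
#   gcloud dns managed-zones describe {app}-u2i-dev --project={project} \
#     --format='value(nameServers)'
#   gcloud dns record-sets list --zone={root-zone} --project={dns-project} \
#     --name="{app}.u2i.dev." --type=NS --format='value(rrdatas)'
normalize_ns() {
  # One lowercase FQDN per line, trailing dot enforced, sorted and deduplicated
  printf '%s\n' "$1" | tr ';, ' '\n\n\n' \
    | awk 'NF { n = tolower($0); if (n !~ /\.$/) n = n "."; print n }' \
    | sort -u
}

ns_sets_match() {
  local a b
  a=$(normalize_ns "$1"); b=$(normalize_ns "$2")
  # An empty set never counts as a match
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: `ns_sets_match "$(gcloud dns managed-zones describe ...)" "$(gcloud dns record-sets list ...)" && echo delegated`, with the full commands from the comment.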

8. Build & release: branch, tag, or PR

A push to main fires {app}-dev-trigger, which builds the image, pushes it to {app}-images, and creates a release in {app}-dev-pipeline (deploying to the {app}-dev namespace). Cutting a version tag (v*) fires {app}-qa-deployment, which creates a release in {app}-qa-prod-pipeline. Opening a pull request fires {app}-preview-deployment instead (step 11).

Useful commands

```
# watch the build the trigger just fired
gcloud builds list --project={project} --filter="substitutions.TRIGGER_NAME={app}-dev-trigger" --limit=3
gcloud builds log {build-id} --project={project}

# verify the pod without needing DNS
gcloud container clusters get-credentials {cluster} --region {region} --project {project}
kubectl get pods -n {app}-dev
kubectl logs -n {app}-dev deploy/{app} -f
kubectl port-forward -n {app}-dev svc/{app} {local-port}:80
```
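Cutting the tag that fires {app}-qa-deployment is easy to script. The sketch assumes semantic-version tags (vMAJOR.MINOR.PATCH); as described here the trigger only requires a v* tag, so treat the version scheme and the helper name as assumptions.

```shell
# Hypothetical helper: reads existing tags on stdin and prints the next patch
# tag. Non-semver tags are ignored; v0.1.0 is the fallback when none exist.
next_patch_tag() {
  grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' \
    | sed 's/^v//' \
    | sort -t . -k 1,1n -k 2,2n -k 3,3n \
    | tail -n 1 \
    | awk -F . '
        NF == 3 { printf "v%d.%d.%d\n", $1, $2, $3 + 1; found = 1 }
        END     { if (!found) print "v0.1.0" }
      '
}
```

Usage: `tag=$(git tag --list 'v*' | next_patch_tag) && git tag "$tag" && git push origin "$tag"`. Sorting numerically per component matters: v1.10.0 is newer than v1.9.9, which a plain string sort would get wrong.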

9. Automatic qa/nonprod rollout, manual prod approval
Cloud Deploy rolls the release out to the qa (nonprod) target automatically. Promotion to the prod target is gated behind a manual approval from the approver_group configured in step 4; if you left it null there, wire up the group before the first prod promotion. The {app}-deploy-sa service account executes the rollout against the target cluster.
Useful commands
```
gcloud deploy rollouts list --release={release} --delivery-pipeline={app}-qa-prod-pipeline \
  --region={region} --project={project}
gcloud deploy rollouts approve {rollout} --release={release} --delivery-pipeline={app}-qa-prod-pipeline \
  --region={region} --project={project}
# kubectl must point at the prod cluster (get-credentials as in step 8)
kubectl get pods -n {app}-prod
```
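To find which rollout is actually waiting on the prod gate, filter the rollout list by target and state. This sketch assumes the prod target ID is prod and that gcloud's `value()` format emits tab-separated fields; PENDING_APPROVAL is a Cloud Deploy rollout state, while `pending_approvals` itself is a hypothetical helper.

```shell
# Hypothetical helper: reads "NAME<TAB>TARGET<TAB>STATE" lines, e.g. from
#   gcloud deploy rollouts list --release={release} \
#     --delivery-pipeline={app}-qa-prod-pipeline --region={region} \
#     --project={project} --format='value(name,targetId,state)'
# and prints the short ID (last path segment) of rollouts awaiting approval.
pending_approvals() {
  local target=$1
  awk -F '\t' -v target="$target" '
    $2 == target && $3 == "PENDING_APPROVAL" { n = split($1, parts, "/"); print parts[n] }
  '
}
```

Each printed ID can be passed to the `gcloud deploy rollouts approve` command above.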
10. Wire up secrets and storage as needed
Secrets are pulled from GCP Secret Manager via the External Secrets Operator, using Workload Identity — no keys are checked in. If the app needs a GCS bucket (e.g. for user uploads), provision it via gke-tenant-storage (step 6); the pod mounts it directly through the gcsfuse.csi.storage.gke.io CSI driver.
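The secret values themselves live in Secret Manager, and External Secrets syncs them into the namespace. A minimal sketch for seeding one follows; the secret ID and helper name are hypothetical, and it assumes your account can create secrets in {project}. Whether the app's Workload Identity service account can read the secret depends on what the platform grants, which this guide does not cover, so no IAM binding is added here.

```shell
# Hypothetical helper: creates the secret on first use, then adds a new version
# whose payload is read from stdin (--data-file=-), keeping the value out of
# shell history and off disk.
put_app_secret() {
  local project=$1 secret=$2
  # </dev/null keeps these calls from consuming the payload on stdin
  if ! gcloud secrets describe "$secret" --project="$project" >/dev/null 2>&1 </dev/null; then
    gcloud secrets create "$secret" --project="$project" \
      --replication-policy=automatic </dev/null || return 1
  fi
  gcloud secrets versions add "$secret" --project="$project" --data-file=-
}
```

Usage: `printf '%s' "$DB_PASSWORD" | put_app_secret {project} {app}-db-password`, then reference that secret ID from the app's ExternalSecret.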
11. (Optional) PR preview environments
Opening a PR fires {app}-preview-deployment (deploy/cloudbuild/preview.yaml), which builds a preview image, spins up a dynamic {app}-pr-{N} namespace, and applies the manifests directly to give each PR its own review URL. The preview values enable KEDA with a shorter scaledown (previews sit idle most of the time) and render a cross-namespace ReferenceGrant so the dynamic namespace's route can reach the KEDA interceptor. Static envs get that grant from u2i-tenant-infra, but a per-PR namespace must bring its own. Closing the PR triggers cleanup of the namespace.
Useful commands
```
gcloud builds list --project={project} --filter="substitutions.TRIGGER_NAME={app}-preview-deployment" --limit=3
kubectl get pods -n {app}-pr-{N}
dig +short pr-{N}.{app}.u2i.dev @8.8.8.8
```
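The preview namespace and hostname are derived from the app key and the PR number, which makes them easy to compute when scripting the checks above. The helper below is hypothetical and mirrors the naming in this section ({app}-pr-{N} and pr-{N}.{app}.u2i.dev); the 63-character check reflects the Kubernetes limit on namespace names.

```shell
# Hypothetical helper: prints "<namespace> <hostname>" for a PR preview.
preview_names() {
  local app=$1 pr=$2 ns
  case "$pr" in
    "" | *[!0-9]*) echo "PR number must be numeric" >&2; return 2 ;;
  esac
  ns="${app}-pr-${pr}"
  # Namespaces are DNS labels: at most 63 characters
  if [ "${#ns}" -gt 63 ]; then
    echo "namespace ${ns} is longer than 63 characters" >&2; return 1
  fi
  printf '%s %s\n' "$ns" "pr-${pr}.${app}.u2i.dev"
}
```

Usage: `set -- $(preview_names {app} {N})`, then `kubectl get pods -n "$1"` and `dig +short "$2" @8.8.8.8`.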
