Packaged API + UI Control Plane

agent-bom now ships a Helm-packaged control plane for teams that want the API and dashboard inside their own Kubernetes environment instead of running custom Deployment manifests by hand.

If you still need to choose a path, start with Deployment Overview. Use this page after you already know you want the chart itself, either through the reference installer or your own Helm layering.

This is the right path when you want:

  • the API and UI in your own cluster
  • same-origin browser traffic through your own ingress
  • Postgres, ClickHouse, SSO, and secrets kept in your own environment
  • the scanner CronJob and optional runtime monitor packaged alongside the control plane
  • production operator defaults without pretending there is a managed vendor plane
  • a clean split between the API/runtime image and the standalone UI image

When you also need Terraform ownership for the AWS baseline outside the cluster, pair this chart with the Terraform AWS Baseline module. Terraform should own RDS, S3, IAM/IRSA, and Secrets Manager; Helm should own the in-cluster Deployments, CronJobs, and ExternalSecret objects.
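
As a rough illustration of that split, Helm renders an ExternalSecret that points at a Terraform-owned Secrets Manager entry. The store name and remote key below are hypothetical placeholders, not values the chart ships:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: agent-bom-control-plane
  namespace: agent-bom
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager            # hypothetical store owned by your platform team
  target:
    name: agent-bom-control-plane        # the Secret the API pods consume via envFrom
  data:
    - secretKey: AGENT_BOM_POSTGRES_URL
      remoteRef:
        key: agent-bom/prod/postgres-url # hypothetical Terraform-owned Secrets Manager entry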

The reverse path is now explicit too:

export AWS_REGION="<your-aws-region>"
agent-bom teardown \
  --cluster-name agent-bom-prod \
  --region "$AWS_REGION" \
  --namespace agent-bom \
  --release agent-bom \
  --dry-run

That helper tears down the chart first and the product-owned Terraform baseline second, while leaving platform-owned cluster infrastructure alone.

Chart removal now includes packaged Helm pre/post-delete hooks that clean up product-owned in-cluster leftovers such as generated ExternalSecret target secrets, CronJobs, Jobs, and PVCs before the Terraform baseline is destroyed.
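
A quick way to confirm the hooks did their job is to look for leftover product-owned objects after chart removal, before the Terraform baseline is destroyed (the externalsecrets resource resolves only when External Secrets Operator is installed):

kubectl -n agent-bom get externalsecrets,cronjobs,jobs,pvc
# each list should come back empty once the pre/post-delete hooks have run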

What the chart deploys

When you set controlPlane.enabled=true, the Helm chart can package:

  • API Deployment + Service
  • UI Deployment + Service
  • same-origin Ingress that routes API paths to the API service and / to the UI
  • scanner CronJob
  • optional runtime monitor DaemonSet with a dedicated service account and no automounted service-account token by default

The image split is intentional:

  • agentbom/agent-bom runs the API, scanner jobs, gateway, proxy-related entrypoints, and other non-browser workloads
  • agentbom/agent-bom-ui runs the standalone browser UI that sits behind the same ingress or a separate UI service
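
A sketch of how that split shows up in values; treat the exact image key paths as an assumption and check the chart's values.yaml for the authoritative names:

controlPlane:
  api:
    image:
      repository: agentbom/agent-bom      # API, scanner jobs, gateway, proxy entrypoints
  ui:
    image:
      repository: agentbom/agent-bom-ui   # standalone browser UI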

Enterprise deployment topology

The canonical self-hosted topology and runtime/data-flow diagrams now live in Deployment Overview.

Use that page when you need to answer:

  • what runs where in customer-controlled infrastructure
  • which components are core vs optional per rollout
  • how scans, fleet, proxy, gateway, and exports flow back into the control plane

This page stays focused on the Helm-packaged control-plane shape itself:

  • what the chart deploys
  • how same-origin ingress is wired
  • what defaults are secure by design
  • how to install and operate the packaged API + UI control plane

For the runtime operator guides behind this flow, see:

Same-origin default

The UI runtime contract from #1452 is what makes this honest.

By default the chart leaves NEXT_PUBLIC_API_URL blank in the UI pod, so the browser uses relative paths:

  • /v1/*
  • /health
  • /docs
  • /redoc
  • /openapi.json
  • /ws/*

The packaged ingress routes those paths to the API service and everything else to the UI service. That means:

  • one hostname
  • no CORS setup for the default path
  • no UI image rebuild per environment

If you want cross-origin instead, set controlPlane.ui.env so NEXT_PUBLIC_API_URL points at the API host you own.
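
A hedged sketch of that override, assuming controlPlane.ui.env takes standard name/value pairs and using a hypothetical API hostname:

controlPlane:
  ui:
    env:
      - name: NEXT_PUBLIC_API_URL
        value: "https://api.agent-bom.example.com"   # hypothetical cross-origin API host

Once the browser calls a different origin, CORS configuration on the API side becomes your responsibility.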

Secure-by-default boundaries

The chart packages the control plane, but it does not quietly weaken the runtime model.

  • API and UI pods run with automountServiceAccountToken: false
  • the optional monitor DaemonSet now also runs with automountServiceAccountToken: false and its own service-account path instead of inheriting the shared chart identity
  • the discovery service account and IRSA path stay attached to the scanner
  • the API still refuses non-loopback startup without AGENT_BOM_API_KEY, OIDC, SAML-issued session keys, or an explicit insecure override
  • multi-replica API deployments now require a PostgreSQL-backed shared rate-limit store; the chart exports the replica floor and the API fails closed instead of silently falling back to process-local limits (see the sketch after this list)
  • same-origin ingress avoids default CORS sprawl
  • network policy stays enabled, with configurable ingress restrictions
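
A minimal sketch of that multi-replica boundary, assuming controlPlane.api.env accepts plain name/value pairs; AGENT_BOM_REQUIRE_SHARED_RATE_LIMIT is the flag described under Production guidance below:

controlPlane:
  api:
    replicas: 3
    env:
      - name: AGENT_BOM_REQUIRE_SHARED_RATE_LIMIT   # refuse startup when the shared limiter backend is missing
        value: "1"

With AGENT_BOM_POSTGRES_URL already supplied through envFrom, the API uses Postgres as the shared limiter store; without it, startup fails closed instead of degrading to per-pod limits.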

Minimal values example

Create a Secret with the database URL and auth settings you actually use:

kubectl create secret generic agent-bom-control-plane \
  -n agent-bom \
  --from-literal=AGENT_BOM_POSTGRES_URL='postgresql://agent_bom:...@postgres-rw:5432/agent_bom' \
  --from-literal=AGENT_BOM_API_KEY='replace-me'

Then install with a values file like:

controlPlane:
  enabled: true
  api:
    envFrom:
      - secretRef:
          name: agent-bom-control-plane
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: agent-bom.internal.example.com

scanner:
  enabled: true
  allNamespaces: true
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::REPLACE_ME_ACCOUNT_ID:role/REPLACE_ME_AGENT_BOM_DISCOVERY_ROLE

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::REPLACE_ME_ACCOUNT_ID:role/REPLACE_ME_AGENT_BOM_DISCOVERY_ROLE

The chart supports component-specific service-account overrides for scanner, gateway, and backup jobs. If you omit the component-specific annotations, they inherit the shared serviceAccount.annotations block.
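
For example, to give the gateway job its own IRSA role while the scanner and backup jobs keep the shared annotation (the gateway.serviceAccount path is assumed to mirror the scanner's):

gateway:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::REPLACE_ME_ACCOUNT_ID:role/REPLACE_ME_AGENT_BOM_GATEWAY_ROLE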

Install:

helm install agent-bom oci://ghcr.io/msaad00/charts/agent-bom \
  --version 0.85.0 \
  -n agent-bom --create-namespace \
  -f values.agent-bom.yaml
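
A quick post-install check, using the ingress host from the values file above; switch to https once TLS is wired at the ingress:

helm status agent-bom -n agent-bom
kubectl -n agent-bom get deploy,svc,ingress,cronjob
curl -s http://agent-bom.internal.example.com/health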

Single-node SQLite pilot preset

For a fast in-cluster pilot without external Postgres, use the shipped single-node preset:

Create the auth secret and a small PVC first:

kubectl create secret generic agent-bom-control-plane \
  -n agent-bom \
  --from-literal=AGENT_BOM_API_KEY='replace-me'

kubectl apply -n agent-bom -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-bom-sqlite-pilot
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
EOF

Then install:

helm install agent-bom oci://ghcr.io/msaad00/charts/agent-bom \
  --version 0.85.0 \
  -n agent-bom --create-namespace \
  -f deploy/helm/agent-bom/examples/eks-control-plane-sqlite-pilot-values.yaml

This preset is intentionally single-node only: one API pod, one UI pod, PVC-backed SQLite state, no HPA, no PDB, and no multi-replica availability claims.
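
To confirm the pilot really is single-node:

kubectl -n agent-bom get pods                        # expect one API pod and one UI pod
kubectl -n agent-bom get pvc agent-bom-sqlite-pilot  # should be Bound to the SQLite volume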

Production defaults example

For the stronger self-hosted operator path, start from:

That example adds:

  • HPA for API and UI
  • ServiceMonitor enabled in the production preset so /metrics is scraped when Prometheus Operator is present
  • HPA scale-down stabilization
  • topology spread across zones and nodes
  • preferred pod anti-affinity for API and UI replicas
  • optional control-plane PriorityClass
  • fail-closed shared rate limiting when the Postgres-backed limiter is unavailable
  • cert-manager ingress annotations and TLS wiring
  • external-secrets integration for the control-plane secrets, including split refresh cadence for DB vs auth/HMAC material (sketched after this list)
  • packaged PrometheusRule alerts for API error rate, scanner failures, OIDC decode failures, and proxy audit backlog
  • packaged Grafana dashboard ConfigMap for clusters that already watch dashboard config
  • packaged Postgres backup CronJob that runs pg_dump and uploads to S3 through IRSA with SSE or KMS
  • dedicated service-account hooks for gateway and backup jobs, inheriting the scanner IRSA annotations unless you override them
  • restricted ingress defaults for the chart network policy
  • optional cert-manager-backed sidecar auto-injection webhook for HTTP/SSE MCP workloads

For clusters that already standardize on a service mesh and policy controller, start from:

That example adds:

  • packaged Istio PeerAuthentication for strict mTLS on agent-bom pods
  • packaged Istio AuthorizationPolicy that keeps same-namespace traffic and explicitly whitelisted ingress namespaces
  • packaged namespaced Kyverno Policy that enforces the same restricted pod contract already used by the chart

This is intentionally an opt-in hardening layer. It composes with the chart's existing NetworkPolicy, PSS-restricted pod settings, anti-affinity, and HPA defaults instead of replacing them.
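
The opt-in toggles for that layer use the values keys named under Production guidance below; keep the allowed-namespace list matched to your real ingress path:

controlPlane:
  serviceMesh:
    enabled: true
    istio:
      authorizationPolicy:
        allowedNamespaces:
          - ingress-nginx
          - istio-system
  policyController:
    enabled: true   # packages the namespaced Kyverno policy; Kyverno itself must already be installed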

What you still own

This is a real packaged control plane, but not a magic managed service.

You still own:

  • Postgres and optional ClickHouse
  • ingress controller and TLS
  • OIDC or SAML IdP configuration, or API key secret management
  • cluster-specific autoscaling thresholds and failure-domain policy
  • operator runbooks and load testing

Production guidance

  • keep controlPlane.api.replicas and controlPlane.ui.replicas at 2+
  • use Postgres, not SQLite
  • run Alembic for long-lived Postgres control planes: alembic -c deploy/supabase/postgres/alembic.ini upgrade head
  • existing init.sql databases should be stamped once with 20260416_01
  • enable the control-plane HPAs before higher-volume rollout
  • use anti-affinity and a control-plane PriorityClass when you expect node pressure
  • enable topology spread when you run multi-AZ EKS
  • keep same-origin ingress unless you have a strong reason not to
  • use envFrom / Secrets for AGENT_BOM_POSTGRES_URL, API keys, OIDC issuer, audience, optional required nonce, SAML IdP/SP metadata values, and audit HMAC settings
  • enforce API key lifetime policy with AGENT_BOM_API_KEY_DEFAULT_TTL_SECONDS and AGENT_BOM_API_KEY_MAX_TTL_SECONDS; admin key replacement uses POST /v1/auth/keys/{key_id}/rotate so rotation stays explicit and audited
  • split fast-rotating auth secrets from slower DB config with controlPlane.externalSecrets.secrets[] so AGENT_BOM_OIDC_*, AGENT_BOM_SAML_*, and AGENT_BOM_AUDIT_HMAC_KEY can refresh at 5m while AGENT_BOM_POSTGRES_URL stays at 1h
  • set AGENT_BOM_REQUIRE_SHARED_RATE_LIMIT=1 for multi-replica production control planes so the API refuses to start if the shared limiter backend is unavailable
  • tune Postgres-backed control planes explicitly with AGENT_BOM_POSTGRES_POOL_MIN_SIZE, AGENT_BOM_POSTGRES_POOL_MAX_SIZE, AGENT_BOM_POSTGRES_CONNECT_TIMEOUT_SECONDS, and AGENT_BOM_POSTGRES_STATEMENT_TIMEOUT_MS
  • enable controlPlane.observability.prometheusRule.enabled=true when the cluster already runs Prometheus Operator
  • keep monitor.enabled=true and monitor.serviceMonitor.enabled=true in the production preset unless your platform team has a different scrape contract
  • enable controlPlane.observability.grafanaDashboard.enabled=true when Grafana watches dashboard ConfigMaps
  • enable controlPlane.backup.enabled=true only after setting a real S3 bucket, prefix, and IRSA-backed upload permissions (see the sketch after this list)
  • enable controlPlane.serviceMesh.enabled=true only when the control-plane namespace is already part of your Istio data plane
  • keep controlPlane.serviceMesh.istio.authorizationPolicy.allowedNamespaces explicit; the packaged example allows ingress-nginx and istio-system, but production should match your real ingress path
  • enable controlPlane.policyController.enabled=true only when Kyverno is already installed cluster-wide; the chart packages the namespaced policy, not the controller itself
  • set controlPlane.backup.destination.bucketRegion to the actual region of your backup bucket; the production example intentionally uses REPLACE_ME_BUCKET_REGION
  • controlPlane.backup.destination.region remains as a backward-compatible fallback for older values files
  • keep controlPlane.backup.destination.encryption.enabled=true; the default is AES256, and production should set mode=aws:kms with a dedicated kmsKeyId
  • restore drills should use deploy/ops/restore-postgres-backup.sh: ./deploy/ops/restore-postgres-backup.sh s3://bucket/key.dump "$AGENT_BOM_POSTGRES_URL" REPLACE_ME_BUCKET_REGION
  • use Backup and Restore Runbook for the full operator checklist
  • expose GET /v1/auth/saml/metadata to your IdP admins and keep POST /v1/auth/saml/login behind the same ingress hostname as the API
  • enable PDBs when you are running multi-replica workloads
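
Pulling the backup-related items above into one hedged values sketch; the bucket and prefix field names are assumptions, while bucketRegion, encryption.enabled, mode, and kmsKeyId are the keys named in this list:

controlPlane:
  backup:
    enabled: true
    destination:
      bucket: REPLACE_ME_BACKUP_BUCKET   # assumed field name for the S3 bucket
      prefix: agent-bom/postgres         # assumed field name for the object prefix
      bucketRegion: REPLACE_ME_BUCKET_REGION
      encryption:
        enabled: true
        mode: "aws:kms"
        kmsKeyId: REPLACE_ME_KMS_KEY_ID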

Current boundary

The chart now packages the control plane honestly, but it still does not claim:

  • a bundled Postgres subchart
  • benchmarked throughput guarantees
  • completed auth hardening beyond the currently shipped server contract

Those are the next operator-hardening layers, not hidden assumptions.