Packaged API + UI Control Plane¶
agent-bom now ships a Helm-packaged control plane for teams that want the API
and dashboard inside their own Kubernetes environment instead of running custom
Deployment manifests by hand.
If you still need to choose a path, start with Deployment Overview. Use this page after you already know you want the chart itself, either through the reference installer or your own Helm layering.
This is the right path when you want:
- the API and UI in your own cluster
- same-origin browser traffic through your own ingress
- Postgres, ClickHouse, SSO, and secrets kept in your own environment
- the scanner CronJob and optional runtime monitor packaged alongside the control plane
- production operator defaults without pretending there is a managed vendor plane
- a clean split between the API/runtime image and the standalone UI image
When you also need Terraform ownership for the AWS baseline outside the cluster, pair this chart with the Terraform AWS Baseline module. Terraform should own RDS, S3, IAM/IRSA, and Secrets Manager; Helm should own the in-cluster Deployments, CronJobs, and ExternalSecret objects.
The reverse path is now explicit too:
export AWS_REGION="<your-aws-region>"
agent-bom teardown \
--cluster-name agent-bom-prod \
--region "$AWS_REGION" \
--namespace agent-bom \
--release agent-bom \
--dry-run
That helper tears down the chart first and the product-owned Terraform baseline second, while leaving platform-owned cluster infrastructure alone.
Chart removal now includes packaged Helm pre/post-delete hooks that clean up product-owned in-cluster leftovers such as generated ExternalSecret target secrets, CronJobs, Jobs, and PVCs before the Terraform baseline is destroyed.
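If you only want to remove the chart while leaving the Terraform baseline in place, a plain Helm uninstall is enough to fire those packaged delete hooks. A minimal sketch, assuming the release and namespace names used elsewhere on this page:

```shell
# Remove only the Helm release; the packaged pre/post-delete hooks clean up
# product-owned leftovers (generated ExternalSecret target secrets, CronJobs,
# Jobs, and PVCs). Terraform-owned infrastructure is untouched.
helm uninstall agent-bom -n agent-bom
```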
What the chart deploys¶
When you set controlPlane.enabled=true, the Helm chart can package:
- API Deployment + Service
- UI Deployment + Service
- same-origin Ingress that routes API paths to the API service and `/` to the UI
- scanner CronJob
- optional runtime monitor DaemonSet with a dedicated service account and no automounted service-account token by default
The image split is intentional:
- `agentbom/agent-bom` runs the API, scanner jobs, gateway, proxy-related entrypoints, and other non-browser workloads
- `agentbom/agent-bom-ui` runs the standalone browser UI that sits behind the same ingress or a separate UI service
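Because the images are split, each side can be pinned and upgraded independently. A hedged values sketch; the `image.repository`/`image.tag` key shape under each component is an assumption to verify against the chart's values.yaml:

```yaml
controlPlane:
  api:
    image:
      repository: agentbom/agent-bom      # API, scanner jobs, gateway, proxy entrypoints
      tag: "0.85.0"
  ui:
    image:
      repository: agentbom/agent-bom-ui   # standalone browser UI behind the same ingress
      tag: "0.85.0"
```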
Enterprise deployment topology¶
The canonical self-hosted topology and runtime/data-flow diagrams now live in Deployment Overview.
Use that page when you need to answer:
- what runs where in customer-controlled infrastructure
- which components are core vs optional per rollout
- how scans, fleet, proxy, gateway, and exports flow back into the control plane
This page stays focused on the Helm-packaged control-plane shape itself:
- what the chart deploys
- how same-origin ingress is wired
- what defaults are secure by design
- how to install and operate the packaged API + UI control plane
For the runtime operator guides behind this flow, see:
- Visual Leak Detection
- Worker and Scheduler Concurrency
- Gateway Auto-Discovery From the Control Plane
Same-origin default¶
The UI runtime contract from #1452 is what makes this honest.
By default the chart leaves `NEXT_PUBLIC_API_URL` blank in the UI pod, so the
browser uses relative paths: `/v1/*`, `/health`, `/docs`, `/redoc`,
`/openapi.json`, and `/ws/*`.
The packaged ingress routes those paths to the API service and everything else to the UI service. That means:
- one hostname
- no CORS setup for the default path
- no UI image rebuild per environment
If you want cross-origin instead, set `controlPlane.ui.env` so
`NEXT_PUBLIC_API_URL` points at the API host you own.
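A hedged sketch of that cross-origin override: the `controlPlane.ui.env` key is from this page, but the exact list-of-name/value shape and the hostname below are assumptions to verify against the chart's values.yaml.

```yaml
controlPlane:
  ui:
    env:
      # Hypothetical API origin for illustration; with a non-blank value the
      # browser calls this host directly, so its CORS policy must allow the UI origin.
      - name: NEXT_PUBLIC_API_URL
        value: "https://api.agent-bom.example.com"
```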
Secure-by-default boundaries¶
The chart packages the control plane, but it does not quietly weaken the runtime model.
- API and UI pods run with `automountServiceAccountToken: false`
- the optional monitor DaemonSet now also runs with `automountServiceAccountToken: false` and its own service-account path instead of inheriting the shared chart identity
- the discovery service account and IRSA path stay attached to the scanner
- the API still refuses non-loopback startup without `AGENT_BOM_API_KEY`, OIDC, SAML-issued session keys, or an explicit insecure override
- multi-replica API deployments now require a PostgreSQL-backed shared rate-limit store; the chart exports the replica floor and the API fails closed instead of silently falling back to process-local limits
- same-origin ingress avoids default CORS sprawl
- network policy stays enabled, with configurable ingress restrictions
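You can spot-check the token boundary after install. A minimal sketch, assuming the `agent-bom` namespace used throughout this page:

```shell
# Print each pod name alongside its automountServiceAccountToken setting.
# API, UI, and monitor pods should report "false"; only components that
# genuinely need the Kubernetes API (e.g. the scanner path) mount a token.
kubectl get pods -n agent-bom \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.automountServiceAccountToken}{"\n"}{end}'
```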
Minimal values example¶
Create a Secret with the database URL and auth settings you actually use:
kubectl create secret generic agent-bom-control-plane \
-n agent-bom \
--from-literal=AGENT_BOM_POSTGRES_URL='postgresql://agent_bom:...@postgres-rw:5432/agent_bom' \
--from-literal=AGENT_BOM_API_KEY='replace-me'
Then install with a values file like:
controlPlane:
  enabled: true
  api:
    envFrom:
      - secretRef:
          name: agent-bom-control-plane
  ingress:
    enabled: true
    className: nginx
    hosts:
      - host: agent-bom.internal.example.com
scanner:
  enabled: true
  allNamespaces: true
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::REPLACE_ME_ACCOUNT_ID:role/REPLACE_ME_AGENT_BOM_DISCOVERY_ROLE
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::REPLACE_ME_ACCOUNT_ID:role/REPLACE_ME_AGENT_BOM_DISCOVERY_ROLE
The chart supports component-specific service-account overrides for scanner, gateway, and backup jobs. If you omit the component-specific annotations, they inherit the shared serviceAccount.annotations block.
Install:
helm install agent-bom oci://ghcr.io/msaad00/charts/agent-bom \
--version 0.85.0 \
-n agent-bom --create-namespace \
-f values.agent-bom.yaml
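After the install, a quick smoke check; the hostname comes from the values example above, and `/health` is one of the same-origin API paths the packaged ingress routes:

```shell
# Confirm the release landed and the pods are coming up.
helm status agent-bom -n agent-bom
kubectl get pods -n agent-bom

# Hit the API health path through the same-origin ingress
# (requires DNS/TLS for the ingress host to already be in place).
curl -fsS https://agent-bom.internal.example.com/health
```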
Single-node SQLite pilot preset¶
For a fast in-cluster pilot without external Postgres, use the shipped single-node preset:
Create the auth secret and a small PVC first:
kubectl create secret generic agent-bom-control-plane \
-n agent-bom \
--from-literal=AGENT_BOM_API_KEY='replace-me'
kubectl apply -n agent-bom -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: agent-bom-sqlite-pilot
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
EOF
Then install:
helm install agent-bom oci://ghcr.io/msaad00/charts/agent-bom \
--version 0.85.0 \
-n agent-bom --create-namespace \
-f deploy/helm/agent-bom/examples/eks-control-plane-sqlite-pilot-values.yaml
This preset is intentionally single-node only: one API pod, one UI pod, PVC-backed SQLite state, no HPA, no PDB, and no multi-replica availability claims.
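For a pilot without ingress, you can reach the API through a port-forward. A sketch; the service name and port below are assumptions, so check `kubectl get svc -n agent-bom` for the real ones:

```shell
# Forward the API service locally (service name/port are illustrative).
kubectl port-forward -n agent-bom svc/agent-bom-api 8080:80 &

# Probe the documented health path through the forwarded port.
curl -fsS http://localhost:8080/health
```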
Production defaults example¶
For the stronger self-hosted operator path, start from the packaged production example values file. That example adds:
- `HPA` for API and UI
- `ServiceMonitor` enabled in the production preset so `/metrics` is scraped when Prometheus Operator is present
- `HPA` scale-down stabilization
- topology spread across zones and nodes
- preferred pod anti-affinity for API and UI replicas
- optional control-plane `PriorityClass`
- fail-closed shared rate limiting when the Postgres-backed limiter is unavailable
- `cert-manager` ingress annotations and TLS wiring
- `external-secrets` integration for the control-plane secrets, including split refresh cadence for DB vs auth/HMAC material
- packaged `PrometheusRule` alerts for API error rate, scanner failures, OIDC decode failures, and proxy audit backlog
- packaged Grafana dashboard `ConfigMap` for clusters that already watch dashboard config
- packaged Postgres backup `CronJob` that runs `pg_dump` and uploads to S3 through IRSA with SSE or KMS
- dedicated service-account hooks for gateway and backup jobs, inheriting the scanner IRSA annotations unless you override them
- restricted ingress defaults for the chart network policy
- optional cert-manager-backed sidecar auto-injection webhook for HTTP/SSE MCP workloads
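The split refresh cadence mentioned above can be expressed through `controlPlane.externalSecrets.secrets[]`. A hedged sketch only: the entry schema and the `AGENT_BOM_OIDC_CLIENT_SECRET` key name are illustrative assumptions, so verify both against the chart's values.yaml and your secret store:

```yaml
controlPlane:
  externalSecrets:
    enabled: true
    secrets:
      # Fast-rotating auth/HMAC material refreshes every 5m.
      - name: agent-bom-auth
        refreshInterval: 5m
        keys:
          - AGENT_BOM_OIDC_CLIENT_SECRET   # hypothetical key name for illustration
          - AGENT_BOM_AUDIT_HMAC_KEY
      # Slower-moving DB config refreshes hourly.
      - name: agent-bom-db
        refreshInterval: 1h
        keys:
          - AGENT_BOM_POSTGRES_URL
```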
For clusters that already standardize on a service mesh and policy controller, start from the packaged service-mesh example values file. That example adds:
- packaged Istio `PeerAuthentication` for strict mTLS on `agent-bom` pods
- packaged Istio `AuthorizationPolicy` that allows same-namespace traffic and explicitly allow-listed ingress namespaces
- packaged namespaced Kyverno `Policy` that enforces the same restricted pod contract already used by the chart
This is intentionally an opt-in hardening layer. It composes with the chart's
existing NetworkPolicy, PSS-restricted pod settings, anti-affinity, and HPA
defaults instead of replacing them.
What you still own¶
This is a real packaged control plane, but not a magic managed service.
You still own:
- Postgres and optional ClickHouse
- ingress controller and TLS
- OIDC or SAML IdP configuration, or API key secret management
- cluster-specific autoscaling thresholds and failure-domain policy
- operator runbooks and load testing
Production guidance¶
- keep `controlPlane.api.replicas` and `controlPlane.ui.replicas` at `2+`
- use Postgres, not SQLite
- run Alembic for long-lived Postgres control planes: `alembic -c deploy/supabase/postgres/alembic.ini upgrade head`
- existing `init.sql` databases should be stamped once with `20260416_01`
- enable the control-plane HPAs before higher-volume rollout
- use anti-affinity and a control-plane `PriorityClass` when you expect node pressure
- enable topology spread when you run multi-AZ EKS
- keep same-origin ingress unless you have a strong reason not to
- use `envFrom` / Secrets for `AGENT_BOM_POSTGRES_URL`, API keys, OIDC issuer, audience, optional required nonce, SAML IdP/SP metadata values, and audit HMAC settings
- enforce API key lifetime policy with `AGENT_BOM_API_KEY_DEFAULT_TTL_SECONDS` and `AGENT_BOM_API_KEY_MAX_TTL_SECONDS`; admin key replacement uses `POST /v1/auth/keys/{key_id}/rotate` so rotation stays explicit and audited
- split fast-rotating auth secrets from slower DB config with `controlPlane.externalSecrets.secrets[]` so `AGENT_BOM_OIDC_*`, `AGENT_BOM_SAML_*`, and `AGENT_BOM_AUDIT_HMAC_KEY` can refresh at `5m` while `AGENT_BOM_POSTGRES_URL` stays at `1h`
- set `AGENT_BOM_REQUIRE_SHARED_RATE_LIMIT=1` for multi-replica production control planes so the API refuses to start if the shared limiter backend is unavailable
- tune Postgres-backed control planes explicitly with `AGENT_BOM_POSTGRES_POOL_MIN_SIZE`, `AGENT_BOM_POSTGRES_POOL_MAX_SIZE`, `AGENT_BOM_POSTGRES_CONNECT_TIMEOUT_SECONDS`, and `AGENT_BOM_POSTGRES_STATEMENT_TIMEOUT_MS`
- enable `controlPlane.observability.prometheusRule.enabled=true` when the cluster already runs Prometheus Operator
- keep `monitor.enabled=true` and `monitor.serviceMonitor.enabled=true` in the production preset unless your platform team has a different scrape contract
- enable `controlPlane.observability.grafanaDashboard.enabled=true` when Grafana watches dashboard `ConfigMaps`
- enable `controlPlane.backup.enabled=true` only after setting a real S3 bucket, prefix, and IRSA-backed upload permissions
- enable `controlPlane.serviceMesh.enabled=true` only when the control-plane namespace is already part of your Istio data plane
- keep `controlPlane.serviceMesh.istio.authorizationPolicy.allowedNamespaces` explicit; the packaged example allows `ingress-nginx` and `istio-system`, but production should match your real ingress path
- enable `controlPlane.policyController.enabled=true` only when Kyverno is already installed cluster-wide; the chart packages the namespaced policy, not the controller itself
- set `controlPlane.backup.destination.bucketRegion` to the actual region of your backup bucket; the production example intentionally uses `REPLACE_ME_BUCKET_REGION`
- `controlPlane.backup.destination.region` remains as a backward-compatible fallback for older values files
- keep `controlPlane.backup.destination.encryption.enabled=true`; the default is `AES256`, and production should set `mode=aws:kms` with a dedicated `kmsKeyId`
- restore drills should use `deploy/ops/restore-postgres-backup.sh`: `./deploy/ops/restore-postgres-backup.sh s3://bucket/key.dump "$AGENT_BOM_POSTGRES_URL" REPLACE_ME_BUCKET_REGION`
- use the Backup and Restore Runbook for the full operator checklist
- expose `GET /v1/auth/saml/metadata` to your IdP admins and keep `POST /v1/auth/saml/login` behind the same ingress hostname as the API
- enable PDBs when you are running multi-replica workloads
Current boundary¶
The chart now packages the control plane honestly, but it still does not claim:
- a bundled Postgres subchart
- benchmarked throughput guarantees
- completed auth hardening beyond the currently shipped server contract
Those are the next operator-hardening layers, not hidden assumptions.