Unified Platform Control Plane¶
This is the product contract for agent-bom as a self-hosted platform.
It exists to keep the code, UI, CLI, API, docs, storage, and deployment model aligned around one system instead of a set of loosely related tools.
Product identity¶
agent-bom is the self-hosted control plane for:
- AI supply chain security
- AI and cloud infrastructure security
- MCP security
- agent and endpoint security
- runtime policy and audit
It should feel like one coherent operator platform across:
- scan and discovery
- fleet and endpoint inventory
- MCP inventory and granted surface area
- proxy runtime inspection and enforcement
- gateway policy and shared MCP traffic
- graph, findings, remediation, audit, and evidence
Packaged product surfaces¶
The platform is shipped through multiple operator-facing surfaces that must stay semantically aligned:
| Surface | Role in the product |
|---|---|
| CLI | local scans, CI-friendly execution, endpoint discovery, fleet push, remediation, exports |
| Docker image | isolated execution, self-hosted runtime image, API/jobs/gateway/proxy entrypoints |
| Node.js UI | browser control-plane workflow surface, same-origin operator experience |
| API | tenant-scoped control-plane contract for findings, fleet, graph, audit, policy, and auth |
| CI/CD offering | GitHub Action and pipeline-friendly gating for repos, images, IaC, and MCP config scans |
| MCP server mode | exposes agent-bom itself as tools to MCP-capable clients |
| Skills | curated MCP/agent context and productized capability surfaces that should still map back to the same model |
| Proxy | workload-local or endpoint-local runtime enforcement and audit |
| Gateway | shared MCP traffic, shared policy evaluation, upstream discovery, runtime audit |
The rule is simple:
- every surface can specialize
- no surface gets its own conflicting product model
The CLI, Docker path, CI/CD path, UI, MCP server mode, and runtime surfaces should all describe the same entities, provenance, findings, and retention semantics.
Platform principles¶
The platform should always be:
- self-hosted first: runs in customer-controlled infrastructure today
- operator-clear: every score, finding, policy, and graph edge is inspectable
- secure: tenant-scoped, fail-closed where needed, auditable, policy-driven
- scalable: transactional control plane stays lean; event-scale history can offload
- performant: runtime enforcement stays close to workloads; heavy history belongs in analytics tiers
- consistent: CLI, API, UI, docs, and diagrams describe the same system
- interoperable: storage and export paths remain flexible without changing the product model
Core surfaces¶
These are core product surfaces, not side features:
| Surface | What it does |
|---|---|
| API + UI control plane | auth, RBAC, tenant scope, graph, findings, remediation, audit, evidence, operator workflows |
| Scan / discovery | repos, images, IaC, skills, MCP configs, packages, and cloud surfaces |
| Fleet | endpoint and collector inventory, sync, last-seen state, trust/lifecycle metadata |
| Proxy | endpoint or workload-local runtime inspection, policy enforcement, runtime audit |
| Gateway | shared remote MCP traffic plane, policy pull/evaluation, runtime audit, upstream discovery |
That means the product is not just:
- a Node app
- a scanner
- an MCP server
- a proxy
It is the coordinated control plane behind all of them.
The deployment can be selective by workload, but these surfaces are all part of the product.
Inventory and runtime are both first-class¶
There are two valid starting points:
- inventory and discovery first
- runtime policy and enforcement where needed
That means:
- scans and fleet sync should already provide useful MCP inventory without proxy rollout
- proxy and gateway should deepen the same model into live runtime visibility and enforcement
Runtime is not secondary. It is a later operational layer on top of the same canonical control plane.
Canonical MCP provenance model¶
Every MCP object should be explainable through provenance, not just existence.
For each MCP server, the system should be able to say:
- discovered in repo config
- present on 14 endpoints via fleet
- registered as a gateway upstream
- runtime-observed on 3 workloads in the last 24 hours
The canonical provenance fields should cover:
| Field | Meaning |
|---|---|
| tenant_id | tenant ownership |
| source_id | source or sync origin |
| observed_via | repo_scan, fleet_sync, gateway_discovery, proxy_runtime, import |
| observed_scope | repo, endpoint, cluster, gateway, runtime |
| deployment_mode | local, fleet, cluster, hybrid |
| first_seen | first observation |
| last_seen | last observation |
| last_synced | last successful sync into control plane |
| runtime_observed | whether runtime evidence exists |
| gateway_registered | whether gateway upstream registration exists |
| fleet_present | whether endpoint inventory includes it |
| repo_present | whether repo/config discovery includes it |
| transport | stdio, sse, http, streamable-http, or unknown |
| auth_mode | best-effort posture such as env-credentials, local-stdio, network-no-auth-observed |
| command | configured local command when applicable |
| url | configured remote URL when applicable |
| config_path | where the config was discovered |
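As a non-authoritative illustration, the canonical fields above can be read as one typed record. The interface name and literal unions below simply mirror the table; they are a sketch, not a shipped schema.

```typescript
// Illustrative sketch of the canonical MCP provenance record.
// Field names mirror the table above; the type itself is not a shipped schema.
type ObservedVia = "repo_scan" | "fleet_sync" | "gateway_discovery" | "proxy_runtime" | "import";
type ObservedScope = "repo" | "endpoint" | "cluster" | "gateway" | "runtime";
type DeploymentMode = "local" | "fleet" | "cluster" | "hybrid";
type Transport = "stdio" | "sse" | "http" | "streamable-http" | "unknown";

interface McpServerProvenance {
  tenant_id: string;            // tenant ownership
  source_id: string;            // source or sync origin
  observed_via: ObservedVia[];  // every channel that has observed this server
  observed_scope: ObservedScope[];
  deployment_mode: DeploymentMode;
  first_seen: string;           // first observation (ISO timestamp)
  last_seen: string;            // last observation (ISO timestamp)
  last_synced: string;          // last successful sync into the control plane
  runtime_observed: boolean;    // runtime evidence exists
  gateway_registered: boolean;  // gateway upstream registration exists
  fleet_present: boolean;       // endpoint inventory includes it
  repo_present: boolean;        // repo/config discovery includes it
  transport: Transport;
  auth_mode?: string;           // best-effort posture, e.g. "env-credentials"
  command?: string;             // configured local command when applicable
  url?: string;                 // configured remote URL when applicable
  config_path?: string;         // where the config was discovered
}
```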
Correlation rule¶
The product should correlate:
- repo scan
- fleet inventory
- gateway upstream discovery
- proxy runtime evidence
into one MCP object whenever they refer to the same server surface.
That correlation should power:
- Agents
- Fleet
- Registry
- Graph
- Findings and blast radius
The operator should not have to mentally join four disconnected records.
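A minimal sketch of the correlation idea follows, assuming a server surface can be keyed on its configured command or URL within a tenant; the key function and grouping logic are illustrative, not the shipped algorithm.

```typescript
// Illustrative correlation: same tenant + same normalized command or URL
// collapses repo, fleet, gateway, and runtime records into one MCP object.
interface McpObservation {
  tenant_id: string;
  observed_via: "repo_scan" | "fleet_sync" | "gateway_discovery" | "proxy_runtime" | "import";
  command?: string;
  url?: string;
  last_seen: string;
}

function correlationKey(o: McpObservation): string {
  // Assumption: a normalized command or URL identifies the same server surface.
  const identity = (o.url ?? o.command ?? "unknown").trim().toLowerCase();
  return `${o.tenant_id}::${identity}`;
}

function correlate(observations: McpObservation[]): Map<string, McpObservation[]> {
  const grouped = new Map<string, McpObservation[]>();
  for (const o of observations) {
    const key = correlationKey(o);
    const bucket = grouped.get(key) ?? [];
    bucket.push(o);
    grouped.set(key, bucket);
  }
  return grouped; // each bucket backs one MCP object carrying all of its provenance
}
```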
EKS reference shape¶
For a customer deploying in AWS / EKS, the normal shape is:
In EKS¶
- agent-bom-api
- agent-bom-ui
- scan and discovery workers / CronJobs
- agent-bom-gateway
- selected proxy sidecars on chosen workloads
Outside or adjacent¶
- Postgres / RDS
- optional ClickHouse
- optional S3 archive / evidence / backup
- ingress, IdP, secrets, IRSA
Data flow¶
- scan jobs discover repos, images, IaC, skills, MCP configs, and cloud surfaces
- fleet sync pushes endpoint inventory into the control plane
- gateway contributes shared remote MCP discovery and upstream registration
- proxy contributes runtime-observed MCP evidence where deployed
- the UI presents one correlated inventory and policy view with provenance labels
Storage tiers¶
The storage model should stay explicit:
| Tier | Role |
|---|---|
| Postgres | control-plane truth for transactional state |
| ClickHouse | event-scale analytics and runtime history |
| S3 | evidence archive, backups, export bundles |
| Snowflake | warehouse-native governance and security-lake workflows where shipped parity exists |
| Databricks | future lakehouse target when code-backed support is shipped |
The rule is:
- the control-plane truth should not depend on every analytics backend
- analytics and lake backends should be interoperable sinks or optional deeper stores
- the product semantics should stay the same even when customers choose different storage tiers
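As an illustration only, a deployment could express tier selection as explicit configuration rather than implicit behavior. The shape and key names below are hypothetical, not a documented config file.

```typescript
// Hypothetical storage-tier configuration: the control plane always has a
// transactional truth store, while analytics and archive tiers are optional sinks.
interface StorageConfig {
  controlPlane: { kind: "postgres"; dsn: string };            // required truth store
  analytics?: { kind: "clickhouse"; dsn: string };            // optional event-scale history
  archive?: { kind: "s3"; bucket: string; prefix?: string };  // optional evidence/backup archive
  warehouse?: { kind: "snowflake" | "databricks"; target: string }; // optional warehouse/lake sink
}

// Example: Postgres-only control plane with an S3 evidence archive.
const example: StorageConfig = {
  controlPlane: { kind: "postgres", dsn: "postgres://agent-bom@rds.internal/agentbom" },
  archive: { kind: "s3", bucket: "agent-bom-evidence" },
};
```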
Retention model by data class¶
Retention should be explicit by data class, not "whatever the backend keeps."
| Data class | Typical intent |
|---|---|
| Control-plane state | findings, graph, fleet, policy, auth, exceptions, schedules; persisted until pruned by operator policy |
| Runtime evidence | proxy/gateway audit, traces, OCSF events; medium retention in control plane, longer retention in analytics or SIEM if enabled |
| Compliance evidence | signed exports, audit bundles, backup artifacts; durable retention and archive path |
| Ephemeral runtime state | caches, replay windows, job buffers, local spillover; bounded and short-lived |
This should be visible in:
- docs
- config
- operator UI
- archive/offload guidance
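One way to make this explicit is retention settings keyed by data class, sketched below; the durations and key names are examples under assumed defaults, not shipped settings.

```typescript
// Illustrative retention rules keyed by data class; durations are examples only.
type DataClass =
  | "control_plane_state"
  | "runtime_evidence"
  | "compliance_evidence"
  | "ephemeral_runtime_state";

interface RetentionRule {
  controlPlaneDays: number | "until_pruned"; // how long the control plane keeps it
  offloadTo?: "clickhouse" | "s3" | "siem";  // optional longer-lived tier
}

const retention: Record<DataClass, RetentionRule> = {
  control_plane_state:     { controlPlaneDays: "until_pruned" },
  runtime_evidence:        { controlPlaneDays: 30, offloadTo: "clickhouse" },
  compliance_evidence:     { controlPlaneDays: 365, offloadTo: "s3" },
  ephemeral_runtime_state: { controlPlaneDays: 1 },
};
```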
Implementation checklist¶
Use this checklist to keep the platform aligned:
- [ ] canonical provenance fields exist in MCP inventory models
- [ ] API returns provenance and correlation metadata, not just summary counts
- [ ] CLI surfaces the same deployment and provenance model
- [ ] UI shows provenance pills and source rollups in Agents, Fleet, Registry, and Graph
- [ ] repo + fleet + gateway + runtime are correlated into one MCP object
- [ ] docs define retention by data class
- [ ] storage guidance is explicit for Postgres, ClickHouse, S3, Snowflake, and future lake targets
- [ ] README and deployment docs describe inventory and runtime as one coherent platform
- [ ] EKS reference architecture reflects the actual control-plane, scan, fleet, gateway, and proxy deployment shape
Managed future¶
If agent-bom becomes hosted or managed later, the hosted product should keep
the same core guarantees:
- same canonical data model
- same policy and audit semantics
- same exportability and retention clarity
- same operator understanding of what is discovered, observed, enforced, and stored
Managed convenience should be an operational layer on top of the same platform, not a different product.