Data Ingestion and Security¶
This page explains how data gets into agent-bom, how it is normalized and used, and which security boundaries apply to each path.
The product should support four honest intake modes:
- **Direct scan**
- **Read-only integration**
- **Pushed ingest**
- **Imported artifact**
Those are not the same thing, and the security model is different for each.
Platform record invariants¶
Regardless of intake mode, the persisted platform record should keep a few stable invariants:
- `tenant_id` is normalized server-side and never trusted from ad hoc UI state
- timestamps are stored in UTC ISO-8601 form for correlation, graph windows, and audit
- collectors may differ, but the canonical model stays the same across fleet, gateway, runtime, and findings
That contract matters because graph slices, fleet rollups, runtime evidence, and audit trails all depend on consistent tenant and time semantics.
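The two invariants above can be made concrete with a small sketch. This is illustrative only: `normalize_tenant_id` and `to_utc_iso8601` are hypothetical helper names, not agent-bom's actual API.

```python
from datetime import datetime, timezone

def normalize_tenant_id(raw: str) -> str:
    """Server-side normalization: trim and lowercase; never trust UI casing."""
    tenant = raw.strip().lower()
    if not tenant:
        raise ValueError("tenant_id must be non-empty")
    return tenant

def to_utc_iso8601(ts: datetime) -> str:
    """Store every timestamp as UTC ISO-8601 for correlation and audit."""
    if ts.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return ts.astimezone(timezone.utc).isoformat()

record = {
    "tenant_id": normalize_tenant_id("  Acme-Corp "),
    "observed_at": to_utc_iso8601(datetime(2025, 1, 2, 9, 30, tzinfo=timezone.utc)),
}
```

Rejecting naive (timezone-unaware) datetimes at the boundary is what keeps graph windows and audit correlation consistent across collectors.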
The four intake modes¶
| Mode | What agent-bom does | Typical examples | Security posture |
|---|---|---|---|
| Direct scan | Reads local or directly reachable targets itself | MCP config discovery, project scans, image scans, IaC scans, some cloud discovery | Read-only scanner behavior |
| Read-only integration | Connects to an existing system that already contains the data | Snowflake governance, connector-backed discovery, cloud account inventory | Read-only connected source |
| Pushed ingest | Accepts evidence pushed by the customer or collector | OTLP traces, runtime events, fleet sync, security-lake feeds | Inbound ingest with API/auth boundaries |
| Imported artifact | Parses customer-exported files without owning collection | SBOMs, inventories, JSON findings, offline exports | File import only |
Hosted control plane rule¶
In a hosted deployment, the UI is not a collector. The web app should only act as the operator surface for the control plane.
The secure and scalable split is:
- **UI** = trigger jobs, configure sources, show health, review findings
- **API / control plane** = auth, RBAC, tenant scope, orchestration, graph, persistence, audit, and policy
- **workers / connectors** = collect from cloud APIs, connected systems, repos, images, and IaC targets
- **proxy / gateway** = capture runtime MCP evidence and apply policy at the execution boundary
That is why every supported intake path should be reachable through one of these backend paths:
- API-triggered scan jobs
- read-only connectors
- authenticated pushed ingest
- imported artifacts
- proxy or gateway audit flows
This keeps credentials, execution, rate limits, retries, and tenant isolation in the backend instead of moving collection logic into the Node.js UI.
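The split above can be sketched as a thin control-plane call: the UI records intent, and a backend worker does the actual collection. `IntakeMode` and `enqueue_job` are illustrative names, not the real interface.

```python
from enum import Enum

class IntakeMode(Enum):
    DIRECT_SCAN = "direct_scan"
    READ_ONLY_INTEGRATION = "read_only_integration"
    PUSHED_INGEST = "pushed_ingest"
    IMPORTED_ARTIFACT = "imported_artifact"

# Each intake mode maps to exactly one supported backend path.
BACKEND_PATH = {
    IntakeMode.DIRECT_SCAN: "api-triggered scan job",
    IntakeMode.READ_ONLY_INTEGRATION: "read-only connector",
    IntakeMode.PUSHED_INGEST: "authenticated pushed ingest",
    IntakeMode.IMPORTED_ARTIFACT: "artifact import",
}

def enqueue_job(mode: IntakeMode, tenant_id: str) -> dict:
    """UI-facing control-plane call: queue the job, never collect in-process."""
    return {
        "tenant_id": tenant_id,
        "backend_path": BACKEND_PATH[mode],
        "status": "queued",
    }
```

The point of the sketch is the shape, not the names: the Node.js UI never holds collector credentials, because `enqueue_job` only describes work for the backend to pick up.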
End-to-end flow¶
```mermaid
flowchart LR
    subgraph Inputs["Input modes"]
        A["Direct scan"]
        B["Read-only integration"]
        C["Pushed ingest"]
        D["Imported artifact"]
    end
    subgraph Normalize["Canonical normalization"]
        N1["Raw source evidence"]
        N2["Canonical entities / findings / events"]
        N3["Graph nodes / edges / paths"]
    end
    subgraph Product["Product surfaces"]
        P1["Findings"]
        P2["Remediation"]
        P3["Security Graph"]
        P4["Agent Mesh"]
        P5["Compliance / Governance"]
        P6["Runtime / Gateway / Fleet"]
    end
    A --> N1
    B --> N1
    C --> N1
    D --> N1
    N1 --> N2
    N2 --> N3
    N2 --> P1
    N2 --> P2
    N3 --> P3
    N3 --> P4
    N2 --> P5
    N2 --> P6
```
Direct scan¶
Direct scan is the most local-first mode. agent-bom reads the target itself and derives inventory, findings, and graph context.
Typical sources:
- MCP client config discovery
- project and lockfile scans
- container image scans
- Kubernetes manifests
- Terraform, Helm, CloudFormation, Dockerfile IaC
- selected agentless cloud inventory reads
```mermaid
flowchart LR
    A["Local files / project / image / cloud API"] --> B["Scanner / discovery modules"]
    B --> C["Packages / servers / agents / cloud resources"]
    C --> D["Vulnerability + misconfiguration matching"]
    D --> E["Canonical findings + graph"]
```
Security boundaries:
- scanner mode is read-only
- no source code or credential values are sent to third parties
- outbound lookups are limited to explicit enrichment sources unless offline mode is used
- cloud discovery remains read-only and should never mutate the provider
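The third boundary, outbound enrichment, can be expressed as a simple gate. This is a minimal sketch, assuming a hypothetical `enrichment_allowed` check and an explicit allowlist; the enrichment source names are examples.

```python
def enrichment_allowed(source: str, offline: bool, allowlist: set) -> bool:
    """Outbound lookups happen only for explicitly allowed enrichment
    sources, and never at all when offline mode is enabled."""
    if offline:
        return False
    return source in allowlist

# Example: only the configured vulnerability databases are reachable.
allow = {"osv.dev", "nvd.nist.gov"}
enrichment_allowed("osv.dev", offline=False, allowlist=allow)       # allowed
enrichment_allowed("osv.dev", offline=True, allowlist=allow)        # offline wins
enrichment_allowed("example.com", offline=False, allowlist=allow)   # not allowlisted
```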
Relevant docs:
Read-only integration¶
Read-only integration is for systems that already hold the relevant evidence. agent-bom connects to them instead of rebuilding their collection path.
Typical sources:
- Snowflake governance data
- warehouse-backed security or activity data
- connector-backed enterprise sources
- cloud account inventory where the customer grants read-only access
```mermaid
flowchart LR
    A["Customer source of truth"] --> B["Read-only connector / query path"]
    B --> C["Canonical envelopes"]
    C --> D["Findings / governance / graph context"]
```
Security boundaries:
- customer-owned source systems remain authoritative
- `agent-bom` should prefer read-only access
- no write-back to the upstream platform unless the feature explicitly says otherwise
- warehouse or lake integrations should be treated as connected sources, not hidden scan jobs
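One way to back up the read-only preference at the client layer is a statement guard. This is a hedged sketch: real enforcement belongs in the granted warehouse role and permissions, and the prefix list here is illustrative (for example, `WITH` can open a writable statement in some dialects), so treat this as defense in depth, not the boundary itself.

```python
# Statements a read-only connector is willing to send upstream.
READ_ONLY_PREFIXES = ("select", "show", "describe")

def assert_read_only(sql: str) -> str:
    """Reject obviously write-like statements before they leave agent-bom."""
    stmt = sql.strip().lower()
    if not stmt.startswith(READ_ONLY_PREFIXES):
        raise PermissionError(f"write-like statement blocked: {sql[:40]!r}")
    return sql
```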
Relevant docs:
Pushed ingest¶
Pushed ingest is for evidence that is already collected elsewhere and sent into agent-bom.
Typical sources:
- OTLP traces to `POST /v1/traces`
- runtime events
- fleet sync submissions
- analytics or security-lake event batches
```mermaid
flowchart LR
    A["Collector / agent / platform"] --> B["API ingest route"]
    B --> C["Validation / auth / tenant scope"]
    C --> D["Canonical event / trace records"]
    D --> E["Correlation against known assets and findings"]
```
Security boundaries:
- this is not local scanner mode; it is an API ingestion surface
- API auth, RBAC, rate limits, request size limits, and audit logging matter here
- the ingest path should preserve source evidence and timestamps
- analytics backends may store summaries or events, but the canonical model still governs the product view
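The boundary checks above can be sketched in one ingest handler. The function name, size limit, and payload shape are assumptions for illustration, not the real route; the point is the ordering: size, then tenant scope, then parsing, with the source timestamp preserved rather than overwritten.

```python
import json

MAX_BODY_BYTES = 1_000_000  # illustrative request size limit

def accept_event(body: bytes, token_tenant: str, claimed_tenant: str) -> dict:
    """Validate a pushed event at the API boundary before normalization."""
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request too large")
    if token_tenant != claimed_tenant:
        # Tenant scope comes from the authenticated token, never the payload.
        raise PermissionError("tenant scope mismatch")
    event = json.loads(body)
    return {
        "tenant_id": token_tenant,
        # Preserve the collector's timestamp instead of stamping ingest time.
        "source_timestamp": event["timestamp"],
        "raw_evidence": event,
    }
```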
Relevant docs:
Imported artifact¶
Imported artifact is the simplest enterprise-friendly path when the customer already exports data.
Typical sources:
- CycloneDX or SPDX SBOMs
- inventory JSON
- external scanner results
- offline evidence exports
```mermaid
flowchart LR
    A["Exported file"] --> B["Artifact parser"]
    B --> C["Canonical packages / findings / metadata"]
    C --> D["Findings, remediation, compliance, graph context"]
```
Security boundaries:
- `agent-bom` parses the artifact only; it does not manage the upstream source system
- good fit for air-gapped or approval-heavy environments
- file parsing still needs validation and bounded trust assumptions
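A minimal sketch of that bounded trust, using a CycloneDX JSON SBOM as the example format. Only the `bomFormat` and `components` fields are assumed here; a real importer would validate against the full schema and cap document size.

```python
import json

def parse_cyclonedx(text: str) -> list:
    """Extract only the fields the canonical model needs; ignore the rest."""
    doc = json.loads(text)
    if doc.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX document")
    packages = []
    for comp in doc.get("components", []):
        packages.append({"name": comp.get("name"), "version": comp.get("version")})
    return packages

sbom = '{"bomFormat": "CycloneDX", "components": [{"name": "left-pad", "version": "1.3.0"}]}'
parse_cyclonedx(sbom)  # one canonical package record
```

Selecting a small set of known fields, rather than trusting the whole document, is what keeps the import path safe for approval-heavy and air-gapped environments.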
Relevant docs:
Runtime, fleet, gateway, and policy surfaces¶
Discovery and ingest are only the front door. Once data is in the canonical model, the product exposes different operating surfaces:
| Surface | Primary purpose |
|---|---|
| Findings | Evidence-first exploration |
| Remediation | Fix-first prioritization |
| Security Graph | Path and blast-radius analysis |
| Agent Mesh | Agent-centered shared-infrastructure topology |
| Fleet | Multi-agent operational inventory |
| Gateway / Proxy | Runtime tool-call enforcement and audit |
| Compliance / Governance | Framework and policy views |
This separation is intentional. The intake path should not be confused with the product view layered on top of it.
How data is secured¶
The main security rules are:
- preserve raw source evidence only as needed for audit and debugging
- normalize into canonical `agent-bom` entities, findings, and events
- do not store credential values
- keep direct discovery read-only
- treat pushed ingest as an authenticated API boundary
- treat connected sources as explicit read-only integrations
- only project to OCSF when interoperability requires it
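The "do not store credential values" rule in practice means redacting secret-looking fields before anything is persisted. A minimal sketch, assuming a hypothetical key-based redactor; the key list is illustrative, and a real implementation would also match patterns in values.

```python
SECRET_KEYS = {"password", "token", "api_key", "secret", "authorization"}

def redact(evidence: dict) -> dict:
    """Replace credential values with a placeholder, recursing into
    nested objects, before canonical normalization and persistence."""
    out = {}
    for key, value in evidence.items():
        if key.lower() in SECRET_KEYS:
            out[key] = "[REDACTED]"
        elif isinstance(value, dict):
            out[key] = redact(value)
        else:
            out[key] = value
    return out
```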
```mermaid
flowchart TD
    A["Raw source evidence"] --> B["Validation and redaction"]
    B --> C["Canonical normalization"]
    C --> D["Persistence / graph / analytics"]
    D --> E["Views and exports"]
    E --> F["Raw product views"]
    E --> G["Optional OCSF projection"]
```
What leaves the machine and what does not¶
The short version:
- local scanning and config discovery stay local-first
- offline mode can avoid external enrichment calls entirely
- direct cloud discovery uses explicit read-only provider access
- pushed ingest sends data into `agent-bom`, not from `agent-bom` to arbitrary third parties
- optional exports and SIEM integrations are explicit, not hidden defaults
For the detailed trust model:
Current product stance¶
The product should be described honestly:
- `New Scan` is for direct scan jobs
- `Data Sources` is the map of direct scans, connected sources, pushed ingest, and imported artifacts
- `Findings`, `Remediation`, `Security Graph`, `Mesh`, `Fleet`, `Gateway`, and `Compliance` are operating surfaces after the data is in the system
That keeps the architecture:
- accurate
- interoperable
- scalable
- not marketing ahead of the actual data model