Containment for what agents do. Provenance for what they produce.
Agentic AI has a credibility problem that doesn’t get talked about honestly enough. Most agent frameworks treat the model as the substrate — the prompt is the contract, the system message is the guardrail, the tool list is the perimeter. That works in a demo. It falls apart the moment an agent ships work to a customer, touches a production system, or operates against a target where the cost of “the model was creative today” is an external party getting hurt.
We’ve been building a substrate that takes a different position. The runtime is the substrate. The model is a citizen inside it.
Today we’re introducing Agent Ready Armor (ARA) — the second product in the Ready Armor Suite, after Battle Ready Armor (BRA). ARA inherits BRA’s Control in Depth philosophy and applies it to the place where the consequences of unchecked autonomy are most acute: AI agents operating in the real world.
The two failure modes
Agents fail in two distinct ways, and most agent infrastructure addresses only one of them.
The first failure is what the agent does. Unauthorized action — against systems, data, identity, third parties. A pen-test agent that probes an out-of-scope asset. A customer-facing assistant that exfiltrates a credential. A research agent that writes to a production database it was never supposed to see. The remediation is containment: narrow the tool surface, enforce the policy, watch the syscalls.
The second failure is what the agent claims. Output that looks correct but isn’t grounded in real evidence. A finding the agent inferred but can’t cite. A claim that traces back to nothing. A report that ships to a client carrying a vulnerability that doesn’t exist. The remediation here is different: provenance. Every claim must be backed by a chain that resolves to a specific source.
ARA defends against both. Containment and provenance are not bolted onto a model or a prompt; they are properties of the runtime the agent runs inside.
Three tiers, composed
Most policy systems for agents conflate three different things: identity, engagement context, and the procedure being executed. We keep them separate.
Armor is the operator’s posture. Persistent, hand-curated. What the agent will never do, regardless of task. Hotswappable, exportable, tradable, mechanically enforced.
Trim is the engagement overlay. Generated per engagement from intake metadata, applied live, expires when the engagement closes. Where, when, against whom, under what rules of engagement. It has the same properties as Armor: hotswappable, exportable, tradable, mechanically enforced.
Weapon is the procedure. Forged once from operator-deposited source materials, sealed, replayable. What this run is performing — with citations, not recall.
The three layers compose deterministically. The agent’s effective policy at any moment is the combination of all three, and a denial in any layer is final: no other layer can soften it.
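One way to picture this composition is a deny-overrides merge. The sketch below is illustrative only and is not ARA's actual policy engine: the `Layer` shape, the `Verdict` values, the action strings, and the default-deny rule for actions no layer mentions are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ABSTAIN = "abstain"  # the layer has no opinion on this action


@dataclass(frozen=True)
class Layer:
    """Hypothetical policy tier: explicit allow and deny sets of action names."""
    name: str
    allowed: frozenset = frozenset()
    denied: frozenset = frozenset()

    def decide(self, action: str) -> Verdict:
        if action in self.denied:
            return Verdict.DENY
        if action in self.allowed:
            return Verdict.ALLOW
        return Verdict.ABSTAIN


def effective_verdict(layers, action: str) -> Verdict:
    """Combine layers so that a denial anywhere wins.

    Assumption: an action nobody allows is denied by default.
    """
    verdicts = [layer.decide(action) for layer in layers]
    if Verdict.DENY in verdicts:
        return Verdict.DENY  # no other layer can soften this
    if Verdict.ALLOW in verdicts:
        return Verdict.ALLOW
    return Verdict.DENY


# Hypothetical tiers: the armor forbids production writes outright,
# while the trim (mistakenly or not) tries to allow them.
ARMOR = Layer("armor", denied=frozenset({"db.write:prod"}))
TRIM = Layer("trim", allowed=frozenset({"scan:10.0.0.0/24", "db.write:prod"}))
WEAPON = Layer("weapon", allowed=frozenset({"scan:10.0.0.0/24"}))
```

Because the result depends only on the set of verdicts, the order in which the layers are evaluated does not change the outcome, which is one simple way to get deterministic composition.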
The Forge: where weapons come from
This is the piece that’s most distinctive about ARA, and the piece we’re proudest of.
Most agent frameworks let the model decide its own playbook on the fly. The agent reads the docs, draws inferences, and picks its next step. That works until it doesn’t — until the agent invents a step that wasn’t actually documented, cites a procedure that doesn’t exist, or confidently extrapolates past what the source material supports.
We invert this. The agent doesn’t compose its playbook. It executes one that was authored, validated, and sealed before the engagement started. The Forge is what produces those playbooks.
A few things make the Forge unusual.
The agent forges the agent’s own weapons. When the operator opens a forge session, ARA hot-swaps in a special posture (a blacksmith armor and a blacksmith-hammer weapon), and the agent itself runs the authoring under full containment. Source materials never leave the contained workspace. The blacksmith is itself an ARA armor; the hammer is itself a sealed weapon. The substrate produces the substrate. The same primitives that govern an engagement also govern the work that produced the engagement’s procedure.
Multi-format ingestion, no pre-processing. Source materials go in as the operator already has them — text, logs, PDFs, spreadsheets, Word documents, HTML, even screenshots and scanned images. No reformatting, no flattening required.
End-to-end provenance. What emerges isn’t a flat playbook. It’s a typed, cross-citing, multi-layer weapon record where every claim traces back to a specific location in a specific source document. There is no “emergent” claim — only retrieved-and-explained.
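To make "every claim traces back to a specific location" concrete, here is a minimal sketch of a grounding check. It is not the Forge's real schema: the `Citation` and `Claim` types, the character-offset addressing, and the in-memory `sources` mapping are all hypothetical, chosen only to show the idea that a claim with no resolvable citation is rejected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Hypothetical pointer to a character span in a source document."""
    source_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Claim:
    text: str
    citations: tuple = ()


def resolve(citation: Citation, sources: dict) -> Optional[str]:
    """Return the cited span, or None if it does not resolve."""
    doc = sources.get(citation.source_id)
    if doc is None:
        return None
    if not (0 <= citation.start < citation.end <= len(doc)):
        return None
    return doc[citation.start:citation.end]


def ungrounded_claims(claims, sources: dict) -> list:
    """List claims that have no citations or any citation that fails to resolve."""
    bad = []
    for claim in claims:
        spans = [resolve(c, sources) for c in claim.citations]
        if not spans or any(span is None for span in spans):
            bad.append(claim)
    return bad
```

A record would only be accepted when `ungrounded_claims` returns an empty list, so an inferred statement with nothing behind it is caught structurally rather than by trusting the model.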
Calibration-gated seal. The Forge can’t seal a weapon whose retrieval doesn’t pass an operator-authored test. The operator writes queries with expected results; before seal, the weapon is measured against them. If accuracy falls below the threshold, the seal blocks. The operator learns that the weapon won’t recall the right things before the agent ever runs.
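The gate described above can be sketched as a small accuracy check that runs before sealing. Everything here is assumed for illustration: the `seal` function, the `SealBlocked` exception, the 0.9 default threshold, the toy index, and the "expected result appears in the retrieved list" scoring rule are not taken from ARA itself.

```python
class SealBlocked(Exception):
    """Raised when calibration accuracy is below the operator's threshold."""


def calibration_accuracy(retrieve, cases) -> float:
    """Fraction of (query, expected) cases whose expected item is retrieved."""
    if not cases:
        raise ValueError("at least one calibration case is required")
    hits = sum(1 for query, expected in cases if expected in retrieve(query))
    return hits / len(cases)


def seal(weapon_name: str, retrieve, cases, threshold: float = 0.9) -> dict:
    """Seal only if retrieval passes the operator-authored cases."""
    accuracy = calibration_accuracy(retrieve, cases)
    if accuracy < threshold:
        raise SealBlocked(
            f"{weapon_name}: accuracy {accuracy:.2f} is below threshold {threshold:.2f}"
        )
    return {"weapon": weapon_name, "accuracy": accuracy, "sealed": True}


# Toy retrieval index with hypothetical source locations.
INDEX = {
    "default creds": ["vendor-guide.pdf#p12"],
    "reset procedure": ["runbook.txt#L40"],
}


def toy_retrieve(query: str) -> list:
    return INDEX.get(query, [])
```

With this shape, a weapon that cannot answer the operator's own queries never reaches a sealed state, so the failure surfaces at forge time instead of mid-engagement.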
Designed against ten cognitive-science approaches. The Forge’s schema isn’t an arbitrary engineering choice. We mapped every published agent-memory mechanism we could find onto an ARA primitive and approved, deferred, or skipped each on its merits. Ten ship as foundational; several more have their infrastructure ready. Each produces a structural property of the sealed weapon that no model could fabricate at runtime.
Why offensive security is the flagship
Pen testing is the hardest case for agentic AI today, which makes it the right proving ground for a substrate.
The cost of an agent doing the wrong thing is immediate and external — a target gets disrupted, a contract gets violated, an out-of-scope asset gets probed. Authorization isn’t binary; real engagements have rules of engagement that are scoped, time-bounded, and conditional. Findings ship to clients — and hallucinated findings ship to clients too, unless something stops them. The cost of a fabricated vulnerability report is reputational damage that no amount of “the model is getting better” can recover.
If ARA can make agents safe and accountable in offensive security, the same substrate covers softer use cases without modification: customer-facing assistants, internal data platforms, regulated research, long-running operations under change-control windows. ARA isn’t a wrapper for one runtime, one model, or one provider — it’s a layer the runtime runs inside.
Where it sits in the Ready Armor Suite
ARA is the second product in the Ready Armor Suite. The first is Battle Ready Armor (BRA) — the operational counterpart in the human-driven security domain. Both follow the same Control in Depth philosophy: layered guarantees enforced at multiple, independent points, with operator authority preserved at every boundary. BRA does this for human operators in the field. ARA does it for the agents those operators are increasingly deploying alongside themselves.
If you’ve worked with BRA, ARA’s three-tier model and operator-first design language will feel immediately familiar. If you haven’t, ARA stands on its own — but the suite framing matters: this is a deliberate, multi-product line that takes operator-led control seriously across the next decade of human-and-agent collaboration.
Status and what’s next
ARA is self-hosted, single-operator-friendly, and licensed for internal use under selective preview. We’re sharing the teaser publicly today. A small group of design partners is already working with the substrate; we expect more over the next few months.
If you operate professional offensive security work and the combination of containment and provenance solves a real problem for you — or if you’re standing up agentic deployments in any domain where the cost of error is external — we’d like to hear from you.
Read more about ARA on GitHub.
