
The Agent Did What? Why Agentic AI Is Outpacing Your Model Risk Framework

written by CoComply
published on 04/28/2026

When SR 11-7 was issued in 2011, the regulatory concern was precise: banks were deploying quantitative models to make consequential decisions (credit scores, stress losses, fraud flags) without adequate oversight.

The guidance worked. It gave the industry a common language and a governance baseline: model inventories, validation lifecycles, tiered risk ratings, independent challenge.

Agentic AI breaks that baseline. Quietly. And fast.

What's Actually Different This Time

A traditional model is a function. Data in, output out. You can scope it, validate it, own it.

An agentic AI system is an orchestrator. Given an objective, it:

  • Plans a sequence of steps
  • Invokes tools, sub-models, and external data sources
  • Adapts its reasoning path based on intermediate outputs
  • Produces a result that no single component fully explains

That adaptive reasoning chain is where governance breaks down, because SR 11-7 was built to evaluate what a model is, not what an agent decides to do next.
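To make that concrete, here is a minimal sketch of the orchestration pattern in Python. The planner, the tool registry, and the action format are illustrative assumptions, not any specific framework's API:

```python
def run_agent(objective: str, tools: dict, planner, max_steps: int = 10) -> dict:
    """Pursue an objective by repeatedly choosing and invoking tools.

    A sketch of the agentic pattern, not a production implementation.
    `planner` and the entries in `tools` are hypothetical.
    """
    history = []  # the reasoning chain: every step, input, and output
    for _ in range(max_steps):
        # The planner picks the next action from the objective AND all
        # intermediate results, so the path can differ on every run.
        action = planner(objective, history)
        if action["type"] == "finish":
            return {"result": action["result"], "trace": history}
        tool = tools[action["tool"]]          # e.g. a credit scoring sub-model
        output = tool(**action["arguments"])  # or an external data pull
        history.append({"action": action, "output": output})
    raise RuntimeError("Agent exceeded step budget without finishing")
```

Every pass through that loop can take a different path, and the only complete account of what happened is the trace itself, if anyone thought to keep it. Validating any single tool in the registry tells you nothing about the loop.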

A Scenario Worth Sitting With

Imagine a bank deploying an agentic system to support commercial loan underwriting. The agent pulls financials, references market data, invokes a credit scoring sub-model, and drafts a risk memo for analyst review.

Now ask:

  • Is the scoring sub-model in your inventory, with a record noting it is now called by an autonomous agent?
  • Who validated the orchestration layer that sequences all of this?
  • Does your model risk team even know the agent exists?

At most institutions today, the honest answer is: not clearly, no one formally, and possibly not.

How Banks Are Responding, and Where It Falls Short

Three patterns show up repeatedly. None fully holds.

1. Validate sub-components individually

Useful, but the orchestration layer, where novel risk actually emerges, sits completely outside governance.

2. Classify the whole agent as a single model

Tempting, but traditional validation methods do not map onto adaptive, multi-step reasoning. How do you benchmark a system whose decision path changes with every run?

3. Defer to an "emerging technology" committee

The most common response. Also the most dangerous, because pilots scale to production while the governance determination stays pending.

The Questions Examiners Will Ask

These are not hypothetical. They are already showing up in MRM reviews and internal audit findings:

  • Which agentic systems are deployed, and are they in the model inventory?
  • What is each agent's defined decision scope, and what are its hard limits?
  • How was the orchestration logic validated, and by whom?
  • What data assets does the agent consume, and are controls on those assets documented and tested?
  • How do you detect behavioral drift between formal validation cycles?
  • Can you reconstruct the full reasoning chain behind a consequential decision?

If the answers live in engineering wikis rather than governance records, that is a current supervisory exposure, not a future one.
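What would a governance-grade answer to the last two questions look like? A minimal sketch, assuming an append-only JSONL store and hypothetical field names; a real system would add access controls, integrity checks, and retention policy:

```python
import json
import time
import uuid


class EvidenceLog:
    """Append-only, structured trail of agent actions (a sketch, not a product).

    Every record carries the run ID, so the full reasoning chain behind
    one consequential decision can be reconstructed and queried later.
    """

    def __init__(self, path: str):
        self.path = path

    def record(self, run_id: str, step: int, action: dict, output: dict) -> None:
        entry = {
            "id": str(uuid.uuid4()),
            "run_id": run_id,
            "step": step,
            "timestamp": time.time(),
            "action": action,   # which tool was invoked, with what arguments
            "output": output,   # what came back
        }
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")  # one JSON record per line

    def reconstruct(self, run_id: str) -> list[dict]:
        """Answer the examiner's question: the full chain behind one decision."""
        with open(self.path) as f:
            steps = [json.loads(line) for line in f]
        return sorted((s for s in steps if s["run_id"] == run_id),
                      key=lambda s: s["step"])
```

The same records support drift detection between validation cycles: compare the distribution of tools invoked, escalations triggered, or outputs produced across two periods and flag the deltas.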

From Model Validation to Agent Certification

This is the reframe that matters. Model validation and agent certification are not the same thing:

Model Validation                  | Agent Certification
----------------------------------|---------------------------------------
Point-in-time review              | Continuous, evidence-grounded
Periodic report and risk rating   | Living, queryable certification state
Evaluates what the model was      | Tracks what the agent does

Agent certification requires:

  • Regulatory lineage: obligation → policy → control → data asset → agent. Maintained as a current relationship, not a diagram in a policy deck.
  • Data traceability: every asset the agent consumes must have documented controls, clear ownership, and a fitness-for-purpose standard.
  • Behavioral boundaries: what the agent can do autonomously, what requires human review, and what triggers a hard override. These are governance records, not code comments.
  • Continuous evidence capture: not audit samples. A persistent, structured trail of agent actions, inputs, and outputs that can be queried at any time.
  • Visible certification states: Backlog → In Review → Certified → Suspended (sketched below). Executives, audit, and regulators should see this without scheduling a meeting.
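Here is a minimal sketch of a certification state as an enforced record rather than a status cell in a spreadsheet. The four states come from the list above; the permitted transitions and the evidence requirement are illustrative assumptions:

```python
from datetime import datetime, timezone

# The four states come from the list above; the transition rules and the
# evidence requirement are illustrative assumptions.
ALLOWED = {
    "Backlog":   {"In Review"},
    "In Review": {"Certified", "Backlog"},
    "Certified": {"Suspended", "In Review"},  # e.g. drift triggers re-review
    "Suspended": {"In Review"},
}


class CertificationState:
    """A queryable certification record for one agent."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.state = "Backlog"
        self.history = []  # every transition, with its owner and evidence

    def transition(self, to: str, owner: str, evidence_ref: str) -> None:
        if to not in ALLOWED[self.state]:
            raise ValueError(f"{self.state} -> {to} is not a permitted transition")
        self.history.append({
            "from": self.state, "to": to, "owner": owner,
            "evidence": evidence_ref,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.state = to
```

The point of the `history` list is the query: anyone can ask who moved an agent to Certified, when, and on what evidence, without scheduling that meeting.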

Who Owns the Gap

Head of Model Risk: Does your policy explicitly address agentic systems? Most policies written before 2024 do not, and what the policy does not name never enters the inventory. That is an inventory gap for systems already influencing material decisions.

CRO: When a loss traces back to an agent-influenced decision, the question will be: who owned the governance of that agent? At most institutions today, no one in the second line formally does.

Internal Audit: Sampling an adaptive system that generates continuous behavioral data is not audit coverage. At scale, it is a gap dressed up as coverage.

Examiner: Show me every agentic system, what it influences, and what governance framework it sits under. An institution that cannot answer clearly is carrying a material gap, regardless of how sophisticated the underlying technology is.

The Scale Problem

At smaller institutions, a manual, document-centric model governance approach can hold. The inventory is manageable. Sampling is meaningful.

At the $50B threshold and above, that approach becomes a supervisory risk the moment adaptive AI enters consequential decision flows.

Regulators do not grade on a curve for complexity. "We are still developing our AI governance framework" is not defensible when the agent is already in production.

The Agent Control Model

"Human in the loop" is not a governance framework. It is a phrase. It is vague, it is not enforceable, and it does not tell an examiner anything meaningful about how consequential decisions are controlled.

The more useful framing is: defined decision rights, hard boundaries, and certification state.

  • Defined decision rights: Where is the agent authorized to act autonomously? What categories of decisions fall within its scope, and which require a human to sign off before any action is taken?
  • Hard boundaries: Where must the agent escalate, regardless of its confidence or the apparent quality of its output? These are not soft guidelines encoded in a prompt. They are documented governance constraints with a named owner and a review record.
  • Certification state by input: Where can the agent not operate at all without certified inputs? If a data asset feeding the agent lacks documented controls, clear ownership, or a fitness-for-purpose standard, the agent's use of that asset is itself an uncertified action.

This is the starting point for agent governance that can survive regulatory scrutiny. Not a committee. Not a wiki. A structured, queryable record of what the agent is permitted to do, what stops it, and what conditions must be satisfied before it acts.
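A minimal sketch of what such a record could look like. The field names, decision categories, and the authorization logic are illustrative assumptions; the taxonomy would come from your own governance framework:

```python
from dataclasses import dataclass


@dataclass
class AgentCharter:
    """Governance record for one agent: what it may do, what stops it,
    and what must be certified before it acts. Field names are illustrative."""
    agent_id: str
    autonomous_scopes: set[str]    # decision rights: may act without sign-off
    human_review_scopes: set[str]  # decisions needing a human before action
    hard_boundaries: set[str]      # always escalate, regardless of confidence
    required_inputs: set[str]      # data assets that must be certified


def authorize(charter: AgentCharter, decision: str,
              certified_assets: set[str]) -> str:
    """Return 'act', 'human_review', or an escalation for a proposed decision."""
    # Uncertified input: the agent's use of that asset is itself uncertified.
    missing = charter.required_inputs - certified_assets
    if missing:
        return f"escalate: uncertified inputs {sorted(missing)}"
    if decision in charter.hard_boundaries:
        return "escalate: hard boundary"
    if decision in charter.autonomous_scopes:
        return "act"
    if decision in charter.human_review_scopes:
        return "human_review"
    # Anything outside the defined scopes is, by definition, out of bounds.
    return "escalate: outside defined decision rights"
```

Note the default at the bottom: anything outside the defined scopes escalates. The agent does not get the benefit of the doubt.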

Generative AI answered questions. Agentic AI takes actions. At scale, the question regulators and boards will eventually ask is not whether your models are validated. It is whether the systems making decisions on behalf of your institution are certified, traceable, and governed, continuously, not just at your last exam.