Steering Agents Without Caging Them

FOLIO CXI 2026-06-24 · 7 MIN · SHORT-FORM

Steering agents from the server middleware instead of the tool descriptions.

Diagram · folio cxi

sequenceDiagram
  participant A as Agent
  participant V as Tool visibility
  participant Z as Auth / Authz
  participant G as Session gate
  participant U as Audit + metrics
  participant E as Error contract
  participant R as Rule enforcement
  participant N as Enrichment
  participant H as Handler
  A->>V: tools/call
  V->>Z: filter visible tools
  Z->>G: set PlatformContext
  G->>U: gate check
  U->>E: record call
  E->>R: workflow gating
  R->>N: enrich
  N->>H: invoke
  H-->>A: result (normalized errors, soft notes)

Agents skip the steps you want them to take. Point one at a data platform and it will reach for trino_query and write SQL against a table it never looked up, ignoring the catalog that would have told it the table was deprecated last month and that one of its columns holds PII. Put “search the catalog first” in the tool description and the model will read it, repeat it back to you, and run the query anyway. That is how these models behave, and a description does not change it. The fix has to sit in the server, in the middleware every call passes through. mcp-data-platform does this three ways: workflow gating, the ordering of the middleware stack, and a structured error contract.

This is the third post in MCP by Design. The previous post composed several MCP servers into one process and left a shared substrate behind: every toolkit holding a handle to the catalog and the query engine. This post is what runs on top of it. It is grounded in the open-source txn2/mcp-data-platform, also available hosted as Plexara.

§Comprehension Is Not Compliance

I wrote a whole case study on this once: “The Two Failure Modes That Break Your AI Data Agent.” The failure mode that matters here is the second: the agent reads the tool description, summarizes it correctly, and then acts on its training-data prior instead. It knows you want discovery before query. It queries anyway, because every example of a database it ever saw went straight to SQL.

Guidance that lives only in the description is advisory. The agent weighs that advice against a strong prior, and the advice loses. Steering has to live where the call cannot route around it: the middleware between the request and the handler. That gives two moments to act, before the call and after it, and the platform uses both.

§Steer Before the Call, and After It

Before the call, the platform rewrites what the model sees. A description-override layer replaces the stock trino_query description with one that tells the agent, at the point of decision, to look the table up first. This is the cheapest nudge and it catches the agents that are merely on autopilot rather than actively overriding you.
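
A minimal sketch of the override idea, not the platform's own code. The `Tool` struct, the `applyDescriptionOverrides` function, and the description strings are all hypothetical; the only assumption carried over from the text is that an override is keyed by tool name and replaces the stock description before the agent lists the tools.

```go
package main

import "fmt"

// Tool is a hypothetical stand-in for a tool definition as listed to the agent.
type Tool struct {
	Name        string
	Description string
}

// applyDescriptionOverrides returns a copy of tools in which any tool named in
// overrides carries the override text. Tools without an override pass through.
func applyDescriptionOverrides(tools []Tool, overrides map[string]string) []Tool {
	out := make([]Tool, len(tools))
	for i, t := range tools {
		if d, ok := overrides[t.Name]; ok {
			t.Description = d
		}
		out[i] = t
	}
	return out
}

func main() {
	stock := []Tool{
		{Name: "trino_query", Description: "Run a SQL query."},
		{Name: "datahub_search", Description: "Search the catalog."},
	}
	// Assumed override text; the real wording is configuration.
	overrides := map[string]string{
		"trino_query": "Run a SQL query. Look the table up with datahub_search first.",
	}
	for _, t := range applyDescriptionOverrides(stock, overrides) {
		fmt.Printf("%s: %s\n", t.Name, t.Description)
	}
}
```

Returning a copy rather than mutating the stock list keeps the original descriptions available, which matters if different personas are meant to see different wording.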

After the call, the workflow tracker does the harder work. It watches which tools a session actually invokes, sorts them into discovery tools and query tools, and annotates the result when the order was wrong:

var DefaultDiscoveryTools = []string{
	"datahub_search", "datahub_get_entity", "datahub_get_schema",
	"datahub_get_lineage", "datahub_get_queries", "datahub_browse",
	"datahub_get_glossary_term", "datahub_get_data_product",
}

var DefaultQueryTools = []string{
	"trino_query", "trino_execute",
}

Every tool call is recorded against the session. Calling a query tool with no prior discovery is the condition the platform reacts to, and the reaction escalates:

func (t *SessionWorkflowTracker) RecordToolCall(sessionID, toolName string) {
	// ...
	if t.discoverySet[toolName] {
		state.discoveryTools[toolName] = now
		state.warningCount = 0 // discovery resets the escalation
	}
	if t.querySet[toolName] {
		state.queryTools[toolName] = now
	}
}

The first query without discovery gets a mild note appended to the result. Repeat the violation and the warning count climbs; past a configured threshold (EscalationAfterWarnings) the platform swaps in a stronger message (EscalationMessage). Call a discovery tool and the count resets to zero. The tracker never blocks the query, because hard-blocking a model mid-task tends to send it down a worse path. It lets the call through and makes the result carry the cost of the skipped step, escalating until the order corrects itself.
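
The escalation described above can be sketched as follows. This is an illustration under stated assumptions, not the platform's tracker: `Tracker`, `Record`, `WarningMessage`, and the message strings are hypothetical, and only `EscalationAfterWarnings` and `EscalationMessage` are names taken from the text. The sketch assumes "past the threshold" means the warning count exceeds `EscalationAfterWarnings`, and that discovery stays valid for the rest of the session. It is not safe for concurrent use.

```go
package main

import "fmt"

// Config holds assumed settings for the sketch.
type Config struct {
	WarningMessage          string // hypothetical field
	EscalationMessage       string
	EscalationAfterWarnings int
}

type sessionState struct {
	discovered   bool
	warningCount int
}

// Tracker is a hypothetical, single-goroutine workflow tracker.
type Tracker struct {
	cfg       Config
	discovery map[string]bool
	query     map[string]bool
	sessions  map[string]*sessionState
}

func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, n := range names {
		set[n] = true
	}
	return set
}

func NewTracker(cfg Config, discovery, query []string) *Tracker {
	return &Tracker{
		cfg:       cfg,
		discovery: toSet(discovery),
		query:     toSet(query),
		sessions:  make(map[string]*sessionState),
	}
}

// Record notes the call and returns the text to append to the result,
// or "" when no note is needed. It never blocks the call.
func (t *Tracker) Record(sessionID, tool string) string {
	st := t.sessions[sessionID]
	if st == nil {
		st = &sessionState{}
		t.sessions[sessionID] = st
	}
	if t.discovery[tool] {
		st.discovered = true
		st.warningCount = 0 // discovery resets the escalation
		return ""
	}
	if !t.query[tool] || st.discovered {
		return ""
	}
	st.warningCount++
	if st.warningCount > t.cfg.EscalationAfterWarnings {
		return t.cfg.EscalationMessage
	}
	return t.cfg.WarningMessage
}

func main() {
	t := NewTracker(Config{
		WarningMessage:          "Note: consider searching the catalog before querying.",
		EscalationMessage:       "Warning: repeated queries without discovery. Search the catalog first.",
		EscalationAfterWarnings: 2,
	}, []string{"datahub_search"}, []string{"trino_query"})

	calls := []string{"trino_query", "trino_query", "trino_query", "datahub_search", "trino_query"}
	for _, tool := range calls {
		fmt.Printf("%-15s note=%q\n", tool, t.Record("s1", tool))
	}
}
```

With a threshold of 2, the first two queries carry the mild note, the third carries the escalated one, and the query after `datahub_search` carries none.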

The enrichment layer does the same. When a result comes back for a session that never performed discovery, the enrichment middleware still attaches the semantic context, but it prepends a soft note that the agent should have searched the catalog first. The steering runs at several points on the response path rather than one.
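
As a rough sketch of that behavior: the function below always attaches the semantic context and adds a note only when the session has no recorded discovery. The `enrich` function, the note wording, and the ordering of the returned blocks are assumptions made for illustration.

```go
package main

import "fmt"

// softNote is assumed wording, not the platform's actual message.
const softNote = "Note: no catalog lookup was seen in this session; search the catalog before querying."

// enrich returns the content blocks for a response: an optional soft note,
// then the semantic context, then the handler's own result.
func enrich(result, semanticContext string, discovered bool) []string {
	var blocks []string
	if !discovered {
		blocks = append(blocks, softNote)
	}
	return append(blocks, semanticContext, result)
}

func main() {
	for _, block := range enrich("42 rows", "orders.email is tagged PII", false) {
		fmt.Println(block)
	}
}
```

The context is attached in both cases; only the note is conditional, so the agent that skipped discovery still gets the metadata it would have found.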

§Ordering the Middleware Stack

All of this is middleware, and the order matters more than it first appears. The Go SDK registers receiving middleware by wrapping the current handler, so the last middleware added becomes the outermost layer and runs first. The platform documents this in the code, because it is easy to get wrong:

// IMPORTANT: AddReceivingMiddleware wraps the current handler, so each
// call makes its middleware the new outermost layer. The LAST middleware
// added runs FIRST. We add innermost middleware first and outermost last.
//
// Desired execution order (outermost → innermost → handler):
//   Tool visibility → Apps metadata → Auth/Authz → Session gate →
//   Audit → Rules → Client logging → Enrichment → handler
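
The wrapping rule in that comment can be reproduced without the SDK. The sketch below uses a hypothetical `Server` type whose `Add` method wraps the current handler, which is the behavior the comment describes; it is not the SDK's API, and the middleware names are only labels.

```go
package main

import (
	"fmt"
	"strings"
)

type Handler func(trace *[]string)
type Middleware func(next Handler) Handler

// Server is a hypothetical stand-in that registers middleware by wrapping.
type Server struct{ handler Handler }

// Add wraps the current handler, so the middleware added last ends up outermost.
func (s *Server) Add(m Middleware) { s.handler = m(s.handler) }

// named returns a middleware that records its name, then calls the next layer.
func named(name string) Middleware {
	return func(next Handler) Handler {
		return func(trace *[]string) {
			*trace = append(*trace, name)
			next(trace)
		}
	}
}

// run registers the layers in the given order (innermost first) and returns
// the order in which they actually executed.
func run(innerToOuter ...string) string {
	s := &Server{handler: func(trace *[]string) { *trace = append(*trace, "handler") }}
	for _, name := range innerToOuter {
		s.Add(named(name))
	}
	var trace []string
	s.handler(&trace)
	return strings.Join(trace, " -> ")
}

func main() {
	// Registered innermost first, outermost last.
	fmt.Println(run("enrichment", "audit", "auth", "visibility"))
	// prints: visibility -> auth -> audit -> enrichment -> handler
}
```

Registration order and execution order are exact reverses of each other, which is why the registration code reads bottom-up relative to the request path.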

The order in that comment is a sequence of dependencies. Auth and authz run near the outside and, besides allowing or denying the call, create the PlatformContext that carries the user, the persona, and the toolkit kind. Every layer inner to them can read it. That is why audit sits inner to auth, as the comments at the two registrations spell out:

// 4. Audit - logs tool calls (reads PlatformContext set by Auth/Authz above)
// 6. Auth/Authz (outermost for tools/call) - ... creates PlatformContext.
// Must be outer to Audit so PlatformContext is available in the ctx that
// Audit receives.

Invert those two and audit logs every call with an empty user. Nothing errors. You lose attribution and find out during an incident review that the one question you needed the audit trail to answer is the one it cannot.

The session gate sits inner to auth so it can read the context, and outer to audit so a call it blocks never reaches the audit layer. A blocked call should leave no trace but its rejection. Reverse it and the audit log fills with phantom events for calls that never ran. Two ordering constraints, each there because something goes quietly wrong when you violate it. The SDK will not arrange this for you, and it is easy to skip when a server has one middleware or none.
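
Both constraints can be checked with a small simulation. Everything here is hypothetical (`chain`, `auth`, `gate`, `audit`, and the string standing in for PlatformContext); the point is only to show what the audit layer sees under each ordering.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type ctxKey struct{}

type Handler func(ctx context.Context) error
type Middleware func(next Handler) Handler

var errGated = errors.New("session gate: call rejected")

// chain applies middleware so that the first argument is the outermost layer.
func chain(h Handler, outerToInner ...Middleware) Handler {
	for i := len(outerToInner) - 1; i >= 0; i-- {
		h = outerToInner[i](h)
	}
	return h
}

// auth stores the user in the context, standing in for PlatformContext.
func auth(user string) Middleware {
	return func(next Handler) Handler {
		return func(ctx context.Context) error {
			return next(context.WithValue(ctx, ctxKey{}, user))
		}
	}
}

// gate rejects the call before any inner layer runs when allow is false.
func gate(allow bool) Middleware {
	return func(next Handler) Handler {
		return func(ctx context.Context) error {
			if !allow {
				return errGated
			}
			return next(ctx)
		}
	}
}

// audit logs whichever user it can read from the context it receives.
func audit(log *[]string) Middleware {
	return func(next Handler) Handler {
		return func(ctx context.Context) error {
			user, _ := ctx.Value(ctxKey{}).(string)
			*log = append(*log, fmt.Sprintf("user=%q", user))
			return next(ctx)
		}
	}
}

// run executes a no-op handler behind the given stack and returns the audit log.
func run(stack func(log *[]string) []Middleware) ([]string, error) {
	var log []string
	h := chain(func(context.Context) error { return nil }, stack(&log)...)
	err := h(context.Background())
	return log, err
}

func main() {
	good, _ := run(func(l *[]string) []Middleware {
		return []Middleware{auth("alice"), gate(true), audit(l)}
	})
	inverted, _ := run(func(l *[]string) []Middleware {
		return []Middleware{audit(l), auth("alice"), gate(true)}
	})
	blocked, err := run(func(l *[]string) []Middleware {
		return []Middleware{auth("alice"), gate(false), audit(l)}
	})
	fmt.Println("auth outer to audit:", good)
	fmt.Println("audit outer to auth:", inverted)
	fmt.Println("gate blocks:", blocked, err)
}
```

The inverted stack returns no error and still logs one entry, just with an empty user, which is the quiet failure described above. The blocked call returns the gate's error and leaves the audit log empty.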

§The Error Contract

The last mechanism is the one I would add to most MCP servers I have seen. When a handler fails, what does the agent receive? In a server that adds nothing, whatever string the error happened to carry: a bare EOF, a pq: relation does not exist, a Go panic flattened into a stack trace. The agent has no way to tell whether the failure is retryable, so it retries the same call unchanged and fails the same way.

The platform refuses to let that happen. One always-on middleware normalizes every failure into a structured envelope:

// 3.5. Error contract - normalizes every tools/call error result into a
// self-describing {code, category, message, hint} envelope and recovers a
// panicking handler into a categorized internal error. ... Always on: an
// uncategorized error result must never reach the agent as an opaque string.

The structure is what makes it usable. A category tells the model what kind of failure this is: a bad input it should fix, an authorization denial it should stop retrying, a transient backend error it may retry, an internal fault it should report and abandon. A hint tells it what to do next. The code lets tooling branch on it. The middleware also recovers panics, so a nil dereference deep in a handler becomes a categorized internal error instead of a dropped connection. It sits inner to audit and metrics on purpose, so those layers observe the normalized category rather than the raw error, which means your dashboards group failures by the same taxonomy the agent sees. The error result is as much a part of the tool’s contract as the success result, and a server that leaves errors unstructured is shipping a tool the agent cannot reason about when it fails.
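
A minimal sketch of such a contract, assuming the envelope fields named in the comment above. The sentinel errors, the code and category strings, the hints, and the `normalize` and `call` functions are all hypothetical; the platform's real taxonomy may differ.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Envelope mirrors the {code, category, message, hint} shape.
type Envelope struct {
	Code     string `json:"code"`
	Category string `json:"category"`
	Message  string `json:"message"`
	Hint     string `json:"hint"`
}

// Hypothetical sentinel errors a handler might wrap and return.
var (
	ErrInvalidInput = errors.New("invalid input")
	ErrDenied       = errors.New("access denied")
	ErrUnavailable  = errors.New("backend unavailable")
)

// normalize maps any error onto the envelope; unknown errors become internal.
func normalize(err error) Envelope {
	env := Envelope{Message: err.Error()}
	switch {
	case errors.Is(err, ErrInvalidInput):
		env.Code, env.Category, env.Hint = "INVALID_INPUT", "input", "Fix the arguments, then call again."
	case errors.Is(err, ErrDenied):
		env.Code, env.Category, env.Hint = "FORBIDDEN", "authorization", "Do not retry; access is denied."
	case errors.Is(err, ErrUnavailable):
		env.Code, env.Category, env.Hint = "BACKEND_UNAVAILABLE", "transient", "Retry after a short delay."
	default:
		env.Code, env.Category, env.Hint = "INTERNAL", "internal", "Report this failure and stop retrying."
	}
	return env
}

// call runs the handler and converts a returned error or a panic into an
// Envelope. A nil result means the handler succeeded.
func call(handler func() error) (env *Envelope) {
	defer func() {
		if r := recover(); r != nil {
			e := normalize(fmt.Errorf("panic: %v", r))
			env = &e
		}
	}()
	if err := handler(); err != nil {
		e := normalize(err)
		return &e
	}
	return nil
}

// brokenHandler dereferences a nil pointer to simulate a handler bug.
func brokenHandler() error {
	var cfg *struct{ Table string }
	fmt.Println(cfg.Table) // panics before printing
	return nil
}

func main() {
	results := []*Envelope{
		call(func() error { return fmt.Errorf("trino: %w", ErrUnavailable) }),
		call(brokenHandler),
	}
	for _, env := range results {
		out, _ := json.Marshal(env)
		fmt.Println(string(out))
	}
}
```

Using `errors.Is` means a handler can wrap a sentinel with its own detail and still land in the right category. A production version would likely scrub the panic message before returning it to the agent; the sketch passes it through for visibility.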

§Steer, Don’t Cage

All three mechanisms address the same reader. That reader is not a program that does exactly what you wrote; it is a capable, fallible model that usually cooperates, will override you on a strong prior, and acts on whatever you return. So the platform nudges before the call, annotates after it, and structures every outcome so the model can act on it. That is steering, not caging: caging a model mid-task usually produces a worse task.

One limit is worth stating plainly. The workflow tracker keys on the session, not the user. A person running two conversations against the platform has two independent trackers, and skipping discovery in one teaches the other nothing. Session-scoped steering is the right default for cost and isolation, but it is not user-scoped governance; if you need that, you have to thread identity through the tracker yourself.
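
One way to thread identity through, sketched under assumptions: derive the tracker key from the authenticated user when one is known and fall back to the session otherwise. The `trackerKey` helper and the key prefixes are hypothetical, and whether state should be shared across a user's conversations is a policy choice this sketch does not settle.

```go
package main

import "fmt"

// trackerKey picks the identity that workflow state is stored under.
// With a known user it returns a user-scoped key; otherwise it falls back
// to the session. The prefixes keep the two key spaces from colliding.
func trackerKey(userID, sessionID string) string {
	if userID != "" {
		return "user:" + userID
	}
	return "session:" + sessionID
}

func main() {
	discovered := map[string]bool{}

	// Conversation 1: alice performs discovery.
	discovered[trackerKey("alice", "sess-1")] = true

	// Conversation 2: same user, different session, shares the state.
	fmt.Println(discovered[trackerKey("alice", "sess-2")])

	// An anonymous session stays isolated.
	fmt.Println(discovered[trackerKey("", "sess-3")])
}
```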

The next post turns from steering the agent to capturing what it learns: the knowledge loop, where an insight discovered mid-session becomes a reviewed, signed-off change to the catalog instead of evaporating when the conversation ends.

The platform behind this series is txn2/mcp-data-platform, available hosted as Plexara.

Craig Johnston · 2026-06-24
