Agents skip the steps you want them to take. Point one at a data platform and it will reach for trino_query and write SQL against a table it has never looked up, ignoring the catalog that would have told it the table was deprecated last month and that one of its columns is PII. You can put “search the catalog first” in the tool description, and the model will read it, repeat it back to you, and then run the query anyway. This is a behavior, not a bug, and you do not fix a behavior by asking the model nicely. You fix it in the server, on the path the call has to travel. This post is about the three mechanisms mcp-data-platform uses to steer agents on that path: workflow gating, a carefully ordered middleware stack, and an error contract built for a reader that is itself a model.
This is the third post in MCP by Design. The previous post composed several MCP servers into one process and left a shared substrate behind: every toolkit holding a handle to the catalog and the query engine. This post is what runs on top of it. It is grounded in the open-source txn2/mcp-data-platform, also available hosted as Plexara.
§Comprehension Is Not Compliance
I wrote a whole case study on this once, The Two Failure Modes That Break Your AI Data Agent. The one that matters here is the second: the agent reads the tool description, summarizes it correctly, and then acts on its training-data prior instead. It knows you want discovery before query. It queries anyway, because every example of a database it ever saw went straight to SQL.
The consequence for design is specific. Guidance that lives only in the description is advisory, and when advice collides with a strong prior, the advice loses. So the steering has to live somewhere the call cannot route around: in the middleware between the request and the handler. There are two moments to act, before the call and after it, and the platform uses both.
§Steer Before the Call, and After It
Before the call, the platform rewrites what the model sees. A description-override layer replaces the stock trino_query description with one that tells the agent, at the point of decision, to look the table up first. This is the cheapest nudge and it catches the agents that are merely on autopilot rather than actively overriding you.
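A description override can be as small as a map lookup applied to the tool list before it goes out. The sketch below is not the platform's code: the Tool type, the ApplyOverrides function, and the override wording are all hypothetical, chosen only to show the shape of the idea.

```go
package main

import "fmt"

// Tool is a hypothetical, minimal stand-in for a tool definition as the
// client sees it in a tool listing.
type Tool struct {
	Name        string
	Description string
}

// descriptionOverrides maps tool names to replacement descriptions.
// The wording is illustrative, not the platform's actual text.
var descriptionOverrides = map[string]string{
	"trino_query": "Run a SQL query. Before querying a table, look it up " +
		"with datahub_search to check for deprecation and PII columns.",
}

// ApplyOverrides returns a copy of tools with any configured description
// replaced. Tools without an override pass through unchanged.
func ApplyOverrides(tools []Tool, overrides map[string]string) []Tool {
	out := make([]Tool, len(tools))
	for i, tool := range tools {
		if desc, ok := overrides[tool.Name]; ok {
			tool.Description = desc
		}
		out[i] = tool
	}
	return out
}

func main() {
	stock := []Tool{
		{Name: "trino_query", Description: "Execute a SQL query."},
		{Name: "datahub_search", Description: "Search the catalog."},
	}
	for _, tool := range ApplyOverrides(stock, descriptionOverrides) {
		fmt.Printf("%s: %s\n", tool.Name, tool.Description)
	}
}
```

Keeping the overrides in data rather than in the toolkit means the wording can be tuned per deployment without touching the upstream tool.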
After the call, the workflow tracker does the harder work. It watches what tools a session actually invokes and annotates the result when the order was wrong:
var DefaultDiscoveryTools = []string{
	"datahub_search", "datahub_get_entity", "datahub_get_schema",
	"datahub_get_lineage", "datahub_get_queries", "datahub_browse",
	"datahub_get_glossary_term", "datahub_get_data_product",
}

var DefaultQueryTools = []string{
	"trino_query", "trino_execute",
}
Every tool call is recorded against the session. Calling a query tool with no prior discovery is the condition the platform reacts to, and the reaction escalates:
func (t *SessionWorkflowTracker) RecordToolCall(sessionID, toolName string) {
	// ...
	if t.discoverySet[toolName] {
		state.discoveryTools[toolName] = now
		state.warningCount = 0 // discovery resets the escalation
	}
	if t.querySet[toolName] {
		state.queryTools[toolName] = now
	}
}
The first query without discovery gets a mild note appended to the result. Repeat the violation and the warning count climbs; once it passes the threshold set by EscalationAfterWarnings, the platform switches to the stronger EscalationMessage. The moment the agent does the right thing and calls a discovery tool, the warning count resets to zero. The mechanism is a teacher, not a wall: it lets the query through, because hard-blocking a model mid-task tends to send it down a worse path, but it makes the result carry the cost of having skipped a step, and it gets louder until the behavior changes.
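To make the escalation concrete, here is a minimal sketch of a tracker that warns, escalates, and resets. It is a simplification under stated assumptions, not the platform's SessionWorkflowTracker: the Tracker type, the WarningMessage field, the rule that escalation starts when the count is strictly greater than EscalationAfterWarnings, and the choice to treat a session as satisfied once it has done any discovery are all mine.

```go
package main

import (
	"fmt"
	"sync"
)

// Config carries the two settings named in the text. WarningMessage and
// the "strictly greater than" threshold rule are assumptions of this sketch.
type Config struct {
	WarningMessage          string
	EscalationAfterWarnings int
	EscalationMessage       string
}

// sessionState is the per-session memory the tracker needs.
type sessionState struct {
	discovered   bool
	warningCount int
}

// Tracker is a simplified, hypothetical session workflow tracker.
type Tracker struct {
	mu        sync.Mutex
	cfg       Config
	discovery map[string]bool
	query     map[string]bool
	sessions  map[string]*sessionState
}

func NewTracker(cfg Config, discovery, query []string) *Tracker {
	t := &Tracker{
		cfg:       cfg,
		discovery: map[string]bool{},
		query:     map[string]bool{},
		sessions:  map[string]*sessionState{},
	}
	for _, name := range discovery {
		t.discovery[name] = true
	}
	for _, name := range query {
		t.query[name] = true
	}
	return t
}

// Record notes one tool call and returns the note to append to its result.
// An empty string means the call needs no annotation.
func (t *Tracker) Record(sessionID, tool string) string {
	t.mu.Lock()
	defer t.mu.Unlock()

	st, ok := t.sessions[sessionID]
	if !ok {
		st = &sessionState{}
		t.sessions[sessionID] = st
	}
	if t.discovery[tool] {
		st.discovered = true
		st.warningCount = 0 // discovery resets the escalation
		return ""
	}
	if !t.query[tool] || st.discovered {
		return "" // not a query tool, or discovery already happened
	}
	st.warningCount++
	if st.warningCount > t.cfg.EscalationAfterWarnings {
		return t.cfg.EscalationMessage
	}
	return t.cfg.WarningMessage
}

func main() {
	tracker := NewTracker(Config{
		WarningMessage:          "Note: consider searching the catalog first.",
		EscalationAfterWarnings: 2,
		EscalationMessage:       "Stop: run datahub_search before querying again.",
	}, []string{"datahub_search"}, []string{"trino_query"})

	calls := []string{"trino_query", "trino_query", "trino_query", "datahub_search", "trino_query"}
	for _, tool := range calls {
		fmt.Printf("%-15s -> %q\n", tool, tracker.Record("session-1", tool))
	}
}
```

Note that Record never returns an error and never blocks the call. The only lever it has is the text it hands back, which is the point.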
Even the enrichment layer participates. When a result comes back for a session that never performed discovery, the enrichment middleware still attaches the semantic context, but it prepends a soft note that the agent should have searched the catalog first. The steering is woven through the whole response path, not bolted on at one point.
§The Stack Is an Onion, and the Order Is Load-Bearing
All of this is middleware, and middleware has an order, and in an MCP server the order is not a detail you get to be casual about. The Go SDK registers receiving middleware by wrapping the current handler, so the last middleware added becomes the outermost layer and runs first. The platform documents the consequence in the code, because the consequence is easy to get wrong:
// IMPORTANT: AddReceivingMiddleware wraps the current handler, so each
// call makes its middleware the new outermost layer. The LAST middleware
// added runs FIRST. We add innermost middleware first and outermost last.
//
// Desired execution order (outermost → innermost → handler):
// Tool visibility → Apps metadata → Auth/Authz → Session gate →
// Audit → Rules → Client logging → Enrichment → handler
Read that order as a sequence of dependencies, because that is what it is. Auth and authz run near the outside and do one critical thing besides allowing or denying the call: they create the PlatformContext that carries the user, the persona, the toolkit kind. Everything inner to them can read it. That is why audit sits inner to auth, spelled out in the comment at its registration:
// 4. Audit - logs tool calls (reads PlatformContext set by Auth/Authz above)
// 6. Auth/Authz (outermost for tools/call) - ... creates PlatformContext.
// Must be outer to Audit so PlatformContext is available in the ctx that
// Audit receives.
Invert those two and audit logs every call with an empty user, silently, forever. Nothing breaks loudly. You just lose attribution and find out during an incident review that the one question you needed your audit trail to answer is the one it cannot.
The session gate is positioned with the same care: inner to auth so it can read the context, and outer to audit so that a call it blocks never reaches the audit layer at all. A gated call is not a real call; it should leave no trace but the rejection. Get that backwards and your audit log fills with phantom events for calls that never ran. Three layers, two ordering constraints between them, and each constraint exists because a specific thing goes quietly wrong when you violate it. This is the part of MCP design the SDK cannot do for you, and the part that is easy to skip when a server has one middleware or none.
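The auth-before-audit constraint is easy to reproduce in miniature. The sketch below does not use the SDK: Server, Add, and the two middleware functions are hypothetical stand-ins, with Add modeled on the wrap-the-current-handler behavior described above, and a plain string standing in for PlatformContext.

```go
package main

import (
	"context"
	"fmt"
)

type Handler func(ctx context.Context, tool string) (string, error)
type Middleware func(Handler) Handler

// Server is a hypothetical stand-in. Add wraps the current handler, so the
// last middleware added becomes the outermost layer and runs first.
type Server struct{ handler Handler }

func (s *Server) Add(m Middleware) { s.handler = m(s.handler) }

type userKey struct{}

// auth stores the caller identity in the context for inner layers to read.
func auth(user string) Middleware {
	return func(next Handler) Handler {
		return func(ctx context.Context, tool string) (string, error) {
			return next(context.WithValue(ctx, userKey{}, user), tool)
		}
	}
}

// audit records who called what, reading the identity from the context.
func audit(log *[]string) Middleware {
	return func(next Handler) Handler {
		return func(ctx context.Context, tool string) (string, error) {
			user, _ := ctx.Value(userKey{}).(string)
			*log = append(*log, fmt.Sprintf("user=%q tool=%s", user, tool))
			return next(ctx, tool)
		}
	}
}

// auditLine builds a two-layer stack, makes one call, and returns what
// audit logged. authOuter selects the correct registration order.
func auditLine(authOuter bool) string {
	var log []string
	s := &Server{handler: func(ctx context.Context, tool string) (string, error) {
		return "ok", nil
	}}
	if authOuter {
		s.Add(audit(&log))   // added first: inner
		s.Add(auth("alice")) // added last: outermost, runs first
	} else {
		s.Add(auth("alice")) // added first: inner, runs after audit
		s.Add(audit(&log))   // added last: runs before identity exists
	}
	s.handler(context.Background(), "trino_query")
	return log[0]
}

func main() {
	fmt.Println("auth outer:", auditLine(true))
	fmt.Println("auth inner:", auditLine(false))
}
```

Both orderings compile, both return "ok" to the caller, and only one of them logs a user. That is what makes the mistake quiet.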
§The Error Contract
The last mechanism is the one I would retrofit into every MCP server I have ever seen. When a handler fails, what does the agent receive? In a server that adds nothing, whatever string the error happened to carry. A bare EOF. A pq: relation does not exist. A Go panic flattened into a stack trace. The agent reads that, has no idea whether the thing is retryable, and does the worst possible thing, which is to try again, identically, three more times.
The platform refuses to let that happen. One always-on middleware normalizes every failure into a structured envelope:
// 3.5. Error contract - normalizes every tools/call error result into a
// self-describing {code, category, message, hint} envelope and recovers a
// panicking handler into a categorized internal error. ... Always on: an
// uncategorized error result must never reach the agent as an opaque string.
The shape is the whole point. A category tells the model what kind of failure this is: a bad input it should fix, an authorization denial it should stop retrying, a transient backend error it may retry, an internal fault it should report and abandon. A hint tells it what to do next. The code lets tooling branch on it. The middleware also recovers panics, so a nil dereference deep in a handler becomes a categorized internal error instead of a dropped connection. And it is positioned inner to audit and metrics on purpose, so those layers observe the normalized category rather than the raw error, which means your dashboards group failures by the same taxonomy the agent sees. The error result is part of the tool’s contract with the agent exactly as much as the success result is, and a server that treats errors as an afterthought is shipping a tool that lies about half its outcomes.
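A stripped-down version of such a middleware might look like the following. Only the four envelope fields come from the platform's comment; the category names, the codes, the sentinel errors, and the simplified Handler signature (no context, no MCP result type) are assumptions made to keep the sketch self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// Envelope has the four fields named in the text. The concrete codes and
// category values used below are illustrative, not the platform's taxonomy.
type Envelope struct {
	Code     string `json:"code"`
	Category string `json:"category"`
	Message  string `json:"message"`
	Hint     string `json:"hint"`
}

// Handler is a deliberately simplified tool handler signature.
type Handler func(tool string) (string, error)

// Hypothetical sentinel errors a handler might wrap and return.
var (
	ErrBadInput    = errors.New("bad input")
	ErrDenied      = errors.New("access denied")
	ErrUnavailable = errors.New("backend unavailable")
)

// classify maps any error to an envelope; unknown errors become internal.
func classify(err error) *Envelope {
	env := &Envelope{Message: err.Error()}
	switch {
	case errors.Is(err, ErrBadInput):
		env.Code, env.Category, env.Hint = "INVALID_INPUT", "input", "Fix the arguments, then call again."
	case errors.Is(err, ErrDenied):
		env.Code, env.Category, env.Hint = "FORBIDDEN", "authorization", "Do not retry; this caller lacks access."
	case errors.Is(err, ErrUnavailable):
		env.Code, env.Category, env.Hint = "UNAVAILABLE", "transient", "Retry after a short delay."
	default:
		env.Code, env.Category, env.Hint = "INTERNAL", "internal", "Report this failure and stop."
	}
	return env
}

// WithErrorContract wraps a handler so every failure, including a panic,
// comes back as a categorized envelope instead of a raw error.
func WithErrorContract(next Handler) func(tool string) (string, *Envelope) {
	return func(tool string) (result string, env *Envelope) {
		defer func() {
			if r := recover(); r != nil {
				result = ""
				env = &Envelope{
					Code:     "PANIC",
					Category: "internal",
					Message:  fmt.Sprint(r),
					Hint:     "Report this failure and stop.",
				}
			}
		}()
		res, err := next(tool)
		if err != nil {
			return "", classify(err)
		}
		return res, nil
	}
}

func main() {
	handlers := []Handler{
		func(string) (string, error) { return "42 rows", nil },
		func(string) (string, error) { return "", fmt.Errorf("table lookup: %w", ErrUnavailable) },
		func(string) (string, error) { return "", errors.New("EOF") },
		func(string) (string, error) { panic("nil map write") },
	}
	for _, h := range handlers {
		res, env := WithErrorContract(h)("trino_query")
		if env != nil {
			fmt.Printf("error: %+v\n", *env)
			continue
		}
		fmt.Println("ok:", res)
	}
}
```

The default branch matters as much as the named ones: a bare EOF still reaches the agent with a category and a hint, which is the "always on" property the comment insists on.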
§Steer, Don’t Cage
The thread through all three mechanisms is the same. You are not building guardrails for a program that will do exactly what you wrote. You are building them for a capable, fallible reader that will mostly cooperate, occasionally override you on a strong prior, and always act on whatever you actually return. So you nudge before the call, you annotate and escalate after it, and you make every outcome, success or failure, something the model can read and act on correctly. You steer. You do not cage, because caging a model mid-task usually produces a worse task.
The limit is worth stating plainly, since the series argues against overstatement. The workflow tracker keys on the session, not the user. A person running two conversations against the platform has two independent trackers, and skipping discovery in one teaches the other nothing. Session-scoped steering is the right default for cost and isolation, but it is not user-scoped governance, and if you needed the latter you would have to thread identity through the tracker yourself. Naming the boundary is part of designing inside it.
The next post turns from steering the agent to capturing what it learns: the knowledge loop, where an insight discovered mid-session becomes a reviewed, signed-off change to the catalog instead of evaporating when the conversation ends.