Designing operator consoles for autonomous agents
Anju
Agent products are weird to design for. The user isn't doing the work — the agent is. So what does the user actually need to see?
Most AI agent startups answer this by showing nothing (and demos fall flat) or by showing everything (and operators drown). The right answer sits in between, and gets to the heart of what we call the trust gap.
The trust gap
When a human user clicks a button, they trust their own intent. When an agent takes an action, the operator has to trust the agent's reasoning. That trust doesn't exist by default. The console's primary job is to build it, transaction by transaction.
If the operator can't tell why the agent did what it did, they'll never approve giving it more autonomy.
Four surfaces that matter
1. Trace view — what the agent did, step by step
For every agent run, show:
- The input it received
- Each tool call it made (named, with parameters)
- The reasoning between tool calls (collapsed by default, expandable)
- The final output
- Time and cost per step
This is the single most important screen for any agent product. If you build nothing else, build this. It turns "the AI did something" into "the AI did X, then Y, then Z because…" — a story the operator can audit.
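To make the list above concrete, here is a minimal sketch of the data a trace view might sit on top of. The names (`TraceStep`, `AgentRun`, `render_trace`) and the field layout are our own assumptions for illustration, not the schema of any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TraceStep:
    """One tool call in an agent run, plus the reasoning that led to it."""
    tool_name: str
    parameters: dict[str, Any]
    reasoning: str        # collapsed by default in the UI, expandable
    duration_ms: int      # time for this step
    cost_usd: float       # cost for this step


@dataclass
class AgentRun:
    """Everything the trace view needs to render a single run."""
    run_id: str
    input_text: str
    steps: list[TraceStep] = field(default_factory=list)
    final_output: Optional[str] = None

    def total_duration_ms(self) -> int:
        return sum(step.duration_ms for step in self.steps)

    def total_cost_usd(self) -> float:
        return sum(step.cost_usd for step in self.steps)


def render_trace(run: AgentRun, expand_reasoning: bool = False) -> str:
    """Render the run as plain text; reasoning stays hidden unless expanded."""
    lines = [f"Run {run.run_id}", f"Input: {run.input_text}"]
    for i, step in enumerate(run.steps, start=1):
        lines.append(
            f"{i}. {step.tool_name}({step.parameters}) "
            f"[{step.duration_ms} ms, ${step.cost_usd:.4f}]"
        )
        if expand_reasoning:
            lines.append(f"   why: {step.reasoning}")
    lines.append(f"Output: {run.final_output}")
    lines.append(
        f"Total: {run.total_duration_ms()} ms, ${run.total_cost_usd():.4f}"
    )
    return "\n".join(lines)
```

The point of the sketch is that reasoning is stored alongside every step even though it is hidden by default. The collapsed and expanded views come from the same record.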
2. Eval surface — was the result good?
For most agent runs, success or failure is ambiguous. Did the email actually solve the customer's problem? Did the code change pass the test the agent claimed it ran?
The eval surface lets operators (or graders) mark outcomes:
- Quick thumbs up / down on the result
- Optional structured feedback ("tone wrong", "missed context", "wrong tool")
- A link back to the trace, so the feedback can be used for retraining
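As a rough sketch of what an eval record could look like: a verdict, optional tags, and the run ID that ties it back to the trace. The `Feedback` class, the `Verdict` enum, and the fixed tag vocabulary are assumptions for illustration. The tags are simply the three examples from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    UP = "up"
    DOWN = "down"


# Hypothetical tag vocabulary, taken from the examples in the list above.
ALLOWED_TAGS = {"tone wrong", "missed context", "wrong tool"}


@dataclass
class Feedback:
    run_id: str                                    # links back to the trace
    verdict: Verdict                               # quick thumbs up / down
    tags: list[str] = field(default_factory=list)  # optional structured feedback

    def __post_init__(self) -> None:
        unknown = set(self.tags) - ALLOWED_TAGS
        if unknown:
            raise ValueError(f"unknown feedback tags: {sorted(unknown)}")


def failure_tag_counts(feedback: list[Feedback]) -> dict[str, int]:
    """Count how often each tag appears on thumbs-down runs."""
    counts: dict[str, int] = {}
    for item in feedback:
        if item.verdict is Verdict.DOWN:
            for tag in item.tags:
                counts[tag] = counts.get(tag, 0) + 1
    return counts
```

Keeping the tags structured rather than free-text is what makes a summary like `failure_tag_counts` possible. It tells you which failure mode to look at first.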
3. Interrupt / correct — can I stop or steer it?
For long-running agents, operators need a way in. At minimum:
- Pause the agent mid-run
- Inject a correction ("don't use that tool, use this one")
- Kill the run, with a confirmation step
Even if the operator never uses these controls, they need to see them. The presence of controls signals "you're in charge, not the agent."
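The three controls can be sketched as a small state holder that the agent loop checks between steps. `RunController` and its method names are hypothetical. A real system would also need persistence and a way to deliver corrections to the agent, which this sketch leaves out.

```python
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    KILLED = "killed"


class RunController:
    """Operator-facing controls for one long-running agent run."""

    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.state = RunState.RUNNING
        self.corrections: list[str] = []

    def pause(self) -> None:
        if self.state is RunState.RUNNING:
            self.state = RunState.PAUSED

    def resume(self) -> None:
        if self.state is RunState.PAUSED:
            self.state = RunState.RUNNING

    def inject_correction(self, message: str) -> None:
        """Queue an operator instruction for the agent to read next."""
        if self.state is RunState.KILLED:
            raise RuntimeError("cannot correct a killed run")
        self.corrections.append(message)

    def kill(self, confirm: bool = False) -> bool:
        """Stop the run for good. Does nothing and returns False unless confirmed."""
        if not confirm:
            return False
        self.state = RunState.KILLED
        return True

    def should_continue(self) -> bool:
        """The agent loop would call this before each step."""
        return self.state is RunState.RUNNING
```

The explicit `confirm` flag mirrors the confirmation dialog in the UI: a stray click on the kill switch should not end a run.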
4. Confidence indicators — how sure was it?
Surface the agent's confidence — even if it's a heuristic. A green badge on confident runs vs. an amber one on uncertain ones helps operators triage which to review.
Don't fake precision here. "High / Medium / Low confidence" beats "82.3%" because the operator knows that level of precision isn't real.
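One way to do this is to bucket whatever heuristic score you have into coarse labels before it reaches the UI. The sketch below assumes a score between 0 and 1. The thresholds (0.8 and 0.5) are placeholders to tune against your own review data, not recommended values.

```python
def confidence_label(score: float, high: float = 0.8, low: float = 0.5) -> str:
    """Map a heuristic score in [0, 1] to a coarse label.

    The default thresholds are illustrative assumptions.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "High"
    if score >= low:
        return "Medium"
    return "Low"


def badge_color(score: float) -> str:
    """Green for confident runs, amber for anything the operator should review."""
    return "green" if confidence_label(score) == "High" else "amber"
```

With these defaults, a score of 0.823 is shown to the operator as "High" with a green badge. The raw number never appears on screen.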
The anti-pattern: hiding the agent's reasoning
The biggest mistake we see: agent products that show only the final output, the way a consumer chat product like ChatGPT does. "Here's your answer."
This works for consumer chat. It fails for operator consoles. The operator's job isn't to consume the output — it's to verify and approve. They need to see the work.
When in doubt, err toward showing too much, not too little. Operators can collapse what they don't need. They can't expand what isn't there.
Agent UX is product UX
The biggest shift in our work over the past year: treating agent products as product UX challenges, not dev-tool UX challenges. The user buying an agent platform isn't a researcher debugging LangChain. They're an ops lead, a manager, a buyer — someone who needs to trust the system and demo it to their CFO.
Design for that person, and the trust gap closes faster.