Agents all the way down
A pattern for UI in MCP clients
Say you’re working on an agent (a model using tools in a loop). Furthermore, let’s say your agent uses the Model Context Protocol to populate its set of tools dynamically. That raises an interesting UX question: how should you show text tool results to the user of your agent?
You could just show the raw text, but that’s a little unsatisfying when tool results are often JSON, XML, or some other structured data. You could parse the structured data, but that’s tricky too; the set of tools your agent has access to may change, and the tool results you get today could be structured differently tomorrow.
I like another option: pass the tool results to another agent.
The Visualization Agent
Let’s add another agent to our system; we’ll call it the visualization agent. After the main agent executes a tool, it will pass the results to the visualization agent and say “hey, can you visualize this for the user?”
The visualization agent has access to specialized tools like “show table”, “show chart”, “show formatted code”, etc. It does the work of translating tool results in arbitrary formats into the structured inputs those opinionated visualization tools expect.
And if it can’t figure out a good way to visualize something, well, we can always fall back to text.
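Here's a minimal Python sketch of that hand-off to make the shape concrete. Everything in it is an assumption for illustration. The tool names and argument shapes in `VIZ_TOOLS`, the `Visualization` type, and `visualize` are hypothetical. `heuristic_model` is a toy stand-in for the model call, included so the sketch runs on its own. In a real client you'd send the tool result plus the tool schemas to a small model and parse the tool call it returns.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical tools offered ONLY to the visualization agent (not the main agent).
# Names and argument shapes are illustrative; map them to whatever your UI renders.
VIZ_TOOLS: dict[str, dict[str, type]] = {
    "show_table": {"columns": list, "rows": list},
    "show_chart": {"title": str, "labels": list, "values": list},
    "show_code": {"language": str, "source": str},
}

@dataclass
class Visualization:
    kind: str                 # which UI component to render
    payload: dict[str, Any]   # arguments for that component

# Stand-in for a call to a fast, cheap model: given the raw tool result and the
# names of the available tools, return a tool call dict, or None to give up.
ModelFn = Callable[[str, list[str]], Optional[dict[str, Any]]]

def is_valid_call(call: dict[str, Any]) -> bool:
    """Check that the model picked a known tool with correctly typed arguments."""
    schema = VIZ_TOOLS.get(call.get("name"))
    args = call.get("arguments")
    if schema is None or not isinstance(args, dict):
        return False
    return all(isinstance(args.get(key), typ) for key, typ in schema.items())

def visualize(tool_result: str, model: ModelFn) -> Visualization:
    """Hand a tool result to the visualization agent; fall back to plain text."""
    call = model(tool_result, list(VIZ_TOOLS))
    if call is not None and is_valid_call(call):
        return Visualization(call["name"], call["arguments"])
    # Couldn't find a good visualization (or the model misbehaved): show raw text.
    return Visualization("text", {"text": tool_result})

def heuristic_model(tool_result: str, tool_names: list[str]) -> Optional[dict[str, Any]]:
    """Toy stand-in for the LLM so the sketch runs: tabulate JSON arrays of objects."""
    if "show_table" not in tool_names:
        return None
    try:
        data = json.loads(tool_result)
    except json.JSONDecodeError:
        return None
    if isinstance(data, list) and data and all(isinstance(row, dict) for row in data):
        columns = sorted({key for row in data for key in row})
        rows = [[row.get(col) for col in columns] for row in data]
        return {"name": "show_table", "arguments": {"columns": columns, "rows": rows}}
    return None

table = visualize('[{"name": "a.txt", "size": 12}, {"name": "b.txt", "size": 7}]', heuristic_model)
print(table.kind)     # show_table
print(table.payload)  # {'columns': ['name', 'size'], 'rows': [['a.txt', 12], ['b.txt', 7]]}
print(visualize("plain old text", heuristic_model).kind)  # text
```

The validation step is doing real work here. A model can pick a tool that doesn't exist or return malformed arguments. If you treat that the same as "couldn't visualize it," you get the text fallback for free.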
Why do it this way?
The big thing is that we can display arbitrary data to the user in a nice way, without assuming much about the tools our agent will have access to. We could also give the main agent visualization tools (tempting! so simple!), but:
- That can be very wasteful of the context window
  - Imagine receiving 10,000 tokens from a tool, and then the agent decides to visualize them by calling a visualization tool. The model has to write all 10,000 tokens back out as the tool call's arguments, so those 10,000 tokens just doubled to 20,000 in our chat history
- The more tools an agent has access to, the more likely it is to get confused
- A specialized visualization agent can use a faster, cheaper model than our main agent
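To put rough numbers on the context-window point, here's a back-of-the-envelope sketch in Python. It assumes the main agent echoes each tool result verbatim into the visualization tool's arguments, and it ignores system prompts, tool schemas, and the visualization agent's own instructions. `history_tokens` and `viz_agent_peak_context` are hypothetical helpers for illustration, not part of any library.

```python
def history_tokens(tool_results: list[int], viz_in_main: bool) -> int:
    """Tokens that tool traffic adds to the main agent's chat history.

    Assumes each tool result is re-emitted verbatim as visualization-tool
    arguments when the main agent owns the visualization tools.
    """
    total = 0
    for n in tool_results:
        total += n          # the tool result is appended to the history
        if viz_in_main:
            total += n      # ...and echoed back as tool call arguments
    return total

def viz_agent_peak_context(tool_results: list[int]) -> int:
    """Separate visualization agent: each call starts from a fresh context,
    so its context only ever holds one tool result at a time."""
    return max(tool_results, default=0)

session = [10_000, 2_000, 5_000]
print(history_tokens(session, viz_in_main=True))   # 34000
print(history_tokens(session, viz_in_main=False))  # 17000
print(viz_agent_peak_context(session))             # 10000
```

With a separate visualization agent, the main agent's history grows only by the tool results themselves. The visualization agent's context is thrown away after each call, so it never carries more than one result (plus its own prompt).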
It’s not all sunshine and roses; calling the visualization agent can be slow, and it adds some complexity. But I like this approach compared to the others I’ve seen, and we’re not far away from fast local models being widely available. If you’ve got another approach, I’d love to hear from you!