Tool Calls Are Expensive And Finite

Design your agents accordingly

Giving LLMs access to tools (which turns them into ✨agents✨) is an incredibly powerful way to give LLMs capabilities that go beyond generating text. But it’s important to think clearly about the costs and limitations of tool calling; in particular, calling a tool is many orders of magnitude more costly than calling a plain old function from code. There is (and probably always will be) a limit on how many tool calls an agent can effectively make, and agentic systems should be designed accordingly.

Wait, why?

For this to make sense, you have to consider what a tool call is “under the hood.” LLMs are typically used as very fancy text generation machines. And the way they do tool calls is by generating text, although that’s typically abstracted away from us.

Let’s say you have an agent with one tool, add, for adding 2 numbers together. A user asks the agent a question that’s easy to answer with the add tool:

What’s 15 + 27?

To actually call the add tool, the model generates a message like this (simplified):

{
  "tool_call_id": "call_abc123",
  "tool_name": "add",
  "tool_arguments": "{\"a\": 15, \"b\": 27}"
}

At this point the model stops generating tokens. The thing that’s driving the model (the agentic loop?) parses that message, passes those arguments to some function like add(15, 27), and then puts the output of that into chat history as a new message:

{
  "tool_call_id": "call_abc123", 
  "tool_call_result": "42"
}

Inference resumes, and the LLM now has everything it needs to tell the user that the answer is 42. This works! It’s the foundation of some really incredible software systems! But it wasn’t free:

  1. The model had to generate a bunch of tokens.
  2. We used up precious context window for the 2 messages.
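The loop driving the model can be sketched roughly like this. The `model.generate` API, the message shapes, and the `run_agent_loop` name are all simplifications for illustration, matching the simplified messages above:

```python
import json

def add(a, b):
    """The real function behind the 'add' tool."""
    return a + b

TOOLS = {"add": add}

def run_agent_loop(model, history):
    """Minimal agentic loop: generate, run tool calls, repeat
    until the model produces a plain (non-tool-call) message."""
    while True:
        message = model.generate(history)  # hypothetical model API
        history.append(message)
        if "tool_name" not in message:
            return message  # no tool call: we're done
        # The model "called" the tool by generating text; parse it
        args = json.loads(message["tool_arguments"])
        result = TOOLS[message["tool_name"]](**args)
        # The result becomes a new message in the chat history
        history.append({
            "tool_call_id": message["tool_call_id"],
            "tool_call_result": str(result),
        })
```

Every trip around that loop costs generated tokens, and both the tool-call message and its result stay in `history` for the rest of the conversation.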

But why does that matter?

If you’re adding 2 numbers once, it probably doesn’t matter. If you’re summing up 1,000 numbers… you’re going to be waiting a very long time for those 999 tool calls to finish, and you might blow through your entire context window.

This might seem like an academic point, but calling a function many times in a loop is one of the most common ways to solve a problem with code. To give a contrived example, say we have 100 user IDs and we want to count the users whose name starts with ‘R’:

  1. A programmer with a get_user_info(id) function can write+run a simple for loop
  2. An agent with a get_user_info(id) tool can try to make 100 tool calls, but it will probably run out of context window long before it finishes
    1. Remember, the entire result of every tool call ends up in the context window
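For comparison, here is the programmer’s version of that contrived example; `get_user_info` is a stand-in for whatever the real lookup function is:

```python
def count_r_users(user_ids, get_user_info):
    """Count users whose name starts with 'R' using an ordinary
    function call per user -- cheap, fast, no context window."""
    count = 0
    for user_id in user_ids:
        user = get_user_info(user_id)  # plain function call, not a tool call
        if user["name"].startswith("R"):
            count += 1
    return count
```

One hundred iterations of this loop cost microseconds and zero tokens; the agent equivalent costs 100 round trips of generation plus 100 results accumulating in the context window.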

Designing agentic tools that are flexible enough for every use case (or even most use cases) is hard, and I don’t think enough people are talking about that.

So what do we do instead?

As always, it depends. Maybe your agent is solving problems where it will never need to make large numbers of tool calls. Maybe you’re clever and you can design your tools to be very flexible+powerful. Maybe you can sidestep this problem by letting your agent write+run code (keeping in mind all of the necessary security precautions).

The Model Context Protocol (MCP) is a pretty big deal these days. It’s become the de facto standard for giving LLMs access to tools that someone else wrote, which, of course, turns them into agents. But writing tools for a new MCP server is hard, and so people often propose auto-converting existing APIs into MCP tools; typically using OpenAPI metadata (1, 2).

In my experience, this can work, but it rarely works well. Here are a few reasons why:

Agents don’t do well with large numbers of tools

Infamously, VS Code has a hard limit of 128 tools - but many models struggle with accurate tool calling well before that number. Also, each tool and its description takes up valuable context window space.

Most web APIs weren’t designed with these constraints in mind! It’s fine to have umpteen APIs for a single product area when those APIs are called from code, but if each of those APIs is mapped to an MCP tool the results might not be great.

MCP tools designed from the ground up are typically much more flexible than individual web APIs, with each tool being able to do the work of several individual APIs.

APIs can blow through context windows quickly

Imagine an API that returns 100 records at a time, and each record is very wide (say, 50 fields). Sending those results to an agent as-is will use up a lot of tokens; even if a query can be satisfied with only a few fields, every field ends up in the context window.

APIs are typically paginated by the number of records, but records can vary a lot in size. One record might contain a large text field that takes up 100,000 tokens, while another might contain 10. Putting these API results directly into an agent’s context window is a gamble; sometimes it works, sometimes it will blow up.

The format of the data can also be an issue. Most web APIs these days return JSON, but JSON is a very token-inefficient format. Take this:

[
  {
    "firstName": "Alice",
    "lastName": "Johnson",
    "age": 28
  },
  {
    "firstName": "Bob",
    "lastName": "Smith",
    "age": 35
  }
]

Compare to the same data in CSV format:

firstName,lastName,age
Alice,Johnson,28
Bob,Smith,35

The CSV data is much more succinct - it uses up half as many tokens per record. Typically CSV, TSV, or YAML (for nested data) are better choices than JSON.

None of these issues are insurmountable. You could imagine automatically adding tool arguments that let agents project fields, automatically truncating or summarizing large results, and automatically converting JSON results to CSV (or YAML for nested data). But most servers I’ve seen do none of those things.
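As a sketch of what that kind of post-processing might look like (the function name, parameters, and limits here are all made up for illustration):

```python
import csv
import io

def shrink_result(records, fields=None, max_records=25):
    """Make a JSON-style API result more context-friendly:
    project to the requested fields, truncate long result sets,
    and emit compact CSV instead of JSON."""
    if not records:
        return ""
    if fields:
        # Keep only the fields the agent asked for
        records = [{f: r.get(f) for f in fields} for r in records]
    truncated = len(records) > max_records
    records = records[:max_records]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    if truncated:
        out.write("... (truncated; refine the query to see more)\n")
    return out.getvalue()
```

A wrapper like this sits between the API and the agent, so wide records and huge pages never reach the context window in the first place.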

APIs don’t make the most of agents’ unique capabilities

APIs return structured data for programmatic consumption. That’s often what agents want from tool calls… but agents can also handle other, more free-form instructions.

For example, an ask_question tool could perform a RAG query over some documentation, then return plain-text information that informs the next tool call - skipping structured data entirely.

Or, a call to a search_cities tool could return a structured list of cities and a suggestion of what to call next:

city_name,population,country,region
Tokyo,37194000,Japan,Asia
Delhi,32941000,India,Asia
Shanghai,28517000,China,Asia

Suggestion: To get more specific information (weather, attractions, demographics), try calling get_city_details with the city_name parameter.
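A tool like that can be as simple as appending plain-text guidance to the structured payload. This sketch assumes a hypothetical city dataset; `search_cities` and `get_city_details` are the made-up tool names from the example above:

```python
def search_cities(region_query, cities):
    """Return matching cities as CSV, plus a plain-text hint
    telling the agent which tool to call next."""
    matches = [c for c in cities
               if region_query.lower() in c["region"].lower()]
    lines = ["city_name,population,country,region"]
    lines += [f'{c["name"]},{c["population"]},{c["country"]},{c["region"]}'
              for c in matches]
    lines.append("")
    lines.append("Suggestion: To get more specific information "
                 "(weather, attractions, demographics), try calling "
                 "get_city_details with the city_name parameter.")
    return "\n".join(lines)
```

The structured half satisfies the immediate query; the free-form half steers the agent toward the next step without costing a wasted tool call.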

That sort of layering and tool chaining can be very effective in MCP servers, and it’s something you’ll miss out on completely if auto-converting APIs to tools.

If an agent needs to call an API, it could just do that

Agents like Claude Code are remarkably capable of writing+executing code these days, including scripts that call web APIs. Some people take this so far as to argue that MCP isn’t needed at all!

I disagree with that conclusion, but I do think we should skate to where the puck is going. Sandboxing of agents is improving rapidly, and if it’s easy+safe for an agent to call APIs directly then we might as well do that and cut out the middleman.

Conclusion

Agents are fundamentally different from the typical consumers of APIs. It’s possible to automatically create MCP tools from existing APIs, but doing that is unlikely to work well. Agents do best when given tools that are designed for their unique capabilities and limitations.

Land Values and Affordability

The relationship might not work the way you think

I want to get something off my chest: attempts to keep the price of urban land down are not necessarily good. Many people in local politics place a high priority on keeping land prices down. For example, the new Vancouver councillor who opposed a church building apartments on their own land:

Land values displace people. This will increase land values.

…and then used that same reason to vote against apartments at a major train station:

I’m worried that filtering will take too long, that land value increases will lead to displacement

Vancouver’s planning staff share these concerns and try to keep land values down when changing zoning. For example, the recent multiplex policy was designed to avoid raising land values:

Proposed density bonus contribution requirements & rates (are) set to… limit any potential land value escalation

You might be thinking: that makes sense; if land value is lower, then homes are more affordable, right?

WRONG (if you keep land values down by stopping development)

Land Prices Are Not Housing Prices

The main way people save on housing costs in cities is by using less land. For example, imagine the following uses on a 4000 sqft lot:

| Building | Land/Household |
| --- | --- |
| Single family home (small) | 4000 sqft |
| Duplex | 2000 sqft |
| 5-unit apartment/condo building | 800 sqft |

It is generally much cheaper to buy 800 square feet of land than it is to buy 4000. But where this gets interesting is that those denser uses may cause higher land prices. Let’s walk through how:

  1. Say that 4000 sqft lot is zoned to only allow a single-family home. Richie McRicherson is willing to pay $1M so he can build a house on that land. The land sells for 1 MILLION DOLLARS.
  2. Now, suppose the land is zoned to allow a duplex. 2 households who each have $600k pool their money together and outbid Richie. The land sells for 1.2 MILLION DOLLARS.
  3. Finally, suppose the land is zoned to allow a 5-unit condo building. 5 households who each have $400k pool their money and outbid both Richie and the duplex buyers. The land sells for 2 MILLION DOLLARS.

| Building | Land Price | Land Price/Sqft | Land Price/Household |
| --- | --- | --- | --- |
| Single family home | $1,000,000 | $250 | $1,000,000 |
| Duplex | $1,200,000 | $300 | $600,000 |
| 5-unit condo building | $2,000,000 | $500 | $400,000 |
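The arithmetic above is just the sale price divided by the lot area and by the number of households. Using the hypothetical numbers from the example:

```python
# One 4000 sqft lot sold under three different zoning regimes
LOT_SQFT = 4000

scenarios = {
    "Single family home": {"price": 1_000_000, "households": 1},
    "Duplex": {"price": 1_200_000, "households": 2},
    "5-unit condo building": {"price": 2_000_000, "households": 5},
}

for name, s in scenarios.items():
    per_sqft = s["price"] / LOT_SQFT
    per_household = s["price"] / s["households"]
    # Denser zoning: price per sqft rises, price per household falls
    print(f"{name}: ${per_sqft:.0f}/sqft, ${per_household:,.0f}/household")
```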

It is really important to note that even though allowing more homes drove land prices up, households are paying less for land.

OK that’s the theory; what about in practice?

It can be hard to observe this in real life, because dense city centres tend to be pretty expensive. That’s a complicated topic beyond the scope of this blog post, but there are places in Vancouver where this specific phenomenon is easy to see on a map of land values. For example:

North West Point Grey

Left: cheap land and expensive homes. Right: expensive land and relatively cheap homes

This is one of the most expensive neighbourhoods in Vancouver, by design. Apartments are forbidden everywhere, only houses are allowed. And city planning rules require each house to use up much more land west of Blanca Street:

| Area | Minimum Lot Size | Land Price/Sq Ft | Land Price/Lot |
| --- | --- | --- | --- |
| West of Blanca | 12,000-18,000 sqft | Usually around $300 | $7M-$30M |
| East of Blanca | 3000-5400 sqft | Usually around $800 | $3M-$8M |

This is exactly what we were talking about. When the city lets people use less land per home, land prices go up and home prices go down. To be clear, $3M still isn’t cheap; we should go a lot further.

Shaughnessy

It’s a similar story in Shaughnessy, historically Vancouver’s most exclusive neighbourhood:

Top: Fairview/South Granville apartments+condos. Bottom: Shaughnessy mansions

South of 16th we zone for mansions on very large lots (making the land relatively cheap), and north of 16th we allow apartments and condo buildings (making the land relatively expensive). If you know Vancouver at all, you know that those apartments are a lot cheaper than the $10M+ Shaughnessy mansions!

Takeaway

It’s important to distinguish between the cost of land per square foot and the cost of land per home. Limiting density does work to drive the former down, but at a terrible cost: it stops people of modest means from pooling their resources to outbid someone much richer.

I had an odd experience with this website, and I’m finally writing it up. The short version:

  1. In August 2024 I wrote a blog post that documented how a local “independent journalist” had written for white nationalist websites.
  2. In October 2024 he filed a DMCA complaint with my host (Netlify).
    1. Netlify support rubber-stamped the complaint without giving me a reasonable way to appeal.
    2. I moved to Cloudflare and cut the blog post back to a few essential facts+links, to make it easier for the next overworked support person to interpret.
  3. In February 2025 Cloudflare approved another DMCA complaint from someone who’d copied my entire post to a content mill and backdated it!

This post will mostly focus on the 2nd DMCA complaint, as it’s the most interesting one.

My post was copied to… MormonFind.com?

On February 14, while on vacation, I received the following email:

Cloudflare received the below copyright infringement complaint regarding your account. If the content identified in the complaint is not removed within 48 hours, Cloudflare will take steps to disable access to the content, consistent with section 512(c) of the Digital Millennium Copyright Act. Please note that these steps will include disabling access to the reported URL on which the content is located, which will affect any other content located on the same URL.

Complaint Information:

Reporter’s Name: Aaron Bennet

Reporter’s Email Address: <redacted>

Reporter’s Title: Copyright Infringement

Reporter’s Company Name: Bennet Media Association

Reporter’s Address: <redacted>

Reported URL(s): hxxps://www[.]reillywood[.]com/blog/riley-donovan/

Original Work Description: https://mormonfind.com/2024/04/10/riley-donovan-contributes-to-white-supremacist-websites/

To respond to this issue, please reply to [email protected].

Agents all the way down

A pattern for UI in MCP clients

Say you’re working on an agent (a model using tools in a loop). Furthermore, let’s say your agent uses the Model Context Protocol to populate its set of tools dynamically. This results in an interesting UX question: how should you show text tool results to the user of your agent?

You could just show the raw text, but that’s a little unsatisfying when tool results are often JSON, XML, or some other structured data. You could parse the structured data, but that’s tricky too; the set of tools your agent has access to may change, and the tool results you get today could be structured differently tomorrow.

I like another option: pass the tool results to another agent.

The Visualization Agent

Let’s add another agent to our system; we’ll call it the visualization agent. After the main agent executes a tool, it will pass the results to the visualization agent and say “hey, can you visualize this for the user?”

The visualization agent has access to specialized tools like “show table”, “show chart”, “show formatted code”, etc. It handles the work of translating tool results in arbitrary formats into the structures that are useful for opinionated visualization.

And if it can’t figure out a good way to visualize something, well, we can always fall back to text.
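The handoff itself is small. This is a sketch, not a real API: `viz_agent` stands in for whatever small model you run with the show_table/show_chart/show_formatted_code tools, and the prompt wording is made up:

```python
def handle_tool_result(raw_result, viz_agent):
    """Hand a raw tool result to a separate visualization agent,
    falling back to plain text if visualization fails."""
    prompt = (
        "Visualize the following tool result for the user. "
        "Use show_table, show_chart, or show_formatted_code if one fits; "
        "otherwise return the text as-is.\n\n" + raw_result
    )
    try:
        return viz_agent.run(prompt)  # hypothetical agent API
    except Exception:
        return raw_result  # graceful fallback to the raw text
```

Note that `raw_result` only ever enters the visualization agent’s context, not the main agent’s - that’s where the context window savings come from.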

Why do it this way?

The big thing is that we can display arbitrary data to the user in a nice way, without assuming much about the tools our agent will have access to. We could also give the main agent visualization tools (tempting! so simple!), but:

  1. That can be very wasteful of the context window
    1. Imagine receiving 10,000 tokens from a tool, then the agent decides to pass those 10,000 tokens by calling a visualization tool - the 10,000 tokens just doubled to 20,000 in our chat history
  2. The more tools an agent has access to, the more likely it is to get confused
  3. A specialized visualization agent can use a faster+cheaper model than our main agent

It’s not all sunshine and roses; calling the visualization agent can be slow, and it adds some complexity. But I like this approach compared to the others I’ve seen, and we’re not far away from fast local models being widely available. If you’ve got another approach, I’d love to hear from you!
