Agent Loops: Why a 'Simple' Task Costs 50K Tokens
An agent that reads a file, edits it, and runs tests typically uses 5-20 LLM turns. Each turn re-sends the growing transcript. The token bill grows quadratically.
Detailed Explanation
The Quadratic Trap
Naïve agents send the entire conversation history on each turn. If the system prompt is 5,000 tokens and each turn adds 2,000 tokens of tool output (GPT-4o input pricing: $2.50 per million tokens):
| Turn | Cumulative input tokens | Per-turn input cost (GPT-4o) |
|---|---|---|
| 1 | 5,000 | $0.0125 |
| 2 | 7,000 | $0.0175 |
| 5 | 13,000 | $0.0325 |
| 10 | 23,000 | $0.0575 |
| 20 | 43,000 | $0.108 |
A 20-turn agent loop re-sends about 480K input tokens in total (the sum across all 20 turns), or ~$1.20 in input tokens alone. Add output tokens (~$0.20) and you're at ~$1.40 per task. At 10,000 tasks per day that is $14,000/day, and that's on GPT-4o, one of the cheaper frontier models.
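To sanity-check these figures, here's a minimal sketch that reproduces the table and the 20-turn total, assuming GPT-4o input pricing of $2.50 per million tokens (verify against current rates):

```python
# Reproduce the table: input tokens re-sent on each turn, and the per-turn cost.
SYSTEM_TOKENS = 5_000
TOKENS_PER_TURN = 2_000
INPUT_PRICE_PER_M = 2.50  # USD per 1M input tokens (assumed GPT-4o rate)

def turn_input_tokens(turn: int) -> int:
    """Input tokens re-sent on a given turn (1-indexed)."""
    return SYSTEM_TOKENS + (turn - 1) * TOKENS_PER_TURN

for turn in [1, 2, 5, 10, 20]:
    tokens = turn_input_tokens(turn)
    print(f"turn {turn:>2}: {tokens:>6,} tokens  ${tokens * INPUT_PRICE_PER_M / 1e6:.4f}")

# Sum across all 20 turns: 480K input tokens, ~$1.20.
total = sum(turn_input_tokens(t) for t in range(1, 21))
print(f"20-turn total: {total:,} tokens  ${total * INPUT_PRICE_PER_M / 1e6:.2f}")
```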
Why naïve agents are quadratic
Each turn appends new content to the conversation history. The LLM has no memory between calls, so the entire history must be re-sent. For N turns each adding k tokens, turn t re-sends roughly t·k tokens, so the total sent across the loop is k·N(N+1)/2 = O(N²k).
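A two-line check of the closed form, using the same per-turn growth (and ignoring the fixed system prompt, as the formula does):

```python
# Sum of t*k for t = 1..N equals the closed form N(N+1)k/2, i.e. O(N^2 k).
N, k = 20, 2_000
assert sum(t * k for t in range(1, N + 1)) == N * (N + 1) * k // 2  # 420,000 tokens
```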
Three mitigations
Aggressive caching. With Anthropic's prompt caching, mark the first ~50K tokens (system prompt + few-shot examples + tool definitions) as a cacheable prefix; cache reads are billed at roughly a tenth of the base input price, so the per-turn cost of that prefix drops ~10x. Cache hit rates in agent loops are typically 70-90%, since the prefix is identical from turn to turn.
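A minimal sketch of what this looks like with Anthropic's Messages API, assuming the current anthropic Python SDK; the model name, prompt text, and first message are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "...system prompt + few-shot examples + tool definitions..."

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; substitute whatever you deploy
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to this block is cached; subsequent turns pay
            # the much cheaper cache-read rate for this prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "First turn of the agent loop"}],
)
print(response.usage)  # reports cache_creation / cache_read input token counts
```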
Conversation summarization. Every 10 turns, replace the transcript with a 500-token summary. Cuts token count by 90% at the cost of fine-grained recall.
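A sketch of the compaction step, assuming a hypothetical `summarize` helper that makes one extra LLM call and returns a short summary string:

```python
from typing import Callable

SUMMARIZE_EVERY = 10  # compact the transcript every N turns

def maybe_compact(
    messages: list[dict],
    turn: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Every SUMMARIZE_EVERY turns, collapse the transcript into one message."""
    if turn % SUMMARIZE_EVERY != 0:
        return messages
    summary = summarize(messages)  # one extra LLM call, ~500 output tokens
    return [{"role": "user", "content": f"Summary of the first {turn} turns: {summary}"}]
```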
Sub-agent architecture. Spawn a fresh agent for each sub-task with only the relevant slice of context. The orchestrator agent's history stays small; the worker agents are short-lived.
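A sketch of the pattern, assuming a hypothetical `run_llm` wrapper around your chat-completion call; sub-task strings and context slices are illustrative:

```python
from typing import Callable

def run_worker(task: str, context_slice: str, run_llm: Callable[[list[dict]], str]) -> str:
    """Worker agent: fresh transcript, only the context this sub-task needs."""
    messages = [{"role": "user", "content": f"{context_slice}\n\nTask: {task}"}]
    return run_llm(messages)  # the worker's history dies here; nothing accumulates

def orchestrate(subtasks: list[tuple[str, str]], run_llm: Callable[[list[dict]], str]) -> list[str]:
    # The orchestrator keeps one result per sub-task, so its own context
    # grows linearly in sub-tasks rather than quadratically in turns.
    return [run_worker(task, ctx, run_llm) for task, ctx in subtasks]
```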
Tool call overhead
Each tool call adds:
- Tool definitions in the system prompt (~50-200 tokens per tool).
- Function call JSON in the assistant message (~50-100 tokens).
- Tool result in the next user message (varies; can be huge for file reads).
If you have 20 tools defined and only use 3 per task, consider lazy tool loading (load tool definitions on demand based on task type).
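A sketch of lazy loading, with hypothetical task types, tool names, and registry; only the schemas for the active group are sent with the request:

```python
# Map task types to the tools they actually need (names are illustrative).
TOOL_GROUPS = {
    "code": ["read_file", "edit_file", "run_tests"],
    "search": ["web_search", "fetch_url"],
}

def tools_for(task_type: str, registry: dict[str, dict]) -> list[dict]:
    """Return only the tool definitions this task type needs (~50-200 tokens each)."""
    return [registry[n] for n in TOOL_GROUPS.get(task_type, []) if n in registry]

# A coding task now ships 3 tool schemas instead of all 20:
# tools = tools_for("code", full_registry)
```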
Use Case
Apply this analysis when designing an agent product, debugging a higher-than-expected bill on a working agent, or deciding when to introduce summarization or sub-agents.
Related Topics
- Claude Prompt Caching: 80% Bill Reduction in One Setting (Caching & long context)
- Long-Context Costs: What 128K Tokens Actually Cost Per Call (Caching & long context)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)
- Code Generation Cost: Per-Function, Per-File, Per-PR (Workload patterns)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)