Agent Loops: Why a 'Simple' Task Costs 50K Tokens
An agent that reads a file, edits it, and runs tests typically uses 5-20 LLM turns. Each turn re-sends the growing transcript. The token bill grows quadratically.
Detailed Explanation
The Quadratic Trap
Naïve agents send the entire conversation history on each turn. If the system prompt is 5,000 tokens and each turn adds 2,000 tokens of tool output (GPT-4o input pricing: $2.50 per million tokens):
| Turn | Cumulative input tokens | Per-turn input cost (GPT-4o) |
|---|---|---|
| 1 | 5,000 | $0.0125 |
| 2 | 7,000 | $0.0175 |
| 5 | 13,000 | $0.0325 |
| 10 | 23,000 | $0.0575 |
| 20 | 43,000 | $0.108 |
A 20-turn agent loop re-sends about 480K input tokens in total (the sum across all 20 turns), or ~$1.20 in input tokens alone. Add output tokens (~$0.20) and you're at ~$1.40 per task. At 10,000 tasks per day that is $14,000/day, and that's on GPT-4o, one of the cheaper frontier models.
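To sanity-check these figures, here's a minimal sketch that reproduces the table and the 20-turn total, assuming GPT-4o input pricing of $2.50 per million tokens (verify against current rates):

```python
# Reproduce the table: input tokens re-sent on each turn, and the per-turn cost.
SYSTEM_TOKENS = 5_000
TOKENS_PER_TURN = 2_000
INPUT_PRICE_PER_M = 2.50  # USD per 1M input tokens (assumed GPT-4o rate)

def turn_input_tokens(turn: int) -> int:
    """Input tokens re-sent on a given turn (1-indexed)."""
    return SYSTEM_TOKENS + (turn - 1) * TOKENS_PER_TURN

for turn in [1, 2, 5, 10, 20]:
    tokens = turn_input_tokens(turn)
    print(f"turn {turn:>2}: {tokens:>6,} tokens  ${tokens * INPUT_PRICE_PER_M / 1e6:.4f}")

# Sum across all 20 turns: 480K input tokens, ~$1.20.
total = sum(turn_input_tokens(t) for t in range(1, 21))
print(f"20-turn total: {total:,} tokens  ${total * INPUT_PRICE_PER_M / 1e6:.2f}")
```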
Why naïve agents are quadratic
Each turn appends new content to the conversation history. The LLM has no memory between calls, so the entire history must be re-sent. For N turns each adding k tokens, turn t re-sends roughly t·k tokens, so the total sent across the loop is k·N(N+1)/2 = O(N²k).
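A two-line check of the closed form, using the same per-turn growth (and ignoring the fixed system prompt, as the formula does):

```python
# Sum of t*k for t = 1..N equals the closed form N(N+1)k/2, i.e. O(N^2 k).
N, k = 20, 2_000
assert sum(t * k for t in range(1, N + 1)) == N * (N + 1) * k // 2  # 420,000 tokens
```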
Three mitigations
Aggressive caching. With Anthropic's prompt caching, mark the first ~50K tokens (system prompt + few-shot examples + tool definitions) as a cacheable prefix; cache reads are billed at roughly a tenth of the base input price, so the per-turn cost of that prefix drops ~10x. Cache hit rates in agent loops are typically 70-90%, since the prefix is identical from turn to turn.
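A minimal sketch of what this looks like with Anthropic's Messages API, assuming the current anthropic Python SDK; the model name, prompt text, and first message are placeholders:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LONG_SYSTEM_PROMPT = "...system prompt + few-shot examples + tool definitions..."

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; substitute whatever you deploy
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Everything up to this block is cached; subsequent turns pay
            # the much cheaper cache-read rate for this prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "First turn of the agent loop"}],
)
print(response.usage)  # reports cache_creation / cache_read input token counts
```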
Conversation summarization. Every 10 turns, replace the transcript with a 500-token summary. Cuts token count by 90% at the cost of fine-grained recall.
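A sketch of the compaction step, assuming a hypothetical `summarize` helper that makes one extra LLM call and returns a short summary string:

```python
from typing import Callable

SUMMARIZE_EVERY = 10  # compact the transcript every N turns

def maybe_compact(
    messages: list[dict],
    turn: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    """Every SUMMARIZE_EVERY turns, collapse the transcript into one message."""
    if turn % SUMMARIZE_EVERY != 0:
        return messages
    summary = summarize(messages)  # one extra LLM call, ~500 output tokens
    return [{"role": "user", "content": f"Summary of the first {turn} turns: {summary}"}]
```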
Sub-agent architecture. Spawn a fresh agent for each sub-task with only the relevant slice of context. The orchestrator agent's history stays small; the worker agents are short-lived.
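A sketch of the pattern, assuming a hypothetical `run_llm` wrapper around your chat-completion call; sub-task strings and context slices are illustrative:

```python
from typing import Callable

def run_worker(task: str, context_slice: str, run_llm: Callable[[list[dict]], str]) -> str:
    """Worker agent: fresh transcript, only the context this sub-task needs."""
    messages = [{"role": "user", "content": f"{context_slice}\n\nTask: {task}"}]
    return run_llm(messages)  # the worker's history dies here; nothing accumulates

def orchestrate(subtasks: list[tuple[str, str]], run_llm: Callable[[list[dict]], str]) -> list[str]:
    # The orchestrator keeps one result per sub-task, so its own context
    # grows linearly in sub-tasks rather than quadratically in turns.
    return [run_worker(task, ctx, run_llm) for task, ctx in subtasks]
```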
Tool call overhead
Each tool call adds:
- Tool definitions in the system prompt (~50-200 tokens per tool).
- Function call JSON in the assistant message (~50-100 tokens).
- Tool result in the next user message (varies; can be huge for file reads).
If you have 20 tools defined and only use 3 per task, consider lazy tool loading (load tool definitions on demand based on task type).
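A sketch of lazy loading, with hypothetical task types, tool names, and registry; only the schemas for the active group are sent with the request:

```python
# Map task types to the tools they actually need (names are illustrative).
TOOL_GROUPS = {
    "code": ["read_file", "edit_file", "run_tests"],
    "search": ["web_search", "fetch_url"],
}

def tools_for(task_type: str, registry: dict[str, dict]) -> list[dict]:
    """Return only the tool definitions this task type needs (~50-200 tokens each)."""
    return [registry[n] for n in TOOL_GROUPS.get(task_type, []) if n in registry]

# A coding task now ships 3 tool schemas instead of all 20:
# tools = tools_for("code", full_registry)
```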
Use Case
Apply this analysis when designing an agent product, debugging a higher-than-expected bill on a working agent, or deciding when to introduce summarization or sub-agents.
Related Topics
- Claude Prompt Caching: 80% Bill Reduction in One Setting (Caching & long context)
- Long-Context Costs: What 128K Tokens Actually Cost Per Call (Caching & long context)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)
- Code Generation Cost: Per-Function, Per-File, Per-PR (Workload patterns)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)