Long-Context Costs: What 128K Tokens Actually Cost Per Call
GPT-4o at 128K input is $0.32 per call. Claude Opus 4.7 at 200K is $3.00. Why long-context calls without caching are the single most expensive thing you can do.
Detailed Explanation
A Single 128K-Token Call
Take each model's maximum context length as the input and assume a 2,000-token completion:
| Model | Max context | Input cost (full context) | Output cost (2K tokens) | Total per call |
|---|---|---|---|---|
| GPT-4o | 128K | $0.32 | $0.02 | $0.34 |
| GPT-4.1 | 1.05M | $2.10 | $0.016 | $2.12 |
| Claude Opus 4.7 | 200K | $3.00 | $0.15 | $3.15 |
| Claude Sonnet 4.6 | 200K | $0.60 | $0.03 | $0.63 |
| Gemini 2.5 Pro | 2M | $2.50 | $0.02 | $2.52 |
A single Claude Opus call at full context is $3.15, more than the entire daily LLM bill of many small applications. Make 1,000 of those calls a day and you're at roughly $94,500/month before any growth in output length.
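A quick sanity check on those numbers, as a minimal sketch. The per-million rates are the list prices implied by the table above; verify them against the provider's current pricing page before budgeting:

```python
# Back-of-envelope check on the Claude Opus row above.
# Assumed list prices in USD per million tokens; verify against current pricing pages.
OPUS_INPUT_PER_M = 15.00
OPUS_OUTPUT_PER_M = 75.00

input_tokens = 200_000   # full Claude Opus context window
output_tokens = 2_000    # assumed completion length

per_call = (input_tokens * OPUS_INPUT_PER_M
            + output_tokens * OPUS_OUTPUT_PER_M) / 1_000_000
monthly = per_call * 1_000 * 30   # 1,000 calls/day for 30 days

print(f"per call:  ${per_call:.2f}")    # $3.15
print(f"per month: ${monthly:,.0f}")    # $94,500
```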
Why people do it anyway
Long context replaces retrieval engineering. Instead of:
- Chunking your codebase
- Embedding 50,000 chunks
- Running a top-k similarity search per query
- Packing the top 10 chunks into the prompt

you can simply:
- Stuff the entire 80K-token codebase into the prompt
- Ask a question
The trade-off is cost vs. complexity; the sketch after this list puts rough numbers on it. Long context is the right call when:
- The corpus is small (a single repo, single document, single contract).
- Recall matters more than precision (you need every relevant fact).
- The query is a one-off (no monthly bill multiplier).
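Here is a rough sketch of that trade-off. The chunk size, query volume, and input rate below are illustrative assumptions, not measurements, and the retrieval side ignores embedding and infrastructure costs (which is exactly the complexity long context avoids):

```python
# Hypothetical generation-side input cost: retrieval-packed prompts vs. full-corpus stuffing.
# All figures are assumptions for illustration.
INPUT_PER_M = 15.00          # assumed input rate, USD per million tokens (Opus-class)
CHUNK_TOKENS = 500           # assumed chunk size
TOP_K = 10                   # chunks packed into each prompt
CORPUS_TOKENS = 80_000       # tokens stuffed per prompt in the long-context approach
QUERIES_PER_MONTH = 30_000   # assumed volume: 1,000 queries/day

rag_input = TOP_K * CHUNK_TOKENS * QUERIES_PER_MONTH * INPUT_PER_M / 1_000_000
stuff_input = CORPUS_TOKENS * QUERIES_PER_MONTH * INPUT_PER_M / 1_000_000

print(f"retrieval-packed prompts: ${rag_input:,.0f}/month")    # $2,250
print(f"full-corpus stuffing:     ${stuff_input:,.0f}/month")  # $36,000
```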
The cache-it-or-die rule
If you're going to put 80K+ tokens in a prompt, you must cache it. Without caching, repeated long-context calls pay the full input price every time; with Anthropic prompt caching, reads of the cached prefix after the first call are billed at roughly 0.1x the base input rate. The example in Claude Prompt Caching shows the exact numbers.
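A minimal sketch of that scaling, assuming cache writes at roughly 1.25x the base input rate and cache reads at 0.1x, in line with Anthropic's published cache pricing (check the current pricing page, since multipliers and cache TTLs change):

```python
# Cost of N calls that reuse the same 80K-token prefix, with and without caching.
# Multipliers are assumptions based on Anthropic's published cache pricing.
INPUT_PER_M = 15.00       # assumed base input rate, USD per million tokens
CACHE_WRITE_MULT = 1.25   # first call writes the prefix to the cache
CACHE_READ_MULT = 0.10    # later calls read the cached prefix

def repeated_calls_cost(prefix_tokens: int, n_calls: int, cached: bool) -> float:
    base = prefix_tokens * INPUT_PER_M / 1_000_000
    if not cached:
        return n_calls * base
    return base * CACHE_WRITE_MULT + (n_calls - 1) * base * CACHE_READ_MULT

print(f"${repeated_calls_cost(80_000, 100, cached=False):.2f}")  # $120.00
print(f"${repeated_calls_cost(80_000, 100, cached=True):.2f}")   # $13.38
```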
Context-window pricing tiers
Both Gemini 2.5 Pro and Claude have charged higher rates above a token threshold (Gemini 2.5 Pro: prompts over 200K tokens are billed at roughly 2x the base rate; Claude has had similar long-context tiers). Always check the provider's pricing page when planning very long calls.
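A minimal sketch of how such a tier changes the math, assuming a 200K-token threshold, a 2x multiplier, and that the whole prompt is billed at the higher rate once it crosses the threshold (providers differ on whether the surcharge applies to the whole prompt or only the excess, so treat these as placeholders):

```python
# Input cost under a two-tier rate schedule (illustrative numbers only).
BASE_RATE_PER_M = 1.25    # assumed rate below the threshold, USD per million tokens
TIER_THRESHOLD = 200_000  # assumed token threshold
TIER_MULTIPLIER = 2.0     # assumed multiplier above the threshold

def tiered_input_cost(input_tokens: int) -> float:
    # Assumed scheme: crossing the threshold reprices the entire prompt.
    rate = BASE_RATE_PER_M * (TIER_MULTIPLIER if input_tokens > TIER_THRESHOLD else 1.0)
    return input_tokens * rate / 1_000_000

print(f"${tiered_input_cost(150_000):.2f}")    # $0.19 at the base rate
print(f"${tiered_input_cost(1_000_000):.2f}")  # $2.50 at the 2x rate
```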
Use Case
Run these numbers before deciding to put a full PDF, codebase, or contract into a single prompt. If the answer is still yes, follow up with prompt caching to make the bill survivable.
Try It — Prompt Token Cost Calculator
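A minimal sketch of such a calculator. The per-million prices are the list rates implied by the table above; swap in your provider's current numbers before trusting the output:

```python
# Prompt token cost calculator: per-call and monthly cost from token counts.
# Prices in USD per million tokens, matching the assumptions behind the table above.
PRICES = {
    "gpt-4o":            {"input": 2.50,  "output": 10.00},
    "gpt-4.1":           {"input": 2.00,  "output": 8.00},
    "claude-opus-4.7":   {"input": 15.00, "output": 75.00},
    "claude-sonnet-4.6": {"input": 3.00,  "output": 15.00},
    "gemini-2.5-pro":    {"input": 1.25,  "output": 10.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 calls_per_day: int, days: int = 30) -> float:
    return call_cost(model, input_tokens, output_tokens) * calls_per_day * days

print(f"${call_cost('gpt-4.1', 1_050_000, 2_000):.2f}")          # $2.12, matching the table
print(f"${monthly_cost('gpt-4o', 128_000, 2_000, 1_000):,.0f}")  # $10,200 for 1,000 calls/day
```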
Related Topics
- Claude Prompt Caching: 80% Bill Reduction in One Setting (Caching & long context)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)
- Agent Loops: Why a 'Simple' Task Costs 50K Tokens (Caching & long context)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)