Long-Context Costs: What 128K Tokens Actually Cost Per Call
GPT-4o at 128K input is $0.32 per call. Claude Opus 4.7 at 200K is $3.00. Why long-context calls without caching are the single most expensive thing you can do.
Detailed Explanation
A Single 128K-Token Call
Take each model's maximum context length as the input and assume a 2,000-token completion:
| Model | Max context | Input cost (full context) | Output cost (2K tokens) | Total per call |
|---|---|---|---|---|
| GPT-4o | 128K | $0.32 | $0.02 | $0.34 |
| GPT-4.1 | 1.05M | $2.10 | $0.016 | $2.12 |
| Claude Opus 4.7 | 200K | $3.00 | $0.15 | $3.15 |
| Claude Sonnet 4.6 | 200K | $0.60 | $0.03 | $0.63 |
| Gemini 2.5 Pro | 2M | $2.50 | $0.02 | $2.52 |
A single Claude Opus call at full context is $3.15, more than the entire daily LLM bill of many small applications. Make 1,000 of those calls a day and you're at roughly $94,500/month before any growth in output length.
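A quick sanity check on those numbers, as a minimal sketch. The per-million rates are the list prices implied by the table above; verify them against the provider's current pricing page before budgeting:

```python
# Back-of-envelope check on the Claude Opus row above.
# Assumed list prices in USD per million tokens; verify against current pricing pages.
OPUS_INPUT_PER_M = 15.00
OPUS_OUTPUT_PER_M = 75.00

input_tokens = 200_000   # full Claude Opus context window
output_tokens = 2_000    # assumed completion length

per_call = (input_tokens * OPUS_INPUT_PER_M
            + output_tokens * OPUS_OUTPUT_PER_M) / 1_000_000
monthly = per_call * 1_000 * 30   # 1,000 calls/day for 30 days

print(f"per call:  ${per_call:.2f}")    # $3.15
print(f"per month: ${monthly:,.0f}")    # $94,500
```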
Why people do it anyway
Long context replaces retrieval engineering. Instead of:
- Chunking your codebase
- Embedding 50,000 chunks
- Running a top-k similarity search per query
- Packing the top 10 chunks into the prompt

you can simply:
- Stuff the entire 80K-token codebase into the prompt
- Ask a question
The trade-off is cost vs. complexity; the sketch after this list puts rough numbers on it. Long context is the right call when:
- The corpus is small (a single repo, single document, single contract).
- Recall matters more than precision (you need every relevant fact).
- The query is a one-off (no monthly bill multiplier).
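Here is a rough sketch of that trade-off. The chunk size, query volume, and input rate below are illustrative assumptions, not measurements, and the retrieval side ignores embedding and infrastructure costs (which is exactly the complexity long context avoids):

```python
# Hypothetical generation-side input cost: retrieval-packed prompts vs. full-corpus stuffing.
# All figures are assumptions for illustration.
INPUT_PER_M = 15.00          # assumed input rate, USD per million tokens (Opus-class)
CHUNK_TOKENS = 500           # assumed chunk size
TOP_K = 10                   # chunks packed into each prompt
CORPUS_TOKENS = 80_000       # tokens stuffed per prompt in the long-context approach
QUERIES_PER_MONTH = 30_000   # assumed volume: 1,000 queries/day

rag_input = TOP_K * CHUNK_TOKENS * QUERIES_PER_MONTH * INPUT_PER_M / 1_000_000
stuff_input = CORPUS_TOKENS * QUERIES_PER_MONTH * INPUT_PER_M / 1_000_000

print(f"retrieval-packed prompts: ${rag_input:,.0f}/month")    # $2,250
print(f"full-corpus stuffing:     ${stuff_input:,.0f}/month")  # $36,000
```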
The cache-it-or-die rule
If you're going to put 80K+ tokens in a prompt, you must cache it. Without caching, repeated long-context calls pay the full input price every time; with Anthropic prompt caching, reads of the cached prefix after the first call are billed at roughly 0.1x the base input rate. The example in Claude Prompt Caching shows the exact numbers.
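A minimal sketch of that scaling, assuming cache writes at roughly 1.25x the base input rate and cache reads at 0.1x, in line with Anthropic's published cache pricing (check the current pricing page, since multipliers and cache TTLs change):

```python
# Cost of N calls that reuse the same 80K-token prefix, with and without caching.
# Multipliers are assumptions based on Anthropic's published cache pricing.
INPUT_PER_M = 15.00       # assumed base input rate, USD per million tokens
CACHE_WRITE_MULT = 1.25   # first call writes the prefix to the cache
CACHE_READ_MULT = 0.10    # later calls read the cached prefix

def repeated_calls_cost(prefix_tokens: int, n_calls: int, cached: bool) -> float:
    base = prefix_tokens * INPUT_PER_M / 1_000_000
    if not cached:
        return n_calls * base
    return base * CACHE_WRITE_MULT + (n_calls - 1) * base * CACHE_READ_MULT

print(f"${repeated_calls_cost(80_000, 100, cached=False):.2f}")  # $120.00
print(f"${repeated_calls_cost(80_000, 100, cached=True):.2f}")   # $13.38
```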
Context-window pricing tiers
Both Gemini 2.5 Pro and Claude have charged higher rates above a token threshold (Gemini 2.5 Pro: prompts over 200K tokens are billed at roughly 2x the base rate; Claude has had similar long-context tiers). Always check the provider's pricing page when planning very long calls.
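A minimal sketch of how such a tier changes the math, assuming a 200K-token threshold, a 2x multiplier, and that the whole prompt is billed at the higher rate once it crosses the threshold (providers differ on whether the surcharge applies to the whole prompt or only the excess, so treat these as placeholders):

```python
# Input cost under a two-tier rate schedule (illustrative numbers only).
BASE_RATE_PER_M = 1.25    # assumed rate below the threshold, USD per million tokens
TIER_THRESHOLD = 200_000  # assumed token threshold
TIER_MULTIPLIER = 2.0     # assumed multiplier above the threshold

def tiered_input_cost(input_tokens: int) -> float:
    # Assumed scheme: crossing the threshold reprices the entire prompt.
    rate = BASE_RATE_PER_M * (TIER_MULTIPLIER if input_tokens > TIER_THRESHOLD else 1.0)
    return input_tokens * rate / 1_000_000

print(f"${tiered_input_cost(150_000):.2f}")    # $0.19 at the base rate
print(f"${tiered_input_cost(1_000_000):.2f}")  # $2.50 at the 2x rate
```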
Use Case
Run these numbers before deciding to put a full PDF, codebase, or contract into a single prompt. If the answer is still yes, follow up with prompt caching to make the bill survivable.
Try It — Prompt Token Cost Calculator
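A minimal sketch of such a calculator. The per-million prices are the list rates implied by the table above; swap in your provider's current numbers before trusting the output:

```python
# Prompt token cost calculator: per-call and monthly cost from token counts.
# Prices in USD per million tokens, matching the assumptions behind the table above.
PRICES = {
    "gpt-4o":            {"input": 2.50,  "output": 10.00},
    "gpt-4.1":           {"input": 2.00,  "output": 8.00},
    "claude-opus-4.7":   {"input": 15.00, "output": 75.00},
    "claude-sonnet-4.6": {"input": 3.00,  "output": 15.00},
    "gemini-2.5-pro":    {"input": 1.25,  "output": 10.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def monthly_cost(model: str, input_tokens: int, output_tokens: int,
                 calls_per_day: int, days: int = 30) -> float:
    return call_cost(model, input_tokens, output_tokens) * calls_per_day * days

print(f"${call_cost('gpt-4.1', 1_050_000, 2_000):.2f}")          # $2.12, matching the table
print(f"${monthly_cost('gpt-4o', 128_000, 2_000, 1_000):,.0f}")  # $10,200 for 1,000 calls/day
```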
Related Topics
- Claude Prompt Caching: 80% Bill Reduction in One Setting (Caching & long context)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)
- Agent Loops: Why a 'Simple' Task Costs 50K Tokens (Caching & long context)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)