Long-Context Costs: What 128K Tokens Actually Cost Per Call

GPT-4o at 128K input is $0.32 per call. Claude Opus 4.7 at 200K is $3.00. Why long-context calls without caching are the single most expensive thing you can do.

Caching & long context

Detailed Explanation

A Single 128K-Token Call

Take the maximum context length on each model and assume a 2,000-token completion:

| Model | Context | Input cost | Output cost | Total |
|---|---|---|---|---|
| GPT-4o | 128K | $0.32 | $0.02 | $0.34 |
| GPT-4.1 | 1.05M | $2.10 | $0.016 | $2.12 |
| Claude Opus 4.7 | 200K | $3.00 | $0.15 | $3.15 |
| Claude Sonnet 4.6 | 200K | $0.60 | $0.03 | $0.63 |
| Gemini 2.5 Pro | 2M | $2.50 | $0.02 | $2.52 |

A single Claude Opus call at full context is $3.15 — more than the entire daily LLM bill of many small applications. Make 1,000 of those a day and you're at $94K/month before any output growth.
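The table's arithmetic is just tokens times per-million rates. A minimal sanity check, using the rates implied by the figures above (these are illustrative, not live pricing):

```python
def call_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    """Per-call cost in dollars; rates are in $ per million tokens."""
    return (input_tokens * in_rate_per_m + output_tokens * out_rate_per_m) / 1e6

# Claude Opus at full 200K context with a 2,000-token completion,
# assuming $15/M input and $75/M output (the rates the table implies)
opus = call_cost(200_000, 2_000, 15.00, 75.00)
print(f"per call:  ${opus:.2f}")                 # $3.15
print(f"per month: ${opus * 1_000 * 30:,.0f}")   # 1,000 calls/day -> $94,500
```

The same helper reproduces every row of the table if you swap in each model's rates.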

Why people do it anyway

Long context replaces retrieval engineering. Instead of:

  1. Chunk your codebase
  2. Embed 50,000 chunks
  3. Run a top-k similarity search per query
  4. Pack the top 10 chunks into the prompt

You can simply:

  1. Stuff the entire 80K-token codebase into the prompt
  2. Ask a question
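The price of that simplicity is the input-token gap between the two pipelines. A rough comparison, assuming ten ~500-token retrieved chunks, a 1,000-token question/system prompt, and the $2.50/M GPT-4o-class input rate used above:

```python
IN_RATE = 2.50 / 1e6  # assumed $ per input token (GPT-4o-class pricing)

rag_tokens = 10 * 500 + 1_000   # top-10 chunks + question/system prompt
full_tokens = 80_000 + 1_000    # entire codebase + question

print(f"RAG prompt:   {rag_tokens:6,} tokens -> ${rag_tokens * IN_RATE:.4f}")
print(f"Full context: {full_tokens:6,} tokens -> ${full_tokens * IN_RATE:.4f}")
print(f"Ratio: {full_tokens / rag_tokens:.1f}x")
```

At these assumed sizes, stuffing the whole codebase costs about 13.5x more per query than retrieval, every single call.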

The trade-off is cost vs. complexity. Long context is right when:

  • The corpus is small (a single repo, single document, single contract).
  • Recall matters more than precision (you need every relevant fact).
  • The query is a one-off (no monthly bill multiplier).

The cache-it-or-die rule

If you're going to put 80K+ tokens in a prompt, you must cache it. Without caching, repeated long-context calls scale linearly in cost; with Anthropic prompt caching, cache reads bill at 0.1x the base input rate, so repeat calls over the same prefix cost roughly a tenth as much. The example in Claude Prompt Caching shows the exact numbers.
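To see why the rule matters, compare N repeated calls over the same 80K-token prefix, using Anthropic's published multipliers (cache writes at 1.25x base input, cache reads at 0.1x) and the assumed $15/M Opus-class input rate. Completion cost is identical either way, so it's omitted:

```python
BASE = 15.00 / 1e6   # assumed $ per input token (Opus-class rate)
PREFIX = 80_000      # shared prompt prefix, tokens
N = 1_000            # repeated calls (within the cache's TTL)

# No caching: every call pays full input price on the whole prefix.
uncached = N * PREFIX * BASE

# With caching: one write at 1.25x, then N-1 reads at 0.1x.
cached = PREFIX * BASE * 1.25 + (N - 1) * PREFIX * BASE * 0.1

print(f"uncached: ${uncached:,.2f}")   # scales linearly with N
print(f"cached:   ${cached:,.2f}")     # ~0.1x after the first write
```

Note the sketch assumes every repeat call lands within the cache's time-to-live; calls that miss the cache pay the write price again.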

Context-window pricing tiers

Both Gemini 2.5 Pro and Claude charge more above a context threshold: Gemini 2.5 Pro bills input at 2x the base rate once a prompt exceeds 200K tokens, and Claude has offered similar tiers. Always check the provider's pricing page when planning very long calls.
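A sketch of how such a tier behaves, assuming Gemini-style rules where the higher rate applies to the whole prompt once it crosses the threshold, not just the marginal tokens (rates here are illustrative):

```python
def tiered_input_cost(tokens, base=1.25, long_rate=2.50, threshold=200_000):
    """Input cost in dollars; rates in $/M tokens (illustrative figures)."""
    # Gemini-style tier: the higher rate applies to the ENTIRE prompt
    # once it exceeds the threshold, so the boundary is a price cliff.
    rate = long_rate if tokens > threshold else base
    return tokens * rate / 1e6

print(f"199K tokens: ${tiered_input_cost(199_000):.5f}")
print(f"201K tokens: ${tiered_input_cost(201_000):.5f}")
```

A prompt 2K tokens over the threshold costs roughly double one just under it, which is worth checking before you pad a prompt across the boundary.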

Use Case

Use this before deciding to put a full PDF, codebase, or contract into a single prompt. When the answer is yes, follow up with prompt caching to make the bill survivable.

Try It: Prompt Token Cost Calculator
