Summarization Cost: Map-Reduce vs Single-Call vs Streaming
A 100,000-token document summarized to 1,000 tokens costs $0.26 as a single GPT-4o call, or about $0.04 with map-reduce (GPT-4o mini for the map step, GPT-4o for the reduce). The right architecture matters.
Detailed Explanation
Three Architectures
Cost of summarizing a 100K-token document to a 1K-token summary under three architectures (GPT-4o: $2.50/$10 per 1M input/output tokens; GPT-4o mini: $0.15/$0.60):
1. Single-call (entire doc → summary)
- Input: 100K × $2.50/1M = $0.250
- Output: 1K × $10/1M = $0.010
- Total: $0.260
Constraint: the document must fit in the model's context window. GPT-4o handles 100K comfortably (128K limit), as does Claude (200K limit). Larger documents must be chunked.
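The single-call arithmetic above can be sketched as a small helper (prices hard-coded from the GPT-4o figures in this section; the function name is illustrative):

```python
# Single-call summarization cost, using the GPT-4o prices quoted above:
# $2.50 per 1M input tokens, $10 per 1M output tokens.
def single_call_cost(input_tokens: int, output_tokens: int,
                     in_price_per_m: float = 2.50,
                     out_price_per_m: float = 10.0) -> float:
    """Return the API cost in dollars for one summarization call."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

print(single_call_cost(100_000, 1_000))  # 0.26
```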
2. Map-reduce (chunk → per-chunk summary → final summary)
Split into 20 chunks of 5K tokens each, summarize each to 200 tokens, then summarize the 20 summaries:
- Map step: 20 × (5K input + 200 output) on GPT-4o mini:
- Input: 100K × $0.15/1M = $0.015
- Output: 4K × $0.60/1M = $0.0024
- Reduce step: 1 × (4K input + 1K output) on GPT-4o:
- Input: 4K × $2.50/1M = $0.010
- Output: 1K × $10/1M = $0.010
- Total: $0.037 (~7x cheaper than the single-call's $0.26).
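The map-reduce arithmetic can be reproduced in a few lines (GPT-4o mini for the map step, GPT-4o for the reduce, prices per 1M tokens as quoted in this section; names and defaults are illustrative):

```python
# Prices per 1M tokens, as quoted above.
MINI_IN, MINI_OUT = 0.15, 0.60    # GPT-4o mini (map step)
GPT4O_IN, GPT4O_OUT = 2.50, 10.0  # GPT-4o (reduce step)

def map_reduce_cost(doc_tokens: int = 100_000, chunk: int = 5_000,
                    chunk_summary: int = 200, final_summary: int = 1_000) -> float:
    """Cost in dollars: summarize each chunk on mini, then reduce on GPT-4o."""
    n_chunks = doc_tokens // chunk
    # Map: the whole document goes in, one short summary per chunk comes out.
    map_cost = (doc_tokens * MINI_IN + n_chunks * chunk_summary * MINI_OUT) / 1e6
    # Reduce: the concatenated chunk summaries go in, the final summary comes out.
    reduce_in = n_chunks * chunk_summary
    reduce_cost = (reduce_in * GPT4O_IN + final_summary * GPT4O_OUT) / 1e6
    return map_cost + reduce_cost

print(round(map_reduce_cost(), 4))  # 0.0374
```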
3. Streaming with rolling window
Process the document sequentially, maintaining a 2,000-token rolling summary that gets updated with each new 5,000-token chunk:
- 20 chunks × (2K rolling + 5K new + 2K updated rolling) on GPT-4o mini:
- Input: 20 × 7K × $0.15/1M = $0.021
- Output: 20 × 2K × $0.60/1M = $0.024
- Total: $0.045
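The rolling-window arithmetic works out the same way: each step feeds the current 2K-token rolling summary plus a new 5K chunk into GPT-4o mini and emits an updated 2K summary. A minimal sketch (names and defaults are illustrative):

```python
def streaming_cost(doc_tokens: int = 100_000, chunk: int = 5_000,
                   rolling: int = 2_000,
                   in_price: float = 0.15, out_price: float = 0.60) -> float:
    """Cost in dollars for rolling-window summarization on GPT-4o mini."""
    n_chunks = doc_tokens // chunk
    # Each call reads the rolling summary plus one new chunk...
    input_tokens = n_chunks * (rolling + chunk)
    # ...and writes a fresh rolling summary.
    output_tokens = n_chunks * rolling
    return (input_tokens * in_price + output_tokens * out_price) / 1e6

print(round(streaming_cost(), 4))  # 0.045
```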
Quality trade-offs
- Single-call: best quality, captures cross-document patterns the model can see at once.
- Map-reduce: ~85% of single-call quality, parallelizable (do all 20 map calls concurrently).
- Streaming: ~75% of single-call quality, useful when documents arrive incrementally.
Decision matrix
| Document size | Best architecture | Notes |
|---|---|---|
| < 50K tokens | Single-call | Cost is a rounding error |
| 50K-500K | Map-reduce | Best cost/quality ratio |
| > 500K | Streaming | Avoid context limits |
| Live streams | Streaming | Process as it arrives |
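The decision matrix above can be expressed as a tiny routing function (thresholds taken from the table; the function name is illustrative):

```python
def pick_architecture(doc_tokens: int, live_stream: bool = False) -> str:
    """Map document size to the summarization architecture from the table."""
    if live_stream:
        return "streaming"    # process as it arrives
    if doc_tokens < 50_000:
        return "single-call"  # cost is a rounding error
    if doc_tokens <= 500_000:
        return "map-reduce"   # best cost/quality ratio
    return "streaming"        # avoid context limits

print(pick_architecture(100_000))  # map-reduce
```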
Use Case
Use when designing a summarization feature: document analysis, meeting transcript summary, multi-document research synthesis, news aggregation, customer feedback distillation.
Try It — Prompt Token Cost Calculator
Related Topics
Long-Context Costs: What 128K Tokens Actually Cost Per Call
Caching & long context
RAG Pipeline Cost: Embedding + Retrieval + Generation
Workload patterns
Translation Task Cost: GPT-4o vs DeepL vs Google Translate
Workload patterns
Batch Processing: 50% Off via OpenAI / Anthropic Batch APIs
Operational
Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill
Operational