Summarization Cost: Map-Reduce vs Single-Call vs Streaming

A 100,000-token document summarized to 1,000 tokens costs $0.26 single-call on GPT-4o, or about $0.04 with map-reduce that runs the map step on GPT-4o mini and the reduce step on GPT-4o. The choice of architecture changes the bill by roughly 7x.

Detailed Explanation

Three Architectures

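Each example below summarizes a 100K-token document to a roughly 1K-token summary, using these rates: GPT-4o at $2.50/1M input and $10/1M output tokens, GPT-4o mini at $0.15/1M input and $0.60/1M output tokens.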

1. Single-call (entire doc → summary)

  • Input: 100K × $2.50/1M = $0.250
  • Output: 1K × $10/1M = $0.010
  • Total: $0.260

Constraint: the document must fit in the model's context window. At 100K tokens it fits GPT-4o (128K limit) and Claude (200K limit). Larger documents must be chunked.
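The single-call arithmetic and the context-window check can be sketched in a few lines of Python. The prices are the ones quoted in this article; the names `PRICES`, `call_cost`, and `fits_context` are illustrative and not part of any SDK, and the fit check simply assumes input and output tokens share one window.

```python
# USD per 1M tokens, copied from the figures used in this article.
PRICES = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def call_cost(model, input_tokens, output_tokens):
    """Cost in USD of one request at the per-million-token rates above."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def fits_context(input_tokens, output_tokens, context_limit):
    """Assumption: prompt and completion both count against the same window."""
    return input_tokens + output_tokens <= context_limit


single_call = call_cost("gpt-4o", 100_000, 1_000)
print(f"single-call: ${single_call:.3f}")
print("fits 128K window:", fits_context(100_000, 1_000, 128_000))
```

Swapping in a different model only requires another entry in `PRICES`, which makes it easy to re-run the comparison when rates change.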

2. Map-reduce (chunk → per-chunk summary → final summary)

Split into 20 chunks of 5K tokens each, summarize each to 200 tokens, then summarize the 20 summaries:

  • Map step: 20 × (5K input + 200 output) on GPT-4o mini:
    • Input: 100K × $0.15/1M = $0.015
    • Output: 4K × $0.60/1M = $0.0024
  • Reduce step: 1 × (4K input + 1K output) on GPT-4o:
    • Input: 4K × $2.50/1M = $0.010
    • Output: 1K × $10/1M = $0.010

Total: $0.015 + $0.0024 + $0.010 + $0.010 = $0.037, about 7x cheaper than single-call.
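To see how the total moves with chunk size or per-chunk summary length, the map-reduce arithmetic above can be parameterized. This is a minimal sketch that hard-codes the article's rates (GPT-4o mini for the map step, GPT-4o for the reduce step); the function name `map_reduce_cost` and its parameters are illustrative.

```python
import math


def map_reduce_cost(doc_tokens, chunk_tokens, chunk_summary_tokens, final_tokens):
    """Return (n_chunks, map_cost, reduce_cost, total) in USD."""
    n_chunks = math.ceil(doc_tokens / chunk_tokens)
    map_output = n_chunks * chunk_summary_tokens  # all per-chunk summaries

    # Map step on GPT-4o mini: $0.15/1M input, $0.60/1M output.
    map_cost = (doc_tokens * 0.15 + map_output * 0.60) / 1_000_000
    # Reduce step on GPT-4o: $2.50/1M input, $10/1M output.
    # Its input is the concatenated per-chunk summaries.
    reduce_cost = (map_output * 2.50 + final_tokens * 10.00) / 1_000_000

    return n_chunks, map_cost, reduce_cost, map_cost + reduce_cost


n_chunks, map_cost, reduce_cost, total = map_reduce_cost(100_000, 5_000, 200, 1_000)
print(f"{n_chunks} chunks: map ${map_cost:.4f} + reduce ${reduce_cost:.4f} = ${total:.4f}")
```

Note that the reduce step, despite being a single call, accounts for more than half the total because it runs on the more expensive model.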

3. Streaming with rolling window

Process the document sequentially, maintaining a 2,000-token rolling summary that gets updated with each new 5,000-token chunk:

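  • 20 calls on GPT-4o mini, each reading the 2K rolling summary plus a 5K new chunk and writing a 2K updated summary:
    • Input: 20 × 7K × $0.15/1M = $0.021
    • Output: 20 × 2K × $0.60/1M = $0.024
  • Total: $0.045 (a slight overestimate, since the first call has no prior summary to read)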
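The rolling-window loop is essentially a fold over the chunks. The sketch below shows that control flow and reproduces the cost figure; `update` stands in for a model call and is an assumption, as is the toy `keep_last_words` stub used here so the loop can run without an API.

```python
from typing import Callable, Iterable


def rolling_summarize(chunks: Iterable[str], update: Callable[[str, str], str]) -> str:
    """Fold chunks into one summary; `update(summary, chunk)` is a placeholder
    for a model call that returns the refreshed rolling summary."""
    summary = ""  # nothing has been read yet
    for chunk in chunks:
        summary = update(summary, chunk)
    return summary


def streaming_cost(n_chunks, rolling_tokens, chunk_tokens, in_price=0.15, out_price=0.60):
    """USD cost assuming every call reads rolling + chunk and writes rolling tokens.
    Default prices are the GPT-4o mini rates quoted in this article."""
    input_tokens = n_chunks * (rolling_tokens + chunk_tokens)
    output_tokens = n_chunks * rolling_tokens
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def keep_last_words(summary, chunk, max_words=5):
    """Toy stand-in for a model: keeps only the most recent words."""
    words = (summary + " " + chunk).split()
    return " ".join(words[-max_words:])


print(rolling_summarize(["a b c", "d e f", "g h"], keep_last_words))
print(f"streaming: ${streaming_cost(20, 2_000, 5_000):.3f}")
```

Because each call depends on the previous summary, the calls cannot run concurrently, unlike the map step in map-reduce.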

Quality trade-offs

  • Single-call: best quality, captures cross-document patterns the model can see at once.
  • Map-reduce: ~85% of single-call quality, parallelizable (do all 20 map calls concurrently).
  • Streaming: ~75% of single-call quality, useful when documents arrive incrementally.
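The parallelism mentioned for map-reduce could look like the following sketch, which uses a thread pool from the Python standard library. `summarize_chunk` and `combine` are hypothetical callables standing in for the per-chunk and final model calls; the lambdas in the usage lines are toy stubs, not real summarizers.

```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce(chunks, summarize_chunk, combine, max_workers=8):
    """Run the map step concurrently, then reduce the partial summaries.

    `summarize_chunk(chunk) -> str` and `combine(list_of_str) -> str` are
    placeholders for model calls supplied by the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so the partial
        # summaries stay aligned with the document's chunk order.
        partials = list(pool.map(summarize_chunk, chunks))
    return combine(partials)


# Toy stubs: "summarize" a chunk by its first word, "combine" by joining.
result = map_reduce(
    ["alpha beta", "gamma delta", "epsilon zeta"],
    summarize_chunk=lambda chunk: chunk.split()[0],
    combine=lambda parts: " ".join(parts),
)
print(result)
```

Threads suit this workload because the map calls are network-bound; in practice `max_workers` would also need to respect the provider's rate limits.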

Decision matrix

Document size   | Best architecture | Notes
< 50K tokens    | Single-call       | Cost is a rounding error
50K-500K tokens | Map-reduce        | Best cost/quality ratio
> 500K tokens   | Streaming         | Avoid context limits
Live streams    | Streaming         | Process as it arrives
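The matrix translates directly into a small routing function. This is a sketch of the thresholds above only; the function name `choose_architecture` and the `live` flag are illustrative, and the boundaries at exactly 50K and 500K are assigned to map-reduce following the "50K-500K" row.

```python
def choose_architecture(doc_tokens, live=False):
    """Pick an architecture using the thresholds from the decision matrix."""
    if live:
        return "streaming"  # content arrives incrementally
    if doc_tokens < 50_000:
        return "single-call"  # cost is a rounding error
    if doc_tokens <= 500_000:
        return "map-reduce"  # best cost/quality ratio
    return "streaming"  # avoid context limits


for size in (10_000, 100_000, 2_000_000):
    print(size, "->", choose_architecture(size))
```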

Use Case

Use when designing a summarization feature: document analysis, meeting transcript summary, multi-document research synthesis, news aggregation, customer feedback distillation.
