Code Generation Cost: Per-Function, Per-File, Per-PR

Generating one function costs ~$0.01. Generating an entire file costs ~$0.40. Generating a multi-file PR costs $2-$8. That is the cost ladder for AI-assisted coding.

Detailed Explanation

The Three Tiers of Code Generation Cost

Per-function generation (~50 lines of code)

Typical context: 200-line file as context, generate one function (50 lines).

  • Input: ~3,000 tokens (file + system prompt) on GPT-4o
  • Output: ~500 tokens (the function)
  • Cost: 3K × $2.50/1M + 500 × $10/1M = $0.0125

At 100 functions per day per developer, that's $1.25/day — well under the cost of a coffee. For a 50-engineer team, $62.50/day or $1,875/month.
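The per-function arithmetic above can be checked with a few lines of Python. The prices are the GPT-4o rates quoted in the text; the token counts and the 100-functions-per-day figure are the same estimates:

```python
# Per-function cost sketch using the GPT-4o rates quoted in the text.
INPUT_PRICE = 2.50    # $ per 1M input tokens (GPT-4o)
OUTPUT_PRICE = 10.00  # $ per 1M output tokens (GPT-4o)

def generation_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single generation call."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1e6

per_function = generation_cost(3_000, 500)
print(f"per function:     ${per_function:.4f}")                   # $0.0125
print(f"per dev per day:  ${per_function * 100:.2f}")             # $1.25
print(f"50-eng team / mo: ${per_function * 100 * 50 * 30:,.0f}")  # $1,875
```

Swap in another model's per-token rates to compare tiers with the same estimates.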

Per-file generation (~300 lines)

Generating a complete file from a description, with related files as context:

  • Input: ~10,000 tokens (3-5 related files + spec) on Claude Opus 4.7
  • Output: ~3,000 tokens (the new file)
  • Cost: 10K × $15/1M + 3K × $75/1M = $0.375

Three-file refactor: ~$1.50. For a complex feature involving 10 files: ~$5. (Costs grow slightly faster than linearly, since each generated file carries the shared context along with it.)
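The same kind of sketch covers the per-file tier, using the Claude Opus rates quoted above. Note the naive linear scaling here lands a little below the multi-file estimates in the text, which assume extra shared context per file:

```python
# Per-file cost sketch using the Claude Opus rates quoted in the text.
OPUS_IN = 15.00   # $ per 1M input tokens
OPUS_OUT = 75.00  # $ per 1M output tokens

def file_cost(input_tokens: int = 10_000, output_tokens: int = 3_000) -> float:
    """Dollar cost of generating one file from spec plus related-file context."""
    return (input_tokens * OPUS_IN + output_tokens * OPUS_OUT) / 1e6

print(f"one file: ${file_cost():.3f}")       # $0.375
print(f"10 files: ${file_cost() * 10:.2f}")  # $3.75 under naive linear scaling
```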

Per-PR generation (multi-file, with tests)

This is where serious agent products live. A PR typically involves:

  • Reading 10-30 files for context: ~50,000 tokens of input
  • Multiple LLM turns to plan, edit, test, and refine: ~5-15 turns
  • Generated code + tests + commit message: ~10,000 tokens output

Without caching: 50K × 10 turns × $15/1M + 10K × $75/1M = $8.25 per PR (Claude Opus). With caching (assume an 80% cache-hit rate on the file context, with cache reads billed at 0.1× the input price): input cost drops to ~$2.10, for a total of **~$2.85** per PR.
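The caching math can be verified with a small model. It assumes Anthropic-style cache reads billed at 0.1× the input price and, for simplicity, ignores the one-time cache-write surcharge, so the cached figure is an approximation:

```python
# Per-PR cost model with optional prompt caching.
# Assumption: cache reads billed at 0.1x the input price (Anthropic-style);
# the one-time cache-write surcharge is ignored for simplicity.
IN_PRICE = 15.00   # $ per 1M input tokens (Claude Opus)
OUT_PRICE = 75.00  # $ per 1M output tokens
CACHE_READ_PRICE = IN_PRICE * 0.1

def pr_cost(ctx_tokens: int = 50_000, turns: int = 10,
            out_tokens: int = 10_000, cache_hit: float = 0.0) -> float:
    """Dollar cost of one agent-generated PR under a flat cache-hit rate."""
    total_input = ctx_tokens * turns
    blended_rate = (1 - cache_hit) * IN_PRICE + cache_hit * CACHE_READ_PRICE
    return (total_input * blended_rate + out_tokens * OUT_PRICE) / 1e6

print(f"no caching: ${pr_cost():.2f}")               # $8.25
print(f"80% cached: ${pr_cost(cache_hit=0.8):.2f}")  # $2.85
```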

Why Claude Opus dominates here

Coding tasks are the workload where Opus's quality premium most reliably justifies the price gap vs GPT-4o. Fewer regressions, fewer compile errors, and closer adherence to codebase conventions all translate into less human cleanup.

For high-volume background tasks (auto-formatting, lint fixes, simple refactors), switch to GPT-4o mini or Claude Haiku 4.5 and the per-PR cost falls below $0.10.
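One way to wire up this tiering is a simple routing table keyed by task class. The task classes, mapping, and model identifier strings below are illustrative assumptions, not a prescribed API:

```python
# Illustrative routing table: the task classes and mapping are assumptions;
# model identifiers follow the tiers named in the text.
ROUTES = {
    "lint_fix": "gpt-4o-mini",         # high-volume background work
    "auto_format": "claude-haiku-4.5",
    "simple_refactor": "gpt-4o-mini",
    "multi_file_pr": "claude-opus",    # quality premium pays for itself
}

def pick_model(task_class: str) -> str:
    # Default to a mid-tier model when the task class is unrecognized.
    return ROUTES.get(task_class, "gpt-4o")
```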

Caching is non-negotiable

Code agents read the same files many times during a task. Without prompt caching, the bill on a 10-turn refactor is genuinely painful: Anthropic's cache reads at 0.1× the input price are the difference between roughly $8 and $3 per PR.

Use Case

Apply when sizing the budget for an internal coding-agent rollout, when deciding which model to use for which class of code task, or when justifying the cost of a developer-AI subscription.
