Fine-Tuning Cost: Training, Hosting, and Per-Token Inference

OpenAI fine-tuning costs $25/1M training tokens for GPT-4o mini, then 2x base price at inference. When does fine-tuning beat prompt engineering?

Operational

Detailed Explanation

Three Cost Components

1. Training cost

OpenAI training prices (per 1M tokens of training data):

Model               Training $/1M   Hosting $/day
GPT-4o mini         $25             $0 (no extra)
GPT-4o              $25             $0
GPT-3.5 Turbo       $8              $0

A typical fine-tuning dataset is 1,000-10,000 examples × ~500 tokens each = 0.5M-5M tokens. Cost: $12.50-$125 for one training run. You typically run 3-5 training runs to find the right hyperparameters, so budget roughly $37.50-$625 in total training cost.
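The arithmetic above can be sketched as a small helper. This is a rough estimate only; the $25/1M rate is the GPT-4o mini training price from the table, and real billing also depends on the number of epochs OpenAI runs over your data.

```python
# Rough training-cost estimate for an OpenAI fine-tune.
# Assumed rate: $25 per 1M training tokens (GPT-4o mini, per the table above).
TRAIN_RATE_PER_M = 25.00

def training_cost(n_examples: int, tokens_per_example: int, runs: int = 1) -> float:
    """Dollar cost for `runs` training passes over the dataset."""
    total_tokens = n_examples * tokens_per_example
    return runs * total_tokens / 1_000_000 * TRAIN_RATE_PER_M

print(training_cost(1_000, 500))      # 0.5M tokens, one run  -> 12.5
print(training_cost(10_000, 500, 5))  # 5M tokens, five runs  -> 625.0
```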

2. Inference cost markup

Fine-tuned models are billed at 2x the base model rate at inference time:

  • GPT-4o mini base: $0.15 input / $0.60 output per 1M
  • GPT-4o mini fine-tuned: $0.30 input / $1.20 output per 1M

For 10M tokens/month of inference (assuming a 50/50 input/output split):

  • Base GPT-4o mini: ~$3.75
  • Fine-tuned: ~$7.50 (+$3.75/month forever)
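These monthly figures follow from the per-token rates; a minimal sketch, assuming the GPT-4o mini prices above and a 50/50 input/output split (the split is an assumption, since the section does not state one):

```python
# Monthly inference cost at base vs fine-tuned (2x) per-token rates.
def monthly_cost(total_tokens: int, in_rate: float, out_rate: float,
                 input_share: float = 0.5) -> float:
    """Dollar cost per month; rates are $ per 1M tokens."""
    in_tok = total_tokens * input_share
    out_tok = total_tokens - in_tok
    return (in_tok * in_rate + out_tok * out_rate) / 1_000_000

base = monthly_cost(10_000_000, 0.15, 0.60)   # base GPT-4o mini  -> 3.75
tuned = monthly_cost(10_000_000, 0.30, 1.20)  # fine-tuned (2x)   -> 7.5
print(base, tuned, tuned - base)
```

Note the markup scales linearly with volume: the "+$3.75/month forever" only stays small if your token volume does.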

3. Break-even math vs prompt engineering

Fine-tuning saves you tokens only if you can shrink the prompt significantly afterward. The classic case:

  • Before: 2,000-token system prompt with 20 few-shot examples for classification.
  • After: 100-token system prompt; the model has internalized the task pattern.

Token savings per call: 1,900 tokens × $0.15/1M = $0.000285
Inference markup per call: 100 tokens × ($0.30 − $0.15)/1M = $0.000015

Net savings: $0.000270 per call. Break-even at ~230,000 calls to recover a $62.50 training run (2.5M training tokens × $25/1M).
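The break-even point above can be verified in a few lines, using the numbers from the worked example (a 1,900-token prompt reduction, 100 remaining prompt tokens, and a $62 training run):

```python
# Break-even: how many calls recover the one-time training cost?
saved_tokens = 1_900            # prompt tokens eliminated per call
remaining_tokens = 100          # prompt tokens still sent per call
base_in = 0.15 / 1_000_000      # base input $/token
markup = (0.30 - 0.15) / 1_000_000  # extra $/token on the fine-tuned model

net_per_call = saved_tokens * base_in - remaining_tokens * markup  # 0.000270
train_cost = 62.50

breakeven_calls = train_cost / net_per_call
print(round(breakeven_calls))   # ~231,481, i.e. roughly 230K calls
```

At 1M calls/month this pays back within the first week; at 100K calls/month it takes over two months, which is why the volume thresholds below matter.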

When fine-tuning makes sense

  • High volume + simple task: classification, intent detection, structured-data extraction at >1M calls/month.
  • Format consistency: when the base model occasionally drifts from a strict output schema and you need 99.9%+ adherence.
  • Latency: fine-tuned models with shorter prompts have lower TTFT (time to first token).

When prompt engineering wins

  • <100K calls/month: training cost recovery takes too long.
  • Frequently changing requirements: every spec change requires retraining.
  • Tasks that benefit from chain-of-thought: fine-tuning on terse target outputs can suppress verbose reasoning, hurting quality on complex tasks.

Anthropic / Gemini

Anthropic does not offer public fine-tuning for Claude models as of 2026. Gemini offers fine-tuning for Gemini 2.5 Flash with similar pricing structure to OpenAI.

Use Case

Apply when evaluating fine-tuning vs. prompt engineering for a production workload, when defending the cost of a fine-tuning project, or when planning the volume threshold at which fine-tuning becomes cost-effective.
