Fine-Tuning Cost: Training, Hosting, and Per-Token Inference
OpenAI fine-tuning costs $25/1M training tokens for GPT-4o mini, then 2x base price at inference. When does fine-tuning beat prompt engineering?
Detailed Explanation
Three Cost Components
1. Training cost
OpenAI training prices (per 1M tokens of training data):
| Model | Training $/1M | Hosting $/day |
|---|---|---|
| GPT-4o mini | $25 | $0 (no extra) |
| GPT-4o | $25 | $0 |
| GPT-3.5 Turbo | $8 | $0 |
A typical fine-tuning dataset is 1,000-10,000 examples × ~500 tokens each = 0.5M-5M tokens, so a single training run costs $12.50-$125. You typically need 3-5 runs to find the right hyperparameters, for a total training cost of roughly $37.50-$625.
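A quick sanity check of these figures as a sketch, using the $25/1M training rate from the table; the dataset sizes and run counts are the worked assumptions above, not fixed values:

```python
# Training-cost estimate for a fine-tuning project.
TRAIN_PRICE_PER_M = 25.0  # $ per 1M training tokens (GPT-4o mini rate above)

def training_cost(examples: int, tokens_per_example: int, runs: int) -> float:
    """Total dollars spent across all hyperparameter-search runs."""
    tokens_m = examples * tokens_per_example / 1_000_000
    return tokens_m * TRAIN_PRICE_PER_M * runs

# Small dataset, 3 runs: 0.5M tokens -> $12.50/run -> $37.50 total
low = training_cost(1_000, 500, 3)
# Large dataset, 5 runs: 5M tokens -> $125/run -> $625.00 total
high = training_cost(10_000, 500, 5)
print(f"${low:.2f} - ${high:.2f}")  # -> $37.50 - $625.00
```

Swap in your own dataset size and run count; the cost scales linearly in both.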
2. Inference cost markup
Fine-tuned models are billed at 2x the base model rate at inference time:
- GPT-4o mini base: $0.15 input / $0.60 output per 1M
- GPT-4o mini fine-tuned: $0.30 input / $1.20 output per 1M
For 10M tokens/month of inference (assuming a 50/50 input/output split):
- Base GPT-4o mini: ~$3.75
- Fine-tuned: ~$7.50 (an extra $3.75/month for the life of the deployment)
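The monthly figures above can be reproduced with a small helper. The 50/50 input/output split is an assumption made to match the $3.75/$7.50 numbers; adjust `input_share` to your real traffic mix:

```python
# Monthly inference cost at base vs fine-tuned (2x) GPT-4o mini rates.
BASE_IN, BASE_OUT = 0.15, 0.60  # $ per 1M input/output tokens

def monthly_cost(total_tokens_m: float, input_share: float = 0.5,
                 fine_tuned: bool = False) -> float:
    """Dollars per month for total_tokens_m million tokens of inference."""
    markup = 2.0 if fine_tuned else 1.0  # fine-tuned models bill at 2x
    in_m = total_tokens_m * input_share
    out_m = total_tokens_m - in_m
    return markup * (in_m * BASE_IN + out_m * BASE_OUT)

print(monthly_cost(10))                   # base: 3.75
print(monthly_cost(10, fine_tuned=True))  # fine-tuned: 7.5
```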
3. Break-even math vs prompt engineering
Fine-tuning saves you tokens only if you can shrink the prompt significantly afterward. The classic case:
- Before: 2,000-token system prompt with 20 few-shot examples for classification.
- After: 100-token system prompt; the model has internalized the task pattern.
Token savings per call: 1,900 tokens × $0.15/1M = $0.000285
Inference markup per call: 100 tokens × ($0.30 - $0.15)/1M = $0.000015
Net savings: $0.000270 per call. Break-even at ~230,000 calls to recover a $62 training run.
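The break-even arithmetic above as a minimal sketch; the 2,000/100-token prompt sizes and $62 training cost are the worked example's assumptions, and output-token costs are ignored for simplicity:

```python
# Break-even call count: fine-tuned short prompt vs base long prompt.
BASE_IN = 0.15  # $ per 1M input tokens, base GPT-4o mini
FT_IN = 0.30    # $ per 1M input tokens, fine-tuned (2x markup)

def break_even_calls(prompt_before: int, prompt_after: int,
                     training_cost: float) -> int:
    """Calls needed for per-call prompt savings to repay the training cost."""
    savings = (prompt_before - prompt_after) * BASE_IN / 1e6  # $ per call
    markup = prompt_after * (FT_IN - BASE_IN) / 1e6           # $ per call
    net = savings - markup                                    # $0.000270 here
    return round(training_cost / net)

print(break_even_calls(2_000, 100, 62.0))  # ~230,000 calls
```

At 1M+ calls/month this break-even point is reached within the first week, which is why the high-volume cases below favor fine-tuning.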
When fine-tuning makes sense
- High volume + simple task: classification, intent detection, structured-data extraction at >1M calls/month.
- Format consistency: when the base model occasionally drifts from a strict output schema and you need 99.9%+ adherence.
- Latency: fine-tuned models with shorter prompts have lower TTFT (time to first token).
When prompt engineering wins
- <100K calls/month: training cost recovery takes too long.
- Frequently changing requirements: every spec change requires retraining.
- Tasks that benefit from chain-of-thought: fine-tuning suppresses verbose reasoning, hurting complex-task quality.
Anthropic / Gemini
Anthropic does not offer public fine-tuning for Claude models as of 2026. Gemini offers fine-tuning for Gemini 2.5 Flash with a pricing structure similar to OpenAI's.
Use Case
Apply when evaluating fine-tuning vs. prompt engineering for a production workload, when defending the cost of a fine-tuning project, or when planning the volume threshold at which fine-tuning becomes cost-effective.
Related Topics
- Embedding Costs: text-embedding-3-small vs Cohere vs Voyage (Model comparison)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)
- Batch Processing: 50% Off via OpenAI / Anthropic Batch APIs (Operational)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)