Fine-Tuning Cost: Training, Hosting, and Per-Token Inference
OpenAI fine-tuning costs $25/1M training tokens for GPT-4o mini, then 2x base price at inference. When does fine-tuning beat prompt engineering?
Detailed Explanation
Three Cost Components
1. Training cost
OpenAI training prices (per 1M tokens of training data):
| Model | Training $/1M | Hosting $/day |
|---|---|---|
| GPT-4o mini | $25 | $0 (no extra) |
| GPT-4o | $25 | $0 |
| GPT-3.5 Turbo | $8 | $0 |
A typical fine-tuning dataset is 1,000-10,000 examples × ~500 tokens each = 0.5M-5M tokens, so a single training run costs $12.50-$125. You typically need 3-5 runs to find the right hyperparameters, for a total training cost of roughly $37.50-$625.
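A quick sanity check of these figures as a sketch, using the $25/1M training rate from the table; the dataset sizes and run counts are the worked assumptions above, not fixed values:

```python
# Training-cost estimate for a fine-tuning project.
TRAIN_PRICE_PER_M = 25.0  # $ per 1M training tokens (GPT-4o mini rate above)

def training_cost(examples: int, tokens_per_example: int, runs: int) -> float:
    """Total dollars spent across all hyperparameter-search runs."""
    tokens_m = examples * tokens_per_example / 1_000_000
    return tokens_m * TRAIN_PRICE_PER_M * runs

# Small dataset, 3 runs: 0.5M tokens -> $12.50/run -> $37.50 total
low = training_cost(1_000, 500, 3)
# Large dataset, 5 runs: 5M tokens -> $125/run -> $625.00 total
high = training_cost(10_000, 500, 5)
print(f"${low:.2f} - ${high:.2f}")  # -> $37.50 - $625.00
```

Swap in your own dataset size and run count; the cost scales linearly in both.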
2. Inference cost markup
Fine-tuned models are billed at 2x the base model rate at inference time:
- GPT-4o mini base: $0.15 input / $0.60 output per 1M
- GPT-4o mini fine-tuned: $0.30 input / $1.20 output per 1M
For 10M tokens/month of inference (assuming a 50/50 input/output split):
- Base GPT-4o mini: ~$3.75
- Fine-tuned: ~$7.50 (an extra $3.75/month for the life of the deployment)
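The monthly figures above can be reproduced with a small helper. The 50/50 input/output split is an assumption made to match the $3.75/$7.50 numbers; adjust `input_share` to your real traffic mix:

```python
# Monthly inference cost at base vs fine-tuned (2x) GPT-4o mini rates.
BASE_IN, BASE_OUT = 0.15, 0.60  # $ per 1M input/output tokens

def monthly_cost(total_tokens_m: float, input_share: float = 0.5,
                 fine_tuned: bool = False) -> float:
    """Dollars per month for total_tokens_m million tokens of inference."""
    markup = 2.0 if fine_tuned else 1.0  # fine-tuned models bill at 2x
    in_m = total_tokens_m * input_share
    out_m = total_tokens_m - in_m
    return markup * (in_m * BASE_IN + out_m * BASE_OUT)

print(monthly_cost(10))                   # base: 3.75
print(monthly_cost(10, fine_tuned=True))  # fine-tuned: 7.5
```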
3. Break-even math vs prompt engineering
Fine-tuning saves you tokens only if you can shrink the prompt significantly afterward. The classic case:
- Before: 2,000-token system prompt with 20 few-shot examples for classification.
- After: 100-token system prompt; the model has internalized the task pattern.
Token savings per call: 1,900 tokens × $0.15/1M = $0.000285
Inference markup per call: 100 tokens × ($0.30 - $0.15)/1M = $0.000015
Net savings: $0.000270 per call. Break-even at ~230,000 calls to recover a $62 training run.
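The break-even arithmetic above as a minimal sketch; the 2,000/100-token prompt sizes and $62 training cost are the worked example's assumptions, and output-token costs are ignored for simplicity:

```python
# Break-even call count: fine-tuned short prompt vs base long prompt.
BASE_IN = 0.15  # $ per 1M input tokens, base GPT-4o mini
FT_IN = 0.30    # $ per 1M input tokens, fine-tuned (2x markup)

def break_even_calls(prompt_before: int, prompt_after: int,
                     training_cost: float) -> int:
    """Calls needed for per-call prompt savings to repay the training cost."""
    savings = (prompt_before - prompt_after) * BASE_IN / 1e6  # $ per call
    markup = prompt_after * (FT_IN - BASE_IN) / 1e6           # $ per call
    net = savings - markup                                    # $0.000270 here
    return round(training_cost / net)

print(break_even_calls(2_000, 100, 62.0))  # ~230,000 calls
```

At 1M+ calls/month this break-even point is reached within the first week, which is why the high-volume cases below favor fine-tuning.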
When fine-tuning makes sense
- High volume + simple task: classification, intent detection, structured-data extraction at >1M calls/month.
- Format consistency: when the base model occasionally drifts from a strict output schema and you need 99.9%+ adherence.
- Latency: fine-tuned models with shorter prompts have lower TTFT (time to first token).
When prompt engineering wins
- <100K calls/month: training cost recovery takes too long.
- Frequently changing requirements: every spec change requires retraining.
- Tasks that benefit from chain-of-thought: fine-tuning suppresses verbose reasoning, hurting complex-task quality.
Anthropic / Gemini
Anthropic does not offer public fine-tuning for Claude models as of 2026. Gemini offers fine-tuning for Gemini 2.5 Flash with a pricing structure similar to OpenAI's.
Use Case
Apply when evaluating fine-tuning vs. prompt engineering for a production workload, when defending the cost of a fine-tuning project, or when planning the volume threshold at which fine-tuning becomes cost-effective.
Related Topics
- Embedding Costs: text-embedding-3-small vs Cohere vs Voyage (Model comparison)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Cost Optimization Strategies: 10 Techniques to Cut Your LLM Bill (Operational)
- Batch Processing: 50% Off via OpenAI / Anthropic Batch APIs (Operational)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)