Tokenizer Accuracy: OpenAI BPE vs Anthropic Approximation
Why our OpenAI counts are exact while our Anthropic and Gemini counts carry an Approx badge, and what a 5-15% error means for your monthly LLM budget.
Detailed Explanation
Two Worlds: Exact and Approximate
OpenAI publishes its tokenizers as open BPE vocabularies (cl100k_base for GPT-4 Turbo and GPT-3.5, o200k_base for GPT-4o and the o-series). gpt-tokenizer loads the same vocabulary files that power the API, so when this calculator displays "5,432 tokens" for an OpenAI row, the API will report the same 5,432 in usage.prompt_tokens (modulo the 3-7 per-message chat-formatting tokens).
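Counting is therefore a direct lookup into the published vocabulary. A minimal sketch using gpt-tokenizer's encoding subpath imports (paths follow the package's documented layout; adjust if your version differs):

```typescript
// Exact OpenAI token counts via the published BPE vocabularies.
import { encode as encodeCl100k } from "gpt-tokenizer/encoding/cl100k_base"; // GPT-4 Turbo / GPT-3.5
import { encode as encodeO200k } from "gpt-tokenizer/encoding/o200k_base";   // GPT-4o / o-series

const prompt = "Summarize the attached quarterly report in five bullet points.";

// encode() returns the token id array; its length is the exact count the API
// will report in usage.prompt_tokens (before chat-formatting overhead).
console.log("cl100k_base:", encodeCl100k(prompt).length);
console.log("o200k_base:", encodeO200k(prompt).length);
```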
Anthropic, Google, and DeepSeek don't ship a JavaScript tokenizer. Anthropic's official /v1/messages/count_tokens endpoint requires sending your prompt over the wire, which defeats the privacy model of an offline tool.
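If you do need ground truth for Anthropic and can accept the wire call, the endpoint returns the exact figure. A hedged sketch, with the request and response shape following Anthropic's public documentation (the model id is a placeholder):

```typescript
// Exact Anthropic counts require a network call, trading away offline privacy.
async function countAnthropicTokens(prompt: string, apiKey: string): Promise<number> {
  const res = await fetch("https://api.anthropic.com/v1/messages/count_tokens", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5", // placeholder; substitute your actual model id
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.input_tokens; // documented response field
}
```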
The heuristic
We use:
- English / Latin: 1 token per ~4 characters
- CJK (Chinese / Japanese / Korean): 1 token per ~2 characters
That maps reasonably well to the token counts Anthropic and Gemini report: empirically, the error sits in the 5-15% range for natural-language prompts.
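Here is the rule as code, with a simple Unicode-range check standing in for CJK detection (the regex is an illustrative approximation, not an exhaustive classifier):

```typescript
// Approximate tokens: ~4 chars/token for Latin text, ~2 chars/token for CJK.
const CJK_RE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/u;

function approxTokens(text: string): number {
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    // Iterating with for...of counts code points, not UTF-16 units.
    if (CJK_RE.test(ch)) cjk++;
    else other++;
  }
  return Math.ceil(other / 4 + cjk / 2);
}

console.log(approxTokens("Hello, world!")); // 13 chars -> ~4 tokens
console.log(approxTokens("こんにちは世界"));  // 7 CJK chars -> ~4 tokens
```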
Where the heuristic breaks
Three content types tokenize more densely than natural language, so the heuristic under-counts them by 15-30% (see the comparison sketch after this list):
- Code — punctuation, identifiers, and special characters tokenize per-character or per-pair.
- Tables / TSV / CSV — heavy on whitespace and delimiters.
- JSON with deep nesting — nearly every key, comma, and brace becomes its own token.
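To see the density gap concretely, compare the heuristic against an exact tokenizer on a structured payload. This sketch uses gpt-tokenizer's cl100k_base as a stand-in for Anthropic's unpublished tokenizer; the absolute numbers differ per provider, but the direction of the error is the same:

```typescript
import { encode } from "gpt-tokenizer/encoding/cl100k_base";

// Deeply punctuated JSON: quotes, colons, commas, and braces dominate.
const payload = JSON.stringify({ user: { id: 42, tags: ["a", "b"], active: true } });

const heuristic = Math.ceil(payload.length / 4); // 1 token per ~4 chars
const exact = encode(payload).length;            // real BPE count

// For structured content the exact count typically lands well above the
// heuristic, which is why code-heavy workloads need the larger margin below.
console.log({ heuristic, exact });
```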
The budgeting rule
Apply this safety margin to the displayed approximate cost:
| Workload | Margin |
|---|---|
| Natural-language chat | +0% |
| Mixed prose + light code | +10% |
| Heavy code / JSON / TSV | +25% |
| RAG with structured docs | +15% |
For a $5,000/month Claude bill, a +15% margin means budgeting $5,750: meaningful but not catastrophic. Track the actual Anthropic invoice for the first month and adjust the multiplier from there.
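The budgeting arithmetic as a sketch, with the table's margins as a lookup (the workload keys are illustrative names, not part of the calculator):

```typescript
// Safety margins from the budgeting table above.
type Workload = "chat" | "mixedProse" | "heavyCode" | "ragStructured";

const MARGIN: Record<Workload, number> = {
  chat: 0.0,           // natural-language chat
  mixedProse: 0.1,     // mixed prose + light code
  heavyCode: 0.25,     // heavy code / JSON / TSV
  ragStructured: 0.15, // RAG with structured docs
};

function budgetWithMargin(displayedCost: number, workload: Workload): number {
  return displayedCost * (1 + MARGIN[workload]);
}

console.log(budgetWithMargin(5000, "ragStructured")); // 5750 -> the $5,750 above
```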
Use Case
Reach for this when defending a budget line item to finance, when reconciling a higher-than-expected Anthropic invoice, or when picking which provider to use for a code-heavy workload.
Try It — Prompt Token Cost Calculator
Related Topics
- GPT-4o vs Claude Opus 4.7: Per-Token Cost Comparison (Model comparison)
- Claude Prompt Caching: 80% Bill Reduction in One Setting (Caching & long context)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Code Generation Cost: Per-Function, Per-File, Per-PR (Workload patterns)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)