Tokenizer Accuracy: OpenAI BPE vs Anthropic Approximation
Why our OpenAI counts are exact while our Anthropic and Gemini counts carry an Approx badge, and what a 5-15% error means for your monthly LLM budget.
Detailed Explanation
Two Worlds: Exact and Approximate
OpenAI publishes its tokenizers as open BPE vocabularies (cl100k_base for GPT-4 Turbo and GPT-3.5, o200k_base for GPT-4o and the o-series). gpt-tokenizer loads the same vocabulary files that power the API, so when this calculator displays "5,432 tokens" for an OpenAI row, the API will report the same 5,432 in usage.prompt_tokens (modulo the 3-7 per-message chat-formatting tokens).
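Counting is therefore a direct lookup into the published vocabulary. A minimal sketch using gpt-tokenizer's encoding subpath imports (paths follow the package's documented layout; adjust if your version differs):

```typescript
// Exact OpenAI token counts via the published BPE vocabularies.
import { encode as encodeCl100k } from "gpt-tokenizer/encoding/cl100k_base"; // GPT-4 Turbo / GPT-3.5
import { encode as encodeO200k } from "gpt-tokenizer/encoding/o200k_base";   // GPT-4o / o-series

const prompt = "Summarize the attached quarterly report in five bullet points.";

// encode() returns the token id array; its length is the exact count the API
// will report in usage.prompt_tokens (before chat-formatting overhead).
console.log("cl100k_base:", encodeCl100k(prompt).length);
console.log("o200k_base:", encodeO200k(prompt).length);
```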
Anthropic, Google, and DeepSeek don't ship a JavaScript tokenizer. Anthropic's official /v1/messages/count_tokens endpoint requires sending your prompt over the wire, which defeats the privacy model of an offline tool.
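If you do need ground truth for Anthropic and can accept the wire call, the endpoint returns the exact figure. A hedged sketch, with the request and response shape following Anthropic's public documentation (the model id is a placeholder):

```typescript
// Exact Anthropic counts require a network call, trading away offline privacy.
async function countAnthropicTokens(prompt: string, apiKey: string): Promise<number> {
  const res = await fetch("https://api.anthropic.com/v1/messages/count_tokens", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4-5", // placeholder; substitute your actual model id
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await res.json();
  return data.input_tokens; // documented response field
}
```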
The heuristic
We use:
- English / Latin: 1 token per ~4 characters
- CJK (Chinese / Japanese / Korean): 1 token per ~2 characters
That maps reasonably well to the token counts Anthropic and Gemini report: empirically, the error sits in the 5-15% range for natural-language prompts.
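Here is the rule as code, with a simple Unicode-range check standing in for CJK detection (the regex is an illustrative approximation, not an exhaustive classifier):

```typescript
// Approximate tokens: ~4 chars/token for Latin text, ~2 chars/token for CJK.
const CJK_RE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/u;

function approxTokens(text: string): number {
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    // Iterating with for...of counts code points, not UTF-16 units.
    if (CJK_RE.test(ch)) cjk++;
    else other++;
  }
  return Math.ceil(other / 4 + cjk / 2);
}

console.log(approxTokens("Hello, world!")); // 13 chars -> ~4 tokens
console.log(approxTokens("こんにちは世界"));  // 7 CJK chars -> ~4 tokens
```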
Where the heuristic breaks
Three content types tokenize more densely than natural language, so the heuristic under-counts them by 15-30% (see the comparison sketch after this list):
- Code — punctuation, identifiers, and special characters tokenize per-character or per-pair.
- Tables / TSV / CSV — heavy on whitespace and delimiters.
- JSON with deep nesting — nearly every key, comma, and brace becomes its own token.
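To see the density gap concretely, compare the heuristic against an exact tokenizer on a structured payload. This sketch uses gpt-tokenizer's cl100k_base as a stand-in for Anthropic's unpublished tokenizer; the absolute numbers differ per provider, but the direction of the error is the same:

```typescript
import { encode } from "gpt-tokenizer/encoding/cl100k_base";

// Deeply punctuated JSON: quotes, colons, commas, and braces dominate.
const payload = JSON.stringify({ user: { id: 42, tags: ["a", "b"], active: true } });

const heuristic = Math.ceil(payload.length / 4); // 1 token per ~4 chars
const exact = encode(payload).length;            // real BPE count

// For structured content the exact count typically lands well above the
// heuristic, which is why code-heavy workloads need the larger margin below.
console.log({ heuristic, exact });
```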
The budgeting rule
Apply this safety margin to the displayed approximate cost:
| Workload | Margin |
|---|---|
| Natural-language chat | +0% |
| Mixed prose + light code | +10% |
| Heavy code / JSON / TSV | +25% |
| RAG with structured docs | +15% |
For a $5,000/month Claude bill, a +15% margin means budgeting $5,750: meaningful but not catastrophic. Track the actual Anthropic invoice for the first month and adjust the multiplier from there.
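The budgeting arithmetic as a sketch, with the table's margins as a lookup (the workload keys are illustrative names, not part of the calculator):

```typescript
// Safety margins from the budgeting table above.
type Workload = "chat" | "mixedProse" | "heavyCode" | "ragStructured";

const MARGIN: Record<Workload, number> = {
  chat: 0.0,           // natural-language chat
  mixedProse: 0.1,     // mixed prose + light code
  heavyCode: 0.25,     // heavy code / JSON / TSV
  ragStructured: 0.15, // RAG with structured docs
};

function budgetWithMargin(displayedCost: number, workload: Workload): number {
  return displayedCost * (1 + MARGIN[workload]);
}

console.log(budgetWithMargin(5000, "ragStructured")); // 5750 -> the $5,750 above
```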
Use Case
Reach for this when defending a budget line item to finance, when reconciling a higher-than-expected Anthropic invoice, or when picking which provider to use for a code-heavy workload.
Try It — Prompt Token Cost Calculator
Related Topics
- GPT-4o vs Claude Opus 4.7: Per-Token Cost Comparison (Model comparison)
- Claude Prompt Caching: 80% Bill Reduction in One Setting (Caching & long context)
- Monthly Budget Estimation: Build a 30-Day Forecast in 5 Minutes (Operational)
- Code Generation Cost: Per-Function, Per-File, Per-PR (Workload patterns)
- RAG Pipeline Cost: Embedding + Retrieval + Generation (Workload patterns)