Prompt Token Cost Calculator
Paste a prompt to count tokens with the exact OpenAI tokenizer (cl100k / o200k) and compare per-call cost across GPT, Claude, Gemini, and DeepSeek with cache discounts.
About This Tool
Estimating the dollar cost of an LLM call before you ship a feature is non-trivial: every provider uses a different tokenizer, charges different rates for input vs. output, and offers prompt-caching discounts that can change a $400/month bill into a $40/month bill. This calculator collapses all of that into one screen.
For OpenAI models — GPT-4o, GPT-4o mini, GPT-4 Turbo, GPT-4.1, o1, o1-mini, o3, o3-mini, and o4-mini — token counts are exact. We bundle the official cl100k_base and o200k_base BPE encoders via the gpt-tokenizer library and run them in your browser. The same number you see here is what OpenAI's API will report in usage.prompt_tokens.
For Anthropic Claude (Opus 4.7, Sonnet 4.6, Haiku 4.5), Google Gemini (2.5 Pro, 2.5 Flash), and DeepSeek-V3 we fall back to a character-based heuristic — about 4 characters per token for English and 2 characters per token for CJK. These rows are flagged with an "Approx" badge so you never confuse the two. The estimate is typically within 5–15% of the real value for natural language; for code and tables the variance widens, so treat any approximate row as a budgeting figure rather than a billing figure.
The calculator separates input (your prompt) from output (model completion). Output cost dominates for short questions with long answers — exactly the case you hit when generating documentation, translations, or long-form content. Toggle the prompt caching option to see Anthropic's 1.25x write / 0.1x read pricing applied; for Anthropic, caching a 50K-token system prompt across 100 requests can drop the bill by 80% or more. The batch count field multiplies everything for monthly forecasts. For deeper analysis, pair this calculator with our LLM Token Counter for a single-model view, the JSON Formatter for inspecting raw API responses, and the Regex Tester when you need to extract usage blocks from log lines.
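The per-call math behind the table can be sketched in a few lines. The prices used here mirror the $2.50 / $10 GPT-4o figures quoted later on this page; treat them as illustrative snapshots, not a live price feed:

```typescript
// Per-call and monthly cost for one model, mirroring the calculator's math.
// Prices are USD per 1M tokens; the figures below are illustrative snapshots.
interface ModelPrice {
  inputPerM: number;   // USD per 1M input tokens
  outputPerM: number;  // USD per 1M output tokens
}

function perCallCost(p: ModelPrice, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * p.inputPerM +
         (outputTokens / 1_000_000) * p.outputPerM;
}

// Example: a 1,200-token prompt with a 500-token answer on a $2.50/$10 model,
// forecast over 300,000 calls per month.
const gpt4o: ModelPrice = { inputPerM: 2.5, outputPerM: 10 };
const call = perCallCost(gpt4o, 1200, 500);  // 0.003 + 0.005 = $0.008 per call
const monthly = call * 300_000;              // $2,400 per month
```

Note how the 500 output tokens cost more than the 1,200 input tokens — that asymmetry is why the output-token estimate matters as much as the prompt itself.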
All processing — tokenization, cost math, and clipboard copy — runs entirely in your browser. Your prompt text is never uploaded, logged, or analyzed; the page works offline once loaded. There is no telemetry tied to input content, so it is safe to paste internal system prompts, customer data, or unreleased product copy.
How to Use
- Paste your prompt (or one of the sample chips) into the Prompt / Input text area. Token counts for both OpenAI encoders update within ~250ms.
- Set the Expected output tokens field to the typical completion length you expect (200 for short answers, 500 for paragraphs, 2000 for long documents).
- Adjust the Batch / call count field if you want a monthly forecast — e.g. 10,000 daily calls × 30 days = 300,000.
- Toggle entire providers on/off using the Providers chips, or click the ✕ next to a single model to remove just that row.
- Enable Prompt caching discount if you plan to use Anthropic / OpenAI prompt caching, then drag the read/write share sliders to reflect the typical hit rate.
- Read the Cheapest indicator above the table to see which model wins at the current configuration. Sort visually by clicking column headers.
- Press Copy summary (or Ctrl+Shift+C) to copy a plain-text breakdown into your clipboard for sharing in a budget doc.
Popular Examples
- GPT-4o vs Claude Opus 4.7: Per-Token Cost Comparison
- Tokenizer Accuracy: OpenAI BPE vs Anthropic Approximation
- Embedding Costs: text-embedding-3-small vs Cohere vs Voyage
- Claude Prompt Caching: 80% Bill Reduction in One Setting
- Long-Context Costs: What 128K Tokens Actually Cost Per Call
- Agent Loops: Why a 'Simple' Task Costs 50K Tokens
- RAG Pipeline Cost: Embedding + Retrieval + Generation
- Translation Task Cost: GPT-4o vs DeepL vs Google Translate
FAQ
What is a prompt token?
A token is a sub-word unit produced by the model's tokenizer. For OpenAI's o200k encoding (GPT-4o family) one token is roughly 3-4 English characters or about 0.75 of a word. Whitespace, punctuation, code, and CJK characters all tokenize differently — that's why two prompts of the same character count can have very different token counts. Pricing is always quoted per 1,000,000 tokens.
Why are Claude, Gemini, and DeepSeek estimates approximate?
Anthropic, Google, and DeepSeek do not publish a JavaScript-runnable tokenizer. Anthropic's official advice is to use their server-side count_tokens endpoint, which would require sending your prompt to their server — defeating the purpose of an offline tool. We instead use a character heuristic (~4 chars/token English, ~2 chars/token CJK) which is accurate to within roughly 5-15% for natural-language prompts. Code, tables, and JSON tend to tokenize more densely than natural language, so for those workloads add a 10-20% safety margin on top of the displayed cost.
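The heuristic can be sketched as follows. The specific CJK code-point ranges and the rounding are assumptions for illustration; the tool's real implementation may differ in detail:

```typescript
// Rough token estimate for non-OpenAI models: ~4 chars/token for most text,
// ~2 chars/token for CJK. The CJK ranges below are a simplification covering
// CJK Unified Ideographs, Hiragana/Katakana, and Hangul syllables.
function estimateTokens(text: string): number {
  let cjk = 0;
  for (const ch of text) {
    const code = ch.codePointAt(0)!;
    if ((code >= 0x4e00 && code <= 0x9fff) ||
        (code >= 0x3040 && code <= 0x30ff) ||
        (code >= 0xac00 && code <= 0xd7af)) {
      cjk++;
    }
  }
  const other = [...text].length - cjk;
  return Math.ceil(other / 4 + cjk / 2);
}
```

For example, `estimateTokens("hello world")` counts 11 non-CJK characters and returns 3; a purely CJK string is charged at roughly one token per two characters.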
What is prompt caching and how is the discount calculated?
Prompt caching lets you reuse a long system prompt or document context across many calls. Anthropic charges 1.25x the input price the first time the cache block is written (5-minute TTL) and only 0.1x on subsequent reads — so a 50K-token system prompt reused across 100 calls bills roughly the equivalent of 560K tokens at full price (one 1.25x write plus 99 reads at 0.1x), instead of 5,000K tokens: a reduction of about 89%. OpenAI's prompt caching is automatic and bills cache reads at 0.5x the input price (no write premium). Toggle the cache option, set the cache-read share slider to your expected hit rate, and the table updates instantly.
When should I use GPT-4o vs Claude Opus vs Gemini 2.5 Pro?
GPT-4o ($2.50 / $10 per 1M tokens) is the cheapest of the three frontier models and ships with the lowest latency, making it ideal for chat UIs and low-stakes generation. Claude Opus 4.7 ($15 / $75) is the most expensive but tends to win on long-form reasoning, careful writing, and tool-use accuracy — pair it with prompt caching to keep the bill manageable. Gemini 2.5 Pro ($1.25 / $10) is the cheapest input-side option of the three, has a 2M-token context window, and excels at multimodal (vision, video) tasks. For high-volume background jobs (extraction, classification) consider GPT-4o mini, Claude Haiku 4.5, Gemini 2.5 Flash, or DeepSeek-V3 — each under $1/1M input tokens.
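A sketch of how the Cheapest indicator falls out of these prices — the figures are copied from this answer and should be treated as snapshots, not live pricing:

```typescript
// Pick the cheapest model for a given workload, as the Cheapest indicator does.
// Prices mirror the figures quoted above (USD per 1M tokens).
const models: Record<string, { inputPerM: number; outputPerM: number }> = {
  "GPT-4o":         { inputPerM: 2.5,  outputPerM: 10 },
  "Claude Opus":    { inputPerM: 15,   outputPerM: 75 },
  "Gemini 2.5 Pro": { inputPerM: 1.25, outputPerM: 10 },
};

function cheapest(inputTokens: number, outputTokens: number): string {
  let best = "";
  let bestCost = Infinity;
  for (const [name, p] of Object.entries(models)) {
    const cost = (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
    if (cost < bestCost) { best = name; bestCost = cost; }
  }
  return best;
}
```

For a prompt-heavy workload such as 100K input tokens and a 500-token answer, the cheaper input side wins: `cheapest(100_000, 500)` picks Gemini 2.5 Pro at these rates.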
How accurate is gpt-tokenizer for OpenAI models?
Exact. The library ships the same BPE rank tables that OpenAI uses internally and in the API. The o200k_base encoder is used for GPT-4o, GPT-4.1, and the entire o-series (o1, o3, o4-mini); cl100k_base is used for GPT-4 Turbo and GPT-3.5. The number this calculator reports for an OpenAI row will match the usage.prompt_tokens field returned by the OpenAI API to the digit, modulo special tokens added by chat formatting (typically 3-7 extra tokens per message). For raw text completions the count is exact.
Can I add custom models or override prices?
The price table is bundled in src/data/llm-model-pricing.ts and is purely data — no UI is needed to change it. If you fork this repo or run it locally, edit that file to add a row with your provider's pricing, set tokenizer to 'approx' (or 'cl100k_base' / 'o200k_base' if your model uses an OpenAI tokenizer), and rebuild. The lastUpdated field is there to make pricing audits easy.
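A hypothetical row might look like this. The field names are sketched from the description above, so check the actual interface in src/data/llm-model-pricing.ts before copying:

```typescript
// Hypothetical shape of an entry in src/data/llm-model-pricing.ts.
// Field names are illustrative — match them to the real interface when forking.
interface LlmModelPricing {
  provider: string;
  model: string;
  inputPerM: number;                                  // USD per 1M input tokens
  outputPerM: number;                                 // USD per 1M output tokens
  tokenizer: "cl100k_base" | "o200k_base" | "approx";
  lastUpdated: string;                                // ISO date, for pricing audits
}

// "Acme AI" and its prices are made up for this example.
const myModel: LlmModelPricing = {
  provider: "Acme AI",
  model: "acme-large-1",
  inputPerM: 0.8,
  outputPerM: 2.4,
  tokenizer: "approx",
  lastUpdated: "2025-06-01",
};
```

Rows tagged `"approx"` would get the same Approx badge and character heuristic as the built-in Claude, Gemini, and DeepSeek entries.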
Is my data safe?
Yes — completely. Tokenization runs in your browser via the gpt-tokenizer JavaScript library; no prompt text, no token counts, and no cost numbers are ever transmitted to a server. We do not log prompts, do not run analytics on input text, and do not use a remote tokenizer endpoint (which would defeat the purpose). The page works offline after the first load, so you can paste internal system prompts, customer data, or unreleased product copy without leaving any trace.
Related Tools
LLM Token Counter
Count tokens and estimate API costs for Claude, GPT-4o, Gemini, and other LLMs. Compare pricing across providers.
JSON Formatter
Format, validate, and beautify JSON with syntax highlighting and tree view.
Regex Tester
Test regular expressions with real-time match highlighting and capture groups.
Code Minifier
Minify and beautify JavaScript, CSS, and HTML code with size comparison stats.
MCP Server Config Generator
Generate MCP (Model Context Protocol) config JSON for Claude Desktop, Cursor, Cline, and Windsurf with five built-in server presets.
Word & Character Counter
Count words, characters, sentences, paragraphs, and estimate reading time with keyword frequency analysis.