LLM Token Counter & API Cost Calculator
Paste your prompt or completion text to count tokens and estimate API costs across Claude, GPT, and Gemini models.
About This Tool
If you have ever shipped a feature that calls an LLM API and then watched your invoice triple overnight, you already know why token counting matters. A single misplaced system prompt or an untruncated document dump can turn a $50/month prototype into a $500 surprise. This tool lets you catch those problems before they hit production.
Paste any text — a system prompt, a few-shot example block, a full document — and the counter breaks down token counts and estimated costs for multiple models side by side. Supported providers include Claude (Opus, Sonnet, Haiku), GPT-4o, GPT-4o mini, and Gemini 1.5 Pro / Flash. Input and output prices are shown separately so you can model real request/response pairs accurately.
A quick note on accuracy: this tool uses a character-based heuristic to approximate token counts rather than running each provider's exact tokenizer. For standard English text the estimate is typically within 10-15% of the actual count. CJK text (Chinese, Japanese, Korean) tends to tokenize less efficiently, so the tool applies a higher token-per-character ratio for those scripts. The numbers are good enough for budgeting and comparison, but if you need exact counts for billing reconciliation, use an official tokenizer library such as tiktoken for OpenAI, or Anthropic's token counting API.
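To make the heuristic concrete, here is a minimal sketch of how a character-based estimator with a CJK adjustment might work. The ratios (4 Latin characters per token, 1.5 tokens per CJK character) and the exact CJK ranges are illustrative assumptions, not the tool's actual constants.

```python
def estimate_tokens(text: str) -> int:
    """Estimate token count from character counts (illustrative ratios)."""
    cjk = 0
    other = 0
    for ch in text:
        code = ord(ch)
        # Rough CJK ranges: unified ideographs, hiragana/katakana, hangul
        if (0x4E00 <= code <= 0x9FFF or 0x3040 <= code <= 0x30FF
                or 0xAC00 <= code <= 0xD7AF):
            cjk += 1
        else:
            other += 1
    # Assumed ratios: ~4 Latin chars per token, ~1.5 tokens per CJK char
    return round(other / 4 + cjk * 1.5)

print(estimate_tokens("Hello, world!"))  # 13 chars -> 3
print(estimate_tokens("你好"))            # 2 CJK chars -> 3
```

A real implementation would tune the ratios per provider and handle mixed scripts more carefully, but the shape is the same: count characters by class, divide by an empirical ratio.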
You can pair this with the Word Counter to track both human-readable length and machine cost at the same time. If you are building prompts that include structured data, the JSON Formatter can help you minify payloads before pasting — fewer characters usually mean fewer tokens. For checking raw byte length of your prompts, the String Length Calculator is useful when you are close to context window limits.
Some practical tips for keeping costs down: enable prompt caching if your provider supports it — Anthropic and OpenAI both offer cached prompt reads at up to 90% off the standard input price, which adds up fast when your system prompt stays the same across requests. For non-latency-sensitive workloads, both providers offer batch APIs at roughly 50% off standard pricing. Beyond that, strip unnecessary whitespace from context documents and pick the cheapest model that meets your quality bar. Haiku and GPT-4o mini handle classification and extraction tasks well at a fraction of the cost of larger models. Keep an eye on each model's context window limit too — Claude supports up to 200K tokens, GPT-4o up to 128K, and Gemini up to 1M — exceeding these limits means you need to truncate or summarize input.
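The caching and batch discounts compound quickly. Here is a minimal sketch of the arithmetic, using made-up prices (USD per million tokens) — check your provider's pricing page for real rates.

```python
INPUT_PRICE = 3.00           # hypothetical standard input price per 1M tokens
CACHED_READ_DISCOUNT = 0.90  # "up to 90% off" cached prompt reads
BATCH_DISCOUNT = 0.50        # "roughly 50% off" batch pricing

def request_cost(prompt_tokens: int, cached_tokens: int = 0,
                 batch: bool = False) -> float:
    """Input-side cost of one request, in USD (illustrative prices)."""
    price = INPUT_PRICE * (1 - BATCH_DISCOUNT) if batch else INPUT_PRICE
    fresh = prompt_tokens - cached_tokens
    cached_price = price * (1 - CACHED_READ_DISCOUNT)
    return (fresh * price + cached_tokens * cached_price) / 1_000_000

# 10K-token prompt where an 8K system prompt is served from cache:
print(round(request_cost(10_000, cached_tokens=8_000), 4))  # 0.0084
# Same prompt, no cache: 0.03 -- caching cuts this request by ~72%
print(round(request_cost(10_000), 4))
```

With a stable system prompt making up most of each request, the cached rate dominates, which is why caching pays off fastest for long, repeated prefixes.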
All processing runs entirely in your browser. Your text is never sent to any server — no API calls, no logging, no telemetry. Close the tab and the data is gone.
How to Use
- Paste or type your text into the Input area. This can be a system prompt, user message, document, or any content you plan to send to an LLM.
- The token count and cost estimates update instantly as you type. No button press needed.
- Review the model comparison table to see estimated token counts and per-request costs for each provider.
- Toggle between Input and Output pricing modes to estimate costs for prompts vs. completions (output tokens are more expensive on most models).
- Adjust the token count slider or type a number manually if you want to model a specific response length for the output side.
- Click Copy or press Ctrl+Shift+C to copy the cost breakdown to your clipboard for sharing or documentation.
- Use the Clear button to reset the input and start a new estimate.
FAQ
Is my data safe?
Yes. Token counting is done entirely in your browser using a character-based estimation algorithm. No API calls are made to OpenAI, Anthropic, Google, or any other service. Your text never leaves your machine. You can verify this by opening your browser's DevTools Network tab — you will see zero outbound requests while using the tool.
How accurate is the token count?
The tool uses a character-to-token heuristic rather than running each provider's actual tokenizer (like tiktoken for OpenAI). For standard English prose, estimates are typically within 10-15% of the real count. Code, URLs, or JSON-heavy content may deviate more — up to 20-30% in edge cases — because special characters and punctuation tokenize unpredictably. The estimates are reliable for cost budgeting and model comparison, but not suitable for exact billing reconciliation.
Why do different models show different token counts for the same text?
Each LLM provider uses a different tokenizer with its own vocabulary. GPT-4o uses the o200k_base tokenizer, Claude uses its own proprietary tokenizer, and Gemini uses SentencePiece. A word that is one token in one system might be split into two tokens in another. This tool applies provider-specific ratios to approximate those differences.
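The provider-specific ratios mentioned above can be pictured like this — the ratio values here are placeholders for illustration, not the tool's real calibration:

```python
# Hypothetical chars-per-token ratios per provider. Same input text,
# different estimated counts, because each tokenizer splits differently.
RATIOS = {"gpt-4o": 4.0, "claude": 3.8, "gemini": 4.2}
text_len = 1000  # characters of English prose

for model, chars_per_token in RATIOS.items():
    print(model, round(text_len / chars_per_token))
# gpt-4o 250, claude 263, gemini 238
```

The spread is small for plain English but widens for code, URLs, and non-Latin scripts, which is where the estimates diverge most from real tokenizer output.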
How does the tool handle CJK (Chinese, Japanese, Korean) text?
CJK characters are tokenized less efficiently than Latin characters in most LLM tokenizers — a single Chinese character might become 2-3 tokens depending on the model. The tool detects CJK character ranges and applies a higher token-per-character ratio for those segments, giving you a more realistic estimate than a simple word-count division would.
Are the pricing numbers up to date?
Pricing is hardcoded based on the latest published rates at the time the tool was last updated. LLM providers change their pricing periodically — for example, OpenAI has cut GPT-4o pricing multiple times since launch. Check the provider's official pricing page if you need exact current rates. The relative cost comparisons between models remain useful even if absolute numbers shift.
Can I estimate costs for a full conversation with multiple turns?
The tool estimates cost for a single block of text at a time. For multi-turn conversations, remember that each API call sends the entire conversation history as input tokens. Paste your full conversation context (system prompt + all previous turns) to get the cumulative input cost. Then estimate the output cost separately for the expected response length.
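The way history accumulates is easy to underestimate. This sketch uses made-up token counts and an illustrative price to show how each call re-bills the entire conversation so far:

```python
SYSTEM_TOKENS = 500
TURNS = [(200, 300), (150, 250), (100, 400)]  # (user, assistant) per turn
INPUT_PRICE = 3.00 / 1_000_000  # USD per input token (illustrative)

history = SYSTEM_TOKENS
total_input = 0
for user, assistant in TURNS:
    history += user         # user message joins the context
    total_input += history  # the whole history is billed as input
    history += assistant    # the model's reply joins the context too

print(total_input)  # 3350 input tokens billed across just 3 calls
print(round(total_input * INPUT_PRICE, 6))
```

Note that only about 950 of those tokens are new user content; the rest is the same prefix billed again and again — exactly the pattern prompt caching is designed to cheapen.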
What is the difference between input and output token pricing?
Most LLM providers charge different rates for input tokens (your prompt) and output tokens (the model's response). Output tokens are typically 3-5x more expensive than input tokens because generation requires more compute than reading. Check your provider's pricing page for current rates — they change frequently. This tool lets you toggle between input and output pricing to model both sides of a request.
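Modeling both sides of a request is a two-term sum. A minimal sketch, with illustrative rates (USD per million tokens) where output is priced at 5x input:

```python
RATES = {"input": 3.00, "output": 15.00}  # hypothetical, not real rates

def pair_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost of one request/response pair, in USD."""
    return (input_tokens * RATES["input"]
            + output_tokens * RATES["output"]) / 1_000_000

# A 2K-token prompt with a 500-token completion:
print(round(pair_cost(2_000, 500), 4))  # 0.0135
```

Even though the completion is a quarter the length of the prompt here, it accounts for more than half the cost — which is why capping max output tokens is one of the cheapest optimizations available.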
Related Tools
Word & Character Counter
Count words, characters, sentences, paragraphs, and estimate reading time with keyword frequency analysis.
String Length Calculator
Calculate string length in characters, code points, grapheme clusters, and byte sizes for UTF-8, UTF-16, and UTF-32.
JSON Formatter
Format, validate, and beautify JSON with syntax highlighting and tree view.
Markdown Preview
Write and preview Markdown in real time with GFM support, tables, task lists, and HTML export.