String Length Calculator
Calculate string length several ways: UTF-16 code units (JavaScript .length), code points, grapheme clusters, and UTF-8/UTF-16/UTF-32 byte sizes.
About This Tool
The String Length Calculator is a free browser-based tool that
measures text length across multiple dimensions simultaneously. Unlike
a simple .length property call, this tool shows you the full picture:
JavaScript character count (UTF-16 code units), Unicode code point count,
grapheme cluster count (what humans perceive as "characters"), and byte
sizes in UTF-8, UTF-16, and UTF-32 encodings.
Understanding string length is critical when working with databases,
APIs, and platforms that impose character or byte limits. A single
family emoji (👨‍👩‍👧‍👦, four person emoji joined by three zero width
joiners) is 1 grapheme cluster but 7 code points, 11 UTF-16 code units,
and 25 UTF-8 bytes. If your database column is VARCHAR(255) measured in
bytes, a string that looks short can overflow: eleven of those emoji
already need 275 bytes. This tool helps you catch those discrepancies
before they become bugs.
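The numbers above can be reproduced with a few lines of JavaScript. This is a minimal sketch, not the tool's actual source: the `measure` function and its property names are made up for illustration, and it assumes a runtime that provides `Intl.Segmenter` and `TextEncoder`.

```javascript
// Hypothetical helper: measures one string six ways.
function measure(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const codePoints = [...text].length; // the string iterator walks code points
  return {
    codeUnits: text.length,                           // UTF-16 code units
    codePoints,
    graphemes: [...segmenter.segment(text)].length,   // user-perceived characters
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16Bytes: text.length * 2,                      // 2 bytes per code unit
    utf32Bytes: codePoints * 4,                       // 4 bytes per code point
  };
}

// Family emoji written with escapes so the zero width joiners (U+200D)
// survive copy and paste.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(measure(family));
// { codeUnits: 11, codePoints: 7, graphemes: 1,
//   utf8Bytes: 25, utf16Bytes: 22, utf32Bytes: 28 }
```

The 25 UTF-8 bytes come from four 4-byte emoji plus three 3-byte joiners.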
The grapheme cluster count uses the browser's built-in
Intl.Segmenter API to accurately split text into visual units. This
correctly handles complex emoji sequences (like flag emojis and family
emojis), combining diacritical marks (like é composed as e followed by
U+0301, the combining acute accent), and other multi-code-point
graphemes. The visual breakdown table lets
you inspect each grapheme individually, showing its Unicode code
points and byte sizes across all three encodings.
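A per-grapheme breakdown of this kind could be built roughly as follows. The `graphemeBreakdown` function and its row shape are assumptions made for this sketch; the tool's real table may be organised differently.

```javascript
// Illustrative sketch: one row per grapheme cluster.
function graphemeBreakdown(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const encoder = new TextEncoder();
  return Array.from(segmenter.segment(text), ({ segment }) => {
    // Format every code point in the cluster as U+XXXX.
    const codePoints = Array.from(segment, (ch) =>
      "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));
    return {
      grapheme: segment,
      codePoints: codePoints.join(" "),
      utf8Bytes: encoder.encode(segment).length,
      utf16Bytes: segment.length * 2,
      utf32Bytes: codePoints.length * 4,
    };
  });
}

// "e" + combining acute accent, followed by a flag (two regional indicators).
console.table(graphemeBreakdown("e\u0301\u{1F1EF}\u{1F1F5}"));
// Row 0: codePoints "U+0065 U+0301",   utf8Bytes 3, utf16Bytes 4, utf32Bytes 8
// Row 1: codePoints "U+1F1EF U+1F1F5", utf8Bytes 8, utf16Bytes 8, utf32Bytes 8
```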
The max length checker lets you set a custom limit and instantly see
whether your string exceeds it. The common limits reference shows how
your text measures against popular platform constraints like Twitter's
280-character limit, SMS message limits (160 characters for GSM-7, 70 for UCS-2),
database VARCHAR columns, HTML title tags, and Git commit messages.
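A limit check boils down to picking the right unit and comparing. The sketch below is one way to do it; `checkLimit` and its unit names are hypothetical, and the limit of 50 in the example is just a commonly cited guideline for Git subject lines, used here as an assumed value.

```javascript
// Hypothetical helper: compares a string against a limit in a chosen unit.
function checkLimit(text, max, unit = "codeUnits") {
  const counters = {
    codeUnits: (s) => s.length,
    codePoints: (s) => [...s].length,
    graphemes: (s) =>
      [...new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(s)].length,
    utf8Bytes: (s) => new TextEncoder().encode(s).length,
  };
  if (!Object.prototype.hasOwnProperty.call(counters, unit)) {
    throw new RangeError(`Unknown unit: ${unit}`);
  }
  const length = counters[unit](text);
  // A negative "remaining" value is the amount by which the text is over.
  return { length, max, fits: length <= max, remaining: max - length };
}

console.log(checkLimit("Fix off-by-one in length check", 50));
// { length: 30, max: 50, fits: true, remaining: 20 }
```

Passing the unit explicitly keeps the caller honest about what the platform actually counts.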
If you need to count words and sentences instead of characters, try the Word & Character Counter. For escaping special characters in different formats, use String Escape/Unescape. To convert text between naming conventions, check out the Text Case Converter.
All processing runs entirely in your browser using JavaScript's native
TextEncoder, Intl.Segmenter, and string APIs. No data is sent
to any server — your text stays completely private.
How to Use
- Type or paste your text into the Text Input area on the left.
- View the results panel on the right, which shows character count, code point count, grapheme cluster count, and byte sizes for UTF-8, UTF-16, and UTF-32 simultaneously.
- Check the Encoding Comparison table to see the bytes-per-code-point ratio for each encoding.
- Set a Max length value in the toolbar to check if your string fits within a specific limit (e.g., 280 for Twitter, 255 for VARCHAR).
- Click Show Grapheme Breakdown to see each visual character with its Unicode code points and byte sizes. Surrogate pairs and multi-byte characters are highlighted.
- Scroll down to the Common Platform Limits table to see how your text measures against popular limits like Twitter, SMS, VARCHAR, and more.
- Press Ctrl+Shift+C or click Copy to copy the results summary to your clipboard.
FAQ
What is the difference between characters, code points, and grapheme clusters?
In JavaScript, .length returns the number of UTF-16 code units, not characters. A code point is a single Unicode value (e.g., U+1F600 for a smiley face). A grapheme cluster is what a human perceives as a single character; it can consist of multiple code points (e.g., a flag emoji is two regional indicator code points). For ASCII text, all three counts are identical. Code units and code points diverge for characters above U+FFFF (most emoji), and code points and grapheme clusters diverge wherever combining marks or joined sequences appear, which is common in emoji and in many non-Latin scripts.
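To see the three counts side by side, a short comparison like the one below may help. The `counts` helper is illustrative only and assumes `Intl.Segmenter` is available.

```javascript
// Illustrative helper: the three counts for one string.
function counts(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    codeUnits: text.length,
    codePoints: [...text].length,
    graphemes: [...segmenter.segment(text)].length,
  };
}

console.log(counts("hello"));              // { codeUnits: 5, codePoints: 5, graphemes: 5 }
console.log(counts("e\u0301"));            // e + combining accent: { codeUnits: 2, codePoints: 2, graphemes: 1 }
console.log(counts("\u{1F600}"));          // smiley: { codeUnits: 2, codePoints: 1, graphemes: 1 }
console.log(counts("\u{1F1EB}\u{1F1F7}")); // flag: { codeUnits: 4, codePoints: 2, graphemes: 1 }
```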
Why does my emoji show different lengths for .length and code points?
Emoji above U+FFFF (like 😀) require two UTF-16 code units (a surrogate pair), so JavaScript's .length counts them as 2 even though each is 1 code point. Complex emoji sequences like family emojis use Zero Width Joiners (ZWJ) to combine multiple emoji, resulting in many code units and several code points but just one visual grapheme. Use the code point count when a limit is defined in Unicode characters, and the grapheme cluster count when you care about what the reader sees.
Which count should I use for database VARCHAR limits?
It depends on your database and encoding. PostgreSQL VARCHAR(n) counts characters (code points). MySQL VARCHAR(n) with utf8mb4 also counts characters. However, MySQL's TEXT type limits are in bytes (65,535 bytes for TEXT). For byte-limited columns that store UTF-8, use the UTF-8 byte count. Always check your database documentation to know whether limits are in characters or bytes.
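The difference matters most for text outside ASCII. Here is a small sketch, with a hypothetical `columnFit` helper, that checks the same string against a limit counted in characters and in UTF-8 bytes. Real column behaviour depends on the database and its configuration.

```javascript
// Hypothetical helper: does the text fit a limit counted in characters
// (code points) versus one counted in UTF-8 bytes?
function columnFit(text, limit) {
  const codePoints = [...text].length;
  const utf8Bytes = new TextEncoder().encode(text).length;
  return {
    codePoints,
    utf8Bytes,
    fitsCharacterLimit: codePoints <= limit,
    fitsByteLimit: utf8Bytes <= limit,
  };
}

// 100 CJK characters, each 3 bytes in UTF-8.
console.log(columnFit("字".repeat(100), 255));
// { codePoints: 100, utf8Bytes: 300, fitsCharacterLimit: true, fitsByteLimit: false }
```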
How is the grapheme cluster count calculated?
This tool uses the Intl.Segmenter API (available in modern browsers) with grapheme granularity. This correctly handles complex emoji sequences, combining marks, and other multi-code-point graphemes according to the Unicode segmentation rules. In older browsers without Intl.Segmenter, it falls back to splitting by code points, which overcounts ZWJ emoji sequences, flags, and characters built from combining marks.
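A feature-detected fallback of the kind described here might look like the following. This is a sketch under the assumption that the fallback is a plain code point count; the `countGraphemes` name and the `exact` flag are inventions for this example, not the tool's API.

```javascript
// Hypothetical helper: grapheme count with a code point fallback.
function countGraphemes(text) {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return { count: [...segmenter.segment(text)].length, exact: true };
  }
  // Fallback: counts code points, so multi-code-point graphemes are overcounted.
  return { count: [...text].length, exact: false };
}

// With Intl.Segmenter available, "e" + U+0301 is a single grapheme;
// the fallback path would report 2 for the same input.
console.log(countGraphemes("e\u0301"));
```

Returning the `exact` flag lets the caller mark approximate results instead of presenting them as accurate.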
What are surrogate pairs?
UTF-16 uses 2 bytes per code unit. Characters with code points above U+FFFF (like most emoji and some CJK characters) cannot fit in a single 16-bit code unit, so they are encoded as a pair of code units called a surrogate pair: a high surrogate in the range U+D800 to U+DBFF followed by a low surrogate in the range U+DC00 to U+DFFF. This is why JavaScript's .length returns 2 for a single emoji such as 😀. The tool highlights surrogate pairs in orange in the grapheme breakdown.
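The pair is derived from the code point with simple bit arithmetic, as defined by UTF-16. The function name `toSurrogatePair` below is made up for this sketch.

```javascript
// Splits a supplementary code point into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("Only U+10000 to U+10FFFF are encoded as surrogate pairs");
  }
  const offset = codePoint - 0x10000;       // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);     // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);    // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F600); // U+1F600, the smiley face
console.log(high.toString(16), low.toString(16));            // d83d de00
console.log(String.fromCharCode(high, low) === "\u{1F600}"); // true
```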
Is my data safe?
Yes. All processing runs entirely in your browser using JavaScript. No text is sent to any server. You can verify this by checking the Network tab in your browser's developer tools while using the tool.
Why does the same text have different byte sizes in UTF-8 and UTF-16?
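UTF-8 and UTF-16 are variable-length encodings with different strategies. UTF-8 uses 1 byte for ASCII, 2 bytes for code points up to U+07FF (including accented Latin, Greek, Cyrillic, Hebrew, and Arabic), 3 bytes for the rest of the Basic Multilingual Plane (including most CJK), and 4 bytes for code points above U+FFFF (including most emoji). UTF-16 uses 2 bytes for code points up to U+FFFF and 4 bytes (a surrogate pair) above that. For English text, UTF-8 is more compact. For CJK text, UTF-16 is usually smaller (2 bytes per character instead of 3). UTF-32 always uses 4 bytes per code point, regardless of the character.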
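These rules can be written out directly as range checks on each code point. The sketch below computes the sizes without calling an encoder; the function names are illustrative and not taken from the tool.

```javascript
// Bytes needed for one code point in each encoding.
function utf8Length(cp) {
  if (cp <= 0x7F) return 1;    // ASCII
  if (cp <= 0x7FF) return 2;   // e.g. accented Latin, Greek, Cyrillic
  if (cp <= 0xFFFF) return 3;  // rest of the Basic Multilingual Plane
  return 4;                    // supplementary planes, including most emoji
}

function utf16Length(cp) {
  return cp <= 0xFFFF ? 2 : 4; // one code unit, or a surrogate pair
}

// Hypothetical helper: total byte sizes for a whole string.
function encodedSizes(text) {
  let utf8 = 0, utf16 = 0, utf32 = 0;
  for (const ch of text) {     // for...of iterates by code point
    const cp = ch.codePointAt(0);
    utf8 += utf8Length(cp);
    utf16 += utf16Length(cp);
    utf32 += 4;
  }
  return { utf8, utf16, utf32 };
}

console.log(encodedSizes("hello"));      // { utf8: 5, utf16: 10, utf32: 20 }
console.log(encodedSizes("こんにちは")); // { utf8: 15, utf16: 10, utf32: 20 }
console.log(encodedSizes("\u{1F600}"));  // { utf8: 4, utf16: 4, utf32: 4 }
```

The first two lines show the trade-off from the answer above: UTF-8 wins for English, UTF-16 for CJK.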
Related Tools
Word & Character Counter
Count words, characters, sentences, paragraphs, and estimate reading time with keyword frequency analysis.
Unicode Inspector
Inspect Unicode characters with code point, UTF-8/UTF-16 encoding, character name, category, and block details.
String Escape/Unescape
Escape and unescape strings for JSON, JavaScript, HTML, URL, SQL, and CSV formats.
Text Case Converter
Convert text between camelCase, PascalCase, snake_case, kebab-case, and other naming conventions.
Whitespace Visualizer
Visualize invisible characters like spaces, tabs, newlines, zero-width spaces, and BOM. Detect line endings and clean hidden characters.