ASCII String Length in Different Encodings
Understand how plain ASCII text is measured in character count, code points, grapheme clusters, and byte sizes across UTF-8, UTF-16, and UTF-32 encodings.
Detailed Explanation
ASCII: The Simplest Case
For pure ASCII strings (characters U+0000 through U+007F), all length measurements yield consistent and predictable results. This makes ASCII the perfect starting point for understanding string length.
Example String
Hello, World!
Length Measurements
| Metric | Value |
|---|---|
| JavaScript .length | 13 |
| Code points | 13 |
| Grapheme clusters | 13 |
| UTF-8 bytes | 13 |
| UTF-16 bytes | 26 |
| UTF-32 bytes | 52 |
Why They Match (Mostly)
Every ASCII character maps to a single Unicode code point below U+0080. In UTF-8, these code points each require exactly 1 byte because UTF-8 was designed to be backward-compatible with ASCII. This means UTF-8 byte count equals the character count for pure ASCII.
In UTF-16, every character still fits in a single 16-bit code unit, but each code unit is 2 bytes. So UTF-16 byte count is always 2 × character count for ASCII. UTF-32 uses a fixed 4 bytes per code point, so it is always 4 × code point count.
Practical Implication
If your application only handles ASCII text, you can safely use JavaScript's .length as the byte count for UTF-8 storage. However, the moment you allow user input with accented characters, CJK text, or emoji, this assumption breaks down completely.
Use Case
When validating input fields that are restricted to ASCII-only characters (like usernames, slugs, or machine identifiers), you can safely equate .length with UTF-8 byte count. This is common in URL slugs, file naming, and protocol-level identifiers.