Emoji String Length: Why One Emoji Can Be 7+ Code Points
Discover why a single emoji can have a JavaScript .length of 11 or more. Covers simple emoji, skin tone modifiers, ZWJ sequences, and flag emoji.
Detailed Explanation
Emoji Length: It is Complicated
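Emoji are the most common source of string length surprises. What looks like a single character can be surprisingly large: the examples below range from 2 to 11 UTF-16 code units (JavaScript's .length) and from 4 to 25 UTF-8 bytes, and longer sequences exist.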
Emoji Complexity Levels
Level 1: Simple Emoji (1 code point, 2 UTF-16 code units)
😀 U+1F600 .length = 2 UTF-8: 4 bytes
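The sketch below shows where the .length of 2 comes from. U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 stores it as a high surrogate followed by a low surrogate. `toSurrogatePair` is a hypothetical helper written for illustration. It applies the standard UTF-16 encoding formula. The code assumes a modern JavaScript runtime (Node.js or a browser) with the built-in `TextEncoder`.

```javascript
// 😀 written as a code point escape so the source file stays ASCII-only.
const grinning = "\u{1F600}";

console.log(grinning.length);                      // 2 UTF-16 code units
console.log(grinning.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(grinning.charCodeAt(1).toString(16));  // "de00" (low surrogate)
console.log(grinning.codePointAt(0).toString(16)); // "1f600" (the real code point)
console.log(new TextEncoder().encode(grinning).length); // 4 UTF-8 bytes

// Hypothetical helper: derive the surrogate pair for a code point above U+FFFF.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

console.log(toSurrogatePair(0x1f600).map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
```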
Level 2: Skin Tone Modified (2 code points, 4 UTF-16 code units)
👋🏾 U+1F44B U+1F3FE .length = 4 UTF-8: 8 bytes
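Because the skin tone is a separate code point, code that truncates by .length can silently drop it or cut a surrogate pair in half. The following is a minimal sketch, assuming a runtime with the standard `TextEncoder`. That encoder replaces a lone surrogate with U+FFFD (3 UTF-8 bytes) when encoding.

```javascript
// 👋🏾 = waving hand (U+1F44B) + medium-dark skin tone modifier (U+1F3FE).
const wave = String.fromCodePoint(0x1f44b, 0x1f3fe);

console.log(wave.length);      // 4 UTF-16 code units
console.log([...wave].length); // 2 code points (string iteration goes by code point)

// Cutting at 2 code units keeps the base emoji and drops the skin tone.
console.log(wave.slice(0, 2) === "\u{1F44B}"); // true: plain 👋

// Cutting at 3 code units leaves a lone high surrogate at the end.
const broken = wave.slice(0, 3);
// TextEncoder replaces the lone surrogate with U+FFFD: 4 bytes + 3 bytes.
console.log(new TextEncoder().encode(broken).length); // 7
```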
Level 3: ZWJ Sequence (multiple code points joined by U+200D)
👨‍💻 U+1F468 U+200D U+1F4BB .length = 5 UTF-8: 11 bytes
(man technologist = man + ZWJ + laptop)
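The ZWJ itself is invisible, which makes these sequences easy to build and inspect in code. The sketch below joins the component emoji with U+200D. It then splits on U+200D to recover them. A platform with no glyph for the sequence typically falls back to showing those components side by side. The variable names are illustrative only.

```javascript
const ZWJ = "\u200D"; // ZERO WIDTH JOINER

// Join man (U+1F468) and laptop (U+1F4BB) into "man technologist".
const technologist = ["\u{1F468}", "\u{1F4BB}"].join(ZWJ);

console.log(technologist.length); // 5: 2 + 1 + 2 UTF-16 code units

// List each code point in U+XXXX form.
console.log([...technologist].map((c) => "U+" + c.codePointAt(0).toString(16).toUpperCase()));
// [ 'U+1F468', 'U+200D', 'U+1F4BB' ]

// Splitting on the joiner recovers the two component emoji.
console.log(technologist.split(ZWJ).length); // 2
```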
Level 4: Complex Family Emoji
👨‍👩‍👧‍👦 U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
.length = 11 UTF-8: 25 bytes Graphemes: 1
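The 25-byte figure follows directly from the UTF-8 rules. Each emoji code point above U+FFFF takes 4 bytes, and each U+200D takes 3 bytes, so 4 × 4 + 3 × 3 = 25. The sketch below computes this by hand and cross-checks it against `TextEncoder`. `utf8Length` and `utf8BytesForCodePoint` are hypothetical helper names.

```javascript
// UTF-8 byte count for a single code point, by range.
function utf8BytesForCodePoint(cp) {
  if (cp <= 0x7f) return 1;
  if (cp <= 0x7ff) return 2;
  if (cp <= 0xffff) return 3; // includes U+200D ZWJ
  return 4;                   // U+10000..U+10FFFF, where most emoji live
}

// Sum the byte counts over every code point in the string.
function utf8Length(str) {
  let total = 0;
  for (const ch of str) total += utf8BytesForCodePoint(ch.codePointAt(0));
  return total;
}

// man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = ["\u{1F468}", "\u{1F469}", "\u{1F467}", "\u{1F466}"].join("\u200D");

console.log(family.length);                           // 11: 4 x 2 + 3 x 1 code units
console.log(utf8Length(family));                      // 25: 4 x 4 + 3 x 3 bytes
console.log(new TextEncoder().encode(family).length); // 25, from the built-in encoder
```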
Flag Emoji (Regional Indicators)
Flag emoji use pairs of Regional Indicator Symbols (U+1F1E6–U+1F1FF):
🇯🇵 = U+1F1EF U+1F1F5 (JP) .length = 4 UTF-8: 8 bytes
🇺🇸 = U+1F1FA U+1F1F8 (US) .length = 4 UTF-8: 8 bytes
Each regional indicator is a surrogate pair in UTF-16, so a flag is 4 code units (8 bytes in UTF-16).
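Since the 26 regional indicators map one-to-one onto the letters A to Z, a flag can be built from a two-letter region code by offsetting each letter into U+1F1E6–U+1F1FF. The following sketch shows this. `flagFromCountryCode` is a hypothetical helper. Any two letters produce a valid pair, but whether the pair renders as a flag depends on the platform recognizing that region.

```javascript
// Hypothetical helper: "JP" -> 🇯🇵 by shifting A-Z onto U+1F1E6-U+1F1FF.
function flagFromCountryCode(code) {
  if (!/^[A-Za-z]{2}$/.test(code)) {
    throw new RangeError(`expected two ASCII letters, got "${code}"`);
  }
  const base = 0x1f1e6 - "A".charCodeAt(0); // offset from "A" to REGIONAL INDICATOR A
  const codePoints = [...code.toUpperCase()].map((c) => base + c.charCodeAt(0));
  return String.fromCodePoint(...codePoints);
}

const jp = flagFromCountryCode("JP");
console.log([...jp].map((c) => c.codePointAt(0).toString(16))); // [ '1f1ef', '1f1f5' ]
console.log(jp.length); // 4: two surrogate pairs
```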
Comparison Table
| Emoji | Graphemes | Code Points | .length (UTF-16 units) | UTF-8 Bytes |
|---|---|---|---|---|
| 😀 | 1 | 1 | 2 | 4 |
| 👋🏾 | 1 | 2 | 4 | 8 |
| 🇯🇵 | 1 | 2 | 4 | 8 |
| 👨‍💻 | 1 | 3 | 5 | 11 |
| 👨‍👩‍👧‍👦 | 1 | 7 | 11 | 25 |
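Every column of the table can be reproduced in a few lines. This sketch assumes `Intl.Segmenter` is available (Node.js 16+ and current browsers). Grapheme counts depend on the Unicode segmentation data bundled with the runtime, so very old engines may split ZWJ sequences differently. `measure` and `samples` are illustrative names.

```javascript
// Grapheme segmentation per Unicode text segmentation rules, via the built-in Intl API.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const encoder = new TextEncoder();

// Measure a string the four ways used in the comparison table.
function measure(str) {
  return {
    graphemes: [...segmenter.segment(str)].length, // user-perceived characters
    codePoints: [...str].length,                   // iteration is by code point
    length: str.length,                            // UTF-16 code units
    utf8Bytes: encoder.encode(str).length,
  };
}

const samples = {
  grinning: "\u{1F600}",
  wave: "\u{1F44B}\u{1F3FE}",
  japan: "\u{1F1EF}\u{1F1F5}",
  technologist: "\u{1F468}\u200D\u{1F4BB}",
  family: "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}",
};

for (const [name, s] of Object.entries(samples)) {
  console.log(name, measure(s));
}
```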
Why This Matters
Twitter's 280-character tweet limit is not checked with .length; tweets are measured with a weighted count instead. If you are building a character counter, count grapheme clusters (what users perceive as single characters), not UTF-16 code units or code points.
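A grapheme-based counter and truncation function might look like the following sketch. It assumes `Intl.Segmenter` is available. This is plain grapheme counting, not Twitter's weighted algorithm. `countGraphemes` and `truncateGraphemes` are hypothetical names.

```javascript
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Count user-perceived characters (extended grapheme clusters).
function countGraphemes(text) {
  let n = 0;
  for (const _ of graphemeSegmenter.segment(text)) n++;
  return n;
}

// Keep at most `limit` graphemes, never cutting inside an emoji sequence.
function truncateGraphemes(text, limit) {
  let out = "";
  let n = 0;
  for (const { segment } of graphemeSegmenter.segment(text)) {
    if (n === limit) break;
    out += segment;
    n++;
  }
  return out;
}

// "Hi " + family emoji + "!"
const message = "Hi \u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}!";
console.log(message.length);          // 15 UTF-16 code units
console.log(countGraphemes(message)); // 5: "H", "i", " ", the family, "!"
console.log(truncateGraphemes(message, 4) === message.slice(0, 14)); // true: family kept whole
```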
Use Case
When building social media applications, chat systems, or any UI with character limits that allows emoji input, understanding emoji encoding complexity is essential for correct character counting, storage estimation, and preventing data truncation.
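For storage, the constraint is often a byte budget rather than a character count. A typical example is a hypothetical database column limited to N bytes of UTF-8. Cutting the raw bytes at the limit can leave half an emoji behind. The sketch below instead drops whole grapheme clusters until the text fits. It assumes `Intl.Segmenter` and `TextEncoder` are available. `truncateToUtf8Bytes` is a hypothetical helper.

```javascript
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const encoder = new TextEncoder();

// Truncate `text` so its UTF-8 encoding fits in `maxBytes`, cutting only at
// grapheme boundaries so no emoji sequence is stored partially.
function truncateToUtf8Bytes(text, maxBytes) {
  let out = "";
  let used = 0;
  for (const { segment } of segmenter.segment(text)) {
    const size = encoder.encode(segment).length;
    if (used + size > maxBytes) break;
    out += segment;
    used += size;
  }
  return out;
}

const status = "Coding \u{1F468}\u200D\u{1F4BB}";    // "Coding " + man technologist
console.log(encoder.encode(status).length);          // 18: 7 ASCII bytes + 11 for the emoji
console.log(truncateToUtf8Bytes(status, 12));        // "Coding " (the 11-byte emoji does not fit)
console.log(truncateToUtf8Bytes(status, 18) === status); // true
```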