Common Emoji Code Points and UTF-8 Encoding
Explore how popular emoji are encoded in Unicode: their code points (mostly on the Supplementary Multilingual Plane), their 3- and 4-byte UTF-8 sequences, and their UTF-16 surrogate pairs.
Detailed Explanation
Common Emoji in Unicode
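Most emoji reside on the Supplementary Multilingual Plane (Plane 1), at code points U+1F300 and above. Because these code points exceed U+FFFF, they need a surrogate pair in UTF-16 and 4 bytes in UTF-8. A few older symbols that are also used as emoji, such as U+2764 and U+2705, sit on the Basic Multilingual Plane instead, so they fit in a single UTF-16 unit and 3 UTF-8 bytes.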
Popular Emoji Code Points
| Emoji | Code Point | UTF-8 Bytes | UTF-16 Units | Name |
|---|---|---|---|---|
| 😀 | U+1F600 | F0 9F 98 80 | D83D DE00 | GRINNING FACE |
| ❤ | U+2764 | E2 9D A4 | 2764 | HEAVY BLACK HEART |
| 🌍 | U+1F30D | F0 9F 8C 8D | D83C DF0D | EARTH GLOBE EUROPE-AFRICA |
| 🚀 | U+1F680 | F0 9F 9A 80 | D83D DE80 | ROCKET |
| ✅ | U+2705 | E2 9C 85 | 2705 | WHITE HEAVY CHECK MARK |
| 💡 | U+1F4A1 | F0 9F 92 A1 | D83D DCA1 | ELECTRIC LIGHT BULB |
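The rows above can be reproduced in JavaScript. The following is a minimal sketch, assuming a runtime where `TextEncoder` is available as a global (modern browsers and recent Node.js); `describeCodePoint` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: derive one table row from a string holding a single code point.
function describeCodePoint(ch) {
  // Format a number as upper-case hex, left-padded to at least `width` digits.
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

  const codePoint = "U+" + hex(ch.codePointAt(0), 4);

  // TextEncoder always produces UTF-8 bytes.
  const utf8 = Array.from(new TextEncoder().encode(ch), (b) => hex(b, 2)).join(" ");

  // charCodeAt reads individual UTF-16 code units, so a surrogate pair yields two entries.
  const units = [];
  for (let i = 0; i < ch.length; i++) {
    units.push(hex(ch.charCodeAt(i), 4));
  }

  return { codePoint, utf8, utf16: units.join(" ") };
}

const row = describeCodePoint("\u{1F600}");
console.log(row.codePoint, "|", row.utf8, "|", row.utf16);
// U+1F600 | F0 9F 98 80 | D83D DE00
```

The helper only inspects the first code point for the `codePoint` field, so it is meant for single-code-point input rather than multi-code-point emoji sequences.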
4-Byte UTF-8 Encoding
Emoji above U+FFFF follow the UTF-8 pattern for 4-byte sequences:
11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
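For U+1F600: binary 0001 1111 0110 0000 0000, padded to 21 bits and split into groups of 3 + 6 + 6 + 6 bits, is 000 011111 011000 000000. Filling the x positions gives 11110000 10011111 10011000 10000000 → F0 9F 98 80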
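The bit packing can be written out by hand. This is a sketch for illustration only, assuming the input is a supplementary code point; `encodeUtf8FourBytes` is a hypothetical name, and real code would normally rely on `TextEncoder` instead.

```javascript
// Sketch: pack a code point in U+10000..U+10FFFF into the 4-byte UTF-8 pattern
// 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
function encodeUtf8FourBytes(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  return [
    0xf0 | (cp >> 18),          // lead byte: top 3 bits
    0x80 | ((cp >> 12) & 0x3f), // continuation: next 6 bits
    0x80 | ((cp >> 6) & 0x3f),  // continuation: next 6 bits
    0x80 | (cp & 0x3f),         // continuation: lowest 6 bits
  ];
}

const bytes = encodeUtf8FourBytes(0x1f600);
console.log(bytes.map((b) => b.toString(16).toUpperCase()).join(" "));
// F0 9F 98 80
```

The result can be cross-checked against the built-in encoder with `new TextEncoder().encode(String.fromCodePoint(0x1f600))`.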
UTF-16 Surrogate Pairs
In JavaScript strings (which use UTF-16), emoji above U+FFFF are stored as two 16-bit code units called a surrogate pair:
- High surrogate: 0xD800–0xDBFF
- Low surrogate: 0xDC00–0xDFFF
This means "\ud83d\ude00".length === 2 in JavaScript, even though it represents a single visible character. The Unicode Inspector correctly counts such emoji as one code point while showing both UTF-16 units.
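The surrogate ranges above follow from simple arithmetic: subtract 0x10000 from the code point, then split the remaining 20 bits into two 10-bit halves. A minimal sketch, with `toSurrogatePair` and `fromSurrogatePair` as hypothetical helper names:

```javascript
// Sketch: convert a supplementary code point (U+10000..U+10FFFF) to its UTF-16 surrogate pair.
function toSurrogatePair(cp) {
  const offset = cp - 0x10000;            // 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits, lands in D800..DBFF
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits, lands in DC00..DFFF
  return [high, low];
}

// Inverse: rebuild the code point from a high/low surrogate pair.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00
console.log(fromSurrogatePair(0xd83d, 0xde00).toString(16)); // 1f600
console.log(String.fromCharCode(high, low) === "\u{1F600}"); // true
```

In practice `String.fromCodePoint` and `codePointAt` do this conversion already; the manual version is only meant to show where the two units come from.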
Emoji Sequences
Many modern emoji are composed of multiple code points joined by the Zero Width Joiner (ZWJ, U+200D). For example, the family emoji 👨‍👩‍👧‍👦 is four person emoji joined by three ZWJ characters. Variation selectors control whether the preceding character renders in text style (U+FE0E) or emoji style (U+FE0F). The Unicode Inspector reveals every component in such sequences.
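To see the components of a sequence in code, a string can be iterated by code point. The sketch below assumes the man, woman, girl, boy family sequence as the example; `listCodePoints` is a hypothetical helper.

```javascript
// Sketch: list every code point in a string as U+XXXX labels.
// Array.from iterates by code point, so surrogate pairs stay together.
function listCodePoints(str) {
  return Array.from(
    str,
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
console.log(listCodePoints(family).join(" "));
// U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
console.log(family.length); // 11 UTF-16 code units (4 pairs + 3 joiners)

// A heart followed by the emoji-style variation selector.
console.log(listCodePoints("\u2764\uFE0F").join(" "));
// U+2764 U+FE0F
```

Counting user-perceived characters (grapheme clusters) is a separate problem from counting code points; this sketch only covers the latter.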
Use Case
Use this when debugging emoji rendering issues in web applications, calculating the true byte length of strings containing emoji for database storage, or understanding why JavaScript's string.length gives unexpected results for emoji text.
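As a rough illustration of the difference between these measurements, the sketch below reports all three counts for one string. It assumes `TextEncoder` is available as a global; `measure` is a hypothetical helper name.

```javascript
// Sketch: compare the three common "lengths" of a string.
function measure(str) {
  return {
    utf16Units: str.length,                          // what string.length reports
    codePoints: Array.from(str).length,              // Unicode code points
    utf8Bytes: new TextEncoder().encode(str).length, // bytes when stored as UTF-8
  };
}

// "Hi " followed by ROCKET (U+1F680) and WHITE HEAVY CHECK MARK (U+2705).
const m = measure("Hi \u{1F680}\u2705");
console.log(m.utf16Units, m.codePoints, m.utf8Bytes);
// 6 5 10
```

The rocket contributes 2 UTF-16 units and 4 UTF-8 bytes, while the check mark contributes 1 unit and 3 bytes, which is why the three totals differ.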