Surrogate Pairs: Characters Beyond the BMP

Learn how characters outside the Basic Multilingual Plane use surrogate pairs in UTF-16, causing JavaScript's .length to return 2 for a single character.


Detailed Explanation

Surrogate Pairs in UTF-16

The Basic Multilingual Plane (BMP) contains Unicode code points U+0000 through U+FFFF. Characters in the supplementary planes (U+10000 through U+10FFFF) cannot fit in a single 16-bit code unit, so UTF-16 encodes each of them as a surrogate pair: a high surrogate (0xD800–0xDBFF) followed by a low surrogate (0xDC00–0xDFFF).

How Surrogate Pairs Work

For a code point like U+1F600 (😀 Grinning Face):

  1. Subtract 0x10000 to get a 20-bit offset: 0x1F600 - 0x10000 = 0xF600
  2. High surrogate (top 10 bits): 0xD800 + (0xF600 >> 10) = 0xD800 + 0x3D = 0xD83D
  3. Low surrogate (bottom 10 bits): 0xDC00 + (0xF600 & 0x3FF) = 0xDC00 + 0x200 = 0xDE00
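The three steps above map directly onto bit operations. This is an illustrative sketch only: `toSurrogatePair` is a hypothetical helper, not a built-in, and in real code `String.fromCodePoint` performs this encoding for you.

```javascript
// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into its UTF-16 high and low surrogates, following the three steps above.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Only supplementary code points need a surrogate pair");
  }
  const offset = codePoint - 0x10000;    // step 1: 20-bit offset
  const high = 0xd800 + (offset >> 10);  // step 2: top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // step 3: bottom 10 bits
  return [high, low];
}

const [hi, lo] = toSurrogatePair(0x1f600);
console.log(hi.toString(16), lo.toString(16)); // d83d de00
console.log(String.fromCharCode(hi, lo) === "😀"); // true
```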

JavaScript stores this as two code units: \uD83D\uDE00
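Decoding reverses the arithmetic: remove the 0xD800 and 0xDC00 bases, join the two 10-bit halves, and add 0x10000 back. Again a minimal sketch; `fromSurrogatePair` is a hypothetical name, and `codePointAt` already does this combination natively.

```javascript
// Hypothetical helper: recombine a high/low surrogate pair into the
// original code point (the inverse of the encoding steps).
function fromSurrogatePair(high, low) {
  if (high < 0xd800 || high > 0xdbff || low < 0xdc00 || low > 0xdfff) {
    throw new RangeError("Not a valid high/low surrogate pair");
  }
  return ((high - 0xd800) << 10) + (low - 0xdc00) + 0x10000;
}

const pairText = "\uD83D\uDE00";
const decoded = fromSurrogatePair(pairText.charCodeAt(0), pairText.charCodeAt(1));
console.log(decoded.toString(16));                // 1f600
console.log(decoded === pairText.codePointAt(0)); // true
```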

Impact on JavaScript .length

"😀".length          // 2  (surrogate pair)
[..."😀"].length     // 1  (code points)
"😀".codePointAt(0)  // 128512 (U+1F600)

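Because `.length` counts UTF-16 code units while the string iterator (used by spread and `for...of`) yields code points, the difference between the two equals the number of surrogate pairs in the string. A small sketch with a hypothetical `lengthReport` helper:

```javascript
// Hypothetical helper: compare UTF-16 code units (what .length reports)
// with code points (what for...of and spread iterate over).
function lengthReport(str) {
  let codePoints = 0;
  for (const _ of str) codePoints++; // the string iterator yields whole code points
  return {
    codeUnits: str.length,
    codePoints,
    surrogatePairs: str.length - codePoints, // each pair adds one extra unit
  };
}

console.log(lengthReport("😀"));
console.log(lengthReport("hi 🚀🍕"));
```

For "hi 🚀🍕" this reports 7 code units, 5 code points and 2 surrogate pairs.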
Characters That Use Surrogate Pairs

Category | Range | Examples
--- | --- | ---
Emoji (most) | U+1F300–U+1FAFF | 😀 🚀 🍕
Math symbols | U+1D400–U+1D7FF | 𝐀 𝐁 𝐂 (bold math)
Musical symbols | U+1D100–U+1D1FF | 𝄞 (treble clef)
Historic scripts | U+10000–U+1007F | 𐀀 (Linear B)
CJK Extension B | U+20000–U+2A6DF | Rare kanji such as 𠮷
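To check whether a character falls in one of these ranges (or anywhere outside the BMP), it is enough to compare its code point with 0xFFFF; shifting right by 16 gives the plane number. A sketch with a hypothetical `describeCodePoint` helper:

```javascript
// Hypothetical helper: report a character's first code point, its Unicode
// plane, and whether UTF-16 needs a surrogate pair for it.
function describeCodePoint(ch) {
  const cp = ch.codePointAt(0);
  return {
    codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    plane: cp >> 16, // 0 = BMP, 1 = SMP, 2 = SIP, ...
    needsSurrogatePair: cp > 0xffff,
  };
}

for (const ch of ["A", "😀", "𝐀", "𝄞", "𐀀", "𠮷"]) {
  const info = describeCodePoint(ch);
  console.log(ch, info.codePoint, "plane", info.plane, info.needsSurrogatePair);
}
```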

String Operations That Break

Common string operations can corrupt surrogate pairs:

const str = "😀abc";

// WRONG: index-based methods count UTF-16 code units and can split a pair
str.substring(0, 1);  // "\uD83D" (lone high surrogate)
str.charAt(0);        // "\uD83D" (lone high surrogate)

// CORRECT: use code-point-aware methods
[...str].slice(0, 1).join("");     // "😀"
str.slice(0, [...str][0].length);  // "😀" (first code point spans 2 units)
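When a limit is defined in UTF-16 code units (for example a field capped by `.length`), one option is to cut at the limit and back off by one unit if the cut would land right after a high surrogate. The sketch below assumes that policy; `truncateUnits` and `hasLoneSurrogate` are hypothetical helpers. Recent engines (ES2024) also provide `String.prototype.isWellFormed()` for the lone-surrogate check.

```javascript
// Hypothetical helper: cut a string to at most maxUnits UTF-16 code units
// without leaving a lone high surrogate at the end.
function truncateUnits(str, maxUnits) {
  if (str.length <= maxUnits) return str;
  let end = maxUnits;
  const last = str.charCodeAt(end - 1); // NaN when end is 0
  // A high surrogate right before the cut means its low half would be lost.
  if (last >= 0xd800 && last <= 0xdbff) end -= 1;
  return str.slice(0, end);
}

// Hypothetical helper: detect the corruption that naive slicing produces.
function hasLoneSurrogate(str) {
  // No "u" flag, so the regex matches individual UTF-16 code units.
  return /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/.test(str);
}

const text = "😀abc";
console.log(hasLoneSurrogate(text.substring(0, 1)));   // true
console.log(JSON.stringify(truncateUnits(text, 1)));   // ""
console.log(truncateUnits(text, 3));                   // 😀a
```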

Byte Sizes

Encoding | Bytes per supplementary character
--- | ---
UTF-8 | 4 bytes
UTF-16 | 4 bytes (2 code units × 2 bytes)
UTF-32 | 4 bytes (always)

Interestingly, all three encodings use the same 4 bytes for a supplementary character. They differ only for BMP characters: UTF-8 needs 1 to 3 bytes, UTF-16 always 2 bytes, and UTF-32 always 4 bytes.
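The per-encoding sizes can be computed by walking a string code point by code point. A minimal sketch, assuming well-formed input (no lone surrogates); `byteSizes` is a hypothetical helper, and only `TextEncoder` is a real built-in used here as a cross-check.

```javascript
// Hypothetical helper: bytes needed to encode a string in UTF-8, UTF-16
// and UTF-32, computed from its code points.
function byteSizes(str) {
  let utf8 = 0;
  let utf16 = 0;
  let utf32 = 0;
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    utf8 += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
    utf16 += cp > 0xffff ? 4 : 2; // surrogate pair = 2 code units x 2 bytes
    utf32 += 4;                   // fixed width
  }
  return { utf8, utf16, utf32 };
}

for (const sample of ["A", "\u00e9", "\u4e2d", "\u{1F600}"]) {
  console.log(sample, byteSizes(sample));
}

// Cross-check the UTF-8 count with the built-in encoder (browsers, Node 11+).
console.log(new TextEncoder().encode("\u{1F600}").length); // 4
```

For "A", "é", "中" and "😀" the UTF-8 sizes come out as 1, 2, 3 and 4 bytes, while UTF-16 gives 2, 2, 2 and 4, and UTF-32 gives 4 in every case.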

Use Case

When building JavaScript applications that manipulate strings containing emoji or rare characters, understanding surrogate pairs is essential to avoid data corruption during substring operations, database storage, and API payload handling.
