Zero-Width Characters and Invisible String Length

Explore zero-width joiners, zero-width spaces, and other invisible Unicode characters that add to string length without being visible to the user.

Detailed Explanation

Zero-Width Characters: Invisible But Counted

Several Unicode characters have zero visual width but still count toward string length. These can cause subtle bugs in validation, comparison, and storage.

Common Zero-Width Characters

| Character | Code Point | UTF-8 Bytes | Purpose |
|---|---|---|---|
| Zero Width Space (ZWSP) | U+200B | 3 | Word break hint |
| Zero Width Joiner (ZWJ) | U+200D | 3 | Joins emoji sequences |
| Zero Width Non-Joiner (ZWNJ) | U+200C | 3 | Prevents ligatures |
| Word Joiner (WJ) | U+2060 | 3 | Prevents line break |
| Soft Hyphen | U+00AD | 2 | Optional hyphenation point |
| BOM (Byte Order Mark) | U+FEFF | 3 | Encoding indicator |
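
The byte counts in the table can be checked directly. The following is a minimal sketch, assuming a runtime with a global TextEncoder (modern browsers and Node.js); the helper name measure is hypothetical and only used for illustration.

```javascript
// Characters from the table, written as escapes so they stay visible in source.
const zeroWidthChars = {
  ZWSP: "\u200B",
  ZWJ: "\u200D",
  ZWNJ: "\u200C",
  WJ: "\u2060",
  SHY: "\u00AD",
  BOM: "\uFEFF",
};

const encoder = new TextEncoder(); // TextEncoder always encodes to UTF-8

// Hypothetical helper: report code point, UTF-16 code units, and UTF-8 bytes.
function measure(ch) {
  return {
    codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    utf16Units: ch.length,
    utf8Bytes: encoder.encode(ch).length,
  };
}

for (const [name, ch] of Object.entries(zeroWidthChars)) {
  console.log(name, measure(ch));
}
```

Every character here occupies one UTF-16 code unit; only the soft hyphen fits in 2 UTF-8 bytes because its code point is below U+0800.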

The ZWJ in Emoji

The Zero Width Joiner (U+200D) is what makes complex emoji possible:

👩 + ZWJ + 🚀 = 👩‍🚀 (woman astronaut)

Each ZWJ adds 3 UTF-8 bytes and 1 UTF-16 code unit (the unit that JavaScript's .length counts) to the string, but no visible width. A family emoji with 3 ZWJs therefore carries 9 invisible bytes.
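
The arithmetic can be worked through on a concrete sequence. This sketch assumes the four-person family emoji (man, woman, girl, boy joined by three ZWJs) and a global TextEncoder; the variable names are illustrative only.

```javascript
// Family emoji: man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

const zwjCount = (family.match(/\u200D/g) || []).length;   // 3 joiners
const utf16Units = family.length;                          // 4 emoji x 2 units + 3 ZWJ = 11
const codePoints = Array.from(family).length;              // 4 emoji + 3 ZWJ = 7
const utf8Bytes = new TextEncoder().encode(family).length; // 4 emoji x 4 bytes + 3 ZWJ x 3 bytes = 25
const zwjBytes = zwjCount * 3;                             // 9 bytes that never render

console.log({ zwjCount, utf16Units, codePoints, utf8Bytes, zwjBytes });
```

A single visible glyph therefore reports a length of 11 and occupies 25 bytes in UTF-8, 9 of which come from the joiners.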

Hidden Text Attacks

Zero-width characters can be used to:

  1. Bypass filters: Insert a ZWSP inside a banned word so that exact-match filters no longer detect it
  2. Watermark text: Embed invisible patterns to track copy-paste
  3. Break validation: Submit a seemingly empty input that has non-zero length

For example, a string can look empty and still have a non-zero length:

"".length              // 0
"\u200B".length         // 1 (looks empty but isn't!)
"\u200B\u200C\u200D".length  // 3 (three invisible characters)

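One way to guard against the filter-bypass and empty-input cases is to strip zero-width characters before validating. The sketch below is an illustration under assumptions, not a complete moderation filter: the function names isEffectivelyEmpty and containsBannedWord are hypothetical, and the banned-word list is a placeholder.

```javascript
// Zero-width characters to ignore during validation.
const ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;

// Hypothetical helper: true when nothing visible remains after
// removing zero-width characters and ordinary whitespace.
function isEffectivelyEmpty(input) {
  return input.replace(ZERO_WIDTH, "").trim().length === 0;
}

// Hypothetical helper: check a banned-word list against text with
// zero-width characters removed, so "sp\u200Bam" still matches "spam".
function containsBannedWord(text, banned) {
  const normalized = text.replace(ZERO_WIDTH, "").toLowerCase();
  return banned.some((word) => normalized.includes(word));
}

console.log(isEffectivelyEmpty("\u200B"));                   // true
console.log("sp\u200Bam offer".includes("spam"));            // false: naive check is bypassed
console.log(containsBannedWord("sp\u200Bam offer", ["spam"])); // true
```

Stripping only for the purpose of the check, and storing the original input separately, avoids damaging legitimate text that relies on joiners.
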
Detection and Removal

// Example input with a hidden zero-width space
const str = "hel\u200Blo";

// Detect zero-width characters
const hasZeroWidth = /[\u200B\u200C\u200D\u2060\uFEFF]/.test(str);  // true

// Remove zero-width characters
const clean = str.replace(/[\u200B\u200C\u200D\u2060\uFEFF]/g, "");  // "hello"
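
Removing every zero-width character also removes the ZWJ that holds emoji sequences together, and the ZWNJ that some scripts rely on. A possible variant, sketched here with a hypothetical stripZeroWidth function and keepJoiners option, lets the caller leave those two in place.

```javascript
// Same character set as the detection and removal snippets.
const ALL_ZERO_WIDTH = /[\u200B\u200C\u200D\u2060\uFEFF]/g;
// Variant that leaves ZWJ (U+200D) and ZWNJ (U+200C) untouched.
const NON_JOINER_ZERO_WIDTH = /[\u200B\u2060\uFEFF]/g;

// Hypothetical helper: strip zero-width characters, optionally keeping joiners.
function stripZeroWidth(str, { keepJoiners = false } = {}) {
  const pattern = keepJoiners ? NON_JOINER_ZERO_WIDTH : ALL_ZERO_WIDTH;
  return str.replace(pattern, "");
}

const astronaut = "\u{1F469}\u200D\u{1F680}"; // woman + ZWJ + rocket

console.log(stripZeroWidth(astronaut).length);                        // 4: ZWJ removed, two separate emoji
console.log(stripZeroWidth(astronaut, { keepJoiners: true }).length); // 5: sequence intact
console.log(stripZeroWidth("hel\u200Blo", { keepJoiners: true }));    // "hello"
```

Which mode is appropriate depends on the field: a username check may want everything removed, while free-form message text usually should keep its joiners.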

Impact on String Comparison

Two strings that look identical may differ in zero-width characters:

"hello" === "hel\u200Blo"  // false!

The String Length Calculator's grapheme breakdown helps identify these invisible characters by showing the exact code points for each position.
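
A similar breakdown can be approximated in a few lines. This sketch is not the calculator's implementation; listCodePoints and firstDifference are hypothetical helpers that list the code point at each position and report where two look-alike strings first diverge.

```javascript
// Hypothetical helper: format every code point in a string as U+XXXX.
function listCodePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Hypothetical helper: find the first code-point position where a and b differ.
// Returns null when the strings are identical.
function firstDifference(a, b) {
  const cpA = listCodePoints(a);
  const cpB = listCodePoints(b);
  const len = Math.max(cpA.length, cpB.length);
  for (let i = 0; i < len; i++) {
    if (cpA[i] !== cpB[i]) {
      return { index: i, a: cpA[i] ?? null, b: cpB[i] ?? null };
    }
  }
  return null;
}

console.log(listCodePoints("hel\u200Blo"));
// ["U+0068", "U+0065", "U+006C", "U+200B", "U+006C", "U+006F"]
console.log(firstDifference("hello", "hel\u200Blo"));
// { index: 3, a: "U+006C", b: "U+200B" }
```

The reported index counts code points, not UTF-16 code units, so it can differ from a .length-based offset when the string contains emoji.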

Use Case

When building input sanitization, content moderation, or anti-spam systems, detecting zero-width characters helps prevent filter bypass, hidden text attacks, and invisible string length inflation that can cause unexpected database or API errors.
