String Length Surprises with Zalgo Text

Discover how Zalgo text affects string length calculations in JavaScript, Python, and other languages, and the difference between code points, code units, and grapheme clusters.


Detailed Explanation

String Length and Zalgo Text

Zalgo text creates a disconnect between visual length and programmatic string length. A single visible character can contain dozens of code points.
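The sketch below shows how far this can go. It stacks 30 marks from the Combining Diacritical Marks block (U+0300 onward) on a single letter. The helper name `stackMarks` is hypothetical and not part of any library. The counts in the comments assume a runtime with `Intl.Segmenter`, such as a recent Node.js or browser.

```javascript
// Hypothetical helper: stack `count` combining marks from the
// Combining Diacritical Marks block (U+0300 onward) on one base letter.
function stackMarks(base, count) {
  let out = base;
  for (let i = 0; i < count; i++) {
    // Cycle through U+0300..U+036F (112 marks) so any count stays inside the block
    out += String.fromCodePoint(0x0300 + (i % 0x70));
  }
  return out;
}

const tower = stackMarks("Z", 30);

// Count user-perceived characters (grapheme clusters) with Intl.Segmenter
const graphemes = [
  ...new Intl.Segmenter("en", { granularity: "grapheme" }).segment(tower),
].length;

console.log(tower.length);      // 31 UTF-16 code units
console.log([...tower].length); // 31 code points
console.log(graphemes);         // 1 grapheme cluster
```

One visible "Z" therefore reports a length of 31.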

JavaScript String Length

const clean = "Hello";
const zalgo = "H\u0300\u0301\u0302e\u0303\u0304l\u0305\u0306l\u0307\u0308o\u0309\u030A";

clean.length;  // 5
zalgo.length;  // 16 (5 base letters + 11 combining marks)

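JavaScript's .length counts UTF-16 code units, and every combining mark takes at least one code unit (the marks used here, U+0300 to U+030A, take exactly one each). As a result, zalgo text reports a far greater length than its visible character count.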
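Code units, code points, and grapheme clusters are three different measures. The sketch below reports all three side by side for the same string, plus the UTF-8 byte size. The `measure` name is hypothetical. The emoji case shows where code units and code points diverge: characters outside the Basic Multilingual Plane take two UTF-16 code units.

```javascript
// Hypothetical helper: four different "lengths" for the same string.
function measure(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return {
    codeUnits: str.length,                           // UTF-16 code units
    codePoints: [...str].length,                     // the string iterator yields code points
    graphemes: [...segmenter.segment(str)].length,   // user-perceived characters
    utf8Bytes: new TextEncoder().encode(str).length, // encoded size in UTF-8
  };
}

const zalgo = "H\u0300\u0301\u0302e\u0303\u0304l\u0305\u0306l\u0307\u0308o\u0309\u030A";
console.log(measure(zalgo));
// { codeUnits: 16, codePoints: 16, graphemes: 5, utf8Bytes: 27 }

console.log(measure("\u{1F600}")); // a single emoji outside the BMP
// { codeUnits: 2, codePoints: 1, graphemes: 1, utf8Bytes: 4 }
```

For this zalgo string, code units and code points agree because every mark is in the BMP. The visible length (5) is the outlier.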

Grapheme-Aware Length

To get the "visual" length, meaning the number of user-perceived characters (grapheme clusters), use the Intl.Segmenter API:

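function graphemeLength(str) {
  // Split the string into user-perceived characters (extended grapheme clusters)
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  // segment() returns an iterable of segments; spreading it lets us count them
  return [...segmenter.segment(str)].length;
}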

graphemeLength(clean);  // 5
graphemeLength(zalgo);  // 5 (same visual length!)
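Each segment yielded by Intl.Segmenter also carries the cluster text (`segment`) and its starting offset in UTF-16 code units (`index`). That makes it easy to see how many code points each visible character is hiding. The helper name `clusterSizes` is hypothetical.

```javascript
// Hypothetical helper: list each grapheme cluster with the number of code points it contains.
function clusterSizes(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return [...segmenter.segment(str)].map(({ segment, index }) => ({
    segment,
    index,                           // start offset in UTF-16 code units
    codePoints: [...segment].length, // base character plus its combining marks
  }));
}

const zalgo = "H\u0300\u0301\u0302e\u0303\u0304l\u0305\u0306l\u0307\u0308o\u0309\u030A";
const sizes = clusterSizes(zalgo);
console.log(sizes.map((c) => c.codePoints)); // [ 4, 3, 3, 3, 3 ]
console.log(sizes.map((c) => c.index));      // [ 0, 4, 7, 10, 13 ]
```

A simple zalgo heuristic could flag clusters whose `codePoints` exceeds some threshold. Any particular threshold would be an application-specific choice.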

Python

import unicodedata

clean = "Hello"
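# Zalgo version: 5 base letters carrying 11 combining marks
zalgo = "H\u0300\u0301\u0302e\u0303\u0304l\u0305\u0306l\u0307\u0308o\u0309\u030A"

len(clean)  # 5
len(zalgo)  # 16 (len counts code points, so every combining mark is included)

# Count the combining marks (characters with a nonzero combining class)
sum(1 for ch in zalgo if unicodedata.combining(ch))  # 11

# For grapheme clusters, use the third-party grapheme package (pip install grapheme):
import grapheme
grapheme.length(zalgo)  # 5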

Practical Implications

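  1. Character limits: Limits that count code points, such as a 280-character tweet limit, count every combining mark, so zalgo text uses up the limit quickly
  2. Database storage: VARCHAR(100) limits code points, code units, or bytes (depending on the database and column definition), never visible characters, so it may hold far fewer than 100 visible zalgo characters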
  3. Input validation: Checking input.length <= 50 may reject zalgo text that looks like only 10 characters
  4. Truncation: Naively truncating at index N may cut in the middle of a grapheme cluster
  5. Bandwidth: Zalgo text is significantly larger in bytes than its clean equivalent; each combining mark in the U+0300 to U+036F block takes 2 bytes in UTF-8
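Several of these points suggest checking two things separately: the visible length (items 1 and 3) and the raw size (items 2 and 5). A check on grapheme count alone would let a short-looking string carry an unbounded number of marks. The sketch below does both. The `validateInput` name and its option names are hypothetical, and the limits are placeholders.

```javascript
// Hypothetical validator: enforce a visible-length limit and a separate byte budget.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const encoder = new TextEncoder();

function validateInput(str, { maxGraphemes, maxBytes }) {
  const graphemes = [...segmenter.segment(str)].length; // what the user sees
  const bytes = encoder.encode(str).length;             // what gets stored and sent (UTF-8)
  if (graphemes > maxGraphemes) {
    return { ok: false, reason: `too long: ${graphemes} characters (max ${maxGraphemes})` };
  }
  if (bytes > maxBytes) {
    return { ok: false, reason: `too large: ${bytes} bytes (max ${maxBytes})` };
  }
  return { ok: true };
}

const zalgo = "H\u0300\u0301\u0302e\u0303\u0304l\u0305\u0306l\u0307\u0308o\u0309\u030A";
console.log(validateInput("Hello", { maxGraphemes: 10, maxBytes: 20 }));
// { ok: true }
console.log(validateInput(zalgo, { maxGraphemes: 10, maxBytes: 20 }));
// { ok: false, reason: 'too large: 27 bytes (max 20)' }
```

Both strings look five characters long, but only the zalgo version exceeds the byte budget.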

Safe Truncation

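function safeTruncate(str, maxGraphemes) {
  const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
  // Each segment is one whole grapheme cluster: a base character plus its combining marks
  const segments = [...segmenter.segment(str)];
  // Keep the first maxGraphemes clusters and rejoin them, so no cluster is ever split
  return segments.slice(0, maxGraphemes).map(s => s.segment).join('');
}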
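safeTruncate limits the number of grapheme clusters. Sometimes the real constraint is a size limit measured in UTF-16 code units (what `.length` reports), as with the VARCHAR point above. In that case a variant can keep whole clusters only while they still fit. The `truncateToCodeUnits` name is hypothetical. The example also shows what a naive `slice` does to the same string.

```javascript
// Hypothetical helper: keep whole grapheme clusters while the result
// fits within maxCodeUnits UTF-16 code units.
function truncateToCodeUnits(str, maxCodeUnits) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  let end = 0;
  for (const { index, segment } of segmenter.segment(str)) {
    const next = index + segment.length; // end offset of this cluster
    if (next > maxCodeUnits) break;      // this cluster would not fit; stop before it
    end = next;
  }
  return str.slice(0, end);
}

const zalgo = "H\u0300\u0301\u0302e\u0303\u0304l\u0305\u0306l\u0307\u0308o\u0309\u030A";

// Naive: keeps "H" with its 3 marks and "e" with only 1 of its 2 marks
console.log(zalgo.slice(0, 6) === "H\u0300\u0301\u0302e\u0303"); // true

// Grapheme-aware: stops after "H" and its marks (4 code units); "e" is dropped whole
console.log(truncateToCodeUnits(zalgo, 6) === "H\u0300\u0301\u0302"); // true
```

The result may be shorter than the budget, which is the price of never splitting a visible character.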

Use Case

Understanding string length behavior with Zalgo text is critical for developers implementing character limits, input validation, database schemas, and text truncation in applications that handle user-generated Unicode content.
