How Zalgo Text Works at the Unicode Level

Explore the internal Unicode representation of Zalgo text, including code points, grapheme clusters, and how renderers handle excessive combining marks.


Detailed Explanation

Zalgo at the Unicode Level

To truly understand Zalgo text, you need to look at what happens at the code point level and how text rendering systems process the stacked combining marks.

Code Point Representation

A single "zalgo character" is actually multiple Unicode code points:

Base character:  H (U+0048)
Combining above: ̀ (U+0300) ́ (U+0301) ̂ (U+0302)
Combining below: ̧ (U+0327) ̰ (U+0330)
Combining mid:   ̶ (U+0336)

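An "H" followed by these 6 combining marks is stored as 7 code points but is displayed as a single, heavily decorated character. The renderer typically draws each mark as its own glyph, positioned relative to the base.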
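A minimal generator sketch, assuming Zalgo text is produced by appending randomly chosen combining marks after each base character. It draws only from the six marks in the table above, while a real generator would likely use a much larger pool. The `zalgoify` helper and its `marksPerChar` and `rng` parameters are hypothetical illustrations, not the implementation behind any particular tool.

```javascript
// Combining marks from the table above, grouped by where they attach.
const MARKS = {
  above: ["\u0300", "\u0301", "\u0302"], // grave, acute, circumflex
  below: ["\u0327", "\u0330"],           // cedilla, tilde below
  mid: ["\u0336"],                       // long stroke overlay
};
const ALL_MARKS = [...MARKS.above, ...MARKS.below, ...MARKS.mid];

// Hypothetical helper: append `marksPerChar` randomly chosen marks after
// every non-whitespace code point. `rng` defaults to Math.random but can be
// swapped for a deterministic function when testing.
function zalgoify(text, marksPerChar = 3, rng = Math.random) {
  let out = "";
  for (const ch of text) {          // iterates by code point, not code unit
    out += ch;
    if (/\s/u.test(ch)) continue;   // leave whitespace undecorated
    for (let i = 0; i < marksPerChar; i++) {
      out += ALL_MARKS[Math.floor(rng() * ALL_MARKS.length)];
    }
  }
  return out;
}

const sample = zalgoify("Hi", 6);
console.log([...sample].length); // 14: 2 base characters + 2 × 6 marks
```

Whichever marks are picked, the code-point count is fixed by the number of bases and the marks per base; only the visual result changes.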

String Length vs. Visual Length

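Because of this, counting code units or code points does not match what the user sees:

// Written with escapes: a pasted literal may arrive NFC-normalized (H + U+0327 → U+1E28), which gives 6
const zalgo = "H\u0300\u0301\u0302\u0327\u0330\u0336";
zalgo.length;           // 7 (UTF-16 code units)
[...zalgo].length;      // 7 (code points)
// But visually it appears as ONE character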
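In this example the two counts agree only because every code point lies in the Basic Multilingual Plane. The sketch below, which uses a hypothetical `codePoints` helper, lists the code points explicitly and shows the counts diverging once the base is an astral character. U+1D407 MATHEMATICAL BOLD CAPITAL H is an arbitrary choice here. Any code point above U+FFFF takes a UTF-16 surrogate pair.

```javascript
// Hypothetical helper: format each code point of a string as "U+XXXX".
function codePoints(str) {
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// BMP base: every code point fits in one UTF-16 code unit.
const bmpZalgo = "H\u0300\u0301\u0302\u0327\u0330\u0336";
// Astral base: U+1D407 needs a surrogate pair (two code units).
const astralZalgo = "\u{1D407}\u0300\u0301\u0302\u0327\u0330\u0336";

console.log(bmpZalgo.length, [...bmpZalgo].length);       // 7 7
console.log(astralZalgo.length, [...astralZalgo].length); // 8 7
console.log(codePoints(astralZalgo).join(" "));
// U+1D407 U+0300 U+0301 U+0302 U+0327 U+0330 U+0336
```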

Grapheme Clusters

Unicode defines grapheme clusters as user-perceived characters. A base character plus all its combining marks forms a single extended grapheme cluster. The Intl.Segmenter API correctly identifies this:

const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const segments = [...segmenter.segment(zalgo)];
segments.length; // 1 (one grapheme cluster)
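A common place where this matters is truncation. `String.prototype.slice` cuts by code unit and can separate a base from its marks. The following is a minimal sketch of grapheme-aware truncation built on the same `Intl.Segmenter` API. The `truncateGraphemes` name is hypothetical.

```javascript
// Hypothetical helper: keep at most `max` grapheme clusters, so a base
// character is never separated from its combining marks.
function truncateGraphemes(str, max) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of segmenter.segment(str)) {
    if (count === max) break;
    out += segment; // a whole cluster: base plus all its marks
    count++;
  }
  return out;
}

const decorated = "H\u0300\u0301\u0302ey";
console.log(decorated.slice(0, 2) === "H\u0300");                        // true: marks cut off
console.log(truncateGraphemes(decorated, 2) === "H\u0300\u0301\u0302e"); // true
```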

Rendering Behavior

Text rendering engines handle excessive combining marks differently:

  • Most browsers: Attempt to render all marks, which then overflow the line box and overlap neighboring lines
  • Terminal emulators: May truncate or ignore excess marks
  • Mobile devices: May limit rendering to prevent performance issues
  • PDF generators: Usually render all marks faithfully
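Because the outcome depends on the renderer, an application may prefer to measure the stacking itself before display. The sketch below reports the largest number of combining marks attached to any one grapheme cluster. The `maxMarksPerGrapheme` name is hypothetical, and any threshold for what counts as "too many" is an assumption left to the caller. Note that `\p{M}` also matches marks used in ordinary text in many scripts, which usually stack one to three deep.

```javascript
// Hypothetical helper: largest number of combining marks on any single
// grapheme cluster in `str`.
function maxMarksPerGrapheme(str) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  let max = 0;
  for (const { segment } of segmenter.segment(str)) {
    // \p{M} matches code points in the Unicode Mark categories (Mn, Mc, Me).
    const marks = segment.match(/\p{M}/gu);
    max = Math.max(max, marks ? marks.length : 0);
  }
  return max;
}

console.log(maxMarksPerGrapheme("Hello"));                                  // 0
console.log(maxMarksPerGrapheme("H\u0300\u0301\u0302\u0327\u0330\u0336i")); // 6
```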

Canonical Ordering

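Unicode assigns every code point a Canonical Combining Class (CCC). For combining marks, the class roughly encodes where the mark attaches (for example, 230 for marks above, 220 for marks below, 1 for overlays), and normalization uses it to put marks into a canonical order. During normalization (NFD or NFC), adjacent marks with different nonzero CCC values are sorted into ascending order, while marks with the same CCC keep their relative order, because swapping two marks that stack in the same position would change how the text looks. Characters with CCC 0 (starters, which include base letters) are never reordered, and no mark is moved across them.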
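These rules can be observed with the built-in `String.prototype.normalize`. The CCC values in the comments come from the Unicode Character Database. The test strings are only illustrations.

```javascript
// U+0301 COMBINING ACUTE ACCENT has CCC 230; U+0327 COMBINING CEDILLA has CCC 202.
// Different classes, so canonical ordering sorts them by ascending CCC.
const mixed = "a\u0301\u0327";
console.log(mixed.normalize("NFD") === "a\u0327\u0301"); // true

// Same class (both 230): relative order is kept, because it affects stacking.
const sameClass = "a\u0301\u0300";
console.log(sameClass.normalize("NFD") === "a\u0301\u0300"); // true

// The stacked "H": NFD sorts the marks as 0336 (1), 0327 (202), 0330 (220),
// then 0300, 0301, 0302 (230). NFC also composes H + U+0327 into U+1E28
// LATIN CAPITAL LETTER H WITH CEDILLA, so the code-point count drops to 6.
const stacked = "H\u0300\u0301\u0302\u0327\u0330\u0336";
console.log(stacked.normalize("NFD") === "H\u0336\u0327\u0330\u0300\u0301\u0302"); // true
console.log(stacked.normalize("NFC").length); // 6
```

The last line explains why a Zalgo string copied through an NFC-normalizing system can report fewer code points than it was built with.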

Use Case

Understanding Unicode internals is essential for developers building text processing pipelines, input sanitization systems, or debugging rendering issues with internationalized text that contains unexpected combining characters.
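For input sanitization, one possible mitigation is to cap how many combining marks each grapheme cluster may carry. The sketch below is one such approach, not a standard algorithm. The `capCombiningMarks` name and its default of 2 are arbitrary assumptions. It keeps the first marks in stored order. Running NFC before or after it can change which marks survive, because NFC may fold some marks into precomposed letters. Since `\p{M}` also matches legitimate marks such as Devanagari vowel signs and emoji variation selectors, a cap of zero would damage ordinary multilingual text.

```javascript
// Hypothetical sanitizer: within each grapheme cluster, keep every non-mark
// code point but only the first `maxMarks` combining marks.
function capCombiningMarks(str, maxMarks = 2) {
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  const isMark = /^\p{M}$/u; // no g flag, so test() keeps no lastIndex state
  let out = "";
  for (const { segment } of segmenter.segment(str)) {
    let marks = 0;
    for (const cp of segment) {         // iterate by code point
      if (isMark.test(cp)) {
        if (marks >= maxMarks) continue; // drop marks beyond the cap
        marks++;
      }
      out += cp;
    }
  }
  return out;
}

const noisy = "H\u0300\u0301\u0302\u0327\u0330\u0336i";
console.log(capCombiningMarks(noisy) === "H\u0300\u0301i"); // true
```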
