Unicode Combining Diacritical Marks Explained

Deep dive into Unicode combining diacritical marks (U+0300–U+036F), how they modify base characters, and why they enable the Zalgo text effect.

Detailed Explanation

Combining Diacritical Marks in Unicode

The Unicode standard defines combining diacritical marks as characters that do not stand alone but instead modify the preceding base character. The primary block is Combining Diacritical Marks (U+0300–U+036F), containing 112 characters.
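
The block size can be checked against the character database that ships with Python. This is a minimal sketch using the standard `unicodedata` module; the constant and variable names are illustrative and do not come from any specification.

```python
import unicodedata

# Bounds of the Combining Diacritical Marks block (inclusive).
BLOCK_START, BLOCK_END = 0x0300, 0x036F

marks = [chr(cp) for cp in range(BLOCK_START, BLOCK_END + 1)]

# Every code point in this block has general category "Mn" (nonspacing mark).
nonspacing = [m for m in marks if unicodedata.category(m) == "Mn"]

print(len(marks), len(nonspacing))   # 112 112
print(unicodedata.name("\u0301"))    # COMBINING ACUTE ACCENT
```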

How Combining Characters Work

In Unicode, text is stored as a sequence of code points. When a renderer encounters a combining mark, it visually attaches it to the preceding base character:

Code points: U+0061 U+0301
Rendered:    á (a with acute accent)
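
The same sequence can be inspected in code. The sketch below (the helper name `code_points` is hypothetical) shows that the two-code-point form looks identical to the precomposed character U+00E1 but is a different string until it is normalized.

```python
import unicodedata

decomposed = "a\u0301"   # U+0061 followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e1"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE

def code_points(text: str) -> list[str]:
    """Format each code point in text as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(decomposed))     # ['U+0061', 'U+0301']
print(len(decomposed))             # 2
print(decomposed == precomposed)   # False: same appearance, different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```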

Multiple combining marks can stack:

Code points: U+0061 U+0301 U+0308 U+0303
Rendered:    á̈̃ (a with acute, diaeresis, and tilde)
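
To see how many marks are attached to each base character, a string can be split into a base plus its trailing marks. This is a simplified sketch, not full grapheme cluster segmentation as defined in UAX #29; the function name `split_clusters` is an assumption made for illustration.

```python
import unicodedata

def split_clusters(text: str) -> list[tuple[str, str]]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[tuple[str, str]] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            base, marks = clusters[-1]
            clusters[-1] = (base, marks + ch)
        else:
            # A base character, or a stray mark with nothing before it.
            clusters.append((ch, ""))
    return clusters

for base, marks in split_clusters("a\u0301\u0308\u0303b"):
    print(repr(base), len(marks))
# 'a' 3
# 'b' 0
```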

Categories of Combining Marks

Category | Range | Examples | Position
Above | U+0300–U+0315 | ̀ ́ ̂ ̃ ̈ | Top of character
Below | U+0316–U+0333 | ̧ ̨ ̰ ̱ | Mostly bottom of character
Overlay | U+0334–U+0338 | ̴ ̵ ̶ ̷ | Through character
Extensions | U+0339–U+036F | ͅ ͠ ͡ | Various positions
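
The ranges above are a rough guide. A more precise signal is each mark's canonical combining class, which `unicodedata.combining()` returns. The sketch below maps a handful of class values to positions; the dictionary covers only the classes used in this block, and the name `mark_position` is hypothetical.

```python
import unicodedata
from collections import Counter

# Selected canonical combining class values and the position they indicate.
CCC_POSITION = {
    1: "overlay",
    202: "attached below",
    216: "attached above right",
    220: "below",
    230: "above",
    232: "above right",
    233: "double below",
    234: "double above",
    240: "iota subscript",
}

def mark_position(ch: str) -> str:
    """Describe where a combining mark is drawn, based on its combining class."""
    return CCC_POSITION.get(unicodedata.combining(ch), "other")

print(mark_position("\u0301"))  # above (acute accent)
print(mark_position("\u0331"))  # below (macron below)
print(mark_position("\u0334"))  # overlay (tilde overlay)

# Tally the whole block by position.
tally = Counter(mark_position(chr(cp)) for cp in range(0x0300, 0x0370))
print(tally)
```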

Why Zalgo Exploits This

The Unicode spec does not define a hard limit on how many combining marks can follow a base character. Most renderers attempt to draw every mark, stacking each one further out from the base character. Adding 10 or more marks above and below each character makes the stacks spill into neighboring lines, which is the overflow and distortion that characterizes Zalgo text.
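
A minimal generator sketch illustrates the effect. It assumes marks are chosen at random from the above (class 230) and below (class 220) marks of the block; the function name `zalgo` and its `per_side` and `seed` parameters are illustrative and not taken from any particular tool.

```python
import random
import unicodedata

BLOCK = [chr(cp) for cp in range(0x0300, 0x0370)]
ABOVE = [m for m in BLOCK if unicodedata.combining(m) == 230]  # drawn above
BELOW = [m for m in BLOCK if unicodedata.combining(m) == 220]  # drawn below

def zalgo(text: str, per_side: int = 10, seed=None) -> str:
    """Append per_side random marks above and below every letter or digit."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalnum():  # leave spaces and punctuation untouched
            out.extend(rng.choices(ABOVE, k=per_side))
            out.extend(rng.choices(BELOW, k=per_side))
    return "".join(out)

noisy = zalgo("hi there", per_side=10, seed=1)
print(len("hi there"), len(noisy))  # 8 148
```

Each of the 7 letters gains 20 marks, so an 8-character string grows to 148 code points while still counting as 8 visible characters.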

Normalization and Combining Marks

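Unicode normalization converts between equivalent representations: NFD decomposes precomposed characters into a base character plus combining marks, and NFC recomposes them where a precomposed form exists. Normalization does not remove excess combining marks; it only handles canonical equivalences and reorders marks into canonical order. To remove Zalgo, you must explicitly strip combining mark code points.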
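
One possible approach, sketched below, is to compose with NFC first so that ordinary accents with a precomposed form survive, then drop the remaining nonspacing and enclosing marks. The function name `strip_combining` and the `max_marks` parameter are assumptions made for this example.

```python
import unicodedata

def strip_combining(text: str, max_marks: int = 0) -> str:
    """Keep at most max_marks combining marks after each base character."""
    composed = unicodedata.normalize("NFC", text)
    out = []
    run = 0  # marks seen since the last base character
    for ch in composed:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # excess mark: drop it
        else:
            run = 0
        out.append(ch)
    return "".join(out)

noisy = "e\u0301" + "\u0316" * 5 + "\u0334" * 3

# Normalization alone leaves the excess marks in place.
print(len(unicodedata.normalize("NFC", noisy)) > 1)   # True

# Stripping after NFC keeps the legitimate accent as U+00E9.
print(strip_combining(noisy) == "\u00e9")             # True
```

With `max_marks=0` this also removes legitimate marks that have no precomposed form, which matters for scripts such as Arabic, Hebrew, and Devanagari. A small nonzero cap may be a safer default for multilingual input.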

Use Case

Knowledge of combining diacritical marks is critical for developers building text processing systems, input validation, content moderation filters, and internationalization (i18n) support in software applications.
