Combining Marks Beyond U+0300–U+036F
Explore the extended Unicode blocks for combining marks including Combining Diacritical Marks Extended, Combining Diacritical Marks Supplement, and more.
Detailed Explanation
Extended Combining Mark Blocks
While the primary Combining Diacritical Marks block (U+0300–U+036F) is the most commonly used for Zalgo text, Unicode defines several additional blocks of combining characters.
All Combining Mark Blocks
| Block | Range | Code points in block | Purpose |
|---|---|---|---|
| Combining Diacritical Marks | U+0300–U+036F | 112 | Standard accents, tildes, hooks |
| Combining Diacritical Marks Extended | U+1AB0–U+1AFF | 80 | Dialectology and specialized phonetic marks |
| Combining Diacritical Marks Supplement | U+1DC0–U+1DFF | 64 | Additional phonetic marks and medieval superscript letters |
| Combining Diacritical Marks for Symbols | U+20D0–U+20FF | 48 | Marks applied to mathematical and technical symbols |
| Combining Half Marks | U+FE20–U+FE2F | 16 | Double diacritics |
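The table can be expressed as a small lookup structure. The following is a minimal sketch: the names `COMBINING_BLOCKS`, `combiningBlockOf`, and `blockSize` are illustrative and not part of any library, and the ranges are copied from the table above.

```javascript
// Block ranges copied from the table; names are illustrative only.
const COMBINING_BLOCKS = [
  { name: "Combining Diacritical Marks", start: 0x0300, end: 0x036f },
  { name: "Combining Diacritical Marks Extended", start: 0x1ab0, end: 0x1aff },
  { name: "Combining Diacritical Marks Supplement", start: 0x1dc0, end: 0x1dff },
  { name: "Combining Diacritical Marks for Symbols", start: 0x20d0, end: 0x20ff },
  { name: "Combining Half Marks", start: 0xfe20, end: 0xfe2f },
];

// Returns the block name for a code point, or null if it lies outside all five blocks.
function combiningBlockOf(codePoint) {
  const block = COMBINING_BLOCKS.find(
    (b) => codePoint >= b.start && codePoint <= b.end
  );
  return block ? block.name : null;
}

// Number of code points spanned by a block's range (assigned or not).
function blockSize(block) {
  return block.end - block.start + 1;
}

console.log(combiningBlockOf(0x0301)); // Combining Diacritical Marks
console.log(blockSize(COMBINING_BLOCKS[0])); // 112
```

Note that `blockSize` counts every code point in the range, whether or not a character is currently assigned to it.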
Combining Diacritical Marks Extended (U+1AB0–U+1AFF)
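This block was added in Unicode 7.0 and has been extended in later versions. Its initial repertoire consists mainly of marks for German dialectology and other specialized phonetic notation:
- U+1AB0: Combining Doubled Circumflex Accent
- U+1AB1–U+1ABE: Further marks from the Unicode 7.0 repertoire, including U+1ABE Combining Parentheses Overlay, which is an enclosing mark rather than a nonspacing one
- Used in dialectological and other scholarly phonetic transcription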
Combining Diacritical Marks Supplement (U+1DC0–U+1DFF)
This block was filled in across several Unicode versions, with marks for the UPA (Uralic Phonetic Alphabet), other phonetic systems, and medieval manuscript notation:
- U+1DC0: Combining Dotted Grave Accent
- U+1DC1: Combining Dotted Acute Accent
- Used in detailed phonetic transcription
Combining Diacritical Marks for Symbols (U+20D0–U+20FF)
These marks are designed to combine with mathematical and technical symbols:
- U+20D0: Combining Left Harpoon Above
- U+20D1: Combining Right Harpoon Above
- U+20D2: Combining Long Vertical Line Overlay
- Used in mathematical notation (vector arrows, etc.)
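As a small illustration of how such a mark attaches to a base character, the sketch below builds a vector-style "v" with U+20D7 (Combining Right Arrow Above, from this block) and inspects it. The variable names are illustrative.

```javascript
// "v" followed by U+20D7 COMBINING RIGHT ARROW ABOVE displays as a v with an arrow over it.
const vector = "v\u20D7";

// The mark is stored as its own code point, so the string contains two
// code points even though it renders as a single symbol.
const codePoints = [...vector].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints.join(" ")); // U+0076 U+20D7

// Not every character in this block has the same general category:
// U+20D7 is a nonspacing mark (Mn), U+20DD COMBINING ENCLOSING CIRCLE is an enclosing mark (Me).
const isNonspacing = /^\p{Mn}$/u.test("\u20D7");
const isEnclosing = /^\p{Me}$/u.test("\u20DD");
console.log(isNonspacing, isEnclosing); // true true
```

The category difference matters when choosing a pattern for matching: a pattern built on `\p{Mn}` alone will not match the enclosing marks in this block.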
Implications for Zalgo Stripping
A thorough Zalgo removal regex should cover all these blocks:
const COMBINING_MARKS = /[\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/g;
// Or use a Unicode property escape (requires the u flag):
const COMBINING_MARKS_SAFE = /\p{Mn}/gu;
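Using the Unicode property escape \p{Mn} is the most future-proof approach: as JavaScript engines update their Unicode data, the pattern picks up newly added nonspacing marks without any change. Two caveats apply. First, \p{Mn} matches nonspacing marks only; enclosing marks such as U+20DD (category Me) are matched by \p{Me}, or by \p{M}, which covers every mark category. Second, \p{Mn} also matches marks outside the blocks listed above, including marks that scripts such as Arabic, Hebrew, and Devanagari rely on, so stripping every match will alter legitimate text in those scripts.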
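The difference between the two patterns can be seen in a short comparison. This is a minimal sketch rather than a complete sanitizer; the function names `stripByBlocks` and `stripByProperty` and the sample strings are illustrative.

```javascript
// Explicit ranges for the five blocks listed in this article.
const BLOCK_MARKS =
  /[\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/g;
// Property-based alternative: every nonspacing mark known to the engine.
const PROPERTY_MARKS = /\p{Mn}/gu;

function stripByBlocks(text) {
  return text.replace(BLOCK_MARKS, "");
}

function stripByProperty(text) {
  return text.replace(PROPERTY_MARKS, "");
}

// One mark from each of the five blocks, attached to plain letters.
const zalgo = "Z\u0300\u0351a\u20D7l\u1DC0g\uFE20o\u1AB0";
console.log(stripByBlocks(zalgo)); // Zalgo
console.log(stripByProperty(zalgo)); // Zalgo

// U+0483 COMBINING CYRILLIC TITLO is a nonspacing mark that lives in the
// Cyrillic block, so only the property-based pattern removes it.
const titlo = "a\u0483";
console.log(stripByBlocks(titlo).length); // 2
console.log(stripByProperty(titlo).length); // 1
```

Which behavior is preferable depends on the input: the explicit ranges leave marks from other scripts untouched, while the property escape removes every nonspacing mark regardless of block.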
Use Case
Knowledge of extended combining mark blocks is important for developers building comprehensive Unicode sanitization, for linguists working with specialized phonetic transcription, and for maximizing the variety of Zalgo effects using the full range of available combining characters.