Combining Marks Beyond U+0300–U+036F

Explore the Unicode blocks for combining marks beyond the basic range: Combining Diacritical Marks Extended, Combining Diacritical Marks Supplement, Combining Diacritical Marks for Symbols, and Combining Half Marks.


Detailed Explanation

Extended Combining Mark Blocks

While the primary Combining Diacritical Marks block (U+0300–U+036F) is the most commonly used for Zalgo text, Unicode defines several additional blocks of combining characters.

All Combining Mark Blocks

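Block | Range | Code points | Purpose
Combining Diacritical Marks | U+0300–U+036F | 112 | Standard accents, tildes, hooks
Combining Diacritical Marks Extended | U+1AB0–U+1AFF | 80 | Dialectology and phonetic marks
Combining Diacritical Marks Supplement | U+1DC0–U+1DFF | 64 | Additional phonetic marks
Combining Diacritical Marks for Symbols | U+20D0–U+20FF | 48 | Marks for mathematical and technical symbols
Combining Half Marks | U+FE20–U+FE2F | 16 | Halves of diacritics that span several characters

The counts are block sizes in code points. Some blocks still contain unassigned code points, so the number of defined characters can be lower.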
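The ranges in the table can be turned into a small lookup. The following is a minimal sketch, not a complete Unicode block database: the names COMBINING_BLOCKS, combiningBlockOf, and blockSize are illustrative and cover only the five blocks listed here.

```javascript
// Block boundaries copied from the table above (names are illustrative).
const COMBINING_BLOCKS = [
  { name: 'Combining Diacritical Marks', start: 0x0300, end: 0x036f },
  { name: 'Combining Diacritical Marks Extended', start: 0x1ab0, end: 0x1aff },
  { name: 'Combining Diacritical Marks Supplement', start: 0x1dc0, end: 0x1dff },
  { name: 'Combining Diacritical Marks for Symbols', start: 0x20d0, end: 0x20ff },
  { name: 'Combining Half Marks', start: 0xfe20, end: 0xfe2f },
];

// Returns the block name for a code point, or null if it lies outside all five blocks.
function combiningBlockOf(codePoint) {
  const block = COMBINING_BLOCKS.find(
    (b) => codePoint >= b.start && codePoint <= b.end
  );
  return block ? block.name : null;
}

// Block size in code points (end is inclusive), matching the third column of the table.
function blockSize(block) {
  return block.end - block.start + 1;
}

console.log(combiningBlockOf(0x1ab0));
console.log(COMBINING_BLOCKS.map(blockSize));
```

A lookup like this only says which block a code point falls in. It does not say whether the code point is assigned or what kind of mark it is.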

Combining Diacritical Marks Extended (U+1AB0–U+1AFF)

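This block was added in Unicode 7.0. Its initial marks were encoded mainly for German dialectology (the Teuthonista transcription system) and other specialized phonetic notation, and later Unicode versions extended it:

  • U+1AB0: Combining Doubled Circumflex Accent
  • U+1AB1–U+1ABE: The remaining marks from the Unicode 7.0 set, from Combining Diaeresis-Ring to Combining Parentheses Overlay
  • Used in scholarly editions of historical texts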
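Marks in this block do not all share one general category. As a rough illustration, the sketch below classifies a character as a nonspacing (Mn), enclosing (Me), or spacing (Mc) mark. It assumes a JavaScript engine that supports Unicode property escapes (ES2018 or later), and markCategory is a hypothetical helper name.

```javascript
// Classify a single character by its mark category using Unicode property escapes.
function markCategory(ch) {
  if (/^\p{Mn}$/u.test(ch)) return 'Mn'; // nonspacing mark
  if (/^\p{Me}$/u.test(ch)) return 'Me'; // enclosing mark
  if (/^\p{Mc}$/u.test(ch)) return 'Mc'; // spacing combining mark
  return null; // not a combining mark
}

console.log(markCategory('\u1AB0')); // U+1AB0 Combining Doubled Circumflex Accent
console.log(markCategory('\u1ABE')); // U+1ABE Combining Parentheses Overlay
console.log(markCategory('a'));      // plain letter, not a mark
```

U+1AB0 is a nonspacing mark, while U+1ABE is an enclosing mark, so a pattern that matches only \p{Mn} will not catch every character in the block.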

Combining Diacritical Marks Supplement (U+1DC0–U+1DFF)

Added across several Unicode versions for UPA (Uralic Phonetic Alphabet) and other phonetic systems:

  • U+1DC0: Combining Dotted Grave Accent
  • U+1DC1: Combining Dotted Acute Accent
  • Used in detailed phonetic transcription

Combining Diacritical Marks for Symbols (U+20D0–U+20FF)

These are specifically for mathematical and technical symbols:

  • U+20D0: Combining Left Harpoon Above
  • U+20D1: Combining Right Harpoon Above
  • U+20D2: Combining Long Vertical Line Overlay
  • Used in mathematical notation (vector arrows, etc.)
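As a small illustration of the vector-arrow use, the sketch below attaches U+20D7 (Combining Right Arrow Above) to a letter. The helper toCodePoints is a hypothetical name introduced only to make the underlying sequence visible.

```javascript
const RIGHT_ARROW_ABOVE = '\u20D7'; // U+20D7 Combining Right Arrow Above
const vectorV = 'v' + RIGHT_ARROW_ABOVE;

// Render a string as a list of U+XXXX code points.
function toCodePoints(text) {
  return [...text]
    .map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'))
    .join(' ');
}

console.log(vectorV);                         // displays v with an arrow above, font permitting
console.log(toCodePoints(vectorV));           // U+0076 U+20D7
console.log(vectorV.normalize('NFC').length); // 2: there is no precomposed form to merge into
```

Because the sequence stays as two code points even after NFC normalization, a stripping regex that omits U+20D0–U+20FF will leave these marks in place.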

Implications for Zalgo Stripping

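A thorough Zalgo removal regex should cover all of these blocks:

// Explicit ranges: the five combining mark blocks listed above.
const COMBINING_MARKS = /[\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/g;

// Or match by Unicode property (General_Category = Nonspacing_Mark); requires the u flag:
const COMBINING_MARKS_SAFE = /\p{Mn}/gu;

The two patterns are not equivalent. The Unicode property escape \p{Mn} is the more future-proof approach, as it picks up new nonspacing marks once the JavaScript engine supports the Unicode version that defines them. It is also much broader: it matches nonspacing marks in every script, including Hebrew points, Arabic vowel marks, and Thai vowel signs that ordinary text depends on. At the same time it misses enclosing marks such as U+20DD Combining Enclosing Circle (category Me), which the explicit U+20D0–U+20FF range does cover. Use \p{M} to match every mark category.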
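The difference between range-based and property-based stripping shows up on text from other scripts. The following is a rough sketch under stated assumptions: BLOCK_RANGES repeats the explicit ranges from this section, ANY_MARK uses \p{M} (all mark categories) as one possible property-based choice, and stripMarks and capMarks are hypothetical helper names. capMarks is a gentler variant, not taken from the text above, that keeps a few marks per base character instead of removing them all.

```javascript
// Explicit ranges for the five combining mark blocks in this section.
const BLOCK_RANGES = /[\u0300-\u036F\u1AB0-\u1AFF\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]/g;
// Property-based alternative: every mark category (Mn, Mc, Me) in any script.
const ANY_MARK = /\p{M}/gu;

// Remove every character matched by the given pattern.
function stripMarks(text, pattern) {
  return text.replace(pattern, '');
}

// Hypothetical gentler variant: keep at most `max` consecutive marks per run.
function capMarks(text, max = 2) {
  const run = new RegExp(`(\\p{M}{${max}})\\p{M}+`, 'gu');
  return text.replace(run, '$1');
}

const zalgo = 'Z\u0300\u0301\u0302\u0303a\u1AB0l\u20D7';
const hebrew = '\u05E9\u05B8\u05C1'; // shin with qamats and shin dot

console.log(stripMarks(zalgo, BLOCK_RANGES));             // "Zal"
console.log(capMarks(zalgo, 2).length);                   // 7: two of the four marks on Z are dropped
console.log(stripMarks(hebrew, BLOCK_RANGES) === hebrew); // true: Hebrew points are outside the ranges
console.log(stripMarks(hebrew, ANY_MARK));                // bare shin: the points are removed
```

Which behavior is right depends on the input. Range-based stripping leaves other scripts intact but needs updating when new blocks appear. Property-based stripping needs no updates but removes marks that some languages require. Capping the run length is one way to tame Zalgo text without flattening legitimate diacritics.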

Use Case

Knowledge of extended combining mark blocks is important for developers building comprehensive Unicode sanitization, for linguists working with specialized phonetic transcription, and for maximizing the variety of Zalgo effects using the full range of available combining characters.
