Unicode Combining Characters and Normalization

Learn how Unicode combining characters work, how they interact with normalization forms, and why they matter for text processing and display.

Detailed Explanation

Combining Characters in Unicode

Combining characters are Unicode characters that are intended to modify the appearance of the preceding base character. They do not stand alone — they "combine" with the character before them.

Common Combining Characters

Code Point  Name                         Example
U+0300      COMBINING GRAVE ACCENT       a + U+0300 → à
U+0301      COMBINING ACUTE ACCENT       e + U+0301 → é
U+0302      COMBINING CIRCUMFLEX ACCENT  o + U+0302 → ô
U+0303      COMBINING TILDE              n + U+0303 → ñ
U+0308      COMBINING DIAERESIS          u + U+0308 → ü
U+0327      COMBINING CEDILLA            c + U+0327 → ç
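
As a rough illustration, the marks in the table can be inspected with Python's standard `unicodedata` module. The `MARKS` list and the `describe` helper are names made up for this sketch. `unicodedata.combining` returns the Canonical Combining Class, which is nonzero for combining marks.

```python
import unicodedata

# Code points copied from the table above.
MARKS = ["\u0300", "\u0301", "\u0302", "\u0303", "\u0308", "\u0327"]

def describe(mark):
    """Return (code point label, Unicode name, canonical combining class)."""
    return (f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.combining(mark))

rows = [describe(m) for m in MARKS]
for label, name, ccc in rows:
    print(label, name, ccc)
```

The five marks that sit above the base character share class 230, while the cedilla, which attaches below, has class 202.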

Multiple Combining Characters

A single base character can have multiple combining marks:

a + U+0308 (diaeresis) + U+0301 (acute) = ä́
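
A small sketch of the same sequence in Python, assuming the standard `unicodedata` module. The variable names are illustrative only. It shows that what renders as one visible character is stored as three code points, two of which are combining marks.

```python
import unicodedata

# a + COMBINING DIAERESIS + COMBINING ACUTE ACCENT
stacked = "a" + "\u0308" + "\u0301"

# List each code point in the sequence.
code_points = [f"U+{ord(ch):04X}" for ch in stacked]

# A nonzero combining class identifies a combining mark.
mark_count = sum(1 for ch in stacked if unicodedata.combining(ch) != 0)

print(len(stacked), code_points, mark_count)
```

This is why `len()` on a string can disagree with the number of characters a reader sees.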

How Normalization Handles Them

  • NFC: Combines base + combining mark into precomposed form (if one exists)
  • NFD: Separates precomposed characters into base + combining marks
  • Canonical ordering: When multiple combining marks follow a base character, both NFD and NFC reorder them by ascending Canonical Combining Class (CCC) value. Marks with the same CCC keep their original relative order.
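
The three behaviors above can be sketched with `unicodedata.normalize` from Python's standard library. The sample strings are chosen for illustration. The ordering example uses the cedilla (CCC 202) and the acute accent (CCC 230) from the table of common marks.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

# NFC composes base + mark into the precomposed form when one exists.
nfc = unicodedata.normalize("NFC", decomposed)

# NFD splits a precomposed character into base + mark.
nfd = unicodedata.normalize("NFD", precomposed)

# Canonical ordering: the acute (CCC 230) was typed before the cedilla
# (CCC 202), so normalization moves the cedilla first.
unordered = "c\u0301\u0327"
ordered = unicodedata.normalize("NFD", unordered)

print([f"U+{ord(ch):04X}" for ch in nfc])
print([f"U+{ord(ch):04X}" for ch in nfd])
print([f"U+{ord(ch):04X}" for ch in ordered])
```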

Why This Matters

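Two strings can render identically while one uses a precomposed character and the other uses a base character plus a combining mark. They contain different code points and therefore different bytes, so a direct comparison reports them as unequal unless both are normalized to the same form first. This is the most common reason to use normalization.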
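
A minimal sketch of that comparison in Python, assuming the standard `unicodedata` module. The `same_text` helper is a hypothetical name for this example, and choosing NFC as the default form is an assumption. Any form works as long as both sides use the same one.

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "caf\u00e9"   # "café" with é as one code point
decomposed = "cafe\u0301"   # "café" with e + COMBINING ACUTE ACCENT

raw_equal = precomposed == decomposed
bytes_equal = precomposed.encode("utf-8") == decomposed.encode("utf-8")
normalized_equal = same_text(precomposed, decomposed)

print(raw_equal, bytes_equal, normalized_equal)
```

Normalizing at the boundaries of a system, for example when text is stored or indexed, tends to be simpler than normalizing at every comparison.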

Use Case

Essential for developers working with multilingual text, especially languages with diacritics (French, German, Spanish, Vietnamese). Understanding combining characters prevents bugs in text search, sorting, and display across different platforms and browsers.
