NFC vs NFD — Canonical Composition vs Decomposition

Understand the difference between NFC (Canonical Composition) and NFD (Canonical Decomposition). Learn when to use each form and how they transform accented characters.

Detailed Explanation

NFC vs NFD: The Two Canonical Forms

NFC and NFD are the two canonical normalization forms. They produce canonically equivalent text — meaning the text represents the same abstract characters — but they differ in how those characters are stored.

NFD: Canonical Decomposition

NFD breaks precomposed characters into their base character plus combining marks:

Input        NFD Result                   Code Points
é (U+00E9)   e + combining acute accent   U+0065 + U+0301
ñ (U+00F1)   n + combining tilde          U+006E + U+0303
ü (U+00FC)   u + combining diaeresis      U+0075 + U+0308
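
The decompositions in the table can be reproduced with Python's standard unicodedata module. This is a minimal sketch; the helper name code_points is an illustrative choice, not part of any library.

```python
import unicodedata


def code_points(text: str) -> str:
    """Format a string as a sequence of U+XXXX code points (hypothetical helper)."""
    return " + ".join(f"U+{ord(ch):04X}" for ch in text)


# Precomposed inputs from the table above: é, ñ, ü
for ch in ["\u00e9", "\u00f1", "\u00fc"]:
    decomposed = unicodedata.normalize("NFD", ch)
    print(code_points(ch), "->", code_points(decomposed))
# U+00E9 -> U+0065 + U+0301
# U+00F1 -> U+006E + U+0303
# U+00FC -> U+0075 + U+0308
```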

NFC: Canonical Composition

NFC first decomposes the text (as NFD does), then recomposes base-plus-mark sequences into precomposed characters wherever a precomposed form exists:

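Input                                        NFC Result   Code Points
e + combining acute accent (U+0065 U+0301)   é            U+00E9
n + combining tilde (U+006E U+0303)          ñ            U+00F1
u + combining diaeresis (U+0075 U+0308)      ü            U+00FC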
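
A short Python sketch of the same compositions, using the standard unicodedata module. The last lines illustrate the "where possible" caveat with q + combining tilde, a sequence chosen here as an example because Unicode has no precomposed code point for it.

```python
import unicodedata

# Decomposed inputs from the table above: base letter followed by a combining mark
decomposed = ["e\u0301", "n\u0303", "u\u0308"]
composed = [unicodedata.normalize("NFC", s) for s in decomposed]

for before, after in zip(decomposed, composed):
    print(f"{len(before)} code points -> {len(after)}: U+{ord(after):04X}")
# 2 code points -> 1: U+00E9
# 2 code points -> 1: U+00F1
# 2 code points -> 1: U+00FC

# "q" + combining tilde has no precomposed form, so NFC leaves it as two code points.
no_precomposed = unicodedata.normalize("NFC", "q\u0303")
print(len(no_precomposed))  # 2
```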

Key Differences

  • NFC usually produces shorter strings (fewer code points) because it merges base characters and combining marks; its output is never longer than the NFD form of the same text
  • NFD produces longer strings but makes combining marks explicit
  • Both are canonically equivalent — they represent the same text
  • NFC is the W3C recommendation for web content
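  • A variant of NFD is used by the older macOS HFS+ file system for filenames; APFS stores names as given but compares them in a normalization-insensitive way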
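
The length and equivalence points above can be checked directly. This sketch uses an arbitrary sample string and Python's unicodedata module:

```python
import unicodedata

# Sample text with three accented letters (ñ, ü, é), written with escapes for clarity
text = "Se\u00f1or M\u00fcller's caf\u00e9"
nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

print(len(nfc), len(nfd))                        # 19 22
print(nfc == nfd)                                # False: different code point sequences
print(unicodedata.normalize("NFC", nfd) == nfc)  # True: canonically equivalent
```

A plain == comparison sees the two forms as different strings, which is why text is normally brought to one form before comparing or storing it.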

When They Produce the Same Output

For ASCII-only text (no accents or special characters), NFC and NFD produce identical output because there is nothing to compose or decompose.
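
As a quick check, the sketch below compares the two forms for a few inputs. The helper name forms_agree is hypothetical. The third case shows that some non-ASCII text, such as CJK ideographs with no canonical decomposition, also comes out identical in both forms.

```python
import unicodedata


def forms_agree(text: str) -> bool:
    """Return True when the NFC and NFD forms of `text` are the same string."""
    return unicodedata.normalize("NFC", text) == unicodedata.normalize("NFD", text)


print(forms_agree("plain ASCII text, 123"))  # True: nothing to compose or decompose
print(forms_agree("caf\u00e9"))              # False: é decomposes under NFD
print(forms_agree("\u65e5\u672c\u8a9e"))     # True: these ideographs have no decomposition
```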

Use Case

This distinction is essential for web developers working with internationalized text. The W3C recommends NFC for HTML content, while filenames coming from macOS may arrive in decomposed form (HFS+ stores a variant of NFD). Understanding the difference prevents bugs in file handling, form submission, and database storage of accented text.
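
One common way to avoid such bugs is to normalize both sides to a single form before comparing or storing them. The sketch below assumes NFC as the target form; the function name same_text and the filenames are made up for illustration.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The same filename as it might arrive from two different sources
stored_name = "re\u0301sume\u0301.pdf"  # decomposed: e + combining acute accent
typed_name = "r\u00e9sum\u00e9.pdf"     # precomposed: é as a single code point

print(stored_name == typed_name)           # False: raw comparison fails
print(same_text(stored_name, typed_name))  # True: equal after normalization
```

Normalizing once at the boundary (on upload, form submission, or before a database write) keeps later comparisons and lookups simple.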
