NFC vs NFD — Canonical Composition vs Decomposition
Understand the difference between NFC (Canonical Composition) and NFD (Canonical Decomposition). Learn when to use each form and how they transform accented characters.
Detailed Explanation
NFC vs NFD: The Two Canonical Forms
NFC and NFD are the two canonical normalization forms. They produce canonically equivalent text — meaning the text represents the same abstract characters — but they differ in how those characters are stored.
NFD: Canonical Decomposition
NFD breaks precomposed characters into their base character plus combining marks:
| Input | NFD Result | Code Points |
|---|---|---|
| é (U+00E9) | é | U+0065 + U+0301 |
| ñ (U+00F1) | ñ | U+006E + U+0303 |
| ü (U+00FC) | ü | U+0075 + U+0308 |
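The table can be reproduced with Python's standard `unicodedata` module. This is a minimal sketch; the `code_points` helper is a hypothetical name used only to format the output like the table's last column.

```python
import unicodedata

def code_points(s: str) -> str:
    """Hypothetical helper: format each code point of s as U+XXXX, joined with ' + '."""
    return " + ".join(f"U+{ord(ch):04X}" for ch in s)

# Precomposed é, ñ, ü, written as escapes so the source file's own normalization can't interfere
for precomposed in ["\u00e9", "\u00f1", "\u00fc"]:
    decomposed = unicodedata.normalize("NFD", precomposed)
    print(code_points(precomposed), "->", code_points(decomposed))

# Prints:
# U+00E9 -> U+0065 + U+0301
# U+00F1 -> U+006E + U+0303
# U+00FC -> U+0075 + U+0308
```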
NFC: Canonical Composition
NFC first decomposes (like NFD), then recomposes characters into precomposed form where possible:
| Input | NFC Result | Code Points |
|---|---|---|
| e + ◌́ | é | U+00E9 |
| n + ◌̃ | ñ | U+00F1 |
| u + ◌̈ | ü | U+00FC |
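A short Python sketch of the same composition, again using the standard `unicodedata` module (the `code_points` helper is a hypothetical formatting function). The last line illustrates "where possible": Unicode has no precomposed "q with acute", so NFC leaves that pair decomposed.

```python
import unicodedata

def code_points(s: str) -> str:
    """Hypothetical helper: format each code point of s as U+XXXX, joined with ' + '."""
    return " + ".join(f"U+{ord(ch):04X}" for ch in s)

# Base letter followed by a combining mark (decomposed input)
for s in ["e\u0301", "n\u0303", "u\u0308"]:
    composed = unicodedata.normalize("NFC", s)
    print(code_points(s), "->", code_points(composed))

# No precomposed character exists for q + combining acute, so NFC keeps two code points
print(code_points(unicodedata.normalize("NFC", "q\u0301")))

# Prints:
# U+0065 + U+0301 -> U+00E9
# U+006E + U+0303 -> U+00F1
# U+0075 + U+0308 -> U+00FC
# U+0071 + U+0301
```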
Key Differences
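- NFC produces strings with the same number of code points as NFD or fewer, because it recombines base characters with their marks
- NFD produces longer strings but makes every combining mark explicit
- Both forms are canonically equivalent: they represent the same text
- NFC is the form the W3C recommends for web content
- NFD (in a slightly modified variant) is how the HFS+ file system used by older macOS versions stores filenames; its successor, APFS, keeps names in the form it receives instead of converting them to NFD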
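The differences above can be checked directly in Python. This sketch assumes Python 3.8 or later, where `unicodedata.is_normalized` was added. It shows that the two forms differ in length and compare unequal as raw strings, yet normalize to the same result.

```python
import unicodedata

text = "Cr\u00e8me br\u00fbl\u00e9e"  # "Crème brûlée" with precomposed è, û, é

nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

print(len(nfc), len(nfd))  # 12 15: NFD adds one combining mark per accented letter
print(nfc == nfd)          # False: the raw code point sequences differ

# Canonical equivalence: normalizing either form to NFC gives the same string
print(unicodedata.normalize("NFC", nfd) == nfc)  # True

# is_normalized requires Python 3.8+
print(unicodedata.is_normalized("NFD", nfd))     # True
```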
When They Produce the Same Output
For ASCII-only text (no accents or special characters), NFC and NFD produce identical output because there is nothing to compose or decompose.
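A quick way to confirm this for every ASCII character is to normalize each one in both forms and check that nothing changes. A minimal sketch using the standard `unicodedata` module:

```python
import unicodedata

# Every code point from U+0000 to U+007F is unchanged by both NFC and NFD
ascii_stable = all(
    unicodedata.normalize(form, chr(cp)) == chr(cp)
    for cp in range(128)
    for form in ("NFC", "NFD")
)
print(ascii_stable)  # True
```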
Use Case
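Essential for web developers working with internationalized text. The W3C recommends NFC for HTML content, while the HFS+ file system used by older macOS versions stores filenames in decomposed (NFD-style) form. Understanding the difference prevents bugs in file handling, form submission, and database storage of accented text, such as two visually identical strings comparing as unequal.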
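A common defensive pattern is to normalize to a single form before storing or comparing strings. The sketch below picks NFC; `to_nfc` and `same_text` are hypothetical helper names, and the "filename" is simulated by converting the typed string to NFD rather than read from a real disk.

```python
import unicodedata

def to_nfc(s: str) -> str:
    """Hypothetical helper: normalize to NFC before storing or comparing."""
    return unicodedata.normalize("NFC", s)

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings up to canonical equivalence."""
    return to_nfc(a) == to_nfc(b)

typed = "r\u00e9sum\u00e9.txt"                   # NFC, e.g. text from a web form
from_disk = unicodedata.normalize("NFD", typed)  # simulated NFD filename

print(typed == from_disk)           # False: naive comparison fails
print(same_text(typed, from_disk))  # True
```

Normalizing once at the boundary (on input, or before writing to a database) keeps every stored value in one form, so plain equality checks and indexes behave as expected afterward.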