NFKC vs NFKD — Compatibility Composition vs Decomposition
Learn how NFKC and NFKD differ from NFC/NFD by applying compatibility decomposition. Understand when ligatures, fullwidth characters, and special symbols are transformed.
Detailed Explanation
NFKC vs NFKD: Compatibility Normalization
NFKC and NFKD apply compatibility decomposition in addition to canonical decomposition. Compatibility decomposition replaces characters that differ from a standard character only in presentation (ligatures, width variants, superscripts, and similar formatted forms) with that standard equivalent.
What Compatibility Decomposition Does
| Input | NFKC/NFKD Result | Description |
|---|---|---|
| ﬁ (U+FB01, fi ligature) | fi | Ligature split into letters |
| Ａ (U+FF21, fullwidth A) | A | Fullwidth to ASCII |
| ½ (U+00BD, vulgar fraction) | 1⁄2 (1, U+2044 fraction slash, 2) | Fraction decomposed |
| Ω (U+2126, Ohm sign) | Ω (U+03A9) | Symbol to Greek letter (a canonical mapping, so NFC/NFD apply it too) |
| Ⅰ (U+2160, Roman numeral I) | I | Numeral to letter |
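The mappings in the table can be checked with Python's standard `unicodedata` module. This is a minimal sketch: the `samples` dictionary and the `results` name are illustrative, and the characters are written as escapes so that the source file itself cannot be normalized by accident.

```python
import unicodedata

# Each key is a compatibility character; each value is the expected
# NFKC/NFKD result from the table above.
samples = {
    "\ufb01": "fi",        # LATIN SMALL LIGATURE FI
    "\uff21": "A",         # FULLWIDTH LATIN CAPITAL LETTER A
    "\u00bd": "1\u20442",  # VULGAR FRACTION ONE HALF -> 1, FRACTION SLASH, 2
    "\u2126": "\u03a9",    # OHM SIGN -> GREEK CAPITAL LETTER OMEGA
    "\u2160": "I",         # ROMAN NUMERAL ONE
}

results = {}
for char in samples:
    nfkc = unicodedata.normalize("NFKC", char)
    nfkd = unicodedata.normalize("NFKD", char)
    results[char] = (nfkc, nfkd)
    print(f"U+{ord(char):04X} {unicodedata.name(char)}: NFKC={nfkc!r} NFKD={nfkd!r}")

# Canonical normalization alone leaves the ligature untouched,
# but it does replace the Ohm sign, whose mapping is canonical.
ligature_under_nfc = unicodedata.normalize("NFC", "\ufb01")
ohm_under_nfc = unicodedata.normalize("NFC", "\u2126")
```

For these five single characters NFKC and NFKD give the same result, because nothing in the output can be recomposed.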
NFKC vs NFKD
The difference between NFKC and NFKD mirrors the NFC/NFD distinction:
- NFKD: Compatibility decomposition only (longer output)
- NFKC: Compatibility decomposition followed by canonical composition (shorter output)
For example, with ﬁé (the fi ligature U+FB01 followed by é U+00E9):
- NFKD: f + i + e + U+0301 combining acute accent (4 code points)
- NFKC: f + i + é (3 code points)
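The same example can be reproduced with `unicodedata.normalize`. The variable names below are illustrative, and the input is assumed to use the precomposed é (U+00E9).

```python
import unicodedata

# fi ligature (U+FB01) followed by precomposed e-acute (U+00E9)
text = "\ufb01\u00e9"

nfkd = unicodedata.normalize("NFKD", text)
nfkc = unicodedata.normalize("NFKC", text)

# List the code points of each form to make the difference visible.
nfkd_points = [f"U+{ord(c):04X}" for c in nfkd]
nfkc_points = [f"U+{ord(c):04X}" for c in nfkc]

print(len(nfkd), nfkd_points)
print(len(nfkc), nfkc_points)
```

NFKD leaves the accent as a separate combining mark, while NFKC recombines it with the base letter. The ligature stays split in both forms because composition only applies canonical pairs.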
Important Warning
Compatibility normalization is lossy — it discards formatting distinctions that may be meaningful. The fi-ligature and the letters "fi" are semantically different in some contexts (e.g., typography). Only use NFKC/NFKD when you intentionally want to discard these distinctions.
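A short sketch of what "lossy" means in practice: distinct inputs collapse to the same output, so the original form cannot be recovered afterwards. The sample strings are illustrative, and the superscript two (U+00B2) is added here as one more example of a formatting distinction that carries meaning.

```python
import unicodedata

originals = [
    "\ufb01ne",  # "fine" written with the fi ligature
    "fine",      # plain letters
    "x\u00b2",   # x followed by SUPERSCRIPT TWO
    "x2",        # x followed by the digit 2
]

normalized = [unicodedata.normalize("NFKC", s) for s in originals]

# Four distinct inputs, but only two distinct outputs remain.
distinct_before = len(set(originals))
distinct_after = len(set(normalized))
print(distinct_before, distinct_after)
```

If the original text matters (for display, or for round-tripping), one option is to store it unchanged and keep the normalized form only as a separate comparison or search key.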
Use Case
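Critical for search engines, username validation, and security systems that need to treat compatibility-equivalent characters (such as fullwidth and ASCII letters) identically. Python normalizes identifiers to NFKC when parsing source code (PEP 3131), and Unicode defines the NFKC_Casefold mapping for caseless identifier matching. SASLprep (RFC 4013) applies NFKC when preparing usernames and passwords; in its successor framework PRECIS (RFC 8264), the Nickname profile (RFC 8266) uses NFKC, while the username and password profiles (RFC 8265) use NFC. Note that NFKC does not unify look-alike characters from different scripts, such as Latin a and Cyrillic а (U+0430).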
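As a rough illustration of the username case, the sketch below builds a comparison key from NFKC plus Python's `str.casefold()`. The function name `comparison_key` is hypothetical, and this is a simplification: it is not an implementation of any PRECIS profile or of Unicode's NFKC_Casefold mapping, which also handle cases this sketch ignores.

```python
import unicodedata


def comparison_key(name: str) -> str:
    """Return a key for comparing names (hypothetical helper, not a standard API)."""
    # Normalize, fold case, then normalize again because case folding
    # can produce sequences that are no longer in NFKC form.
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


# Fullwidth and ligature variants map to the same key as the plain form.
same_fullwidth = comparison_key("\uff21lice") == comparison_key("alice")
same_ligature = comparison_key("\ufb01sh") == comparison_key("FISH")

# Cyrillic small a (U+0430) looks like Latin a but has no compatibility
# mapping to it, so the keys stay different.
same_cyrillic = comparison_key("\u0430lice") == comparison_key("alice")

print(same_fullwidth, same_ligature, same_cyrillic)
```

The last comparison shows the limit of this approach: cross-script look-alikes need a separate confusable-detection step, which compatibility normalization does not provide.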