Unicode Normalization in Programming Languages
See how to normalize Unicode in JavaScript, Python, Java, Go, Rust, and Swift. Includes code examples for each language's normalization API.
Programming
Detailed Explanation
Normalization Across Programming Languages
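Every major programming language supports Unicode normalization, but the APIs differ significantly. JavaScript, Python, and Java ship it in their standard libraries, Swift exposes it through Foundation, and Go and Rust rely on separate packages (golang.org/x/text and the unicode-normalization crate).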
JavaScript
const text = "café";
text.normalize("NFC"); // NFC (default)
text.normalize("NFD"); // NFD
text.normalize("NFKC"); // NFKC
text.normalize("NFKD"); // NFKD
JavaScript's String.prototype.normalize() defaults to NFC if no argument is given.
Python
import unicodedata
text = "café"
unicodedata.normalize("NFC", text)
unicodedata.normalize("NFD", text)
unicodedata.normalize("NFKC", text)
unicodedata.normalize("NFKD", text)
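# Python NFKC-normalizes identifiers when it parses source code,
# so `café` spelled with a precomposed é (U+00E9) and `café` spelled
# as e + U+0301 (combining acute accent) are the same variable name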
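The identifier behavior described in the comment above can be checked with a small sketch. It assumes CPython 3 and uses `exec` only to feed a decomposed spelling to the parser; the variable names `decomposed`, `precomposed`, `namespace`, and `stored_name` are illustrative and not part of any API.

```python
import unicodedata

# Two spellings of the same visible name, written with escapes so the
# difference survives copy and paste.
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Assign to the decomposed spelling; the parser normalizes the
# identifier to NFKC before it is stored in the namespace.
namespace = {}
exec(f"{decomposed} = 1", namespace)

# The name the parser is expected to have stored.
stored_name = unicodedata.normalize("NFKC", decomposed)

print(precomposed in namespace)   # lookup by the precomposed spelling
print(decomposed in namespace)    # lookup by the raw decomposed spelling
```

Note that this normalization applies to identifiers only. String literals and data read at run time keep whatever form they arrived in.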
Java
import java.text.Normalizer;
String text = "café";
Normalizer.normalize(text, Normalizer.Form.NFC);
Normalizer.normalize(text, Normalizer.Form.NFD);
Normalizer.normalize(text, Normalizer.Form.NFKC);
Normalizer.normalize(text, Normalizer.Form.NFKD);
Go
import "golang.org/x/text/unicode/norm"
text := "café"
nfc := norm.NFC.String(text)
nfd := norm.NFD.String(text)
nfkc := norm.NFKC.String(text)
nfkd := norm.NFKD.String(text)
Rust
// Using the unicode-normalization crate
use unicode_normalization::UnicodeNormalization;
let text = "café";
let nfc: String = text.nfc().collect();
let nfd: String = text.nfd().collect();
let nfkc: String = text.nfkc().collect();
let nfkd: String = text.nfkd().collect();
Swift
import Foundation
let text = "café"
text.precomposedStringWithCanonicalMapping // NFC
text.decomposedStringWithCanonicalMapping // NFD
text.precomposedStringWithCompatibilityMapping // NFKC
text.decomposedStringWithCompatibilityMapping // NFKD
Key Differences
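- JavaScript and Python never normalize strings automatically; normalization happens only when you call the API explicitly
- Swift's == compares strings by canonical equivalence, so NFC/NFD differences don't affect ==, although compatibility (NFKC/NFKD) differences still do
- Python NFKC-normalizes identifiers at compile time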
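To make the contrast with Swift's comparison concrete, here is a minimal Python sketch of what explicit normalization before comparison looks like. The helper name `canonically_equal` is hypothetical, and choosing NFC as the common form is an assumption; NFD would work equally well for an equality check.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "caf\u00e9"   # 4 code points, é as U+00E9
decomposed = "cafe\u0301"   # 5 code points, e + U+0301

# Python's == compares code points, so the two spellings differ.
raw_equal = precomposed == decomposed

# After normalizing both sides to the same form they compare equal.
normalized_equal = canonically_equal(precomposed, decomposed)

print(raw_equal, normalized_equal)
```

The same pattern applies in JavaScript, Java, Go, and Rust: normalize both operands to one form with the APIs shown above, then compare.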
Use Case
Reference for developers implementing Unicode normalization in their applications. Useful when porting code between languages, reviewing code for normalization correctness, or choosing the right normalization API for a new project.