Unicode Normalization in Programming Languages

See how to normalize Unicode in JavaScript, Python, Java, Go, Rust, and Swift. Includes code examples for each language's normalization API.

Detailed Explanation

Normalization Across Programming Languages

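Every major programming language can normalize Unicode, but the APIs differ significantly. JavaScript, Python, and Java ship normalization in the standard library, Swift exposes it through Foundation, and Go and Rust rely on separately installed packages (golang.org/x/text and the unicode-normalization crate).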

JavaScript

const text = "café";
text.normalize("NFC");   // NFC (default)
text.normalize("NFD");   // NFD
text.normalize("NFKC");  // NFKC
text.normalize("NFKD");  // NFKD

JavaScript's String.prototype.normalize() defaults to NFC if no argument is given.

Python

import unicodedata

text = "café"
unicodedata.normalize("NFC", text)
unicodedata.normalize("NFD", text)
unicodedata.normalize("NFKC", text)
unicodedata.normalize("NFKD", text)

# Python identifiers are NFKC-normalized by the parser, so `café` typed with a
# precomposed é (U+00E9) and `café` typed as e + U+0301 name the same variable
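
The four forms are easier to tell apart when the resulting code points are printed. The following is a minimal sketch, not part of the original reference: the helper name code_points and the sample string (the "fi" ligature U+FB01 plus a precomposed é) are chosen here purely for illustration.

```python
import unicodedata


def code_points(s: str) -> str:
    """Return the code points of s as space-separated U+XXXX labels."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


# Illustrative sample: U+FB01 is the "fi" ligature, U+00E9 is precomposed "é".
sample = "\ufb01anc\u00e9"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, sample)
    # The canonical forms (NFC/NFD) keep the ligature; the compatibility
    # forms (NFKC/NFKD) replace it with the plain letters "f" and "i".
    print(form, len(normalized), code_points(normalized))
```

With this sample, NFC and NFD differ only in how the é is stored, while NFKC and NFKD also expand the ligature, which is why the compatibility forms can change the visible text.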

Java

import java.text.Normalizer;

String text = "café";
Normalizer.normalize(text, Normalizer.Form.NFC);
Normalizer.normalize(text, Normalizer.Form.NFD);
Normalizer.normalize(text, Normalizer.Form.NFKC);
Normalizer.normalize(text, Normalizer.Form.NFKD);

Go

import "golang.org/x/text/unicode/norm"

text := "café"
nfc := norm.NFC.String(text)
nfd := norm.NFD.String(text)
nfkc := norm.NFKC.String(text)
nfkd := norm.NFKD.String(text)

Rust

// Using the unicode-normalization crate
use unicode_normalization::UnicodeNormalization;

let text = "café";
let nfc: String = text.nfc().collect();
let nfd: String = text.nfd().collect();
let nfkc: String = text.nfkc().collect();
let nfkd: String = text.nfkd().collect();

Swift

import Foundation  // the mapping properties below come from Foundation

let text = "café"
text.precomposedStringWithCanonicalMapping      // NFC
text.decomposedStringWithCanonicalMapping       // NFD
text.precomposedStringWithCompatibilityMapping  // NFKC
text.decomposedStringWithCompatibilityMapping   // NFKD

Key Differences

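  • JavaScript and Python never normalize strings implicitly; == compares the stored code units or code points, so normalization happens only when you call the API
  • Swift's String == compares by canonical equivalence, so NFC and NFD spellings of the same text compare equal; compatibility differences (NFKC/NFKD) still affect ==
  • Python NFKC-normalizes identifiers while parsing source code; string values are left untouched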
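
The Python side of these differences can be checked directly. The sketch below is an illustration under stated assumptions, not code from the original page: canonically_equal is a hypothetical helper name, and exec is used only to run a one-line piece of source text through the parser.

```python
import unicodedata

composed = "caf\u00e9"     # precomposed é (U+00E9)
decomposed = "cafe\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Plain == compares code points, so the two spellings are different strings.
plain_equal = composed == decomposed
normalized_equal = canonically_equal(composed, decomposed)

# Identifiers are NFKC-normalized by the parser: assigning through the
# decomposed spelling binds the composed name in the namespace.
namespace = {}
exec(decomposed + " = 1", namespace)
bound_names = [name for name in namespace if not name.startswith("__")]

print(plain_equal, normalized_equal, bound_names == [composed])
```

The design point is that string values need an explicit normalization step before comparison, hashing, or use as dictionary keys, while identifiers get that step for free from the parser.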

Use Case

Reference for developers implementing Unicode normalization in their applications. Useful when porting code between languages, reviewing code for normalization correctness, or choosing the right normalization API for a new project.
