Unicode Normalization for Text Comparison
Learn how to properly compare Unicode text strings using normalization. Avoid subtle bugs where visually identical strings fail equality checks due to different code point representations.
Detailed Explanation
Normalizing for Correct Text Comparison
String comparison is the most common reason to use Unicode normalization. Without it, two visually identical strings can fail an equality check.
The Problem
const a = "é"; // U+00E9 (precomposed)
const b = "é"; // U+0065 + U+0301 (decomposed)
a === b; // false!
a.length === b.length; // false! (1 vs 2)
Both a and b display as "é", but they contain different code point sequences, so strict equality and length checks disagree.
The Solution
a.normalize("NFC") === b.normalize("NFC"); // true
a.normalize("NFD") === b.normalize("NFD"); // true
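The comparison can be wrapped in a small helper so callers never forget the normalize step (a minimal sketch; the name canonicalEquals is illustrative, not a standard API):

```javascript
// Compare two strings for canonical equivalence.
// NFC and NFD give the same verdict here; NFC is used for consistency.
function canonicalEquals(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const precomposed = "\u00E9";  // "é" as a single code point
const decomposed = "e\u0301";  // "e" + combining acute accent
canonicalEquals(precomposed, decomposed); // true
```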
Choosing the Right Form for Comparison
| Scenario | Recommended Form |
|---|---|
| Exact text comparison | NFC or NFD (either works, be consistent) |
| Search / indexing | NFKC (treats compatibility characters as equal) |
| Username comparison | NFKC + case fold |
| File path comparison | NFC (cross-platform safe) |
| Cryptographic hashing | NFC (canonical, compact) |
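For the search and username rows, NFKC additionally folds compatibility characters such as ligatures and fullwidth letters; toLowerCase() is used below as an approximation of case folding (a sketch; the name canonicalUsername is illustrative):

```javascript
// NFKC expands compatibility characters (ligatures, fullwidth forms),
// then toLowerCase() approximates case folding for most scripts.
function canonicalUsername(name) {
  return name.normalize("NFKC").toLowerCase();
}

canonicalUsername("\uFB01le");  // "file" — the "ﬁ" ligature is expanded
canonicalUsername("\uFF21\uFF24\uFF2D\uFF29\uFF2E"); // "admin" — fullwidth "ＡＤＭＩＮ" is folded
```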
Sorting and Collation
Normalizing before sorting ensures that equivalent representations of accented characters sort together rather than landing in different positions:
const names = ["Zo\u00EB", "Zoe\u0308"]; // precomposed "ë" vs "e" + U+0308
// Without normalization, these are distinct strings and may sort apart
// With normalization, they are treated as identical
const sorted = names
.map(n => n.normalize("NFC"))
.sort((a, b) => a.localeCompare(b));
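The same normalization step also makes duplicate removal work; a short sketch continuing the idea above, with the mixed representations spelled out as escapes:

```javascript
// The same name twice: precomposed "ë" (U+00EB) vs decomposed "e" + U+0308.
const names = ["Zo\u00EB", "Zoe\u0308"];

// Normalizing before inserting into a Set collapses them to one entry.
const unique = [...new Set(names.map(n => n.normalize("NFC")))];
unique.length; // 1
```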
Hash-Based Comparison
If you are using hash-based data structures (hash maps, sets, checksums), normalization is critical:
const set = new Set();
set.add("café".normalize("NFC"));
set.has("café".normalize("NFC")); // true
// Without normalization, mixed representations miss:
const set2 = new Set();
set2.add("café"); // U+0065 + U+0301 (decomposed)
set2.has("café"); // false! (looked up with precomposed U+00E9)
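One way to avoid scattering normalize() calls across the codebase is to normalize keys inside a small wrapper (a sketch; NormalizedSet is a hypothetical name, not a built-in):

```javascript
// A Set that normalizes every value to NFC on the way in and on lookup.
class NormalizedSet {
  #inner = new Set();
  add(value) { this.#inner.add(value.normalize("NFC")); return this; }
  has(value) { return this.#inner.has(value.normalize("NFC")); }
  get size() { return this.#inner.size; }
}

const tags = new NormalizedSet();
tags.add("cafe\u0301");  // decomposed "café"
tags.has("caf\u00E9");   // true — precomposed lookup still matches
```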
Performance Tip
Normalize once at the point of input (form submission, file read, API response), not repeatedly at each comparison. Store the normalized form.
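A sketch of that pattern at an input boundary (the function name sanitizeInput is illustrative):

```javascript
// Normalize user-supplied text once, when it enters the system,
// and store the normalized form.
function sanitizeInput(raw) {
  return raw.trim().normalize("NFC");
}

// Later comparisons can then use plain ===, with no repeated
// normalization cost on the hot path.
const stored = sanitizeInput("cafe\u0301 "); // decomposed input with trailing space
stored === "caf\u00E9"; // true
```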
Use Case
Fundamental for any application comparing text: authentication systems verifying passwords and usernames, duplicate detection in databases, spell checkers, autocomplete systems, and test frameworks comparing expected vs actual output.