Question 1

Unicode Normalization for URL Comparison

Accepted Answer

## Normalization in URLs

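URLs can contain Unicode characters in two ways: directly in Internationalized Domain Names (IDN) and percent-encoded in the path and query components. Because the same visible character can be represented by different code point sequences (for example, a precomposed é versus e followed by a combining accent), URLs need to be normalized before they are compared or deduplicated.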
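
As a minimal sketch of the percent-encoded case, the snippet below decodes a path, applies NFC, and re-encodes it so that two spellings of the same path compare equal. The helper name `normalize_path` is hypothetical, and the sketch assumes UTF-8 percent-encoding. It also decodes every escape, including reserved characters such as `%2F`, which a production normalizer would need to leave untouched.

```python
import unicodedata
from urllib.parse import quote, unquote


def normalize_path(path: str) -> str:
    """Hypothetical helper: decode percent-escapes, apply NFC, re-encode as UTF-8."""
    decoded = unquote(path)                       # "%C3%A9" -> "é" (assumes UTF-8)
    nfc = unicodedata.normalize("NFC", decoded)   # compose e + U+0301 into U+00E9
    return quote(nfc, safe="/")                   # re-escape everything except "/"


precomposed = "/caf%C3%A9"    # é as the single code point U+00E9
decomposed = "/cafe%CC%81"    # e followed by combining acute accent U+0301

print(precomposed == decomposed)                                   # False
print(normalize_path(precomposed) == normalize_path(decomposed))   # True
```

A plain string comparison treats the two paths as different, while the normalized forms match.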

### Internationalized Domain Names (IDN)

Domain names such as café.com are converted to their Punycode (ASCII-compatible) form, xn--caf-dma.com, for DNS lookups. The IDN standard (IDNA2008) requires each label to be in NFC before the Punycode conversion:

café.com (NFC: U+00E9)  →  xn--caf-dma.com
caf
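
The NFC-then-Punycode step can be sketched with Python's standard library alone. The function names `to_ascii_label` and `to_ascii_host` are hypothetical, and this is not a full IDNA2008 implementation: it skips the validity, case-mapping, and bidi checks that a real IDNA library performs, and it assumes the input is already lowercase.

```python
import unicodedata


def to_ascii_label(label: str) -> str:
    """Hypothetical helper: NFC-normalize one label, then Punycode-encode it."""
    nfc = unicodedata.normalize("NFC", label)
    if nfc.isascii():
        return nfc                     # ASCII labels need no encoding
    # The built-in "punycode" codec yields the part that follows "xn--".
    return "xn--" + nfc.encode("punycode").decode("ascii")


def to_ascii_host(host: str) -> str:
    """Hypothetical helper: apply to_ascii_label to each dot-separated label."""
    return ".".join(to_ascii_label(label) for label in host.split("."))


print(to_ascii_host("caf\u00e9.com"))     # xn--caf-dma.com
print(to_ascii_host("cafe\u0301.com"))    # xn--caf-dma.com
```

Without the NFC step, the decomposed spelling would be encoded from the code points `c a f e U+0301` and would produce a different ASCII label.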

Question 2

When is this useful?

Accepted Answer

URL normalization is important for web crawlers, URL deduplication systems, CDN cache key generation, and security tools that need to detect equivalent URLs. It is also critical for internationalized web applications that handle user-provided URLs.
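
For deduplication or cache keys, one possible approach is to derive a canonical key from each URL and compare keys instead of raw strings. The sketch below is an assumption about how such a key could be built, not a complete canonicalizer: the name `canonical_key` is hypothetical, the query string is passed through unchanged, the fragment is dropped, and reserved characters in the path are not treated specially.

```python
import unicodedata
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def canonical_key(url: str) -> str:
    """Hypothetical helper: build a comparison key with NFC-normalized host and path."""
    parts = urlsplit(url)
    host = unicodedata.normalize("NFC", parts.hostname or "")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Decode the path, compose it to NFC, and re-encode it as UTF-8 escapes.
    path = quote(unicodedata.normalize("NFC", unquote(parts.path)), safe="/")
    # Query is kept as-is and the fragment is dropped in this sketch.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


urls = [
    "https://caf\u00e9.com/menu/caf%C3%A9?x=1",     # precomposed é
    "https://cafe\u0301.com/menu/cafe%CC%81?x=1",   # e + combining accent
]
unique = {canonical_key(u) for u in urls}
print(len(unique))   # 1
```

Both spellings collapse to a single key, so a crawler or cache using this key would store the page once.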
