URL Normalization and Canonicalization
Learn how to normalize URLs by resolving differences in encoding, trailing slashes, scheme casing, port omission, and path segments. Essential for caching, deduplication, and SEO.
Detailed Explanation
URL Normalization
URL normalization (or canonicalization) is the process of converting a URL into a consistent, standardized format. Multiple different URL strings can point to the same resource, and normalization resolves these differences.
Why Normalize?
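Different URL strings can reference the same resource. Some equivalences follow from URL syntax itself (host case, default port, dot segments, percent-encoded unreserved characters); others, such as trailing slashes and query parameter order, hold only when the server treats both forms alike: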
https://EXAMPLE.COM/path → https://example.com/path
https://example.com:443/path → https://example.com/path
https://example.com/path/ → https://example.com/path
https://example.com/a/../b/page → https://example.com/b/page
https://example.com/path?b=2&a=1 → https://example.com/path?a=1&b=2
https://example.com/%7Euser → https://example.com/~user
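As a quick check of which of these differences a standard parser resolves on its own, the sketch below runs the left-hand URLs through the built-in URL constructor available in browsers and Node.js. The names samples and parsedForms are illustrative only and not part of any library.

```javascript
// Left-hand URLs from the list above.
const samples = [
  "https://EXAMPLE.COM/path",
  "https://example.com:443/path",
  "https://example.com/path/",
  "https://example.com/a/../b/page",
  "https://example.com/path?b=2&a=1",
  "https://example.com/%7Euser",
];

// Parse each URL and return its serialized form, with no extra normalization.
function parsedForms(urls) {
  return urls.map((u) => new URL(u).href);
}

for (const form of parsedForms(samples)) {
  console.log(form);
}
```

Parsing alone handles host case, the default port, and dot segments. The trailing slash, the query parameter order, and the %7E escape come back unchanged, so those steps need explicit code.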
Common Normalization Steps
Lowercase the scheme and host
HTTP://Example.COM → http://example.com
Remove default port
https://example.com:443 → https://example.com
http://example.com:80 → http://example.com
Remove dot segments
/a/b/../c/./d → /a/c/d
Decode unreserved characters
/%7Euser → /~user
/caf%C3%A9 → /café (optional, depends on context; é is not an unreserved ASCII character, so this is a display form rather than a safe syntactic decode)
Normalize empty path
https://example.com → https://example.com/
Sort query parameters (optional, context-dependent)
?z=3&a=1&m=2 → ?a=1&m=2&z=3
Remove trailing slash (or always add one; pick a convention)
/path/ → /path (or vice versa)
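The "decode unreserved characters" step can be sketched as below. The helper name decodeUnreserved is hypothetical. It assumes the RFC 3986 unreserved set (letters, digits, hyphen, period, underscore, tilde) and also uppercases the hex digits of the escapes it leaves in place, which is a separate, commonly paired normalization.

```javascript
// Hypothetical helper: decode only percent-escapes that encode RFC 3986
// unreserved characters; uppercase the hex digits of every other escape.
function decodeUnreserved(path) {
  return path.replace(/%([0-9a-fA-F]{2})/g, (match, hex) => {
    const char = String.fromCharCode(parseInt(hex, 16));
    // Unreserved: A-Z a-z 0-9 - . _ ~
    return /[A-Za-z0-9\-._~]/.test(char) ? char : match.toUpperCase();
  });
}

console.log(decodeUnreserved("/%7Euser"));   // "/~user"
console.log(decodeUnreserved("/a%2fb"));     // "/a%2Fb" (reserved "/" stays encoded)
console.log(decodeUnreserved("/caf%c3%a9")); // "/caf%C3%A9" (non-ASCII bytes stay encoded)
```

Reserved characters such as an encoded slash are deliberately left alone, because decoding them would change how the path is split into segments.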
Implementation
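function normalizeUrl(urlString) {
  // Throws a TypeError if urlString is not a valid absolute URL
  const url = new URL(urlString);
  // For http(s) URLs the URL constructor already lowercases the scheme and host,
  // removes the default port, and removes dot segments.
  // It does not decode percent-encoded unreserved characters such as %7E.
  // Remove trailing slash from path (optional)
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  // Sort query parameters by name (optional; this also re-serializes the query)
  url.searchParams.sort();
  // Remove empty hash: url.hash reads as "" (never "#") when the fragment is
  // empty, and assigning "" drops the dangling "#" from the serialized URL
  if (url.hash === "") url.hash = "";
  return url.toString();
}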
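Two details of the query-sorting step are worth seeing in isolation. The sketch below uses a hypothetical helper, sortedSearch, to show that URLSearchParams.sort() orders by parameter name only, keeps repeated names in their original relative order, and re-serializes the query in form-encoded style.

```javascript
// Hypothetical helper: return the query string after sorting its parameters.
function sortedSearch(urlString) {
  const url = new URL(urlString);
  url.searchParams.sort();
  return url.search;
}

// Repeated names keep their relative order (the sort is stable).
console.log(sortedSearch("https://example.com/path?b=2&a=1&b=1")); // "?a=1&b=2&b=1"

// Re-serialization turns an encoded space into "+".
console.log(sortedSearch("https://example.com/search?q=a%20b")); // "?q=a+b"
```

If the server distinguishes %20 from + in query strings, or depends on parameter order, sorting is not a safe normalization for that site.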
SEO Impact
Search engines normalize URLs to determine canonical pages. Serving the same content under multiple URL variations can:
- Split ranking signals between duplicates
- Lead search engines to filter out duplicates or choose a canonical URL other than the one you prefer
- Waste crawl budget
Use the <link rel="canonical"> tag to declare the preferred URL.
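A minimal sketch of generating that tag from a page URL follows. The helper name canonicalLinkTag is hypothetical; it relies only on the built-in URL constructor and escapes the characters that matter inside a double-quoted HTML attribute.

```javascript
// Hypothetical helper: build a canonical <link> element for a page's <head>.
function canonicalLinkTag(urlString) {
  const href = new URL(urlString).href
    .replace(/&/g, "&amp;")   // escape ampersands for HTML
    .replace(/"/g, "&quot;"); // defensive: keep the attribute value quoted safely
  return `<link rel="canonical" href="${href}">`;
}

console.log(canonicalLinkTag("https://EXAMPLE.com/path?a=1&b=2"));
// <link rel="canonical" href="https://example.com/path?a=1&amp;b=2">
```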
Use Case
URL normalization is essential for web crawlers, caching layers, deduplication systems, and SEO. CDNs and proxy servers use normalized URLs as cache keys. Analytics platforms normalize URLs to accurately count page views. Search engines canonicalize URLs to avoid indexing duplicate content.
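As an illustration of the deduplication use case, the sketch below keys a Set by a normalized URL string. The names canonicalKey and dedupe are hypothetical, and dropping the fragment is an assumption that suits crawlers and caches, since fragments are not sent to the server.

```javascript
// Hypothetical key function: parse, sort the query, and drop the fragment.
function canonicalKey(urlString) {
  const url = new URL(urlString);
  url.searchParams.sort();
  url.hash = ""; // assumption: the fragment does not identify a distinct resource
  return url.href;
}

// Keep the first URL seen for each canonical key.
function dedupe(urls) {
  const seen = new Set();
  const unique = [];
  for (const raw of urls) {
    const key = canonicalKey(raw);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(raw);
    }
  }
  return unique;
}

console.log(dedupe([
  "https://EXAMPLE.com:443/a?b=2&a=1",
  "https://example.com/a?a=1&b=2#top",
  "https://example.com/other",
]).length); // 2
```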