URL Normalization and Canonicalization

Learn how to normalize URLs by resolving differences in encoding, trailing slashes, scheme casing, port omission, and path segments. Essential for caching, deduplication, and SEO.

Advanced

Detailed Explanation

URL Normalization

URL normalization (or canonicalization) is the process of converting a URL into a consistent, standardized format. Multiple different URL strings can point to the same resource, and normalization resolves these differences.

Why Normalize?

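Different URL strings can reference the same resource. In the pairs below, the case, default-port, dot-segment, and percent-encoding variants are always equivalent; the trailing-slash and query-order variants are equivalent only when the server treats them that way: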

https://EXAMPLE.COM/path          → https://example.com/path
https://example.com:443/path      → https://example.com/path
https://example.com/path/         → https://example.com/path
https://example.com/a/../b/page   → https://example.com/b/page
https://example.com/path?b=2&a=1  → https://example.com/path?a=1&b=2
https://example.com/%7Euser       → https://example.com/~user
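As a quick check of which of these differences a standard parser resolves on its own, the sketch below runs some of the left-hand URLs through the WHATWG URL constructor (available in browsers and Node.js). The helper name parseOnly is an illustrative assumption, not part of any API.

```javascript
// Hypothetical helper: parse and re-serialize, with no extra normalization.
function parseOnly(urlString) {
  return new URL(urlString).href;
}

// Differences the parser resolves by itself:
console.log(parseOnly("https://EXAMPLE.COM/path"));        // https://example.com/path
console.log(parseOnly("https://example.com:443/path"));    // https://example.com/path
console.log(parseOnly("https://example.com/a/../b/page")); // https://example.com/b/page

// Differences it leaves alone (these need explicit handling):
console.log(parseOnly("https://example.com/path/"));         // https://example.com/path/
console.log(parseOnly("https://example.com/path?b=2&a=1"));  // https://example.com/path?b=2&a=1
console.log(parseOnly("https://example.com/%7Euser"));       // https://example.com/%7Euser
```

The second group is where a normalization policy has to be chosen deliberately, since the parser will not make those decisions.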

Common Normalization Steps

  1. Lowercase the scheme and host

    HTTP://Example.COM → http://example.com
    
  2. Remove default port

    https://example.com:443 → https://example.com
    http://example.com:80 → http://example.com
    
  3. Remove dot segments

    /a/b/../c/./d → /a/c/d
    
  4. Decode unreserved characters

    /%7Euser → /~user
    /caf%C3%A9 → /café (é is not an unreserved character; decode it only for display or when working with IRIs)
    
  5. Normalize empty path

    https://example.com → https://example.com/
    
  6. Sort query parameters (optional, context-dependent)

    ?z=3&a=1&m=2 → ?a=1&m=2&z=3
    
  7. Remove trailing slash (or always add one — pick a convention)

    /path/ → /path (or vice versa)
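Step 4 is one that the URL constructor does not perform, so it has to be done by hand. The following is a minimal sketch, assuming the RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~"); the function name decodeUnreserved is hypothetical. Escapes for any other character are kept, with their hex digits uppercased so that equivalent escapes compare equal.

```javascript
// RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
const UNRESERVED = /^[A-Za-z0-9._~-]$/;

// Hypothetical helper: decode only the escapes that stand for unreserved
// characters; leave every other escape encoded, with uppercase hex digits.
function decodeUnreserved(text) {
  return text.replace(/%([0-9a-fA-F]{2})/g, (match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    return UNRESERVED.test(ch) ? ch : match.toUpperCase();
  });
}

console.log(decodeUnreserved("/%7Euser"));   // /~user
console.log(decodeUnreserved("/caf%c3%a9")); // /caf%C3%A9
console.log(decodeUnreserved("/a%2fb"));     // /a%2Fb
```

Note that an encoded slash (%2F) stays encoded: decoding it would change how the path splits into segments.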
    

Implementation

function normalizeUrl(urlString) {
  const url = new URL(urlString);

  // The URL constructor already lowercases the scheme and host,
  // drops the default port, and resolves dot segments.
  // It does not decode percent-encoded unreserved characters such as %7E.

  // Remove trailing slash from path (optional)
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }

  // Sort query parameters
  url.searchParams.sort();

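  // Remove an empty fragment (a bare trailing "#").
  // url.hash reads as "" both when there is no fragment and when it is empty,
  // so it never equals "#"; assigning "" clears the fragment in either case.
  if (url.hash === "") url.hash = "";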

  return url.toString();
}
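One caveat with the query-sorting step: searchParams.sort() re-serializes the whole query string in form-urlencoded style, so it can change more than the order. A small sketch of the effect (the helper name sortQuery is hypothetical):

```javascript
// Hypothetical helper: return only the query string after sorting.
function sortQuery(urlString) {
  const url = new URL(urlString);
  url.searchParams.sort(); // stable sort by parameter name
  return url.search;
}

// Repeated keys keep their relative order.
console.log(sortQuery("https://example.com/path?b=2&a=1&a=0")); // ?a=1&a=0&b=2

// An encoded space is rewritten as "+".
console.log(sortQuery("https://example.com/path?q=a%20b"));     // ?q=a+b
```

If the origin server distinguishes %20 from +, or depends on parameter order, it is safer to skip this step.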

SEO Impact

Search engines normalize URLs to determine canonical pages. Serving the same content under multiple URL variations can:

  • Split ranking signals between duplicates
  • Lead search engines to filter out duplicates or index a variation you did not intend
  • Waste crawl budget

Use the <link rel="canonical"> tag to declare the preferred URL.

Use Case

URL normalization is essential for web crawlers, caching layers, deduplication systems, and SEO. CDNs and proxy servers use normalized URLs as cache keys. Analytics platforms normalize URLs to accurately count page views. Search engines canonicalize URLs to avoid indexing duplicate content.
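As an illustration of the deduplication case, the sketch below groups URLs by a normalized key. The names cacheKey and dedupeUrls are hypothetical, and the policy shown (sort the query, drop the fragment) is an assumption that suits cache keys, since fragments are never sent to the server; a different system may need a different policy.

```javascript
// Hypothetical key function: parser defaults, sorted query, no fragment.
function cacheKey(urlString) {
  const url = new URL(urlString);
  url.searchParams.sort();
  url.hash = ""; // assumption: the fragment does not identify a separate resource
  return url.href;
}

// Return one normalized URL per distinct resource, in first-seen order.
function dedupeUrls(urls) {
  const seen = new Set();
  for (const raw of urls) {
    seen.add(cacheKey(raw));
  }
  return [...seen];
}

const unique = dedupeUrls([
  "https://EXAMPLE.com:443/path?b=2&a=1",
  "https://example.com/path?a=1&b=2#top",
  "https://example.com/other",
]);
console.log(unique.length); // 2
```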
