Handling Special Characters in URLs

Learn how to correctly handle special characters like spaces, unicode, emojis, and reserved characters in URL paths, query strings, and fragments without breaking the URL.


Detailed Explanation

Special Characters in URLs

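URLs may contain only a limited set of ASCII characters. RFC 3986 allows letters, digits, the unreserved symbols - . _ ~, and reserved delimiters such as / ? # & = that give a URL its structure. Any other character, including spaces, international characters, and emojis, must be percent-encoded as %XX, where XX is the hexadecimal value of a byte. Reserved characters must also be encoded when they appear as data rather than as delimiters.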

Characters That Break URLs

Character   Problem                                            Encoded form
(space)     Not allowed in URLs; breaks parsing                %20 (path) or + (query)
#           Starts the fragment                                %23
?           Starts the query string                            %3F
&           Separates query parameters                         %26
=           Separates a parameter's key from its value         %3D
%           Begins a percent-encoded sequence                  %25
+           Decoded as a space in form-encoded query strings   %2B
/           Separates path segments                            %2F (inside a segment or value)
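As a quick check of the table, the sketch below runs each character through JavaScript's built-in encodeURIComponent() and shows how URLSearchParams treats spaces and +. Note that encodeURIComponent() always produces %20 for a space; the + form comes from the application/x-www-form-urlencoded serializer that URLSearchParams uses.

```javascript
// Characters from the table above
const special = [" ", "#", "?", "&", "=", "%", "+", "/"];

// encodeURIComponent() percent-encodes every one of them (e.g. "#" -> "%23")
const encoded = Object.fromEntries(special.map((ch) => [ch, encodeURIComponent(ch)]));
console.log(encoded);

// URLSearchParams serializes a space as "+" and a literal "+" as "%2B"
const params = new URLSearchParams({ q: "1 + 1" });
console.log(params.toString()); // "q=1+%2B+1"

// When parsing, an unescaped "+" is read back as a space
console.log(new URLSearchParams("q=1+1").get("q")); // "1 1"
```

This is why a literal + in a search term must be sent as %2B: left unescaped, the server sees a space.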

Unicode Characters

Non-ASCII characters are first converted to UTF-8 bytes, and each byte is then percent-encoded:

"日本" → UTF-8 bytes: E6 97 A5 E6 9C AC → %E6%97%A5%E6%9C%AC
"café" → UTF-8 bytes: 63 61 66 C3 A9 → caf%C3%A9
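The two-step process described above (UTF-8 bytes, then one %XX per byte) can be written out by hand. percentEncodeUtf8() below is a hypothetical helper for illustration only: it uses the standard TextEncoder API (available in browsers and Node.js) and keeps only the RFC 3986 unreserved characters as-is, so unlike encodeURIComponent() it also escapes ! ' ( ) *.

```javascript
// Hypothetical helper: percent-encode a string, leaving only
// unreserved ASCII characters (A-Z a-z 0-9 - . _ ~) unchanged.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes as a Uint8Array
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/[A-Za-z0-9\-._~]/.test(ch)) {
      out += ch; // unreserved ASCII byte: copy as-is
    } else {
      // every other byte becomes %XX with two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncodeUtf8("日本")); // "%E6%97%A5%E6%9C%AC"
console.log(percentEncodeUtf8("café")); // "caf%C3%A9"
console.log(encodeURIComponent("café")); // "caf%C3%A9"
```

In real code, prefer encodeURIComponent(); the helper only makes the byte-by-byte step visible.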

Internationalized Domain Names (IDN)

Domain names are not percent-encoded. Instead, each label containing non-ASCII characters is converted to Punycode and prefixed with xn--:

https://例え.jp → https://xn--r8jz45g.jp
https://münchen.de → https://xn--mnchen-3ya.de
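In practice you rarely run Punycode yourself: the WHATWG URL parser used by browsers and Node.js converts the host automatically. The sketch below shows the result, and that non-ASCII characters in the path are still percent-encoded rather than Punycode-encoded.

```javascript
// The URL parser converts non-ASCII hostnames to their Punycode (ASCII) form
const munich = new URL("https://münchen.de/");
console.log(munich.hostname); // "xn--mnchen-3ya.de"

const example = new URL("https://例え.jp/");
console.log(example.hostname); // "xn--r8jz45g.jp"

// Only the host uses Punycode; the path is percent-encoded as UTF-8
const mixed = new URL("https://münchen.de/straße");
console.log(mixed.href); // "https://xn--mnchen-3ya.de/stra%C3%9Fe"
```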

Emojis in URLs

Emojis can appear in URLs, but only in percent-encoded form:

encodeURIComponent("🚀")  // "%F0%9F%9A%80"

Many modern browsers display the decoded Unicode characters in the address bar but send the percent-encoded form in the actual HTTP request.
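In JavaScript, 🚀 (U+1F680) is one code point stored as two UTF-16 code units, which is why encodeURIComponent() emits four UTF-8 bytes for a "string of length 2". Decoding restores the original character, while a lone half of the surrogate pair is not valid Unicode text and makes encodeURIComponent() throw a URIError.

```javascript
const rocket = "🚀"; // U+1F680
console.log(rocket.length); // 2 (UTF-16 code units)
console.log([...rocket].length); // 1 (code point)

const encodedRocket = encodeURIComponent(rocket);
console.log(encodedRocket); // "%F0%9F%9A%80"
console.log(decodeURIComponent(encodedRocket) === rocket); // true

// Slicing an emoji in half leaves a lone surrogate, which cannot be UTF-8 encoded
try {
  encodeURIComponent(rocket[0]);
} catch (err) {
  console.log(err instanceof URIError); // true
}
```

This matters when truncating user input before building a URL: cut on code points, not on string indices.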

Context-Dependent Encoding

The same character may need different handling depending on where it appears:

Path:   /search/hello world   → /search/hello%20world
Query:  ?q=hello world        → ?q=hello+world (or ?q=hello%20world)
Hash:   #section one          → #section%20one
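The built-in URL object applies the right rule for each component, so the three cases above can be reproduced without choosing an encoding function by hand. example.com is a placeholder host.

```javascript
// Let the URL object encode each component according to its own rules
const built = new URL("https://example.com");
built.pathname = "/search/hello world";     // path: space -> %20
built.searchParams.set("q", "hello world"); // query (form encoding): space -> +
built.hash = "section one";                 // fragment: space -> %20

console.log(built.href);
// "https://example.com/search/hello%20world?q=hello+world#section%20one"

// searchParams decodes on read; pathname stays in its encoded form
console.log(built.searchParams.get("q")); // "hello world"
console.log(built.pathname); // "/search/hello%20world"
```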

Safe URL Construction in Practice

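// Example inputs (illustrative values)
const userInput = "rock & roll";
const username = "jane doe/admin";

// WRONG: string concatenation leaves the spaces and "&" unencoded,
// so the text after "&" is parsed as a separate parameter
const badUrl = "https://api.com/search?q=" + userInput;

// RIGHT: URLSearchParams encodes the value for the query string
const searchUrl = new URL("https://api.com/search");
searchUrl.searchParams.set("q", userInput);

// RIGHT: encodeURIComponent for path segments ("/" becomes %2F)
const profileUrl = `https://api.com/users/${encodeURIComponent(username)}/profile`;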
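The two correct approaches can be combined into one helper so callers pass raw, unencoded values and never pick an encoding function themselves. buildUrl() and its parameters are hypothetical names for this sketch, not part of any library. It assumes the first argument is just a scheme and host (any path on it is replaced), and it rejects "." and ".." segments because the URL parser would treat them as dot-segments and collapse the path.

```javascript
// Hypothetical helper: build a URL from unencoded path segments and query values.
function buildUrl(origin, segments = [], query = {}) {
  const url = new URL(origin);
  url.pathname =
    "/" +
    segments
      .map((segment) => {
        // "." and ".." survive encodeURIComponent() and would be resolved as dot-segments
        if (segment === "." || segment === "..") {
          throw new Error(`Unsafe path segment: ${segment}`);
        }
        return encodeURIComponent(segment); // "/" inside a segment becomes %2F
      })
      .join("/");
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.append(key, String(value)); // form-encodes key and value
  }
  return url.href;
}

const link = buildUrl("https://api.com", ["users", "jane doe/admin", "profile"], {
  q: "rock & roll",
  page: 2,
});
console.log(link);
// "https://api.com/users/jane%20doe%2Fadmin/profile?q=rock+%26+roll&page=2"
```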

Common Mistakes

  1. Not encoding user input: produces broken URLs and lets input inject extra parameters
  2. Using encodeURI() for parameter values: it does not encode &, =, or #
  3. Running encodeURIComponent() on an entire URL: breaks the structure (:// becomes %3A%2F%2F)
  4. Double encoding: encoding an already-encoded value turns %20 into %2520
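Each mistake in the list can be reproduced in a few lines. example.com and the input strings are placeholders chosen for the demonstration.

```javascript
// Mistake 1: unencoded input can inject an extra parameter
const injected = new URL("https://example.com/search?q=" + "shoes&admin=true");
console.log(injected.searchParams.get("admin")); // "true"

// Mistake 2: encodeURI() leaves reserved characters such as & = # untouched
const value = "a&b=c#d";
console.log(encodeURI(value)); // "a&b=c#d"
console.log(encodeURIComponent(value)); // "a%26b%3Dc%23d"

// Mistake 3: encodeURIComponent() on a whole URL destroys its structure
console.log(encodeURIComponent("https://example.com/a b"));
// "https%3A%2F%2Fexample.com%2Fa%20b"

// Mistake 4: encoding twice turns "%" into "%25"
const once = encodeURIComponent("hello world"); // "hello%20world"
const twice = encodeURIComponent(once); // "hello%2520world"
console.log(decodeURIComponent(twice)); // "hello%20world" (still encoded once)
```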

Use Case

Properly handling special characters is critical for search engines, user-generated content platforms, internationalized applications, and any system that constructs URLs from dynamic data. In production, incorrect encoding commonly causes broken links and failed API requests, lets unencoded & or = in user input add or override query parameters, and, when URLs are embedded in HTML, can contribute to cross-site scripting (XSS) vulnerabilities.
