URL Encode Unicode Characters
Learn how to URL encode Unicode characters using UTF-8 percent encoding. Covers multi-byte encoding, international characters, and special symbols.
Character: ✓
Unicode: U+2713
Encoded: %E2%9C%93
Detailed Explanation
Unicode characters beyond the ASCII range (code points above 127) must be encoded in URLs using their UTF-8 byte sequences, with each byte percent-encoded individually. For example, the check mark character (✓) is encoded as %E2%9C%93 because its UTF-8 representation is three bytes: 0xE2, 0x9C, 0x93.
How Unicode URL encoding works:
- Convert the character to its UTF-8 byte sequence
- Percent-encode each byte as %HH, where HH is the byte's two-digit hexadecimal value
For example, the euro sign (€):
- Unicode code point: U+20AC
- UTF-8 bytes: 0xE2, 0x82, 0xAC
- URL encoded: %E2%82%AC
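The two steps above can be sketched by hand, without encodeURIComponent(). This is only an illustration: percentEncodeUtf8 is a hypothetical helper name, and unlike encodeURIComponent() it escapes every byte, including plain ASCII letters.

```javascript
// Hypothetical helper (not a built-in): percent-encode every UTF-8 byte of a string.
function percentEncodeUtf8(text) {
  // Step 1: convert the string to its UTF-8 byte sequence.
  const bytes = new TextEncoder().encode(text);
  // Step 2: write each byte as %HH with two uppercase hex digits.
  return Array.from(bytes, (byte) =>
    "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeUtf8("€")); // %E2%82%AC
```

For non-ASCII input the result should match what encodeURIComponent() produces; the difference only shows up for unreserved ASCII characters, which encodeURIComponent() leaves untouched.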
JavaScript behavior:
encodeURIComponent("✓") // "%E2%9C%93" (check mark, 3 bytes)
encodeURIComponent("é") // "%C3%A9" (e with accent, 2 bytes)
encodeURIComponent("€") // "%E2%82%AC" (euro sign, 3 bytes)
encodeURIComponent("你好") // "%E4%BD%A0%E5%A5%BD" (Chinese "hello", 3 bytes each)
// Decoding
decodeURIComponent("%E2%9C%93") // "✓"
Byte length by code point range:
- U+0000 to U+007F: 1 byte (ASCII, e.g., A = %41)
- U+0080 to U+07FF: 2 bytes (e.g., é = %C3%A9)
- U+0800 to U+FFFF: 3 bytes (e.g., ✓ = %E2%9C%93)
- U+10000 to U+10FFFF: 4 bytes (e.g., emojis, see encode-emoji)
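The ranges above translate directly into a small lookup. The sketch below uses a hypothetical utf8ByteLength function and cross-checks it against TextEncoder; it assumes the input is a valid code point and does not treat the surrogate range (U+D800 to U+DFFF) specially.

```javascript
// Hypothetical helper: predict the UTF-8 byte count from a code point's range.
// Assumes a valid code point; surrogates (U+D800 to U+DFFF) are not handled.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;   // U+0000 to U+007F
  if (codePoint <= 0x7ff) return 2;  // U+0080 to U+07FF
  if (codePoint <= 0xffff) return 3; // U+0800 to U+FFFF
  return 4;                          // U+10000 to U+10FFFF
}

// Compare the prediction with the bytes TextEncoder actually produces.
for (const ch of ["A", "é", "✓", "😀"]) {
  const predicted = utf8ByteLength(ch.codePointAt(0));
  const actual = new TextEncoder().encode(ch).length;
  console.log(ch, predicted, actual);
}
```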
Important: The encodeURIComponent() function in JavaScript always uses UTF-8, the encoding that RFC 3986 recommends for textual data in new URI schemes. Older systems might use other encodings (like ISO-8859-1 or Shift-JIS), which produce different byte sequences for the same characters. If you encounter garbled text after decoding, an encoding mismatch is likely the cause.
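A mismatch can be reproduced in both directions. In the sketch below, decodeAsLatin1 and safeDecode are hypothetical helpers written for illustration: the first mimics a legacy system that reads each escaped byte as one ISO-8859-1 character, the second shows that decodeURIComponent() rejects escapes that are not valid UTF-8.

```javascript
// Hypothetical helper: decode percent-escapes as ISO-8859-1 (one byte = one character),
// the way a legacy system might.
function decodeAsLatin1(encoded) {
  return encoded.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// UTF-8 escapes read as ISO-8859-1 come out garbled.
console.log(decodeAsLatin1("%C3%A9")); // "Ã©" instead of "é"

// Hypothetical helper: return null instead of throwing on malformed UTF-8 escapes.
function safeDecode(encoded) {
  try {
    return decodeURIComponent(encoded);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err;
  }
}

// "%E9" is "é" in ISO-8859-1, but a lone 0xE9 byte is not valid UTF-8.
console.log(safeDecode("%E9"));    // null
console.log(safeDecode("%C3%A9")); // "é"
```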
Internationalized Resource Identifiers (IRIs): RFC 3987 defines IRIs as an extension to URIs that allows Unicode characters directly. Modern browsers display Unicode characters in the address bar but transmit the percent-encoded form in HTTP requests. This can cause confusion when copying URLs: the displayed URL may look different from the transmitted one.
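The gap between the displayed and transmitted forms can be observed with the standard URL class. This is a rough illustration, assuming a runtime with the WHATWG URL API (modern browsers and Node.js); example.com is a placeholder address.

```javascript
// What a user might type or see in the address bar: https://example.com/café?q=✓
const typed = "https://example.com/caf\u00E9?q=\u2713";

// The URL parser percent-encodes the non-ASCII characters as UTF-8 bytes.
const transmitted = new URL(typed).href;
console.log(transmitted); // https://example.com/caf%C3%A9?q=%E2%9C%93

// decodeURI() gives a readable form back, similar to what an address bar displays.
const displayed = decodeURI(transmitted);
console.log(displayed === typed); // true
```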
Pitfall: URL length limits become a practical concern with Unicode text because each UTF-8 byte becomes three characters (such as %E2), so a single character expands to between 3 and 12 percent-encoded characters. A 100-character Chinese string, at 3 bytes per character, expands to around 900 characters when percent-encoded.
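The expansion is easy to measure before sending a request. In this sketch, encodedLength and exceedsLimit are hypothetical helpers, and the limit value is an assumption to be replaced with whatever the target server or client actually enforces.

```javascript
// Hypothetical helper: length of a value after percent-encoding.
function encodedLength(text) {
  return encodeURIComponent(text).length;
}

// Hypothetical helper: check an encoded value against a caller-supplied limit.
function exceedsLimit(text, limit) {
  return encodedLength(text) > limit;
}

const chinese = "你".repeat(100);          // 100 characters, 3 UTF-8 bytes each
console.log(encodedLength(chinese));       // 900
console.log(encodedLength("😀"));          // 12 (4 bytes, 3 characters per byte)
console.log(exceedsLimit(chinese, 500));   // true; 500 is an arbitrary example limit
```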
Use Case
Building multilingual search URLs or internationalized API requests that include non-ASCII characters such as accented letters, CJK characters, or symbols.
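One way to build such a URL is to let URLSearchParams handle the encoding. The sketch below is illustrative only: buildSearchUrl, the q and lang parameter names, and the example.com endpoint are all assumptions. Note that URLSearchParams uses form encoding, so spaces become + rather than %20.

```javascript
// Hypothetical helper: build a search URL with UTF-8 percent-encoded parameters.
// The "q" and "lang" parameter names are assumptions for this example.
function buildSearchUrl(base, query, lang) {
  const url = new URL(base);
  url.searchParams.set("q", query);   // non-ASCII characters are encoded as UTF-8
  url.searchParams.set("lang", lang);
  return url.href;
}

const searchUrl = buildSearchUrl("https://example.com/search", "café €", "fr");
console.log(searchUrl);
// https://example.com/search?q=caf%C3%A9+%E2%82%AC&lang=fr

// Reading the parameter back decodes it to the original text.
console.log(new URL(searchUrl).searchParams.get("q")); // "café €"
```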