URL Encode Unicode Characters

Learn how to URL encode Unicode characters using UTF-8 percent encoding. Covers multi-byte encoding, international characters, and special symbols.

Character: ✓
Unicode: U+2713
Encoded: %E2%9C%93

Detailed Explanation

Unicode characters beyond the ASCII range (code points above 127) must be encoded in URLs using their UTF-8 byte sequences, with each byte percent-encoded individually. For example, the check mark character (✓) is encoded as %E2%9C%93 because its UTF-8 representation is three bytes: 0xE2, 0x9C, 0x93.

How Unicode URL encoding works:

Convert the character to its UTF-8 byte sequence
Percent-encode each byte as %HH where HH is the hexadecimal value

For example, the euro sign (€):

Unicode code point: U+20AC
UTF-8 bytes: 0xE2, 0x82, 0xAC
URL encoded: %E2%82%AC
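
The two steps above can be sketched by hand in JavaScript. This is only an illustration of the technique, not how encodeURIComponent() is implemented. The helper name percentEncodeUtf8 is hypothetical, and unlike encodeURIComponent() it encodes every byte, including ASCII letters that would normally be left as-is.

```javascript
// Hypothetical helper (not a built-in): percent-encode every UTF-8 byte of a string.
function percentEncodeUtf8(text) {
  // Step 1: convert the string to its UTF-8 byte sequence.
  const bytes = new TextEncoder().encode(text);
  // Step 2: write each byte as %HH using two uppercase hex digits.
  return Array.from(bytes, (byte) =>
    "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeUtf8("€")); // "%E2%82%AC"
console.log(percentEncodeUtf8("✓")); // "%E2%9C%93"
```

For non-ASCII input the result should match encodeURIComponent(), which makes the helper a convenient way to check your understanding of a particular byte sequence.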

JavaScript behavior:

encodeURIComponent("✓")     // "%E2%9C%93" (check mark, 3 bytes)
encodeURIComponent("é")     // "%C3%A9" (e with accent, 2 bytes)
encodeURIComponent("€")     // "%E2%82%AC" (euro sign, 3 bytes)
encodeURIComponent("你好")   // "%E4%BD%A0%E5%A5%BD" (Chinese "hello", 3 bytes each)

// Decoding
decodeURIComponent("%E2%9C%93") // "✓"

Byte length by code point range:

U+0000 to U+007F: 1 byte (ASCII, e.g., space = %20; unreserved characters such as A are normally left unencoded)
U+0080 to U+07FF: 2 bytes (e.g., é = %C3%A9)
U+0800 to U+FFFF: 3 bytes (e.g., ✓ = %E2%9C%93)
U+10000 to U+10FFFF: 4 bytes (e.g., most emoji, such as 😀 = %F0%9F%98%80; see encode-emoji)
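
The ranges above translate directly into a small lookup. The sketch below uses a hypothetical helper, utf8ByteLength, and cross-checks it against the byte count that TextEncoder actually produces. It assumes the input is a valid code point and does not treat the surrogate range specially.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for a single code point.
function utf8ByteLength(codePoint) {
  if (codePoint <= 0x7f) return 1;   // U+0000 to U+007F
  if (codePoint <= 0x7ff) return 2;  // U+0080 to U+07FF
  if (codePoint <= 0xffff) return 3; // U+0800 to U+FFFF
  return 4;                          // U+10000 to U+10FFFF
}

// Compare the table-based prediction with the real encoded length.
for (const ch of ["A", "é", "✓", "😀"]) {
  const predicted = utf8ByteLength(ch.codePointAt(0));
  const actual = new TextEncoder().encode(ch).length;
  console.log(ch, predicted, actual);
}
```

Multiplying the byte count by three gives the length of the percent-encoded form for any character that has to be encoded.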

Important: The encodeURIComponent() function in JavaScript always uses UTF-8, the encoding that RFC 3986 recommends for new URI schemes and that the WHATWG URL Standard uses. Older systems might use other encodings (like ISO-8859-1 or Shift-JIS), which produce different byte sequences for the same characters. If you encounter garbled text or a decoding error, an encoding mismatch is likely the cause.
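
A mismatch is easy to reproduce. The sketch below uses a hypothetical helper, decodeAsLatin1, which stands in for a legacy system that interprets each percent-encoded byte as an ISO-8859-1 character. It also shows the opposite case, where a single ISO-8859-1 byte is not valid UTF-8 and decodeURIComponent() throws.

```javascript
// Hypothetical helper: decode %HH sequences as ISO-8859-1, where each byte
// maps to the code point with the same numeric value.
function decodeAsLatin1(encoded) {
  return encoded.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// UTF-8 bytes read as ISO-8859-1: two garbled characters instead of one.
console.log(decodeAsLatin1(encodeURIComponent("é"))); // "Ã©"

// ISO-8859-1 byte read as UTF-8: %E9 alone is not a valid UTF-8 sequence.
try {
  decodeURIComponent("%E9");
} catch (err) {
  console.log(err instanceof URIError); // true
}
```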

Internationalized Resource Identifiers (IRIs): RFC 3987 defines IRIs as an extension to URIs that allows Unicode characters directly. Modern browsers display Unicode characters in the address bar but transmit the percent-encoded form in HTTP requests. This can cause confusion when copying URLs: the displayed URL may look different from the transmitted one.
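
The difference between the displayed and the transmitted form can be observed with the standard URL class. In this sketch the function names and the example.com address are placeholders chosen for illustration; decodeURI() only approximates what a browser shows in the address bar.

```javascript
// Hypothetical helper: the percent-encoded form a URL parser produces.
function toTransmittedForm(displayedUrl) {
  return new URL(displayedUrl).href;
}

// Hypothetical helper: a rough approximation of the human-readable form.
function toDisplayedForm(transmittedUrl) {
  return decodeURI(transmittedUrl);
}

const transmitted = toTransmittedForm("https://example.com/café?q=✓");
console.log(transmitted);                  // "https://example.com/caf%C3%A9?q=%E2%9C%93"
console.log(toDisplayedForm(transmitted)); // "https://example.com/café?q=✓"
```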

Pitfall: URL length limits become a practical concern with Unicode text because each non-ASCII character expands to 6-12 percent-encoded characters (2 to 4 UTF-8 bytes, each of which becomes three characters like %E2). A 100-character Chinese string expands to around 900 characters when percent-encoded.
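
The expansion is simple to measure before sending a request. This is a minimal sketch; encodedLength is a hypothetical helper name, and any limit you compare the result against depends on the server or client involved.

```javascript
// Hypothetical helper: length of a value once it is percent-encoded.
function encodedLength(text) {
  return encodeURIComponent(text).length;
}

const text = "你好".repeat(50);    // 100 characters, 3 UTF-8 bytes each
console.log(text.length);         // 100
console.log(encodedLength(text)); // 900
```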

Use Case

Building multilingual search URLs or internationalized API requests that include non-ASCII characters such as accented letters, CJK characters, or symbols.
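
One way to build such a URL is to let URL and URLSearchParams handle the encoding. In this sketch the function name buildSearchUrl, the example.com endpoint, and the q and lang parameter names are all assumptions made for illustration. Note that URLSearchParams uses form encoding, so a space becomes + rather than %20.

```javascript
// Hypothetical helper: build a search URL with a non-ASCII query string.
function buildSearchUrl(base, query, lang) {
  const url = new URL(base);
  // searchParams encodes values as UTF-8 percent-encoded form data.
  url.searchParams.set("q", query);
  url.searchParams.set("lang", lang);
  return url.href;
}

const searchUrl = buildSearchUrl("https://example.com/search", "東京 café", "ja");
console.log(searchUrl);
// "https://example.com/search?q=%E6%9D%B1%E4%BA%AC+caf%C3%A9&lang=ja"

// The receiving side gets the original text back after decoding.
console.log(new URL(searchUrl).searchParams.get("q")); // "東京 café"
```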
