URL Encode Japanese Text
Learn how to URL encode Japanese characters (Hiragana, Katakana, Kanji). Covers UTF-8 multi-byte encoding for CJK characters in URLs.
Character
日本語
Encoded
%E6%97%A5...
Detailed Explanation
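Japanese text in URLs is percent-encoded from its UTF-8 bytes. Each character typically takes 3 bytes, which become 9 characters once every byte is written as %XX. Japanese writing uses three scripts: Hiragana, Katakana, and Kanji (CJK ideographs). Hiragana, Katakana, and the commonly used Kanji all fall in the 3-byte UTF-8 range; rare Kanji outside the Basic Multilingual Plane take 4 bytes.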
Encoding examples:
- 日 (day/sun, Kanji): UTF-8 bytes E6 97 A5 → %E6%97%A5
- 本 (book/origin, Kanji): UTF-8 bytes E6 9C AC → %E6%9C%AC
- 語 (language, Kanji): UTF-8 bytes E8 AA 9E → %E8%AA%9E
- 日本語 ("Japanese language"): %E6%97%A5%E6%9C%AC%E8%AA%9E
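The byte-to-%XX mapping in the examples above can be reproduced by hand. The following is a minimal sketch, assuming a runtime that provides the standard TextEncoder (modern browsers and Node.js); percentEncodeUtf8 is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string as %XX.
// Unlike encodeURIComponent, it also encodes ASCII bytes, so it is meant
// for inspecting non-ASCII text rather than for building real URLs.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// Print each character with its UTF-8 byte count and encoded form.
for (const ch of "日本語") {
  const byteCount = new TextEncoder().encode(ch).length;
  console.log(ch, byteCount, percentEncodeUtf8(ch));
}
```

For text that contains only Japanese characters, the result should match what encodeURIComponent returns.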
JavaScript behavior:
encodeURIComponent("日本語") // "%E6%97%A5%E6%9C%AC%E8%AA%9E"
encodeURIComponent("こんにちは") // "%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF" (konnichiwa in Hiragana)
encodeURIComponent("トウキョウ") // "%E3%83%88%E3%82%A6%E3%82%AD%E3%83%A7%E3%82%A6" (Tokyo in Katakana)
// Decoding
decodeURIComponent("%E6%97%A5%E6%9C%AC%E8%AA%9E") // "日本語"
URL length impact: Japanese text expands dramatically when percent-encoded. Each 3-byte character becomes 9 characters (3 bytes, each encoded as %XX), so a 20-character Japanese phrase produces 180 characters of encoded text. This is a 9x expansion, whereas unreserved ASCII characters are not encoded at all and keep their original length.
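The arithmetic can be checked directly. This is a small sketch; encodedExpansion is a hypothetical helper, and the 20-character phrase is simply あ repeated, chosen because it is a 3-byte character.

```javascript
// Hypothetical helper: compare a string's length before and after encoding.
function encodedExpansion(text) {
  const original = Array.from(text).length; // count code points, not UTF-16 units
  const encoded = encodeURIComponent(text).length;
  return { original, encoded, ratio: encoded / original };
}

const phrase = "あ".repeat(20); // 20 Hiragana characters, 3 UTF-8 bytes each
console.log(encodedExpansion(phrase)); // { original: 20, encoded: 180, ratio: 9 }
```

A check like this may be useful before placing long Japanese strings in a query string, since some servers and clients limit URL length.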
Historical encoding issues: Before UTF-8 became the universal standard, Japanese URLs often used Shift-JIS or EUC-JP encoding, producing different byte sequences for the same characters. Modern systems (post-2005) almost universally use UTF-8, but legacy systems may still produce Shift-JIS encoded URLs. If you decode UTF-8 encoded Japanese with a Shift-JIS decoder (or vice versa), you get garbled text known as "mojibake" (文字化け).
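One practical consequence is that decodeURIComponent only understands UTF-8, so a Shift-JIS encoded component usually fails to decode rather than silently producing mojibake. The sketch below assumes the Shift-JIS bytes for 日本語 are 93 FA 96 7B 8C EA; tryDecodeUtf8 is a hypothetical helper name.

```javascript
// Hypothetical helper: decode a percent-encoded component as UTF-8,
// returning null instead of throwing when the bytes are not valid UTF-8.
function tryDecodeUtf8(component) {
  try {
    return decodeURIComponent(component);
  } catch (err) {
    if (err instanceof URIError) {
      return null; // likely a legacy encoding such as Shift-JIS or EUC-JP
    }
    throw err;
  }
}

console.log(tryDecodeUtf8("%E6%97%A5%E6%9C%AC%E8%AA%9E")); // UTF-8 input decodes to "日本語"
console.log(tryDecodeUtf8("%93%FA%96%7B%8C%EA")); // Shift-JIS input (assumed bytes) yields null
```

A null result only signals that the input is not valid UTF-8; identifying which legacy encoding was actually used requires a separate decoder.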
Internationalized Domain Names (IDN): Japanese domain names like 日本語.jp use Punycode (a separate system from percent-encoding) to convert Unicode domain labels into an ASCII-compatible form. The domain 日本語.jp becomes xn--wgv71a309e.jp.
Pitfall: When building search URLs for Japanese websites, ensure your application uses UTF-8 consistently. Mixing encodings at any point (browser, proxy, server, database) will produce mojibake. Set the accept-charset attribute on HTML forms (<form accept-charset="UTF-8">) and make sure your HTTP Content-Type headers declare charset=UTF-8.
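One way to keep the client side consistent is to let the URL API do the encoding, since URLSearchParams always serializes as UTF-8. The following is a minimal sketch; the example.com endpoint and the q parameter are placeholders, not a real search API, and buildSearchUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: build a search URL with a UTF-8 encoded query.
// "https://example.com/search" and "q" are placeholder values.
function buildSearchUrl(query) {
  const url = new URL("https://example.com/search");
  url.searchParams.set("q", query); // serialized as UTF-8 percent-encoding
  return url.toString();
}

console.log(buildSearchUrl("日本語"));
// "https://example.com/search?q=%E6%97%A5%E6%9C%AC%E8%AA%9E"

// Reading the parameter back returns the original text.
const roundTrip = new URL(buildSearchUrl("日本語 辞書")).searchParams.get("q");
console.log(roundTrip === "日本語 辞書");
```

Note that URLSearchParams writes an ASCII space as +, following form encoding, whereas encodeURIComponent writes it as %20; both decode back to a space when read through the same API.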
Use Case
Building search URLs for Japanese content, such as querying the Japanese Wikipedia API or creating internationalized e-commerce product URLs.