URL Encode Japanese Text

Learn how to URL encode Japanese characters (Hiragana, Katakana, Kanji). Covers UTF-8 multi-byte encoding for CJK characters in URLs.

Character: 日本語

Encoded: %E6%97%A5...

Detailed Explanation

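Japanese text in URLs is converted to UTF-8 bytes and then percent-encoded. Each character typically requires 3 bytes, so it becomes 9 characters once every byte is written as %XX. Japanese writing uses three scripts: Hiragana, Katakana, and Kanji (CJK ideographs). Hiragana, Katakana, and the commonly used Kanji all fall in the 3-byte UTF-8 range; rare Kanji in the supplementary planes (for example 𠮷, U+20BB7) take 4 bytes.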

Encoding examples:

  • 日 (day/sun, Kanji): UTF-8 bytes E6 97 A5 → %E6%97%A5
  • 本 (book/origin, Kanji): UTF-8 bytes E6 9C AC → %E6%9C%AC
  • 語 (language, Kanji): UTF-8 bytes E8 AA 9E → %E8%AA%9E
  • 日本語 ("Japanese language"): %E6%97%A5%E6%9C%AC%E8%AA%9E
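The byte-to-%XX mapping in the list above can be reproduced by hand. The following is a minimal sketch, assuming a runtime with the standard TextEncoder global (modern browsers and Node.js); percentEncodeUtf8 is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
// Unlike encodeURIComponent, it also encodes ASCII letters and digits,
// so it is only meant to illustrate the byte-to-%XX mapping.
function percentEncodeUtf8(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes as a Uint8Array
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

const encoded = percentEncodeUtf8("日本語");
console.log(encoded); // "%E6%97%A5%E6%9C%AC%E8%AA%9E"

// For text with no ASCII characters, the result matches encodeURIComponent.
console.log(encoded === encodeURIComponent("日本語")); // true
```

In real code encodeURIComponent is the better choice, since it leaves unreserved ASCII characters readable.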

JavaScript behavior:

encodeURIComponent("日本語")     // "%E6%97%A5%E6%9C%AC%E8%AA%9E"
encodeURIComponent("こんにちは") // "%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF" (konnichiwa in Hiragana)
encodeURIComponent("トウキョウ") // "%E3%83%88%E3%82%A6%E3%82%AD%E3%83%A7%E3%82%A6" (Tokyo in Katakana)

// Decoding
decodeURIComponent("%E6%97%A5%E6%9C%AC%E8%AA%9E") // "日本語"

URL length impact: Japanese text expands dramatically when percent-encoded. Each 3-byte character becomes 9 characters (3 bytes, each written as %XX), so a 20-character Japanese phrase produces 180 characters of encoded text. That is a 9x expansion, whereas unreserved ASCII characters such as letters and digits pass through unchanged.
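To check the expansion for a particular string, the counts can be measured directly. This is a small sketch; encodedLengthReport is a hypothetical helper name chosen for illustration.

```javascript
// Hypothetical helper: compare character count, UTF-8 byte count,
// and percent-encoded length for a string.
function encodedLengthReport(text) {
  return {
    characters: Array.from(text).length, // code points, not UTF-16 units
    utf8Bytes: new TextEncoder().encode(text).length,
    encodedLength: encodeURIComponent(text).length,
  };
}

console.log(encodedLengthReport("こんにちは"));
// { characters: 5, utf8Bytes: 15, encodedLength: 45 }

// ASCII letters are not expanded, so mixed text grows less than 9x.
console.log(encodedLengthReport("Tokyo東京"));
// { characters: 7, utf8Bytes: 11, encodedLength: 23 }
```

A check like this is useful when a URL has to stay under a length limit imposed by a server or proxy.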

Historical encoding issues: Before UTF-8 became the universal standard, Japanese URLs often used Shift-JIS or EUC-JP encoding, producing different byte sequences for the same characters. Modern systems (post-2005) almost universally use UTF-8, but legacy systems may still produce Shift-JIS encoded URLs. If you decode UTF-8 encoded Japanese with a Shift-JIS decoder (or vice versa), you get garbled text known as "mojibake" (文字化け).
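One practical consequence is that decodeURIComponent only understands UTF-8, so input percent-encoded from Shift-JIS bytes usually throws a URIError rather than returning text. The sketch below assumes the Shift-JIS byte sequence 93 FA 96 7B 8C EA for 日本語; tryDecodeUtf8 is a hypothetical helper name.

```javascript
// "日本語" percent-encoded from Shift-JIS bytes (assumed: 93 FA 96 7B 8C EA),
// as a legacy system might emit it.
const shiftJisEncoded = "%93%FA%96%7B%8C%EA";
// The same text percent-encoded from UTF-8 bytes.
const utf8Encoded = "%E6%97%A5%E6%9C%AC%E8%AA%9E";

// Hypothetical helper: return the decoded string, or null when the
// percent-encoded bytes are not valid UTF-8.
function tryDecodeUtf8(component) {
  try {
    return decodeURIComponent(component);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err; // unrelated errors are not swallowed
  }
}

console.log(tryDecodeUtf8(utf8Encoded));     // "日本語"
console.log(tryDecodeUtf8(shiftJisEncoded)); // null (0x93 cannot start a UTF-8 sequence)
```

A null result is only a hint that the input may be in a legacy encoding. Some non-UTF-8 byte sequences happen to be valid UTF-8 and decode to the wrong characters instead of failing.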

Internationalized Domain Names (IDN): Japanese domain names like 日本語.jp use Punycode (a different system from percent-encoding) to convert Unicode labels into an ASCII-compatible form. The domain 日本語.jp becomes xn--wgv71a309e.jp in Punycode. Percent-encoding applies only to the rest of the URL, such as the path and query string.

Pitfall: When building search URLs for Japanese websites, ensure your application uses UTF-8 consistently. Mixing encodings at any point (browser, proxy, server, database) will produce mojibake. Set the accept-charset attribute on HTML forms (<form accept-charset="UTF-8">) and ensure your HTTP Content-Type headers declare charset=utf-8.

Use Case

Building search URLs for Japanese content, such as querying the Japanese Wikipedia API or creating internationalized e-commerce product URLs.
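As a rough illustration of this use case, the sketch below builds a search URL for the Japanese Wikipedia API using its action=query and list=search parameters. buildJaWikipediaSearchUrl is a hypothetical helper name, and the code only constructs the URL without sending a request.

```javascript
// Hypothetical helper: build a Japanese Wikipedia search URL.
// URLSearchParams percent-encodes the query as UTF-8 automatically.
function buildJaWikipediaSearchUrl(query) {
  const url = new URL("https://ja.wikipedia.org/w/api.php");
  url.searchParams.set("action", "query");
  url.searchParams.set("list", "search");
  url.searchParams.set("srsearch", query);
  url.searchParams.set("format", "json");
  return url.toString();
}

console.log(buildJaWikipediaSearchUrl("日本語"));
// https://ja.wikipedia.org/w/api.php?action=query&list=search&srsearch=%E6%97%A5%E6%9C%AC%E8%AA%9E&format=json
```

Letting URLSearchParams do the encoding avoids double-encoding mistakes that can occur when an already encoded string is concatenated into a URL by hand. Note that it writes spaces as + rather than %20, following form-encoding rules.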
