Hiragana and Katakana — Japanese Syllabaries in Unicode

Explore the Unicode encoding of Japanese Hiragana (U+3040–U+309F) and Katakana (U+30A0–U+30FF) — their code points, 3-byte UTF-8 representation, and usage patterns.


Detailed Explanation

Hiragana and Katakana in Unicode

Japanese uses two phonetic syllabary scripts alongside Kanji (CJK ideographs): Hiragana for native Japanese words and grammatical elements, and Katakana for foreign loanwords, emphasis, and technical terms.

Unicode Blocks

Script                         Range            Character Count
Hiragana                       U+3040–U+309F    93
Katakana                       U+30A0–U+30FF    96
Katakana Phonetic Extensions   U+31F0–U+31FF    16
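The block ranges above translate directly into a membership check, which is one way to validate input that should contain only a specific script. The following is a minimal sketch; the names KANA_BLOCKS, kana_block, and is_all_hiragana are hypothetical helpers introduced for illustration, not part of any library.

```python
from typing import Optional

# Inclusive code point bounds, copied from the block table above.
KANA_BLOCKS = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "Katakana Phonetic Extensions": (0x31F0, 0x31FF),
}


def kana_block(ch: str) -> Optional[str]:
    """Return the name of the kana block that contains ch, or None."""
    cp = ord(ch)
    for name, (low, high) in KANA_BLOCKS.items():
        if low <= cp <= high:
            return name
    return None


def is_all_hiragana(text: str) -> bool:
    """True if text is non-empty and every character lies in the Hiragana block."""
    return bool(text) and all(kana_block(ch) == "Hiragana" for ch in text)


print(kana_block("あ"))           # Hiragana
print(kana_block("ア"))           # Katakana
print(is_all_hiragana("ひらがな"))  # True
```

Note that a block range also covers unassigned code points (U+3040, for example), so a stricter validator may want to check assignment as well.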

Code Point Examples

Hiragana   Code Point   Katakana   Code Point
あ (a)     U+3042       ア (a)     U+30A2
い (i)     U+3044       イ (i)     U+30A4
う (u)     U+3046       ウ (u)     U+30A6
え (e)     U+3048       エ (e)     U+30A8
お (o)     U+304A       オ (o)     U+30AA

UTF-8 Encoding

Both Hiragana and Katakana lie in the Basic Multilingual Plane (BMP), within the range U+0800–U+FFFF, so every character in these blocks takes 3 bytes in UTF-8:

  • あ (Hiragana A) → UTF-8: E3 81 82
  • ア (Katakana A) → UTF-8: E3 82 A2
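The byte sequences above can be reproduced by hand from the 3-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx). The sketch below is illustrative only; encode_3byte and utf8_hex are hypothetical helper names, and in practice Python's built-in str.encode does this work.

```python
def encode_3byte(cp: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes by hand."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point is not encodable as 3-byte UTF-8")
    return bytes([
        0xE0 | (cp >> 12),          # leading byte: 1110xxxx (top 4 bits)
        0x80 | ((cp >> 6) & 0x3F),  # continuation: 10xxxxxx (middle 6 bits)
        0x80 | (cp & 0x3F),         # continuation: 10xxxxxx (low 6 bits)
    ])


def utf8_hex(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data)


for ch in "あア":
    manual = encode_3byte(ord(ch))
    # The hand-rolled result should match the built-in encoder.
    assert manual == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch} -> {utf8_hex(manual)}")
# U+3042 あ -> E3 81 82
# U+30A2 ア -> E3 82 A2
```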

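Corresponding Hiragana and Katakana characters are exactly 0x60 (96 in decimal) apart: あ is U+3042 and ア is U+30A2. This consistent offset enables simple script conversion by adding or subtracting 0x60 from the code point. The correspondence holds for Hiragana U+3041–U+3096 and Katakana U+30A1–U+30F6. Katakana ヷ (U+30F7) through ヺ (U+30FA) have no Hiragana counterpart at the offset position, so conversion code should leave characters outside these ranges unchanged.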
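A minimal sketch of offset-based conversion follows. The function names are hypothetical, and the sketch assumes only the syllable ranges U+3041–U+3096 and U+30A1–U+30F6 should be shifted; everything else, including the prolonged sound mark ー (U+30FC) and the iteration marks, passes through untouched.

```python
KANA_OFFSET = 0x60  # distance between a Hiragana letter and its Katakana twin

# Ranges where the one-to-one correspondence holds (an assumption of this sketch).
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096
KATAKANA_FIRST, KATAKANA_LAST = 0x30A1, 0x30F6


def hiragana_to_katakana(text: str) -> str:
    """Shift Hiragana letters up by 0x60; leave other characters as they are."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST
        else ch
        for ch in text
    )


def katakana_to_hiragana(text: str) -> str:
    """Shift Katakana letters down by 0x60; leave other characters as they are."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_FIRST <= ord(ch) <= KATAKANA_LAST
        else ch
        for ch in text
    )


print(hiragana_to_katakana("ひらがな"))  # ヒラガナ
print(katakana_to_hiragana("カタカナ"))  # かたかな
```

Guarding the ranges explicitly keeps Kanji, Latin text, and punctuation in mixed strings from being shifted into unrelated code points.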

Voiced and Semi-Voiced Marks

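Japanese adds dakuten (゛) and handakuten (゜) to modify consonant sounds. In Unicode, pre-composed forms exist (e.g. が, U+304C, is canonically equivalent to か, U+304B, followed by the combining dakuten), but the combining marks U+3099 (dakuten) and U+309A (handakuten) can also be used on their own after a base character. The standalone spacing forms ゛ and ゜ are separate characters (U+309B and U+309C) and do not combine. Because the two representations look identical but compare as different strings, text should be normalized before comparison. The Unicode Inspector reveals whether a character is pre-composed or uses combining marks.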
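The difference between pre-composed and combining forms can be checked with Python's standard unicodedata module. This is a small sketch; uses_combining_marks is a hypothetical helper name.

```python
import unicodedata

precomposed = "\u304C"       # が as a single code point
decomposed = "\u304B\u3099"  # か followed by combining dakuten

# The two strings render alike but are not equal code point by code point.
print(precomposed == decomposed)  # False

# NFC composes base + mark; NFD splits the pre-composed form apart.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

COMBINING_KANA_MARKS = {"\u3099", "\u309A"}  # combining dakuten, handakuten


def uses_combining_marks(text: str) -> bool:
    """True if text contains a combining dakuten or handakuten."""
    return any(ch in COMBINING_KANA_MARKS for ch in text)


print(uses_combining_marks(decomposed))   # True
print(uses_combining_marks(precomposed))  # False
```

Normalizing both sides to NFC before comparing or indexing is a common way to avoid mismatches between the two forms.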

Halfwidth Katakana

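Legacy Japanese computing used halfwidth Katakana (U+FF65–U+FF9F, part of the Halfwidth and Fullwidth Forms block). These characters were single-byte in encodings such as Shift_JIS, but each occupies 3 bytes in UTF-8 despite being visually narrow. Modern systems prefer the standard Katakana block (U+30A0–U+30FF), but halfwidth variants still appear in legacy data and fixed-width displays.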
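One common way to fold halfwidth Katakana into the standard block is NFKC normalization, available through Python's unicodedata module. The sketch below assumes NFKC is acceptable for the data at hand (it also rewrites other compatibility characters, such as fullwidth Latin letters); has_halfwidth_katakana is a hypothetical helper name.

```python
import unicodedata

HALFWIDTH_FIRST, HALFWIDTH_LAST = 0xFF65, 0xFF9F  # range given in the text


def has_halfwidth_katakana(text: str) -> bool:
    """True if any character falls in the halfwidth Katakana range."""
    return any(HALFWIDTH_FIRST <= ord(ch) <= HALFWIDTH_LAST for ch in text)


halfwidth_a = "\uFF71"         # halfwidth ア
halfwidth_ga = "\uFF76\uFF9E"  # halfwidth カ + halfwidth voiced sound mark

# Narrow on screen, but still 3 bytes per character in UTF-8.
print(len(halfwidth_a.encode("utf-8")))  # 3

# NFKC maps halfwidth forms to standard Katakana and composes the voiced mark.
print(unicodedata.normalize("NFKC", halfwidth_a) == "\u30A2")   # True (ア)
print(unicodedata.normalize("NFKC", halfwidth_ga) == "\u30AC")  # True (ガ)
print(has_halfwidth_katakana(halfwidth_ga))                     # True
```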

Use Case

Use this when building Japanese text processing systems, implementing Hiragana-to-Katakana conversion, validating input fields that accept only specific Japanese scripts, or debugging encoding issues in Japanese text data.
