Hiragana and Katakana — Japanese Syllabaries in Unicode
Explore the Unicode encoding of Japanese Hiragana (U+3040–U+309F) and Katakana (U+30A0–U+30FF) — their code points, 3-byte UTF-8 representation, and usage patterns.
Detailed Explanation
Hiragana and Katakana in Unicode
Japanese uses two phonetic syllabary scripts alongside Kanji (CJK ideographs): Hiragana for native Japanese words and grammatical elements, and Katakana for foreign loanwords, emphasis, and technical terms.
Unicode Blocks
| Script | Range | Character Count |
|---|---|---|
| Hiragana | U+3040–U+309F | 93 characters |
| Katakana | U+30A0–U+30FF | 96 characters |
| Katakana Phonetic Extensions | U+31F0–U+31FF | 16 characters |
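The block ranges in the table can be turned into a small lookup, which is one way to approach script-specific input validation. This is a minimal sketch: the names `BLOCKS`, `kana_block`, and `is_all_hiragana` are illustrative and not part of any library. It tests block membership only, so the few unassigned code points inside a block (such as U+3040) would also match.

```python
# Block ranges taken from the table above (inclusive on both ends).
BLOCKS = {
    "Hiragana": (0x3040, 0x309F),
    "Katakana": (0x30A0, 0x30FF),
    "Katakana Phonetic Extensions": (0x31F0, 0x31FF),
}

def kana_block(ch: str):
    """Return the name of the kana block containing ch, or None."""
    cp = ord(ch)
    for name, (start, end) in BLOCKS.items():
        if start <= cp <= end:
            return name
    return None

def is_all_hiragana(text: str) -> bool:
    """True if text is non-empty and every character is in the Hiragana block."""
    return bool(text) and all(kana_block(ch) == "Hiragana" for ch in text)

for ch in "あアA":
    print(f"U+{ord(ch):04X} {ch}: {kana_block(ch)}")
```

A stricter validator would probably also decide how to treat the prolonged sound mark ー (U+30FC), which lives in the Katakana block but commonly appears in Hiragana text as well.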
Code Point Examples
| Hiragana | Code Point | Katakana | Code Point |
|---|---|---|---|
| あ (a) | U+3042 | ア (a) | U+30A2 |
| い (i) | U+3044 | イ (i) | U+30A4 |
| う (u) | U+3046 | ウ (u) | U+30A6 |
| え (e) | U+3048 | エ (e) | U+30A8 |
| お (o) | U+304A | オ (o) | U+30AA |
UTF-8 Encoding
Both Hiragana and Katakana characters fall in the Basic Multilingual Plane (BMP), within the U+0800–U+FFFF range that UTF-8 encodes as 3 bytes:
- あ (Hiragana A) → UTF-8: E3 81 82
- ア (Katakana A) → UTF-8: E3 82 A2
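The byte sequences above can be checked with Python's built-in `str.encode`. The helper name `utf8_hex` is an illustrative assumption, and the separator argument to `bytes.hex` requires Python 3.8 or later.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return text.encode("utf-8").hex(" ").upper()

for ch in ("あ", "ア"):
    byte_count = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} {ch} -> {utf8_hex(ch)} ({byte_count} bytes)")
```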
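For kana that exist in both scripts (Hiragana U+3041–U+3096 and Katakana U+30A1–U+30F6), the offset between corresponding characters is exactly 0x60 (96 in decimal). This consistent offset enables simple script conversion by adding or subtracting 0x60 from the code point. A few Katakana, such as ヷ (U+30F7), have no single-code-point Hiragana counterpart, so a converter should leave characters outside these ranges unchanged.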
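The offset-based conversion can be sketched as follows. The function names and constants are illustrative. The sketch assumes that only the letters U+3041–U+3096 (and their Katakana counterparts U+30A1–U+30F6) should be shifted, so everything else, including the prolonged sound mark ー, passes through unchanged.

```python
HIRAGANA_FIRST, HIRAGANA_LAST = 0x3041, 0x3096  # ぁ .. ゖ
KANA_OFFSET = 0x60                               # Katakana = Hiragana + 0x60

def hiragana_to_katakana(text: str) -> str:
    """Shift Hiragana letters up by 0x60; leave other characters untouched."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET)
        if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST
        else ch
        for ch in text
    )

def katakana_to_hiragana(text: str) -> str:
    """Shift Katakana letters down by 0x60; leave other characters untouched."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if HIRAGANA_FIRST + KANA_OFFSET <= ord(ch) <= HIRAGANA_LAST + KANA_OFFSET
        else ch
        for ch in text
    )

print(hiragana_to_katakana("ひらがな"))
print(katakana_to_hiragana("カタカナ"))
```

Restricting the shift to an explicit range avoids producing unassigned or unrelated code points when the input mixes kana with Kanji, Latin letters, or punctuation.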
Voiced and Semi-Voiced Marks
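Japanese adds dakuten (゛) and handakuten (゜) to modify consonant sounds. In Unicode, precomposed forms exist (e.g. が, U+304C), and the same character can also be written as a base kana followed by a combining mark (か U+304B + combining dakuten U+3099; the combining handakuten is U+309A). The two spellings look identical but compare as different strings unless they are normalized. The Unicode Inspector reveals whether a character is precomposed or uses combining marks.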
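The relationship between the precomposed and combining forms can be illustrated with Python's standard `unicodedata` module. This is a sketch; the helper `uses_combining_marks` is a hypothetical name introduced here.

```python
import unicodedata

precomposed = "\u304C"        # が as a single code point
decomposed = "\u304B\u3099"   # か followed by the combining dakuten

# The two spellings are different strings until normalized.
print(precomposed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

def uses_combining_marks(text: str) -> bool:
    """True if text contains a combining dakuten or handakuten."""
    return any(ch in "\u3099\u309A" for ch in text)
```

Normalizing to NFC before comparing or storing text is a common way to keep the two spellings from being treated as distinct values.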
Halfwidth Katakana
Legacy Japanese computing used halfwidth Katakana (U+FF65–U+FF9F), which occupy 3 bytes in UTF-8 despite being visually narrow. Modern systems prefer the standard full-width Katakana (U+30A0–U+30FF), but halfwidth variants still appear in legacy data and fixed-width displays.
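One way to fold halfwidth Katakana into the standard forms is NFKC normalization from Python's `unicodedata` module, sketched below. The function name `to_standard_katakana` is illustrative. Note that NFKC also rewrites other compatibility characters (for example, fullwidth Latin letters become ASCII), so it may be broader than a given application wants.

```python
import unicodedata

def to_standard_katakana(text: str) -> str:
    """Apply NFKC, which maps halfwidth Katakana to the standard Katakana block."""
    return unicodedata.normalize("NFKC", text)

halfwidth = "\uFF76\uFF80\uFF76\uFF85"   # halfwidth spelling of カタカナ
voiced = "\uFF76\uFF9E"                  # halfwidth カ + halfwidth voiced sound mark

print(to_standard_katakana(halfwidth))
print(to_standard_katakana(voiced))      # composes to the single code point ガ
print(len("\uFF76".encode("utf-8")))     # halfwidth kana still take 3 bytes
```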
Use Case
Use this when building Japanese text processing systems, implementing Hiragana-to-Katakana conversion, validating input fields that accept only specific Japanese scripts, or debugging encoding issues in Japanese text data.