Question 1

Japanese Kana and Unicode Normalization

Accepted Answer

## Japanese Kana and Normalization

Japanese text involves several Unicode normalization scenarios, primarily around voiced/semi-voiced marks (dakuten/handakuten) and halfwidth/fullwidth forms.

### Dakuten and Handakuten

Syllables carrying a voiced (゙ dakuten) or semi-voiced (゚ handakuten) mark can be encoded in two ways:
- Precomposed: が (ga = U+304C, single code point)
- Decomposed: か + ゙ (ka + combining dakuten, two code points)
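The following is a minimal sketch in Python that uses only the standard-library `unicodedata` module. The variable names are illustrative. It shows that the two encodings of が are different strings until both sides are normalized to the same form.

```python
import unicodedata

# Precomposed: HIRAGANA LETTER GA (U+304C), one code point
precomposed = "\u304C"
# Decomposed: HIRAGANA LETTER KA (U+304B)
# + COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (U+3099)
decomposed = "\u304B\u3099"

print(len(precomposed), len(decomposed))  # 1 2
print(precomposed == decomposed)           # False: different code point sequences

# Once both are normalized to the same form, they compare equal
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```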

| Character | NFC | NFD |
|-----------|-----|-----|
| が (ga) | U+304C | U+304B + U+3099 |
| だ (da) | U+3060 | U+305F + U+3099 |
| ぱ (pa) | U+3071 | U+306F + U+309A |
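Table rows like these can be regenerated from the Unicode Character Database that ships with Python, which makes it easy to check them. The sketch below assumes Python 3's `unicodedata` module. `fmt` is a hypothetical formatting helper, not a library function.

```python
import unicodedata


def fmt(s: str) -> str:
    # Render each code point as U+XXXX, joined with " + "
    return " + ".join(f"U+{ord(c):04X}" for c in s)


for ch in ("\u304C", "\u3060", "\u3071"):  # が, だ, ぱ
    nfc = unicodedata.normalize("NFC", ch)
    nfd = unicodedata.normalize("NFD", ch)
    # decomposition() returns the raw canonical mapping from the UCD, e.g. '304B 3099'
    print(unicodedata.name(ch), "|", fmt(nfc), "|", fmt(nfd), "|", unicodedata.decomposition(ch))
```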

Question 2

When is this useful?

Accepted Answer

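Normalization is critical for Japanese text processing. This includes search engines serving Japanese users, e-commerce platforms that handle Japanese product names, and any system that ingests Japanese text from legacy systems, which may contain halfwidth Katakana. It is also important for normalizing OCR output. In each case, the same visible text can arrive as different code point sequences (precomposed or decomposed, halfwidth or fullwidth). Comparison, indexing, and deduplication treat these as different strings unless both sides are normalized first.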

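NFKC (compatibility normalization) also converts halfwidth Katakana, including a separate halfwidth voiced mark, into the standard fullwidth forms:

| Halfwidth | Fullwidth | NFKC result |
|-----------|-----------|-------------|
| ｶ (ka) | カ | カ |
| ｶﾞ (ga) | ガ | ガ |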
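The following is a minimal Python sketch using the standard-library `unicodedata` module. It shows that NFC leaves halfwidth Katakana unchanged, while NFKC folds it into the fullwidth forms in the table above. `search_key` is a hypothetical helper, not part of any library. It assumes that NFKC is the matching policy you want for search.

```python
import unicodedata

halfwidth_ka = "\uFF76"        # ｶ HALFWIDTH KATAKANA LETTER KA
halfwidth_ga = "\uFF76\uFF9E"  # ｶﾞ halfwidth KA + HALFWIDTH KATAKANA VOICED SOUND MARK

# NFC applies only canonical mappings, so halfwidth forms survive unchanged
print(unicodedata.normalize("NFC", halfwidth_ga) == halfwidth_ga)  # True

# NFKC applies compatibility mappings and then recomposes:
# U+FF76 -> U+30AB (カ); U+FF76 U+FF9E -> U+30AB U+3099 -> U+30AC (ガ)
print(unicodedata.normalize("NFKC", halfwidth_ka))  # カ
print(unicodedata.normalize("NFKC", halfwidth_ga))  # ガ


def search_key(text: str) -> str:
    """Hypothetical helper: normalize text before indexing or comparing it."""
    return unicodedata.normalize("NFKC", text)


# A fullwidth query matches a product name stored in halfwidth Katakana
stored = "\uFF76\uFF9E\uFF91"  # ｶﾞﾑ (halfwidth)
query = "\u30AC\u30E0"         # ガム (fullwidth)
print(search_key(stored) == search_key(query))  # True
```

NFKC also rewrites other compatibility characters, such as fullwidth Latin letters and circled digits. For that reason, it is usually applied to search keys or comparison copies rather than to the stored original text.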
