Korean Hangul Unicode Normalization

Understand how Unicode normalization works with Korean Hangul syllables. Learn about Jamo decomposition, Hangul Syllable composition, and the algorithmic decomposition/composition process.


Detailed Explanation

Hangul and Unicode Normalization

Korean Hangul has a unique relationship with Unicode normalization because its composition and decomposition are defined algorithmically rather than through lookup tables.

Hangul Syllable Structure

A Hangul syllable consists of:

  • Leading consonant (Choseong): e.g., ᄀ (HANGUL CHOSEONG KIYEOK, ㄱ)
  • Vowel (Jungseong): e.g., ᅡ (HANGUL JUNGSEONG A, ㅏ)
  • Optional trailing consonant (Jongseong): e.g., ᆨ (HANGUL JONGSEONG KIYEOK)
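As a quick check of the three positions, the sketch below looks up the Unicode names of the conjoining jamo listed above with Python's standard `unicodedata` module. The variable names are illustrative only.

```python
import unicodedata

# The three conjoining jamo from the list above: leading, vowel, trailing.
jamo = ["\u1100", "\u1161", "\u11A8"]

names = [unicodedata.name(ch) for ch in jamo]
for ch, name in zip(jamo, names):
    print(f"U+{ord(ch):04X} {name}")
```

The position (CHOSEONG, JUNGSEONG, JONGSEONG) is part of each character's name, which is why the leading and trailing forms of the same consonant have different code points.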

Precomposed Syllable Blocks

Unicode defines 11,172 precomposed Hangul syllable blocks (U+AC00 to U+D7A3). The syllable 가 (ᄀ + ᅡ) is U+AC00 (HANGUL SYLLABLE GA).
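The figure 11,172 can be checked with a little arithmetic. This is a minimal sketch; the counts 19, 21, and 28 are the leading-consonant, vowel, and trailing-consonant counts used in the composition formula, and the names `S_BASE` and `S_LAST` are chosen here for illustration.

```python
import unicodedata

S_BASE = 0xAC00  # first precomposed syllable
S_LAST = 0xD7A3  # last precomposed syllable

block_size = S_LAST - S_BASE + 1
# leading consonants x vowels x (27 trailing consonants + "none")
combinations = 19 * 21 * 28

print(block_size, combinations)
print(unicodedata.name(chr(S_BASE)))
```

Both numbers come out the same, which means the range holds exactly one code point per possible combination.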

NFC vs NFD for Hangul

Form    Result for 가    Code Points
NFC     가               U+AC00 (1 code point)
NFD     가               U+1100 + U+1161 (2 code points)
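The two forms can be inspected with Python's standard `unicodedata.normalize`. The helper `code_points` is a hypothetical convenience function written for this sketch, not part of any library.

```python
import unicodedata

def code_points(text):
    """Format a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(ch):04X}" for ch in text]

ga = "\uAC00"  # precomposed syllable GA

nfd = unicodedata.normalize("NFD", ga)   # decompose into jamo
nfc = unicodedata.normalize("NFC", nfd)  # compose back into one syllable

print(code_points(nfd))  # ['U+1100', 'U+1161']
print(code_points(nfc))  # ['U+AC00']
```

Both strings render identically on screen, so the difference is only visible when the code points are printed.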

Algorithmic Composition

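Unlike Latin characters, where composition is table-based, Hangul syllables are composed with an arithmetic formula. Here L, V, and T are the code points of the leading consonant, vowel, and trailing consonant. For a syllable with no trailing consonant, T is taken to be TBase so that its term is 0, which is why TCount is 28 (27 trailing consonants plus the empty case):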

SBase = 0xAC00
LBase = 0x1100, VBase = 0x1161, TBase = 0x11A7
LCount = 19, VCount = 21, TCount = 28
NCount = VCount * TCount = 588

syllableIndex = (L - LBase) * NCount + (V - VBase) * TCount + (T - TBase)
composedCodePoint = SBase + syllableIndex
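The formula above can be sketched in Python as follows, together with its inverse. The function names `compose` and `decompose` are illustrative, and the sketch assumes modern jamo only (the ranges covered by LCount, VCount, and TCount); anything else raises `ValueError`.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 syllables in total

def compose(l, v, t=T_BASE):
    """Combine jamo code points into one syllable code point.

    The default t=T_BASE means "no trailing consonant".
    """
    l_index, v_index, t_index = l - L_BASE, v - V_BASE, t - T_BASE
    if not (0 <= l_index < L_COUNT
            and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("not a composable modern jamo sequence")
    return S_BASE + l_index * N_COUNT + v_index * T_COUNT + t_index

def decompose(s):
    """Split a syllable code point into its jamo code points."""
    s_index = s - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t_index = s_index % T_COUNT
    # A trailing index of 0 means the syllable has no trailing consonant.
    return [l, v] + ([T_BASE + t_index] if t_index else [])

print(hex(compose(0x1100, 0x1161)))          # 0xac00
print(hex(compose(0x1112, 0x1161, 0x11AB)))  # 0xd55c
print([hex(cp) for cp in decompose(0xD55C)])
```

In practice a normalization library does this for you; the sketch is only meant to show that no per-character table is needed.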

Practical Impact

Most Korean text is already in NFC form (precomposed syllable blocks). However, text from certain input methods or text processing systems may use decomposed Jamo. Without normalization, string comparison and search of Korean text can fail silently.
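A minimal illustration of the silent failure, assuming both strings are meant to represent the same word. The helper `nfc` is a hypothetical shorthand defined for this sketch.

```python
import unicodedata

def nfc(text):
    """Shorthand for NFC normalization."""
    return unicodedata.normalize("NFC", text)

precomposed = "\uD55C\uAE00"  # two syllable blocks
decomposed = unicodedata.normalize("NFD", precomposed)  # same text as six jamo

# Without normalization, equality and substring search both miss.
print(precomposed == decomposed)  # False
print("\uD55C" in decomposed)     # False

# Normalizing both sides to the same form restores the match.
print(nfc(precomposed) == nfc(decomposed))  # True
print("\uD55C" in nfc(decomposed))          # True
```

No exception is raised in the failing cases; the comparison simply returns False, which is why normalizing at input boundaries is the usual defense.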

NFKC and Hangul Compatibility Jamo

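Unicode also encodes Hangul Compatibility Jamo (block U+3130–U+318F; the modern letters occupy U+3131–U+3163), which are separate from the conjoining Jamo used in composition. NFC and NFD leave compatibility Jamo unchanged. NFKC and NFKD replace them with their conjoining counterparts, and under NFKC the resulting Jamo can then combine into precomposed syllables.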
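The difference between the canonical and compatibility forms can be observed with `unicodedata.normalize`. This is a small sketch with illustrative variable names; the input is the compatibility consonant U+3131 followed by the compatibility vowel U+314F.

```python
import unicodedata

compat = "\u3131\u314F"  # compatibility jamo: consonant, then vowel

nfc_result = unicodedata.normalize("NFC", compat)     # canonical form only
nfkc_single = unicodedata.normalize("NFKC", "\u3131")  # one letter alone
nfkc_pair = unicodedata.normalize("NFKC", compat)      # consonant + vowel

print([f"U+{ord(ch):04X}" for ch in nfc_result])   # ['U+3131', 'U+314F']
print([f"U+{ord(ch):04X}" for ch in nfkc_single])  # ['U+1100']
print([f"U+{ord(ch):04X}" for ch in nfkc_pair])    # ['U+AC00']
```

Note the last line: once the compatibility letters are mapped to conjoining jamo, the composition step merges an adjacent leading consonant and vowel into a syllable block. That may or may not be what you want for text where isolated letters are intentional.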

Use Case

Essential for developers building Korean-language applications, search engines indexing Korean content, and any system processing Korean text from mixed sources (web forms, OCR, file systems). Filenames from macOS are a common example: HFS+ stores names in a decomposed form close to NFD, so Korean filenames often arrive as Jamo sequences and must be normalized before they are compared with standard NFC text.
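One way to handle names from mixed sources is to build a lookup keyed by the NFC form. The sketch below uses a simulated directory listing rather than a real file system, and `index_by_nfc` is a hypothetical helper, not a library function.

```python
import unicodedata

def index_by_nfc(names):
    """Map each NFC-normalized name to the original name as stored."""
    return {unicodedata.normalize("NFC", name): name for name in names}

# Simulated listing: one decomposed name, one precomposed name.
listing = [
    unicodedata.normalize("NFD", "\uD55C\uAE00.txt"),
    "\uAC00.txt",
]
index = index_by_nfc(listing)

wanted = "\uD55C\uAE00.txt"  # NFC name, e.g. typed into a web form
print(wanted in listing)  # False
print(wanted in index)    # True
```

Keeping the original name as the dictionary value lets the caller open the file with exactly the bytes the file system reported, while matching on the normalized key.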
