Hangul Syllables — Korean Characters in Unicode
Understand Korean Hangul syllables in Unicode (U+AC00–U+D7AF) — their algorithmic decomposition, 3-byte UTF-8 encoding, and the Jamo composition system.
Detailed Explanation
Hangul Syllables in Unicode
Korean Hangul is unusual among writing systems: its syllable blocks are composed algorithmically from individual letter components called Jamo. Unicode reserves the Hangul Syllables block (U+AC00 to U+D7AF) for them and assigns 11,172 pre-composed syllables (U+AC00 to U+D7A3, that is 19 × 21 × 28), one for every combination of modern Jamo.
Hangul Composition Formula
Each syllable block consists of:
- Leading consonant (L): 19 possible values (U+1100–U+1112)
- Vowel (V): 21 possible values (U+1161–U+1175)
- Trailing consonant (T): 28 possible values (none, or one of the 27 consonants U+11A8–U+11C2)
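With L, V, and T taken as zero-based indices into the lists above (T = 0 means no trailing consonant), the code point is calculated as:
CP = 0xAC00 + (L × 21 + V) × 28 + T
For example, 한 (han) combines ㅎ (L = 18), ㅏ (V = 0), and ㄴ (T = 4): 0xAC00 + (18 × 21 + 0) × 28 + 4 = 0xAC00 + 10,588 = U+D55C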
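The formula can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `compose` and `decompose` are hypothetical helpers, and the arguments are assumed to be zero-based Jamo indices rather than Jamo code points.

```python
S_BASE = 0xAC00                          # first pre-composed syllable, 가
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # leading, vowel, trailing (incl. none)


def compose(l: int, v: int, t: int = 0) -> str:
    """Return the syllable for zero-based indices l, v, t (t=0: no trailing consonant)."""
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and 0 <= t < T_COUNT):
        raise ValueError("Jamo index out of range")
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Invert compose(): recover (l, v, t) from one pre-composed syllable."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a pre-composed Hangul syllable")
    lv, t = divmod(offset, T_COUNT)   # strip the trailing-consonant index
    l, v = divmod(lv, V_COUNT)        # split what remains into leading and vowel
    return l, v, t


print(compose(18, 0, 4))             # 한
print(hex(ord(compose(18, 0, 4))))   # 0xd55c
print(decompose("글"))               # (0, 18, 8)
```

Because the mapping is pure arithmetic, decomposition needs no lookup table: two integer divisions recover all three indices.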
UTF-8 Encoding
All pre-composed Hangul syllables use 3 bytes in UTF-8:
- 가 (ga) = U+AC00 → UTF-8: EA B0 80
- 한 (han) = U+D55C → UTF-8: ED 95 9C
- 글 (geul) = U+AE00 → UTF-8: EA B8 80
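The byte values above can be reproduced with a short Python sketch. The syllables need 3 bytes because they all fall in the range U+0800–U+FFFF, which UTF-8 encodes with the bit pattern 1110xxxx 10xxxxxx 10xxxxxx. The helper names `utf8_hex` and `encode_3byte` are hypothetical, and `encode_3byte` is only meant for code points in that 3-byte range.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated uppercase hex."""
    return ch.encode("utf-8").hex(" ").upper()


def encode_3byte(cp: int) -> bytes:
    """Hand-rolled UTF-8 for a code point in U+0800..U+FFFF (illustration only)."""
    return bytes([
        0xE0 | (cp >> 12),           # 1110xxxx: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),   # 10xxxxxx: middle 6 bits
        0x80 | (cp & 0x3F),          # 10xxxxxx: low 6 bits
    ])


for ch in "가한글":
    print(f"{ch} U+{ord(ch):04X} -> {utf8_hex(ch)}")
# 가 U+AC00 -> EA B0 80
# 한 U+D55C -> ED 95 9C
# 글 U+AE00 -> EA B8 80
```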
Jamo vs. Pre-composed
Unicode supports both individual Jamo characters (Hangul Jamo block U+1100–U+11FF) and pre-composed syllables. Most modern Korean text uses pre-composed forms for efficiency. The Hangul Compatibility Jamo block (U+3130–U+318F) provides standalone forms used in dictionaries and educational materials.
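When debugging mixed input, it can help to check which of these ranges a character falls in. The sketch below is one way to do that; `hangul_block` is a hypothetical helper, and it tests only the three ranges named in this article.

```python
def hangul_block(ch: str) -> str:
    """Classify one character by the Hangul ranges discussed above."""
    cp = ord(ch)
    if 0xAC00 <= cp <= 0xD7A3:
        return "pre-composed syllable"
    if 0x1100 <= cp <= 0x11FF:
        return "Hangul Jamo"
    if 0x3130 <= cp <= 0x318F:
        return "Hangul Compatibility Jamo"
    return "other"


print(hangul_block("한"))        # pre-composed syllable
print(hangul_block("\u1112"))    # Hangul Jamo (conjoining leading ㅎ)
print(hangul_block("\u314E"))    # Hangul Compatibility Jamo (standalone ㅎ)
```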
Normalization Considerations
Unicode normalization forms NFC and NFD produce different representations for Hangul:
- NFC: Uses pre-composed syllable (1 code point, 3 UTF-8 bytes)
- NFD: Decomposes into 2–3 Jamo (2–3 code points, 6–9 UTF-8 bytes)
The Unicode Inspector shows the actual code points in your text, helping you determine which normalization form is being used.
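The same difference can be checked programmatically. The following sketch uses Python's standard `unicodedata` module; the helper `code_points` is a hypothetical name introduced for illustration.

```python
import unicodedata


def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


text = "한글"
nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

print(code_points(nfc))   # ['U+D55C', 'U+AE00']
print(code_points(nfd))   # ['U+1112', 'U+1161', 'U+11AB', 'U+1100', 'U+1173', 'U+11AF']
print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))   # 6 18

# The two forms render identically but compare unequal until normalized.
print(nfc == nfd)                                  # False
print(unicodedata.normalize("NFC", nfd) == nfc)    # True
```

Normalizing both sides to the same form before comparing or indexing avoids mismatches between strings that look identical on screen.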
Use Case
Use this when working with Korean text processing, implementing Hangul search or sorting algorithms, debugging normalization issues (NFC vs NFD), or calculating storage requirements for Korean content databases.