Fullwidth and Halfwidth Forms in Unicode
Understand Unicode fullwidth and halfwidth character forms — why they exist, their code points (U+FF00–U+FFEF), 3-byte UTF-8 encoding, and CJK compatibility uses.
Fullwidth and Halfwidth Forms
The Halfwidth and Fullwidth Forms block (U+FF00–U+FFEF) contains alternative-width versions of characters from other blocks. These exist primarily for compatibility with East Asian computing, where characters traditionally occupy either a "full" or "half" cell width in monospaced layouts.
Fullwidth Characters
Fullwidth forms are twice the visual width of their standard counterparts, matching the width of CJK ideographs:
| Fullwidth | Code Point | Standard | Code Point | UTF-8 Bytes (fullwidth vs. standard) |
|---|---|---|---|---|
| Ａ (A) | U+FF21 | A | U+0041 | EF BC A1 vs. 41 |
| ０ (0) | U+FF10 | 0 | U+0030 | EF BC 90 vs. 30 |
| ！ (!) | U+FF01 | ! | U+0021 | EF BC 81 vs. 21 |
| ＠ (@) | U+FF20 | @ | U+0040 | EF BC A0 vs. 40 |
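The fullwidth rows in the table follow a fixed offset. U+FF01–U+FF5E mirror printable ASCII U+0021–U+007E, shifted by 0xFEE0 (for example, 0xFF21 - 0x41 = 0xFEE0). The sketch below assumes that only this range and the space character need converting. `toFullwidth`, `toHalfwidth`, and `utf8Hex` are hypothetical helpers. Mapping the ASCII space to U+3000 IDEOGRAPHIC SPACE, which lies outside this block, is a design choice.

```javascript
// Offset between printable ASCII (U+0021-U+007E) and the fullwidth
// ASCII variants (U+FF01-U+FF5E).
const FULLWIDTH_OFFSET = 0xfee0;

// Convert printable ASCII to fullwidth; ASCII space becomes U+3000.
function toFullwidth(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0x21 && cp <= 0x7e) {
      out += String.fromCodePoint(cp + FULLWIDTH_OFFSET);
    } else if (cp === 0x20) {
      out += "\u3000"; // ideographic space
    } else {
      out += ch; // leave all other characters untouched
    }
  }
  return out;
}

// Reverse mapping: fullwidth ASCII variants (and U+3000) back to ASCII.
function toHalfwidth(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xff01 && cp <= 0xff5e) {
      out += String.fromCodePoint(cp - FULLWIDTH_OFFSET);
    } else if (cp === 0x3000) {
      out += " ";
    } else {
      out += ch;
    }
  }
  return out;
}

// Uppercase hex of a string's UTF-8 bytes, to check the table's last column.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}
```

For example, `toFullwidth("A0!@")` returns "Ａ０！＠", and `utf8Hex("Ａ")` returns "EF BC A1", which matches the first table row. Unlike NFKC normalization, this mapping touches only the ASCII range. That can matter when other compatibility characters must be preserved.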
Halfwidth Characters
Halfwidth forms are narrower versions of normally wide characters:
| Halfwidth | Code Point | Standard | Code Point |
|---|---|---|---|
| ｶ (Katakana Ka) | U+FF76 | カ | U+30AB |
| ﾈ (Katakana Ne) | U+FF88 | ネ | U+30CD |
| ｡ (Ideographic Full Stop) | U+FF61 | 。 | U+3002 |
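The block holds both kinds of variants in fixed sub-ranges, following the headings of the Unicode code chart for U+FF00–U+FFEF. A code point's range therefore tells you which kind of form it is. This is a minimal sketch: `classifyWidthForm` is a hypothetical helper, and it does not filter out unassigned code points inside the ranges.

```javascript
// Sub-ranges of the Halfwidth and Fullwidth Forms block, per the code chart.
const WIDTH_FORM_RANGES = [
  [0xff01, 0xff5e, "fullwidth ASCII variant"],
  [0xff5f, 0xff60, "fullwidth bracket"],
  [0xff61, 0xff64, "halfwidth CJK punctuation"],
  [0xff65, 0xff9f, "halfwidth Katakana variant"],
  [0xffa0, 0xffdc, "halfwidth Hangul variant"],
  [0xffe0, 0xffe6, "fullwidth symbol variant"],
  [0xffe8, 0xffee, "halfwidth symbol variant"],
];

// Return the sub-range label for the first code point of `ch`,
// or null if it is not a width variant from this block.
function classifyWidthForm(ch) {
  const cp = ch.codePointAt(0);
  for (const [lo, hi, label] of WIDTH_FORM_RANGES) {
    if (cp >= lo && cp <= hi) return label;
  }
  return null;
}
```

Applied to U+FFE5 (FULLWIDTH YEN SIGN), the sketch reports a fullwidth symbol variant. The fullwidth currency and symbol signs at U+FFE0–U+FFE6 sit in the same block as the halfwidth forms.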
Why They Exist
In early Japanese computing, text displays used a fixed grid. A CJK character occupied two cells (fullwidth) and an ASCII character occupied one (halfwidth). Legacy character sets encoded both widths: JIS X 0208 included fullwidth Latin letters and digits, and JIS X 0201 included halfwidth Katakana. Unicode gave these variants their own code points so that text could round-trip through those encodings. While modern systems handle variable-width text natively, these legacy characters persist in:
- Japanese data entry: Some systems still use fullwidth numbers and letters
- Financial systems: Fullwidth numbers are common in Japanese banking
- Legacy file formats: Older database systems may store fullwidth data
- Form validation: Some Japanese websites require fullwidth input
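The two-cell model still governs terminals and monospaced editors. There, the East Asian Width property (Unicode Standard Annex #11) decides whether a character takes one column or two. The sketch below estimates display width for a hand-picked subset of ranges only, and `columnWidth` and `WIDE_RANGES` are hypothetical. A real implementation would use the full East Asian Width data and handle combining and control characters separately.

```javascript
// Hand-picked ranges rendered two columns wide (East Asian Width F or W).
// Assumption: a small illustrative subset, not the full UAX #11 table.
const WIDE_RANGES = [
  [0x3000, 0x3000], // ideographic space
  [0x3041, 0x3096], // Hiragana letters
  [0x30a1, 0x30fa], // Katakana letters
  [0x4e00, 0x9fff], // CJK Unified Ideographs
  [0xac00, 0xd7a3], // Hangul syllables
  [0xff01, 0xff60], // fullwidth ASCII variants and brackets
  [0xffe0, 0xffe6], // fullwidth symbol variants
];

// Estimate monospaced display width: 2 columns for wide characters,
// 1 for everything else (including ASCII and halfwidth Katakana).
// Control and combining characters are not special-cased here.
function columnWidth(text) {
  let width = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    const wide = WIDE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
    width += wide ? 2 : 1;
  }
  return width;
}
```

With this sketch, "ＡＢＣ" measures 6 columns and standard "カネ" measures 4. The halfwidth "ｶﾈ" measures 2, the same as two ASCII characters, which is the alignment the legacy grid relied on.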
Encoding Impact
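Every character in the Halfwidth and Fullwidth Forms block takes 3 bytes in UTF-8, compared to 1 byte for an ASCII original. A string of fullwidth Latin characters therefore needs 3x the storage of its standard equivalent. Halfwidth Katakana save nothing in UTF-8, because U+FF76 and U+30AB both encode to 3 bytes. In UTF-16, every one of these characters is a single 2-byte code unit, so the size penalty is specific to UTF-8. The Unicode Inspector shows this difference, which helps you spot unnecessary fullwidth usage that inflates data size.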
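To check these byte counts in code, you can use the standard `TextEncoder` API (available in browsers and Node.js), which returns a string's UTF-8 bytes. This is a minimal sketch, and `utf8Length` and the sample strings are illustrative.

```javascript
// UTF-8 size of a string, measured with the standard TextEncoder API.
function utf8Length(text) {
  return new TextEncoder().encode(text).length;
}

// Samples written with escapes so the widths are unambiguous.
const samples = {
  ascii: "HELLO123",
  fullwidth: "\uFF28\uFF25\uFF2C\uFF2C\uFF2F\uFF11\uFF12\uFF13", // fullwidth HELLO123
  katakana: "\u30AB\u30CD", // standard Ka Ne
  halfwidthKatakana: "\uFF76\uFF88", // halfwidth Ka Ne
};

for (const [name, text] of Object.entries(samples)) {
  // String.prototype.length counts UTF-16 code units; every sample
  // character here is a single code unit.
  console.log(`${name}: ${text.length} UTF-16 units, ${utf8Length(text)} UTF-8 bytes`);
}
```

The fullwidth sample reports 24 UTF-8 bytes against 8 for the ASCII one. Both Katakana samples report 6.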
Normalization and Conversion
Unicode's NFKC and NFKD normalization forms map both fullwidth and halfwidth forms to their standard equivalents:
```javascript
"\uFF21".normalize("NFKC") === "A"; // true
```
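This is critical for search indexing and data deduplication in multilingual systems. Note that NFKC folds more than width. Ligatures such as ﬁ (U+FB01) become "fi", and superscripts such as ² (U+00B2) become "2". It therefore suits search keys better than text that must be stored verbatim.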
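The one-line check for U+FF21 extends to the other width variants. The sketch below lists a few input/expected pairs and confirms them with the built-in `String.prototype.normalize`. It also shows the practical difference between the two forms: NFKD leaves a halfwidth voiced sound mark as a separate combining character, while NFKC recomposes it. `widthExamples` and `failingExamples` are illustrative names.

```javascript
// Pairs of [input, expected NFKC output] covering both width directions.
const widthExamples = [
  ["\uFF21\uFF10\uFF01", "A0!"], // fullwidth ASCII variants -> ASCII
  ["\u3000", " "], // ideographic space -> ASCII space
  ["\uFF76\uFF88", "\u30AB\u30CD"], // halfwidth Katakana -> standard Katakana
  ["\uFF76\uFF9E", "\u30AC"], // halfwidth Ka + halfwidth voiced mark -> Ga
  ["\uFFE5", "\u00A5"], // fullwidth yen sign -> yen sign
];

// Return the pairs whose NFKC result differs from the expectation.
function failingExamples(pairs) {
  return pairs.filter(([input, expected]) => input.normalize("NFKC") !== expected);
}

console.log(failingExamples(widthExamples)); // []

// NFKD keeps the voiced mark as combining U+3099; NFKC composes it.
const ga = "\uFF76\uFF9E";
console.log(ga.normalize("NFKD") === "\u30AB\u3099"); // true
console.log(ga.normalize("NFKC") === "\u30AC"); // true
```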
Use Case
Use this when normalizing Japanese user input that contains fullwidth Latin characters, debugging data imports from legacy CJK systems, implementing search that treats fullwidth and standard characters as equivalent, or calculating accurate storage requirements for mixed-width text.
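For the search use case, here is a minimal sketch of width-insensitive matching. It assumes that NFKC's broader compatibility folding is acceptable for search keys. `searchKey` and `matches` are hypothetical helpers, and `records` stands in for data from a legacy import.

```javascript
// Fold text for comparison: NFKC maps width variants (and U+3000) to
// standard forms, then toLowerCase removes case differences.
function searchKey(text) {
  return text.normalize("NFKC").toLowerCase();
}

// Width- and case-insensitive substring test.
function matches(haystack, query) {
  return searchKey(haystack).includes(searchKey(query));
}

// Hypothetical records from a legacy import with mixed widths.
const records = [
  "\uFF21\uFF22\uFF23\u3000\uFF11\uFF12\uFF13", // fullwidth "ABC 123"
  "abc 456",
  "\uFF76\uFF88", // halfwidth Ka Ne
];

console.log(records.filter((r) => matches(r, "abc 123")).length); // 1
```

Only the first record matches, because NFKC turns its fullwidth letters, digits, and ideographic space into "ABC 123" before lowercasing. In a real index you would typically compute the folded key once at indexing time rather than on every query.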