Fullwidth and Halfwidth Forms in Unicode

Understand Unicode fullwidth and halfwidth character forms — why they exist, their code points (U+FF00–U+FFEF), 3-byte UTF-8 encoding, and CJK compatibility uses.


Detailed Explanation

Fullwidth and Halfwidth Forms

The Halfwidth and Fullwidth Forms block (U+FF00–U+FFEF) contains alternative-width versions of characters from other blocks. These exist primarily for compatibility with East Asian computing, where characters traditionally occupy either a "full" or "half" cell width in monospaced layouts.

Fullwidth Characters

Fullwidth forms are twice the visual width of their standard counterparts, matching the width of CJK ideographs:

Fullwidth | Code Point | Standard | Code Point | UTF-8 Bytes
Ａ (A) | U+FF21 | A | U+0041 | EF BC A1 vs. 41
０ (0) | U+FF10 | 0 | U+0030 | EF BC 90 vs. 30
！ (!) | U+FF01 | ! | U+0021 | EF BC 81 vs. 21
＠ (@) | U+FF20 | @ | U+0040 | EF BC A0 vs. 40
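The code points in the table differ from their ASCII counterparts by a constant 0xFEE0, which holds for the whole fullwidth ASCII range U+FF01–U+FF5E. A minimal sketch of converting in both directions using that offset; the helper names toAscii and toFullwidth are hypothetical and not part of any library:

```javascript
// Hypothetical helpers, shown only to illustrate the constant offset.
const FULLWIDTH_OFFSET = 0xfee0; // U+FF01 minus U+0021

// Map fullwidth ASCII variants (U+FF01–U+FF5E) back to ASCII.
function toAscii(text) {
  return text.replace(/[\uFF01-\uFF5E]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) - FULLWIDTH_OFFSET)
  );
}

// Map printable ASCII (U+0021–U+007E) to its fullwidth variant.
function toFullwidth(text) {
  return text.replace(/[\u0021-\u007E]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + FULLWIDTH_OFFSET)
  );
}

console.log(toAscii("\uFF21\uFF10\uFF01\uFF20")); // "A0!@"
console.log(toFullwidth("A0!@") === "\uFF21\uFF10\uFF01\uFF20"); // true
```

Characters outside the two ranges pass through unchanged, so the functions are safe to apply to mixed Japanese and Latin text.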

Halfwidth Characters

Halfwidth forms are narrower versions of normally wide characters:

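Halfwidth | Code Point | Standard | Code Point
ｶ (Katakana Ka) | U+FF76 | カ | U+30AB
ﾈ (Katakana Ne) | U+FF88 | ネ | U+30CD

Note that ￥ (Yen, U+FFE5) is a fullwidth form, not a halfwidth one: it is the wide counterpart of ¥ (U+00A5).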
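Whether a character in this block is a fullwidth or a halfwidth form follows from its code point range. A rough sketch of that classification, assuming a hypothetical helper named widthForm; the ranges are those of the block itself, and unassigned gaps inside them are not filtered out:

```javascript
// Hypothetical helper: classify a character from U+FF00–U+FFEF by range.
// Unassigned code points inside a range are not excluded in this sketch.
function widthForm(ch) {
  const cp = ch.codePointAt(0);
  // Fullwidth ASCII variants and brackets, then fullwidth symbols.
  if ((cp >= 0xff01 && cp <= 0xff60) || (cp >= 0xffe0 && cp <= 0xffe6)) {
    return "fullwidth";
  }
  // Halfwidth CJK punctuation, Katakana and Hangul, then halfwidth symbols.
  if ((cp >= 0xff61 && cp <= 0xffdc) || (cp >= 0xffe8 && cp <= 0xffee)) {
    return "halfwidth";
  }
  return "other";
}

console.log(widthForm("\uFF76")); // "halfwidth" (Katakana Ka)
console.log(widthForm("\uFF21")); // "fullwidth" (Latin capital A)
console.log(widthForm("\uFFE5")); // "fullwidth" (Yen sign)
console.log(widthForm("A"));      // "other"
```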

Why They Exist

In the early days of computing, Japanese text systems used fixed-width displays built on a character grid: a CJK character filled a full-width cell, while an ASCII character filled a cell half as wide. To keep columns aligned, ASCII characters needed fullwidth variants and Katakana needed halfwidth variants. While modern systems handle variable-width text natively, these legacy characters persist in:

  • Japanese data entry: Some systems still use fullwidth numbers and letters
  • Financial systems: Fullwidth numbers are common in Japanese banking
  • Legacy file formats: Older database systems may store fullwidth data
  • Form validation: Some Japanese websites require fullwidth input

Encoding Impact

Every character in this block takes 3 bytes in UTF-8. A fullwidth Latin letter, digit, or punctuation mark therefore takes 3 bytes, compared to 1 byte for its ASCII counterpart, so a string of fullwidth Latin characters uses 3x the storage of its standard equivalent. The Unicode Inspector shows the byte sequence for each character, helping you identify unnecessary fullwidth usage that inflates data size.
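The size difference can be checked directly by encoding both strings. A minimal sketch, assuming an environment where TextEncoder is available as a global (browsers and current Node.js); the helper name utf8ByteLength is hypothetical:

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes a string occupies in UTF-8.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

const standard = "ABC123";
const fullwidth = "\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13"; // fullwidth "ABC123"

console.log(utf8ByteLength(standard));  // 6
console.log(utf8ByteLength(fullwidth)); // 18
```

Note that both strings have the same length in JavaScript (6 UTF-16 code units), so string length alone does not reveal the storage cost.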

Normalization and Conversion

Unicode's NFKC and NFKD normalization forms convert fullwidth and halfwidth characters to their standard equivalents:

"\uFF21".normalize("NFKC") === "A"  // true

This is critical for search indexing and data deduplication in multilingual systems.
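For search and deduplication, one common approach is to normalize both the stored text and the query to the same key before comparing. A minimal sketch under that assumption; searchKey and sameForSearch are hypothetical names, and the lowercasing step is an added choice rather than part of NFKC:

```javascript
// Hypothetical helper: fold width variants (NFKC) and case into one key.
function searchKey(text) {
  return text.normalize("NFKC").toLowerCase();
}

// Hypothetical helper: treat two strings as equal if their keys match.
function sameForSearch(a, b) {
  return searchKey(a) === searchKey(b);
}

// Fullwidth "Tokyo" matches the ASCII spelling.
console.log(sameForSearch("\uFF34\uFF4F\uFF4B\uFF59\uFF4F", "tokyo")); // true

// Halfwidth Katakana Ka (U+FF76) matches standard Ka (U+30AB).
console.log(sameForSearch("\uFF76", "\u30AB")); // true
```

NFKC also folds other compatibility characters, such as ligatures and superscript digits, so it is usually applied to a derived search key rather than to the stored original text.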

Use Case

Use this when normalizing Japanese user input that contains fullwidth Latin characters, debugging data imports from legacy CJK systems, implementing search that treats fullwidth and standard characters as equivalent, or calculating accurate storage requirements for mixed-width text.
