Precomposed vs Decomposed Unicode Characters

Understand the difference between precomposed (single code point) and decomposed (base + combining mark) Unicode characters, and how normalization converts between them.

Detailed Explanation

Precomposed vs Decomposed Characters

Unicode provides two ways to represent many accented characters: as a single precomposed code point, or as a decomposed sequence of base character plus combining marks.

Examples of Both Representations

Character   Precomposed (NFC)        Decomposed (NFD)
é           U+00E9 (1 code point)    U+0065 + U+0301 (2 code points)
ö           U+00F6 (1 code point)    U+006F + U+0308 (2 code points)
ç           U+00E7 (1 code point)    U+0063 + U+0327 (2 code points)
Å           U+00C5 (1 code point)    U+0041 + U+030A (2 code points)
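The table can be reproduced with the built-in String.prototype.normalize. This is a minimal sketch; the helper name codePoints is hypothetical. It writes each precomposed character as an escape, decomposes it with NFD, lists the resulting code points, and checks that NFC composes it back.

```javascript
// Format each code point of a string as "U+XXXX".
// Spreading a string ([...s]) iterates by code point, not by UTF-16 unit.
function codePoints(s) {
  return [...s].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// The four table rows, written as escapes so the source form is unambiguous.
const precomposedChars = ["\u00E9", "\u00F6", "\u00E7", "\u00C5"]; // é ö ç Å

for (const ch of precomposedChars) {
  const decomposed = ch.normalize("NFD");         // split into base + combining mark
  const recomposed = decomposed.normalize("NFC"); // and compose it back again
  console.log(
    codePoints(ch).join(" "),
    "->",
    codePoints(decomposed).join(" + "),
    recomposed === ch ? "(round-trips)" : "(differs)"
  );
}
// First line printed: U+00E9 -> U+0065 + U+0301 (round-trips)
```

Each printed line should match one row of the table.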

Where Each Form Comes From

Precomposed characters typically come from:

  • Direct keyboard input on Windows and Linux
  • Text converted from legacy encodings (ISO-8859-1, Windows-1252), which store each accented letter as a single byte
  • NFC normalization

Decomposed characters typically come from:

  • macOS file systems: HFS+ stores filenames in a variant of NFD (APFS, its successor, preserves whichever form it is given)
  • Some input methods on mobile devices
  • Text generated by certain programming libraries
  • NFD normalization
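Because text can arrive from any of these sources, it can help to check which form a string is already in. A minimal sketch, assuming only String.prototype.normalize; the function name normalizationForms and its return labels are hypothetical. A string counts as being in a form if normalizing it to that form leaves it unchanged.

```javascript
// Report which normalization form(s) a string is already in.
function normalizationForms(s) {
  const isNFC = s === s.normalize("NFC");
  const isNFD = s === s.normalize("NFD");
  if (isNFC && isNFD) return "both"; // e.g. plain ASCII: nothing to compose or decompose
  if (isNFC) return "NFC";
  if (isNFD) return "NFD";
  return "neither";                  // e.g. a mix of precomposed and decomposed sequences
}

console.log(normalizationForms("caf\u00E9"));     // NFC
console.log(normalizationForms("cafe\u0301"));    // NFD
console.log(normalizationForms("cafe"));          // both
console.log(normalizationForms("\u00E9e\u0301")); // neither
```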

The String Length Problem

"\u00E9".length                                          // 1 (precomposed é)
"e\u0301".length                                         // 2 (decomposed: e + combining acute)
"\u00E9" === "e\u0301"                                   // false!
"\u00E9".normalize("NFC") === "e\u0301".normalize("NFC") // true

This is a common source of bugs: string.length counts UTF-16 code units, so it returns different values for the two forms, and === treats them as unequal, even though both render as the same visible character.
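When the goal is to count what a reader sees, neither .length nor a code point count is enough. A minimal sketch, assuming a runtime with Intl.Segmenter (Node.js 16+ or a current browser); the helper name lengths is hypothetical. It compares UTF-16 code units, code points, and grapheme clusters for both forms of "café".

```javascript
// Grapheme segmentation groups a base character with its combining marks.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function lengths(str) {
  return {
    codeUnits: str.length,                           // what string.length reports
    codePoints: [...str].length,                     // string iteration is by code point
    graphemes: [...segmenter.segment(str)].length,   // user-perceived characters
  };
}

console.log(lengths("cafe\u0301")); // { codeUnits: 5, codePoints: 5, graphemes: 4 }
console.log(lengths("caf\u00E9"));  // { codeUnits: 4, codePoints: 4, graphemes: 4 }
```

Only the grapheme count agrees for both forms, which is why normalizing first (or segmenting) matters when length limits are user-facing.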

Which to Choose?

NFC (precomposed) is generally preferred because:

  • Shorter byte representation (é is 2 bytes in UTF-8 as NFC, 3 bytes as NFD)
  • More compatible with legacy systems
  • W3C recommendation for web content
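  • More predictable string.length behavior, though characters with no precomposed code point still remain base + combining mark sequences in NFC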
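The size difference is easy to measure with the standard TextEncoder, which always produces UTF-8. A minimal sketch; the helper utf8Bytes and the sample word are illustrative choices, not part of any library.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

function utf8Bytes(str) {
  return encoder.encode(str).length;
}

const word = "\u00C5ngstr\u00F6m"; // "Ångström"
const nfc = word.normalize("NFC");
const nfd = word.normalize("NFD");

console.log(nfc.length, utf8Bytes(nfc)); // 8 10
console.log(nfd.length, utf8Bytes(nfd)); // 10 12
```

Each precomposed Latin-1 letter takes 2 bytes, while the decomposed pair takes 1 byte for the base letter plus 2 for the combining mark.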

Use Case

Directly relevant to anyone building text processing, search, or file-handling systems. macOS developers frequently run into this issue because HFS+ volumes return filenames in a variant of NFD, while filenames typed on Windows are usually NFC. Cross-platform applications must therefore expect both forms and normalize names before comparing them.
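One way to handle both forms when matching filenames is to index the directory listing by NFC while keeping the name exactly as stored. A minimal sketch under stated assumptions: the listing is an array of names such as Node's fs.readdirSync(dir) returns, and buildNameIndex and findStoredName are hypothetical helpers.

```javascript
// Map each filename's NFC form to its exact on-disk spelling.
function buildNameIndex(names) {
  const index = new Map();
  for (const name of names) {
    index.set(name.normalize("NFC"), name); // key: NFC, value: stored spelling
  }
  return index;
}

// Look up a requested name in either form; returns undefined if absent.
function findStoredName(index, requested) {
  return index.get(requested.normalize("NFC"));
}

// Simulated listing: "résumé.txt" stored decomposed, as HFS+ would return it.
const listing = ["notes.txt", "re\u0301sume\u0301.txt"];
const index = buildNameIndex(listing);

const stored = findStoredName(index, "r\u00E9sum\u00E9.txt"); // NFC input
console.log(stored === listing[1]);                         // true
console.log(listing.includes("r\u00E9sum\u00E9.txt"));       // false: plain comparison misses it
```

Returning the stored spelling matters because opening the file should use the bytes the file system actually holds. On file systems that preserve both forms (for example, most Linux file systems), two distinct files can share one NFC key; this sketch keeps only the last one seen.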
