Combining Characters and Diacritical Marks

Understand how combining diacritical marks create visual characters from multiple code points, and why grapheme cluster count differs from code point count.

Detailed Explanation

Combining Characters: Multiple Code Points, One Visual Character

Unicode allows characters to be composed from a base character plus one or more combining marks. The result looks like a single character but consists of multiple code points.

Example: é Two Ways

Precomposed (NFC):

é  →  U+00E9 (1 code point, 2 UTF-8 bytes)

Decomposed (NFD):

é  →  U+0065 + U+0301 (2 code points, 3 UTF-8 bytes)

Both render identically as é, but:

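| Metric            | Precomposed | Decomposed |
|-------------------|-------------|------------|
| JavaScript .length | 1          | 2          |
| Code points       | 1           | 2          |
| Grapheme clusters | 1           | 1          |
| UTF-8 bytes       | 2           | 3          |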
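
The four metrics above can be reproduced in JavaScript. The following is a minimal sketch, assuming a runtime that provides Intl.Segmenter and TextEncoder (recent browsers and Node.js); the helper name measure is illustrative and not part of any library.

```javascript
// Sketch: measure one string four ways. `measure` is a hypothetical helper name.
function measure(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return {
    length: str.length,                              // UTF-16 code units
    codePoints: [...str].length,                     // the string iterator walks code points
    graphemes: [...segmenter.segment(str)].length,   // user-perceived characters
    utf8Bytes: new TextEncoder().encode(str).length, // encoded size in UTF-8
  };
}

const precomposed = "\u00E9"; // é as a single code point
const decomposed = "e\u0301"; // e + COMBINING ACUTE ACCENT

console.log(measure(precomposed)); // { length: 1, codePoints: 1, graphemes: 1, utf8Bytes: 2 }
console.log(measure(decomposed));  // { length: 2, codePoints: 2, graphemes: 1, utf8Bytes: 3 }
```

Writing the strings with \u escapes avoids depending on how an editor or clipboard normalizes the literal é.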

Stacked Combining Marks

You can stack multiple combining marks on a single base character:

à́̂  →  a + grave + acute + circumflex

This creates one grapheme cluster from 4 code points. All four code points fit in a single UTF-16 code unit each, so JavaScript's .length also returns 4, even though a reader sees one character.
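
To see what a stacked sequence is made of, it can help to list its code points and count the combining marks. This is a rough sketch; listCodePoints and countMarks are illustrative names, and the \p{M} property escape assumes an engine with Unicode property support in regular expressions.

```javascript
// Sketch: list the code points of a string as U+XXXX labels.
function listCodePoints(str) {
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Sketch: count code points in the Unicode Mark categories (Mn, Mc, Me).
function countMarks(str) {
  return (str.match(/\p{M}/gu) || []).length;
}

const stacked = "a\u0300\u0301\u0302"; // a + grave + acute + circumflex

console.log(listCodePoints(stacked)); // [ 'U+0061', 'U+0300', 'U+0301', 'U+0302' ]
console.log(countMarks(stacked));     // 3
console.log(stacked.length);          // 4
```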

Zalgo Text

"Zalgo text" exploits combining marks by stacking dozens of them:

H̶̺̘e̸͈l̷̙l̶̽o̵͓

In the example above, each visible letter carries only 2-3 combining marks; heavier Zalgo text can stack dozens per letter. Either way, the code point count inflates dramatically while the grapheme count stays equal to the number of visible letters. The String Length Calculator's grapheme breakdown reveals exactly which combining marks are attached to each base character.
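
A per-grapheme breakdown of this kind can be approximated in a few lines. The sketch below is not the calculator's actual implementation; graphemeBreakdown is a hypothetical helper, and it assumes the first code point of each cluster is the base and the rest are marks, which holds for simple Zalgo-style text but not for every cluster type (emoji sequences, for example).

```javascript
// Sketch: split a string into grapheme clusters and report the marks on each base.
// Assumption: the first code point of a cluster is the base, the rest are marks.
function graphemeBreakdown(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(str)].map(({ segment }) => {
    const points = [...segment]; // code points within this cluster
    return {
      base: points[0],
      marks: points.slice(1).map(
        (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
      ),
    };
  });
}

// Illustrative input: "Hi" with two marks on H and one on i.
const sample = "H\u0336\u033A" + "i\u0353";

for (const { base, marks } of graphemeBreakdown(sample)) {
  console.log(base, marks.length, marks.join(" "));
}
// H 2 U+0336 U+033A
// i 1 U+0353
```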

Practical Impact

  1. String truncation: Cutting a string at a fixed code unit or code point count may split a combining sequence, leaving a bare base letter or an orphaned mark. Always truncate at grapheme boundaries.
  2. Input validation: A "5 character" limit should count grapheme clusters, not code points, so that decomposed text is not penalized: "é" written as e + U+0301 is one character to the user but two code points.
  3. String comparison: "café" (NFC) and "café" (NFD) look identical but are different code point and byte sequences, so a plain equality check fails. Normalize both sides to the same form (for example NFC) before comparing.
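
The three points above can be sketched together as follows. The helper names truncateGraphemes, withinLimit, and equalNormalized are illustrative; the code assumes Intl.Segmenter is available and uses NFC as the comparison form, though any single form applied to both sides would work.

```javascript
// Shared grapheme segmenter (locale left as the runtime default).
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// 1. Truncate to at most `max` grapheme clusters without splitting a cluster.
function truncateGraphemes(str, max) {
  const clusters = [...graphemeSegmenter.segment(str)].map((s) => s.segment);
  return clusters.slice(0, max).join("");
}

// 2. Validate a user-facing character limit by counting grapheme clusters.
function withinLimit(str, max) {
  return [...graphemeSegmenter.segment(str)].length <= max;
}

// 3. Compare after normalizing both sides to the same form.
function equalNormalized(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const nfc = "caf\u00E9";  // precomposed é
const nfd = "cafe\u0301"; // e + combining acute accent

console.log(nfd.slice(0, 4) === "cafe");        // true: slicing by code units dropped the accent
console.log(truncateGraphemes(nfd, 4) === nfd); // true: all 4 clusters kept intact
console.log(withinLimit(nfd, 4));               // true, although nfd.length is 5
console.log(nfc === nfd);                       // false
console.log(equalNormalized(nfc, nfd));         // true
```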

Use Case

When building text editors, input validators, or search functionality that handles international text, understanding combining characters prevents bugs like broken truncation, inconsistent search results, and incorrect character counting for user-facing limits.
