Zero-Width Characters in Unicode

Discover invisible zero-width Unicode characters including ZWSP, ZWNJ, ZWJ, and Word Joiner — their code points, purposes, and how to detect them in text.

Detailed Explanation

Zero-Width Characters

Zero-width characters are Unicode code points that have no visible rendering but affect text processing, line breaking, and shaping. They are invisible to the naked eye, making them a common source of bugs and security concerns.

Common Zero-Width Characters

Code Point	Name	Abbreviation	Purpose
U+200B	ZERO WIDTH SPACE	ZWSP	Optional line break opportunity
U+200C	ZERO WIDTH NON-JOINER	ZWNJ	Prevents ligature formation
U+200D	ZERO WIDTH JOINER	ZWJ	Joins adjacent characters (emoji sequences)
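U+2060	WORD JOINER	WJ	Prevents line break (replaces the deprecated use of U+FEFF as a zero-width no-break space)
U+FEFF	ZERO WIDTH NO-BREAK SPACE	ZWNBSP / BOM	Byte order mark at the start of a file or stream; legacy zero-width no-break space elsewhere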
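U+200E	LEFT-TO-RIGHT MARK	LRM	Invisible strong LTR character; steers the direction of adjacent neutral characters
U+200F	RIGHT-TO-LEFT MARK	RLM	Invisible strong RTL character; steers the direction of adjacent neutral characters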

UTF-8 Encoding

Every character in the table above lies in the range U+0800 to U+FFFF and therefore occupies 3 bytes in UTF-8, for example:

U+200B → E2 80 8B
U+200C → E2 80 8C
U+200D → E2 80 8D
U+FEFF → EF BB BF
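
The byte sequences above can be reproduced with the standard TextEncoder API. This is a minimal sketch; the helper name utf8Hex is hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: return the UTF-8 bytes of one code point as hex.
function utf8Hex(codePoint) {
  const bytes = new TextEncoder().encode(String.fromCodePoint(codePoint));
  // Array.from (not Uint8Array.map) so the mapped values can be strings.
  return Array.from(bytes, (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

for (const cp of [0x200b, 0x200c, 0x200d, 0x2060, 0xfeff]) {
  const label = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  console.log(`${label} → ${utf8Hex(cp)}`);
}
```

The loop also covers U+2060, which is not listed above; it encodes as E2 81 A0.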

Why They're Problematic

String comparison: Two strings that look identical may differ by a hidden zero-width character
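Data validation: User input may contain invisible characters that count toward length limits, or that let a visually empty value pass a non-empty check
Security: Zero-width characters can be used for steganography (hiding messages or fingerprints in text) or for spoofing, where a string looks identical to a trusted one but differs in its code points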
Search/indexing: Hidden characters affect search results and database lookups
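
The problems listed above are easy to reproduce. The following sketch uses made-up sample strings (the value "admin" is only an illustration) to show a failed comparison, a changed length, a missed search, and a visually empty value that passes a naive non-empty check.

```javascript
const visible = "admin";
const hidden = "ad\u200Bmin"; // contains a ZERO WIDTH SPACE after "ad"

console.log(visible === hidden);            // false
console.log(visible.length, hidden.length); // 5 6
console.log(hidden.includes("admin"));      // false: substring search misses it

// A value made only of zero-width characters passes a naive non-empty check.
// trim() does not help here: U+200B, U+200D and U+2060 are not whitespace
// in JavaScript (U+FEFF is the exception and is removed by trim()).
const blank = "\u200B\u200D\u2060";
console.log(blank.length > 0);    // true
console.log(blank.trim().length); // 3
```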

Detection with the Unicode Inspector

The Unicode Inspector displays zero-width characters with their code point label (e.g. U+200B) instead of rendering nothing, making them immediately visible. The category column shows each character's Unicode general category (Cf, Format, for every character in the table above), and the byte count reveals the 3 bytes of UTF-8 overhead that each invisible character adds.
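
The same idea can be approximated in a few lines of code. This is a sketch only and not the Unicode Inspector's implementation; the names ZERO_WIDTH, revealZeroWidth, and findZeroWidth are hypothetical, and the character set is assumed to be the seven characters from the table above.

```javascript
// Assumed set for this sketch: U+200B to U+200F, U+2060, and U+FEFF.
const ZERO_WIDTH = /[\u200B-\u200F\u2060\uFEFF]/g;

// Format a one-character string as a U+XXXX label.
function label(ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: replace each match with a visible <U+XXXX> label.
function revealZeroWidth(text) {
  return text.replace(ZERO_WIDTH, (ch) => `<${label(ch)}>`);
}

// Hypothetical helper: list each match with its index in the string.
function findZeroWidth(text) {
  return Array.from(text.matchAll(ZERO_WIDTH), (m) => ({
    index: m.index,
    codePoint: label(m[0]),
  }));
}

const sample = "ad\u200Bmin\uFEFF";
console.log(revealZeroWidth(sample)); // ad<U+200B>min<U+FEFF>
console.log(findZeroWidth(sample));   // matches at index 2 and index 6
```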

Practical Tips

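Use String.prototype.normalize() to standardize text before comparison; note that normalization (NFC or NFKC) does not remove zero-width characters, so they must be stripped separately
Strip zero-width characters with a regex: /[\u200B-\u200D\uFEFF\u2060]/g (widen the range to \u200B-\u200F to also remove LRM and RLM; be aware that stripping ZWJ and ZWNJ breaks emoji sequences and text in scripts such as Persian that rely on them)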
Check for unexpected zero-width characters when debugging string comparison failures
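
The tips above can be combined into one comparison helper. This is a minimal sketch under the assumption that the strings are used as comparison keys rather than displayed; the names STRIP_ZERO_WIDTH and cleanForComparison are hypothetical.

```javascript
// Character class from the tip above: ZWSP, ZWNJ, ZWJ, BOM/ZWNBSP, WJ.
const STRIP_ZERO_WIDTH = /[\u200B-\u200D\uFEFF\u2060]/g;

// Hypothetical helper: strip zero-width characters, then apply NFC.
function cleanForComparison(text) {
  return text.replace(STRIP_ZERO_WIDTH, "").normalize("NFC");
}

const a = "caf\u00E9";        // precomposed é
const b = "cafe\u0301\u200B"; // e + combining acute accent, plus a ZWSP

console.log(a === b);                  // false
console.log(a === b.normalize("NFC")); // false: the ZWSP survives normalization
console.log(cleanForComparison(a) === cleanForComparison(b)); // true
```

Applying the helper only to the values being compared, and leaving the stored or displayed text as it is, avoids damaging emoji sequences and scripts that depend on ZWJ or ZWNJ.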

Use Case

Use this when debugging invisible character issues in user-submitted text, detecting potential steganography or string manipulation attacks, cleaning data imports that contain hidden characters, or understanding why visually identical strings fail equality checks.
