Zero-Width Characters in Unicode
Discover invisible zero-width Unicode characters including ZWSP, ZWNJ, ZWJ, and Word Joiner — their code points, purposes, and how to detect them in text.
Detailed Explanation
Zero-Width Characters
Zero-width characters are Unicode code points that have no visible rendering but affect text processing, line breaking, and shaping. They are invisible to the naked eye, making them a common source of bugs and security concerns.
Common Zero-Width Characters
| Code Point | Name | Abbreviation | Purpose |
|---|---|---|---|
| U+200B | ZERO WIDTH SPACE | ZWSP | Optional line break opportunity |
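| U+200C | ZERO WIDTH NON-JOINER | ZWNJ | Prevents joining or ligature formation between adjacent characters |
| U+200D | ZERO WIDTH JOINER | ZWJ | Requests a joined form of adjacent characters (cursive scripts, emoji sequences) |
| U+2060 | WORD JOINER | WJ | Prevents a line break; preferred over U+FEFF, whose use as a no-break space is deprecated |
| U+FEFF | ZERO WIDTH NO-BREAK SPACE | BOM / ZWNBSP | Byte order mark at the start of a file; legacy (deprecated) zero-width no-break space elsewhere |
| U+200E | LEFT-TO-RIGHT MARK | LRM | Invisible strong LTR character; influences bidirectional ordering of neighboring neutral characters |
| U+200F | RIGHT-TO-LEFT MARK | RLM | Invisible strong RTL character; influences bidirectional ordering of neighboring neutral characters |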
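To make the ZWJ row concrete, the sketch below builds the "woman technologist" emoji from WOMAN (U+1F469), ZWJ, and PERSONAL COMPUTER (U+1F4BB), then lists its code points. It is plain JavaScript for Node.js or a browser console; the helper name codePoints is hypothetical.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" labels.
function codePoints(str) {
  // Spreading a string iterates by code point, so surrogate pairs stay together.
  return [...str].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER; fonts that support the
// sequence render it as a single "woman technologist" glyph.
const technologist = "\u{1F469}\u200D\u{1F4BB}";

console.log(codePoints(technologist)); // [ 'U+1F469', 'U+200D', 'U+1F4BB' ]
console.log(technologist.length);      // 5 (UTF-16 code units: 2 + 1 + 2)

// Removing the ZWJ leaves two separate emoji.
const split = technologist.replace(/\u200D/g, "");
console.log(codePoints(split));        // [ 'U+1F469', 'U+1F4BB' ]
```

This is one reason to be cautious when stripping U+200D from text that may contain emoji.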
UTF-8 Encoding
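All of these code points fall in the range U+0800 to U+FFFF, so each occupies 3 bytes in UTF-8:
- U+200B → E2 80 8B
- U+200C → E2 80 8C
- U+200D → E2 80 8D
- U+200E → E2 80 8E
- U+200F → E2 80 8F
- U+2060 → E2 81 A0
- U+FEFF → EF BB BF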
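These byte sequences can be checked with the standard TextEncoder API, which is a global in browsers and in Node.js 11 and later. The helper name utf8Hex is hypothetical; the sketch simply encodes each character and prints its bytes in hex.

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: UTF-8 bytes of a string as space-separated hex.
function utf8Hex(str) {
  return Array.from(encoder.encode(str), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

for (const cp of [0x200b, 0x200c, 0x200d, 0x2060, 0xfeff]) {
  const label = "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  console.log(label, "→", utf8Hex(String.fromCodePoint(cp)));
}
// U+200B → E2 80 8B
// U+200C → E2 80 8C
// U+200D → E2 80 8D
// U+2060 → E2 81 A0
// U+FEFF → EF BB BF
```

Note that encode() does not drop a leading U+FEFF; it is encoded like any other character.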
Why They're Problematic
- String comparison: Two strings that look identical may differ by a hidden zero-width character
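- Data validation: User input may contain invisible characters that slip past validation; for example, a name that looks empty still passes a non-empty or minimum-length check
- Security: Zero-width characters can be used for steganography (hiding messages) or confusable attacks, such as a username that looks identical to an existing one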
- Search/indexing: Hidden characters affect search results and database lookups
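The comparison and validation problems above are easy to reproduce. In this sketch the strings are illustrative; it shows an equality check failing on visually identical text and a "blank" value passing a length check. It also shows a JavaScript-specific detail: String.prototype.trim() removes U+FEFF (which ECMAScript classifies as whitespace) but not U+200B.

```javascript
// Two strings that render identically: the second hides a ZERO WIDTH SPACE.
const typed = "admin";
const pasted = "ad\u200Bmin";

console.log(typed === pasted);            // false
console.log(typed.length, pasted.length); // 5 6

// A value with no visible content still passes a non-empty check.
const username = "\u200B";
console.log(username.length > 0);         // true

// trim() strips U+FEFF but leaves U+200B in place.
console.log("\u200B".trim().length);      // 1
console.log("\uFEFF".trim().length);      // 0
```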
Detection with the Unicode Inspector
The Unicode Inspector displays zero-width characters with their code point label (e.g. U+200B) instead of rendering nothing, making them immediately visible. The category column shows their Unicode General Category (Cf, Format, for every character in the table above), and the byte count reveals the 3-byte overhead each invisible character adds.
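How the Inspector is implemented is not described here. As a rough sketch of the same idea, the hypothetical functions below report each zero-width character from the table with its position and U+XXXX label, and rewrite text so the hidden characters show up as visible tags.

```javascript
// Hypothetical lookup table of the characters listed in the table above.
const ZERO_WIDTH_NAMES = new Map([
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200c, "ZERO WIDTH NON-JOINER"],
  [0x200d, "ZERO WIDTH JOINER"],
  [0x200e, "LEFT-TO-RIGHT MARK"],
  [0x200f, "RIGHT-TO-LEFT MARK"],
  [0x2060, "WORD JOINER"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE"], // also used as the byte order mark
]);

const label = (cp) => "U+" + cp.toString(16).toUpperCase().padStart(4, "0");

// All targets are in the BMP, so checking UTF-16 code units is sufficient:
// surrogate halves (D800-DFFF) can never match these values.
function findZeroWidth(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const cp = text.charCodeAt(i);
    if (ZERO_WIDTH_NAMES.has(cp)) {
      hits.push({ index: i, label: label(cp), name: ZERO_WIDTH_NAMES.get(cp) });
    }
  }
  return hits;
}

// Replace each hidden character with a visible "<U+XXXX>" tag.
function revealZeroWidth(text) {
  return text.replace(/[\u200B-\u200F\u2060\uFEFF]/g, (ch) => `<${label(ch.charCodeAt(0))}>`);
}

console.log(findZeroWidth("pay\u200Bpal\u200D"));
console.log(revealZeroWidth("pay\u200Bpal")); // pay<U+200B>pal
```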
Practical Tips
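- Use String.prototype.normalize() to standardize composed and decomposed forms before comparison; normalization (even NFKC) does not remove zero-width characters, so strip them as a separate step
- Strip zero-width characters with a regex: /[\u200B-\u200D\uFEFF\u2060]/g (widen the first range to \u200B-\u200F to also remove the LRM and RLM directional marks)
- Check for unexpected zero-width characters when debugging string comparison failures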
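Normalization and stripping solve different problems, which this sketch combines. The helper name cleanForComparison is hypothetical, and the choice of NFC is an assumption; pick the normalization form your application needs.

```javascript
// Regex from the tip: ZWSP, ZWNJ, ZWJ, BOM/ZWNBSP and WORD JOINER.
const ZERO_WIDTH = /[\u200B-\u200D\uFEFF\u2060]/g;

// Hypothetical helper: normalize to NFC, then remove zero-width characters.
function cleanForComparison(str) {
  return str.normalize("NFC").replace(ZERO_WIDTH, "");
}

const a = "caf\u00E9";        // "café" with precomposed é
const b = "cafe\u0301\u200B"; // "café" with combining acute accent + hidden ZWSP

console.log(a === b);                                         // false
console.log(a.normalize("NFC") === b.normalize("NFC"));       // false: the ZWSP survives
console.log(cleanForComparison(a) === cleanForComparison(b)); // true
```

Because the regex has the g flag, reuse it with replace() rather than test(): a global regex keeps its lastIndex between test() calls, which can make repeated checks alternate between true and false.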
Use Case
Use this when debugging invisible character issues in user-submitted text, detecting potential steganography or string manipulation attacks, cleaning data imports that contain hidden characters, or understanding why visually identical strings fail equality checks.