Complete Guide to Unicode Whitespace Characters
A reference to the Unicode whitespace and invisible characters most likely to show up in text, with their code points, purposes, and common sources.
Detailed Explanation
Unicode Whitespace Character Reference
Unicode defines far more whitespace and invisible characters than the ASCII space, tab, and newline. Most of them either look identical to an ordinary space or do not render at all, so robust text processing has to handle them explicitly.
Standard Whitespace Characters
| Character | Unicode | Name | Marker |
|---|---|---|---|
| Space | U+0020 | Space | · |
| Tab | U+0009 | Character Tabulation | → |
| LF | U+000A | Line Feed | ↵ |
| CR | U+000D | Carriage Return | ← |
| NBSP | U+00A0 | No-Break Space | ° |
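The Marker column can be reproduced in a few lines. The following is a minimal Python sketch, not the Visualizer's actual implementation; the names `MARKERS` and `visualize` are hypothetical and chosen only for illustration.

```python
# Marker symbols are taken from the table above; the names are illustrative.
MARKERS = {
    "\u0020": "·",  # Space
    "\u0009": "→",  # Character Tabulation
    "\u000A": "↵",  # Line Feed
    "\u000D": "←",  # Carriage Return
    "\u00A0": "°",  # No-Break Space
}


def visualize(text: str) -> str:
    """Replace each standard whitespace character with its visible marker."""
    return "".join(MARKERS.get(ch, ch) for ch in text)
```

Calling `visualize("a b\tc\u00a0d\r\n")` makes the difference between the ordinary space and the no-break space visible, along with the CR LF line ending.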
Zero-Width Characters
| Character | Unicode | Name | Marker |
|---|---|---|---|
| ZWS | U+200B | Zero Width Space | [ZWS] |
| ZWJ | U+200D | Zero Width Joiner | [ZWJ] |
| ZWNJ | U+200C | Zero Width Non-Joiner | [ZWNJ] |
| SHY | U+00AD | Soft Hyphen | [SHY] |
| BOM | U+FEFF | Byte Order Mark / Zero Width No-Break Space | [BOM] |
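All five of these are format characters (general category Cf) rather than whitespace, so Python's `str.strip()` and `str.isspace()` ignore them. The sketch below shows that behavior and one way to locate them; `INVISIBLE` and `find_invisible` are hypothetical names, and the sample string is made up.

```python
# Labels follow the Marker column above; the function name is illustrative.
INVISIBLE = {
    "\u200B": "ZWS",
    "\u200D": "ZWJ",
    "\u200C": "ZWNJ",
    "\u00AD": "SHY",
    "\uFEFF": "BOM",
}


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, label) for every zero-width or invisible character."""
    return [(i, INVISIBLE[ch]) for i, ch in enumerate(text) if ch in INVISIBLE]


sample = "\ufeffuser\u200bname "
# strip() removes the trailing space but leaves the BOM and ZWS in place.
stripped = sample.strip()
hits = find_invisible(stripped)
```

Here `hits` reports a BOM at index 0 and a ZWS at index 5, even though `stripped` prints as a clean-looking `username`.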
Other Unicode Spaces (Not in Visualizer)
These are less common but worth knowing about:
| Unicode | Name | Width |
|---|---|---|
| U+2000 | En Quad | Half an em (canonically equivalent to U+2002) |
| U+2001 | Em Quad | Full em (canonically equivalent to U+2003) |
| U+2002 | En Space | Half an em |
| U+2003 | Em Space | Full em |
| U+2004 | Three-Per-Em Space | 1/3 em |
| U+2005 | Four-Per-Em Space | 1/4 em |
| U+2006 | Six-Per-Em Space | 1/6 em |
| U+2007 | Figure Space | Width of a digit |
| U+2008 | Punctuation Space | Width of a period |
| U+2009 | Thin Space | 1/5 em |
| U+200A | Hair Space | Very thin |
| U+202F | Narrow No-Break Space | Narrower than U+00A0, non-breaking |
| U+205F | Medium Mathematical Space | 4/18 em |
| U+3000 | Ideographic Space | Full-width CJK space |
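Every character in this table has the Unicode general category Zs (space separator), so one way to fold them into an ordinary space is to test the category rather than list code points. This is a minimal sketch under that assumption; `normalize_spaces` is a hypothetical name. Note that the no-break space (U+00A0) is also Zs, so its non-breaking behavior is lost as well.

```python
import unicodedata


def normalize_spaces(text: str) -> str:
    """Replace every space separator (category Zs) with U+0020."""
    return "".join(
        " " if unicodedata.category(ch) == "Zs" else ch for ch in text
    )
```

Tabs, line endings, and the zero-width characters are not in category Zs, so this function leaves them untouched.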
How Characters Get Mixed In
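- Copy from web: HTML entities such as &nbsp;, &ensp;, and &thinsp; render as non-ASCII spaces, and those characters are what end up on the clipboard
- Copy from documents: Word processors insert typographic spaces such as no-break and thin spaces
- Multi-language input: Different input methods can produce different spaces, for example the ideographic space (U+3000) from a CJK input method in full-width mode
- API responses: External data may contain Unicode characters the consumer does not expect
- Database migration: Character encoding conversion can introduce artifacts such as a stray BOM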
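Two of these sources are easy to reproduce with Python's standard library: decoding HTML entities, and decoding UTF-8 bytes that begin with a BOM. The input strings below are made up for illustration.

```python
import html

# HTML entities decode to non-ASCII spaces that look like an ordinary space.
decoded = html.unescape("10&nbsp;kg&ensp;net&thinsp;weight")
# decoded now contains U+00A0, U+2002, and U+2009 instead of U+0020.

# UTF-8 data saved with a BOM keeps U+FEFF as its first character when
# decoded as plain "utf-8"; the "utf-8-sig" codec drops it.
raw = b"\xef\xbb\xbfid,name"
with_bom = raw.decode("utf-8")
without_bom = raw.decode("utf-8-sig")
```

A leftover BOM of this kind tends to surface later as a first column header that no longer compares equal to `"id"`.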
Using the Whitespace Visualizer
The tool detects the 10 commonly problematic characters listed in the first two tables. Paste any suspicious text to see which invisible characters are present, their exact positions, and their counts. Use the Clean feature to remove specific types selectively.
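The count-then-clean workflow can be approximated in code for batch processing. This is a sketch only and does not reflect how the Visualizer is implemented; `TARGETS`, `count_invisible`, and `clean` are hypothetical names, and the choice of which characters to target is an assumption.

```python
from collections import Counter

# Code points from the first two tables that are usually unwanted in data.
# Ordinary space, tab, and line endings are left out on purpose.
TARGETS = {
    "\u00A0": "NBSP",
    "\u200B": "ZWS",
    "\u200C": "ZWNJ",
    "\u200D": "ZWJ",
    "\u00AD": "SHY",
    "\uFEFF": "BOM",
}


def count_invisible(text: str) -> Counter:
    """Count occurrences of each target character, keyed by label."""
    return Counter(TARGETS[ch] for ch in text if ch in TARGETS)


def clean(text: str, remove: set[str]) -> str:
    """Remove only the target characters whose label is in `remove`."""
    return "".join(ch for ch in text if TARGETS.get(ch) not in remove)
```

Removing selectively matters because some of these characters carry meaning: ZWJ joins emoji sequences and ZWNJ affects spelling in scripts such as Persian, so stripping them blindly can change the text.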
Use Case
A developer building a text processing library needs to handle all types of Unicode whitespace correctly. They use the Whitespace Visualizer as a reference and testing tool to verify their regex patterns correctly identify each whitespace character type.
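A check of this kind can be written as a small table-driven test. In Python, `\s` in a `str` pattern matches Unicode whitespace, which covers the first table but none of the zero-width characters in the second. The sketch below records that expectation and then builds a broader pattern; `EXPECTED` and `BROAD` are hypothetical names.

```python
import re

# Code points from the first two tables, with the result expected
# from Python's Unicode-aware \s for str patterns.
EXPECTED = {
    "\u0020": True,   # Space
    "\u0009": True,   # Tab
    "\u000A": True,   # Line Feed
    "\u000D": True,   # Carriage Return
    "\u00A0": True,   # No-Break Space
    "\u200B": False,  # Zero Width Space
    "\u200D": False,  # Zero Width Joiner
    "\u200C": False,  # Zero Width Non-Joiner
    "\u00AD": False,  # Soft Hyphen
    "\uFEFF": False,  # Byte Order Mark
}

WHITESPACE = re.compile(r"\s")
actual = {ch: bool(WHITESPACE.fullmatch(ch)) for ch in EXPECTED}
mismatches = {f"U+{ord(ch):04X}" for ch in EXPECTED if actual[ch] != EXPECTED[ch]}

# A broader pattern that also covers the zero-width characters.
BROAD = re.compile(r"[\s\u00AD\u200B-\u200D\uFEFF]")
all_matched = all(BROAD.fullmatch(ch) for ch in EXPECTED)
```

An empty `mismatches` set confirms the assumption about `\s`. Other regex engines define `\s` differently, so the expected values need to be re-checked per language.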