ASCII Control Characters in Unicode
Understand the ASCII control characters (U+0000 to U+001F, plus U+007F), including NULL, TAB, LINE FEED, and CARRIAGE RETURN: their code points, UTF-8 encoding, and roles in text processing.
Detailed Explanation
ASCII Control Characters
The first 32 Unicode code points (U+0000 to U+001F) plus U+007F (DELETE) are control characters inherited from the ASCII standard. These characters are non-printable — they do not render as visible glyphs but instead control how text is processed by terminals, printers, and software.
The Most Common Control Characters
| Code Point | Name | Common Use |
|---|---|---|
| U+0000 | NULL (NUL) | String terminator in C/C++ |
| U+0009 | CHARACTER TABULATION (TAB) | Horizontal tab in text |
| U+000A | LINE FEED (LF) | Newline on Unix/macOS |
| U+000D | CARRIAGE RETURN (CR) | Newline component on Windows (CR+LF) |
| U+001B | ESCAPE (ESC) | Start of ANSI escape sequences |
| U+007F | DELETE (DEL) | Originally marked erased characters on punched tape; sent by the Backspace key on many terminals |
UTF-8 Encoding
Each ASCII control character occupies a single byte in UTF-8, and that byte equals its code point: 0x00 through 0x1F, and 0x7F. The one-byte form is byte-for-byte identical to the original ASCII encoding, because backward compatibility with ASCII is a core design principle of UTF-8.
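The one-byte claim can be checked directly. The following is a minimal Python sketch, not part of the Unicode Inspector; the names `control_code_points` and `encoded_examples` are illustrative only.

```python
# Code points of the ASCII control characters: U+0000..U+001F plus U+007F.
control_code_points = list(range(0x00, 0x20)) + [0x7F]

for cp in control_code_points:
    ch = chr(cp)
    utf8_bytes = ch.encode("utf-8")
    ascii_bytes = ch.encode("ascii")
    # Each character encodes to exactly one byte whose value is the code point,
    # and the UTF-8 and ASCII encodings are identical.
    assert utf8_bytes == ascii_bytes == bytes([cp])

# Hex form of the UTF-8 byte for the characters listed in the table above.
encoded_examples = {
    f"U+{cp:04X}": chr(cp).encode("utf-8").hex()
    for cp in (0x00, 0x09, 0x0A, 0x0D, 0x1B, 0x7F)
}
print(encoded_examples)
```

If any assertion failed, the loop would raise `AssertionError`; it completes silently because the two encodings agree for all 33 characters.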
Why They Matter
Control characters frequently appear in data processing pipelines. A stray NULL byte can truncate strings in C programs, because C treats the first NUL as the end of the string. Mixed line endings (LF vs. CR+LF) cause spurious diffs and parsing errors when files are shared between operating systems. The ESCAPE character starts the terminal sequences that set colors and move the cursor, so it often shows up in captured program output. Recognizing these characters is the first step in debugging many text-processing problems.
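As a rough illustration of how these problems can be detected, the sketch below counts the relevant bytes in raw data. The function name `summarize_control_bytes` and the shape of its result are assumptions made for this example, not an established API.

```python
def summarize_control_bytes(data: bytes) -> dict:
    """Count NUL, ESC, and each style of line ending in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "nul": data.count(b"\x00"),
        "esc": data.count(b"\x1b"),
        "crlf": crlf,
        # LF and CR bytes that are not part of a CR+LF pair.
        "lf_only": data.count(b"\n") - crlf,
        "cr_only": data.count(b"\r") - crlf,
    }


sample = b"first\r\nsecond\nthi\x00rd\r"
summary = summarize_control_bytes(sample)
print(summary)
```

A file with both `crlf` and `lf_only` above zero has mixed line endings. Working on bytes rather than decoded text avoids any newline translation that text-mode file reading may apply.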
Identifying Hidden Characters
When you paste text into the Unicode Inspector, control characters are displayed with their code point label (e.g. U+000A) rather than an invisible glyph, making them easy to spot. The category column shows "Control" for all characters in this range.
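A similar view can be approximated outside the tool. This is a sketch under stated assumptions, not the Unicode Inspector's actual implementation: the helper name `reveal_controls` and the `<U+XXXX>` label format are hypothetical choices for this example.

```python
import unicodedata


def reveal_controls(text: str) -> str:
    """Replace every control character with a visible <U+XXXX> label."""
    parts = []
    for ch in text:
        # General category "Cc" is Unicode's "Control" category. Besides the
        # ASCII controls it also covers the C1 controls, U+0080..U+009F.
        if unicodedata.category(ch) == "Cc":
            parts.append(f"<U+{ord(ch):04X}>")
        else:
            parts.append(ch)
    return "".join(parts)


print(reveal_controls("name\tvalue\r\n"))
```

Because the labels replace the characters themselves, the result is for display only and should not be written back to the data file.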
Use Case
Use this when debugging data files that contain unexpected control characters — for example, finding hidden NULL bytes in a CSV export, identifying mixed line endings (LF vs CR+LF) in cross-platform scripts, or detecting stray escape sequences in log files.
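For larger files, a script can point to where the unexpected characters are. The following is a minimal sketch; the function name `find_unexpected_controls`, its `allowed` parameter, and the sample CSV text are all made up for illustration.

```python
def find_unexpected_controls(text: str, allowed: str = "\t\n\r"):
    """Return (line, column, code point label) for each ASCII control
    character that is not in `allowed`. Lines and columns are 1-based."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            is_ascii_control = cp < 0x20 or cp == 0x7F
            if is_ascii_control and ch not in allowed:
                hits.append((line_no, col, f"U+{cp:04X}"))
    return hits


# Hypothetical CSV export with a hidden NUL byte and a stray escape sequence.
csv_text = "id,name\n1,Al\x00ice\r\n2,\x1b[31mBob\n"
hits = find_unexpected_controls(csv_text)
print(hits)
```

Tab, LF, and CR are allowed by default because they are normal in most text files; passing `allowed="\t\n"` would also flag the carriage returns of CR+LF line endings.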