Detect and Remove Byte Order Mark (BOM)
Find the invisible BOM character (U+FEFF) at the start of text files. Remove it to prevent issues with shell scripts, JSON, PHP, and HTTP responses.
Detailed Explanation
The Byte Order Mark (BOM)
The Byte Order Mark (BOM, U+FEFF) is a Unicode character originally designed to indicate the byte order (endianness) of a text stream in UTF-16 and UTF-32. UTF-8 has a fixed byte order, so the BOM (encoded there as the three bytes EF BB BF) is unnecessary, but many Windows tools still add it.
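To make the byte-level picture concrete, here is a minimal Python sketch that reports which BOM, if any, a byte string starts with. The function name detect_bom and the returned labels are illustrative choices, not part of the Whitespace Visualizer; the BOM constants come from Python's standard codecs module.

```python
import codecs

# Longer signatures come first: the UTF-32 LE BOM (FF FE 00 00) begins
# with the same two bytes as the UTF-16 LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, name in _BOMS:
        if data.startswith(signature):
            return name
    return None
```

Reading only the first four bytes of a file is enough input for this check, so it stays cheap even on large files.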
Where BOMs Come From
- Notepad (Windows): Prior to Windows 10 version 1903, saving as "UTF-8" in Notepad always wrote a BOM. Since version 1903, the default is UTF-8 without a BOM.
- Excel CSV export: Excel often adds a BOM when exporting to UTF-8 CSV.
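- PowerShell: In Windows PowerShell 5.1, Out-File -Encoding utf8 writes UTF-8 with a BOM (the cmdlet's default encoding there is UTF-16LE, which also begins with a BOM). PowerShell 6 and later default to UTF-8 without a BOM.
- Visual Studio: Some VS configurations add BOM to new files.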
- Text editors: Some editors add BOM when "saving as UTF-8".
How the Visualizer Shows BOM
The BOM appears as an orange [BOM] marker, typically at the very beginning of the text. The statistics panel shows the count (usually 0 or 1, but concatenated files may have multiple).
Problems Caused by BOM
| Context | Problem |
|---|---|
| Shell scripts | #!/bin/bash becomes \uFEFF#!/bin/bash, causing "command not found" |
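| JSON | RFC 8259 forbids adding a BOM to JSON sent over a network; parsers may ignore it, but strict ones reject the file as starting with an unexpected character |
| PHP | BOM before <?php is sent as output before headers, breaking sessions and redirects ("headers already sent") |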
| HTTP | BOM in response body can break JSON APIs and XML parsing |
| CSV | First column header includes invisible BOM, causing lookup failures |
| YAML | BOM can confuse YAML parsers or appear in string values |
| Concatenation | Concatenating files with BOM produces BOMs in the middle of the result |
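Two of the rows above (CSV and JSON) can be reproduced with Python's standard library. This is a small sketch with made-up sample data, not output from the Whitespace Visualizer; the helper name parses_as_json is hypothetical. It relies on the standard "utf-8-sig" codec, which drops one leading BOM when decoding.

```python
import csv
import io
import json

# Sample CSV bytes as a BOM-adding exporter might produce them (made-up data).
raw = b"\xef\xbb\xbfid,name\n1,Ada\n"

# Decoding as plain UTF-8 keeps U+FEFF glued to the first header name.
plain = list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))

# Decoding as "utf-8-sig" removes the leading BOM, so the header is just "id".
clean = list(csv.DictReader(io.StringIO(raw.decode("utf-8-sig"))))


def parses_as_json(text: str) -> bool:
    """Return True if json.loads accepts the text, False on a decode error."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

With this data, a lookup of "id" fails on the rows in plain but works on the rows in clean, and parses_as_json returns False for a string that starts with U+FEFF because Python's json module refuses a leading BOM in str input.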
Removing BOM
- Paste your file content into the Whitespace Visualizer.
- Look for [BOM] at the very start of the visualization.
- In the Clean section, enable BOM and click Clean.
- The BOM is stripped and the remaining text is unchanged.
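For files that are too large to paste, or for batch cleanup, the same result can be approximated in a script. The sketch below is an assumption about what such a cleanup could look like, not the tool's actual implementation; the function name strip_utf8_bom and its everywhere flag are illustrative.

```python
import codecs


def strip_utf8_bom(data: bytes, everywhere: bool = False) -> bytes:
    """Remove a leading UTF-8 BOM, or every BOM when everywhere=True."""
    bom = codecs.BOM_UTF8  # the three bytes EF BB BF
    if everywhere:
        # Useful for concatenated files, where BOMs end up mid-stream.
        # In valid UTF-8 this byte sequence can only encode U+FEFF.
        return data.replace(bom, b"")
    # Default: strip a single BOM at the start and leave the rest untouched.
    return data[len(bom):] if data.startswith(bom) else data
```

To clean a file in place, read it with pathlib.Path.read_bytes(), pass the result through strip_utf8_bom, and write it back with write_bytes(). Working on bytes rather than decoded text keeps line endings and all other content exactly as they were.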
Prevention
Configure your editor to save UTF-8 without BOM:
- VS Code: "files.encoding": "utf8" (default, no BOM)
- Notepad++: Encoding > UTF-8 (not "UTF-8 BOM")
- Vim: :set nobomb
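Editor settings only cover the people who apply them, so a project may also want an automated check. As one possible approach, the following sketch scans a directory tree for files that start with a UTF-8 BOM; the function name files_with_bom and the default file patterns are assumptions chosen for illustration.

```python
import codecs
from pathlib import Path


def files_with_bom(root, patterns=("*.php", "*.sh", "*.json")):
    """Return sorted paths under root that match a pattern and start with a UTF-8 BOM."""
    found = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if not path.is_file():
                continue
            # Only the first three bytes are needed to spot the BOM.
            with path.open("rb") as handle:
                if handle.read(3) == codecs.BOM_UTF8:
                    found.append(path)
    return sorted(found)
```

A pre-commit hook or CI step could call this function and fail when the returned list is not empty, which catches a BOM before it reaches production.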
Use Case
A PHP developer's session management suddenly breaks after a colleague edits a config file on Windows. PHP warns that headers were already sent when session_start() runs. The Whitespace Visualizer reveals a BOM at the beginning of the PHP file, added by Notepad. Removing the BOM fixes the session issue.