UTF-8 BOM — Byte Order Mark (EF BB BF)


Understand the UTF-8 Byte Order Mark (BOM) bytes EF BB BF and when they appear in files. Learn why BOM causes issues in PHP, shell scripts, and JSON parsing.

Encoding

Hex: EF BB BF
ASCII: (invisible BOM)

Detailed Explanation

The UTF-8 Byte Order Mark (BOM) is a three-byte sequence — EF BB BF — that can appear at the very beginning of a text file to signal that the file is encoded in UTF-8. While well-intentioned, the UTF-8 BOM frequently causes bugs and is generally considered unnecessary by modern standards.

What is the BOM?

The BOM is Unicode character U+FEFF (Zero Width No-Break Space). In UTF-8, this character encodes to three bytes: EF BB BF. It was originally designed for UTF-16 encoding, where it serves the essential purpose of indicating byte order — whether the file uses big-endian (FE FF) or little-endian (FF FE) byte ordering. In UTF-8, there is no byte-order ambiguity (bytes are always in the same order), so the BOM serves only as an encoding identifier.
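The three bytes follow directly from the UTF-8 encoding rules. The sketch below packs the code point U+FEFF into the three-byte UTF-8 pattern (1110xxxx 10xxxxxx 10xxxxxx) by hand and compares the result with Python's built-in encoder. The variable names are illustrative only.

```python
import codecs

cp = 0xFEFF  # the BOM code point

# Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
manual = bytes([
    0xE0 | (cp >> 12),          # top 4 bits    -> 0xEF
    0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits -> 0xBB
    0x80 | (cp & 0x3F),         # low 6 bits    -> 0xBF
])

builtin = "\ufeff".encode("utf-8")

print(manual.hex(" ").upper())
print(builtin == manual == codecs.BOM_UTF8)

# The same character in UTF-16 shows why byte order matters there.
utf16_be = "\ufeff".encode("utf-16-be")  # FE FF
utf16_le = "\ufeff".encode("utf-16-le")  # FF FE
```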

BOMs across Unicode encodings:

Encoding	BOM (Hex)	Size
UTF-8	`EF BB BF`	3 bytes
UTF-16 Big Endian	`FE FF`	2 bytes
UTF-16 Little Endian	`FF FE`	2 bytes
UTF-32 Big Endian	`00 00 FE FF`	4 bytes
UTF-32 Little Endian	`FF FE 00 00`	4 bytes
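The table above can be turned into a small signature check. This is a minimal sketch, not a complete encoding detector: `sniff_bom` is a hypothetical helper name, and it only inspects the first bytes of the data. The order of the checks matters, because the UTF-32 Little Endian BOM (`FF FE 00 00`) begins with the UTF-16 Little Endian BOM (`FF FE`).

```python
import codecs

# Longer signatures are checked first so that FF FE 00 00 is not
# mistaken for the two-byte UTF-16 LE signature FF FE.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32 Big Endian"),
    (codecs.BOM_UTF32_LE, "UTF-32 Little Endian"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16 Big Endian"),
    (codecs.BOM_UTF16_LE, "UTF-16 Little Endian"),
]


def sniff_bom(data: bytes):
    """Return (encoding name, BOM length) or None if no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None


print(sniff_bom(b"\xef\xbb\xbfhello"))
print(sniff_bom(b"hello"))
```

A match is only a hint: a UTF-16 Little Endian file whose first character is U+0000 would also start with `FF FE 00 00`, so a caller that needs certainty has to look at more than the signature.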

Problems caused by the UTF-8 BOM:

PHP — A BOM before the <?php tag causes "headers already sent" errors because the three BOM bytes are output before any HTTP headers can be set
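Shell scripts — A BOM before #!/bin/bash hides the shebang: the kernel expects #! to be the first two bytes of the file, so the interpreter line is not recognized and the script either fails to execute or is run by the wrong shell
JSON — The JSON specification (RFC 8259) says implementations must not add a BOM to the beginning of JSON text sent over a network; parsers may ignore a BOM instead of treating it as an error, and many do, but strict parsers reject the input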
CSV imports — Some spreadsheet programs mishandle BOM-prefixed CSV files, either displaying the BOM as garbled characters or misinterpreting the first field
Concatenation — When combining multiple BOM-prefixed files, the BOM appears in the middle of the output, potentially causing parsing errors
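Three of the problems listed above (JSON, CSV, and concatenation) can be reproduced with Python's standard library. This is an illustrative sketch with made-up sample data; other languages and parsers may behave differently.

```python
import csv
import io
import json

BOM = b"\xef\xbb\xbf"

# JSON: decoded as plain "utf-8", the BOM survives as U+FEFF and
# json.loads() rejects the string.
text = (BOM + b'{"ok": true}').decode("utf-8")
try:
    json.loads(text)
    json_rejected = False
except json.JSONDecodeError:
    json_rejected = True

# Passing the raw bytes instead lets the json module skip the BOM,
# which is an example of a parser that tolerates it.
tolerant_result = json.loads(BOM + b'{"ok": true}')

# CSV: the BOM becomes part of the first column name.
csv_text = (BOM + b"id,name\n1,Ada\n").decode("utf-8")
rows = list(csv.DictReader(io.StringIO(csv_text)))
first_header = list(rows[0].keys())[0]

# Concatenation: only the leading BOM is stripped by "utf-8-sig";
# the second one remains in the middle of the text.
combined = (BOM + b"a\n") + (BOM + b"b\n")
decoded = combined.decode("utf-8-sig")

print(json_rejected, repr(first_header), repr(decoded))
```

Decoding with `utf-8-sig` instead of `utf-8` avoids the first two problems, because that codec drops a leading BOM if one is present and otherwise behaves like plain UTF-8.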

When the BOM is useful:

On Windows, many programs (especially older versions of Notepad and other legacy Microsoft tools) rely on the BOM to auto-detect UTF-8 encoding. Without it, they may misinterpret the file as a legacy code page such as Windows-1252. If you are creating files specifically for consumption by such tools, including the BOM may prevent encoding detection failures.
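When a file does need the BOM, it can be requested explicitly rather than left to an editor's defaults. A minimal sketch in Python, assuming a throwaway file name (`report.csv`) and sample data chosen only for illustration: the `utf-8-sig` codec writes EF BB BF once at the start of the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "report.csv")  # hypothetical file

# "utf-8-sig" prepends the BOM on the first write; plain "utf-8" would not.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write("name,city\nZoë,Kraków\n")

with open(path, "rb") as f:
    head = f.read(3)

print(head.hex(" ").upper())

# Reading with the same codec removes the BOM again.
with open(path, encoding="utf-8-sig", newline="") as f:
    content = f.read()
```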

Detecting and removing the BOM:

In a hex editor, the BOM is immediately visible at offset 0: EF BB BF. If the file starts with these three bytes and you are experiencing any of the above issues, simply delete them. On the command line, you can detect and strip the BOM using sed, awk, or specialized tools. Many modern editors offer an explicit option to save "UTF-8 without BOM."
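As an alternative to sed or awk, the check and removal can be scripted. The following is a minimal sketch: `strip_utf8_bom` and `strip_bom_in_place` are hypothetical helper names, and the in-place version reads the whole file into memory, which is fine for source files and configs but not for very large data.

```python
import codecs
import os
import tempfile


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading EF BB BF, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its BOM. Returns True if a BOM was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned == data:
        return False  # no BOM, file left untouched
    with open(path, "wb") as f:
        f.write(cleaned)
    return True


# Demo on a temporary script that starts with a BOM.
demo_path = os.path.join(tempfile.mkdtemp(), "run.sh")
with open(demo_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"#!/bin/bash\necho hi\n")

removed_first = strip_bom_in_place(demo_path)
removed_second = strip_bom_in_place(demo_path)
with open(demo_path, "rb") as f:
    final_bytes = f.read()
```

Only the bytes at offset 0 are touched. A U+FEFF that appears later in the file, for example after concatenation, is left alone and has to be handled separately.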

The Unicode Consortium's recommendation:

The Unicode Standard neither requires nor recommends a BOM for UTF-8. It permits one but does not consider it necessary, and it notes that applications which do not interpret a leading U+FEFF as an encoding signature may treat it as a zero-width no-break space character.

Use Case

Developers encounter BOM issues when debugging PHP 'headers already sent' errors, resolving shell script execution failures, or troubleshooting JSON parsing problems in files created by Windows applications.


Related Topics