Byte Order Mark (BOM) and Its Effect on String Length

Learn how the Byte Order Mark (U+FEFF) affects string length, why it appears at the start of files, and how to detect and handle it in your applications.

Detailed Explanation

The Byte Order Mark (BOM)

The Byte Order Mark (U+FEFF) is a special Unicode character that can appear at the beginning of a text file to indicate the encoding and byte order. It is invisible in most text editors but counts toward string length.

BOM Representation in Different Encodings

Encoding     BOM size   Bytes (hex)
UTF-8        3 bytes    EF BB BF
UTF-16 BE    2 bytes    FE FF
UTF-16 LE    2 bytes    FF FE
UTF-32 BE    4 bytes    00 00 FE FF
UTF-32 LE    4 bytes    FF FE 00 00
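The byte patterns in the table can be matched against the first bytes of a file to guess its encoding. The following is a minimal sketch, not a full encoding detector; detectBomEncoding and BOM_SIGNATURES are hypothetical names introduced here for illustration.

```javascript
// Hypothetical lookup table built from the BOM byte sequences listed above.
// Longer signatures come first, because the UTF-32 LE BOM (FF FE 00 00)
// begins with the same two bytes as the UTF-16 LE BOM (FF FE).
const BOM_SIGNATURES = [
  { encoding: "UTF-32 BE", bytes: [0x00, 0x00, 0xfe, 0xff] },
  { encoding: "UTF-32 LE", bytes: [0xff, 0xfe, 0x00, 0x00] },
  { encoding: "UTF-8", bytes: [0xef, 0xbb, 0xbf] },
  { encoding: "UTF-16 BE", bytes: [0xfe, 0xff] },
  { encoding: "UTF-16 LE", bytes: [0xff, 0xfe] },
];

// Returns { encoding, bomLength } for the first matching signature,
// or null when the buffer does not start with a known BOM.
function detectBomEncoding(bytes) {
  for (const { encoding, bytes: signature } of BOM_SIGNATURES) {
    const matches =
      bytes.length >= signature.length &&
      signature.every((value, index) => bytes[index] === value);
    if (matches) {
      return { encoding, bomLength: signature.length };
    }
  }
  return null;
}

const sample = new Uint8Array([0xef, 0xbb, 0xbf, 0x48, 0x69]); // BOM + "Hi"
const detected = detectBomEncoding(sample);
```

One caveat with this ordering: a UTF-16 LE file whose first character after the BOM is U+0000 would be reported as UTF-32 LE. That case is rare in practice, but a byte-level check like this is a heuristic rather than proof of the encoding.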

Impact on String Length

When you read a file with a BOM and the BOM is not stripped:

// File content: BOM + "Hello"
const text = "\uFEFFHello";
text.length;           // 6 (not 5!)
text.charCodeAt(0);    // 65279 (U+FEFF)
text[0] === "\uFEFF";  // true

The BOM adds:

  • 1 to .length (UTF-16 code units) and to the code point count
  • 3 bytes to UTF-8 size
  • 2 bytes to UTF-16 size
  • 4 bytes to UTF-32 size
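These figures can be checked by measuring the same text with and without the BOM. The sketch below assumes an environment that provides the standard TextEncoder (modern browsers and Node.js); the measure helper is a hypothetical name, and the UTF-16 and UTF-32 sizes are computed arithmetically rather than by actually encoding.

```javascript
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// Hypothetical helper: report the size of a string in several units.
function measure(text) {
  const codePoints = [...text].length; // string iteration yields code points
  return {
    codeUnits: text.length,              // UTF-16 code units
    codePoints,
    utf8Bytes: encoder.encode(text).length,
    utf16Bytes: text.length * 2,         // 2 bytes per UTF-16 code unit
    utf32Bytes: codePoints * 4,          // 4 bytes per code point
  };
}

const withBom = measure("\uFEFFHello");
const withoutBom = measure("Hello");

// Difference attributable to the BOM alone.
const bomCost = Object.fromEntries(
  Object.keys(withBom).map((key) => [key, withBom[key] - withoutBom[key]])
);
// { codeUnits: 1, codePoints: 1, utf8Bytes: 3, utf16Bytes: 2, utf32Bytes: 4 }
```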

Common BOM Problems

  1. JSON parsing failure: JSON.parse('\uFEFF{"a":1}') throws a SyntaxError because U+FEFF is not valid JSON whitespace
  2. CSV first column corruption: The first field in a CSV file may start with the invisible BOM
  3. HTTP header issues: A BOM before the opening <?php tag is sent to the client as output, which causes "headers already sent" errors
  4. String comparison failure: "\uFEFFhello" !== "hello"
  5. Hash mismatches: Same visible content produces different hashes with and without BOM
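The JSON and CSV problems in the list above are easy to reproduce. The sketch below uses made-up sample data (the payload and csv values are illustrative only) and a deliberately naive CSV split that ignores quoting.

```javascript
// 1. JSON: a leading BOM makes otherwise valid JSON fail to parse.
const payload = '\uFEFF{"name":"Ada"}';

let parseFailed = false;
try {
  JSON.parse(payload);
} catch (err) {
  parseFailed = err instanceof SyntaxError;
}

// Stripping the BOM first lets the same payload parse normally.
const parsed = JSON.parse(payload.replace(/^\uFEFF/, ""));

// 2. CSV: the BOM becomes part of the first header name.
const csv = "\uFEFFid,name\n1,Ada";
const [headerLine] = csv.split("\n");
const headers = headerLine.split(","); // naive split, no quote handling

const firstHeaderIsId = headers[0] === "id"; // false: it is "\uFEFFid"
const firstHeaderLength = headers[0].length; // 3, although only "id" is visible
```

A lookup such as row["id"] would therefore miss the first column even though the header looks correct when printed.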

Detection and Removal

// Detect BOM
const hasBOM = str.charCodeAt(0) === 0xFEFF;

// Remove BOM
const clean = str.replace(/^\uFEFF/, "");
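When the input arrives as bytes rather than as a string, the decoder itself may already handle the BOM. The sketch below assumes the standard TextDecoder, which removes a leading BOM unless ignoreBOM is set to true; stripBom is a hypothetical helper equivalent to the regex above.

```javascript
const bytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x48, 0x69]); // UTF-8 BOM + "Hi"

// Default behaviour: the BOM is consumed during decoding.
const stripped = new TextDecoder("utf-8").decode(bytes);

// With ignoreBOM: true the BOM is kept as U+FEFF in the result.
const kept = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);

// Hypothetical helper for strings that were decoded elsewhere.
// Safe on empty strings: charCodeAt(0) returns NaN, which never equals 0xFEFF.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const cleaned = stripBom(kept);
```

Not every API strips the BOM while decoding, so applying a helper like stripBom once at the point where text enters the application is a reasonable safeguard.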

When BOM Is Useful

  • UTF-16 files: BOM indicates byte order (big-endian vs little-endian), which is essential
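  • Windows Notepad: Older versions (before Windows 10 version 1903) save UTF-8 files with a BOM by default, a common source of problems; newer versions default to UTF-8 without a BOM
  • Excel CSV: Excel relies on the UTF-8 BOM to recognize a CSV file as UTF-8; without it, non-ASCII characters may be read in the system's legacy code page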
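For the Excel case, the BOM has to be added deliberately when the file is written. The following is a minimal sketch assuming the standard TextEncoder; toExcelCsvBytes is a hypothetical name, and the sample row is invented for illustration.

```javascript
// Hypothetical helper: encode CSV text as UTF-8 with exactly one leading BOM.
function toExcelCsvBytes(csvText) {
  const body = csvText.startsWith("\uFEFF") ? csvText : "\uFEFF" + csvText;
  return new TextEncoder().encode(body); // U+FEFF is encoded as EF BB BF
}

const csvBytes = toExcelCsvBytes("name,city\nZoë,Kraków\n");
const firstThree = Array.from(csvBytes.slice(0, 3)); // [0xEF, 0xBB, 0xBF]
```

The startsWith check keeps the helper idempotent, so text that already carries a BOM does not end up with two.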

Best Practice

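For UTF-8 files on the web, do not use a BOM. The Unicode Standard neither requires nor recommends a BOM for UTF-8, because UTF-8 has no byte-order ambiguity. If you receive files with a BOM, strip it once, when the text is first read, so that later parsing, comparison, and hashing all see the same content.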

Use Case

When processing text files from different sources (Windows Notepad, Excel exports, legacy systems), detecting and handling the Byte Order Mark prevents parsing errors, hash mismatches, and invisible character issues in data pipelines.
