Unicode Normalization in File Systems

Learn how different operating systems and file systems handle Unicode normalization for filenames. Understand the macOS NFD issue and cross-platform filename compatibility.

Detailed Explanation

File System Normalization

Different operating systems handle Unicode filenames differently, which is a common source of cross-platform bugs.

macOS (APFS / HFS+)

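macOS file systems are normalization-aware, but HFS+ and APFS behave differently:

  • HFS+: Stores every filename in decomposed form (Apple's own variant of NFD, which leaves some character ranges undecomposed)
  • APFS: Preserves the name in the form it was given, but compares names in a normalization-insensitive way, so the NFC and NFD spellings refer to the same file

This means that if you create a file named café.txt (NFC) on an HFS+ volume, the file system stores it as café.txt (NFD), and reading the filename back gives you the decomposed form. On APFS the name comes back in whichever form it was created with, so code should not assume either form.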
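To make the difference between the two forms concrete, here is a minimal sketch that builds both spellings of café.txt from escape sequences and compares them. The variable names and the codePoints helper are illustrative only and are not part of any file system API.

```javascript
// Build both forms with escapes so the source file's own encoding
// cannot silently change which form is used.
const nfc = 'caf\u00E9.txt';   // é as a single code point (U+00E9)
const nfd = 'cafe\u0301.txt';  // e (U+0065) + combining acute accent (U+0301)

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(s) {
  return Array.from(s, ch =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

console.log(nfc === nfd);                   // false
console.log(nfc.length, nfd.length);        // 8 9
console.log(nfc.normalize('NFD') === nfd);  // true
console.log(nfd.normalize('NFC') === nfc);  // true
console.log(codePoints(nfd).slice(3, 5));   // [ 'U+0065', 'U+0301' ]
```

Both strings render identically, which is why this class of bug is hard to spot by eye.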

Windows (NTFS)

NTFS stores filenames as sequences of UTF-16 code units, exactly as provided; no normalization is applied. This means NFC and NFD forms of the same name can coexist as separate files:

café.txt  (NFC)  →  Stored as-is
café.txt (NFD)  →  Stored as-is (different file!)

Linux (ext4, XFS, Btrfs)

Like NTFS, most Linux file systems apply no normalization. They treat a filename as an opaque sequence of bytes (conventionally UTF-8), so different Unicode representations of the same name create different files.
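Because NTFS and most Linux file systems allow both spellings to coexist, it can be useful to scan a listing for names that differ only by normalization. The following is a rough sketch under that assumption; findNormalizationCollisions and the sample listing are hypothetical names chosen for illustration.

```javascript
// Group names by their NFC form and keep only the groups that contain
// more than one distinct spelling. `names` is any list of filenames.
function findNormalizationCollisions(names) {
  const groups = new Map();
  for (const name of names) {
    const key = name.normalize('NFC');
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(name);
  }
  return [...groups.values()]
    .filter(spellings => spellings.size > 1)
    .map(spellings => [...spellings]);
}

// Simulated directory listing with an NFC and an NFD spelling of one name.
const sampleListing = ['caf\u00E9.txt', 'cafe\u0301.txt', 'notes.txt'];
const collisions = findNormalizationCollisions(sampleListing);
console.log(collisions.length);     // 1
console.log(collisions[0].length);  // 2
```

In a real tool the input would come from a directory listing such as fs.readdirSync(dir) rather than a hard-coded array.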

Cross-Platform Pitfalls

A common bug pattern:

  1. Developer on macOS creates résumé.pdf
  2. macOS (on an HFS+ volume) stores it as résumé.pdf (NFD)
  3. File is committed to Git under the decomposed name (for example, when core.precomposeUnicode is off)
  4. Developer on Windows checks out the file
  5. A script searching for résumé.pdf (NFC) fails to find it

Best Practice

import fs from 'fs';

// Directory to search (example value; substitute your own path)
const dir = '.';

// Normalize both the wanted name and the directory entries to the
// same form (NFC here) before comparing
const target = 'résumé.pdf'.normalize('NFC');
const files = fs.readdirSync(dir).map(f => f.normalize('NFC'));
const found = files.includes(target);
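The check above reports whether a matching file exists, but opening that file on a normalization-sensitive file system requires the name exactly as it is stored. A minimal sketch of a lookup that compares in NFC yet returns the original spelling could look like this; findOnDiskName and the sample data are hypothetical.

```javascript
// Return the entry from `names` whose NFC form matches `wanted`, or
// undefined if there is none. The returned string is the original,
// un-normalized name, which is the one to pass to file system calls.
function findOnDiskName(names, wanted) {
  const target = wanted.normalize('NFC');
  return names.find(name => name.normalize('NFC') === target);
}

// Simulated directory listing that contains the NFD spelling.
const dirListing = ['re\u0301sume\u0301.pdf', 'cover-letter.pdf'];
const match = findOnDiskName(dirListing, 'r\u00E9sum\u00E9.pdf');
console.log(match === dirListing[0]);                 // true
console.log(findOnDiskName(dirListing, 'other.pdf')); // undefined
```

In practice `names` would be the result of fs.readdirSync(dir), and the returned value would be joined with dir before being opened.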

Git and Normalization

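Git has a core.precomposeUnicode setting that is used only by Git on macOS. When it is true (git init and git clone normally enable it on file systems that decompose names), Git converts the NFD filenames returned by macOS back to NFC before recording them, which helps mitigate cross-platform issues.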
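Repositories created before this setting was enabled may still track decomposed names. As a rough sketch, the function below takes the output of git ls-files -z (NUL-terminated paths) and reports the tracked paths that are not in NFC. The function name and the sample string are assumptions made for illustration; only the parsing step is shown.

```javascript
// Given the text output of `git ls-files -z`, return the tracked
// paths that are not already in NFC form.
function findNonNfcPaths(lsFilesOutput) {
  return lsFilesOutput
    .split('\0')
    .filter(p => p.length > 0)
    .filter(p => p !== p.normalize('NFC'));
}

// Simulated output: one NFD path, two paths already in NFC.
const sampleOutput =
  'README.md\0docs/re\u0301sume\u0301.pdf\0src/caf\u00E9.js\0';
const offenders = findNonNfcPaths(sampleOutput);
console.log(offenders.length);  // 1
```

The real input could be obtained with child_process.execFileSync('git', ['ls-files', '-z'], { encoding: 'utf8' }) run inside the repository.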

Use Case

Filename normalization matters most to developers building cross-platform applications, file synchronization tools (like Dropbox or rsync), backup systems, and build tools that must handle filenames consistently across macOS, Windows, and Linux.
