Unicode Normalization in File Systems
Learn how different operating systems and file systems handle Unicode normalization for filenames. Understand the macOS NFD issue and cross-platform filename compatibility.
Detailed Explanation
File System Normalization
Different operating systems handle Unicode filenames differently, which is a common source of cross-platform bugs.
macOS (APFS / HFS+)
macOS file systems normalize filenames:
- HFS+: Forces NFD normalization (actually a variant called "NFD with Apple modifications")
- APFS: Preserves the original form but performs NFD-like normalization for comparison
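This means that if you create a file named café.txt (NFC) on an HFS+ volume, the file system stores it as café.txt (NFD), and reading the directory returns the decomposed form. On APFS the name is kept in the form in which it was created, but both forms refer to the same file, so you cannot rely on getting back the exact string your code expects.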
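To make the two forms concrete, here is a minimal JavaScript sketch that builds both spellings of café.txt with explicit escapes, so the encoding of the source file cannot blur the difference. The variable and helper names are illustrative only.

```javascript
// Two spellings of the same visible name, written with escapes so the
// difference survives copy and paste.
const nfc = 'caf\u00e9.txt';  // é as one precomposed code point (U+00E9)
const nfd = 'cafe\u0301.txt'; // e followed by a combining acute accent (U+0301)

// List the code points of a string as U+XXXX labels.
const codePoints = (s) =>
  Array.from(s, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(nfc === nfd);                  // false: different code point sequences
console.log(nfc.length, nfd.length);       // 8 9
console.log(codePoints(nfd).slice(3, 5));  // [ 'U+0065', 'U+0301' ]
console.log(nfc.normalize('NFD') === nfd); // true
console.log(nfd.normalize('NFC') === nfc); // true
```

The two strings render identically but never compare equal with === until both are normalized to the same form.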
Windows (NTFS)
NTFS stores filenames exactly as provided, as sequences of UTF-16 code units, and applies no normalization. This means NFC and NFD forms of the same name can coexist as separate files:
café.txt (NFC) → Stored as-is
café.txt (NFD) → Stored as-is (different file!)
Linux (ext4, XFS, Btrfs)
Most Linux file systems also apply no normalization: they store filenames as raw bytes (conventionally UTF-8), so different Unicode representations of the same name create different files.
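Because NTFS and most Linux file systems accept both forms side by side, a directory can end up holding names that look identical. The following sketch, which assumes UTF-8 encoded names, groups a list of names by their NFC form and reports groups with more than one member. The function names findNormalizationCollisions and utf8Hex are hypothetical helpers, not part of any library.

```javascript
// Group names by their NFC form and return the groups that contain
// more than one distinct spelling of the same visible name.
function findNormalizationCollisions(names) {
  const groups = new Map();
  for (const name of names) {
    const key = name.normalize('NFC');
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(name);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

// Show the UTF-8 bytes of a name as space-separated hex.
const utf8Hex = (s) =>
  Array.from(new TextEncoder().encode(s), (b) =>
    b.toString(16).padStart(2, '0')).join(' ');

// Example listing, as it might come back from a directory read on Linux.
const listing = ['caf\u00e9.txt', 'cafe\u0301.txt', 'notes.txt'];
const collisions = findNormalizationCollisions(listing);

console.log(collisions.length); // 1
for (const name of collisions[0]) console.log(utf8Hex(name));
// 63 61 66 c3 a9 2e 74 78 74
// 63 61 66 65 cc 81 2e 74 78 74
```

A check like this is useful before copying a directory tree to macOS, where the two names would resolve to a single file.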
Cross-Platform Pitfalls
A common bug pattern:
- Developer on macOS creates résumé.pdf
- macOS stores it as résumé.pdf (NFD)
- File is committed to Git
- Developer on Windows checks out the file
- A script searching for résumé.pdf (NFC) fails to find it
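The failure in the last step can be reproduced without touching a file system. In this sketch, checkedOut is a hypothetical directory listing whose first entry arrived in decomposed form, and wanted is the precomposed literal a script would typically contain.

```javascript
// Hypothetical directory listing after checkout: the name arrived decomposed (NFD).
const checkedOut = ['re\u0301sume\u0301.pdf', 'README.md'];

// The literal in the script, as typically typed on Windows or Linux (NFC).
const wanted = 'r\u00e9sum\u00e9.pdf';

// Array.prototype.includes compares strings code unit by code unit,
// so the visually identical name is not found.
console.log(checkedOut.includes(wanted)); // false

// The length difference is a quick hint that normalization is involved:
// each é takes two code units in NFD and one in NFC.
console.log(checkedOut[0].length, wanted.length); // 12 10
```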
Best Practice
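import fs from 'fs';
// Directory to search; '.' (the current directory) is only a placeholder
const dir = '.';
// Always normalize both sides to the same form before comparing
const target = 'résumé.pdf'.normalize('NFC');
const files = fs.readdirSync(dir).map(f => f.normalize('NFC'));
// true if any entry matches the target, regardless of how it is stored
const found = files.includes(target);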
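Knowing that a file exists is often not enough: on NTFS and most Linux file systems, opening it requires the name exactly as stored. A possible refinement, sketched here with a hypothetical helper called resolveName, is to compare normalized forms but return the original entry.

```javascript
// Return the entry that is canonically equivalent to `wanted`, in the form
// in which it appears in `entries`, or undefined if nothing matches.
function resolveName(entries, wanted) {
  const key = wanted.normalize('NFC');
  return entries.find((entry) => entry.normalize('NFC') === key);
}

// Example input; in practice this could be the result of fs.readdirSync(dir).
const entries = ['re\u0301sume\u0301.pdf', 'README.md'];

const actual = resolveName(entries, 'r\u00e9sum\u00e9.pdf');
console.log(actual === entries[0]);               // true: the stored (NFD) name
console.log(resolveName(entries, 'missing.txt')); // undefined
```

Passing the returned string to file APIs avoids a second mismatch on file systems that do not normalize at lookup time.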
Git and Normalization
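Git has a core.precomposeUnicode setting, used only by Git on macOS and normally enabled there. When it is true, Git converts the NFD filenames reported by macOS back to NFC before recording them, which helps mitigate cross-platform issues. It does not change names that were already committed in decomposed form.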
Use Case
Critical for developers building cross-platform applications, file synchronization tools (like Dropbox or rsync), backup systems, and build tools that must handle filenames consistently across macOS, Windows, and Linux.