Detecting and Fixing Duplicate Keys in YAML
Learn how duplicate keys in YAML cause silent data loss. Understand how different parsers handle duplicates, how to detect them with validators, and best practices for avoiding this common pitfall.
Detailed Explanation
Duplicate Keys in YAML
Duplicate keys in YAML mappings are one of the most dangerous YAML pitfalls. The YAML specification requires mapping keys to be unique, yet many parsers accept duplicates without complaint and let the last occurrence overwrite the earlier ones, so data is lost silently.
The Problem
```yaml
database:
  host: primary.db.example.com
  port: 5432
  host: replica.db.example.com # Silently overwrites first 'host'
```
After parsing, the host value will be replica.db.example.com, and lenient parsers such as PyYAML raise no error or warning.
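To reproduce this, here is a minimal sketch that assumes the third-party PyYAML package is installed (pip install pyyaml). It loads the example above with yaml.safe_load and shows which value survives.

```python
import yaml  # PyYAML, a third-party package: pip install pyyaml

DOC = """\
database:
  host: primary.db.example.com
  port: 5432
  host: replica.db.example.com  # Silently overwrites first 'host'
"""

config = yaml.safe_load(DOC)
# No exception and no warning: the second 'host' replaced the first one.
print(config["database"]["host"])  # replica.db.example.com
print(config["database"])  # {'host': 'replica.db.example.com', 'port': 5432}
```

The first host value is gone from the result, and nothing in the output suggests it ever existed.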
Why Duplicates Occur
- Large files — In files with hundreds of lines, it is easy to add a key without realizing it already exists elsewhere in the same mapping
- Merge conflicts — Git merges can produce duplicate keys when both branches modify the same YAML section
- Copy-paste errors — Duplicating a block and forgetting to update key names
- Team collaboration — Multiple people adding keys to the same section without coordination
YAML Spec Versions
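- YAML 1.1 — Mapping keys are required to be unique, but the specification lets a processor continue past a duplicate and issue a warning instead of failing
- YAML 1.2 — Mapping keys must be unique; a document with duplicate keys is invalid
Parser behavior varies in practice. PyYAML and SnakeYAML (with its default settings) silently accept duplicates and keep the last value, while js-yaml rejects them by default unless its json option is enabled.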
Detection Methods
- Strict-mode parsers — Some parsers offer a strict mode that rejects duplicates
- Linters — Tools like yamllint have a key-duplicates rule (enabled in its default configuration) that reports repeated keys
- Online validators — YAML validators can flag duplicate keys with their line numbers
- IDE plugins — VS Code YAML extensions highlight duplicate keys in real time
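As an illustration of the strict-mode approach with a parser that does not reject duplicates on its own, the sketch below subclasses PyYAML's SafeLoader so that a repeated key raises an error instead of overwriting the earlier value. It assumes PyYAML is installed; UniqueKeyLoader and strict_load are hypothetical names chosen for this example, not part of PyYAML.

```python
from collections.abc import Hashable

import yaml  # PyYAML, a third-party package: pip install pyyaml
from yaml.constructor import ConstructorError

MERGE_TAG = "tag:yaml.org,2002:merge"


class UniqueKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that rejects a key repeated within one mapping."""

    def construct_mapping(self, node, deep=False):
        seen = {}  # key -> 0-based line of its first occurrence
        for key_node, _value_node in node.value:
            # '<<' merge keys are expanded later by the base class; skip them here.
            if key_node.tag == MERGE_TAG:
                continue
            key = self.construct_object(key_node, deep=deep)
            if not isinstance(key, Hashable):
                continue  # the base class reports unhashable keys itself
            if key in seen:
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r} (first defined on line {seen[key] + 1})",
                    key_node.start_mark,
                )
            seen[key] = key_node.start_mark.line
        # No duplicates found: let SafeLoader build the dict as usual.
        return super().construct_mapping(node, deep=deep)


def strict_load(text):
    """Parse one YAML document, raising ConstructorError on duplicate keys."""
    return yaml.load(text, Loader=UniqueKeyLoader)
```

Because the check runs once per mapping, the same key in two different mappings still loads normally; only a repeat inside one mapping is rejected, and the error message points at the offending line.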
Prevention Strategies
- Use a YAML linter in your CI pipeline that rejects duplicate keys
- Configure your editor to warn about duplicates
- For large configuration files, consider splitting into multiple smaller files
- Use anchors and aliases instead of duplicating entire blocks
- Always review YAML diffs carefully during merge conflict resolution
Nested Duplicates
Duplicate keys are only a problem within a single mapping. The same key name may appear in different mappings, whether they sit at different nesting levels or are siblings at the same level:
```yaml
# This is fine: each 'host' belongs to a different mapping
production:
  host: prod.example.com
staging:
  host: staging.example.com
```
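To illustrate this scoping rule, the following standard-library-only sketch keeps one set of seen keys per open mapping and uses indentation to decide when a mapping ends. find_duplicate_keys is a hypothetical helper and a line-based approximation, not a YAML parser: it ignores flow collections, quoted keys, and multi-line block scalars, so a real linter or strict parser remains the reliable check.

```python
import re

# Simplified pattern for one block-mapping entry: leading spaces, an unquoted key,
# then a colon followed by whitespace or end of line.
KEY_LINE = re.compile(r"^(?P<indent> *)(?P<key>[^\s#'\"-][^#]*?):(?:\s|$)")


def find_duplicate_keys(text):
    """Return (line, key, first_line) for each key repeated within one mapping.

    Line-based approximation: handles indented block mappings, "- " sequence items
    and "---" document separators only.
    """
    duplicates = []
    stack = []  # open mappings as (indent, {key: first line}), innermost last
    for line_no, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if stripped.startswith("---"):
            stack.clear()  # a new document starts with no open mappings
            continue
        if not stripped or stripped.startswith("#"):
            continue
        line = raw
        if stripped == "-" or stripped.startswith("- "):
            # A sequence item starts a fresh mapping where its content begins,
            # so close any mapping left open by the previous item.
            dash_col = len(raw) - len(raw.lstrip(" "))
            content = stripped[1:].lstrip(" ")
            item_indent = dash_col + len(stripped) - len(content)
            while stack and stack[-1][0] >= item_indent:
                stack.pop()
            line = " " * item_indent + content
        match = KEY_LINE.match(line)
        if match is None:
            continue
        indent = len(match.group("indent"))
        key = match.group("key").rstrip()
        # Dedenting closes every mapping nested deeper than this key.
        while stack and stack[-1][0] > indent:
            stack.pop()
        # A deeper indent than the innermost open mapping starts a new mapping.
        if not stack or stack[-1][0] < indent:
            stack.append((indent, {}))
        seen = stack[-1][1]
        if key in seen:
            duplicates.append((line_no, key, seen[key]))
        else:
            seen[key] = line_no
    return duplicates
```

Run against the two examples on this page, it reports [(4, 'host', 2)] for the database snippet and an empty list for the production/staging one, because each host there is recorded in a separate mapping.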
Use Case
Duplicate key detection is essential for teams managing large configuration files. In a 500-line Kubernetes manifest or Ansible inventory, a duplicate key from a bad merge can silently change application behavior. Integrating a duplicate key check into CI prevents production incidents caused by configuration errors that parsers silently accept.