Detecting and Fixing Duplicate Keys in YAML

Learn how duplicate keys in YAML cause silent data loss. Understand how different parsers handle duplicates, how to detect them with validators, and best practices for avoiding this common pitfall.

Detailed Explanation

Duplicate Keys in YAML

Duplicate keys in YAML mappings are one of the most dangerous YAML pitfalls. Both YAML 1.1 and YAML 1.2 require the keys of a mapping to be unique, yet many parsers accept duplicates anyway and keep only one value (usually the last occurrence), which causes silent data loss.

The Problem

database:
  host: primary.db.example.com
  port: 5432
  host: replica.db.example.com    # Silently overwrites first 'host'

After parsing with a lenient parser such as PyYAML, the host value is replica.db.example.com, and no error or warning is raised.
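
The overwrite can be reproduced in a few lines. This is a minimal sketch that assumes PyYAML is installed (`pip install pyyaml`); other parsers may behave differently.

```python
import yaml  # PyYAML, assumed to be installed

DOC = """\
database:
  host: primary.db.example.com
  port: 5432
  host: replica.db.example.com
"""

# safe_load builds a plain dict, so the second 'host' replaces the first.
config = yaml.safe_load(DOC)

# The first 'host' value is gone; only the replica remains.
print(config["database"]["host"])
```

No exception is raised and nothing is logged, so the lost value is only noticed when the application connects to the wrong host.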

Why Duplicates Occur

  1. Large files — In files with hundreds of lines, it is easy to add a key without realizing it already exists elsewhere in the same mapping
  2. Merge conflicts — Git merges can produce duplicate keys when both branches modify the same YAML section
  3. Copy-paste errors — Duplicating a block and forgetting to update key names
  4. Team collaboration — Multiple people adding keys to the same section without coordination

YAML Spec Versions

  • YAML 1.1: Duplicate keys are an error, but a processor may continue with a warning, ignoring the second key: value pair
  • YAML 1.2: Mapping keys must be unique; a document with duplicate keys is invalid

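In practice, behavior depends on the parser rather than on the spec version. PyYAML silently accepts duplicates and keeps the last value, and SnakeYAML allows duplicates by default. js-yaml, by contrast, rejects duplicate keys with an error unless its json option is enabled.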

Detection Methods

  1. Strict-mode parsers: Some parsers can be configured to reject duplicates, for example SnakeYAML via setAllowDuplicateKeys(false) on LoaderOptions
  2. Linters: Tools like yamllint have a key-duplicates rule
  3. Online validators: YAML validators can flag duplicate keys with their line numbers
  4. IDE plugins: VS Code YAML extensions highlight duplicate keys in real time
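
To illustrate how a validator can report duplicates with line numbers, here is a rough sketch that walks PyYAML's node tree instead of the final dictionaries. The helper name `find_duplicate_keys` is hypothetical, PyYAML is assumed to be installed, and only scalar keys are compared (by resolved tag and raw text), so `16` and `0x10` would not be treated as the same key.

```python
import yaml  # PyYAML, assumed to be installed
from yaml.nodes import MappingNode, ScalarNode, SequenceNode


def find_duplicate_keys(text):
    """Return (key, line) pairs for scalar keys repeated within one mapping.

    Hypothetical helper for illustration; line numbers are 1-based.
    """
    duplicates = []
    visited = set()  # ids of nodes already walked (aliases can repeat nodes)

    def walk(node):
        if node is None or id(node) in visited:
            return
        visited.add(id(node))
        if isinstance(node, MappingNode):
            seen = set()  # keys seen so far in THIS mapping only
            for key_node, value_node in node.value:
                if isinstance(key_node, ScalarNode):
                    key = (key_node.tag, key_node.value)
                    if key in seen:
                        # start_mark.line is 0-based, so add 1
                        duplicates.append(
                            (key_node.value, key_node.start_mark.line + 1)
                        )
                    seen.add(key)
                walk(value_node)
        elif isinstance(node, SequenceNode):
            for item in node.value:
                walk(item)

    # compose_all yields one node tree per document in the stream
    for root in yaml.compose_all(text, Loader=yaml.SafeLoader):
        visited.clear()
        walk(root)
    return duplicates
```

Called on the database example from "The Problem", this returns a single entry for the second host key on line 4. Because the `seen` set is created per mapping, the same key name at different nesting levels is not reported.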

Prevention Strategies

  • Use a YAML linter in your CI pipeline that rejects duplicate keys
  • Configure your editor to warn about duplicates
  • For large configuration files, consider splitting into multiple smaller files
  • Use anchors and aliases instead of duplicating entire blocks
  • Always review YAML diffs carefully during merge conflict resolution
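
For the CI check, one possible approach is a loader that refuses duplicates outright. The sketch below assumes PyYAML; `UniqueKeyLoader` and `check_files` are hypothetical names, not part of PyYAML. Merge keys (`<<`) are skipped deliberately so that overriding a merged value with an explicit key is still allowed.

```python
import sys

import yaml  # PyYAML, assumed to be installed
from yaml.constructor import ConstructorError

MERGE_TAG = "tag:yaml.org,2002:merge"


class UniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that raises instead of letting a later key win."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            if key_node.tag == MERGE_TAG:
                continue  # '<<' merge keys are expanded by SafeLoader itself
            key = self.construct_object(key_node, deep=True)
            try:
                is_duplicate = key in seen
            except TypeError:
                continue  # unhashable key; SafeLoader reports it on its own
            if is_duplicate:
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r}", key_node.start_mark,
                )
            seen.add(key)
        return super().construct_mapping(node, deep=deep)


def check_files(paths):
    """Return how many files fail to load; print one message per failure."""
    failures = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            try:
                # load_all also covers multi-document files
                for _document in yaml.load_all(handle, Loader=UniqueKeyLoader):
                    pass
            except yaml.YAMLError as exc:
                failures += 1
                print(f"{path}: {exc}", file=sys.stderr)
    return failures
```

A CI step could call `check_files` with the changed YAML paths and fail the build when the result is non-zero. Running a linter such as yamllint in the same step is an alternative that needs no custom code.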

Nested Duplicates

Duplicate keys are only an issue within the same mapping level. The same key name at different nesting levels is perfectly valid:

# This is fine - different mapping levels
production:
  host: prod.example.com
staging:
  host: staging.example.com

Use Case

Duplicate key detection is essential for teams managing large configuration files. In a 500-line Kubernetes manifest or Ansible inventory, a duplicate key from a bad merge can silently change application behavior. Integrating a duplicate key check into CI prevents production incidents caused by configuration errors that parsers silently accept.
