Regex to Detect Duplicate Consecutive Words
Detect repeated consecutive words like 'the the' or 'is is' in text using backreferences. Useful for proofreading and grammar checking.
Regular Expression
/\b(\w+)\s+\1\b/gi
Token Breakdown
| Token | Description |
|---|---|
| \b | Word boundary assertion |
| ( | Start of capturing group |
| \w | Matches any word character (letter, digit, underscore) |
| + | Matches the preceding element one or more times (greedy) |
| ) | End of group |
| \s | Matches any whitespace character (space, tab, newline) |
| + | Matches the preceding element one or more times (greedy) |
| \1 | Backreference to capturing group 1 (matches the same text the group captured) |
| \b | Word boundary assertion |
Detailed Explanation
This regex detects duplicate consecutive words in text, a common typographical error. Here is the token-by-token breakdown:
\b — A word boundary assertion at the start, ensuring the match begins at the start of a complete word and does not match partial words within longer words.
(\w+) — Capturing group 1 matches one or more word characters (letters, digits, underscores). This captures the first occurrence of the potentially duplicated word.
\s+ — Matches one or more whitespace characters between the two words. This handles single spaces, multiple spaces, tabs, and even newlines between the duplicated words.
\1 — A backreference to capturing group 1, matching the exact same text that was captured by the first group. This is the key mechanism that ensures the second word is identical to the first. The regex engine compares the actual matched text, not just the pattern.
\b — A word boundary assertion at the end, ensuring the match ends at a complete word boundary. This prevents false matches where the duplicated text is part of a longer word.
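The role of the two word boundaries can be checked by removing each one in turn. The sketch below uses Python's re module as a stand-in for the JavaScript-style literal above; the variable names are illustrative and not part of the original page.

```python
import re

# Full pattern, plus two weakened variants for comparison.
full = re.compile(r"\b(\w+)\s+\1\b")
no_leading = re.compile(r"(\w+)\s+\1\b")   # leading \b removed
no_trailing = re.compile(r"\b(\w+)\s+\1")  # trailing \b removed

# Without the leading boundary, the tail of "this" pairs up with "is".
leading_hit = no_leading.search("this is").group(0)
print(leading_hit)                # is is
print(full.search("this is"))     # None

# Without the trailing boundary, "the" pairs up with the start of "then".
trailing_hit = no_trailing.search("the then").group(0)
print(trailing_hit)               # the the
print(full.search("the then"))    # None
```

Both false positives disappear once the boundaries are restored, which is why the pattern anchors both ends.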
The g flag enables global matching to find all duplicate word pairs in the text. The i flag makes matching case-insensitive, and this applies to the backreference comparison as well, so 'The the' and 'THE the' are also detected as duplicates.
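As a rough illustration of the two flags, the following Python sketch compares case-sensitive and case-insensitive runs. Python has no g or i flags as such: re.finditer stands in for g here and re.IGNORECASE for i. The sample sentence is made up for the example.

```python
import re

text = "The the cat sat on on THE the mat"
pattern = r"\b(\w+)\s+\1\b"

# finditer returns every non-overlapping match, like the g flag.
insensitive = [m.group(0) for m in re.finditer(pattern, text, re.IGNORECASE)]
sensitive = [m.group(0) for m in re.finditer(pattern, text)]

print(insensitive)  # ['The the', 'on on', 'THE the']
print(sensitive)    # ['on on']
```

Without case-insensitive matching, pairs that differ only in capitalization, such as a sentence-initial 'The the', go unreported.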
This pattern is useful for proofreading, grammar checking, and text quality assurance. Common duplicate word errors include 'the the', 'is is', 'to to', and 'and and'. A match flags a potential error rather than a certain one, because some repetitions (for example 'that that' or 'had had') can be grammatical. Writing tools can use similar patterns to highlight potential errors, and the pattern also fits automated content review, editorial workflows, and educational writing tools.
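A proofreading pass built on this pattern might look like the sketch below. The helper names find_duplicates and collapse_duplicates, and the sample draft, are hypothetical and chosen only for illustration.

```python
import re

DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

def find_duplicates(text):
    """Return (line_number, word) for each duplicate pair (hypothetical helper)."""
    results = []
    for m in DUPLICATE_WORD.finditer(text):
        # Count newlines before the match start to get a 1-based line number.
        line_number = text.count("\n", 0, m.start()) + 1
        results.append((line_number, m.group(1)))
    return results

def collapse_duplicates(text):
    """Replace each duplicate pair with its first word (hypothetical helper)."""
    return DUPLICATE_WORD.sub(r"\1", text)

draft = "This is is a draft.\nPlease review the\nthe second line."
print(find_duplicates(draft))  # [(1, 'is'), (2, 'the')]
print(collapse_duplicates(draft))
```

Because \s+ also matches newlines, the pair split across lines 2 and 3 is reported, and collapsing it removes the line break along with the repeated word. Reporting is usually the safer default, since automatic collapsing would also rewrite intentional repetitions.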
Example Test Strings
| Input | Expected |
|---|---|
| the the | Match |
| this is is wrong | Match |
| no duplicates here | No Match |
| that that | Match |
| abcabc | No Match |
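The table above can be checked mechanically. This is a minimal sketch using Python's re module, with re.IGNORECASE standing in for the i flag; the cases dictionary simply restates the table rows.

```python
import re

pattern = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)

# Input string -> whether a match is expected, copied from the table.
cases = {
    "the the": True,
    "this is is wrong": True,
    "no duplicates here": False,
    "that that": True,
    "abcabc": False,
}

results = {}
for text, expected in cases.items():
    found = pattern.search(text) is not None
    results[text] = found
    print(f"{text!r}: {'Match' if found else 'No Match'}")
    assert found == expected
```

The abcabc case fails to match because \s+ requires at least one whitespace character between the two copies.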