Regex to Detect Duplicate Consecutive Words

Detect repeated consecutive words like 'the the' or 'is is' in text using backreferences. Useful for proofreading and grammar checking.

Regular Expression

/\b(\w+)\s+\1\b/gi

Token Breakdown

Token   Description
\b      Word boundary assertion
(       Start of capturing group 1
\w      Matches any word character (letter, digit, underscore)
+       Matches the preceding element one or more times (greedy)
)       End of capturing group 1
\s      Matches any whitespace character (space, tab, newline)
+       Matches the preceding element one or more times (greedy)
\1      Backreference to the text captured by group 1
\b      Word boundary assertion

Detailed Explanation

This regex detects duplicate consecutive words in text, a common typographical error. Here is the token-by-token breakdown:

\b: A word boundary assertion at the start. It ensures the match begins at the start of a complete word, so the pattern cannot pick up the tail of a longer word (for example, the 'the' at the end of 'bathe' in 'bathe the').

(\w+): Capturing group 1 matches one or more word characters (letters, digits, underscores) and captures the first occurrence of the potentially duplicated word.

\s+: Matches one or more whitespace characters between the two words. This handles single spaces, multiple spaces, tabs, and newlines, so a word repeated across a line break is also caught.

\1: A backreference to capturing group 1. It matches the same text that the group captured rather than re-applying the \w+ pattern, which is what ensures the second word repeats the first. With the i flag set, that comparison ignores case.

\b: A word boundary assertion at the end. It ensures the repeated text ends at a word boundary, which prevents false matches where the repeated text is only the start of a longer word (for example, 'the theory' is not reported).
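The role of the two word boundaries can be checked by comparing the pattern against a variant with both \b tokens removed. The sketch below is illustrative only: the `findMatches` helper and the sample strings are assumptions made for this example, not part of the original page.

```javascript
// The pattern from this page, and a hypothetical variant with both \b removed.
const withBoundaries = /\b(\w+)\s+\1\b/gi;
const withoutBoundaries = /(\w+)\s+\1/gi;

// Hypothetical helper: return every matched substring for a pattern and input.
function findMatches(pattern, text) {
  return Array.from(text.matchAll(pattern), (m) => m[0]);
}

const samples = ["the the cat", "bathe the dog", "the theory"];

for (const text of samples) {
  console.log(
    JSON.stringify(text),
    "with \\b:", findMatches(withBoundaries, text),
    "without \\b:", findMatches(withoutBoundaries, text)
  );
}
// "the the cat"   is matched by both patterns.
// "bathe the dog" is matched only without \b (the tail of "bathe" plus "the").
// "the theory"    is matched only without \b ("the" plus the start of "theory").
```

Without the boundaries, the variant reports 'the the' inside both 'bathe the dog' and 'the theory', which are exactly the false positives the assertions are there to rule out.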

The g flag enables global matching, so every duplicate word pair in the text is found rather than only the first. The i flag makes matching case-insensitive, so 'The the' and 'THE the' are also detected as duplicates.
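To see what each flag contributes, the same pattern can be run with and without it. This is a small sketch using JavaScript's standard `String.prototype.match`; the sample sentence and variable names are made up for illustration.

```javascript
const text = "The the cat sat sat on THE the mat";

const caseSensitive = /\b(\w+)\s+\1\b/g;    // g only: the backreference must match case exactly
const caseInsensitive = /\b(\w+)\s+\1\b/gi; // the pattern from this page
const firstOnly = /\b(\w+)\s+\1\b/i;        // i only: stops after the first match

const strictHits = text.match(caseSensitive);
const looseHits = text.match(caseInsensitive);
const single = text.match(firstOnly);

console.log(strictHits); // [ 'sat sat' ]
console.log(looseHits);  // [ 'The the', 'sat sat', 'THE the' ]
console.log(single[0], single[1], single.index); // The the The 0
```

Without g, `match` returns a single result that also carries the capture group and the match index. With g, it returns only the list of matched substrings.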

This pattern is useful for proofreading, grammar checking, and text quality assurance. Common duplicate word errors include 'the the', 'is is', 'to to', and 'and and'. Word processors and writing tools flag repeated words in a similar way. The pattern also fits automated content review, editorial workflows, and educational writing tools. Because some repetitions are grammatical (for example, 'that that' or 'had had'), matches are best treated as candidates for review rather than definite errors.
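For a proofreading workflow, one possible next step is to collapse each repeated run to a single word. The sketch below is an assumption about how that might be done, not something this page prescribes. It uses a slightly extended variant of the pattern, `(?:\s+\1\b)+`, so that runs of three or more ('is is is') collapse in one pass. The function name `collapseDuplicates` is hypothetical.

```javascript
// Hypothetical helper: replace each run of a repeated word with its first occurrence.
// Variant of the page's pattern: the "\s+\1\b" part is wrapped in a repeating
// non-capturing group so that triples and longer runs are handled in a single pass.
function collapseDuplicates(text) {
  return text.replace(/\b(\w+)(?:\s+\1\b)+/gi, "$1");
}

console.log(collapseDuplicates("Paris in the the spring")); // Paris in the spring
console.log(collapseDuplicates("it is is is fine"));        // it is fine
console.log(collapseDuplicates("The the end"));             // The end

// For comparison, the unmodified pattern removes only one repetition per pass:
console.log("it is is is fine".replace(/\b(\w+)\s+\1\b/gi, "$1")); // it is is fine
```

The replacement keeps the first occurrence, including its capitalization. Automatic rewriting also removes grammatical repetitions such as 'that that', so a review step before applying the change is advisable.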

Example Test Strings

Input                 Expected
the the               Match
this is is wrong      Match
no duplicates here    No Match
that that             Match
abcabc                No Match
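The test strings above can be checked programmatically. The following is a minimal sketch using `RegExp.prototype.test`. It deliberately drops the g flag, because a global regex keeps its `lastIndex` between `test` calls and can give misleading results when reused across different strings. The `cases` and `results` names are illustrative.

```javascript
// Same pattern as on this page, minus the g flag (not needed for a yes/no check).
const duplicateWord = /\b(\w+)\s+\1\b/i;

// [input, expected] pairs taken from the test-string table.
const cases = [
  ["the the", true],
  ["this is is wrong", true],
  ["no duplicates here", false],
  ["that that", true],
  ["abcabc", false],
];

const results = cases.map(([input, expected]) => ({
  input,
  expected,
  actual: duplicateWord.test(input),
}));

for (const { input, expected, actual } of results) {
  const status = actual === expected ? "ok" : "MISMATCH";
  console.log(`${status}: ${JSON.stringify(input)} -> ${actual ? "Match" : "No Match"}`);
}
```

'abcabc' does not match because \s+ requires at least one whitespace character between the two occurrences.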

Match Highlighting (3 matches)

the the this is is wrong no duplicates here that that abcabc

Matches & Capture Groups

#    Match        Index    Group 1
1    the the      0        the
2    is is        13       is
3    that that    44       that

Pattern: 14 chars, Flags: gi, Matches: 3
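The match list above can be reproduced with `String.prototype.matchAll`, which yields each match together with its index and capture groups. This is a sketch; the `report` structure is an assumption chosen to mirror the columns shown above.

```javascript
const pattern = /\b(\w+)\s+\1\b/gi;
const input = "the the this is is wrong no duplicates here that that abcabc";

// Collect the matched text, its start index, and capture group 1 for every match.
const report = Array.from(input.matchAll(pattern), (m) => ({
  match: m[0],
  index: m.index,
  group1: m[1],
}));

console.table(report);
console.log(`Pattern: ${pattern.source.length} chars, Flags: ${pattern.flags}, Matches: ${report.length}`);
// Pattern: 14 chars, Flags: gi, Matches: 3
```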
