Regex to Match DOI Identifiers
Validate Digital Object Identifier (DOI) strings: the 10. directory indicator, followed by a registrant code, a forward slash, and a suffix. Matches identifiers for scholarly articles and other registered objects.
Regular Expression
/^10\.\d{4,9}\/[^\s]+$/
Token Breakdown
| Token | Description |
|---|---|
| ^ | Anchors at the start of the string (or line in multiline mode) |
| 1 | Matches the literal character '1' |
| 0 | Matches the literal character '0' |
| \. | Matches a literal dot |
| \d | Matches any digit (0-9) |
| {4,9} | Matches between 4 and 9 times |
| / | Matches the literal character '/' (written as \/ inside a slash-delimited regex literal) |
| [^\s] | Negated character class: matches any character that is not whitespace (equivalent to \S) |
| + | Matches the preceding element one or more times (greedy) |
| $ | Anchors at the end of the string (or line in multiline mode) |
Detailed Explanation
This regex validates Digital Object Identifiers (DOIs) as standardized by the International DOI Foundation. Here is the token-by-token breakdown:
^: Anchors the match at the start of the string.
10: Matches the literal characters 10, the DOI directory indicator. All DOIs begin with 10. followed by a registrant code.
\.: Matches a literal dot separating the directory indicator from the registrant code. The dot is escaped because an unescaped dot is a regex metacharacter that matches any character.
\d{4,9}: Matches 4 to 9 digits for the registrant code. Together with the directory indicator, it forms the DOI prefix (for example 10.1038) and identifies the organization that registered the DOI. Well-known registrant codes include 1000 for the International DOI Foundation (often seen in example DOIs), 1002 for Wiley, 1038 for Nature, and 1109 for IEEE.
/: Matches the literal forward slash that separates the prefix from the suffix. Inside a slash-delimited regex literal it must be written as \/.
[^\s]+: Matches one or more non-whitespace characters for the DOI suffix. The suffix is assigned by the registrant and can contain letters, digits, dots, hyphens, underscores, and other characters, including further slashes. It uniquely identifies the content item within the registrant's namespace.
$: Anchors the match at the end of the string.
No flags are used since this validates a single DOI string.
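As a rough illustration of the breakdown above, here is a minimal Python sketch of the same pattern. The function name `parse_doi` and the two capture groups are assumptions added for this example; they are not part of the original regex. Python raw strings have no slash delimiters, so the forward slash needs no escaping, and `fullmatch` stands in for the `^` and `$` anchors.

```python
import re
from typing import Optional, Tuple

# Same pattern as above, as a Python raw string. The two capture groups
# (registrant code, suffix) are added for illustration only.
DOI_PATTERN = re.compile(r"10\.(\d{4,9})/([^\s]+)")


def parse_doi(text: str) -> Optional[Tuple[str, str]]:
    """Return (registrant_code, suffix) if text has the DOI shape, else None."""
    # fullmatch requires the whole string to match, replacing ^ and $.
    # Unlike $ in Python, it does not tolerate a trailing newline.
    match = DOI_PATTERN.fullmatch(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_doi("10.1038/nature12373"))  # ('1038', 'nature12373')
print(parse_doi("11.1000/xyz"))          # None
```

One design note: in Python, `$` also matches just before a final newline, so `re.match` with the anchored pattern would accept "10.1000/xyz\n". Using `fullmatch` avoids that.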
DOIs are persistent identifiers used to uniquely identify academic papers, datasets, and other scholarly objects. Examples include 10.1000/xyz123, 10.1038/nature12373, and 10.1109/5.771073. The pattern works as a first-pass syntax check in academic citation systems, research databases, and library catalogs; it does not confirm that a DOI is actually registered or that it resolves.
Example Test Strings
| Input | Expected |
|---|---|
| 10.1000/xyz123 | Match |
| 10.1038/nature12373 | Match |
| 10.1109/5.771073 | Match |
| 11.1000/xyz | No Match |
| 10.12/short | No Match |
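The table above can be checked mechanically. The sketch below is one possible way to do that in Python; the names `CASES` and `is_doi` are hypothetical helpers introduced for this example.

```python
import re

# The DOI pattern from this page, without slash delimiters or anchors;
# fullmatch() below enforces that the entire string matches.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s]+")

# (input, expected match) pairs copied from the test table.
CASES = [
    ("10.1000/xyz123", True),
    ("10.1038/nature12373", True),
    ("10.1109/5.771073", True),
    ("11.1000/xyz", False),   # wrong directory indicator
    ("10.12/short", False),   # registrant code shorter than 4 digits
]


def is_doi(text: str) -> bool:
    """Return True if text has the syntactic shape of a DOI."""
    return DOI_PATTERN.fullmatch(text) is not None


for text, expected in CASES:
    status = "ok" if is_doi(text) == expected else "MISMATCH"
    print(f"{text}: {status}")
```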