Regex to Match HTML Entities

Match HTML entities in their named (&amp;), decimal (&#123;), and hexadecimal (&#xFF;) forms with this regex pattern.

Regular Expression

/&(?:#[0-9]+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/g

Token Breakdown

Token          Description
&              Matches the literal character '&'
(?:            Start of non-capturing group
#              Matches the literal character '#'
[0-9]          Character class, matches any one of: 0-9
+              Matches the preceding element one or more times (greedy)
|              Alternation, matches the expression before OR after the pipe
#              Matches the literal character '#'
x              Matches the literal character 'x'
[0-9a-fA-F]    Character class, matches any one of: 0-9a-fA-F
+              Matches the preceding element one or more times (greedy)
|              Alternation, matches the expression before OR after the pipe
[a-zA-Z]       Character class, matches any one of: a-zA-Z
[a-zA-Z0-9]    Character class, matches any one of: a-zA-Z0-9
*              Matches the preceding element zero or more times (greedy)
)              End of group
;              Matches the literal character ';'

Detailed Explanation

This regex matches all three types of HTML entities: named, decimal numeric, and hexadecimal numeric. Here is the token-by-token breakdown:

& - Matches the literal ampersand character that begins every HTML entity reference.

(?: - Opens a non-capturing group for the three entity type alternatives.

#[0-9]+ - First alternative: matches decimal numeric entities. The hash symbol indicates a numeric reference, followed by one or more digits representing the Unicode code point in decimal. For example, &#169; represents the copyright symbol.

| - Alternation operator.

#x[0-9a-fA-F]+ - Second alternative: matches hexadecimal numeric entities. The #x prefix indicates hexadecimal notation, followed by one or more hexadecimal digits (0-9, a-f, A-F). For example, &#x00A9; also represents the copyright symbol.

| - Another alternation.

[a-zA-Z][a-zA-Z0-9]* - Third alternative: matches named entities. The name must start with a letter followed by zero or more alphanumeric characters. This covers entities like amp, lt, gt, nbsp, copy, and the many others defined in the HTML specification.

) - Closes the non-capturing group.

; - Matches the literal semicolon that terminates every properly formed HTML entity.
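As a rough illustration of how the three alternatives separate in practice, the sketch below collects every match and labels it by the prefix that follows the ampersand. The names ENTITY_RE, classifyEntity, and findEntities are hypothetical helpers introduced for this example, not part of any library.

```javascript
// Same pattern as above; the g flag lets matchAll walk every match.
const ENTITY_RE = /&(?:#[0-9]+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/g;

// Hypothetical helper: label a matched entity by which alternative produced it.
// The "&#x" check must come first, because "&#x..." also starts with "&#".
function classifyEntity(entity) {
  if (entity.startsWith("&#x")) return "hexadecimal";
  if (entity.startsWith("&#")) return "decimal";
  return "named";
}

// Hypothetical helper: return every entity in the text together with its kind.
function findEntities(text) {
  return Array.from(text.matchAll(ENTITY_RE), (m) => ({
    entity: m[0],
    kind: classifyEntity(m[0]),
  }));
}

console.log(findEntities("Tom &amp; Jerry &#169; 2024 &#xA9;"));
```

Checking the prefix after the match keeps the pattern itself unchanged; an alternative would be to add capture groups, at the cost of a less readable expression.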

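The g flag enables global matching, so the pattern finds every entity in the text rather than only the first. This pattern is useful for HTML sanitization, entity decoding, text processing, and content migration. It matches anything shaped like an entity, so it does not check that a name such as &foo; is actually defined or that a numeric code point is valid. It also skips two forms that HTML parsers accept: the uppercase hexadecimal prefix (&#X41;) and legacy references written without the trailing semicolon.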
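For the entity decoding use case, one possible approach is to pass the pattern to String.prototype.replace with a callback. The sketch below is a minimal version under stated assumptions: NAMED is a tiny illustrative lookup table (the HTML specification defines far more names), and decodeEntities and fromCodePointOr are hypothetical helper names. It does not implement the full HTML rules for special code points.

```javascript
const ENTITY_RE = /&(?:#[0-9]+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/g;

// Illustrative subset only, chosen for this example.
const NAMED = new Map([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["nbsp", "\u00A0"],
  ["copy", "\u00A9"],
]);

// Convert a code point to a string, or return the fallback when the value
// is outside the Unicode range (String.fromCodePoint would throw there).
function fromCodePointOr(codePoint, fallback) {
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : fallback;
}

// Replace each matched entity in a single left-to-right pass.
// Unknown names and out-of-range numbers are left untouched.
function decodeEntities(text) {
  return text.replace(ENTITY_RE, (entity) => {
    const body = entity.slice(1, -1); // drop the leading "&" and trailing ";"
    if (body.startsWith("#x")) {
      return fromCodePointOr(parseInt(body.slice(2), 16), entity);
    }
    if (body.startsWith("#")) {
      return fromCodePointOr(parseInt(body.slice(1), 10), entity);
    }
    return NAMED.has(body) ? NAMED.get(body) : entity;
  });
}

console.log(decodeEntities("&lt;b&gt; &#169; &#xA9; &unknown;"));
```

Because replace scans the input once, text such as &amp;lt; decodes to &lt; and is not decoded a second time.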

Example Test Strings

Input              Expected
&amp;              Match
&#169;             Match
&#x00A9;           Match
&nbsp;             Match
& not an entity    No Match
&;                 No Match
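The test strings above can be checked with a short script. This is a sketch that assumes the inputs are written in their source form (for example &amp; rather than the decoded character); samples and results are names chosen for the example.

```javascript
// No g flag here: a global regex remembers lastIndex between test() calls,
// which can make repeated calls on different strings give surprising results.
const ENTITY_RE = /&(?:#[0-9]+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/;

const samples = ["&amp;", "&#169;", "&#x00A9;", "&nbsp;", "& not an entity", "&;"];

// Pair each input with the verdict the pattern gives it.
const results = samples.map((s) => [s, ENTITY_RE.test(s) ? "Match" : "No Match"]);

for (const [input, verdict] of results) {
  console.log(`${input} -> ${verdict}`);
}
```

The last two inputs fail for different reasons: the bare ampersand is followed by a space, and &; has nothing between the ampersand and the semicolon, while every alternative requires at least one character.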


Match Highlighting (4 matches)

&amp; &#169; &#x00A9; &nbsp; & not an entity &;

Matches & Capture Groups

#1  &amp;       index 0
#2  &#169;      index 6
#3  &#x00A9;    index 13
#4  &nbsp;      index 22

Pattern: 49 chars, Flags: g, Matches: 4
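The match positions listed above can be reproduced with matchAll, which exposes each match's start index. The sketch assumes the hexadecimal sample is written as &#x00A9;, which is consistent with the fourth match starting at index 22; input and hits are names chosen for the example.

```javascript
const ENTITY_RE = /&(?:#[0-9]+|#x[0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);/g;

// Assumed source form of the test line, entities left undecoded.
const input = "&amp; &#169; &#x00A9; &nbsp; & not an entity &;";

// matchAll requires the g flag and yields one result object per match,
// each carrying the matched text and its zero-based start index.
const hits = Array.from(input.matchAll(ENTITY_RE), (m) => ({
  match: m[0],
  index: m.index,
}));

hits.forEach((hit, i) => {
  console.log(`#${i + 1} ${hit.match} index ${hit.index}`);
});
```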
