Regex to Remove HTML Tags — Strip Markup, Keep Text
Regex to strip HTML tags from a string, keeping only the text content. Includes variants for whitelisting safe tags and handling self-closing tags and comments.
Detailed Explanation
Stripping HTML Tags with Regex
Removing HTML tags from a string is a common need for plain-text previews, search index input, and email subject lines. For untrusted input that will be rendered as HTML again, always use a sanitizer such as DOMPurify with a tag whitelist; for trusted input, regex is fast and dependency-free.
Basic Strip
str.replace(/<[^>]+>/g, "")
Removes any tag-shaped substring.
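As a minimal sketch, the one-liner can be wrapped in a small helper. The name `stripTags` is hypothetical and not taken from any library. The second call illustrates a known limit of the pattern: a `>` inside an attribute value ends the match early.

```javascript
// Hypothetical helper wrapping the basic pattern.
function stripTags(str) {
  // "<", then one or more characters that are not ">", then ">".
  return str.replace(/<[^>]+>/g, "");
}

console.log(stripTags('<a href="x">link</a>')); // "link"

// Limitation: the match stops at the first ">", even inside quotes.
console.log(stripTags('<a title="a > b">link</a>')); // ' b">link'
```

For trusted markup that never puts `>` inside attribute values, this limit rarely matters.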
Strip Tags AND Comments
str.replace(/<!--[\s\S]*?-->/g, "").replace(/<[^>]+>/g, "")
The first pass removes comments (which can span multiple lines), the second removes tags.
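A short sketch of why the order matters, using a hypothetical helper named `stripTagsAndComments`. The sample comment contains a `>` and a newline, which is where the tag pattern alone goes wrong.

```javascript
// Hypothetical helper: remove comments first, then any remaining tags.
function stripTagsAndComments(str) {
  return str
    .replace(/<!--[\s\S]*?-->/g, "") // lazy match, [\s\S] also spans newlines
    .replace(/<[^>]+>/g, "");
}

const input = "<!-- if a > b\n then hide -->visible<br>text";

console.log(stripTagsAndComments(input)); // "visibletext"

// The tag pattern alone stops at the first ">" and leaks comment text.
const tagsOnly = input.replace(/<[^>]+>/g, "");
console.log(JSON.stringify(tagsOnly)); // " b\n then hide -->visibletext"
```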
Strip Tags but Keep Whitespace Where They Were
To turn <p>hello</p><p>world</p> into hello world (with a space) instead of helloworld:
str.replace(/<\/?(?:p|div|br|li|h[1-6])[^>]*>/gi, " ")
.replace(/<[^>]+>/g, "")
.replace(/\s+/g, " ")
.trim()
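The chain above can be sketched as a reusable function. The name `stripTagsBlockAware` is hypothetical, and the list of block-level tags is the one from the pattern, not an exhaustive set.

```javascript
// Hypothetical helper: block-level tags become spaces, other tags vanish.
function stripTagsBlockAware(str) {
  return str
    .replace(/<\/?(?:p|div|br|li|h[1-6])[^>]*>/gi, " ") // block tags -> space
    .replace(/<[^>]+>/g, "")                            // everything else -> nothing
    .replace(/\s+/g, " ")                               // collapse whitespace runs
    .trim();
}

console.log(stripTagsBlockAware("<p>hello</p><p>world</p>"));          // "hello world"
console.log(stripTagsBlockAware("<ul><li>one</li><li>two</li></ul>")); // "one two"
console.log(stripTagsBlockAware("<p>a <b>bold</b> word</p>"));         // "a bold word"
```

Inline tags such as `<b>` are removed without inserting a space, so words that were joined in the markup stay joined in the output.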
Tested Examples
| Input | Output |
|---|---|
| `<p>hello</p>` | `hello` |
| `<a href="x">link</a>` | `link` |
| `<!-- comment -->visible` | `visible` (with comment-strip) |
| `<p>a</p><p>b</p>` | `ab` (basic) / `a b` (block-aware) |
| `<script>alert(1)</script>` | `alert(1)` (basic strips the tags, leaves the script body) |
Why You Still Want a Real Parser
Regex strip leaves behind:
- Script and style content (`alert(1)` survives the strip)
- HTML entities (`&amp;` is unchanged)
- Malformed input that can be exploited (an unclosed `<img src=x onerror=alert(1)` with no closing `>` is never matched)
To handle script content, strip those blocks first:
str.replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "")
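A minimal sketch of the two-pass order, with a hypothetical helper name `stripWithScripts`. It removes whole script and style blocks, bodies included, before the generic tag strip runs.

```javascript
// Hypothetical helper: drop script/style blocks first, then remaining tags.
function stripWithScripts(str) {
  return str
    .replace(/<(script|style)[^>]*>[\s\S]*?<\/\1>/gi, "") // whole block, body included
    .replace(/<[^>]+>/g, "");
}

const html = "<style>p { color: red }</style><p>hi</p><script>alert(1)</script>";

console.log(stripWithScripts(html));        // "hi"

// Without the first pass, the CSS and script bodies survive as text.
console.log(html.replace(/<[^>]+>/g, "")); // "p { color: red }hialert(1)"
```

One caveat with this pattern: a closing tag written with whitespace before the `>`, such as `</script >`, is not matched by `<\/\1>`, so that block's body is left behind. Ending the pattern with `<\/\1\s*>` would tolerate it.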
Decode Entities
After stripping tags, decode entities to get readable text. In a browser, where a DOM is available, a detached textarea does the decoding:
const txt = document.createElement("textarea");
txt.innerHTML = stripped;
const decoded = txt.value;
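The textarea approach needs a DOM, so it is not available in plain Node.js. For that case, here is a minimal sketch under stated assumptions: `decodeEntities` and the `NAMED` table are hypothetical, and the table covers only a handful of named entities plus numeric references. A complete decoder would need the full HTML named-entity list.

```javascript
// Hypothetical, deliberately small table; the full HTML entity list is much longer.
const NAMED = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"],
  ["quot", '"'], ["apos", "'"], ["nbsp", "\u00A0"],
]);

function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown names are left as they were.
    return NAMED.has(body) ? NAMED.get(body) : match;
  });
}

console.log(decodeEntities("Tom &amp; Jerry &#169; &#xA9;")); // "Tom & Jerry © ©"

// Decode after stripping, so escaped markup survives as literal text.
const strippedText = "<p>Use &lt;b&gt; for bold</p>".replace(/<[^>]+>/g, "");
console.log(decodeEntities(strippedText)); // "Use <b> for bold"
```

Decoding before stripping would turn `&lt;b&gt;` into a real-looking tag that the strip then removes. The decoded result can contain `<` and `&`, so it should be treated as plain text and escaped again before being placed into HTML.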
Recommendation
For server-rendered preview text, the script/style strip plus tag strip plus entity decode is sufficient. For untrusted user input that will be re-rendered as HTML, never rely on regex; sanitize with a battle-tested library such as DOMPurify.
Use Case
Generating plain-text excerpts for blog post meta descriptions, indexing HTML content into a search engine like Algolia, or producing a clean preview of rich-text comments for notification emails.