Regex to Extract URLs from Text — Find All Links
Regex patterns to extract URLs from plain text, Markdown, and code comments. Handles http, https, paths, query strings, fragments, and trailing punctuation.
Detailed Explanation
Extracting URLs from Text
Auto-linking, link audits, and broken-link detection all start with finding URLs in plain text. The challenge is balancing recall (catch every URL) with precision (don’t swallow trailing punctuation).
Practical Pattern
https?:\/\/[\w\-._~:/?#[\]@!$&'()*+,;=%]+
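The character class contains the unreserved and reserved characters from RFC 3986 plus % for percent-encoding, so it matches ports, paths, query strings, and fragments. Because that set includes ., ,, and ), it also swallows punctuation that directly follows a URL.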
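As a minimal sketch of how the practical pattern might be applied in JavaScript, the snippet below wraps it in a helper. The names URL_PATTERN and extractUrls are illustrative assumptions, not part of any library.

```javascript
// Practical pattern with the g flag so that matchAll yields every hit.
const URL_PATTERN = /https?:\/\/[\w\-._~:/?#[\]@!$&'()*+,;=%]+/g;

// Hypothetical helper: returns every substring that matches URL_PATTERN.
function extractUrls(text) {
  return Array.from(text.matchAll(URL_PATTERN), (m) => m[0]);
}

console.log(extractUrls("links: http://a.io and https://b.io/path?q=1"));
// → [ 'http://a.io', 'https://b.io/path?q=1' ]

// Trailing punctuation is part of the character class, so it is kept.
console.log(extractUrls("Docs live at https://example.com."));
// → [ 'https://example.com.' ]
```

The second call shows the precision problem: the final period is captured along with the URL.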
Stricter Pattern Excluding Trailing Punctuation
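A URL at the end of a sentence is often followed by a period, comma, or closing parenthesis that is not part of the link. Let the body match greedily, then require the final character to come from a negated class that excludes sentence punctuation:
https?:\/\/[^\s<>"]*[^\s<>".,!?;:)\]]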
Or trim the trailing punctuation in code after extraction.
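A stricter match can be sketched by letting the body run greedily and requiring the last character to fall outside a set of sentence punctuation. The pattern and the names STRICT_URL and extractUrlsStrict below are one assumed formulation for illustration.

```javascript
// Assumed stricter variant: greedy body, then one final character that is
// not whitespace, an angle bracket, a double quote, or sentence punctuation.
const STRICT_URL = /https?:\/\/[^\s<>"]*[^\s<>".,!?;:)\]]/g;

// Hypothetical helper: returns all matches, or an empty array if none.
function extractUrlsStrict(text) {
  return text.match(STRICT_URL) ?? [];
}

console.log(extractUrlsStrict("see https://example.com for info."));
// → [ 'https://example.com' ]
console.log(extractUrlsStrict("(https://example.com/page)"));
// → [ 'https://example.com/page' ]
console.log(extractUrlsStrict("Is it https://b.io/path?q=1?"));
// → [ 'https://b.io/path?q=1' ]
```

The trade-off is that a URL which legitimately ends in ) or another excluded character loses that character, so this suits prose better than machine-generated text.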
Tested Examples
| Input | Extracted |
|---|---|
| "see https://example.com for info." | https://example.com |
| "links: http://a.io and https://b.io/path?q=1" | http://a.io, https://b.io/path?q=1 |
| "(https://example.com/page)" | https://example.com/page (with trim rule) |
| "http://localhost:3000/api/users#section" | http://localhost:3000/api/users#section |
Match Bare Domains Too
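To also catch links that start with www. instead of a scheme (www.example.com):
\b(?:https?:\/\/|www\.)[\w.-]+\.[a-z]{2,}(?:\/[\w\-._~:/?#[\]@!$&'()*+,;=%]*)?
Run it with the i flag so uppercase hosts match. A fully bare domain such as example.com is matched only if the prefix group is made optional, (?:https?:\/\/|www\.)?, which also produces false positives on file names like notes.txt.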
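The following sketch applies the pattern above with the g and i flags. It requires either a scheme or a www. prefix, so a domain with neither is skipped. The names LINK_PATTERN and extractLinks are illustrative assumptions.

```javascript
// Scheme-or-www pattern; the i flag lets [a-z] match uppercase TLDs as well.
const LINK_PATTERN =
  /\b(?:https?:\/\/|www\.)[\w.-]+\.[a-z]{2,}(?:\/[\w\-._~:/?#[\]@!$&'()*+,;=%]*)?/gi;

// Hypothetical helper: returns all matches, or an empty array if none.
function extractLinks(text) {
  return text.match(LINK_PATTERN) ?? [];
}

console.log(extractLinks("visit www.example.com/docs or https://Example.ORG today"));
// → [ 'www.example.com/docs', 'https://Example.ORG' ]

// No scheme and no www. prefix, so nothing is matched here.
console.log(extractLinks("plain example.com here"));
// → []
```

Matches that start with www. have no scheme, so they usually need one prepended before being used as a link target.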
Markdown Links
Markdown inline links put the URL in parentheses after the bracketed label: [label](…). To extract URLs only from Markdown link syntax, see extract-markdown-links.
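For orientation, a rough sketch of matching only the URL inside inline Markdown links might look like the snippet below. The pattern and the names MD_LINK and extractMarkdownUrls are assumptions made for illustration. The sketch ignores reference-style links and URLs that contain parentheses.

```javascript
// Assumed pattern: [label](url) with an optional "title" after the URL.
// Capture group 1 holds the URL itself.
const MD_LINK = /\[[^\]]*\]\(\s*(https?:\/\/[^\s)]+)(?:\s+"[^"]*")?\s*\)/g;

// Hypothetical helper: collects capture group 1 from every inline link.
function extractMarkdownUrls(markdown) {
  return Array.from(markdown.matchAll(MD_LINK), (m) => m[1]);
}

const doc =
  'Read the [docs](https://example.com/docs) and the [API](http://a.io/api?x=1 "API ref"). ' +
  "A bare https://b.io is ignored.";
console.log(extractMarkdownUrls(doc));
// → [ 'https://example.com/docs', 'http://a.io/api?x=1' ]
```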
JavaScript Convenience
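// Grab every run of non-whitespace that starts with http:// or https://.
const urls = text.match(/https?:\/\/\S+/g) ?? [];
// Strip sentence punctuation and closing brackets from the end of each match.
const cleaned = urls.map(u => u.replace(/[.,;:!?\)\]]+$/, ""));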
Practical Recommendations
For complex inputs (HTML, PDFs), prefer a parser. For server logs, a regex pass plus URL trimming is fast enough. For chat messages, combine with new URL(…) to discard malformed candidates.
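The chat-message recommendation could be sketched as a regex pass followed by a URL constructor check. The helper names isValidHttpUrl and extractValidUrls are hypothetical. The sketch assumes a runtime with a global URL class, such as a modern browser or Node.js.

```javascript
// Returns true only if the candidate parses and uses http or https.
function isValidHttpUrl(candidate) {
  try {
    const parsed = new URL(candidate);
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false; // new URL throws a TypeError on malformed input
  }
}

// Hypothetical pipeline: loose match, trim trailing punctuation, validate.
function extractValidUrls(text) {
  const candidates = text.match(/https?:\/\/\S+/g) ?? [];
  return candidates
    .map((u) => u.replace(/[.,;:!?)\]]+$/, ""))
    .filter(isValidHttpUrl);
}

console.log(
  extractValidUrls("ok https://example.com/a, bad https://[oops and http://localhost:3000/x.")
);
// → [ 'https://example.com/a', 'http://localhost:3000/x' ]
```

Validation runs after trimming so that a stray period or comma does not end up inside the parsed URL.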
Use Case
Auto-linking URLs in user comments, building a broken-link checker for documentation, or extracting outbound references from raw email bodies for SEO analysis.