Regex to Extract URLs from Text — Find All Links

Q: Regex to Extract URLs from Text — Find All Links

## Extracting URLs from Text Auto-linking, link audits, and broken-link detection all start with finding URLs in plain text. The challenge is balancing recall (catch every URL) with precision (don’t swallow trailing punctuation). ### Practical Pattern https?:\/\/[\w\-._~:/?#[\]@!$&'()*+,;=%]+ This covers most URL components defined in RFC 3986. ### Stricter Pattern Excluding Trailing Punctuation A URL at the end of a sentence often gets a stray ., ,, or ). Use a negated tail: https?:\/

Regex patterns to extract URLs from plain text, Markdown, and code comments. Handles http, https, paths, query strings, fragments, and trailing punctuation.

Extraction

Detailed Explanation

Extracting URLs from Text

Auto-linking, link audits, and broken-link detection all start with finding URLs in plain text. The challenge is balancing recall (catch every URL) with precision (don’t swallow trailing punctuation).

Practical Pattern

https?:\/\/[\w\-._~:/?#[\]@!$&'()*+,;=%]+

This covers most URL components defined in RFC 3986.

Stricter Pattern Excluding Trailing Punctuation

A URL at the end of a sentence often gets a stray ., ,, or ). Use a negated tail:

https?:\/\/[^\s<>\"\)\]\.,!?;:]+(?:[\w/]+)

Or trim the trailing punctuation in code after extraction.

Tested Examples

Input	Extracted
`"see https://example.com for info."`	`https://example.com`
`"links: http://a.io and https://b.io/path?q=1"`	`http://a.io`, `https://b.io/path?q=1`
`"(https://example.com/page)"`	`https://example.com/page` (with trim rule)
`"http://localhost:3000/api/users#section"`	`http://localhost:3000/api/users#section`

Match Bare Domains Too

To also catch URLs without a scheme (example.com, www.example.com):

\b(?:https?:\/\/|www\.)[\w.-]+\.[a-z]{2,}(?:\/[\w\-._~:/?#[\]@!$&'()*+,;=%]*)?

Markdown Links

Markdown wraps URLs in (…). To extract URLs only from Markdown link syntax, see extract-markdown-links.

JavaScript Convenience

const urls = text.match(/https?:\/\/\S+/g) ?? [];
const cleaned = urls.map(u => u.replace(/[.,;:!?\)\]]+$/, ""));

Practical Recommendations

For complex inputs (HTML, PDFs), prefer a parser. For server logs, a regex pass plus URL trimming is fast enough. For chat messages, combine with new URL(…) to discard malformed candidates.

Use Case

Auto-linking URLs in user comments, building a broken-link checker for documentation, or extracting outbound references from raw email bodies for SEO analysis.

Try It — Regex Cheat Sheet

Open full tool →