Regex for URL Matching — HTTP, HTTPS, and URI Patterns
Regex patterns for matching URLs, including HTTP, HTTPS, and general URI formats. Covers protocol, domain, path, query parameter, and fragment matching.
Detailed Explanation
URL Matching with Regex
Matching URLs is a frequent requirement in text processing, link extraction, and input validation. Here are patterns from simple to comprehensive.
Simple HTTP/HTTPS Pattern
https?://[\w.-]+(?:/[\w./?%&=-]*)?
This covers basic URLs with optional paths and query strings:
https://example.com
http://sub.domain.com/path/to/page
https://api.example.com/v1/users?page=1&limit=10
Token Breakdown
| Token | Purpose |
|---|---|
| https? | "http" or "https" |
| :// | Protocol separator |
| [\w.-]+ | Domain name (letters, digits, underscores, dots, hyphens) |
| (?:/[\w./?%&=-]*)? | Optional path and query string, starting with "/" |
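A minimal JavaScript sketch of the simple pattern in use, assuming Node.js or a modern browser (String.prototype.matchAll is required). The helper names extractUrls and isSimpleUrl are hypothetical. For extraction the pattern runs unanchored with the g flag; for validating a single input it is wrapped in ^...$ so that any extra text around the URL causes a rejection.

```javascript
// Simple HTTP/HTTPS pattern from above; the "g" flag lets matchAll find every match.
const SIMPLE_URL = /https?:\/\/[\w.-]+(?:\/[\w.\/?%&=-]*)?/g;

// Anchored copy for validating a whole input. No "g" flag, so .test() keeps no lastIndex state.
const SIMPLE_URL_EXACT = /^https?:\/\/[\w.-]+(?:\/[\w.\/?%&=-]*)?$/;

// Hypothetical helper: return every URL-like substring found in free text.
function extractUrls(text) {
  return [...text.matchAll(SIMPLE_URL)].map((m) => m[0]);
}

// Hypothetical helper: true only if the entire input is a single URL.
function isSimpleUrl(input) {
  return SIMPLE_URL_EXACT.test(input);
}

const sample =
  "Docs at https://example.com and the API at " +
  "https://api.example.com/v1/users?page=1&limit=10 for details";

// returns ["https://example.com", "https://api.example.com/v1/users?page=1&limit=10"]
console.log(extractUrls(sample));
console.log(isSimpleUrl("http://sub.domain.com/path/to/page")); // true
console.log(isSimpleUrl("see http://sub.domain.com"));          // false
```

Because the domain class includes ".", a URL at the end of a sentence ("see https://example.com.") will also capture the trailing period, so extracted links may need light post-processing.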
More Comprehensive Pattern
For URLs with ports, authentication, and fragments:
https?://(?:[\w-]+(?::[\w-]+)?@)?[\w.-]+(?::\d{1,5})?(?:/[\w./?%&=#-]*)?
This additionally matches:
- Port numbers: https://localhost:3000/api
- Basic auth: https://user:pass@example.com
- Fragment identifiers: https://example.com/page#section
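A quick check of the comprehensive pattern against the three examples above, written as a sketch for any modern JavaScript runtime. Here the pattern is anchored with ^...$ so each test confirms the whole string is consumed rather than just a prefix; FULL_URL is an illustrative name.

```javascript
// Comprehensive pattern from above, anchored so the whole string must match.
const FULL_URL =
  /^https?:\/\/(?:[\w-]+(?::[\w-]+)?@)?[\w.-]+(?::\d{1,5})?(?:\/[\w.\/?%&=#-]*)?$/;

const examples = [
  "https://localhost:3000/api",       // port number
  "https://user:pass@example.com",    // basic auth (userinfo)
  "https://example.com/page#section", // fragment identifier
];

for (const url of examples) {
  console.log(url, FULL_URL.test(url)); // each line ends with "true"
}

// \d{1,5} limits the port to five digits but does not range-check it.
console.log(FULL_URL.test("https://localhost:99999/")); // true, although 99999 > 65535
```

The last line shows a general limit of the approach: the regex checks shape, not value ranges, so a port outside 0 to 65535 still passes.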
Extracting URL Components
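Using named capture groups to extract each component (the (?<name>...) syntax works in JavaScript, PCRE, and .NET; Python's re module spells it (?P<name>...)):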
(?<protocol>https?)://(?<domain>[\w.-]+)(?::(?<port>\d+))?(?<path>/[^?#]*)?(?:\?(?<query>[^#]*))?(?:#(?<fragment>.*))?
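A sketch of reading the named groups in JavaScript, assuming an ES2018+ runtime (named groups are exposed on match.groups). The pattern is the one above with ^ and $ added so it parses one URL string rather than searching inside text; parseUrlParts is a hypothetical helper.

```javascript
// Named-group pattern from above, anchored to parse a single URL string.
const URL_PARTS =
  /^(?<protocol>https?):\/\/(?<domain>[\w.-]+)(?::(?<port>\d+))?(?<path>\/[^?#]*)?(?:\?(?<query>[^#]*))?(?:#(?<fragment>.*))?$/;

// Hypothetical helper: returns a plain object of named groups, or null on no match.
function parseUrlParts(url) {
  const m = URL_PARTS.exec(url);
  return m ? { ...m.groups } : null;
}

const parts = parseUrlParts(
  "https://api.example.com:8443/v1/users?page=1&limit=10#top"
);
// parts.protocol === "https", parts.domain === "api.example.com", parts.port === "8443",
// parts.path === "/v1/users", parts.query === "page=1&limit=10", parts.fragment === "top"
console.log(parts);

// Groups that did not participate in the match are undefined.
console.log(parseUrlParts("https://example.com").port); // undefined
```

Note that this pattern has no userinfo group, so https://user:pass@example.com does not parse with it.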
Important Caveats
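- No practical, readable regex can validate every URL that RFC 3986 allows; these patterns cover common HTTP/HTTPS cases only
- Consider using the URL constructor (new URL(str)) in JavaScript for reliable parsing; it throws a TypeError on input it cannot parse
- These patterns may match malformed or nonexistent domains; a syntactically valid host still needs a DNS lookup to confirm it resolves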
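A minimal sketch of validating with the URL constructor instead of a regex, assuming Node.js 10+ or a modern browser where URL is a global. The constructor implements the WHATWG URL Standard and accepts any scheme, so the sketch restricts the protocol explicitly; isHttpUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: accept only absolute http/https URLs the parser understands.
function isHttpUrl(input) {
  let url;
  try {
    url = new URL(input); // throws a TypeError if input is not a parseable absolute URL
  } catch {
    return false;
  }
  // new URL() also accepts schemes such as ftp: or mailto:, so check explicitly.
  return url.protocol === "http:" || url.protocol === "https:";
}

console.log(isHttpUrl("https://user:pass@example.com:8080/a?b=1#c")); // true
console.log(isHttpUrl("ftp://example.com"));                          // false (scheme)
console.log(isHttpUrl("example.com"));                                // false (no scheme)

// Components come back already split, and the scheme and host are lowercased.
const u = new URL("HTTPS://Example.COM:8080/path?page=1#top");
console.log(u.hostname, u.port, u.pathname, u.searchParams.get("page"), u.hash);
// example.com 8080 /path 1 #top
```

A common combination is to use a regex to find candidates in free text and then pass each candidate through new URL() to confirm it parses.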
Use Case
You are building a tool that extracts links from plain text, validates user-submitted URLs in a form, or processes log files to find all referenced endpoints.