Regex for Log File Parsing — Apache, Nginx, and Application Logs
Regex patterns for parsing common log formats, including the Apache Combined Log Format, Nginx access logs, and application logs. Extract timestamps, IP addresses, HTTP methods, and status codes.
Detailed Explanation
Log File Parsing with Regex
Server and application logs follow semi-structured formats that regex excels at parsing. Named capture groups make field extraction clean and maintainable.
Apache Combined Log Format
(?<ip>[\d.]+) - (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\w+) (?<path>\S+) (?<proto>[^"]+)" (?<status>\d{3}) (?<size>\d+|-) "(?<referrer>[^"]*)" "(?<agent>[^"]*)"
Example log line:
192.168.1.1 - admin [15/Jan/2024:10:30:00 +0000] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"
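As a minimal sketch of how the pattern above might be applied, the following Python snippet parses the example line into a dictionary. Python's re module spells named groups (?P<name>...) rather than (?<name>...), so the pattern is adapted accordingly. The function name parse_apache_line and the conversions of time and size are illustrative assumptions, not part of the format itself.

```python
import re
from datetime import datetime

# Same pattern as above, with Python's (?P<name>...) named-group syntax.
APACHE_COMBINED = re.compile(
    r'(?P<ip>[\d.]+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_apache_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = APACHE_COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Assumed conversions: timestamp to datetime, "-" size to None.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = None if rec["size"] == "-" else int(rec["size"])
    return rec


line = ('192.168.1.1 - admin [15/Jan/2024:10:30:00 +0000] '
        '"GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"')
record = parse_apache_line(line)
print(record["ip"], record["method"], record["path"], record["status"])
```

Returning None for non-matching lines, rather than raising, lets a caller count or log the lines the pattern misses.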
Nginx Access Log
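Nginx's default access log format (named combined) has the same layout. This variant captures the whole request line as a single field, so it also matches lines whose request is not a well-formed method, path, and protocol: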
(?<ip>[\d.]+) - (?<user>\S+) \[(?<time>[^\]]+)\] "(?<request>[^"]*)" (?<status>\d{3}) (?<bytes>\d+) "(?<referrer>[^"]*)" "(?<agent>[^"]*)"
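Because this pattern captures the request as one field, a second step is needed to recover the method and path. The sketch below shows one way to do that in Python, using (?P<name>...) for named groups as Python's re module requires. The sample lines and the name parse_nginx_line are hypothetical, chosen only to illustrate the well-formed and malformed cases.

```python
import re

NGINX_ACCESS = re.compile(
    r'(?P<ip>[\d.]+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_nginx_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = NGINX_ACCESS.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Split "METHOD path protocol"; leave the parts as None when the
    # request line is empty or otherwise malformed.
    parts = rec["request"].split(" ")
    if len(parts) == 3:
        rec["method"], rec["path"], rec["proto"] = parts
    else:
        rec["method"] = rec["path"] = rec["proto"] = None
    rec["status"] = int(rec["status"])
    rec["bytes"] = int(rec["bytes"])
    return rec


# Hypothetical sample lines in the combined layout.
good = ('203.0.113.7 - - [15/Jan/2024:10:31:12 +0000] '
        '"POST /login HTTP/1.1" 302 0 "-" "curl/8.4.0"')
bad = '203.0.113.9 - - [15/Jan/2024:10:31:13 +0000] "" 400 0 "-" "-"'

print(parse_nginx_line(good)["method"], parse_nginx_line(good)["path"])
print(parse_nginx_line(bad)["method"])
```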
Application Log Pattern
Many applications log a bracketed timestamp, a level, a logger name, and a message:
\[(?<timestamp>[^\]]+)\] (?<level>DEBUG|INFO|WARN|ERROR|FATAL) (?<logger>[\w.]+) - (?<message>.+)
Matches:
[2024-01-15 10:30:00.123] ERROR com.example.App - Connection timeout after 30s
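A short Python sketch of using this pattern follows, again with Python's (?P<name>...) group syntax. The timestamp layout passed to strptime is an assumption based on the example line above; an application with a different timestamp format would need a different format string. The name parse_app_line is illustrative.

```python
import re
from datetime import datetime

APP_LOG = re.compile(
    r'\[(?P<timestamp>[^\]]+)\] (?P<level>DEBUG|INFO|WARN|ERROR|FATAL) '
    r'(?P<logger>[\w.]+) - (?P<message>.+)'
)


def parse_app_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = APP_LOG.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Assumed layout: "YYYY-MM-DD HH:MM:SS.fff" as in the example line.
    rec["timestamp"] = datetime.strptime(rec["timestamp"], "%Y-%m-%d %H:%M:%S.%f")
    return rec


entry = parse_app_line(
    "[2024-01-15 10:30:00.123] ERROR com.example.App - Connection timeout after 30s"
)
print(entry["level"], entry["logger"], entry["message"])
```

Lines that do not start with a bracketed timestamp, such as the continuation lines of a stack trace, return None here; a caller could append them to the previous entry's message.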
Filtering by Status Code
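Find all 5xx errors. The leading quote is the closing quote of the request field, which keeps the pattern from matching a three-digit byte count such as 512:
"\s5\d{2}\s
Find all non-200 responses:
"\s(?!200\s)\d{3}\s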
Tips for Log Parsing
- Use non-greedy quantifiers or negated character classes (for example, [^"]* for a quoted field) so a match cannot run past the closing delimiter
- Named groups make the extracted data self-documenting
- For production log analysis at scale, dedicated tools (ELK stack, Loki) are more appropriate than hand-written regex scripts
- Test patterns against real log samples, as formats often have subtle variations
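One way to act on the last tip is to run a pattern over a sample of real lines and list the ones it fails to match. The Python sketch below does this for the Apache pattern, written with Python's (?P<name>...) group syntax. The helper name unmatched_lines and the second sample line are hypothetical. That line uses an IPv6 client address, which the [\d.]+ IP group does not accept.

```python
import re

APACHE_COMBINED = re.compile(
    r'(?P<ip>[\d.]+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def unmatched_lines(pattern, lines):
    """Return (line_number, line) pairs that the pattern fails to match."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if line.strip() and not pattern.match(line)
    ]


samples = [
    '192.168.1.1 - admin [15/Jan/2024:10:30:00 +0000] '
    '"GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"',
    # Hypothetical line with an IPv6 client address.
    '2001:db8::1 - - [15/Jan/2024:10:30:05 +0000] '
    '"GET /health HTTP/1.1" 200 2 "-" "curl/8.4.0"',
]

for number, line in unmatched_lines(APACHE_COMBINED, samples):
    print(f"line {number} did not match: {line}")
```

If IPv6 clients appear in the logs, loosening the IP group (for example to \S+) is one possible adjustment.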
Use Case
You are analyzing server access logs to find error patterns, building a simple log viewer that needs to parse and colorize log levels, or extracting specific request information from web server logs for debugging.
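For the log viewer case, a minimal Python sketch of colorizing levels with ANSI escape sequences might look like the following. The color choices and the name colorize are arbitrary assumptions, and the output only renders as color in a terminal that supports ANSI codes.

```python
import re

LEVEL = re.compile(r'\b(DEBUG|INFO|WARN|ERROR|FATAL)\b')

# Assumed color scheme using standard ANSI foreground codes.
COLORS = {
    "DEBUG": "\033[36m",  # cyan
    "INFO": "\033[32m",   # green
    "WARN": "\033[33m",   # yellow
    "ERROR": "\033[31m",  # red
    "FATAL": "\033[35m",  # magenta
}
RESET = "\033[0m"


def colorize(line):
    """Wrap the first log level found in the line in its ANSI color."""
    return LEVEL.sub(
        lambda m: COLORS[m.group(1)] + m.group(1) + RESET,
        line,
        count=1,
    )


print(colorize("[2024-01-15 10:30:00.123] ERROR com.example.App - Connection timeout after 30s"))
```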