Parsing Apache Combined Access Logs
Parse Apache Combined Log Format entries to extract IP, timestamp, request, status code, bytes, referrer, and user agent fields.
Detailed Explanation
Apache Combined Log Format
The Apache Combined Log Format is one of the most widely used web server log formats. It extends the Common Log Format with referrer and user-agent fields, so each entry also records where the request came from and which client made it.
Format Structure
%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
Each field maps to:
| Field | Symbol | Example |
|---|---|---|
| Remote host | %h | 192.168.1.1 |
| Identity | %l | - (usually empty) |
| User | %u | admin or - |
| Timestamp | %t | [15/Jan/2024:10:30:00 +0000] |
| Request line | %r | GET /api/users HTTP/1.1 |
| Status code | %>s | 200 |
| Bytes sent | %b | 1234 |
| Referrer | %{Referer}i | https://example.com |
| User agent | %{User-Agent}i | Mozilla/5.0 ... |
Example Log Line
192.168.1.1 - admin [15/Jan/2024:10:30:00 +0000] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
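As a rough illustration of how a line like the one above could be split into its fields, here is a minimal Python sketch built on a regular expression. The pattern, the `parse_line` function, and the field names are assumptions made for this example, not the parser's actual implementation. It also assumes English month abbreviations in the timestamp, since `%b` in `strptime` is locale-dependent.

```python
import re
from datetime import datetime

# Hypothetical pattern for illustration; group names are our own choice.
# Quoted fields allow backslash-escaped characters such as \" inside them.
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>(?:[^"\\]|\\.)*)" '
    r'"(?P<user_agent>(?:[^"\\]|\\.)*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # %b writes "-" instead of 0 when no response body bytes were sent.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # Assumes an English locale for the month abbreviation (%b).
    entry["timestamp"] = datetime.strptime(
        entry["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return entry


line = (
    '192.168.1.1 - admin [15/Jan/2024:10:30:00 +0000] '
    '"GET /api/users HTTP/1.1" 200 1234 "https://example.com" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"'
)
entry = parse_line(line)
print(entry["ip"], entry["status"], entry["request"])
```

Returning `None` for non-matching lines lets a caller skip or count malformed entries rather than abort on the first one.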
Severity Inference
Since Apache access logs do not include an explicit severity level, the parser infers severity from the HTTP status code:
- 2xx and 3xx responses are classified as INFO
- 4xx responses (client errors like 404 Not Found) are classified as WARN
- 5xx responses (server errors like 500 Internal Server Error) are classified as ERROR
This mapping makes it easy to quickly filter for problematic requests in large access log files.
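The mapping above can be sketched as a small Python function. The name `infer_severity` is hypothetical, and the fallback for codes outside 2xx to 5xx (such as 1xx) is an assumption, since the rules above do not cover them.

```python
def infer_severity(status):
    """Map an HTTP status code to a severity label."""
    if 500 <= status <= 599:
        return "ERROR"  # server errors
    if 400 <= status <= 499:
        return "WARN"   # client errors
    if 200 <= status <= 399:
        return "INFO"   # successful and redirect responses
    # Assumption: codes not covered by the rules (e.g. 1xx) default to INFO.
    return "INFO"


for code in (200, 301, 404, 500):
    print(code, infer_severity(code))
```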
Use Case
Typical uses include analyzing web server traffic patterns, identifying 404 errors and broken links, tracking referrer sources, debugging slow or failed HTTP requests, and auditing access patterns by IP address.
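Two of these tasks, finding paths that return 404 and counting requests per IP address, might look like the following Python sketch. The `sample_lines` entries are made-up data for illustration, and `LINE_RE` is a deliberately minimal, hypothetical pattern that captures only the fields used here.

```python
import re
from collections import Counter

# Minimal hypothetical pattern: captures only IP, method, path, status, referrer.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)

# Made-up sample entries, not real traffic.
sample_lines = [
    '192.168.1.1 - admin [15/Jan/2024:10:30:00 +0000] '
    '"GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"',
    '192.168.1.2 - - [15/Jan/2024:10:30:05 +0000] '
    '"GET /old-page HTTP/1.1" 404 512 "https://example.com/links" "Mozilla/5.0"',
    '192.168.1.2 - - [15/Jan/2024:10:31:10 +0000] '
    '"GET /old-page HTTP/1.1" 404 512 "-" "curl/8.0"',
]

not_found = Counter()        # 404 responses per requested path
requests_per_ip = Counter()  # total requests per client address

for line in sample_lines:
    m = LINE_RE.match(line)
    if m is None:
        continue  # skip lines that do not look like combined-format entries
    requests_per_ip[m["ip"]] += 1
    if m["status"] == "404":
        not_found[m["path"]] += 1

print(not_found.most_common(5))
print(requests_per_ip.most_common(5))
```

For a real log file, the loop would iterate over the open file object instead of the in-memory list.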