Parsing Nginx Access Logs
Parse Nginx access logs that follow the Combined Log Format to extract request details, status codes, and client information.
Detailed Explanation
Nginx Access Log Format
By default, Nginx writes access logs in its predefined combined format, which lays out the same fields in the same order as the Apache Combined Log Format. In most cases a parser written for one can handle the other.
Default Nginx Log Format
The combined format is built into Nginx, so it does not need to be declared in nginx.conf. It is equivalent to this log_format directive:
log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
Example Log Lines
10.0.0.1 - - [15/Jan/2024:10:30:00 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)"
10.0.0.2 - admin [15/Jan/2024:10:30:01 +0000] "POST /api/login HTTP/1.1" 403 89 "https://example.com/login" "curl/8.1.2"
10.0.0.3 - - [15/Jan/2024:10:30:02 +0000] "GET /missing-page HTTP/1.1" 404 162 "-" "Googlebot/2.1"
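The lines above map directly onto the variables in the log_format directive, so a single regular expression with one named group per variable is enough to split them. The following is a minimal sketch rather than a definitive parser: the names COMBINED_RE and parse_combined are illustrative, and it assumes the unmodified combined format and English month abbreviations in $time_local.

```python
import re
from datetime import datetime

# One named group per Nginx variable in the combined format.
COMBINED_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["status"] = int(entry["status"])
    entry["body_bytes_sent"] = int(entry["body_bytes_sent"])
    # $time_local looks like 15/Jan/2024:10:30:00 +0000
    entry["time"] = datetime.strptime(entry["time_local"], "%d/%b/%Y:%H:%M:%S %z")
    # $request is usually "METHOD PATH PROTOCOL", but malformed requests occur,
    # so the split fields are only added when all three parts are present.
    parts = entry["request"].split()
    if len(parts) == 3:
        entry["method"], entry["path"], entry["protocol"] = parts
    return entry

line = ('10.0.0.2 - admin [15/Jan/2024:10:30:01 +0000] '
        '"POST /api/login HTTP/1.1" 403 89 '
        '"https://example.com/login" "curl/8.1.2"')
entry = parse_combined(line)
print(entry["remote_user"], entry["method"], entry["path"], entry["status"])
# prints: admin POST /api/login 403
```

Returning None for non-matching lines, instead of raising, lets a caller skip or count malformed entries without stopping a long run over a large log file.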
Custom Nginx Log Formats
Many teams customize the Nginx log format to include additional variables:
log_format extended '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'$request_time $upstream_response_time';
The additional $request_time and $upstream_response_time fields, both reported in seconds with millisecond resolution, are valuable for performance monitoring. The parser handles the standard combined format fields and treats any additional fields as part of the extended data.
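As a rough illustration of reading the two timing fields, the sketch below matches the standard combined prefix and then interprets whatever follows the user agent. The names EXTENDED_RE and parse_timing and the sample line are hypothetical, and the code assumes the extended format shown above with exactly these two trailing fields. Nginx writes "-" for $upstream_response_time when no upstream was contacted, and separates multiple values with commas or colons when several upstreams were tried, so the sketch returns a list.

```python
import re

# Standard combined prefix, then an optional tail holding the extra fields.
EXTENDED_RE = re.compile(
    r'\S+ - \S+ \[[^\]]+\] "[^"]*" \d{3} \d+ "[^"]*" "[^"]*"'
    r'(?: (?P<extra>.*))?$'
)

def parse_timing(line):
    """Return (request_time, upstream_times), or None if no timing fields exist."""
    m = EXTENDED_RE.match(line)
    if m is None or not m.group("extra"):
        return None
    # First token is $request_time; the remainder is $upstream_response_time.
    fields = m.group("extra").split(" ", 1)
    request_time = float(fields[0])
    upstream_raw = fields[1] if len(fields) > 1 else "-"
    # Drop the "-" placeholder and split multi-upstream values.
    upstream_times = [
        float(t)
        for t in re.split(r"[,:]\s*", upstream_raw.strip())
        if t.strip() not in ("", "-")
    ]
    return request_time, upstream_times

# Hypothetical line in the extended format (not taken from a real log).
sample = ('10.0.0.1 - - [15/Jan/2024:10:30:00 +0000] '
          '"GET /api/items HTTP/1.1" 200 612 "-" "curl/8.1.2" 0.120 0.118')
print(parse_timing(sample))
# prints: (0.12, [0.118])
```

A large gap between the request time and the sum of the upstream times usually points at time spent outside the upstream, such as a slow client connection.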
Identifying Bot Traffic
One practical use of parsing Nginx access logs is identifying bot traffic. By examining the user-agent field, you can filter for crawlers (Googlebot, bingbot), monitoring tools (UptimeRobot), and potential scraping bots. Because the user-agent is supplied by the client and can be spoofed, combine it with IP and request pattern analysis for more reliable traffic classification.
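A first pass at user-agent classification can be a simple substring check. The sketch below is illustrative only: the marker lists, the category names, and the classify_user_agent function are assumptions made for this example, and substring matching will produce both false positives and false negatives on real traffic.

```python
from collections import Counter

# Hypothetical marker lists; extend them for the traffic you actually see.
CRAWLER_MARKERS = ("googlebot", "bingbot")
MONITOR_MARKERS = ("uptimerobot",)
AUTOMATION_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")

def classify_user_agent(user_agent):
    """Return a coarse category for a user-agent string."""
    ua = user_agent.lower()
    # Check specific lists first: "uptimerobot" also contains "bot".
    if any(marker in ua for marker in CRAWLER_MARKERS):
        return "crawler"
    if any(marker in ua for marker in MONITOR_MARKERS):
        return "monitor"
    # An empty or "-" user agent is treated as automated in this sketch.
    if ua in ("", "-") or any(marker in ua for marker in AUTOMATION_MARKERS):
        return "other_automated"
    return "likely_human"

user_agents = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "curl/8.1.2",
    "Googlebot/2.1",
]
counts = Counter(classify_user_agent(ua) for ua in user_agents)
print(dict(counts))
# prints: {'likely_human': 1, 'other_automated': 1, 'crawler': 1}
```

The category is only a hint. A client that claims to be Googlebot still needs an IP or reverse DNS check before it is trusted as such.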
Use Case
Monitoring web application traffic patterns and response codes, identifying bot traffic vs human visitors, analyzing request rates and bandwidth usage, debugging 4xx and 5xx errors, and tracking performance through request timing fields.