Detect Bots and Crawlers from User-Agent
Comprehensive guide to detecting bots, crawlers, and automated clients from User-Agent strings. Covers search engines, social media, SEO tools, and command-line clients.
Detailed Explanation
Detecting Bots and Crawlers from User-Agent Strings
Bot traffic is often estimated at 30-50% of all web traffic. Accurately identifying bots from their User-Agent strings supports security, analytics, and performance work.
Categories of Bots
Search Engine Crawlers:
- Googlebot: Google Search
- bingbot: Microsoft Bing
- Slurp: Yahoo Search
- DuckDuckBot: DuckDuckGo
- Baiduspider: Baidu (China)
- YandexBot: Yandex (Russia)
- Applebot: Apple (Siri, Spotlight)
Social Media Crawlers:
- facebookexternalhit: Facebook link preview
- Twitterbot: Twitter/X card preview
- LinkedInBot: LinkedIn link preview
AI Agent Crawlers:
- ChatGPT-User: OpenAI's ChatGPT
- GPTBot: OpenAI's general crawler
- ClaudeBot: Anthropic's Claude
- Bytespider: ByteDance (TikTok)
SEO Tools:
- AhrefsBot: Ahrefs backlink crawler
- SemrushBot: Semrush SEO crawler
- MJ12bot: Majestic SEO
- DotBot: Moz
Command-Line Clients:
- curl/X.Y.Z: curl HTTP client
- Wget/X.Y: GNU Wget
- python-requests/X.Y: Python Requests library
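The categories above can be turned into a simple lookup table. The following is a minimal sketch, assuming case-insensitive substring matching is acceptable. The category labels and the `categorize_bot` function name are illustrative choices, not part of any standard.

```python
from typing import Optional

# Tokens come from the category lists above; the category keys are
# hypothetical labels chosen for this sketch.
BOT_TOKENS = {
    "search_engine": ["Googlebot", "bingbot", "Slurp", "DuckDuckBot",
                      "Baiduspider", "YandexBot", "Applebot"],
    "social_media": ["facebookexternalhit", "Twitterbot", "LinkedInBot"],
    "ai_agent": ["ChatGPT-User", "GPTBot", "ClaudeBot", "Bytespider"],
    "seo_tool": ["AhrefsBot", "SemrushBot", "MJ12bot", "DotBot"],
    "cli_client": ["curl/", "Wget/", "python-requests/"],
}


def categorize_bot(user_agent: str) -> Optional[str]:
    """Return the first category whose token appears in the UA, else None."""
    ua = user_agent.lower()
    for category, tokens in BOT_TOKENS.items():
        # Case-insensitive substring match against each known token.
        if any(token.lower() in ua for token in tokens):
            return category
    return None


if __name__ == "__main__":
    print(categorize_bot(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ))
    print(categorize_bot("curl/8.4.0"))
```

A table like this only recognizes tokens it already knows, so it would need to be updated as new crawlers appear.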
Detection Strategies
- Exact match: check for known bot tokens (Googlebot, bingbot)
- Generic patterns: match keywords like bot, crawler, spider, scraper
- Non-browser UA: UAs that lack typical browser tokens (no Mozilla, no rendering engine)
- Empty UA: some bots send an empty or missing User-Agent header
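The four strategies can be layered into one check, ordered from most to least specific. This is a rough sketch under a few assumptions: the function name `detect_bot`, the returned strategy labels, and the list of rendering-engine tokens are all illustrative, and the exact-match list holds only the two tokens named above.

```python
import re
from typing import Optional

# Exact-match tokens from the text; a real list would be much longer.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot")

# Generic keywords from the text.
GENERIC_PATTERN = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)

# Assumed set of rendering-engine tokens that typical browsers include.
BROWSER_ENGINE_TOKENS = ("applewebkit", "gecko", "trident", "presto")


def detect_bot(user_agent: Optional[str]) -> Optional[str]:
    """Return the name of the first strategy that flags the UA, or None."""
    # Empty UA: the header is missing or blank.
    if user_agent is None or not user_agent.strip():
        return "empty_ua"
    ua = user_agent.lower()
    # Exact match: a known bot token appears in the UA.
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return "exact_match"
    # Generic patterns: a bot-like keyword appears in the UA.
    if GENERIC_PATTERN.search(ua):
        return "generic_pattern"
    # Non-browser UA: no Mozilla prefix or no rendering engine token.
    has_engine = any(engine in ua for engine in BROWSER_ENGINE_TOKENS)
    if not ua.startswith("mozilla/") or not has_engine:
        return "non_browser_ua"
    return None
```

Returning the strategy name rather than a boolean makes it easier to see why a request was flagged, which helps when tuning rules against false positives.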
Important Caveats
- Bots can spoof any User-Agent, so UA detection alone is not a security measure
- Some legitimate monitoring services use bot-like UAs
- Progressive Web App requests and Service Workers may have atypical UAs
- Server-to-server requests (webhooks, APIs) may look like bots
Use Case
Security teams detect bots to prevent scraping, rate-limit automated requests, and filter bot traffic from analytics. CDN configurations often use UA-based rules to challenge suspicious bots. Marketing teams exclude bot traffic from conversion tracking to maintain accurate campaign metrics.
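As an illustration of the analytics use case, the sketch below splits request records into likely-human and likely-bot groups before metrics are computed. The record shape (dictionaries with `path` and `user_agent` keys), the `split_traffic` function name, and the combined pattern are assumptions made for this example.

```python
import re
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Assumed pattern combining generic keywords with a few tokens that do
# not contain "bot" (link-preview and command-line clients).
BOT_PATTERN = re.compile(
    r"bot|crawler|spider|scraper|facebookexternalhit|curl/|wget/|python-requests/",
    re.IGNORECASE,
)


def split_traffic(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split records into (humans, bots) based on the user_agent field."""
    humans: List[Record] = []
    bots: List[Record] = []
    for record in records:
        ua = record.get("user_agent") or ""
        # Treat an empty or missing UA as bot traffic, like a matching one.
        if not ua.strip() or BOT_PATTERN.search(ua):
            bots.append(record)
        else:
            humans.append(record)
    return humans, bots


if __name__ == "__main__":
    sample = [
        {"path": "/pricing", "user_agent": "Mozilla/5.0 (X11; Linux x86_64) "
         "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"},
        {"path": "/pricing", "user_agent": "Mozilla/5.0 (compatible; SemrushBot/7)"},
        {"path": "/signup", "user_agent": ""},
    ]
    humans, bots = split_traffic(sample)
    print(len(humans), "human requests,", len(bots), "bot requests")
```

Because User-Agent strings can be spoofed, a split like this is better suited to cleaning up metrics than to enforcing access control.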