Detect Bots and Crawlers from User-Agent

A guide to detecting bots, crawlers, and automated clients from User-Agent strings. Covers search engines, social media, AI crawlers, SEO tools, and command-line clients.

Detailed Explanation

Detecting Bots and Crawlers from User-Agent Strings

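Bots can account for a large share of web traffic, with estimates commonly in the 30-50% range. Identifying them from their User-Agent strings is a useful first step for analytics and performance work, and a first-pass filter for security. Because the header is set by the client and can be spoofed, a User-Agent match should be treated as a hint rather than proof.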

Categories of Bots

Search Engine Crawlers:

  • Googlebot — Google Search
  • bingbot — Microsoft Bing
  • Slurp — Yahoo Search
  • DuckDuckBot — DuckDuckGo
  • Baiduspider — Baidu (China)
  • YandexBot — Yandex (Russia)
  • Applebot — Apple (Siri, Spotlight)

Social Media Crawlers:

  • facebookexternalhit — Facebook link preview
  • Twitterbot — Twitter/X card preview
  • LinkedInBot — LinkedIn link preview

AI Agent Crawlers:

  • ChatGPT-User — OpenAI's ChatGPT
  • GPTBot — OpenAI's general crawler
  • ClaudeBot — Anthropic's Claude
  • Bytespider — ByteDance (TikTok)

SEO Tools:

  • AhrefsBot — Ahrefs backlink crawler
  • SemrushBot — Semrush SEO crawler
  • MJ12bot — Majestic SEO
  • DotBot — Moz

Command-Line Clients:

  • curl/X.Y.Z — curl HTTP client
  • Wget/X.Y — GNU Wget
  • python-requests/X.Y — Python Requests library
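
The categories above can be turned into a simple lookup table. The following is a minimal sketch, not a complete bot database: the `BOT_TOKENS` mapping and the `categorize` function are hypothetical names introduced for illustration, and the category keys ("search", "social", "ai", "seo", "cli") are assumptions. Matching is a case-insensitive substring check, since real crawlers embed their token inside a longer string.

```python
from typing import Dict, List, Optional

# Tokens copied from the category lists in this guide.
# The category keys are illustrative labels, not a standard.
BOT_TOKENS: Dict[str, List[str]] = {
    "search": ["Googlebot", "bingbot", "Slurp", "DuckDuckBot",
               "Baiduspider", "YandexBot", "Applebot"],
    "social": ["facebookexternalhit", "Twitterbot", "LinkedInBot"],
    "ai": ["ChatGPT-User", "GPTBot", "ClaudeBot", "Bytespider"],
    "seo": ["AhrefsBot", "SemrushBot", "MJ12bot", "DotBot"],
    "cli": ["curl/", "Wget/", "python-requests/"],
}


def categorize(user_agent: str) -> Optional[str]:
    """Return the category of the first known token found in the UA, else None."""
    ua = user_agent.lower()
    for category, tokens in BOT_TOKENS.items():
        for token in tokens:
            # Substring match: the token is usually embedded in a longer UA string.
            if token.lower() in ua:
                return category
    return None
```

A result of `None` only means that no listed token was found; it does not mean the client is a browser.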

Detection Strategies

  1. Known-token match — Check for known bot tokens (Googlebot, bingbot) as case-insensitive substrings, since the token normally appears inside a longer UA string
  2. Generic patterns — Match keywords like bot, crawler, spider, scraper
  3. Non-browser UA — UAs that lack typical browser tokens (no Mozilla, no rendering engine)
  4. Empty UA — Some bots send an empty or missing User-Agent header
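
The four strategies above can be layered from most to least specific. This is a rough sketch under stated assumptions: `detect_bot` and the strategy labels it returns are hypothetical, `KNOWN_TOKENS` holds only the two example tokens from the list, and the "non-browser" check is reduced to looking for a `Mozilla/` prefix token.

```python
import re
from typing import Optional

# Only the two example tokens from the list; a real deployment would extend this.
KNOWN_TOKENS = ("googlebot", "bingbot")

# Generic keywords from strategy 2.
GENERIC_PATTERN = re.compile(r"bot|crawler|spider|scraper", re.IGNORECASE)


def detect_bot(user_agent: Optional[str]) -> Optional[str]:
    """Return the label of the first strategy that flags the UA, or None."""
    # Strategy 4: missing or empty header.
    if user_agent is None or not user_agent.strip():
        return "empty"
    ua = user_agent.lower()
    # Strategy 1: known bot tokens.
    if any(token in ua for token in KNOWN_TOKENS):
        return "known-token"
    # Strategy 2: generic keywords.
    if GENERIC_PATTERN.search(ua):
        return "generic-pattern"
    # Strategy 3: no typical browser token (simplified to "Mozilla/").
    if "mozilla/" not in ua:
        return "non-browser"
    return None
```

Returning the strategy label rather than a boolean lets callers treat the matches differently. The generic keyword match is the one most prone to false positives, because any device or app name that happens to contain "bot" will trigger it.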

Important Caveats

  • Bots can spoof any User-Agent — UA detection is not a security measure
  • Some legitimate monitoring services use bot-like UAs
  • Progressive Web App requests and Service Workers may have atypical UAs
  • Server-to-server requests (webhooks, APIs) may look like bots
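
Because a User-Agent can be spoofed, a claimed crawler identity is usually confirmed by some other signal. One option, which Google documents for Googlebot, is a reverse DNS lookup on the client IP followed by a forward lookup that must map back to the same IP. The sketch below is illustrative only: `is_verified_googlebot` and the two helper functions are hypothetical names, the suffix list is an assumption that should be checked against Google's current documentation, and the forward lookup shown here handles IPv4 only.

```python
import socket
from typing import Callable, List

# Assumed suffixes for Google crawler hostnames; verify against current docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def reverse_lookup(ip: str) -> str:
    """Resolve an IP address to its primary hostname (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse-then-forward DNS check for a client claiming to be Googlebot."""
    try:
        host = reverse(ip)
    except OSError:
        # No PTR record or lookup failure: cannot verify.
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The hostname must resolve back to the original IP.
        return ip in forward(host)
    except OSError:
        return False
```

The resolver functions are passed in as parameters so the check can be exercised without network access. In production the result would typically be cached per IP, since DNS lookups on every request are slow.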

Use Case

Security teams detect bots to prevent scraping, rate-limit automated requests, and filter bot traffic from analytics. CDN configurations often use UA-based rules to challenge suspicious bots. Marketing teams exclude bot traffic from conversion tracking to maintain accurate campaign metrics.
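
As an illustration of the analytics case, the sketch below splits page-view records into likely-human and likely-bot groups before metrics are computed. The record shape (a dict with a `user_agent` key), the `split_hits` function, and the combined `BOT_PATTERN` are all assumptions made for this example, not part of any particular analytics product.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Generic keywords plus the command-line client prefixes from this guide.
BOT_PATTERN = re.compile(
    r"bot|crawler|spider|scraper|curl/|wget/|python-requests/",
    re.IGNORECASE,
)

Hit = Dict[str, str]


def split_hits(hits: Iterable[Hit]) -> Tuple[List[Hit], List[Hit]]:
    """Partition hits into (humans, bots) based on the User-Agent alone."""
    humans: List[Hit] = []
    bots: List[Hit] = []
    for hit in hits:
        ua = hit.get("user_agent", "")
        # An empty or missing UA is treated as automated traffic.
        if not ua or BOT_PATTERN.search(ua):
            bots.append(hit)
        else:
            humans.append(hit)
    return humans, bots
```

Keeping the bot records instead of discarding them makes it possible to audit the filter later and to spot false positives.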
