Detect AI Agent Bots from User-Agent

Identify AI agent crawlers like ChatGPT-User, GPTBot, and ClaudeBot from User-Agent strings. Understand how to manage AI crawler access with robots.txt.

Detecting AI Agent Crawlers

With the rise of large language models (LLMs), a new category of web crawlers has emerged: AI agent bots. These crawlers fetch web content either to gather training data for AI models or to power real-time web browsing features in AI assistants.

Known AI Agent User-Agents

OpenAI:

  • ChatGPT-User — Used when ChatGPT browses the web in real-time
  • GPTBot/1.0 — OpenAI's general training data crawler
  • OAI-SearchBot/1.0 — OpenAI's search product crawler

Anthropic:

  • ClaudeBot — Anthropic's web crawler for Claude

Others:

  • Bytespider — ByteDance (powers TikTok's AI features)
  • CCBot/2.0 — Common Crawl (open dataset used by many AI companies)
  • Google-Extended — Google's AI training crawler (separate from Googlebot)
  • PerplexityBot — Perplexity AI's search crawler

Example UA Strings

ChatGPT browsing:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ChatGPT-User/1.0; +https://openai.com/bot)

GPTBot:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

Managing AI Crawler Access

Use robots.txt to control AI crawler access:

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow search engine crawlers
User-agent: Googlebot
Allow: /
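
If the set of blocked bots changes often, the robots.txt above can be generated programmatically instead of maintained by hand. A minimal sketch (the function name and bot list are illustrative):

```python
# Bots to block, matching the robots.txt example above.
BLOCKED_AI_BOTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended"]

def build_robots_txt(blocked: list[str]) -> str:
    """Build a robots.txt body that disallows the given bots site-wide."""
    lines = ["# Block AI training crawlers"]
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    lines += ["# Allow search engine crawlers", "User-agent: Googlebot", "Allow: /"]
    return "\n".join(lines)
```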

Key Considerations

  • AI crawlers are a rapidly evolving space — new bots appear frequently
  • Some AI companies respect robots.txt; others may not
  • Blocking GPTBot does not affect Googlebot or regular Google Search
  • Google-Extended controls Gemini/Bard training but is separate from Google Search indexing
  • Many AI crawlers identify themselves voluntarily, but some use generic or spoofed UAs
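
Because robots.txt is voluntary, crawlers that ignore it can only be stopped server-side. A minimal sketch of a request check that returns HTTP 403 to known AI training crawlers (the function and bot tuple are illustrative; a real deployment would hook this into the web server or middleware, and could combine it with IP-range verification to catch spoofed UAs):

```python
# Training-focused crawlers to refuse (ChatGPT-User is real-time browsing,
# so it is deliberately not in this list).
AI_TRAINING_BOTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider")

def handle_request(user_agent: str) -> int:
    """Return an HTTP status code: 403 for known AI training crawlers, else 200."""
    if any(bot in user_agent for bot in AI_TRAINING_BOTS):
        return 403
    return 200
```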

Use Case

Content publishers and website operators use AI bot detection to control whether their content is used for AI training purposes. Legal and compliance teams monitor AI crawler activity to enforce copyright policies. DevOps teams implement rate limiting specifically for AI crawlers to manage server load.
