Fetch, validate, and parse any domain's robots.txt file. See which bots are allowed or blocked, and inspect all crawl directives at a glance.
Enter a domain above, or drop a robots.txt file here
Domain lookup fetches the live file (1 credit). File upload and paste are free — analyzed locally in your browser.
Written by Ishan Karunaratne · Last reviewed:
robots.txt is a plain text file located at the root of a website — for example, https://example.com/robots.txt. It uses the Robots Exclusion Protocol, standardized in RFC 9309 (published June 2022 by the IETF), to tell web crawlers which parts of your site they are and are not allowed to access. The protocol was originally proposed by Martijn Koster in 1994 and remained an informal convention for nearly three decades before being formally standardized.
Search engines like Google, Bing, and Yandex check robots.txt before crawling any page on your domain. According to Google's documentation, if the file returns HTTP 200, the rules within it are enforced. A 404 means the entire site is considered crawlable. A 5xx response causes Google to temporarily stop crawling — and after 30 days of consecutive 5xx responses, Google treats the last cached version of robots.txt as authoritative (RFC 9309 §2.3).
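The status-code behavior above amounts to a small decision table. A minimal sketch in Python (illustrative only; real crawlers also cache the file and retry with backoff):

```python
def crawl_policy(status: int) -> str:
    """Map the HTTP status of a robots.txt fetch to the crawler
    behavior described above (RFC 9309 / Google's documented rules)."""
    if 200 <= status < 300:
        return "enforce rules"         # parse the file and obey its groups
    if 400 <= status < 500:
        return "crawl everything"      # missing file = no restrictions
    if status >= 500:
        return "pause crawling"        # server error: stop, retry later
    return "follow redirect or retry"  # 3xx and anything else

print(crawl_policy(200))  # enforce rules
print(crawl_policy(404))  # crawl everything
print(crawl_policy(503))  # pause crawling
```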
A misconfigured robots.txt can have serious consequences. Accidentally adding Disallow: / under your wildcard (User-agent: *) group will cause search engine bots to stop crawling your entire site, and pages can disappear from search results within days. Similarly, blocking CSS and JavaScript resources prevents search engines from rendering your pages, which can harm indexing and rankings even if the HTML itself is crawlable.
It's important to understand that robots.txt is an advisory protocol, not a security mechanism. It does not prevent access — anyone can view your robots.txt file, and malicious crawlers will ignore it entirely. For pages that must not be indexed, use the noindex meta tag or X-Robots-Tag HTTP header instead.
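For reference, those two de-indexing signals look like this (illustrative snippets). Note that a crawler must be able to fetch the page to see either signal, so the path should not also be blocked in robots.txt:

```text
<!-- In the page's HTML <head>: -->
<meta name="robots" content="noindex">

# Or as an HTTP response header (also works for PDFs and other non-HTML files):
X-Robots-Tag: noindex
```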
This tool fetches your robots.txt file server-side and performs a comprehensive analysis based on RFC 9309 and search engine best practices:
One check verifies that the file is served with Content-Type: text/plain. Per RFC 9309 §2.2, crawlers should parse the file as UTF-8 encoded text; if the server returns HTML instead, most bots will fail to parse it correctly.
You can also drag and drop a local robots.txt file or paste content from your clipboard to analyze it offline without using any credits. Local analysis performs all checks except HTTP status, Content-Type, and sitemap reachability.
The robots.txt file supports several directives, each controlling a different aspect of crawler behavior. Below are all directives recognized by major search engines and AI crawlers, with references to the relevant specifications.
Identifies which bot the following rules apply to. User-agent: * targets all bots. Use specific names like Googlebot or Bingbot for bot-specific rules. Per RFC 9309 §2.2.1, a crawler must use the most specific matching group — if both * and Googlebot groups exist, Googlebot only follows its own group.
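For example, under the most-specific-group rule, Googlebot ignores the * group entirely once a Googlebot group exists:

```text
User-agent: *
Disallow: /drafts/

User-agent: Googlebot
Disallow: /archive/
# Googlebot follows only its own group: /archive/ is blocked for it,
# but /drafts/ is NOT (only bots without their own group obey the * rules).
```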
Blocks the bot from crawling the specified path and everything under it. Disallow: / blocks the entire site. Disallow: (empty value) allows all paths. Path matching is case-sensitive and prefix-based (RFC 9309 §2.2.2). Google and Bing also support * wildcards and $ end-of-URL anchors in paths.
Overrides a Disallow rule for a more specific path. Useful when blocking a directory but allowing certain files within it. For example, Disallow: /private/ combined with Allow: /private/public.html lets crawlers access that one file. Formally defined in RFC 9309 §2.2.2 — when Allow and Disallow match the same URL, the most specific (longest) rule wins.
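The directory-plus-exception pattern described above looks like this in practice:

```text
User-agent: *
Disallow: /private/
Allow: /private/public.html
# For /private/public.html both rules match; the Allow rule is longer
# (more specific), so it wins and that one file stays crawlable.
# Everything else under /private/ remains blocked.
```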
Declares the full URL of your XML sitemap. Multiple Sitemap lines are allowed — useful for sitemap index files or separate sitemaps for different content types. This directive is not part of the core RFC 9309 specification but is universally supported by Google, Bing, and Yandex. See Google's sitemap documentation for format requirements.
Requests the bot wait the specified number of seconds between requests. Supported by Bingbot, Yandex, and others. Not supported by Googlebot — you must configure Googlebot's crawl rate directly in Google Search Console. Values over 30 seconds can significantly slow indexing and are flagged by this checker.
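Python's standard-library parser exposes both path rules and Crawl-delay, which makes it easy to check how a non-Google bot would read a file. A minimal sketch parsing an inline example file:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Crawl-delay: 10
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())

# Bingbot has no group of its own, so it falls back to the * group.
print(rp.crawl_delay("Bingbot"))            # 10
print(rp.can_fetch("Bingbot", "/admin/x"))  # False
print(rp.can_fetch("Bingbot", "/blog/"))    # True
```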
A non-standard directive historically used by Yandex to specify the preferred domain (e.g. www vs non-www). Not recognized by Google or Bing. If you need to set a canonical domain, use rel="canonical" or 301 redirects instead, which are universally supported.
Not every crawler supports every robots.txt directive. The table below shows which directives are recognized by major search engine and AI crawlers, based on official documentation and observed behavior as of March 2026.
| Directive | Googlebot | Bingbot | Yandex | GPTBot | ClaudeBot |
|---|---|---|---|---|---|
| User-agent | Yes | Yes | Yes | Yes | Yes |
| Disallow | Yes | Yes | Yes | Yes | Yes |
| Allow | Yes | Yes | Yes | Yes | Yes |
| Sitemap | Yes | Yes | Yes | — | — |
| Crawl-delay | No | Yes | Yes | — | — |
| Host | No | No | Yes | No | No |
| Wildcards (*, $) | Yes | Yes | Yes | — | — |
A dash (—) indicates the directive is not relevant to that crawler's function. AI crawlers like GPTBot and ClaudeBot primarily respect User-agent, Disallow, and Allow directives.
With the rise of large language models (LLMs), a new generation of web crawlers has emerged. Companies like OpenAI (GPTBot, ChatGPT-User), Anthropic (ClaudeBot, anthropic-ai), Google (Google-Extended), Meta (FacebookBot for AI training), and others now crawl the web to build training datasets and power AI-powered search features.
These crawlers generally respect robots.txt rules. You can block them using specific User-agent directives — for example, User-agent: GPTBot followed by Disallow: / prevents OpenAI from crawling your site. The Generator tab in this tool includes presets for blocking all known AI crawlers.
Note that blocking AI crawlers is separate from blocking search engine crawlers. You can allow Googlebot to index your site for search results while simultaneously blocking GPTBot from using your content for AI training. Each bot uses its own User-agent string and follows its own group in robots.txt.
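For example, a group layout that keeps Google Search access while opting out of AI training might look like this (adjust the bot list to your needs):

```text
# Traditional search: no restrictions for Googlebot
User-agent: Googlebot
Disallow:

# Opt out of AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```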
The table below lists the major AI crawlers active as of 2026, their User-agent strings, and whether they respect robots.txt directives.
| User-agent | Owner | Purpose | Respects robots.txt |
|---|---|---|---|
| GPTBot | OpenAI | Training data collection | Yes |
| OAI-SearchBot | OpenAI | ChatGPT search results | Yes |
| ChatGPT-User | OpenAI | Real-time browsing in ChatGPT | Yes |
| ClaudeBot | Anthropic | Training data and web features | Yes |
| PerplexityBot | Perplexity | AI-powered search answers | Yes |
| Google-Extended | Google | Gemini AI training (not Search) | Yes |
| Bytespider | ByteDance | TikTok / Douyin AI training | Inconsistent (often reported to ignore it) |
| CCBot | Common Crawl | Open dataset used by many AI labs | Yes |
A well-configured robots.txt file helps search engines and AI crawlers index your site efficiently while protecting server resources and preventing sensitive paths from appearing in search results. These best practices are based on RFC 9309 and official guidance from Google, Bing, and other major crawlers.
- Host the file at https://yourdomain.com/robots.txt — no subdirectory, no alternate filename. Crawlers only check this exact path.
- Serve it with Content-Type: text/plain. If your server returns HTML (common with custom 404 pages or reverse proxies), crawlers cannot parse the directives and may treat the file as invalid.
- Instead of one broad block under User-agent: *, create targeted blocks for individual crawlers — especially when managing AI crawler access separately from search engines.
- Adding Sitemap: https://yourdomain.com/sitemap.xml helps crawlers discover all your pages, even those not linked from your navigation.
- Remember that robots.txt does not prevent indexing of blocked URLs; to keep a page out of search results, use the noindex meta tag.
The official IETF internet standard (published June 2022) defining robots.txt syntax, file access rules, precedence logic, and crawler behavior. The authoritative reference for all robots.txt implementations.
Google's implementation details including supported directives (User-agent, Allow, Disallow, Sitemap only), the 500KB file size limit, and wildcard pattern matching.
Bing's official guide covering Crawl-delay support, BingBot-specific behavior, and how Bing processes robots.txt differently from Google.
Best practices for XML sitemaps, sitemap indexes, the Sitemap directive in robots.txt, and how to submit sitemaps to Google Search Console.
Yandex's robots.txt guide covering Host directive support, Clean-param, and Yandex-specific extensions not available in other search engines.
The original community resource for the Robots Exclusion Protocol, including the historical 1994 specification and practical usage examples.
Look up DNS records (A, MX, TXT, NS, CNAME) for any domain.
Analyze a site's HTTP security headers and get a letter grade.
Check if your domain is flagged for malware or phishing across 17 vendors.
Look up domain registration details, registrar, and expiry dates.
Audit 70+ on-page SEO factors including robots.txt access, meta tags, and structured data.