robots.txt
A plain-text file at `/robots.txt` that tells web crawlers which paths on a site they may or may not fetch, formalised as the Robots Exclusion Protocol in RFC 9309.
Well-behaved crawlers fetch robots.txt before requesting any other URL on a host. The file is made up of groups: each group starts with one or more `User-agent` lines, followed by the `Allow` and `Disallow` rules that apply to those agents. `Disallow: /admin/` asks crawlers to stay out of that subtree; an empty `Disallow:` allows everything. The protocol is advisory: compliant crawlers such as Googlebot honour it, but nothing enforces it, so it does not stop malicious scrapers. robots.txt also does not prevent indexing: a disallowed URL that is linked from elsewhere can still appear in search results, because the rule only blocks the crawl. To keep a page out of search results, use a `noindex` meta tag or an `X-Robots-Tag` header instead, and leave the page crawlable so that the crawler can actually see the directive.
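As a rough illustration of how a compliant crawler might evaluate these rules, the sketch below feeds a small robots.txt body to Python's standard-library `urllib.robotparser`. The file contents, the `example.com` host, and the agent names `SomeCrawler` and `ExampleBot` are hypothetical, chosen only to show a `Disallow: /admin/` rule and an empty `Disallow:`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: every agent is kept out of /admin/,
# except ExampleBot, whose empty Disallow allows everything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: ExampleBot
Disallow:
"""


def build_parser(body: str) -> RobotFileParser:
    """Parse a robots.txt body that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An agent with no group of its own falls back to the "*" group.
print(parser.can_fetch("SomeCrawler", "https://example.com/admin/users"))  # False
print(parser.can_fetch("SomeCrawler", "https://example.com/blog/"))        # True

# ExampleBot matches its own group, where the empty Disallow permits all paths.
print(parser.can_fetch("ExampleBot", "https://example.com/admin/users"))   # True
```

The result only says what a cooperating client should do. Nothing on the server enforces it, which is why the file offers no protection against scrapers that ignore it.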
Referenced on
- DNS Checker Bot & Scanner Documentation
- Free DNS & Network Tools
- Free On-Page SEO Checker
- How to Identify and Manage Web Crawlers: A Sysadmin's Guide to robots.txt, AI Bots, and SEO Crawlers
- How to Report a DDoS Attack to Your ISP: Evidence, Templates, and Escalation Steps
- How to Report IP Address Abuse: The Complete Guide to Filing Reports That Get Results
- Page Speed Test
- Privacy Policy - DNS Checker
- Robots.txt Checker
- SEO Tools
- Terms of Service - DNS Checker
- Web Inspection Tools