robots.txt

A plain-text file at `/robots.txt` that tells web crawlers which paths on a site they may or may not fetch, formalised as the Robots Exclusion Protocol in RFC 9309.

robots.txt is the file every well-behaved crawler fetches before requesting any other URL on a host. It is organised into groups: each group starts with one or more `User-agent` lines, followed by the `Allow` and `Disallow` rules that apply to those crawlers. `Disallow: /admin/` asks crawlers to stay out of that subtree; an empty `Disallow:` allows everything. The protocol is advisory: compliant crawlers such as Googlebot honour it voluntarily, but it is not access control and does nothing to stop malicious scrapers. robots.txt also does not prevent indexing: a blocked URL that is linked from elsewhere can still appear in search results, because the file only blocks the crawl. To keep a page out of search results, use a `noindex` meta tag or `X-Robots-Tag` header instead, and leave the page crawlable so that crawlers can see the directive.
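As a rough illustration of how a compliant crawler might apply these rules, the sketch below feeds a small sample file to Python's standard-library `urllib.robotparser`. The file contents, the crawler names `ExampleBot` and `OtherBot`, and the `example.com` URLs are made up for the example. The module also applies rules in file order, which can differ from the longest-match rule in RFC 9309 when `Allow` and `Disallow` rules overlap, so the sample avoids overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /admin/,
# except ExampleBot, whose empty Disallow allows everything.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: ExampleBot
Disallow:
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# OtherBot has no group of its own, so it falls back to the "*" group.
print(parser.can_fetch("OtherBot", "https://example.com/admin/users"))    # False
print(parser.can_fetch("OtherBot", "https://example.com/blog/"))          # True

# ExampleBot matches its own group, where the empty Disallow permits all paths.
print(parser.can_fetch("ExampleBot", "https://example.com/admin/users"))  # True
```

A real crawler would download the file first, for instance with `RobotFileParser.set_url()` followed by `read()`, and would run this check before every request. The check is voluntary, which is why the file offers no protection against clients that skip it.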

Reference

RFC 9309

Related terms

Canonical URL
User-Agent

Referenced on

DNS Checker Bot & Scanner Documentation
Free DNS & Network Tools
Free On-Page SEO Checker
How to Identify and Manage Web Crawlers: A Sysadmin's Guide to robots.txt, AI Bots, and SEO Crawlers
How to Report a DDoS Attack to Your ISP: Evidence, Templates, and Escalation Steps
How to Report IP Address Abuse: The Complete Guide to Filing Reports That Get Results
Page Speed Test
Privacy Policy - DNS Checker
Robots.txt Checker
SEO Tools
Terms of Service - DNS Checker
Web Inspection Tools
