Free Robots.txt Tester

Parse and validate robots.txt for any domain. See every user-agent group, allow/disallow paths, sitemap directives, and crawl-delay. Detects common syntax mistakes per RFC 9309.

Who uses this tool?

SEO specialists — verify that Googlebot and friends can crawl the right paths
Developers — catch accidental "Disallow: /" deployments before they tank organic traffic
Agencies — audit client robots.txt for misconfigurations
Security analysts — see what paths a domain hopes to hide from automated discovery
Competitive research — discover sitemap URLs from competitor robots.txt

One bad line in robots.txt can wipe out a site's organic traffic

robots.txt is the single most consequential text file on your domain. A misplaced Disallow: / under User-agent: * tells Googlebot to stop crawling your entire site — pages drop out of the index within days, organic traffic collapses, and recovery takes weeks even after the fix. We see this happen several times a month in our scans, usually after a staging environment gets deployed to production with its "block everything" robots.txt still in place.

The flip side matters too: a robots.txt that allows everything but omits the Sitemap: directive leaves crawlers to discover your URLs the slow way, through internal links. Adding a single Sitemap: line, anywhere in the file, can cut the time it takes new pages to be discovered, in favorable cases from weeks to hours.

How robots.txt actually works (RFC 9309)

File location and discovery

Crawlers fetch https://example.com/robots.txt before any other URL on a domain. The file must live at the root path; /blog/robots.txt or /en/robots.txt are ignored. Each subdomain has its own robots.txt — blog.example.com/robots.txt is separate from example.com/robots.txt.
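
To make the location rule concrete, here is a minimal sketch that derives the governing robots.txt URL for any page URL. The helper name robots_url is ours, not part of any standard or of this tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/en/pricing?plan=pro"))
# https://example.com/robots.txt
print(robots_url("https://blog.example.com/2024/post"))
# https://blog.example.com/robots.txt
```

The subdirectory in the first URL plays no part in the result, and the subdomain in the second produces a different file.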

User-agent groups

A User-agent: line starts a group of rules. Consecutive User-agent: lines with no rules between them share a single group, which is the spec rule parsers most often get wrong. User-agent: * is the catch-all, applied only when a crawler finds no group that names it. A specific group such as User-agent: Googlebot replaces the wildcard group for that bot rather than adding to it.
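
The grouping rule can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the parser behind this tool: parse_groups and rules_for are hypothetical names, crawler names are compared by exact case-insensitive match, and directives other than Allow and Disallow are skipped.

```python
SAMPLE = """\
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def parse_groups(text: str) -> list[dict]:
    """Split robots.txt text into groups of user-agents and rules."""
    groups, current, last_was_agent = [], None, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not last_was_agent:  # a rule came first, so a new group starts
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value.lower())
            last_was_agent = True
        elif key in ("allow", "disallow"):
            if current is not None:  # rules before any User-agent are dropped
                current["rules"].append((key, value))
            last_was_agent = False
    return groups


def rules_for(groups: list[dict], crawler: str) -> list[tuple]:
    """Rules of the groups naming the crawler, else those of the * groups."""
    name = crawler.lower()
    matched = [g for g in groups if name in g["agents"]]
    if not matched:
        matched = [g for g in groups if "*" in g["agents"]]
    return [rule for g in matched for rule in g["rules"]]


groups = parse_groups(SAMPLE)
print(len(groups))                       # 2
print(rules_for(groups, "Bingbot"))      # [('disallow', '/private/')]
print(rules_for(groups, "DuckDuckBot"))  # [('disallow', '/tmp/')]
```

Googlebot and Bingbot end up in one group because no rule separates their User-agent: lines, and Bingbot never sees the /tmp/ rule because its own group takes the place of the wildcard group.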

Allow and Disallow paths

Paths are interpreted as URL prefixes. Disallow: /admin blocks /admin, /admin/, and /admin/users, and also /administrator, because matching is by prefix rather than by directory. Disallow: / blocks the entire site. An empty Disallow: with no value explicitly allows everything. Allow: creates an exception inside a broader Disallow: the longest matching path wins, and Allow wins a tie.
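
Longest-match precedence is easier to see in code. The sketch below handles plain prefixes only (no wildcards); is_allowed and the sample rules are illustrative assumptions, not this tool's implementation.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply (kind, prefix) rules to a path; the longest matching prefix wins."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for kind, prefix in rules:
        if not prefix:  # an empty Allow/Disallow value matches nothing
            continue
        if path.startswith(prefix):
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_len, allowed = len(prefix), kind == "allow"
    return allowed


rules = [("disallow", "/admin"), ("allow", "/admin/help")]
print(is_allowed(rules, "/admin/users"))     # False
print(is_allowed(rules, "/admin/help/faq"))  # True
print(is_allowed(rules, "/administrator"))   # False
print(is_allowed(rules, "/blog"))            # True
```

The third call shows the prefix pitfall: /administrator is blocked by Disallow: /admin even though it is a different directory. Writing the rule as Disallow: /admin/ avoids that.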

Wildcards and end-anchors

* matches any sequence of characters; $ anchors the end of the URL. Disallow: /*? blocks every URL with a query string. Disallow: /*.pdf$ blocks PDFs but allows /page.pdf.html. Googlebot and Bingbot support wildcards; some legacy bots ignore them.
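
One way to reason about these patterns is to translate them into regular expressions. The following is a minimal sketch; pattern_to_regex and matches are hypothetical helper names, and the translation treats only a trailing $ as an anchor.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where a * stood.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, url_path: str) -> bool:
    # re.match anchors at the start, mirroring robots.txt prefix semantics.
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/*?", "/search?q=shoes"))        # True
print(matches("/*?", "/search"))                # False
print(matches("/*.pdf$", "/docs/report.pdf"))   # True
print(matches("/*.pdf$", "/page.pdf.html"))     # False
```

Without the $ anchor, /*.pdf would also match /page.pdf.html, because an unanchored pattern only has to match a prefix of the URL.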

Sitemap directive

Sitemap: https://example.com/sitemap.xml can appear anywhere in the file and is independent of user-agent groups. You can list multiple sitemaps — one per language, one for images, one for news. Always use the full absolute URL with scheme.
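
Because Sitemap lines ignore group boundaries, extracting them is a simple line scan. This sketch assumes comment stripping at the first # and uses hypothetical helper names (sitemap_urls, is_absolute); it also flags values that are not absolute URLs.

```python
from urllib.parse import urlsplit

SAMPLE = """\
Sitemap: https://example.com/sitemap.xml
User-agent: *
Disallow: /admin/
sitemap: /sitemap-news.xml
"""


def sitemap_urls(text: str) -> list[str]:
    """Collect every Sitemap value, wherever it appears in the file."""
    urls = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def is_absolute(url: str) -> bool:
    """True for a full http(s) URL with a host."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for url in sitemap_urls(SAMPLE):
    print(url, "ok" if is_absolute(url) else "not absolute")
# https://example.com/sitemap.xml ok
# /sitemap-news.xml not absolute
```

The second entry is found even though it sits inside a user-agent group and uses a lowercase directive name, but it fails the absolute-URL check.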

Crawl-delay

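The most misunderstood directive. Crawl-delay is not part of RFC 9309, and Google explicitly ignores it. The Search Console crawl-rate setting that used to be the alternative was retired in January 2024; Googlebot now sets its own rate and backs off when a server returns 500, 503, or 429 responses. Bing and Yahoo respect Crawl-delay (values are in seconds between requests). Yandex honored it historically but announced in 2018 that it would rely on the crawl-rate setting in Yandex Webmaster instead.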
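
If you want to read the declared delay programmatically, Python's standard library exposes it through urllib.robotparser. The sample file below is made up for illustration, and the stdlib parser is a stand-in here, not the engine behind this tool.

```python
from urllib.robotparser import RobotFileParser

SAMPLE_LINES = [
    "User-agent: bingbot",
    "Crawl-delay: 5",
    "Disallow: /search",
    "",
    "User-agent: *",
    "Disallow: /tmp/",
]

parser = RobotFileParser()
parser.parse(SAMPLE_LINES)  # parse() accepts an iterable of lines

# The delay is stored per group; a group without the directive yields None.
print(parser.crawl_delay("bingbot"))    # 5
print(parser.crawl_delay("Googlebot"))  # None
```

A value of None only means the file declares no delay for that crawler. Whether a crawler honors a declared value is up to the crawler.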

Robots.txt mistakes that have killed real sites

The staging-to-prod deploy. Staging environments routinely serve a robots.txt with User-agent: * and Disallow: /. When CI deploys the staging config to production, pages can start dropping out of search within days. The fix is a 2-line edit; the recovery takes 2 to 6 weeks.
Blocking JavaScript and CSS. Disallow: /static/ seems harmless. But Googlebot needs to fetch your JS and CSS to render the page properly for ranking. Blocking them causes Google to see a broken layout and downrank for usability.
Using robots.txt as a security control. robots.txt is publicly readable — your "hidden" admin URLs become a roadmap for attackers. Block sensitive paths with authentication and X-Robots-Tag: noindex headers instead.
Disallow with noindex meta. Mutually contradictory: if you Disallow a page, Google cannot fetch it to see the noindex meta tag, so the URL might still appear in SERPs as a "URL only" result.
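
The first mistake in this list is cheap to catch before a deploy. Below is a minimal sketch of such a check, assuming the robots.txt body is available as a string; blanket_block_agents is a hypothetical name, and the check looks only for a bare Disallow: / without weighing any Allow exceptions.

```python
def blanket_block_agents(text: str) -> list[str]:
    """Return the user-agents whose group contains a bare 'Disallow: /'."""
    flagged, agents, reading_agents = [], [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if not reading_agents:  # a rule came before this line: new group
                agents = []
            agents.append(value)
            reading_agents = True
        elif key in ("allow", "disallow"):
            reading_agents = False
            if key == "disallow" and value == "/":
                flagged.extend(a for a in agents if a not in flagged)
    return flagged


staging = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /admin/\n"
print(blanket_block_agents(staging))     # ['*']
print(blanket_block_agents(production))  # []
```

A CI step could fail the build whenever "*" appears in the result for the file that is about to ship to production.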

A robots.txt template for typical sites

The minimum useful robots.txt for a content or SaaS site: allow everything by default, declare your sitemap, and keep crawlers out of paths that have no search value.

User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /*?utm_

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-blog.xml

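The wildcard group applies to every crawler. /admin/ and /api/ have no search value, so skipping them saves crawl budget. /*?utm_ stops crawlers from fetching URLs whose query string starts with a UTM parameter, which would otherwise be crawled as duplicates of the canonical. It does not match a UTM parameter that follows another parameter (for example ?page=2&utm_source=x); catching those needs an additional Disallow: /*&utm_ rule. The two Sitemap lines cover the site and the blog independently.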

How to use (step-by-step)

Step 1 — Enter the domain: Type a bare domain — no scheme or path.
Step 2 — Click Check: /robots.txt is fetched over HTTPS, with HTTP fallback if needed.
Step 3 — Review the parse: Groups, sitemaps, crawl-delays, and a list of warnings (consecutive user-agent merges, blocked-everything cases, missing sitemaps).
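
Step 2 can be approximated with the standard library. This is a sketch of one possible fetch strategy, not the tool's actual code: fetch_robots is a hypothetical name, any network or HTTP error on HTTPS triggers the HTTP attempt, and the opener parameter exists only so the function can be exercised without network access.

```python
import urllib.request


def fetch_robots(domain: str, opener=urllib.request.urlopen, timeout: float = 10):
    """Fetch /robots.txt over HTTPS, falling back to HTTP on any error.

    Returns (url, body_text). Re-raises the last error if both attempts fail.
    """
    last_error = None
    for scheme in ("https", "http"):
        url = f"{scheme}://{domain}/robots.txt"
        try:
            with opener(url, timeout=timeout) as response:
                body = response.read().decode("utf-8", errors="replace")
                return url, body
        except OSError as exc:  # URLError and HTTPError both subclass OSError
            last_error = exc
    raise last_error


# Example call (performs a real network request):
# url, body = fetch_robots("example.com")
```

Treating a 404 on HTTPS as a reason to retry over HTTP is a simplification; a stricter client might instead read a 404 as "no robots.txt, everything allowed".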

Frequently asked questions

What is robots.txt?

A plain-text file at the root of a domain (e.g. example.com/robots.txt) that tells web crawlers which paths they may or may not request. It is part of the Robots Exclusion Protocol, standardized as RFC 9309 in 2022.

Does robots.txt enforce security?

No. It is purely advisory — well-behaved crawlers respect it, but malicious bots and scrapers ignore it. Never use robots.txt to hide sensitive content; use authentication, robots meta noindex, or X-Robots-Tag instead.

Why does Google ignore my Crawl-delay?

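Google explicitly does not support Crawl-delay. The Search Console crawl-rate setting was retired in January 2024; Googlebot now sets its own rate and backs off when your server returns 500, 503, or 429 responses. Bing and Yahoo do respect Crawl-delay; Yandex announced in 2018 that it would stop honoring it.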

Should I add a Sitemap directive?

Yes — putting "Sitemap: https://example.com/sitemap.xml" in robots.txt lets crawlers discover your sitemap without manual submission. You can list multiple Sitemap lines for multilingual or media-specific sitemaps.

Related free tools

Meta Tags Analyzer — Title, description, OG, Twitter cards
HTTP Headers Checker — Security score and CDN detection
DNS Lookup — A, MX, TXT, NS records
Redirect Checker — Trace 301/302 chains
All Free Domain Tools — Complete toolkit for SEO research

Try it on popular domains: github.com, google.com, cloudflare.com, openai.com.