Generate Robots.txt Files Instantly (Control Crawlers, SEO-Ready)

Generate a robots.txt file to control search engine crawlers. Create user-agent rules, allow/disallow paths, set crawl delays, and add sitemap URLs. Perfect for managing bot access to your website.


How to Use Robots.txt Generator

What is Robots.txt?

Robots.txt is a text file placed in your website root directory that tells search engine crawlers which pages or sections of your site they can or cannot access. It is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web. While not all crawlers respect robots.txt, major search engines like Google, Bing, and Yahoo follow these directives.

How to Use This Tool

Step 1: Choose a Preset

Start with one of the preset configurations:

  • Allow All Crawlers: Permit all search engines to crawl everything (default for most sites)
  • Standard Website: Block admin, private, and API directories
  • Blog/News Site: Block WordPress admin while allowing content directories
  • E-commerce Store: Block cart, checkout, and account pages while allowing products
  • Block All Crawlers: Prevent all crawlers from indexing your site (staging/development)

Click any preset to instantly load its configuration.

Step 2: Select User-agent

The user-agent specifies which crawler(s) the rules apply to:

All Crawlers (*)

  • Applies rules to all search engine bots
  • Most common choice for general sites
  • Can be overridden by specific user-agent rules

Specific Crawlers:

  • Googlebot: Google Search crawler
  • Googlebot-Image: Google Image Search
  • Bingbot: Microsoft Bing crawler
  • Slurp: Yahoo Search crawler
  • DuckDuckBot: DuckDuckGo crawler
  • Baiduspider: Baidu (Chinese search engine)
  • YandexBot: Yandex (Russian search engine)
  • facebookexternalhit: Facebook link preview crawler
  • Twitterbot: Twitter/X link preview crawler

You can create multiple user-agent blocks with different rules for each crawler.

Step 3: Configure Allow Rules

Allow rules explicitly permit crawlers to access specific paths:

When to use Allow:

  • Override broader Disallow rules
  • Permit access to specific subdirectories within blocked directories
  • Example: Block /admin/ but allow /admin/public/

Path Syntax:

  • / = Allow root and everything (if no disallow rules)
  • /blog/ = Allow blog directory and all subdirectories
  • /products/ = Allow products directory
  • Leave empty if you want to block everything

Best Practices:

  • When Allow and Disallow patterns both match a URL, the longer (more specific) pattern wins; on a tie, Google applies the less restrictive (Allow) rule
  • Use Allow sparingly, primarily to create exceptions
  • Most sites do not need explicit Allow rules

Step 4: Configure Disallow Rules

Disallow rules block crawlers from accessing specific paths:

Common Paths to Block:

  • /admin/ = Admin panel, control panel
  • /wp-admin/ = WordPress admin dashboard
  • /private/ = Private files and directories
  • /temp/ or /tmp/ = Temporary files
  • /api/ = API endpoints
  • /cgi-bin/ = CGI scripts
  • /search/ = Search results pages (duplicate content)
  • /cart/ = Shopping cart pages
  • /checkout/ = Checkout flow pages
  • /account/ = User account pages
  • /login/ and /register/ = Authentication pages
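If you are unsure whether a Disallow rule blocks what you intend, Python's standard-library robots.txt parser offers a quick local check before you deploy. A minimal sketch (note that this parser only does plain prefix matching; wildcards like * and $ are ignored):

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /cart/",
    "Disallow: /checkout/",
]

rp = RobotFileParser()
rp.parse(rules)

# Prefix matching: anything under a blocked directory is blocked.
print(rp.can_fetch("*", "/admin/settings"))  # False
print(rp.can_fetch("*", "/blog/post-1"))     # True
```

Paths not covered by any rule are allowed by default, which is why the /blog/ URL passes.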

Path Syntax:

  • / = Block everything
  • /admin/ = Block admin directory and all subdirectories
  • /secret.html = Block specific file
  • /*? = Block all URLs with query parameters
  • /*.pdf$ = Block all PDF files
  • /*sessionid= = Block URLs with session IDs

Wildcards:

  • * = Matches any sequence of characters
  • $ = End of URL
  • Example: /private/*.pdf$ blocks all PDFs in private directory
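Python's stdlib urllib.robotparser ignores * and $, so to reason about wildcard rules locally the matching logic can be sketched as a regular-expression translation. Here rule_matches is a hypothetical helper for illustration, not part of any library:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Sketch of Google-style robots.txt pattern matching:
    '*' matches any run of characters, a trailing '$' anchors
    the end, and everything else is a prefix match."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal parts, join with '.*' where '*' appeared.
    regex = "^" + ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None

print(rule_matches("/private/*.pdf$", "/private/docs/guide.pdf"))  # True
print(rule_matches("/private/*.pdf$", "/private/pdf/"))            # False
print(rule_matches("/*?", "/page?id=3"))                           # True
```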

Step 5: Set Crawl Delay (Optional)

Crawl-delay specifies the number of seconds crawlers should wait between requests:

When to use:

  • Limit server load from aggressive crawlers
  • Prevent bandwidth exhaustion
  • Protect resource-intensive pages

Values:

  • 0 = No delay (not recommended, omit the directive instead)
  • 1-5 = Light delay for fast servers
  • 10 = Standard delay for most sites (recommended)
  • 30-60 = Heavy delay for slow servers or heavy scrapers

Important Notes:

  • Google ignores Crawl-delay entirely; Googlebot adjusts its crawl rate automatically (the old Search Console crawl-rate setting has been retired)
  • Bing and Yandex respect Crawl-delay
  • Too high values may reduce crawling frequency
  • Most modern sites do not need this unless experiencing crawler issues
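To confirm how a parser reads your Crawl-delay value, Python's stdlib urllib.robotparser exposes it directly. A minimal sketch (remember Google ignores this directive):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /private/",
])

# A polite crawler would sleep this many seconds between requests.
print(rp.crawl_delay("*"))  # 10
```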

Step 6: Add Sitemap URL (Optional but Recommended)

Sitemap directive tells crawlers where to find your XML sitemap:

Format:

  • Sitemap: https://example.com/sitemap.xml
  • Must be absolute URL (include https://)
  • Can list multiple sitemaps on separate lines

Benefits:

  • Helps search engines discover all your pages
  • Improves indexing efficiency
  • Provides metadata about page priority and update frequency

Common Sitemap Locations:

  • /sitemap.xml = Root level (most common)
  • /sitemap_index.xml = Sitemap index file
  • /blog/sitemap.xml = Subdirectory sitemap
  • Multiple sitemaps are allowed
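Sitemap directives can also be read back programmatically, for example with Python's stdlib parser (site_maps() requires Python 3.8+):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /",
    "Sitemap: https://example.com/sitemap.xml",
    "Sitemap: https://example.com/blog/sitemap.xml",
])

# Returns all Sitemap URLs in file order (or None if there are none).
print(rp.site_maps())
# ['https://example.com/sitemap.xml', 'https://example.com/blog/sitemap.xml']
```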

Step 7: Copy or Download the File

Two options to save your robots.txt:

Copy Button:

  • Copies content to clipboard
  • Paste into a text editor
  • Save as robots.txt (no file extension)

Download Button:

  • Downloads file directly as robots.txt
  • Ready to upload to your server
  • Preserves correct formatting
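Under the hood, generation is just line assembly. A hedged sketch of what a generator like this might produce; build_robots_txt is a hypothetical illustration, not this tool's actual code:

```python
def build_robots_txt(user_agent="*", allow=(), disallow=(),
                     crawl_delay=None, sitemaps=()):
    """Assemble robots.txt content from a simple configuration."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if crawl_delay is not None:
        lines.append(f"Crawl-delay: {crawl_delay}")
    lines.append("")  # blank line before Sitemap directives
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines).rstrip() + "\n"

content = build_robots_txt(
    disallow=["/admin/", "/cart/"],
    sitemaps=["https://example.com/sitemap.xml"],
)
# Save with the exact lowercase name robots.txt, UTF-8, no BOM:
# with open("robots.txt", "w", encoding="utf-8", newline="\n") as f:
#     f.write(content)
print(content)
```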

Step 8: Upload to Your Website

Upload robots.txt to your website root directory:

File Location:

  • Must be at: https://yoursite.com/robots.txt
  • NOT in subdirectories: /blog/robots.txt (will not work)
  • NOT with different names: robots.txt.txt (invalid)
  • Case-sensitive: robots.txt not Robots.txt

Upload Methods:

FTP/SFTP:

  1. Connect to your server via FTP client (FileZilla, Cyberduck)
  2. Navigate to root directory (public_html, www, or htdocs)
  3. Upload robots.txt file
  4. Set file permissions to 644 (readable by all)

cPanel File Manager:

  1. Log into cPanel
  2. Open File Manager
  3. Navigate to public_html directory
  4. Upload robots.txt file
  5. Verify file is not hidden

WordPress:

  1. Use FTP to upload to root directory (same level as wp-config.php)
  2. Or use All in One SEO / Yoast SEO plugin robots.txt editor
  3. Some WordPress plugins auto-generate robots.txt (check first)

Next.js/Vercel:

  1. Place robots.txt in /public/ directory
  2. Deployed to root automatically
  3. Or use next-sitemap package for dynamic generation

Nginx:

  1. Upload to web root directory (usually /var/www/html)
  2. Ensure proper permissions (644)
  3. No restart is needed for a static file; reload Nginx only if its configuration changed

Step 9: Test Your Robots.txt

Verify your robots.txt file is working correctly:

Manual Check:

  1. Visit: https://yoursite.com/robots.txt
  2. Verify content displays correctly in browser
  3. Check for any 404 errors

Google Search Console:

  1. Go to: search.google.com/search-console
  2. Select your property
  3. Open Settings → Crawling → robots.txt report (the legacy robots.txt Tester has been retired)
  4. Check the fetch status and any parsing problems reported for your robots.txt
  5. Use the URL Inspection tool to confirm whether a specific URL is blocked

Bing Webmaster Tools:

  1. Go to: bing.com/webmasters
  2. Select your site
  3. Go to Configure My Site → Crawl Control
  4. View current robots.txt
  5. Test URLs against rules

Online Validators:

  • Ryte.com robots.txt validator
  • Technical SEO robots.txt tester
  • Screaming Frog robots.txt analyzer

Robots.txt Syntax and Rules

Basic Structure

User-agent: *
Allow: /
Disallow: /private/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml

Multiple User-agent Blocks

You can define different rules for different crawlers:

# Allow Googlebot to access everything
User-agent: Googlebot
Allow: /

# Block other crawlers from certain paths
User-agent: *
Disallow: /private/
Disallow: /admin/

Path Matching Rules

Exact Path:

Disallow: /admin/

Blocks: /admin/, /admin/users/, /admin/settings.php
Allows: /administrator/ (different path)

Wildcard (*):

Disallow: /search*

Blocks: /search, /search/, /search?q=test, /searchresults

End Anchor ($):

Disallow: /*.pdf$

Blocks: /documents/file.pdf, /downloads/guide.pdf
Allows: /pdf/ (directory, not file)

Query Parameters:

Disallow: /*?

Blocks all URLs with query parameters

Case Sensitivity:

  • Paths are case-sensitive: /Admin/ ≠ /admin/
  • User-agent names are case-insensitive: Googlebot = googlebot

Allow vs Disallow Priority

When rules conflict, most specific rule wins:

User-agent: *
Disallow: /admin/
Allow: /admin/public/

Result: /admin/public/ is allowed, rest of /admin/ is blocked
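This precedence can be checked locally with Python's urllib.robotparser, with one caveat: the stdlib parser applies rules in file order (first match wins) rather than Google's longest-match rule, so the Allow exception is listed before the broader Disallow in this sketch:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /admin/public/",   # exception listed first for the stdlib parser
    "Disallow: /admin/",
])

print(rp.can_fetch("*", "/admin/public/help.html"))  # True
print(rp.can_fetch("*", "/admin/settings"))          # False
```

Google applies the more specific rule regardless of order, but keeping exceptions above broader blocks makes the file easier to read either way.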

Comments

Use # for comments:

# Block admin area
User-agent: *
Disallow: /admin/  # Admin panel

# Allow public resources
Allow: /public/

Common Use Cases

Standard Public Website

User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml

Allows all crawlers to index everything.

WordPress Site

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml

Blocks WordPress admin while allowing AJAX endpoints.

E-commerce Store

User-agent: *
Allow: /
Allow: /products/
Allow: /categories/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

Allows product pages, blocks transactional pages.

Staging/Development Site

User-agent: *
Disallow: /

Blocks all crawlers from accessing the entire site.

Block Specific Crawlers

# Block bad bots
User-agent: BadBot
User-agent: ScraperBot
Disallow: /

# Allow good bots
User-agent: *
Allow: /

Blocks specific malicious crawlers.

Prevent Image Indexing

User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Allow: /

Blocks Google from indexing images while allowing text crawling.
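Per-crawler groups like this can be verified with Python's stdlib parser, which matches the group whose User-agent token appears in the crawler's name and falls back to the * group otherwise:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot-Image",
    "Disallow: /images/",
    "",                       # blank line separates user-agent groups
    "User-agent: *",
    "Allow: /",
])

print(rp.can_fetch("Googlebot-Image", "/images/photo.jpg"))  # False
print(rp.can_fetch("SomeOtherBot", "/images/photo.jpg"))     # True
```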

Important Limitations

Robots.txt is NOT Security

What it does NOT do:

  • Does not prevent malicious bots from accessing pages
  • Does not hide pages from search results if linked externally
  • Does not remove pages already indexed by search engines
  • Can be ignored by any crawler (it is just a request, not enforcement)

For actual security:

  • Use password protection (.htaccess, server authentication)
  • Implement IP whitelisting
  • Use noindex meta tags or X-Robots-Tag headers
  • Apply proper file permissions

Cannot Remove Indexed Pages

If pages are already indexed:

  • Robots.txt will not remove them from search results
  • Use noindex meta tag instead: <meta name="robots" content="noindex">
  • Or use X-Robots-Tag HTTP header
  • Then request removal in Google Search Console

File Must Be Accessible

  • Robots.txt must return 200 OK status
  • Must be plain text (text/plain)
  • Must be UTF-8 encoded
  • Maximum size: 500 KiB (recommended under 100 KB)
  • Avoid redirects: Google follows a limited number of redirect hops (up to five) for robots.txt, but other crawlers may not
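These requirements can be pre-checked locally before upload. A hedged sketch; check_robots_file is a hypothetical helper, and real crawlers apply their own, stricter parsing:

```python
def check_robots_file(raw: bytes):
    """Pre-flight checks: UTF-8 encoding, size limit,
    and recognizable directive names on each line."""
    problems = []
    if len(raw) > 500 * 1024:
        problems.append("file exceeds 500 KiB; crawlers may truncate it")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    known = ("user-agent", "allow", "disallow", "crawl-delay", "sitemap")
    for n, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in known:
            problems.append(f"line {n}: unknown directive '{field}'")
    return problems

sample = b"User-agent: *\nDisallow: /admin/\nSitemap: https://example.com/sitemap.xml\n"
print(check_robots_file(sample))  # []
```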

Troubleshooting

Robots.txt Not Working?

Check File Location:

  • Must be at exact path: /robots.txt
  • Not in subdirectory or with wrong name
  • Case-sensitive filename

Verify File Permissions:

  • Set to 644 (readable by all)
  • Not executable (do not use 777)

Test Accessibility:

  • Open https://yoursite.com/robots.txt in a browser or fetch it with curl -I
  • Confirm a 200 OK response with Content-Type: text/plain
  • A 404 means crawlers treat everything as allowed; persistent 5xx errors can cause Google to treat the site as fully disallowed

Syntax Errors:

  • No syntax errors in directives
  • Check spelling: directive names are case-insensitive (User-agent equals user-agent), but the hyphen is required; Useragent is invalid
  • No extra spaces or special characters

Pages Still Being Indexed?

Solutions:

  • Add noindex meta tag: <meta name="robots" content="noindex">
  • Wait for next crawl (can take weeks)
  • Use Google Search Console Removals tool
  • Check for external links pointing to blocked pages
