Robots.txt Generator
Generate robots.txt file to control search engine crawlers. Create user-agent rules, allow/disallow paths, set crawl delays, and add sitemap URLs. Perfect for managing bot access to your website.
How to Use Robots.txt Generator
What is Robots.txt?
Robots.txt is a text file placed in your website root directory that tells search engine crawlers which pages or sections of your site they can or cannot access. It is part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web. While not all crawlers respect robots.txt, major search engines like Google, Bing, and Yahoo follow these directives.
How to Use This Tool
Step 1: Choose a Preset
Start with one of the preset configurations:
- Allow All Crawlers: Permit all search engines to crawl everything (default for most sites)
- Standard Website: Block admin, private, and API directories
- Blog/News Site: Block WordPress admin while allowing content directories
- E-commerce Store: Block cart, checkout, and account pages while allowing products
- Block All Crawlers: Prevent all crawlers from indexing your site (staging/development)
Click any preset to instantly load its configuration.
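For example, choosing the Standard Website preset produces a configuration along these lines. This is an illustrative sketch, not the generator's exact output, and the example.com URL is a placeholder:
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /api/
Sitemap: https://example.com/sitemap.xml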
Step 2: Select User-agent
The user-agent specifies which crawler(s) the rules apply to:
All Crawlers (*)
- Applies rules to all search engine bots
- Most common choice for general sites
- Can be overridden by specific user-agent rules
Specific Crawlers:
- Googlebot: Google Search crawler
- Googlebot-Image: Google Image Search
- Bingbot: Microsoft Bing crawler
- Slurp: Yahoo Search crawler
- DuckDuckBot: DuckDuckGo crawler
- Baiduspider: Baidu (Chinese search engine)
- YandexBot: Yandex (Russian search engine)
- facebookexternalhit: Facebook link preview crawler
- Twitterbot: Twitter/X link preview crawler
You can create multiple user-agent blocks with different rules for each crawler.
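For example, the following sketch (with placeholder paths) applies a general rule to all crawlers and a stricter rule to Googlebot-Image only:
# Rules for all crawlers
User-agent: *
Disallow: /private/
# Extra restriction for Google Image Search
User-agent: Googlebot-Image
Disallow: /images/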
Step 3: Configure Allow Rules
Allow rules explicitly permit crawlers to access specific paths:
When to use Allow:
- Override broader Disallow rules
- Permit access to specific subdirectories within blocked directories
- Example: Block /admin/ but allow /admin/public/
Path Syntax:
- / = Allow root and everything (if there are no Disallow rules)
- /blog/ = Allow the blog directory and all subdirectories
- /products/ = Allow the products directory
- Leave empty if you want to block everything
Best Practices:
- When an Allow and a Disallow rule match with equal specificity, the Allow rule wins; otherwise the longer (more specific) path takes precedence
- Use Allow sparingly, primarily to create exceptions
- Most sites do not need explicit Allow rules
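A minimal sketch of the exception pattern described in the example above, using placeholder paths:
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Everything under /admin/ stays blocked except the /admin/public/ subtree.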
Step 4: Configure Disallow Rules
Disallow rules block crawlers from accessing specific paths:
Common Paths to Block:
- /admin/ = Admin panel, control panel
- /wp-admin/ = WordPress admin dashboard
- /private/ = Private files and directories
- /temp/ or /tmp/ = Temporary files
- /api/ = API endpoints
- /cgi-bin/ = CGI scripts
- /search/ = Search results pages (duplicate content)
- /cart/ = Shopping cart pages
- /checkout/ = Checkout flow pages
- /account/ = User account pages
- /login/ and /register/ = Authentication pages
Path Syntax:
- / = Block everything
- /admin/ = Block admin directory and all subdirectories
- /secret.html = Block a specific file
- /*? = Block all URLs with query parameters
- /*.pdf$ = Block all PDF files
- /*sessionid= = Block URLs containing session IDs
Wildcards:
- * = Matches any sequence of characters
- $ = End of URL
- Example: /private/*.pdf$ blocks all PDFs in the private directory
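Putting these patterns together, a block like the following (paths are placeholders) combines directory, query-string, and file-type rules:
# Block the admin area, all URLs with query parameters, and PDFs under /private/
User-agent: *
Disallow: /admin/
Disallow: /*?
Disallow: /private/*.pdf$
Major search engines support * and $, but simpler crawlers may treat these characters literally.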
Step 5: Set Crawl Delay (Optional)
Crawl-delay specifies the number of seconds crawlers should wait between requests:
When to use:
- Limit server load from aggressive crawlers
- Prevent bandwidth exhaustion
- Protect resource-intensive pages
Values:
- 0 = No delay (not recommended; omit the directive instead)
- 1-5 = Light delay for fast servers
- 10 = Standard delay for most sites (recommended)
- 30-60 = Heavy delay for slow servers or heavy scrapers
Important Notes:
- Google ignores Crawl-delay and manages its crawl rate automatically
- Bing and Yandex respect Crawl-delay
- Values that are too high can reduce how often your site is crawled
- Most modern sites do not need this unless they are experiencing crawler issues
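For example, to ask Bingbot to wait 10 seconds between requests while leaving other crawlers unaffected (a sketch; tune the value to your server):
User-agent: Bingbot
Crawl-delay: 10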
Step 6: Add Sitemap URL (Optional but Recommended)
The Sitemap directive tells crawlers where to find your XML sitemap:
Format:
Sitemap: https://example.com/sitemap.xml
- Must be an absolute URL (include https://)
- Can list multiple sitemaps on separate lines
Benefits:
- Helps search engines discover all your pages
- Improves indexing efficiency
- Provides metadata about page priority and update frequency
Common Sitemap Locations:
- /sitemap.xml = Root level (most common)
- /sitemap_index.xml = Sitemap index file
- /blog/sitemap.xml = Subdirectory sitemap
- Multiple sitemaps are allowed
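For example, a site with a main sitemap and a separate blog sitemap can list both (URLs are placeholders):
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog/sitemap.xml
Sitemap lines are independent of user-agent blocks and can appear anywhere in the file.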
Step 7: Copy or Download the File
Two options to save your robots.txt:
Copy Button:
- Copies content to clipboard
- Paste into a text editor
- Save as robots.txt (make sure your editor does not append an extra .txt extension)
Download Button:
- Downloads the file directly as robots.txt
- Ready to upload to your server
- Preserves correct formatting
Step 8: Upload to Your Website
Upload robots.txt to your website root directory:
File Location:
- Must be at: https://yoursite.com/robots.txt
- NOT in subdirectories: /blog/robots.txt will not work
- NOT with different names: robots.txt.txt is invalid
- Case-sensitive: robots.txt, not Robots.txt
Upload Methods:
FTP/SFTP:
- Connect to your server via FTP client (FileZilla, Cyberduck)
- Navigate to root directory (public_html, www, or htdocs)
- Upload robots.txt file
- Set file permissions to 644 (readable by all)
cPanel File Manager:
- Log into cPanel
- Open File Manager
- Navigate to public_html directory
- Upload robots.txt file
- Verify file is not hidden
WordPress:
- Use FTP to upload to root directory (same level as wp-config.php)
- Or use All in One SEO / Yoast SEO plugin robots.txt editor
- Some WordPress plugins auto-generate robots.txt (check first)
Next.js/Vercel:
- Place robots.txt in the /public/ directory
- Deployed to the site root automatically
- Or use next-sitemap package for dynamic generation
Nginx:
- Upload to web root directory (usually /var/www/html)
- Ensure proper permissions (644)
- Restart Nginx if needed
Step 9: Test Your Robots.txt
Verify your robots.txt file is working correctly:
Manual Check:
- Visit: https://yoursite.com/robots.txt
- Verify the content displays correctly in the browser
- Check for any 404 errors
Google Search Console:
- Go to: search.google.com/search-console
- Select your property
- Open the robots.txt report under Settings (it replaced the legacy robots.txt Tester)
- Confirm Google fetched the file successfully and review any reported errors
- Use the URL Inspection tool to check whether a specific URL is blocked
- Request a recrawl of robots.txt after making changes
Bing Webmaster Tools:
- Go to: bing.com/webmasters
- Select your site
- Go to Configure My Site → Crawl Control
- View current robots.txt
- Test URLs against rules
Online Validators:
- Ryte.com robots.txt validator
- Technical SEO robots.txt tester
- Screaming Frog robots.txt analyzer
Robots.txt Syntax and Rules
Basic Structure
User-agent: *
Allow: /
Disallow: /private/
Crawl-delay: 10
Sitemap: https://example.com/sitemap.xml
Multiple User-agent Blocks
You can define different rules for different crawlers:
# Allow Googlebot to access everything
User-agent: Googlebot
Allow: /
# Block other crawlers from certain paths
User-agent: *
Disallow: /private/
Disallow: /admin/
Path Matching Rules
Exact Path:
Disallow: /admin/
Blocks: /admin/, /admin/users/, /admin/settings.php
Allows: /administrator/ (different path)
Wildcard (*):
Disallow: /search*
Blocks: /search, /search/, /search?q=test, /searchresults
End Anchor ($):
Disallow: /*.pdf$
Blocks: /documents/file.pdf, /downloads/guide.pdf
Allows: /pdf/ (a directory, not a .pdf file)
Query Parameters:
Disallow: /*?
Blocks all URLs with query parameters
Case Sensitivity:
- Paths are case-sensitive: /Admin/ ≠ /admin/
- User-agent names are case-insensitive: Googlebot = googlebot
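Because paths are case-sensitive, covering both capitalizations of a directory requires two rules (paths are placeholders):
User-agent: *
Disallow: /admin/
Disallow: /Admin/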
Allow vs Disallow Priority
When rules conflict, the most specific (longest path) rule wins:
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Result: /admin/public/ is allowed, rest of /admin/ is blocked
Comments
Use # for comments:
# Block admin area
User-agent: *
Disallow: /admin/ # Admin panel
# Allow public resources
Allow: /public/
Common Use Cases
Standard Public Website
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Allows all crawlers to index everything.
WordPress Site
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap.xml
Blocks WordPress admin while allowing AJAX endpoints.
E-commerce Store
User-agent: *
Allow: /
Allow: /products/
Allow: /categories/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search/
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
Allows product pages, blocks transactional pages.
Staging/Development Site
User-agent: *
Disallow: /
Blocks all crawlers from accessing the entire site.
Block Specific Crawlers
# Block bad bots
User-agent: BadBot
User-agent: ScraperBot
Disallow: /
# Allow good bots
User-agent: *
Allow: /
Blocks specific malicious crawlers.
Prevent Image Indexing
User-agent: Googlebot-Image
Disallow: /images/
User-agent: *
Allow: /
Blocks Google from indexing images while allowing text crawling.
Important Limitations
Robots.txt is NOT Security
What it does NOT do:
- Does not prevent malicious bots from accessing pages
- Does not hide pages from search results if linked externally
- Does not remove pages already indexed by search engines
- Can be ignored by any crawler (it is just a request, not enforcement)
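In fact, because robots.txt is publicly readable at https://yoursite.com/robots.txt, listing sensitive areas in it advertises their location to anyone who looks. For example (placeholder path):
# Anyone reading this file now knows where the admin panel lives
User-agent: *
Disallow: /secret-admin-panel/
Rely on authentication rather than hidden paths for anything that must stay private.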
For actual security:
- Use password protection (.htaccess, server authentication)
- Implement IP whitelisting
- Use noindex meta tags or X-Robots-Tag headers
- Apply proper file permissions
Cannot Remove Indexed Pages
If pages are already indexed:
- Robots.txt will not remove them from search results
- Use a noindex meta tag instead: <meta name="robots" content="noindex">
- Or use the X-Robots-Tag HTTP header
- Then request removal in Google Search Console
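Keep in mind that a crawler can only see a noindex tag on a page it is allowed to fetch, so do not disallow the page you are trying to de-index until it has dropped out of the results. A sketch of the wrong and right approach, using a placeholder path:
# Wrong: the page is blocked, so crawlers never see its noindex tag
User-agent: *
Disallow: /old-page.html
# Right: leave the page crawlable and let noindex (meta tag or X-Robots-Tag) do the work
User-agent: *
Disallow: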
File Must Be Accessible
- Robots.txt must return 200 OK status
- Must be plain text (text/plain)
- Must be UTF-8 encoded
- Maximum size: 500 KiB (recommended under 100 KB)
- Cannot use redirects (301/302)
Troubleshooting
Robots.txt Not Working?
Check File Location:
- Must be at the exact path: /robots.txt
- Not in a subdirectory or with a different name
- Case-sensitive filename
Verify File Permissions:
- Set to 644 (readable by all)
- Not executable (do not use 777)
Test Accessibility:
- Visit https://yoursite.com/robots.txt in browser
- Should display text content
- Check for 404, 403, or 500 errors
Syntax Errors:
- No syntax errors in directives
- Check spelling: User-agent, not Useragent
- No extra spaces or special characters
Pages Still Being Indexed?
Solutions:
- Add a noindex meta tag: <meta name="robots" content="noindex"> (the page must not be blocked in robots.txt, or crawlers cannot see the tag)
- Wait for the next crawl (can take weeks)
- Use Google Search Console Removals tool
- Check for external links pointing to blocked pages
Related Marketing & SEO Tools
Meta Tag Generator
Generate HTML meta tags for SEO optimization. Create title, description, keywords, viewport, charset, robots, and author meta tags. Perfect for improving search engine rankings and social sharing.
Google SERP Simulator
Preview how your title and meta description appear in Google search results. See real-time character counts, pixel width estimates, and desktop/mobile previews to optimize your SEO.
FAQ Schema Generator
Generate JSON-LD FAQPage schema markup for SEO. Add questions and answers to create structured data that helps search engines display FAQ rich snippets in search results.
Breadcrumb Schema Generator
Generate JSON-LD BreadcrumbList schema markup for SEO. Add breadcrumb items with names and URLs to create structured data that helps search engines understand your site hierarchy.
Twitter Card Generator
Generate Twitter Card meta tags for Twitter/X sharing. Create summary cards, large image cards, app cards, and player cards. Optimize how your links appear on Twitter with custom titles, descriptions, and images.
Open Graph Generator
Generate Facebook Open Graph meta tags for social media sharing. Create og:title, og:description, og:image, og:url, and og:type tags. Perfect for optimizing how your links appear on Facebook, LinkedIn, WhatsApp, and Slack.
Product Schema Generator
Generate JSON-LD Product schema markup for SEO. Add product details like name, price, brand, rating, and availability to create structured data for rich search results.