How to Create a Robots.txt File for Your Magento/Magento 2 Store
What is a Robots.txt file?
A robots.txt file is the primary way to tell search engine crawlers and other bots which pages they may crawl and which they should not. It helps keep sensitive or private areas of a website out of search results. Compliant crawlers follow it voluntarily, so it does not replace real access control for confidential data.
Understanding the top robots.txt commands
A robots.txt file is built from three main directives, described below.
User-agent: * applies the rules that follow to the robots of all search engines. To write rules for one specific search engine, put that crawler's name in place of the asterisk.
Allow: / lets bots crawl every file under the site root.
Disallow: / tells bots not to crawl any file under the site root.
Alternatively, robots meta tags can tell bots, page by page, whether to index a page and whether to follow its links. They suit more complex situations, for example when you want robots to crawl your pages and follow their links but do not want some of those pages indexed. In that case, use the meta tag value "noindex, follow" on those pages: <meta name="robots" content="noindex, follow">.
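As a minimal sketch of how the User-agent, Allow and Disallow directives described in this section are interpreted, the snippet below feeds three small rule sets to Python's standard urllib.robotparser module and asks whether a crawler may fetch a URL. The crawler name "ExampleBot", the product URL and the helper build_parser are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Rule set 1: every crawler is blocked from the whole site.
block_all = build_parser("""
User-agent: *
Disallow: /
""")

# Rule set 2: every crawler may fetch everything.
allow_all = build_parser("""
User-agent: *
Allow: /
""")

# Rule set 3: only the hypothetical crawler "ExampleBot" is blocked.
block_one = build_parser("""
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
""")

# Hypothetical product page used only for this demonstration.
url = "http://www.mystore.com/some-product.html"

print(block_all.can_fetch("Googlebot", url))   # False
print(allow_all.can_fetch("Googlebot", url))   # True
print(block_one.can_fetch("ExampleBot", url))  # False
print(block_one.can_fetch("Googlebot", url))   # True
```

The first two rule sets correspond to a site that is closed to all bots and a site that is fully open. The third shows rules written for one named crawler while everyone else keeps full access.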
How does robots.txt for Magento differ from robots.txt for a normal website?
Magento is one of the most widely used eCommerce platforms. Because an eCommerce store handles sensitive data such as customer emails, names, passwords and banking details, it is important to tell search engine bots which files they should not crawl and index. Below are some examples.
Top 5 Robots.txt examples for your Magento/Magento 2 Store
- When the store is not ready and you do not want any search engine to index it. This robots.txt file blocks all search engine robots from crawling the site: User-agent: * followed by Disallow: /
- When the store is ready and nothing needs to be hidden from search engine bots. This robots.txt file gives crawlers access to the whole site so that all pages can be indexed.
- When you want to give search engine crawlers general access while keeping them from crawling and indexing specific information. The rules start with User-agent: *
- When you want a clean URL index, with search result pages and similar URLs kept out of it. The rules start with User-agent: *
- The directories suggested for blocking on a Magento store:
# # robots.txt for Magento Community and Enterprise
# # GENERAL SETTINGS
# # Enable robots.txt rules for all crawlers
User-agent: *
# # Crawl-delay parameter: the number of seconds to wait between successive requests to the same server.
# # Set a crawl rate if your server has traffic problems. Note that Google ignores the Crawl-delay setting in robots.txt. You can set this up in Google Webmaster Tools.
# Crawl-delay: 30
# # Magento sitemap: URL to your sitemap file in Magento
# Sitemap: http://www.mystore.com/sitemap/sitemap.xml
# # DEVELOPMENT-RELATED SETTINGS
# # Do not index files and folders that are used during development: CVS and SVN directories, and dump files
Disallow: /CVS
Disallow: /*.svn$
Disallow: /*.idea$
Disallow: /*.sql$
Disallow: /*.tgz$
# # GENERAL SETTINGS FOR MAGENTO
# # Do not index the Magento admin page
Disallow: /admin/
# # Do not index the common Magento technical directories
Disallow: /app/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /shell/
Disallow: /var/
# # Do not index the shared Magento files
Disallow: /api.php
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /get.php
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /README.txt
Disallow: /RELEASE_NOTES.txt
# # MAGENTO SEO IMPROVEMENT
# # Do not index subcategory pages that are sorted or filtered.
Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?limit=all
Disallow: /*?mode*
# # Do not index the second copy of the home page (example.com/index.php/). Uncomment only if you have activated Magento SEO URLs.
# # Disallow: /index.php/
# # Do not index links that contain a session ID
Disallow: /*?SID=
# # Do not index the checkout and user account pages
Disallow: /checkout/
Disallow: /onestepcheckout/
Disallow: /customer/
Disallow: /customer/account/
Disallow: /customer/account/login/
# # Do not index search pages and non-SEO-optimized category and product links
Disallow: /catalogsearch/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
# # Server Settings
# # Do not index the common technical directories and files on the server
Disallow: /cgi-bin/
Disallow: /cleanup.php
Disallow: /apc.php
Disallow: /memcache.php
Disallow: /phpinfo.php
# # IMAGE INDEXING SETTINGS
# # Optional: if you do not want Google and Bing to index your images
# User-agent: Googlebot-Image
# Disallow: /
# User-agent: msnbot-media
# Disallow: /
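Several rules in the file above rely on the * wildcard and the $ end-of-URL anchor, which are described in RFC 9309 and honored by major search engines. Python's urllib.robotparser appears to do plain prefix matching only, so the snippet below is a rough sketch of that wildcard matching using regular expressions. The helper names rule_to_regex and is_blocked and the sample paths are assumptions for illustration. The sketch looks only at Disallow patterns and ignores Allow rules and longest-match precedence.

```python
import re
from typing import Iterable


def rule_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is matched literally, as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches the URL path (with query)."""
    return any(rule_to_regex(p).match(path) for p in disallow_patterns)


# A few Disallow values taken from the Magento file above.
DISALLOW = [
    "/admin/",
    "/checkout/",
    "/catalogsearch/",
    "/*.sql$",
    "/*?dir=asc",
    "/*?SID=",
]

print(is_blocked("/admin/dashboard/", DISALLOW))    # True
print(is_blocked("/backup/dump.sql", DISALLOW))     # True
print(is_blocked("/shoes.html?dir=asc", DISALLOW))  # True
print(is_blocked("/shoes.html", DISALLOW))          # False
print(is_blocked("/dump.sql.html", DISALLOW))       # False
```

The last call shows the effect of the $ anchor: /*.sql$ blocks only URLs that end in .sql, so a path that merely contains ".sql" is still crawlable.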