Create a robots.txt for your Magento 2 shop

As Magento 2 provides a mechanism for generating a robots.txt file, there is no need to create one manually. All you need to do is add some configuration in nginx and in Magento itself, and a robots.txt will be generated periodically by cron.
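
Robots.txt generation depends on the Magento cron being active. As a quick check you can trigger a cron run manually; this assumes a standard installation in /data/web/magento2, as used throughout this article:

    cd /data/web/magento2
    # trigger a single Magento cron run; the robots.txt is regenerated by one of the scheduled jobs
    php bin/magento cron:run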

Configure Magento to generate a robots.txt

To generate a robots.txt file, use the following steps:

  • Log in to your Magento admin backend
  • Select Stores -> Configuration
  • Select General -> Design
  • Expand the Search Engine Robots section.
  • Select the default storefront, or the storefront you want to create a robots.txt for.

  • For Default Robots, use one of the available options:

    • INDEX, FOLLOW: Tell crawlers to index the site and to follow the links on its pages.
    • NOINDEX, FOLLOW: Tell crawlers not to index the site, but to follow the links on its pages.
    • INDEX, NOFOLLOW: Tell crawlers to index the site, but not to follow the links on its pages.
    • NOINDEX, NOFOLLOW: Tell crawlers neither to index the site nor to follow the links on its pages.

After this is done, click the Reset to Default button to add the default robots.txt instructions to the custom instructions field.
You can now add your own instructions to the defaults. After editing, click the Save Config button to save your robots.txt file to disk.
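
If you prefer the command line, the same settings can be scripted with n98-magerun2. This is a minimal sketch assuming the usual design/search_engine_robots/* configuration paths; verify these paths against your Magento version, and use the --scope and --scope-id options to target an individual store view:

    # set the Default Robots value for the default scope
    n98-magerun2 config:store:set design/search_engine_robots/default_robots "INDEX,FOLLOW"
    # append custom instructions to the generated robots.txt (example line only)
    n98-magerun2 config:store:set design/search_engine_robots/custom_instructions "Disallow: /checkout/"
    # flush the cache so the new values are picked up
    n98-magerun2 cache:flush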

Nginx configuration

Configure nginx to serve one robots.txt for all storefronts

When you use a single robots.txt file for all your storefronts, configuring your robots.txt within nginx is fairly simple.

  • Create a symlink in /data/web/public that points to the robots.txt file in /data/web/magento2:
    ln -s /data/web/magento2/robots.txt /data/web/public/robots.txt

NB: There are some known open bugs concerning the location of the robots.txt. It is possible that this location will change in the near future once the Magento developers find a solution for these bugs.
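
A quick sanity check on the shell to confirm the symlink resolves to the generated file (paths as used above):

    # confirm the symlink points at the generated robots.txt
    readlink -f /data/web/public/robots.txt
    # show the contents Magento generated
    cat /data/web/public/robots.txt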

Configure nginx to serve a different robots.txt for each storefront

Magento 2 currently does not support a separate robots.txt per storefront or website, which is surprising as the rest of the suite is fully designed for use with multiple storefronts. Therefore we use a small workaround and move the robots.txt file to another location each time we save changes to it.

The Magento 2 robots.txt implementation is currently quite buggy: if you use multiple storefronts, Magento 2 saves each robots.txt over the previous one, erasing all changes made for the other storefronts.

  • As a workaround, we need to create a new location for our robots.txt files:
    mkdir -p /data/web/magento2/robots
    
  • Then copy the default (global) robots.txt to this directory.
    This copy will serve as the fallback when the robots.txt for the corresponding storefront is missing:

    cp /data/web/magento2/robots.txt /data/web/magento2/robots/robots.txt
    
  • Now adjust the robots.txt settings in the backend for the first individual storefront.
    For each storefront, adjust the default settings to your needs and save your robots.txt using the big Save Config button.

  • After you click Save, copy or move the generated robots.txt to its definitive location.
    We use the naming convention robots_$storecode.txt, e.g. if your store code is shop_nl, use the filename robots_shop_nl.txt:

    cp /data/web/magento2/robots.txt /data/web/magento2/robots/robots_shop_nl.txt
    
  • Alternatively, you can create a robots.txt for every storefront at once, save them in /data/web/magento2/robots and edit them afterwards:
    # list all store codes and create a robots.txt for each of them, based on the current global one
    for storefront in $( n98-magerun2 sys:store:list --format=csv | sed 1d | cut -d, -f2 ); do
      cp /data/web/magento2/robots.txt /data/web/magento2/robots/robots_${storefront}.txt
    done
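
    Afterwards, you can verify that a robots_$storecode.txt file now exists for every store code:

    ls -l /data/web/magento2/robots/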
    
  • When this is done, we need to add some nginx configuration to make sure the correct robots.txt is served for each storefront.
    Save a snippet as /data/web/nginx/server.robots with the following content:

    # fallback: serve the global robots/robots.txt when no storefront-specific file exists
    location @robots_fallback {
      root /data/web/magento2/robots;
    }

    # serve the robots.txt matching the storefront selected through $storecode
    location = /robots.txt {
      alias /data/web/magento2/robots/robots_$storecode.txt;
      error_page 404 = @robots_fallback;
    }
    

Now test your configuration by requesting the robots.txt and verifying that the right file is served:

curl -v https://www.example.com/robots.txt
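
If your storefronts run on separate domains, repeat the check for each of them; the domains below are placeholders for your own storefront URLs:

    for domain in www.example.com www.example.nl; do
      echo "== ${domain} =="
      curl -s https://${domain}/robots.txt
    done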

Examples

Frequently used robots.txt examples

  • Allow full access to all directories and pages:
    User-agent: *
    Disallow:

  • Don’t allow any user-agent access to any directory or page:
    User-agent: *
    Disallow: /

  • The recommended default settings for Magento 2:
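    A minimal, illustrative sketch of such defaults, assuming standard Magento 2 frontend paths; adjust the admin path (Magento 2 typically uses a custom one), the disallowed directories and the sitemap URL to your own shop:

    User-agent: *
    # keep the admin, checkout and customer account pages out of the index
    Disallow: /admin/
    Disallow: /checkout/
    Disallow: /customer/
    # keep internal search result pages out of the index
    Disallow: /catalogsearch/
    # point crawlers at your sitemap (replace the domain with your own)
    Sitemap: https://www.example.com/sitemap.xml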

Additional resources

  • https://developers.google.com/webmasters/control-crawl-index/
  • https://support.google.com/webmasters/answer/6062596?hl=en
  • http://www.robotstxt.org/
