As of today we’re happy to introduce the Robots Exclusion Protocol (its historic name), or as it’s more commonly known, a robots.txt file for your platform!
What is a Robots.txt file?
First introduced in 1994, the Robots.txt file was created to keep badly written crawlers and search engines away from your content. Today, it’s a standard way to tell search engines which content you would like to appear in their search results and which content you would like to remain hidden.
What can I do with a Robots.txt file?
With this file you can tell search engines how to crawl and index the content of your platform. You can configure the following settings in Control:
- Write crawl and index guidelines for (specific) user-agents
- Tell user-agents (search engines) which content they are allowed to crawl and index (allow) and which content they are not allowed to crawl and index (disallow)
- Link to a hosted XML sitemap.
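To make the settings above concrete, here is a minimal sketch that uses Python's standard-library `urllib.robotparser` to interpret a small Robots.txt file. The file content, the paths, the bot name `SomeBot` and the sitemap URL are illustrative assumptions, not the defaults of your platform, and real search engines may interpret rules slightly differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots.txt content; paths and sitemap URL are examples only.
ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/

Sitemap: https://www.example.com/sitemap1.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Rules under "User-agent: *" apply to any crawler without its own group.
any_bot_public = parser.can_fetch("SomeBot", "https://www.example.com/public/page")
any_bot_private = parser.can_fetch("SomeBot", "https://www.example.com/private/page")

# Googlebot has its own group, so that group's rules are used for it.
googlebot_drafts = parser.can_fetch("Googlebot", "https://www.example.com/drafts/post")

# Sitemap lines are collected separately from the crawl rules (Python 3.8+).
sitemaps = parser.site_maps()

print(any_bot_public, any_bot_private, googlebot_drafts, sitemaps)
```

Each `User-agent` line starts a group of guidelines, `Allow` and `Disallow` lines set what that group may crawl, and the `Sitemap` line links to the hosted XML sitemap.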
FAQ
Q: How do I set up a Robots.txt file for my platform?
A: Read this guide to get started: How To Setup a Robots.txt file
Q: Where can I see the Robots.txt file in Control?
A: This feature is available to Administrators and can be found in Control > System Configuration > Robots.txt
Q: I don’t have a Robots.txt file. Will search engines still crawl my platform?
A: Yes. If search engines can’t find a Robots.txt file, they assume there are no guidelines and will crawl your entire platform.
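As a rough illustration of that default, the sketch below feeds an empty rule set to Python's standard-library `urllib.robotparser`. With no guidelines to apply, the parser treats every URL as crawlable. The bot name and URL are made up for the example, and this shows the behaviour of this one parser rather than a guarantee about any particular search engine.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
# Simulate a platform without any guidelines by parsing zero lines.
parser.parse([])

# Hypothetical crawler name and URL, for illustration only.
allowed = parser.can_fetch("AnyBot", "https://www.example.com/any/page")
print(allowed)
```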
Q: Which search engines support a Robots.txt file?
A: Google, Bing, Yahoo, DuckDuckGo, Yandex, Baidu
Q: What can I do with a sitemap?
A: Sitemaps tell search engines which content is available and where to find it. Although a sitemap is a separate feature, the Robots.txt file can point to it. We don’t offer the functionality to create an XML sitemap inside our platform, but if you created an XML sitemap yourself you can use the Robots.txt file to link to it.
Q: Can I link to sitemaps?
A: Yes. Make sure you point to an absolute URL and that your sitemap is in .xml format, e.g. https://www.example.com/sitemap1.xml
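If you want to check a sitemap link before adding it, the two requirements above can be sketched as a small Python helper. The function name `is_valid_sitemap_url` is hypothetical and not part of the platform. It only checks the shape of the URL and does not confirm that the file exists or contains valid XML.

```python
from urllib.parse import urlparse


def is_valid_sitemap_url(url: str) -> bool:
    """Return True if url is absolute (http/https with a host) and ends in .xml."""
    parts = urlparse(url)
    is_absolute = parts.scheme in ("http", "https") and bool(parts.netloc)
    return is_absolute and parts.path.lower().endswith(".xml")


print(is_valid_sitemap_url("https://www.example.com/sitemap1.xml"))
print(is_valid_sitemap_url("/sitemap1.xml"))
```

A relative path such as `/sitemap1.xml` fails the check because it has no scheme or host.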