release

Now available: Robots.txt file

Ugh, Robots…! They can do anything, they are taking over the most mundane tasks, and worst of all: they are scanning my platform! We have received a lot of feedback via inSpired, Support and the CSM team that you want to be able to control search-engine crawlers. We listened:





As of today we’re happy to introduce the Robots Exclusion Protocol (its historic name), or as it’s more commonly known: a robots.txt file for your platform!





What is a Robots.txt file?

First introduced as far back as 1994, the Robots.txt file was created to keep badly written crawlers and search engines away from your content. Today it’s a pretty standard way to tell search engines which content you would like to make available in their search results, and which content you would like to remain hidden.





What can I do with a Robots.txt file?

With this file you can tell search engines how to crawl and index the content of your platform. You can configure the following settings in Control:




  • Write crawl and index guidelines for (specific) user-agents

  • Tell user-agents (search engines) which content they are allowed to crawl and index (allow) and which content they cannot crawl and index (disallow)

  • Link to a hosted XML sitemap.
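
To make the three settings above concrete, here is a minimal sketch of what such a file could look like and how a well-behaved crawler might read it. It uses Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` name, the `/members/` path and the sitemap URL are all made up for illustration and are not defaults of the platform.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one group for a specific user-agent,
# one group for everyone else, and a link to a hosted XML sitemap.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /members/
Allow: /

Sitemap: https://www.example.com/sitemap1.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let this user-agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


# A generic crawler may read topics but not the (made-up) members area.
topic_ok = is_allowed("Googlebot", "https://www.example.com/topic/1")
members_ok = is_allowed("Googlebot", "https://www.example.com/members/42")

# The specific user-agent is blocked everywhere.
example_bot_ok = is_allowed("ExampleBot", "https://www.example.com/topic/1")

# Sitemap links declared in the file (site_maps() needs Python 3.8+).
sitemaps = parser.site_maps()

print(topic_ok, members_ok, example_bot_ok, sitemaps)
```

Note that Python's parser applies the first matching rule within a group, while some search engines pick the most specific matching rule instead, so it is safest to write rules that give the same result either way.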

FAQ

Q: How do I set up a Robots.txt file for my platform?


A: Read this guide to get started: How To Setup a Robots.txt file





Q: Where can I see the Robots.txt file in Control?


A: This feature is available to Administrators and can be found in Control > System Configuration > Robots.txt





Q: I don’t have a Robots.txt file. Will search engines still crawl my platform?


A: Yes. If search engines can’t find a Robots.txt file, they assume there are no guidelines and will crawl your entire platform.





Q: Which search engines support a Robots.txt file?


A: Google, Bing, Yahoo, DuckDuckGo, Yandex, Baidu





Q: What can I do with a sitemap?


A: Sitemaps tell search engines which content is available and where to find it. Although a sitemap is a separate feature, the Robots.txt file can point to it. We don’t offer a way to create an XML sitemap inside our platform, but if you have created an XML sitemap yourself you can use the Robots.txt file to link to it.





Q: Can I link to sitemaps?


A: Yes. Make sure you point to an absolute URL and that your sitemap is in .xml format, e.g. https://www.example.com/sitemap1.xml
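
If you want to double-check a sitemap link before pasting it into Control, the two requirements above (absolute URL, .xml file) can be sketched as a small check. This is only an illustration. The `looks_like_sitemap_url` helper is hypothetical and is not part of the platform.

```python
from urllib.parse import urlsplit


def looks_like_sitemap_url(url: str) -> bool:
    """Rough check: absolute http(s) URL whose path ends in .xml."""
    parts = urlsplit(url)
    # An absolute URL has both a scheme (https) and a host (www.example.com).
    is_absolute = parts.scheme in ("http", "https") and bool(parts.netloc)
    return is_absolute and parts.path.lower().endswith(".xml")


absolute_xml = looks_like_sitemap_url("https://www.example.com/sitemap1.xml")
relative_xml = looks_like_sitemap_url("/sitemap1.xml")
absolute_html = looks_like_sitemap_url("https://www.example.com/sitemap1.html")

print(absolute_xml, relative_xml, absolute_html)
```
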
@Yoeri Great post, really clear! And LOL because of 'Ugh, Robots…!' 🤣
Very, very minor improvement request (it isn't even directly related to the "robots.txt" function):





The Control Panel navigation has two levels. When you click on the first level element, it opens the second level - aka the function / setting. After opening the function / setting the second level navigation stays open.


The only function / setting where this is not happening is the "robots.txt". ;)





Did I say it was very, very minor?