Using the Sitemap XML Module in a Hardened, Multi-site Environment

By: Craig Taylor
October 17, 2013

We are using the Sitecore Sitemap XML module (2.0) for one of our clients.  Well, we're using it for a number of our clients, but let's just talk about the one client for now.

The module is pretty neat in that it generates robots.txt and sitemap.xml files for your sites automatically on publish.  It can also automatically submit the generated sitemap(s) to a number of specified search engines.
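
For orientation, the module is driven by an include file, and multi-site coverage comes down to listing the sites in it.  The sketch below shows the general shape; the element and attribute names are from the 2.0 package as I remember them, so treat them as assumptions and check them against the Sitemap.XML.config that ships with your copy:

<!-- Illustrative sketch only; names assumed from the 2.0 package -->
<sitemapVariables>
  <sitemapVariable name="database" value="web" />
  <!-- when false, the module skips submitting the sitemaps to the search engines -->
  <sitemapVariable name="productionEnvironment" value="true" />
  <sites>
    <!-- one entry per Sitecore site; the filename becomes the generated sitemap's name -->
    <site name="site1" filename="sitemap_site1.xml" />
    <site name="site2" filename="sitemap_site2.xml" />
  </sites>
</sitemapVariables>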

The Problem

The module allows for multi-site instances through its configuration file.  You simply specify the Sitecore sites that should be included (as sketched above) and the module will generate the necessary robots and sitemap file entries.  So if the module already allows for multi-site instances, what's the problem?  The problem is that if you have followed the security hardening guide for your Sitecore environment (you did, didn't you??), you have prevented Sitecore from serving XML files.  Search engine crawlers expect the sitemap in XML format, and since Sitecore has been hardened, it refuses to serve the sitemap files.
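
To make the conflict concrete, this is the kind of handler entry the hardening guide has you add under the "system.webServer" node to block XML files.  I'm showing a typical hardened entry, so the exact name and attributes may differ in your web.config:

<!-- typical hardening entry (assumed); forbids every .xml request -->
<add name="xml (integrated)" path="*.xml" verb="*" type="System.Web.HttpForbiddenHandler" preCondition="integratedMode" />

With that in place, a crawler requesting one of the sitemaps gets an error response instead of the XML.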

The Solution

What we need to do is allow Sitecore to serve the sitemap XML files, but not serve up any other XML files that we want to keep secure (the license.xml file comes to mind...).

The first step is to tell Sitecore to ignore requests to the sitemap files.  I have a naming convention in place in which all sitemaps on the client's instance are configured as "sitemap_<site_name>.xml".  With this standard in place, I updated the "IgnoreUrlPrefixes" setting in the web.config as follows:

<setting name="IgnoreUrlPrefixes" value="/sitecore/default.aspx|/trace.axd|/webresource.axd|...|/sitemap_*.xml" />

This lets Sitecore know that it should not try to resolve the request to an item in the content tree when a request comes in for an XML sitemap.

The second step is to allow Sitecore to serve the XML file.  My client is using IIS 7, so I updated the handlers in the "system.webServer" node in the web.config.  I added a handler that lets only the sitemap XML files be served:

<add verb="GET" path="sitemap_*.xml" type="System.Web.StaticFileHandler" name="allow xml sitemap" />
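
One caveat: IIS evaluates handler mappings in the order they appear, so this allow entry needs to sit above the hardening guide's catch-all *.xml deny entry or the deny will still win.  The resulting order looks roughly like this (the deny line is the assumed hardening entry from earlier):

<handlers>
  ...
  <!-- specific allow rule first; IIS matches handlers top-down -->
  <add verb="GET" path="sitemap_*.xml" type="System.Web.StaticFileHandler" name="allow xml sitemap" />
  <!-- catch-all deny for every other .xml request stays below it -->
  <add name="xml (integrated)" path="*.xml" verb="*" type="System.Web.HttpForbiddenHandler" preCondition="integratedMode" />
  ...
</handlers>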

I now have a multi-site solution that automatically generates and manages the robots.txt and XML sitemaps, which the search engines can properly index, while the rest of the instance remains hardened.
