October 22, 2013

Publishing Your XML Sitemaps to Your Content Delivery Servers

My most recent post talked about how you can use the Sitecore Sitemap XML module to automatically build robots.txt and XML sitemap files in a multi-site, hardened environment so that the search engines can see your sitemap(s).  This works great, but a new issue arises if you have separate Content Management (CM) and Content Delivery (CD) servers.

The Problem

When your content is published, the Sitemap XML module waits for the "publish:end" event to start building the robots.txt and XML sitemap(s).  It does so and stores the files in the file system.  The issue with this is that the processing and creation of files all happens on the CM server, which is behind a firewall.  The CD servers are the web-facing servers that will be serving the sitemap and robots.txt files.  We need the files on the CD servers.

The Solution

I went through a number of iterations in my head about how I could get the robots.txt and XML sitemap files to the CD servers.  I thought about scheduled jobs in Sitecore.  I thought about using an external application that is used for file replication.  I thought about a custom application inside of Sitecore so that a content author could manually kick off a deploy.

After running some of these ideas by a fellow Arke employee and Sitecore MVP, Andy Uzick, I came to realize that I was overthinking my solution by far.  This sort of problem had been solved long ago and is *WAY* simpler than I had originally thought.

The solution is to change the "SitemapXML.config" to subscribe to the "publish:end:remote" event instead of the "publish:end" event.  This event fires on remote instances after a successful publish and lets those servers perform the specified actions.  In our case, this tells the Sitemap XML module to build the robots.txt and sitemap file(s).  Just ensure that you have the SitemapXML.dll and SitemapXML.config files present on all of your CD servers and you're good to go!
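
In SitemapXML.config, the change amounts to swapping the event name on the existing handler entry.  It looks something like the following (the handler type and method shown here are illustrative, not copied from the module; keep whatever handler your copy of the config already declares and change only the event name):

<event name="publish:end:remote">
  <handler type="Sitemap.XML.SitemapHandler, Sitemap.XML" method="RefreshSitemap" />
</event>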

Very simple solution and now I don't have to manually copy files between servers!

October 17, 2013

Using the Sitemap XML Module in a Hardened, Multi-site Environment

We are using the Sitecore Sitemap XML module (2.0) for one of our clients.  Well, we're using it for a number of our clients, but let's just talk about the one client for now.

The module is pretty neat in that it generates robots.txt and sitemap.xml files for your sites automatically on publish.  It can also automatically submit the generated sitemap(s) to a number of specified search engines.

The Problem

The module allows for multi-site instances through its configuration file.  You simply specify the Sitecore sites that should be included, and the module generates the necessary robots and sitemap file entries.  So if the module already supports multi-site instances, what's the problem?  The problem is that if you have followed the security hardening guide for your Sitecore environment (you did, didn't you??), you have prevented Sitecore from serving XML files.  The search engine crawlers expect the sitemap in XML format, and since Sitecore has been hardened, it refuses to serve the sitemap files.
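
For reference, the multi-site piece of the module's configuration looks roughly like this (element and attribute names can vary between module versions, and the site names below are made up, so treat this as a sketch and check the SitemapXML.config that ships with the module):

<sitemapVariables>
  <sitemapVariable name="database" value="web" />
  <sites>
    <site name="website" filename="sitemap_website.xml" />
    <site name="microsite" filename="sitemap_microsite.xml" />
  </sites>
</sitemapVariables>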

The Solution

What we need to do is allow Sitecore to serve the sitemap XML files, but not serve up any other XML files that we want to keep secure.  (The license.xml file comes to mind...)

The first step is to tell Sitecore to ignore requests to the sitemap files.  I have a naming convention in place in which all sitemaps on the client's instance are configured as "sitemap_<site_name>.xml".  With this standard in place, I updated the "IgnoreUrlPrefixes" setting in the web.config as follows:

<setting name="IgnoreUrlPrefixes" value="/sitecore/default.aspx|/trace.axd|/webresource.axd|...|/sitemap_*.xml" />

This lets Sitecore know that it should not try to resolve the request to an item in the content tree when a request comes in for an XML sitemap.

The second step is to allow Sitecore to serve the XML file.  My client is using IIS 7, so I updated the handlers in the "system.webServer" node in the web.config.  I added a handler that only lets the sitemap XML files be served:

<add verb="GET" path="sitemap_*.xml" type="System.Web.StaticFileHandler" name="allow xml sitemap" />

I now have a multi-site solution that will automatically generate and manage the robots.txt and XML sitemaps that can be properly indexed by the search engines while still remaining secure.