sjkblog

Automatically generating a sitemap with Jekyll

Posted on 30 Jan 2013.

Sitemaps are a way to tell Google and other search engines about pages on your website that they might otherwise have trouble discovering. You can read more about sitemaps at Google Webmaster Tools.

Generating a sitemap by hand is tedious, so many web frameworks include some automated means of generating one. This website uses Jekyll, a static site generator written in Ruby. Michael Levin has written a sitemap generator plugin for Jekyll, and this site, among others, uses it.

Let’s start off by creating the _plugins directory if it doesn’t already exist:

cd /path/to/website
mkdir -p _plugins

Now, download the sitemap generator plugin:

cd _plugins
wget --no-check-certificate \
  https://github.com/kinnetica/jekyll-plugins/raw/master/sitemap_generator.rb

Now edit _plugins/sitemap_generator.rb with your favourite editor. The file has a few well-commented configuration variables. For my website, I changed the following:

MY_URL = "http://sjk.ankeborg.nu"
PAGES_INCLUDE_POSTS = ["index.html", "archive.html"]

Save the file, exit the editor and run jekyll, and you will have a fresh sitemap.xml in your _site/ directory. Woohoo! Upload it to your site and submit it to Google and other search engines!
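Before uploading, it can be worth glancing at the dates the sitemap actually contains. The following is only a sketch: sitemap_lastmods is a made-up helper name, and it assumes that the generated file uses the standard <lastmod> element of the sitemap protocol and that your grep supports the -o option.

```shell
# sitemap_lastmods is a hypothetical helper, not part of Jekyll or the plugin.
# It prints the contents of every <lastmod> element in a sitemap, one per line.
# Assumption: the sitemap uses plain <lastmod> tags without attributes.
sitemap_lastmods() {
    # Default to the usual Jekyll output location if no file is given.
    grep -o '<lastmod>[^<]*</lastmod>' "${1:-_site/sitemap.xml}" |
        sed 's/<[^>]*>//g'   # strip the surrounding tags, keep the date
}
```

Running sitemap_lastmods from the website directory should list one date per page. If they all show the same recent date, something is probably off.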

Incorrect dates

I keep this website in a git repository. I write a post, commit it to the repository, build the site and rsync the output to my web host. For backup purposes, I also push my changes to a remote repository.

A few days ago, I accidentally deleted the website from my local computer. Ruh roh! No problem, I just cloned the repository again. This created a small problem though: git does not preserve file modification times, so the posts’ ‘last modified’ times were reset to the time of the clone, which messed up my sitemap. The sitemap generator looks at the mtimes, not the dates in the filenames, when deciding when a post was last modified.

I fixed this by changing the mtimes back using touch:

cd _posts
for a in *; do
    # Turn the YYYY-MM-DD prefix of the filename into a touch timestamp (YYYYMMDD0000)
    touch -mat "$(echo "$a" | perl -ne 's,(\d+)-(\d+)-(\d+).*,$1$2${3}0000,; print')" "$a"
done
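If the perl one-liner is hard to read, the same conversion can be sketched with plain shell and sed. The function names post_stamp and restore_post_mtimes are my own inventions for illustration, and the sketch assumes that every post is named YYYY-MM-DD-title.ext; anything else is skipped.

```shell
# post_stamp and restore_post_mtimes are hypothetical helpers,
# not part of Jekyll or the sitemap plugin.

# Print the touch -t timestamp (YYYYMMDD0000) for a filename that starts
# with YYYY-MM-DD-. Prints nothing and returns non-zero for other names.
post_stamp() {
    printf '%s\n' "$1" |
        sed -n 's/^\([0-9]\{4\}\)-\([0-9]\{2\}\)-\([0-9]\{2\}\)-.*/\1\2\30000/p' |
        grep .
}

# Set the access and modification time of every dated file in a directory
# (default: the current one) to midnight on the date in its filename.
restore_post_mtimes() {
    for f in "${1:-.}"/*; do
        stamp=$(post_stamp "$(basename "$f")") || continue   # no date prefix: skip
        touch -mat "$stamp" "$f"
    done
}
```

With these defined, restore_post_mtimes _posts does the same job as the loop above, but leaves files without a date prefix alone.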

A simple ls -l showed that the dates were fixed. However, when I rebuilt the website, the sitemap dates were still wrong. I turned to the plugin documentation and found the following:

The last modified date is determined by the latest date of the following: system modified date of the page or post, system modified date of included layout, system modified date of included layout within that layout, …

Aha! This means that I also have to change the mtimes of the layout files that the posts use. I set those to 1 January 1970, so that they are always older than any post:

cd _layouts
touch -mat 7001010000 *
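To see which file "wins" under the rule quoted from the plugin documentation, you can compare mtimes directly. This is only a sketch of the "latest date" idea, not the plugin's actual code: newest_of is a hypothetical helper, and it relies on the -nt test operator, which bash, dash and most other shells support even though it is not in older POSIX standards.

```shell
# newest_of is a hypothetical helper, not part of Jekyll or the plugin.
# It prints whichever of the given files has the most recent mtime.
newest_of() {
    newest=$1
    shift
    for f in "$@"; do
        # -nt is true when the left file was modified more recently.
        [ "$f" -nt "$newest" ] && newest=$f
    done
    printf '%s\n' "$newest"
}
```

For example, newest_of _posts/some-post.md _layouts/post.html _layouts/default.html (the filenames are placeholders) should print the post itself once the layouts have been touched back to 1970. If it prints a layout, that layout's date is presumably the one that ends up in the sitemap.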

Voilà!
