Search Engine Optimization using Sitemaps in Drupal 6

Drupal 6 Search Engine Optimization
by Ben Finklea | September 2009 | MySQL Content Management Drupal Open Source PHP

In this article by Ben Finklea, we will discuss Sitemaps in detail, right from the origin of sitemaps to how they are used to make sure our entire site is crawled by the search engines. We will cover:

  • What sitemaps are and why you should use them
  • How to install sitemaps on your Drupal site
  • How to submit the XML sitemaps to Google

Let's get started.

As smart as the Google spider is, it can still miss pages on your site. Maybe you've got an orphaned page that isn't in your navigation anymore. Or perhaps you have moved a link to a piece of content so that it's no longer easily accessible. It's also possible that your site is so big that Google just can't crawl it all without eating up all of your server's resources. Not pretty!

The solution is a sitemap.

In 2005, Google started supporting XML sitemaps. Soon after, Yahoo came out with its own standard and other search engines started to follow suit. Fortunately, in 2006, Google, Yahoo, Microsoft, and a handful of smaller players all got together and decided to support the same sitemap specification. That made it much easier for site owners to make sure every page of their web site is crawled and added to the search engine index. They published their specification at http://sitemaps.org.

Shortly thereafter, the Drupal community stepped up and created a module called (surprise!) the XML sitemap module. This module automatically generates an XML sitemap containing every node and taxonomy term on your Drupal site. It was originally written by Matthew Loar as part of the Google Summer of Code. The Drupal 6 version of the module was developed by Kiam LaLuno. Finally, in mid-2009, Dave Reid began working on version 2.0 of the module to address performance, scalability, and reliability issues. Thanks, guys!

According to www.sitemaps.org:

Sitemaps are an easy way for Webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL (when it was last updated, how often it usually changes, and how important it is, relative to other URLs in the site) so that search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up all URLs in the Sitemap and learn about those URLs using the associated metadata.
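
To make the description above concrete, here is a minimal Python sketch that builds a sitemap of the kind the specification describes. It is only an illustration and is not how the Drupal module generates its output; the function name build_sitemap and the example URLs and dates are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of URL entries.

    Each entry is a dict with a required "loc" key and optional
    "lastmod", "changefreq", and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages on the example site used in this article.
example = build_sitemap([
    {"loc": "http://www.yourDrupalsite.com/", "lastmod": "2009-09-01",
     "changefreq": "daily", "priority": "1.0"},
    {"loc": "http://www.yourDrupalsite.com/node/1", "priority": "0.5"},
])
print(example)
```

Only the loc element is required for each URL; the other three are the optional metadata mentioned in the quotation.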

Using a sitemap does not guarantee that every page will be included in the search engines. Rather, it helps the search engine crawlers find more of your pages. In my experience, submitting an XML sitemap to Google greatly increases the number of pages that show up when you do a site: search.

A search using the site: keyword shows you how many pages of your site are included in the search engine index, as shown in the following screenshot:

[Screenshot: results of a site: search]

Setting up the XML Sitemap module

The XML Sitemap module creates a sitemap that conforms to the sitemaps.org specification.

Which XML Sitemap module should you use?

There are two versions of the XML Sitemap module for Drupal 6. The 1.x version is, as of this writing, considered the stable release and should be used for production sites. However, if you have a site with more than about 2000 nodes, you should probably consider using the 2.x version. From www.drupal.org: 'The 6.x-2.x branch is a complete refactoring with considerations for performance, scalability, and reliability. Once the 6.x-2.x branch is tested and upgradeable, the 6.x-1.x branch will no longer be supported.' What this means is that in the next few months (quite possibly by the time you're reading this) everyone should be using the 2.x version of this module. That's the beauty of open source software: there are always improvements coming that make your Drupal site better optimized for search engines.

Carry out the following steps to set up the XML Sitemap module:

  1. Download the XML Sitemap module and install it just like a normal Drupal module. When you go to turn on the module, you'll be presented with a list that looks similar to the following screenshot:

    [Screenshot: list of XML sitemap modules]

    Before you turn on any included modules, consider what pieces of content on your site you want to show up in the search engines and only turn on the modules you need.

    • The XML sitemap module is required. Turn it on.
    • XML sitemap custom allows you to add your own customized links to the sitemap. Turn it on.
    • XML sitemap engines will automatically submit your sitemap to the search engines each time it changes. This is not necessary and there are better ways to submit your sitemap. However, it does a nice job of helping you verify your site with each search engine. Turn it on.
    • XML sitemap menu adds your menu items to the sitemap. This is probably a good idea. Turn it on.
    • XML sitemap node adds all your nodes. That's usually the bulk of your content so this is a must-have. Turn it on.
    • XML sitemap taxonomy adds all your taxonomy term pages to the sitemap. Generally a good idea but some might not want this listed. Term pages are good category pages so I recommend it. Turn it on.
    • Don't forget to click Save configuration.
  2. Go to http://www.yourDrupalsite.com/admin/settings/xmlsitemap, or go to your admin screen and click on the Administer | Site Configuration | XML sitemap link. You'll be able to see the XML sitemap, as shown in the following screenshot:

    [Screenshot: XML sitemap page]

  3. Click on Settings and you'll see a few options, as shown in the following screenshot:

    [Screenshot: XML sitemap Settings options]

    • Minimum sitemap lifetime: Determines the minimum amount of time that the module will wait before regenerating the sitemap. Use this feature if you have an enormous sitemap that is taking too many server resources. Most sites should leave this set to No minimum.
    • Include a stylesheet in the sitemaps: The module will generate a simple stylesheet to include with the sitemap that is generated. It's not necessary for the search engines but is very helpful for troubleshooting or if any humans are going to view the sitemap. Leave it checked.
    • Generate sitemaps for the following languages: In the future, this option will allow you to specify sitemaps for different languages. This is very important for international sites that want to show up in localized search engines. For now, English is the only option and should remain checked.
  4. Click the Advanced settings drop-down and you'll see several additional options.

    [Screenshot: Advanced settings options]

    • Number of links in each sitemap page allows you to specify how many links to pages on your web site will be in each sitemap. Leave it on Automatic unless you are having trouble with the search engines accepting the sitemap.
    • Maximum number of sitemap links to process at once sets the number of additional links that the module will add to your sitemap each time cron runs. This highlights one of the biggest differences between the new XML sitemap module and the old one. The new module only processes new nodes and updates the existing sitemap instead of reprocessing everything each time the sitemap is accessed. Leave this setting alone unless you notice that cron is timing out.
    • Sitemap cache directory allows you to set where the sitemap data will be stored. This is data that is not shown to the search engines or users; it's only used by the module.
    • Base URL is the base URL of your site and generally should be left as it is.
  5. Click on the Front page drop-down and set these options:
    • Front page priority: 1.0 is the highest setting you can give a page in the XML sitemap. On most web sites, the front page is the single most important part of the site, so this setting should probably be left at 1.0.
    • Front page change frequency: Tells the search engines how often they should revisit your front page. Adjust this setting to reflect how often the front page of your site changes.

      What is priority and how does it work?

      Priority is an often-misunderstood part of a sitemap. It is only used to compare pages within your own site, so you cannot improve your position in the search engine results pages (SERPs) by increasing the priority of your pages. However, it does let the search engines know which pages of your site you feel are more important. They could use this information to choose between two different pages on your site when deciding which page to show to a search engine user.

  6. Open the Content types drop-down and you will see the following screenshot:

    [Screenshot: Content types settings]

    • Here, you will see each content type listed separately. You probably want to leave these settings alone so that all your content shows up in the sitemap.
    • If you do want to adjust the settings for a content type in the sitemap, you'll need to go to the content type screen. Click on the name of the content type to go to that screen.
    • On the content type screen, open the XML sitemap drop-down and you'll see two options.

      [Screenshot: XML sitemap options on the content type screen]

    • Include in sitemap sets the default action for that content type. If you check this box, nodes of that type will be included in the sitemap.
    • Default priority allows you to set the default priority for each node that you create of that content type. The default is usually .5, but you can adjust it if you want certain pages to have a higher or lower priority.
    • Click on Save content type.
    • Repeat for each content type that you wish to change.
  7. Click Save configuration.
  8. Now, you need to run cron. Cron is a recurring script that takes care of many maintenance tasks in Drupal, including populating the XML sitemap. To run cron, point your browser to http://www.yourDrupalsite.com/cron.php and wait until the page stops loading. You will not receive any indication that it's complete except that your browser will stop loading the page.
  9. Point your browser to http://www.yourDrupalsite.com/sitemap.xml. You should see either a bunch of gobbledygook that looks like the following screenshot:

    [Screenshot: sitemap.xml shown as plain text in the browser]

    Or a screen similar to the following screenshot:

    [Screenshot: sitemap.xml shown as a formatted page in the browser]

  10. If you see either one, then you've done it right! If you view the source code of the sitemap, you'll see something like the following screenshot:

    [Screenshot: source code of the sitemap]

    The XML sitemap will only update when cron runs. On a normal Drupal installation, you should have set cron to run periodically: nightly for most sites, or more often for high-traffic sites.
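
If you would rather check the sitemap from a script than by eye, the following Python sketch parses sitemap XML and lists the URLs it contains. It is an illustration only and not part of the XML sitemap module; the function name summarize_sitemap and the sample data are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def summarize_sitemap(xml_text):
    """Return (root_tag, list_of_loc_values) for sitemap XML text."""
    root = ET.fromstring(xml_text)
    # ElementTree reports tags as "{namespace}name"; keep only the name.
    root_tag = root.tag.split("}")[-1]
    locs = [loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text]
    return root_tag, locs

# Made-up sample standing in for the contents of sitemap.xml.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.yourDrupalsite.com/</loc><priority>1.0</priority></url>"
    "<url><loc>http://www.yourDrupalsite.com/node/1</loc></url>"
    "</urlset>"
)
kind, urls = summarize_sitemap(sample)
print(kind, len(urls))

# To check a live site instead, something like this should work:
# import urllib.request
# with urllib.request.urlopen("http://www.yourDrupalsite.com/sitemap.xml") as resp:
#     print(summarize_sitemap(resp.read()))
```

A root element of urlset means the file lists pages directly, while sitemapindex means it points to further sitemap files. If the list of URLs is empty, cron has probably not run yet.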

Specifying the XML sitemap priority for nodes

Now that you have the XML sitemap module properly installed and configured, you can start defining the priority of the content on your site. By default, the priority is .5. However, there are times when you may want Google to visit some content more often and other times when you may not want your content in the sitemap at all (like the comment or contact us submission forms).

Each node now has an XML sitemap section that looks like the following screenshot:

[Screenshot: XML sitemap section on the node edit form]

You can adjust the priority on a node-by-node basis by changing the default. You can even omit nodes from the sitemap by selecting Not in site map.
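
The sitemaps.org protocol limits priority to values between 0.0 and 1.0 and defines a fixed set of change frequency values. As a rough illustration, the following Python sketch checks a sitemap entry's metadata against those rules. The function name check_entry and its messages are assumptions made for this example and do not come from the Drupal module.

```python
# Change frequency values allowed by the sitemaps.org protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}

def check_entry(priority=0.5, changefreq=None):
    """Return a list of problems found in a sitemap entry's metadata."""
    problems = []
    if not 0.0 <= float(priority) <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        problems.append("unknown changefreq: %s" % changefreq)
    return problems

print(check_entry(1.0, "daily"))        # front page settings
print(check_entry(1.5))                 # priority out of range
print(check_entry(0.5, "fortnightly"))  # not a value from the protocol
```

Because priority only ranks pages relative to one another within your own site, setting every node to 1.0 tells the search engines nothing useful.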

     

Submitting your XML sitemap to Google

Carry out the following steps in order to submit your XML sitemap to Google:

  1. If you have not already done so, verify your web site with Google Webmaster Tools.
  2. Point your browser to Google's Webmaster Tools at http://www.google.com/Webmasters/. Click on Sign in to Webmaster Tools, as shown in the following screenshot:

    [Screenshot: Sign in to Webmaster Tools]

  3. You should see a list of your sites. Click on the Add link in the Sitemap column, located to the right of your site link, as shown in the following screenshot:

    [Screenshot: Add link in the Sitemap column]

  4. Double-check that your sitemap is working.
  5. Copy and paste your sitemap URL (http://www.yourDrupalsite.com/sitemap.xml) into the blank space provided and click on Submit Sitemap. NOTE: If you get an error in Google, try tweaking your URL by adding ?q= after the /, as follows: http://www.yourDrupalsite.com/?q=sitemap.xml

    [Screenshot: submitting the sitemap URL]

  6. You should see a confirmation message that looks like this:

    [Screenshot: confirmation message]

  7. Now wait for several hours, or even days.
  8. Log in to Google Webmaster Tools, click your domain, and then click Sitemaps | Overview. If the status is still Pending, then wait a bit longer. When your sitemap has been crawled, it will say OK.
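
The two URL forms mentioned in step 5 can also be tried from a script before you submit anything. The following Python sketch is one possible way to do that, under the assumption that a working sitemap answers with HTTP status 200. The function names are made up for this example.

```python
import urllib.error
import urllib.request

def sitemap_urls(base_url):
    """Return the plain and the ?q= form of the sitemap URL."""
    base = base_url.rstrip("/")
    return [base + "/sitemap.xml", base + "/?q=sitemap.xml"]

def responds_ok(url):
    """Return True if a GET request for url returns HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False

def first_working_url(base_url, check=responds_ok):
    """Return the first sitemap URL that passes check, or None."""
    for url in sitemap_urls(base_url):
        if check(url):
            return url
    return None

print(sitemap_urls("http://www.yourDrupalsite.com/"))
```

Calling first_working_url("http://www.yourDrupalsite.com") performs real HTTP requests, so replace the placeholder domain with your own site before running it.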

Track who sees the XML sitemap

You can easily see who has accessed your XML sitemap by visiting your Watchdog log at http://www.yourDrupalsite.com/admin/reports/dblog. There, you can see how recently each search engine has visited your sitemap.

What about all those other search engines out there? It's easy to let them all know where your XML sitemap is located by adding it to your robots.txt file.
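
The sitemaps.org protocol lets you announce a sitemap in robots.txt with a line of the form "Sitemap: " followed by the full sitemap URL. As a sketch of what that edit looks like, the following Python function adds the line to existing robots.txt content if it is not already there. The function name add_sitemap_line and the sample robots.txt content are assumptions made for this example.

```python
def add_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content that includes a Sitemap line for sitemap_url."""
    directive = "Sitemap: " + sitemap_url
    lines = robots_txt.splitlines()
    # Leave the content unchanged if the same directive is already present.
    if any(line.strip().lower() == directive.lower() for line in lines):
        return robots_txt
    lines.append(directive)
    return "\n".join(lines) + "\n"

# Made-up robots.txt content for illustration.
current = "User-agent: *\nDisallow: /admin/\n"
updated = add_sitemap_line(current, "http://www.yourDrupalsite.com/sitemap.xml")
print(updated)
```

You can just as easily add the line by hand in a text editor. The sketch simply shows the exact format the search engines expect.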

Summary

In this article, we discussed the origin of sitemaps and how they are used to make sure your entire site is crawled by the search engines. We also learned how to set up the XML sitemap module, specify the XML sitemap priority for nodes, and submit your XML sitemap to Google.

About the Author

Ben Finklea

Ben Finklea is the founder and CEO of Drupal SEO firm Volacci Search Marketing. He is the creator of the Drupal SEO Checklist module and he contributes to other SEO-related modules in the Drupal community. Ben is an internationally-known consultant, speaker, and trainer on topics related to SEO, Drupal, and building successful high-tech businesses. He lives with his wife and sons near Austin, Texas.
