Drupal 6 Search Engine Optimization


Product type: Book
Published: Sep 2009
Publisher: Packt
ISBN-13: 9781847198228
Pages: 280
Edition: 1st
Author: Ben Finklea

Table of Contents (19 chapters)

Drupal 6 Search Engine Optimization
Credits
About the Author
Acknowledgement
About the Reviewers
Preface
1. The Tools You'll Need
2. Keyword Research
3. On-Page Optimization
4. More On-Page Optimization
5. Sitemaps
6. robots.txt, .htaccess, and W3C Validation
7. RSS Feeds, Site Speed, and SEO Testing
8. Content is King
9. Taking Control of Your Content
10. Increasing the Conversion Rate of Your Drupal Web site
10 SEO Mistakes to Avoid
A Drupal SEO Checklist
Drupal SEO Case Study for Acquia Product Launch

Chapter 6. robots.txt, .htaccess, and W3C Validation

Much of the SEO that we've accomplished so far is visible to your visitors (for example, titles, headings, body text, and even a sitemap or two). In this chapter, we're going to address some of the more technical aspects of on-page SEO. Over the last ten years, many elements have been added to the HTML specification. The search engines themselves have developed other elements to help you communicate better with them. Since our ultimate goal is to do well by the search engines and our visitors, it's time to embrace your inner geek and get technical with your SEO. Pocket protectors ready? Let's do this thing.

In this chapter, we're going to cover:

  • The robots.txt file and the common directives used in it

  • Problems with Drupal's standard robots.txt and how to fix them

  • Adding the XML Sitemap to the robots.txt

  • Understanding and editing the .htaccess file

  • W3C Validation

Note

Take care when upgrading your Drupal installation!

In this chapter, we...

Optimizing the robots.txt file


The robots.txt file sits at the root level of your web site and asks spiders and bots to behave themselves when they're on your site. You can take a look at it by pointing your browser to http://www.yourDrupalsite.com/robots.txt. Think of it as an electronic No Trespassing sign that tells the search engines not to crawl a certain directory or page of your site. Using wildcards, you can even tell the engines not to crawl certain file types like .jpg or .pdf. This means none of your JPEG images or PDF files will show up in the search engines. (I'm not recommending that you do that…but you could.)
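To make the No Trespassing idea concrete, here is a minimal Python sketch that parses a small robots.txt and asks whether a given path may be crawled. The rules in SAMPLE_ROBOTS_TXT and the helper name is_crawlable are made up for illustration; they are not Drupal's shipped robots.txt. The sketch uses Python's standard urllib.robotparser, which matches rules by path prefix, so it only demonstrates the directory case and leaves wildcard patterns out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not Drupal's default file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given rules allow user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/about"):
        verdict = "allowed" if is_crawlable(SAMPLE_ROBOTS_TXT, "Googlebot", path) else "blocked"
        print(path, "is", verdict)
```

Keep in mind that robots.txt is only a request: a well-behaved spider honors it, but nothing in the file enforces it.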

Note

The robots.txt file is required by Google

On December 1, 2008, John Mueller, a Google analyst, said that if the Googlebot can't access the robots.txt file (say the server is unreachable or returns a 5xx error result code) then it won't crawl the web site at all. In other words, the robots.txt file must be there if you want the web site...

Mastering the .htaccess file


There is a server configuration file at the root level of your Drupal 6 site called the .htaccess file. This file is a list of instructions to your web server software, usually Apache. These instructions are very helpful for cleaning up some redirects and otherwise making your site function a bit better for the search engines. In Chapter 1, The Tools You'll Need, we told Google Webmaster Tools that we wanted our site to show up in Google with or without the www in the URL. The .htaccess file allows you to do the same thing directly on your web site. Why are both necessary? In Google's tool, you're only telling Google how you want them to display your URLs; you're not actually changing the URLs on your web site. With the .htaccess file, you're actually affecting how the files are served. This will change how your site is displayed in all search engines.
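The with-or-without-www choice boils down to mapping every incoming URL to one canonical host. The Python sketch below only illustrates that mapping; it is not the .htaccess syntax itself, and the function name canonical_url and the example.com addresses are assumptions made for the example. On a real site, the web server would answer a request for the non-preferred host with a redirect to the URL this function returns.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, prefer_www: bool = True) -> str:
    """Rewrite url so its host uses (or drops) the www prefix.

    Hypothetical helper for illustration; any user:password part of the
    URL is dropped, and the host name is lower-cased.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Strip an existing "www." so both variants start from the same bare host.
    bare = host[4:] if host.startswith("www.") else host
    wanted = "www." + bare if prefer_www else bare
    # Preserve an explicit port if the original URL had one.
    netloc = wanted if parts.port is None else f"{wanted}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(canonical_url("http://example.com/node/1?page=2"))
    print(canonical_url("http://www.example.com/", prefer_www=False))
```

Whichever form you prefer, the point is that both spellings of the address end up at a single URL, so the search engines see one site rather than two.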

Note

Hey, why can't I see the .htaccess file?

In Unix/Linux Operating Systems, any file that begins with...

W3C markup validation


Drupal is a well-written piece of software that produces well-formed web sites. However, don't assume that it will still be that way when you're done with it. Not all of the modules, themes, or content on your site will pass muster. This is especially true if your site is open to users to create their own content.

You should run a comprehensive scan of the site to check for improperly formed code, broken links, and other oversights that could hinder your search engine positioning. Obviously, Google can't reject sites just because they have bad markup (most of the sites out there have at least one thing wrong with them). However, bad HTML can confuse the search engine spiders, which are not as forgiving of technical issues as a modern browser is. By eliminating any problem markup, you can remove this concern from your site.

There is a great, and free, tool that you can use to scan your site. It's called the W3C HTML Validator.

Scanning your site with the W3C HTML Validator...

Summary


In this chapter, we covered some of the most technical aspects of good SEO. We discussed:

  • The robots.txt file

  • The .htaccess file

  • W3C Validation

We've got one more chapter of technical, on-page optimization, and then you'll be ready to start populating your site with content.
