This chapter lays the foundation for what's to come later in the book. It introduces the basic concepts, terms, and fundamental information needed to understand the rationale behind the techniques discussed in the subsequent chapters. While some of the content in this chapter will be familiar to experienced users, it is essential reading for newcomers and those who are not SEO specialists.
The topics covered in this chapter include:
An introduction to the SEO process
An SEO vocabulary
An explanation of how search engines view your site
At its most basic, SEO is an acronym for Search Engine Optimization. More importantly, for the purposes of the philosophy espoused in this text, SEO is a process—a series of planning and execution steps that lead to a website being optimized to perform its best on the search engines.
Notice the emphasis on process—SEO is not something you do once and then forget about. While an intensive period of attention to your site's optimization factors can lay a solid foundation and get you off to a proper start, if you do not continue to make efforts to improve and respond to market conditions, your rankings will stagnate and then erode over time. Moreover, your efforts do not exist in isolation; there are others out there competing for rankings and traffic. In order to succeed, you need to do your best to stay ahead of the others fighting for ranking for their sites.
When we talk about the search engines in this text, we mean Google, Bing, Baidu, or other similar sites focused on allowing the general public to search for and find information on the Web. Typically, what works for one search engine will work for the others. Though there are peculiarities and optimization strategies that can be applied to target specific engines, most SEO techniques are search-engine agnostic.
The competition for attention online should never be underestimated. If you are in a competitive business vertical—be it travel, finance, gambling, web design, property, or others—the battle for traffic from the search engines is cutthroat. Never forget that the major players out there have dedicated SEO teams that do nothing every day but tweak, optimize, build links, create content, and generally do their best to out-compete all the other similar businesses vying for the top spots on the search engines.
In this book, we put forward a methodology for search engine optimization. The process we advocate can be viewed broadly as having two parts—foundations and on-going efforts. We start by looking at how to lay a great foundation for your site, that is, the basics of creating a search engine friendly site. In later chapters, we turn our attention to on-going techniques for maintaining and improving your rankings over time. Along the way, we look at how to formulate and implement a coherent search engine strategy.
Never forget, for most site owners the actual goal is traffic generation, not pure search engine ranking.
While many of the issues in SEO relate to technical aspects of the site, there is much more to SEO than just getting the technical aspects of your Joomla! site in order. One of the fundamental principles advocated in this book is to focus on the creation of useful, unique content. There is a strong, positive correlation between high quality content and high site ranking. This is one of the few areas where the search engines provide specific guidance about what they are looking for in a site. On the subject of quality, Google provides the following guidance:
Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you. Another useful test is to ask, "Does this help my users? Would I do this if search engines didn't exist?"
For more insights from Google, visit http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769.
Ensure content is built based on keyword research to match content to what users are searching for
Produce deep, content-rich pages; be an authority for users by producing excellent content
Set a schedule and produce new content frequently
Don't try to outsmart Google—it's not going to work. Even if you find a way to artificially manipulate your rankings, there will come a day—very soon—when Google will pick up on it and make adjustments to their algorithms. When that happens, your site rankings will plummet and you will go from hero to zero.
While content is critical, it should not be your only concern. SEO practitioners often disagree about the relative importance of various factors in site rankings, but there is general agreement on which factors play a part. The search engine business is very competitive and companies such as Google and Bing do not disclose details of how their algorithms work. Fortunately for us, there is a considerable body of third-party research focused on discerning trends and patterns in search engine ranking. One of the best sources of information on this topic is SEOmoz's Search Ranking Factors, a report they publish free of charge and update annually. The data in the report comes from interviews with more than 130 SEO specialists and from a large data set that seeks to identify correlations between site variables and search engine rank.
View the report online by visiting http://www.seomoz.org/article/search-ranking-factors.
Keywords in the domain name
Keywords in a page's URL
Keywords in the content title
Keyword placement on page
Keyword repetition on page
Uniqueness of content
Freshness of content
Twitter activity, including the influence of the accounts tweeting
Social media up votes and comments
Click through rate for the site
Number, quality, and content of links to this site
Number of internal links
Number of errors on site
Speed of the site
In sum, SEO is a process that requires a multifaceted strategy. At a minimum, you need to make an effort to create a site that is search engine friendly, but in order for your site to excel in the rankings, you must do more. SEO requires concerted effort across time and you must also focus on the creation of unique, quality content.
The future of SEO
SEO is a moving target. The search engines are constantly adjusting their algorithms and practitioners are constantly trying new strategies and modifying their approach. While it is impossible to predict with any accuracy what the future of SEO will bring, there is some consensus among experts about which direction it is moving in. Generally speaking, we believe the future will see a continued emphasis on determining the perceived value of each site. This will be done by looking at not only the quality of the site's content, but also social media signals and site traffic patterns. Site performance will also continue to be a factor, with faster, better built sites being preferred over slow, badly engineered sites.
These factors are consistent with what we know about the general goals the search engines aspire to, that is, to be able to perceive sites more like users perceive them, rather than as a purely mathematical exercise.
The SEO field is replete with esoteric terminology and peculiar expressions. An awareness of the discipline's vocabulary is essential to clear understanding. In this section of the chapter, we provide definitions for the most commonly used terms.
The .htaccess file is a configuration file for your web server. In the context of SEO, it is used to help your web server determine how to route HTTP traffic, and it is most commonly discussed in connection with URL aliases, which are often used to create search engine friendly URLs.
A 301 redirect is an instruction given to the web server, informing it that a page that was previously located at one URL has been moved permanently to a new URL. The 301 redirect is most commonly used in situations where a site has been rebuilt and the URLs have changed. By adding 301 redirects to the site, you are able to avoid missed connections caused by traffic going to the old URL. When a 301 redirect is used, the search engines will also update their indexes to remove the old URL for the page and substitute the new one, thereby preserving the page's indexing.
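As a concrete illustration, on an Apache web server a 301 redirect can be declared in the .htaccess file with the Redirect directive from mod_alias (the URLs shown here are hypothetical):

```apache
# Permanently redirect the old URL to its new location;
# browsers and search engine crawlers will follow and record the move
Redirect 301 /old-page.html http://www.example.com/new-page.html
```

Requests for /old-page.html then receive an HTTP 301 status along with the new location, so both visitors and search engine indexes end up at the new URL.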
A 302 redirect, like a 301 redirect, informs the web server that a page has moved. Unlike a 301 redirect, a 302 redirect indicates that the move is temporary. This is a disfavored option, as some search engines will penalize sites for the use of this sort of redirect.
When a person follows a URL to a page that no longer exists (or has been moved), or types in an incorrect URL, the visitor will automatically be shown a 404 error message. The default message informs the visitor that the page cannot be found. Many sites build custom pages specifically designed to be displayed when a 404 error occurs.
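Assuming an Apache web server, a custom 404 page can be wired up in the .htaccess file with the ErrorDocument directive (the path here is hypothetical):

```apache
# Serve a custom page whenever a 404 (page not found) error occurs
ErrorDocument 404 /errors/404.html
```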
AdSense is a Google advertising program aimed at website owners. Site owners can sign up for the AdSense program and then display ads on their sites. (The ad inventory is provided by Google, often from the AdWords program, discussed next.) The website owner is paid a percentage of the revenues generated when someone clicks on one of the ads displayed on his or her site.
AdWords is a Google commercial advertising program aimed at advertisers. If you want to advertise on the Google network, you can sign up for the AdWords program, build an ad, and set a daily budget for the display of that ad. The ad will then appear across the Google network and you will be charged each time someone clicks on the ad (or, alternatively, you can elect to be charged according to the number of times the ad is viewed).
Alexa.com provides a website ranking service that attempts to rate all the sites on the Web in order of their popularity. Like a golf score, the lower the score, the better. The most popular site on the Web (typically Google.com) has an Alexa Rank of 1. The service, though not 100 percent accurate and the subject of some criticism, is yet another way of tracking the success of your efforts to raise your site's profile. To learn more visit http://alexa.com.
The HTML image tag (<img>) is used to place images on the page. The tag includes an option to specify a value for the alt attribute. This attribute is intended to allow webmasters to specify an alternative description for the image, typically for the benefit of users who are using screen readers or browsers with image display disabled.
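For example, a hypothetical image tag with a descriptive alt attribute looks like this:

```html
<!-- The alt text is read aloud by screen readers, shown when the image
     cannot be displayed, and indexed by search engines -->
<img src="/images/blue-widget.jpg" alt="Blue widget, front view" />
```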
Canonical URLs are URLs that have been standardized into a consistent form. For the search engines, this typically implies making sure all your pages use consistent URL structures, for example, making sure all your URLs start with "www".
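As one common illustration, the following hypothetical .htaccess rules (Apache with mod_rewrite enabled) send every non-www request to the www form of the domain with a 301 redirect, standardizing the URLs the search engines see:

```apache
RewriteEngine On
# If the host does not start with www, redirect permanently to the www version
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```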
Crawl depth is a measure of how deeply the search engine spider has indexed a website. This is typically an issue relevant for sites with a complex hierarchy of pages. The deeper the spider indexes the site, the better.
A doorway page is a page built specifically to point users to another page. This technique is used legitimately when a site owner holds multiple domain names and wishes to channel all the traffic into a primary domain. The technique is often used inappropriately by some black hat SEO practitioners as a way to create highly optimized pages targeting a specific term or terms and then push the users to another site—an online variation of the old bait and switch routine.
Duplicate content penalty is a theory that the search engines penalize sites that repeat content, or use content that is duplicated from another source. The theory is controversial, with many believing that the penalty may not exist, or may only be enforced in situations where there are other factors that indicate bad intent.
KEI is an acronym standing for Keyphrase Effectiveness Index. KEI is normally used during keyphrase research in an attempt to find the optimal keyphrases for a site. It is a simple ratio, most often defined as the frequency of search engine queries for the term divided by the number of pages competing for the term.
The more searches for a term, the more potential traffic. The lower the competition, the easier it is to rank highly in the SERPs. The ideal term has a high number of searches and low competition.
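The KEI arithmetic above can be sketched in a few lines of Python (the search volumes and competition counts here are hypothetical):

```python
def kei(monthly_searches, competing_pages):
    """Keyphrase Effectiveness Index: search demand relative to competition.

    Defined here as searches for the term divided by the number of
    competing pages; other variants of the formula exist.
    """
    if competing_pages == 0:
        return float("inf")  # no competition at all
    return monthly_searches / competing_pages

# Compare two hypothetical terms: a higher KEI suggests a better opportunity
broad_term = kei(monthly_searches=50000, competing_pages=2000000)  # 0.025
niche_term = kei(monthly_searches=4000, competing_pages=15000)     # ~0.267
```

Despite far fewer searches, the niche term scores higher because the competition is so much thinner, which is exactly the trade-off KEI is meant to surface.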
Keyphrase density is calculated by looking at all the text on a page and computing the ratio of the number of times a particular keyphrase or keyword appears on that page to the total number of words on the page.
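A naive keyphrase density calculation can be sketched in Python as follows; real tools also strip markup and punctuation before counting, and the sample text is hypothetical:

```python
def keyphrase_density(text, phrase):
    """Return the ratio of keyphrase occurrences to total words on the page."""
    words = text.lower().split()    # simple whitespace tokenizer
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's words appear in sequence
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits / len(words)

sample = "Joomla SEO tips and Joomla SEO tricks"
density = keyphrase_density(sample, "joomla seo")  # 2 occurrences / 7 words
```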
Keyphrase stuffing is the over-optimizing of a page for a particular keyphrase. This is a disfavored practice that can have a negative impact on your site's ranking as it is viewed by the search engines as an attempt to exert inappropriate influence on the rankings for the page.
A link farm is a site that includes an excessive number of links. These sites are typically built purely to generate links for SEO purposes. Sites of this nature are disfavored by the search engines, which view them as inappropriate attempts to exert influence over rankings.
When you create a hyperlink on a page by wrapping a text string with an <a> tag, the text wrapped by the tag is referred to as the link text or anchor text. There is a search engine optimization benefit to using text for hyperlinks, as the text can then be indexed in conjunction with the hyperlink.
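For example, descriptive anchor text (the URL here is hypothetical) gives the search engines context about the destination page that a generic "click here" link does not:

```html
<!-- The wrapped text, "Joomla SEO checklist", is the anchor text -->
<a href="http://www.example.com/joomla-seo-checklist">Joomla SEO checklist</a>
```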
In general terms, the long tail of a distribution is the trailing end of the distribution. In the context of SEO, the term is used to refer to targeting longer and more specific search queries, where there is usually less competition.
Metadata is, quite literally, data about data. On the Web, meta tags are the most common implementation of metadata and in the past were a key part of search engine indexing. Today, meta tags are still in use on the Web and can be found in the head section of web pages.
PageRank is a ranking algorithm created by, and named for, Larry Page at Google. The ranking criteria are not disclosed, but the scale ranges from zero at the low end to ten at the high end. The higher the score, the more persuasive a website is deemed to be. There is debate, however, over whether the rank is still in use at Google and whether it will continue to evolve.
PPC is an acronym for Pay Per Click advertising. If you use a PPC advertising scheme, you pay every time someone clicks on one of your ads. The most popular PPC system is the Google AdWords program. It is also sometimes called "pay for performance advertising".
A redirect is an instruction given to the web server to send traffic seeking one URL to a different URL. There are different types of redirects, such as the 301 redirect and the 302 redirect, as we have seen earlier in this chapter.
SEF URLs is an acronym for Search Engine Friendly URLs. The term refers to the creation of URLs that use natural words and phrases, rather than query strings and other abstract values (such as numbers) not associated with the page content.
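As an illustration, the same hypothetical article might be reachable at either of the following addresses; the second is the search engine friendly form, built from natural words rather than a query string:

```
http://www.example.com/index.php?option=com_content&view=article&id=19
http://www.example.com/understanding-sef-urls
```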
SEM is an acronym for Search Engine Marketing. The term is broad and applies to not only search engine optimization, but also to other techniques, such as social media, pay per click advertising, and other marketing techniques focused on search engines.
SEOmoz is a popular commercial SEO consultancy service. Learn more at http://www.seomoz.org.
SMO is an acronym for Social Media Optimization, that is, the process of using social media to drive traffic to your site and the related process of making your site suitable for social media, for example, by including social bookmarking tools and other social sharing devices on the site's pages.
Stop words are words included in search queries that are not actively indexed, unless included in quotations (phrase search). Typical examples include articles and conjunctions such as the, a, and or.
The title attribute is available on a number of HTML elements. It is used to provide a description for a link, a table, a frame, an image, or other elements. Some search engines index the title attribute and it therefore provides another option for on-page optimization. Some browsers will also display the content of the title attribute as a tooltip when you move your mouse over the object.
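For example, a link carrying a hypothetical title attribute:

```html
<!-- Browsers show the title text as a tooltip; some engines index it -->
<a href="/contact" title="Contact our support team">Contact us</a>
```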
An XML sitemap lists the pages on a website in a format that is easily digestible by search engine agents. Sitemaps follow a standard convention agreed upon by all the major search engines. The XML sitemap is typically not visible to site visitors and should not be confused with the normal sitemaps often used on the frontend of websites.
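A minimal XML sitemap containing a single hypothetical URL, following the sitemaps.org protocol, looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2012-06-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```

Only the loc element is required; lastmod, changefreq, and priority are optional hints to the crawlers.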
Search engines all function in approximately the same fashion—a software agent, known as a bot, spider, or crawler, visits a page, gathers the content, and stores it in the search engine's data repository. Once the information is in the repository, it is indexed. The crawling and indexing processes are constant and on-going. Each of the major search engines maintains multiple crawlers that work tirelessly to refresh its index. The spiders find new pages by a variety of methods, typically including XML sitemaps, URLs already in the index, links to pages discovered while indexing, and URLs submitted for inclusion by users. How frequently they visit a specific site, and how deeply they spider the site on each visit, varies.
When a user visits the search engine and runs a search, the search engine extracts (from the search engine's index) a list of pages that are relevant to the query and then displays that list of pages to the user. The output on the search results page is defined according to each search engine's own criteria. The ranking methodology used by each engine is the result of the search engine's secret algorithm.
The search engine's crawler is primarily interested in certain types of information on the page, particularly the URL, the text, and the links on the page. Formatting is not indexed. Images and other media are indexed by most search engines, but to varying degrees of depth. Some types of media, such as Flash or attached files, are rarely indexed, though there are exceptions.
Seeing what the spider sees
If you have a Google Webmaster account, you can see a web page exactly as the Googlebot (the name of the Google crawler) sees it. To do this, log in to Google Webmaster Tools (http://www.google.com/webmasters/) and click on a site profile. In the navigation menu on the left, select the Diagnostics menu and then select the option Fetch as Googlebot. Type the URL of the page you want to see and, after a delay, the system will produce the results. You can see a web page, as shown in the following screenshot, followed by the Googlebot's view of the same page:
This chapter seeks to acquaint you with the basic principles of search engine optimization, including the terminology used. As noted at the outset, the philosophy that is promoted in this book emphasizes SEO as an on-going process intended to optimize a website to perform its best on the search engines. Throughout this book, the techniques discussed will all reinforce this process-oriented approach to SEO.
At the conclusion of this chapter, you should have gained an awareness of the most commonly used terms in the SEO field, along with insight into what the search engines index and how that information is used to produce search results. At the outset of this chapter we stressed the importance of quality, original content; at the end, where we provided an example of how a search engine spider views your page, you can once again see how content is key to your efforts.
In the next chapter, we take our first steps towards laying the foundations of SEO for your site, as we look at the default SEO options that are available on your Joomla! site.