Taxonomy and Thesauri in Drupal 6

Building Powerful and Robust Websites with Drupal 6

by David Mercer | February 2009 | Content Management Drupal Open Source

In this article by David Mercer, we will see how Taxonomy makes Drupal's classification system so powerful. The method of categorizing content in Drupal makes it one of the most sophisticated content management systems around. Take the time to master working with taxonomy in Drupal, because not only will this help you to work out how to manage content better, but it will also really set your site apart from others because of the flexible and intuitive manner in which the content is organized. These attributes allow you to manage a site of pretty much any size imaginable (just in case what you are working on is "the next big thing").

What and Why?

Taxonomy is described as the science of classification. In terms of how it applies to Drupal, it is the method by which content is organized using several distinct types of relationship between terms. Simple as that! That definition doesn't really convey how useful taxonomy is, but before we move on to that, there is a bit of terminology to pick up first:

  • Term: A term used to describe content (also known as a descriptor)
  • Vocabulary: A grouping of related terms
  • Thesaurus: A categorization of content that describes "is similar to" relationships between terms
  • Taxonomy: A categorization of content into a hierarchical structure
  • Tagging: The process of associating a term (descriptor) with content
  • Synonym: Another word for the current term

It may help to view the following diagram in order to properly grasp how these terms inter-relate.

[Diagram: the two main types of vocabulary, taxonomy and thesaurus. Terms are shown as small boxes and the relationships between them as arrows.]

This serves to illustrate the fact that there are two main types of vocabulary. Each type consists of a set of terms, but the relationships between them are different in that a taxonomy deals with a hierarchy of information, and a thesaurus deals with relationships between terms. The terms (shown as small boxes) and their relationships (shown as arrows) play a critical role in how content is organized.

One of the things that makes the Drupal taxonomy system so powerful is that it allows content to be categorized on the fly (as and when it is created). This unburdens administrators because it is no longer necessary to moderate every bit of content coming into the site in order to put it into pre-determined categories. We'll discuss these methods in some detail in the coming sections, but it's also worth noting quickly that it is possible to tag a given node more than once. This means that content can belong to several terms, and even to several vocabularies, at once. This is very useful for cross-referencing purposes because it highlights relationships between terms or vocabularies through the actual nodes.

Implementing Controlled Taxonomies in Drupal

The best way to talk about how to implement some form of categorization is to see it in action. There are quite a few settings to work with and consider in order to get things up and running. Let's assume that the demo site has enlisted a large number of specialists who will maintain their own blogs on the website so that interested parties can keep tabs on what's news according to the people in the know.

Now, some people will be happy with visiting their blog of choice and reading over any new postings there. Some people, however, might want to be able to search for specific topics in order to see if there are correlations or disagreements between bloggers on certain subjects. As there is going to be a lot of content posted once the site has been up and running for a few months, we need some way to ensure that specific topics are easy to find, regardless of who has been discussing them on their blogs.

Introduction to Vocabularies

Let's quickly discuss how vocabularies are dealt with in the administration tool in order to work out how to go about making sure this requirement is satisfied. If you click on the Taxonomy link under Content management, you will be presented with a page listing the current vocabularies. Assuming you have created a forum before, you should have something like this:

Before we look at editing terms and vocabularies, let's take a look at how to create a vocabulary for ourselves. Click on the add vocabulary tab to bring up the following page that we can use to create a vocabulary manually:

By way of example, this vocabulary will deal with the topic of hunting. This vocabulary only applies to blog entries because that is the only content (or node) type for which it is enabled—you can select as many or as few as you like, depending on how many content types it should apply to.

Looking further down the page, there are several other options that we will discuss in more detail shortly. Clicking on Submit adds this vocabulary to the list, so that the main page now looks like this:

So far so good, but this will not be of much use to us as it stands! We need to add some terms (descriptors) in order to allow tagging to commence.

Dealing with Terms

Click on the add terms link for the Hunting vocabulary to bring up the following page:

The term Trapping has been added here, with a brief description of the term itself. We could, if we choose, associate the term Poaching with Trapping by making it a related term or synonym (of course, you would need to create this term first in order to make it a related term). Click on the Advanced options link to expose the additional features, as shown here:

In this case, the term Trapping is specified as being related to Poaching and by way of example, gin traps is a synonym. Synonyms don't actually do anything useful at the moment, so don't pay too much mind to them yet, but there are modules that expose additional functionality based on related terms and synonyms, such as the Similar by Terms module.

The Parents option at the start of the Advanced options warrants a closer inspection, but as it relates more closely to the structure of hierarchies, we'll look at it in the section on Hierarchies that's coming up.
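To keep the pieces straight, it can help to picture a term and its vocabulary as simple records. The following is only an illustrative sketch in Python, not Drupal's own code or database schema; the class and field names (Term, Vocabulary, parents, related, synonyms) are assumptions chosen to mirror the options on the term form.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A descriptor that can be attached to content (hypothetical model)."""
    name: str
    description: str = ""
    parents: list = field(default_factory=list)   # hierarchy, used by taxonomies
    related: list = field(default_factory=list)   # "is similar to" links
    synonyms: list = field(default_factory=list)  # other words for this term


@dataclass
class Vocabulary:
    """A named grouping of related terms (hypothetical model)."""
    name: str
    terms: dict = field(default_factory=dict)

    def add(self, term):
        # Index terms by name so they are easy to look up later.
        self.terms[term.name] = term


hunting = Vocabulary("Hunting")
hunting.add(Term("Poaching"))
hunting.add(Term("Trapping", related=["Poaching"], synonyms=["gin traps"]))

print(hunting.terms["Trapping"].related)   # ['Poaching']
```

Note that Poaching is added first, in the same way that the related term has to exist in the vocabulary before Trapping can point to it.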

For now, add a few more terms to this vocabulary so that the list looks something like this:

It's now time to make use of these terms by posting some blog content.

Posting Content with Categories Enabled

Using any account with the requisite permissions to add blog content, attempt to post to the site. You should now be able to view the newly inserted Hunting category, as shown here:

Now comes the clever bit! Once this blog node has been posted, users can view the blog as normal, except that it now has its term displayed along with the post (bottom right):

Where does the descriptor link take us? Click on the term, in this case Canned hunting, and you will be taken to a page listing all of the content that has been tagged with this term. This should really have you quite excited, because with very little work, users can now find focused content without having to look that hard—this is what content management is all about!
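Conceptually, a term page is just a lookup from a term to everything tagged with it (in Drupal 6 these listings live at paths of the form taxonomy/term/ followed by the term's ID). The sketch below illustrates the idea in Python; it is not how Drupal implements it, and the post titles and the build_term_index helper are made up for the example.

```python
from collections import defaultdict

# Hypothetical blog posts: node id -> (title, terms the post was tagged with).
posts = {
    1: ("The ethics of canned hunting", ["Canned hunting"]),
    2: ("Snares found near the reserve", ["Trapping"]),
    3: ("Canned hunting legislation", ["Canned hunting"]),
}


def build_term_index(posts):
    """Map each term to the ids of the nodes tagged with it."""
    index = defaultdict(list)
    for node_id, (_title, terms) in posts.items():
        for term in terms:
            index[term].append(node_id)
    return index


term_index = build_term_index(posts)
print(term_index["Canned hunting"])   # [1, 3]
```

Clicking a term is then equivalent to reading one entry of this index, which is why no extra work is needed from the poster beyond choosing the term.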

Hierarchies

You can build an entire hierarchy of terms in a vocabulary to give you a fairly complex taxonomy. Remember that if it is a hierarchy you are building, then the broadest terms should be towards the top of the pile, with the more focused terms near the bottom. At the moment, though, we don't really have a hierarchy, but rather, more of a flat structure.

What if we wanted a set of more specific terms that would allow bloggers to tag their content (which focuses on specific types of Trapping, for example)? The answer lies in restructuring the vocabulary by dragging and dropping its terms not only up and down the list, but right to left—this is done when viewing the list terms page of the vocabulary.

For this example, I added a term entitled Snaring to the vocabulary, and then dragged it under and to the right of the term Trapping to indicate that it is lower in the hierarchy:

Saving this change leaves us with the same page, only the description of the hierarchy has moved from flat to single:

That was fairly easy to do, and now we are free to create either flat hierarchies or single depth ones (i.e. one parent term with one child term—no grandchildren). If you wanted to create a deep hierarchy structure, then this is easily achieved by dragging either additional terms under Snaring, or moving Trapping under something else, like this:

This should not be confused with creating multiple hierarchies—notice that the hierarchy description in this screenshot still describes Hunting as a single hierarchy vocabulary.

But what happens if your topic is slightly more complex than a straightforward hierarchy? For example, it's quite possible that the term Pits (referring to hunting pits) could be equally at home under both Trapping and Poaching (which in turn may also have multiple parents). When a term has several parent terms, the structure is described as a multiple hierarchy.
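The difference between flat, single, and multiple hierarchies comes down to how many parents the terms have, not how deep the tree goes. Here is a minimal Python sketch of that rule, written for illustration rather than taken from Drupal; the hierarchy_type function and the "Hunting methods" parent term are hypothetical.

```python
def hierarchy_type(parents_by_term):
    """Classify a vocabulary as 'flat', 'single' or 'multiple'.

    parents_by_term maps each term name to the list of its parent terms.
    """
    most_parents = max((len(p) for p in parents_by_term.values()), default=0)
    if most_parents == 0:
        return "flat"       # no term has a parent
    if most_parents == 1:
        return "single"     # every term has at most one parent
    return "multiple"       # at least one term has several parents


flat = {"Trapping": [], "Poaching": [], "Snaring": []}
single = {"Trapping": [], "Poaching": [], "Snaring": ["Trapping"]}
# A deeper tree is still a single hierarchy: each term has only one parent.
deep = {"Hunting methods": [], "Trapping": ["Hunting methods"], "Snaring": ["Trapping"]}
multiple = {"Trapping": [], "Poaching": [], "Pits": ["Trapping", "Poaching"]}

for vocab in (flat, single, deep, multiple):
    print(hierarchy_type(vocab))
```

This prints flat, single, single and multiple in turn, which matches the way the hierarchy description stays at single however deep the structure becomes.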

Recall that when dealing with terms previously, there was an Advanced option in the term edit page that allowed us to specify one or more parent terms. Selecting more than one parent, like so:

...leads Drupal to warn us with the following page:

Basically, Drupal needs to warn users that the normal drag-and-drop facility for vocabularies cannot be used once terms have a complex hierarchy involving several parents. That said, drag and drop will still be enabled wherever possible, and the structure will still be shown on the List page. If you want a multiple hierarchy, then the structural editing of the hierarchy must be done by hand in each term's edit form.

Go ahead and click Set multiple parents—you might want to add a few terms and set each of these to have multiple parents, to make the structure a little more complex. With that done, note that the drag-and-drop features of the list page are disabled:

The hierarchy structure is useful when the topics of discussion fall fairly neatly into some sort of natural hierarchy—forums are the best example of this. However, it may well be that a given piece of content overlaps several terms and should really be tagged with more than one term. To achieve this, head back to the vocabulary editing page and select the Multiple select option in the Settings section:

Save this and then post some new content. Now, instead of being presented with a single term to associate with the post, it is possible to select as many as are relevant:

When this post is viewed on the site, it has several tags associated with it, and users can click on any of these tags to immediately locate more content that is of specific interest to them:

Notice that the terms presented do not, in any way, indicate their underlying structure to the reader. They simply show that all of these terms have been applied to this bit of content.

Content Structure

What if, in the demo site's case, we have the term Trapping available to tag content with (blog posts in this case), but someone is really talking about something other than hunting entirely, and there happens to be some sort of content overlap? An example scenario might be as follows:

  • Several specialists are contracted to maintain blogs about the African continent.
  • They tag their content using a new Africa vocabulary that contains terms like nature, gazelle, predators, lakes, rivers, mountains, hunting, weather, and tourism.
  • You wish to be able to allow material that is created from the Africa blogs to be cross-referenced by hunting-related topics in the Hunting specialists' blogs.

In order to achieve this, it is necessary to create a new vocabulary called Africa. Attach this vocabulary to the blog content type, and then create several terms, ensuring that one of them is entitled Hunting, as follows:

Now, when users attempt to post content, they are presented with not one but two options to classify their content, and assuming you have correctly ordered the vocabularies on the Taxonomy page, you can apply a kind of hierarchy to the tags. For example, a blog post on poaching by one of the Africa bloggers might look like this:

Once this is posted to the site, it is then possible to view both categories on the content page, instead of just one. In other words, the node has been tagged with several terms in what is known as faceted tagging.

Faceted tagging uses a bottom up system of classification, where facets or properties of the content are described by the terms. In this way, a very intuitive method of classifying content can be created without users needing to understand the top-down path of a content hierarchy in order to find the content they are after. In this case, the specific method of tagging used here helps to elucidate the hierarchy of terms too (i.e. Canned hunting is a child of Hunting):

Taking a look at this posting on the site confirms that users can now go directly to both the Hunting and Canned hunting category pages by clicking on the links provided in the posting.
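One way to see why faceted tagging is useful is to treat each tag as a (vocabulary, term) pair and filter posts on any combination of them. The following Python sketch only illustrates the idea and is not Drupal's implementation; the post titles and the nodes_with_all and nodes_with_any helpers are invented for the example.

```python
# Hypothetical posts, each tagged with (vocabulary, term) pairs.
tags_by_node = {
    "Poaching in the Serengeti": {("Africa", "Hunting"), ("Hunting", "Canned hunting")},
    "Rainy season forecast": {("Africa", "Weather")},
    "Snare removal drive": {("Hunting", "Trapping")},
}


def nodes_with_all(tags_by_node, wanted):
    """Return the titles of posts that carry every one of the wanted tags."""
    wanted = set(wanted)
    return sorted(title for title, tags in tags_by_node.items() if wanted <= tags)


def nodes_with_any(tags_by_node, wanted):
    """Return the titles of posts that carry at least one of the wanted tags."""
    wanted = set(wanted)
    return sorted(title for title, tags in tags_by_node.items() if wanted & tags)


print(nodes_with_all(tags_by_node, [("Africa", "Hunting"), ("Hunting", "Canned hunting")]))
print(nodes_with_any(tags_by_node, [("Africa", "Weather"), ("Hunting", "Trapping")]))
```

The reader never has to know where Canned hunting sits in a hierarchy; any facet that describes the post is enough to find it.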

What happens if one of the Hunting bloggers simply wants to make an entry and tag it with the Canned hunting term from the Hunting vocabulary, without having to first specify that this content also belongs to the Africa vocabulary? The answer lies once again in editing the vocabulary page, which contains a Required checkbox right at the bottom. If this option is enabled, then posters must select at least one tag from the vocabulary, but if we leave it unselected, then posters can choose whether to include a term from that vocabulary or not.
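The effect of the Required and Multiple select settings can be summed up as a small validation rule applied to whatever the poster picks from a vocabulary. The Python sketch below is an illustration of that rule under our own assumptions, not Drupal code; the validate_selection function and its error messages are hypothetical.

```python
def validate_selection(selected, allowed, required=False, multiple=False):
    """Return a list of error messages for one vocabulary's term selection."""
    errors = []
    unknown = [term for term in selected if term not in allowed]
    if unknown:
        errors.append("Unknown terms: " + ", ".join(unknown))
    if required and not selected:
        errors.append("At least one term is required.")
    if not multiple and len(selected) > 1:
        errors.append("Only one term may be selected.")
    return errors


africa_terms = ["Nature", "Hunting", "Weather", "Tourism"]

# Required is off, so a Hunting blogger may skip the Africa vocabulary entirely.
print(validate_selection([], africa_terms, required=False))   # []

# Required is on, so an empty selection is rejected.
print(validate_selection([], africa_terms, required=True))
```

With Required left unselected, the empty selection passes, which is exactly what lets a poster ignore a vocabulary that does not apply to their content.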

Talking of new options, there is one more that we should take a look at quickly: tagging. Since there are a number of points to consider before implementing tagging, we treat it in its own section.

Implementing Thesauri in Drupal (Tags)

Tagging is an interesting option because it allows posters to choose their own terms for their content. While posters effectively have free rein when it comes to tagging their posts, Drupal understands that a hundred different people might come up with a hundred different terms to describe the same post, and this can be very detrimental to the usability of the site.

In order to combat this effect, Drupal provides helpful clues to keep the tagging of posts as uniform as possible, without placing restrictions on what can be, and is, used for tagging. Enabling Tags for the Hunting vocabulary, for example, means posters are given the following category options when creating a blog entry:

Notice that there is a red asterisk superscript above the Hunting category. This is because, despite the fact that we are using free tagging, the Required option on the edit vocabulary page is still enabled, so something has to be entered here. Notice also the drop-down list of all the available tags that contain whatever letter(s) you type. This means that giving people free rein to type in their own tags is not as random as it may at first seem, because they can still be guided as to what terms are already available using this drop-down list. In this way, Drupal can encourage a more coherent body of terms.

"But Poaching doesn't begin with a C", some of you may be remarking. That is quite correct, but Poaching contains a c, so it is displayed here nevertheless. It's a good way to provide a range of available tags that narrows down quickly as the user types.

Tagging has some pros in that it is far more flexible. People can tag their content exactly as they please—making the tagging system fit the content far more snugly. The problem is, however, that the vocabulary may well become unwieldy, because similar content could be tagged with entirely different terms, making it hard for users to find what they are looking for.
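The drop-down suggestions described earlier boil down to a case-insensitive "contains" match against the terms already in the vocabulary. Here is a rough Python sketch of that behaviour; it is not Drupal's autocomplete code, and the suggest_tags function and its limit parameter are assumptions made for the illustration.

```python
def suggest_tags(existing, typed, limit=10):
    """Suggest existing tags that contain the typed text, ignoring case."""
    needle = typed.strip().lower()
    if not needle:
        return []
    matches = [tag for tag in existing if needle in tag.lower()]
    return matches[:limit]


tags = ["Canned hunting", "Poaching", "Trapping", "Snaring"]

print(suggest_tags(tags, "c"))    # ['Canned hunting', 'Poaching']
print(suggest_tags(tags, "ca"))   # ['Canned hunting']
```

Typing a single letter offers every tag that contains it, and each extra letter narrows the list, nudging posters towards terms that already exist.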

Allowing free tagging is a very powerful method for categorizing content. Be wary, though: it can lead to a lot of redundant tags that in turn make content hard to find.

You should make note of the fact that it is not possible to create a hierarchy of terms using the free tagging system, because every new tag is on the same level as all the other tags. So what you end up with is really a thesaurus, instead of a taxonomy (hence this section's heading).

Remember that it is still possible to moderate a thesaurus because any and all terms that posters create will still be added to the list of terms in the vocabulary, and they can be viewed, edited or deleted as you like.
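When moderating a freely tagged vocabulary, a simple first pass is to look for tags that differ only in capitalization or spacing. The Python sketch below is a hypothetical helper for that kind of clean-up, not a Drupal feature; it will not catch genuinely different words for the same idea, which still need a human eye.

```python
from collections import defaultdict


def duplicate_groups(tags):
    """Group tags that are identical once case and extra spaces are ignored."""
    groups = defaultdict(list)
    for tag in tags:
        # Normalize: lower-case and collapse runs of whitespace.
        key = " ".join(tag.casefold().split())
        groups[key].append(tag)
    return [group for group in groups.values() if len(group) > 1]


tags = ["Canned hunting", "canned  hunting", "Poaching", "poaching ", "Snaring"]
print(duplicate_groups(tags))
```

Each group returned is a candidate for merging into a single term when you edit or delete tags in the vocabulary.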

It is interesting to note that a middle ground between controlled taxonomies and free tagging is achievable using the already mentioned Multiple select option and disabling Free tagging. This allows users to tag their posts with as many terms as are made available by the creator of the vocabulary—giving you control over the terms used, and posters the freedom to choose which ones they make use of.

Summary

Drupal's taxonomy system sets it apart from other CMS technologies and provides the flexibility and power to implement pretty much any type of structure that we can imagine for our content. With powerful features like tagging available at the click of a button, you are sitting at the controls of one of the best systems around.

About the Author

David Mercer

David Mercer was born in August 1976 in Harare, Zimbabwe. Having always had a strong interest in science, David came into regular contact with computers at university where he graduated "cum laude" with majors in applied math and math.

His technical books are now sold worldwide and have been translated into French, German, Polish, Greek, Spanish and many more. His book on Drupal 6 was reviewed on Slashdot and went on to become a best seller. Mercer's books are recommended reading at higher learning institutes like MIT, and he has a write-up on Wikipedia.

David divides his time between consulting for companies and organizations in a wide variety of industries, helping them to develop and implement cutting edge systems. He also contributes to interesting web-based projects like design-a-webpage (http://www.design-a-webpage.com) that allows people to create beautiful, effective landing pages in minutes - absolutely no code or Web experience required.

He also maintains a blog, training and support site for his readers at http://www.siteprebuilder.com. Site prebuilder provides quizzes and exercises that accompany his books, and serves as his primary online presence and contact point.

When he isn't working (which isn't that often) he enjoys playing guitar (generally on stage and unrehearsed) and getting involved in outdoor activities ranging from touch rugby and golf to water skiing and snowboarding.
