Search Engines in ColdFusion

John Farrar

August 2008

Built-In Search Engine

Verity comes packaged with ColdFusion. One of the reasons people pay for ColdFusion is the power bundled with it, and Verity is one of the most powerful standalone commercial search engines available. Some of the biggest companies in the world have expanded their internal services with the help of Verity, the tool we will learn about here.

To get started, we must create collections. Building search capability is a three-step process, and there is a standard ColdFusion tag to help us with each step.

  1. Create collections (cfcollection)
  2. Index the collections (cfindex)
  3. Search the collections (cfsearch)

These collections can contain information about web pages and binary documents, and can even serve as a powerful way to search cached query results. Many document formats are supported. In real business environments, even the newest applications can usually still save documents in an earlier format version. Archived and shared documents should be stored in formats and versions that can be searched.

Creating a Collection

The first step is to make our collection. In the ColdFusion Administrator, look under Data & Services.

Here, we will be able to add collections and edit existing collections. There is one default collection included in ColdFusion installations. This is the bookclub demonstration application data. We will be creating a collection of PDF documents for this lesson. We have placed a collection of ColdFusion, Flex, and some of the Fusion Authority Quarterly periodicals in a directory for indexing. Here is the information screen for adding the collection through the administrator.

We select the Enable Category Support option. Libraries are also available for multiple languages if a collection calls for them. We now see that there is a new collection for our devdocs. There are four icons for working with this collection. They are, from right to left, the index, optimize, purge, and remove actions. The Name link takes us to the index action. The listing gives the number of documents in the collection and the size of the index file on the server. It also shows when the index was last modified and the language in which it is stored, lists the categories, and shows the actual path where the index is stored.

Here is a code version of creating a collection that would achieve the same thing. This means that it is possible to create an entire administrative interface to manage collections. It is also possible to move from tags to objects, and wrap up all the functions in that style.

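<!--- The collection name is assumed to match the devdocs collection created above --->
<cfcollection action="create" collection="devdocs"
path="c:\ColdFusion8\verity\collections\documents" />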
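
As a small sketch of the administrative interface idea mentioned above, the same tag can list the registered collections and run maintenance on one of them. The collection name devdocs is an assumption carried over from the administrator example; substitute whatever name your collection actually has.

```cfml
<!--- List every registered collection; the result comes back as a query --->
<cfcollection action="list" name="allCollections" />
<cfdump var="#allCollections#" label="Registered collections">

<!--- Optimize a single collection; "devdocs" is an assumed name --->
<cfcollection action="optimize" collection="devdocs" />

The collection has been optimized.
```

Dumping the query first is a convenient way to see which columns are available before building a custom management screen around them.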

If we have categories in our collection, and we want to get a list of the categories, then the following code must be used:

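<!--- The collection name is assumed; use the collection that has category support enabled --->
<cfcollection action="categoryList" collection="devdocs"
name="myCats" />
<cfdump var="#myCats#">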

Indexing a Collection

We can do this through the administration interface, as shown in the following screenshot. The directory used here is a limited one that serves as an example for searching.

This is the result of the devdocs submitted above.

This gave a result of 12 documents with a collection size of 4,611 KB. Now we will look at how to build the same index in code, outside the administrator interface. The collection must exist before we try to index files into it; it can be created either in the administration interface or in code. It should also be noted that ColdFusion includes a security feature called Sandbox Security. The three core tags for Verity searching, among many others, can be blocked with it if that suits your environment. Just consider what is actually being indexed and what needs to be searched. If documents are secured correctly, this should not be an issue.

When we index a directory, we must choose whether or not to recurse. Recursing means that all the subdirectories of the indexed directory will be included in the index. It should also be noted that the service will not work for indexing other websites. It is for indexing this server only.

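<!--- key is the directory to index; this path is assumed from the urlpath and the single-file example --->
<cfindex action="refresh"
collection="bookClub" recurse="true"
type="path" extensions=".html, .htm, .cfm, .cfml"
key="c:\inetpub\wwwroot\documents\"
urlpath="http://localhost/documents/" />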

Your collection has been indexed.

It is important to note that there is no output from this tag, so we need to put some text on the screen to let the person using the site know that the task has been completed. If we want to index a single file rather than a whole directory path, we can do it with this code:

<cfindex action="refresh"
collection="bookClub" recurse="true"
type="file" extensions=".pdf"
key="c:\inetpub\wwwroot\documents\ColdFusion\cf8_devguide.pdf"
urlpath="http://localhost/documents/ColdFusion" />

Your collection has been indexed.
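
Collections can also hold query results, as noted at the start of this article. The following is a minimal sketch of that idea, assuming the bookClub collection already exists; the query, its column names, and its single row are made up purely for illustration.

```cfml
<!--- Build a small in-memory query; the columns and data are hypothetical --->
<cfset qDocs = queryNew("docID,docTitle,docText")>
<cfset queryAddRow(qDocs)>
<cfset querySetCell(qDocs, "docID", "1")>
<cfset querySetCell(qDocs, "docTitle", "Getting Started")>
<cfset querySetCell(qDocs, "docText", "Notes on creating and indexing collections.")>

<!--- With type="custom", the key, title, and body attributes name query columns --->
<cfindex action="update"
collection="bookClub"
type="custom"
query="qDocs"
key="docID"
title="docTitle"
body="docTitle,docText" />

Your query has been indexed.
```

In a real application the query would normally come from cfquery, and the key column would be whatever value lets the results page look the record up again.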

Searching a Collection

Now we have created collections and indexed their contents. It is time to find out whether what we are looking for is present in a collection. Not only will we learn how to find things, but we will also learn how to narrow our results to get more of what we want and less of what we do not need. The results will be links to the documents that contain what we were looking for.

The Search Form

We are going to build a simple search form, which will again reinforce the value of code reuse in ColdFusion. The technique involves including this page in another page and using variables that can be set on either page. This page can also be run independently, although doing so allows for only a single default search results page. Here is the code.

<cfparam name="form.params" default="">
<cfparam name="target" default="searchVerity">
<!--- cfoutput is needed so that the target and form.params variables are evaluated --->
<cfoutput>
<form method="post" action="#target#.cfm">
Enter your search term(s) using AND, OR, NOT and parens.
Surround an exact phrase with quotes.
<input type="text" name="params" size="75" value="#htmlEditFormat(form.params)#">
<br />
<input type="submit" value="Search">
</form>
</cfoutput>

We can see here that we have a variable called params that is passed through the form scope. It arrives in the form scope because the form's method attribute is set to post. If we did not do that, the values would arrive as URL variables. There is also an on-page variable called target. Its default value is searchVerity, which is used in the action attribute of the form. The action attribute directs the form data to the specified page. This form page, as you will see, is included in the search pages, so the params we entered will be displayed in the form again when the actual search results are presented.
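
To illustrate the reuse described above, here is a sketch of a second page that includes the same form but points it at itself. The page name searchDocs.cfm is hypothetical; the form file name searchVerityForm.cfm is the one used by the results page in this article.

```cfml
<!--- Hypothetical page: searchDocs.cfm --->
<!--- Set target before the include so the form posts back to this page --->
<cfset target = "searchDocs">
<cfinclude template="searchVerityForm.cfm">
```

Because the form uses cfparam for target, a value set by the including page takes precedence, and the default of searchVerity applies only when nothing has been set.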

The Results Page

Our results code comes next. The top items are the same default parameters that appear in the form code. Further down, we will see that the form page is included so that it is displayed above the search results. In this first example, there will be no change.

The next item we see is the actual search tag. The name attribute is where the results of the search will be stored. The collection attribute is the name of the Verity collection where the information has been indexed. The criteria attribute holds the search parameters that were entered on the form. Then contextPassages sets how many context passages are retrieved from a single document, contextBytes sets how much context to pull for reference, and maxrows sets the maximum number of result rows to return.

The results are returned in a record set, which ColdFusion refers to as queries. Since this is stored in a record set, it can be treated the same way as other queries, and we will use a cfoutput to loop through the records for display. We will show the number of records returned before the loop for the user. Then, we will show the URL link to the results since this result set happens to contain web-accessible results.

<cfparam name="form.params" default="">
<cfparam name="target" default="searchVerity">
<!--- Run the search; the collection name is assumed to be the one indexed earlier --->
<cfsearch name="foundResults"
collection="bookClub"
criteria="#form.params#"
contextPassages = "1"
contextBytes = "300"
maxrows = "100">
<cfinclude template="searchVerityForm.cfm">
<hr />
<h3>Search Results</h3>
This search returned a total of <cfoutput>#foundResults.recordCount#
</cfoutput> results.
<cfoutput query="foundResults">
<hr />
File: <a href="#foundResults.URL#" target="_blank">#foundResults.URL#</a>
(#foundResults.score#)<br />
Highlighted Summary: #foundResults.context#
</cfoutput>

This is the type of content that we would get from searching for cfajaxproxy in our current data set where the examples were created.

We can see that we get a reference link, a score indicating how well the result matches the search, and then a summary with the search terms highlighted. The highlighting can be customized, but we have retained the default settings here. As the second result shows, the occurrences in our summary text are bolded. There are two ways to see the common fields in the query record set returned by the search tag. The first is to go through the documentation, but it is often more helpful to see the actual results by using the cfdump tag.
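
A quick way to inspect the returned fields is sketched below. It assumes the bookClub collection from the indexing examples; the variable name searchInfo is chosen here for illustration, and the low maxrows value simply keeps the dump short.

```cfml
<cfparam name="form.params" default="">
<cfsearch name="foundResults"
collection="bookClub"
criteria="#form.params#"
status="searchInfo"
maxrows="5">

<!--- Dump the record set to see every column the search returns --->
<cfdump var="#foundResults#" label="Search results">

<!--- Dump the status structure that cfsearch fills in alongside the results --->
<cfdump var="#searchInfo#" label="Search status">
```

This is meant for development only; the dumps should be removed before the page is shown to users.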

Search Techniques

One consideration for using this search engine is that it runs a little differently from, say, Google. This is not to suggest that either approach is the only right way to do things. The point is that we must consider user habits and the training needed to get the best results from this technique. As programmers, we are familiar with the logical use of AND, OR, and NOT in phrases. Lawyers and members of some other professions are just as practiced at it. Another general tip is to add an * before or after a word to allow partial matches. This means that 'fish*' would find fish and fishing. It should also be noted that a comma is treated as an OR in searches.

When we are ready to graduate to more advanced searching, the ColdFusion documentation describes additional wild cards. The "?" represents a single character. The [ ] square brackets will find any one of the characters within them. Using the curly brackets { }, we can place a comma-delimited list of items to search for in our collection. The ^ symbol is the same as the NOT keyword, but it will allow us to find things like occurrences of words that do not have a particular character in them. The dash can be used within brackets to declare a range of characters. The last thing to note here is that if we want to look for a literal character such as "?", we must place a backslash in front of it within the search parameters. There are many other tips for finding things in collections. What we end up discovering is that the base tool set of Google was created to make things simple. Yet, because it is the most common search tool, it is likely to be the standard against which other search tools are judged.

To exclude a term from a search, place a "-" in front of the item. To require an item, lead it with a + character. Phrases can be declared by placing double quotes around them. If we have indexed a query, we should be able to search a specific field by giving the field name, a colon, and then the item being searched for. Search terms entered in lower case will normally find all matches regardless of case. We can code the criteria attribute to pass only lower case using the lcase() ColdFusion function if that helps.
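
The lower-casing tip can be sketched as follows, assuming the same form field and the bookClub collection used in the earlier examples. The variable name searchCriteria is introduced here for illustration. Note that this also lower-cases any AND, OR, or NOT the user typed, so it is worth testing against your own collection before relying on it.

```cfml
<cfparam name="form.params" default="">

<!--- Trim and lower-case the user's input before handing it to the search --->
<cfset searchCriteria = lcase(trim(form.params))>

<cfsearch name="foundResults"
collection="bookClub"
criteria="#searchCriteria#"
maxrows="100">

<!---
  Sample criteria a user might enter, based on the rules described above:
    fish*                 partial match: fish, fishing
    flex, coldfusion      a comma is treated as OR
    "exact phrase"        double quotes declare a phrase
--->
```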

There are a number of operators that are worth researching as the search capability of a site matures. There are five basic types of operators (concept, relational, evidence, proximity, and score), and each type is substantial enough to deserve exploration of its own. With practice, we can attain mastery over them. One such master has built a business called Knowledge Watch in Michigan, USA. His business runs on ColdFusion search abilities and vends result sets to its customers. He is living proof that the information age is a good business to serve with ColdFusion.

If we find a client who needs search in a ColdFusion application, Verity should be considered a very viable solution. There is much more information than could be covered in a chapter. Someone could run an entire seminar track just on ColdFusion's search capabilities. It is not likely that we will push Google or Yahoo out of the market, but it is quite likely that we will be able to meet all the core needs of our clients.

PDF Linking to Searches

This is a great tip for working PDF searches into the mix. We just spoke about the differences in searches and this is one place where it comes into view. We can actually pass the search terms through to Acrobat. The search engine in Acrobat is different. Some detailed searches do not work in a similar manner. We will look at an example of how it works. While we will find that most of the searches work well, the ones that do not can be adjusted after the document loads.

In our example, we will say we have technical documents and we are looking for ECMAScript edition 3. Verity finds the Programming Actionscript 3 document for us. We want to be able to pass the search through, so when the document loads, we get search results. We will show a simple working concept here, and save the perfecting of the techniques for another book. Here is what we would pass through the URL:

http://localhost/documents/prog_actionscript30.pdf#search=ECMAScript edition 3
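
A link like this could be assembled in code along the following lines. This is only a sketch: the variable names are made up for illustration, the URL and search terms are the ones from the example above, and how the encoded terms are interpreted depends on the PDF viewer the visitor has installed.

```cfml
<cfset pdfUrl = "http://localhost/documents/prog_actionscript30.pdf">
<cfset searchTerms = "ECMAScript edition 3">

<!--- A doubled pound sign yields a literal pound sign inside a ColdFusion string --->
<cfset pdfLink = pdfUrl & "##search=" & urlEncodedFormat(searchTerms)>

<cfoutput>
<a href="#pdfLink#" target="_blank">Open the document with the search applied</a>
</cfoutput>
```

On a results page, pdfUrl would typically come from the URL column of the search results and searchTerms from the submitted form field.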

You've been reading an excerpt of:

ColdFusion 8 Developer Tutorial