Faceting in Solr 1.4 Enterprise Search Server

by David Smiley | August 2009 | Open Source

In this article by David Smiley, we will learn about faceting in Solr in detail. We will cover the field requirements, types of faceting, faceting text and alphabetic range bucketing. We will also learn about faceting on arbitrary queries, excluding filters and faceting dates which includes date facet parameters.


Faceting, after searching, is arguably the second-most valuable feature in Solr. It is perhaps even the most fun you'll have, because you will learn more about your data than with any other feature. Faceting enhances search results with aggregated information over all of the documents found in the search. Given a search on MusicBrainz releases, it can answer questions such as the following:

  • How many are official, bootleg, or promotional?
  • What were the top five most common countries in which the releases occurred?
  • Over the past ten years, how many were released in each year?
  • How many have names in these ranges: A-C, D-F, G-I, and so on?
  • Given a track search, how many are < 2 minutes long, 2-3, 3-4, or more?

In addition, faceting can power term-suggest (aka auto-complete) functionality, which enables your search application to suggest a completion for the word that the user is typing, based on the most commonly occurring words starting with what they have already typed. So if a user started typing siamese dr, then Solr might suggest that dreams is the most likely word, along with other alternatives.

Faceting, sometimes referred to as faceted navigation, is usually used to power user interfaces that display this summary information with clickable links that apply Solr filter queries to a subsequent search.

If we revisit the comparison of search technology to databases, then faceting is more or less analogous to SQL's group by feature on a column with count(*). However, in Solr, facet processing is performed subsequent to an existing search as part of a single request-response with both the primary search results and the faceting results coming back together. In SQL, you would need to potentially perform a series of separate queries to get the same information.

A quick example: Faceting release types

Observe the following search results. echoParams is set to explicit (defined in solrconfig.xml) so that the search parameters are seen here. This example is using the standard handler (though perhaps dismax is more typical). The query parameter q is *:*, which matches all documents. In this case, the index I'm using only has releases. If there were non-releases in the index, then I would add a filter fq=type%3ARelease to the URL or put this in the handler configuration, as that is the data set we'll be using for most of this article. I wanted to keep this example brief so I set rows to 2. Sometimes when using faceting, you only want the facet information and not the main search, so you would set rows to 0, if that is the case.

It's important to understand that the faceting numbers are computed over the entire search result, which is all of the releases in this example, and not just the two rows being returned.

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">160</int>
<lst name="params">
<str name="wt">standard</str>
<str name="rows">2</str>
<str name="facet">true</str>
<str name="q">*:*</str>
<str name="fl">*,score</str>
<str name="qt">standard</str>
<str name="facet.field">r_official</str>
<str name="f.r_official.facet.missing">true</str>
<str name="f.r_official.facet.method">enum</str>
<str name="indent">on</str>
</lst>
</lst>
<result name="response" numFound="603090" start="0" maxScore="1.0">
<doc>
<float name="score">1.0</float>
<str name="id">Release:136192</str>
<str name="r_a_id">3143</str>
<str name="r_a_name">Janis Joplin</str>
<arr name="r_attributes"><int>0</int><int>9</int>
<int>100</int></arr>
<str name="r_name">Texas International Pop Festival 11-30-69</str>
<int name="r_tracks">7</int>

<str name="type">Release</str>
</doc>
<doc>
<float name="score">1.0</float>
<str name="id">Release:133202</str>
<str name="r_a_id">6774</str>
<str name="r_a_name">The Dubliners</str>
<arr name="r_attributes"><int>0</int></arr>
<str name="r_lang">English</str>
<str name="r_name">40 Jahre</str>
<int name="r_tracks">20</int>
<str name="type">Release</str>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="r_official">
<int name="Official">519168</int>
<int name="Bootleg">19559</int>
<int name="Promotion">16562</int>
<int name="Pseudo-Release">2819</int>
<int>44982</int>
</lst>
</lst>
<lst name="facet_dates"/>
</lst>
</response>

The facet-related search parameters (facet, facet.field, and the two f.r_official settings) appear in the params list at the top. The facet.missing parameter was set using the field-specific syntax, which will be explained shortly.

Notice that the facet results follow the main search result and are given the name facet_counts. In this example, we only faceted on one field, r_official, but you'll learn in a bit that you can facet on as many fields as you desire. The name attribute holds a facet value, which is simply an indexed term, and the integer following it is the number of documents in the search results containing that term, aka a facet count. The next section explains where r_official and r_type came from.
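
As a rough illustration of how a client might read these counts, here is a minimal Python sketch that pulls the facet_fields section out of a response like the one above. The helper name parse_facet_fields and the trimmed SAMPLE string are assumptions made for this example; they are not part of Solr.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the facet section of the response shown above.
SAMPLE = """<response>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="r_official">
<int name="Official">519168</int>
<int name="Bootleg">19559</int>
<int name="Promotion">16562</int>
<int name="Pseudo-Release">2819</int>
<int>44982</int>
</lst>
</lst>
<lst name="facet_dates"/>
</lst>
</response>"""


def parse_facet_fields(xml_text):
    """Return {field: [(facet_value_or_None, count), ...]} (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    fields = root.find("./lst[@name='facet_counts']/lst[@name='facet_fields']")
    result = {}
    if fields is None:
        return result
    for field in fields.findall("lst"):
        # An <int> with no name attribute is the facet.missing count.
        result[field.get("name")] = [
            (entry.get("name"), int(entry.text)) for entry in field.findall("int")
        ]
    return result


counts = parse_facet_fields(SAMPLE)
for value, count in counts["r_official"]:
    print(value or "(missing)", count)
```

The unnamed final entry is kept as None so that a user interface can decide how to label documents that have no value in the field.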

MusicBrainz schema changes

In order to get more self-explanatory faceting results out of the r_attributes field and to split its dual meaning, I modified the schema and added some text analysis. r_attributes is an array of numeric constants, which signify various types of releases and its official-ness, for lack of a better word. As it represents two different things, I created two new fields: r_type and r_official with copyField directives to copy r_attributes into them:

<field name="r_attributes" type="integer" multiValued="true" 
indexed="false" /><!-- ex: 0, 1, 100 -->
<field name="r_type" type="rType" multiValued="true"
stored="false" /><!-- Album | Single | EP |... etc. -->
<field name="r_official" type="rOfficial" multiValued="true"
stored="false" /><!-- Official | Bootleg | Promotional -->

And:

<copyField source="r_attributes" dest="r_type" />
<copyField source="r_attributes" dest="r_official" />

In order to map the constants to human-readable definitions, I created two field types: rType and rOfficial that use a regular expression to pull out the desired numbers and a synonym list to map from the constant to the human readable definition. Conveniently, the constants for r_type are in the range 1-11, whereas r_official are 100-103. I removed the constant 0, as it seemed to be bogus.

<fieldType name="rType" class="solr.TextField" sortMissingLast="true" 
omitNorms="true">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory"
pattern="^(0|1\d\d)$" replacement="" replace="first" />
<filter class="solr.LengthFilterFactory" min="1" max="100" />
<filter class="solr.SynonymFilterFactory" synonyms="mb_attributes.txt"
ignoreCase="false" expand="false"/>
</analyzer>
</fieldType>

The definition of the type rOfficial is the same as rType, except it has this regular expression: ^(0|\d\d?)$.

The presence of LengthFilterFactory is to ensure that no zero-length (empty-string) terms get indexed. Otherwise, this would happen because the previous regular expression reduces text fitting unwanted patterns to empty strings.

The content of mb_attributes.txt is as follows:

# from: http://bugs.musicbrainz.org/browser/mb_server/trunk/
# cgi-bin/MusicBrainz/Server/Release.pm#L48
#note: non-album track seems bogus; almost everything has it
0=>Non-Album\ Track
1=>Album
2=>Single
3=>EP
4=>Compilation
5=>Soundtrack
6=>Spokenword
7=>Interview
8=>Audiobook
9=>Live
10=>Remix
11=>Other
100=>Official
101=>Promotion
102=>Bootleg
103=>Pseudo-Release

It does not matter if the user interface uses the name (for example: Official) or constant (for example: 100) when applying filter queries when implementing faceted navigation, as the text analysis will let the names through and will map the constants to the names. This is not necessarily true in a general case, but it is for the text analysis as I've configured it above.
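
To make the analysis chain easier to follow, here is a small Python sketch that imitates it outside of Solr: a pattern replacement, a length filter, and a synonym lookup. It assumes the regular expressions are meant to use the digit class (^(0|1\d\d)$ for rType and ^(0|\d\d?)$ for rOfficial), and it only carries a subset of mb_attributes.txt. The function names are made up for illustration, and this is not how Solr implements these filters internally.

```python
import re

# Subset of mb_attributes.txt from above, as a plain dictionary.
SYNONYMS = {"1": "Album", "9": "Live", "100": "Official", "102": "Bootleg"}

R_TYPE = r"^(0|1\d\d)$"      # blanks out 0 and the 100-series constants
R_OFFICIAL = r"^(0|\d\d?)$"  # blanks out 0 and the one- or two-digit constants


def analyze(value, drop_pattern):
    """Imitate: keyword token -> pattern replace -> length filter -> synonyms."""
    token = re.sub(drop_pattern, "", value, count=1)  # like replace="first"
    if not 1 <= len(token) <= 100:                    # like LengthFilterFactory
        return None                                   # empty terms are dropped
    return SYNONYMS.get(token, token)                 # names pass through as-is


def index_terms(attributes, drop_pattern):
    """Terms that would be indexed for one document's r_attributes values."""
    terms = (analyze(str(a), drop_pattern) for a in attributes)
    return [t for t in terms if t is not None]


# The Janis Joplin release from the first example has r_attributes 0, 9, 100.
print(index_terms([0, 9, 100], R_TYPE))      # release type terms
print(index_terms([0, 9, 100], R_OFFICIAL))  # official-ness terms
```

Because a name such as Official matches neither pattern and has no synonym entry, it comes out unchanged, which is why a filter query can use either the name or the constant.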

The approach I took was relatively simple, but it is not the only way to do it. Alternatively, I might have split the attributes and/or mapped them as part of the import process. This would allow me to remove the multiValued setting in r_official. Moreover, it wasn't truly necessary to map the numbers to their names, as a user interface, which is going to present the data, could very well map it on the fly.

Field requirements

The principal requirement of a field that will be faceted on is that it must be indexed. In addition, for all but the prefix faceting use case, you will also want to use text analysis that does not tokenize the text. For example, the value Non-Album Track is indexed as a single term in r_type. We needed to be careful to escape the space where this appeared in mb_attributes.txt; otherwise, faceting on this field would show tallies for Non-Album and Track separately. Depending on the type of faceting you want to do and other needs you have, like sorting, you will often find it necessary to have a copy of a field just for faceting. Remember that with faceting, the facet values returned in search results are the actual terms indexed, and not the stored value, which isn't even used.


Types of faceting

Solr's faceting is broken down into three types. They are as follows:

  • field values (text): This is the most fundamental and common type of faceting that works off of the indexed terms, which is the result of text-analysis on an indexed field. It needn't necessarily be text, but it is treated this way. Most faceting parameters are for configuring this type. The count for such faceting is grouped in the output under the name facet_fields.
  • dates: This is for faceting on dates to count matching documents by equal date ranges. The facet counts are grouped in the output under facet_dates.
  • queries: This works quite differently by counting the number of documents matching each specified query. This type is usually used for number ranges. The facet counts are grouped in the output under facet_queries.

We will describe how to do these different types of facets. But before that, there is one common parameter to enable faceting:

  • facet: It defaults to blank. In order to enable faceting, you must set this to true or on. If this is not done, then the faceting parameters will be ignored. All of the examples here set this parameter.

    Faceting text

    The following request parameters are for typical text based facets. They need not literally be text but should not be indexed with one of the number or date field types.

    • facet.field: You must set this parameter to a field name in order to text-facet on that field. Repeat this parameter for each field to be faceted on. Solr, in essence, iterates over all of the indexed terms for the field and tallies a count for the number of searched documents that have the term. Solr then puts this in the response. Lucene's index makes this much faster than you might think. See the previous Field requirements section.
    • The remaining faceting parameters can be set on a per-field basis; otherwise, they apply to all text-faceted fields that don't have a field-specific setting. For example: f.r_type.facet.sort=lex (r_type is a field name, facet.sort is a facet parameter). You will usually specify them per-field, especially if you are faceting on more than one field, so that you don't get your faceting configuration mixed up. For brevity, many of these examples don't.

    • facet.sort: It is set to either count to sort the facet values by descending totals or to lex to sort alphabetically. If facet.limit is greater than zero (which is true by default), then Solr picks count as the default, otherwise lex is chosen.
    • facet.limit: It defaults to 100. It limits the number of facet values returned in the search results of a field. As these are usually going to be displayed to the user, it doesn't make sense to have a large number of these in the response. If you are confident that the indexed terms fit a very limited vocabulary, then you might choose to disable the limit with a value of -1, which will change the default sort of them to alphabetic.
    • facet.offset: It defaults to 0. It is the index into the facet value list from which the values are returned. This enables paging of facet values when used with facet.limit. If there are lots of values and if you want the user to scan through them, then you might page them as opposed to just showing them the most popular ones.
    • facet.mincount: This defaults to 0. It filters out facet values that have facet counts less than this. This is applied before limit and offset so that paging works as expected.
    • facet.missing: It defaults to blank and is set to true or on for the facet value listing to include an unnamed count at the end, which is the number of searched documents that have no indexed terms. The first facet example demonstrates this.
    • facet.prefix: It filters the facet values to those starting with this value. See a later section for an example.
    • facet.method: Solr can be told to use either the enum or fc (field cache) algorithm to perform the faceting. The speed and memory usage of the query varies depending on your data. If you are faceting on a field that you know only has a small number of values (say less than 50), then it is advisable to explicitly set this to enum. When faceting on multiple fields, remember to set this for the specific fields desired and not universally for all facets. The request handler configuration is a good place to put this.
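
The per-field naming convention can be sketched in a few lines of Python. The helper facet_params below is hypothetical and simply assembles request parameters; the field names and option values are taken from this article, and the host and path would be whatever your Solr instance uses.

```python
from urllib.parse import urlencode


def facet_params(fields, **global_opts):
    """Build faceting parameters (hypothetical helper).

    fields maps a field name to its field-specific facet options;
    global_opts apply to every faceted field without its own setting.
    """
    params = [("facet", "true")]
    params += [("facet." + name, value) for name, value in global_opts.items()]
    for field, opts in fields.items():
        params.append(("facet.field", field))  # repeated once per field
        params += [(f"f.{field}.facet.{name}", value) for name, value in opts.items()]
    return params


query = urlencode(
    [("q", "*:*"), ("rows", "0")]
    + facet_params(
        {
            "r_type": {"sort": "lex", "method": "enum"},
            "r_official": {"missing": "true"},
        },
        mincount=1,
    )
)
print(query)
```

Keeping the options grouped by field in one structure makes it harder to apply a setting such as facet.method to the wrong field by accident.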

    Alphabetic range bucketing (A-C, D-F, and so on)

    Solr does not directly support alphabetic range bucketing (A-C, D-F, and so on). However, with a creative application of text analysis and a dedicated field, we can achieve this with little effort. Let's say we want to have these range buckets on the release names. We need to extract the first character of r_name, and store this into a field that will be used for this purpose. We'll call it r_name_facetLetter. Here is our field definition:

    <field name="r_name_facetLetter" type="bucketFirstLetter" stored="false" />

    And here is the copyField:

    <copyField source="r_name" dest="r_name_facetLetter" />

    The definition of the type bucketFirstLetter is the following:

    <fieldType name="bucketFirstLetter" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="^([a-zA-Z]).*" group="1" />
    <filter class="solr.SynonymFilterFactory" synonyms="mb_letterBuckets.txt"
    ignoreCase="true" expand="false"/>
    </analyzer>
    <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    </fieldType>

    The PatternTokenizerFactory, as configured, plucks out the first character, and the SynonymFilterFactory maps each letter of the alphabet to a range like A-C. The mapping is in conf/mb_letterBuckets.txt. The field types used for faceting generally have a KeywordTokenizerFactory for the query analysis to satisfy a possible filter query on a given facet value returned from a previous faceted search. After validating these changes with Solr's analysis admin screen, we then re-index the data. For the facet query, we're going to advise Solr to use the enum method, because there aren't many facet values in total. Here's the URL to search Solr:
    http://localhost:8983/solr/select?indent=on&q=*%3A*&qt=standard&wt=standard&facet=on&facet.field=r_name_facetLetter&facet.sort=lex&facet.missing=on&facet.method=enum

    The URL produced results containing the following facet data:

    <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
    <lst name="r_name_facetLetter">
    <int name="A-C">99005</int>
    <int name="D-F">68376</int>
    <int name="G-I">60569</int>
    <int name="J-L">49871</int>
    <int name="M-O">59006</int>
    <int name="P-R">47032</int>
    <int name="S-U">143376</int>
    <int name="V-Z">33233</int>
    <int>42622</int>
    </lst>
    </lst>
    <lst name="facet_dates"/>
    </lst>
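
The content of mb_letterBuckets.txt is not shown here, but the effect of the analysis can be imitated in Python. The sketch below assumes the bucket names seen in the facet output above and a hypothetical helper named bucket_for; it is only a model of the tokenizer and synonym mapping, not Solr code.

```python
import re

# Bucket names as they appear in the facet results above.
BUCKETS = ["A-C", "D-F", "G-I", "J-L", "M-O", "P-R", "S-U", "V-Z"]


def bucket_for(name):
    """Return the letter bucket for a release name, or None if it has none."""
    match = re.match(r"^([a-zA-Z]).*", name)  # same pattern as the tokenizer
    if match is None:
        return None  # no term is indexed; such documents fall under facet.missing
    letter = match.group(1).upper()  # the synonym mapping ignores case
    for bucket in BUCKETS:
        low, high = bucket.split("-")
        if low <= letter <= high:
            return bucket
    return None


print(bucket_for("Texas International Pop Festival 11-30-69"))
print(bucket_for("40 Jahre"))
```

Names that start with a digit or punctuation produce no term at all, which is consistent with the sizeable unnamed count at the end of the facet results.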

    Faceting dates

    Solr has built-in support for faceting a date field by a range and divided interval. You can think of this as a convenient feature instead of being forced to use the more awkward facet queries described after this. Unfortunately, this feature does not extend to numeric types yet. I'll demonstrate a quick example against MusicBrainz release dates, and then describe the parameters and their options.

    <response>
    <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">145</int>
    <lst name="params">
    <str name="facet.date">r_event_date_earliest</str>
    <str name="facet.date.end">NOW/YEAR</str>
    <str name="facet.date.gap">+1YEAR</str>
    <str name="facet.date.other">all</str>
    <str name="rows">0</str>
    <str name="facet">on</str>
    <str name="indent">on</str>
    <str name="echoParams">explicit</str>
    <str name="q">smashing</str>
    <str name="qt">mb_releases</str>
    <str name="f.r_event_date_earliest.facet.date.start">NOW/YEAR-5YEARS</str>
    </lst>
    </lst>
    <result name="response" numFound="248" start="0"/>
    <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields"/>
    <lst name="facet_dates">
    <lst name="r_event_date_earliest">
    <int name="2004-01-01T00:00:00Z">1</int>
    <int name="2005-01-01T00:00:00Z">1</int>
    <int name="2006-01-01T00:00:00Z">3</int>
    <int name="2007-01-01T00:00:00Z">11</int>
    <int name="2008-01-01T00:00:00Z">0</int>
    <str name="gap">+1YEAR</str>
    <date name="end">2009-01-01T00:00:00Z</date>
    <int name="before">95</int>
    <int name="after">0</int>
    <int name="between">16</int>
    </lst>
    </lst>
    </lst>
    </response>

    This example demonstrates a few things, not only date faceting:

    • qt=mb_releases is a dismax query type handler and ensures that we're looking at releases.
    • q=smashing indicates that we're faceting on a search instead of all the documents, granted we kept the rows at zero, which is unrealistic but not pertinent.
    • The facet start date was specified using the field specific syntax. It is just a demonstration. We'd probably do this with every parameter.
    • The end entry below the facet counts indicates the upper bound of the last date facet count. It may or may not be the same as facet.date.end (see facet.date.hardend explained in the next section).
    • The before, after, and between counts are present because facet.date.other was set to all.

    Date facet parameters

    All of the date faceting parameters start with facet.date. As with most other faceting parameters, they can be made field specific in the same way. The parameters are explained as follows:

    • facet.date: You must set this parameter to your date field's name to date-facet on that field. Repeat this parameter for each date field to be faceted on.
    • The remainder of these date faceting parameters can be specified on a per-field basis in the same fashion that the non-date parameters can. For example, f.r_event_date_earliest.facet.date.start.

    • facet.date.start: Mandatory, this is a date to specify the start of the range to facet on. The syntax is the same as used elsewhere in Solr. Using NOW with some Solr date math is quite effective, as in this example: NOW/YEAR-5YEARS, which is interpreted as five years ago, starting at the beginning of the year.
    • facet.date.end: Mandatory, this is a date to specify the end of the range exclusively. It has the same syntax as facet.date.start. Note that the actual end of the range may be different (see facet.date.hardend).
    • facet.date.gap: Mandatory, this specifies the time interval to divide the range. It uses a subset of Solr's Date Math syntax, as it's a time duration and not a particular time. It should always start with a +. Examples: +1YEAR or +1MINUTE+30SECONDS. Note that after URL encoding, + becomes %2B.
    • facet.date.hardend: It defaults to false. This parameter instructs Solr on what to do when facet.date.gap does not divide evenly into the facet date range (start->end). If this is true, then the last date span will have a smaller duration than the others. Moreover, you will observe that the end date value in the facet results is the same as facet.date.end. Otherwise, by default, the end is essentially increased sufficiently so that the date spans are all equal.
    • facet.date.other: It defaults to none. This parameter adds more faceting counts depending on its value. It can be specified multiple times. See the example using this at the start of this section.
      • before: count of documents before the faceted range
      • after: count of documents following the faceted range
      • between: documents within the faceted range (somewhat redundant)
      • none: (disabled) the default
      • all: shortcut for all three (before, between, and after)
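
The effect of facet.date.hardend is easiest to see with a gap that does not divide the range evenly. The following Python sketch models the spans for whole-year gaps only; the function year_spans is an illustration written for this article, it assumes start dates that fall on January 1 (as NOW/YEAR produces), and it is not Solr's date math implementation.

```python
from datetime import datetime


def year_spans(start, end, gap_years=1, hardend=False):
    """Return the (lower, upper) bounds of each span from start up to end."""
    spans = []
    lower = start
    while lower < end:
        upper = lower.replace(year=lower.year + gap_years)
        if hardend and upper > end:
            upper = end  # the last span is cut short to stop exactly at end
        spans.append((lower, upper))
        lower = upper
    return spans


start = datetime(2004, 1, 1)
end = datetime(2009, 1, 1)

# A two-year gap does not divide five years evenly.
for hardend in (False, True):
    last_upper = year_spans(start, end, gap_years=2, hardend=hardend)[-1][1]
    print("hardend =", hardend, "-> effective end", last_upper.date())
```

With hardend left at false, the effective end moves out so that every span has the same length; with hardend set to true, the final span is shorter and the end matches facet.date.end.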

    Faceting on arbitrary queries

    This is the final type of facet, and it offers a lot of flexibility. Instead of choosing a field to facet on its values (whether text based or date), we specify some number of Solr queries, each of which becomes a facet. For each facet query specified, the number of search results matching the query is counted, and this number is returned in the results. As with all other faceting, the set of documents that are faceted is the search result, which is the documents matching q less any removed by fq filters.

    There is only one parameter for configuring facet queries:

    • facet.query: A Solr query to be evaluated over the search results. The number of matching documents is returned as an entry in the results next to this query. Specify this multiple times to have Solr evaluate multiple facet queries.

    As facet queries are the only way to facet for numeric ranges, we'll use that as an example. In our MusicBrainz tracks index, there is a field named t_duration, which is how long the song is in seconds. In the search below, we've used echoParams for making the search parameters clear.

    <response>
    <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">106</int>
    <lst name="params">
    <str name="indent">on</str>
    <str name="rows">0</str>
    <str name="q">t_name:Geek</str>
    <arr name="facet.query">
    <str>t_duration:[* TO 119]</str>
    <str>t_duration:[120 TO 179]</str>
    <str>t_duration:[180 TO 239]</str>
    <str>t_duration:[240 TO *]</str>
    </arr>
    <str name="facet">true</str>
    </lst>
    </lst>
    <result name="response" numFound="200" start="0"/>
    <lst name="facet_counts">
    <lst name="facet_queries">
    <int name="t_duration:[* TO 119]">55</int>
    <int name="t_duration:[120 TO 179]">36</int>
    <int name="t_duration:[180 TO 239]">64</int>
    <int name="t_duration:[240 TO *]">45</int>
    </lst>
    <lst name="facet_fields"/>
    <lst name="facet_dates"/>
    </lst>
    </response>

    In this example, the facet.query parameter was specified four times to divide a range of numbers into four buckets: less than 2 minutes, 2 to < 3 minutes, 3 to < 4 minutes, and 4 minutes or more. These numbers add up to 200, which is the total number of documents. Note that the queries need not be disjoint, but they were in this example. It's certainly possible to query for dates using various range durations and to reference other fields in the facet queries too, whatever Solr query suits your needs.
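
Writing the range queries by hand invites off-by-one mistakes at the bucket edges, so it can help to generate them. Here is a short Python sketch; range_queries is a hypothetical helper, and it assumes the field holds whole numbers so that consecutive inclusive ranges can end one below the next boundary.

```python
def range_queries(field, boundaries):
    """Turn sorted integer boundaries into non-overlapping inclusive ranges."""
    edges = [None] + list(boundaries) + [None]  # None stands for an open end
    queries = []
    for low, high in zip(edges, edges[1:]):
        low_text = "*" if low is None else str(low)
        high_text = "*" if high is None else str(high - 1)
        queries.append(f"{field}:[{low_text} TO {high_text}]")
    return queries


# Boundaries in seconds: 2, 3, and 4 minutes.
for query in range_queries("t_duration", [120, 180, 240]):
    print(query)
```

Each generated string would be sent as its own facet.query parameter, which reproduces the four queries used in the example above.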


    Excluding filters

    Consider a scenario where you are implementing faceted navigation and you want to let the user pick several values of a field to filter on instead of just one. Typically, when an individual facet value is chosen, this becomes a filter that would cause any other value in that field to have a zero facet count, if it would even show up at all. In this scenario, we'd like to exclude this filter for this facet. I'll demonstrate this with a before and after comparison.

    Here is a search for releases containing smashing, faceting on r_type. We'll leave rows at 0 for brevity, but observe the numFound value nonetheless. At this point, the user has not chosen a filter (therefore no fq).
    http://localhost:8983/solr/select?indent=on&qt=mb_releases&rows=0&q=smashing&facet=on&facet.field=r_type&facet.mincount=1&facet.sort=lex

    And the output of the previous URL is:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">24</int>
    </lst>
    <result name="response" numFound="248" start="0"/>
    <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
    <lst name="r_type">
    <int name="Album">29</int>
    <int name="Compilation">41</int>
    <int name="EP">7</int>
    <int name="Interview">3</int>
    <int name="Live">95</int>
    <int name="Other">19</int>
    <int name="Remix">1</int>
    <int name="Single">45</int>
    <int name="Soundtrack">1</int>
    </lst>
    </lst>
    <lst name="facet_dates"/>
    </lst>
    </response>

    Now the user chooses the Album facet value that interests him/her. This adds a filter query. As a result, now the URL is as before but has &fq=r_type%3AAlbum at the end and has this output:

    <response>
    <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">17</int>
    </lst>
    <result name="response" numFound="29" start="0"/>
    <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
    <lst name="r_type">
    <int name="Album">29</int>
    </lst>
    </lst>
    <lst name="facet_dates"/>
    </lst>
    </response>

    Notice that the other r_type facet counts are gone because of the filter, yet we want them so that we can give the user a choice for expanding the filter. The reduced numFound of 29 is correct, though, because the user has indeed filtered on a value.

    The solution: Local Params

    Solr can solve this problem with some additional metadata on both the filter query and the facet field reference using a new and obscure Solr feature called Local Params. Local Params are name-value parameters inserted at the start of a query and in some other places like facet field references. The previous example would change as follows:

    • fq would now be {!tag=foo}r_type:Album
    • facet.field would now be {!ex=foo}r_type
    • Remember to URL Encode this added syntax when used in the URL. The only problem character is =, which becomes %3D.

    Explanation:

    • tag is a local parameter to arbitrarily label a parameter.
    • The name foo was an arbitrarily chosen tag name; it truly doesn't matter what it's named. If multiple fields and filter queries are to be tagged correspondingly, then you could use the field name as the tag name to differentiate them consistently.
    • ex is a local parameter on a facet field that refers to tagged filter queries to be excluded in the facet count. Multiple tags can be referenced by commas separating them. For example: {!ex=t1,t2,t3}r_type.

    The new complete URL is:

    http://localhost:8983/solr/select?indent=on&qt=mb_releases&rows=0&q=smashing&facet=on&facet.field={!ex%3Dfoo}r_type&facet.mincount=1&facet.sort=lex&fq={!tag%3Dfoo}r_type%3AAlbum

    And here is the output. The facet counts are back, but numFound remains at the filtered 29:

    <response>
    <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    </lst>
    <result name="response" numFound="29" start="0"/>
    <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
    <lst name="r_type">
    <int name="Album">29</int>
    <int name="Compilation">41</int>
    <int name="EP">7</int>
    <int name="Interview">3</int>
    <int name="Live">95</int>
    <int name="Other">19</int>
    <int name="Remix">1</int>
    <int name="Single">45</int>
    <int name="Soundtrack">1</int>
    </lst>
    </lst>
    <lst name="facet_dates"/>
    </lst>
    </response>

    At this point, if the user chooses additional values from this facet, then the filter query can be modified to allow for more possibilities, such as: fq={!tag%3Dfoo}r_type%3AAlbum+r_type%3AOther, which filters for releases that are either of type Album or Other.
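
A user interface would normally build these tagged parameters from whatever values the user has ticked. The Python sketch below shows one way to do that; multi_select_params is a hypothetical helper, it uses the field name as the tag name as suggested above, and it relies on the default query operator being OR, as in the example. Values containing spaces would need quoting, which this sketch does not attempt.

```python
from urllib.parse import urlencode


def multi_select_params(field, selected):
    """Build facet and filter parameters for a multi-select facet."""
    # Exclude filters tagged with the field name when counting this facet.
    params = [("facet", "on"), ("facet.field", f"{{!ex={field}}}{field}")]
    if selected:
        clause = " ".join(f"{field}:{value}" for value in selected)
        params.append(("fq", f"{{!tag={field}}}{clause}"))
    return params


params = multi_select_params("r_type", ["Album", "Other"])
print(urlencode(params))
```

Note that urlencode escapes the braces and exclamation mark as well as the equals sign, which is stricter than the hand-written URLs above but is decoded to the same parameter values.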

    Facet prefixing (term suggest)

    When one thinks of faceting, one doesn't think of term-suggest, aka auto-complete. Within Solr, however, the faceting technology is suited for this purpose too.

    For this example, we have a text box containing:

    smashing pu

    All of the words in the user's text box except the last one become the main query for the term-suggest. We may want to make it a phrase query. For our example, this is just smashing. If there isn't anything, then we'd want to ensure that the search handler used would query for all documents. The faceted field is r_name, and we want to sort by occurrence. We also want there to be at least one occurrence, and we probably don't want more than ten values. We don't need the actual search results either. This leaves the facet.prefix faceting parameter to make this work. This parameter filters the facet values to those starting with this value.

    Remember that facet values are the final result of text analysis, and therefore are probably lowercased for fields you might want to do term completion on. You'll need to pre-process the prefix value similarly, or else nothing will be found.

    We're obviously going to set this to pu, the last word that the user has partially typed. Here is a URL for such a search:

    http://localhost:8983/solr/select?q=smashing&qt=mb_releases&wt=json&indent=on&facet=on&rows=0&facet.limit=10&facet.mincount=1&facet.field=r_name&facet.prefix=pu

    In this example, we're going to use the JSON output format. Here is the result:

    {
    "responseHeader":{
    "status":0,
    "QTime":5},
    "response":{"numFound":248,"start":0,"docs":[]
    },
    "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
    "r_name":[
    "pumpkins",10,
    "pumpkin",2,
    "pure",2,
    "pumpehuset",1,
    "punk",1]},
    "facet_dates":{}}}


    This is exactly the information needed to fill up a pop-up box of choices that the user can conveniently choose.

    However, there are some issues to be aware of with this feature:

    • You may want to retain the case information of what the user is typing so that it can then be re-applied to the Solr results. Remember that facet.prefix will probably need to be lowercased depending on text analysis.
    • If stemming text analysis is performed on the field at the time of indexing, then the user is going to get junk, because the facet values will be the stemmed terms rather than the words as originally written. Either don't do stemming or use an additional field with text analysis suitable for this feature.
    • If you would like to do term completion across multiple fields, then you'll be disappointed that you can't do that directly. The easiest way is to combine several fields into one at index time. Alternatively, you can run a query that searches multiple fields, with faceting configured for each of them. It would then be up to you to merge the faceting results based on ordered counts.
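
    For the last point, merging could be sketched as follows. This assumes the client has already parsed the facet_fields section of a response that faceted on several fields; t_name is a hypothetical second field used only for illustration, and summing the counts per term is just one possible policy (a document that contains the term in both fields is counted twice).

```python
# Hypothetical helper: merges per-field facet lists into one ranked list.
def merge_suggestions(facet_fields, limit=10):
    totals = {}
    for flat in facet_fields.values():
        # Each field's list alternates value, count, value, count, ...
        for term, count in zip(flat[0::2], flat[1::2]):
            # Assumption: counts for the same term are simply added together.
            totals[term] = totals.get(term, 0) + count
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(totals.items(), key=lambda tc: (-tc[1], tc[0]))
    return ranked[:limit]


# Illustrative data only; "t_name" and its counts are made up.
facet_fields = {
    "r_name": ["pumpkins", 10, "pure", 2],
    "t_name": ["pure", 5, "punk", 1],
}
print(merge_suggestions(facet_fields))
```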

    Summary

    We covered a lot on faceting in Solr. Beyond searching itself, faceting is possibly the most valuable and popular search component. To summarize, we covered field requirements, types of faceting, faceting text, and alphabetic range bucketing. We also learned about faceting on arbitrary queries, excluding filters, and faceting dates, including the date facet parameters.


    About the Author

    David Smiley

    Born to code, David Smiley is a software engineer who is passionate about search, Lucene, spatial, and open source. He has a great deal of expertise with Lucene and Solr, which started in 2008 at MITRE. In 2009, as the lead author, he wrote Solr 1.4 Enterprise Search Server, the first book on Solr, published by Packt. It was updated in 2011, and again for this third edition. After the first book, he developed one- and two-day Solr training courses, which he delivered half a dozen times within MITRE, and he delivered LucidWorks’ training once too. Most of his excitement and energy relating to Lucene is centered on Lucene’s spatial module, including Spatial4j, for which he is largely responsible. He has presented his progress on this at Lucene Revolution and other conferences several times. Finally, he currently holds committer / Project Management Committee (PMC) status with the Lucene/Solr open-source project. During all this time, David has staked his career on search, working exclusively on such projects, formerly for MITRE, and now as an independent consultant for various clients. You can reach him at dsmiley@apache.org.


