Integrating Solr: Ruby on Rails Integration

by David Smiley and Eric Pugh | September 2010 | Open Source

There has been a lot of churn in the Ruby on Rails world for adding Solr support, with a number of competing libraries and approaches attempting to add Solr support in the most Rails-native way. Rails brought to the forefront the idea of Convention over Configuration. In most traditional web development software, from ColdFusion to Java EE to .NET, the framework developers went with the approach that their framework should solve any type of problem and work with any kind of data model. This led to these frameworks requiring massive amounts of configuration, typically by hand. It wasn't unusual to see that adding a column to a user record would require modifying the database, a data access object, a business object, and the web tier. Four changes in four different files to add a new field! While there were many attempts to streamline this, from annotations to tooling like IDEs and XDoclet, all of them were band-aids over the fundamental problem of too much configurability.

The Rails sweet spot for development is exposing an SQL database to the web. Add a column to the database and it is now part of your object relational model with no additional coding. The various libraries for integrating Solr in Ruby on Rails applications attempt to follow this idea of Convention over Configuration in how they interact with Solr. However, there are often a lot of mysterious rules (conventions!) to learn, such as suffixing String schema fields with _s when developing the Solr schema.

In this article by David Smiley and Eric Pugh, authors of the book Solr 1.4 Enterprise Search Server, we will look at accessing Solr results from Ruby using the available client libraries.


The classic plugin for Rails is acts_as_solr, which allows Rails ActiveRecord objects to be transparently stored in a Solr index. Other popular options include Solr Flare and rsolr. An interesting project is Blacklight, a tool oriented towards libraries putting their catalogs online. While it attempts to meet the needs of a specific market, it also contains many examples of great Ruby techniques to leverage in your own projects.

You will need to turn on the Ruby writer type in solrconfig.xml:

<queryResponseWriter name="ruby"
    class="org.apache.solr.request.RubyResponseWriter"/>

The Ruby hash structure has some tweaks to fit Ruby, such as translating nulls to nils, using single quotes for escaping content, and the Ruby => operator to separate key-value pairs in maps. Adding a wt=ruby parameter to a standard search request returns results in a Ruby hash structure like this:

{
  'responseHeader'=>{
    'status'=>0,
    'QTime'=>1,
    'params'=>{
      'wt'=>'ruby',
      'indent'=>'on',
      'rows'=>'1',
      'start'=>'0',
      'q'=>'Pete Moutso'}},
  'response'=>{'numFound'=>523,'start'=>0,'docs'=>[
    {
      'a_name'=>'Pete Moutso',
      'a_type'=>'1',
      'id'=>'Artist:371203',
      'type'=>'Artist'}]
  }}
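
Because the wt=ruby output is valid Ruby literal syntax, you can consume it directly with eval. The following is a minimal sketch (not from the book) of fetching the response above; the core URL and query are assumptions for illustration, and you should only eval content from a Solr server you trust:

require 'net/http'
require 'uri'
require 'cgi'

solr_url = 'http://localhost:8983/solr/mbartists/select'
params = "?q=#{CGI.escape('Pete Moutso')}&wt=ruby&rows=1"
body = Net::HTTP.get(URI.parse(solr_url + params))

response = eval(body) # the wt=ruby body is a Ruby hash literal
puts response['response']['numFound']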

acts_as_solr

A very common naming pattern for plugins in Rails that manipulate the database backed object model is to name them acts_as_X. For example, the very popular acts_as_list plugin for Rails allows you to add list semantics, like first, last, move_next to an unordered collection of items. In the same manner, acts_as_solr takes ActiveRecord model objects and transparently indexes them in Solr. This allows you to do fuzzy queries that are backed by Solr searches, but still work with your normal ActiveRecord objects. Let's go ahead and build a small Rails application that we'll call MyFaves that both allows you to store your favorite MusicBrainz artists in a relational model and allows you to search for them using Solr.

acts_as_solr comes bundled with a full copy of Solr 1.3 as part of the plugin, which you can easily start by running rake solr:start. Typically, you are starting with a relational database already stuffed with content that you want to make searchable. However, in our case we already have a fully populated index available in /examples, and we are actually going to take the basic artist information out of the mbartists index of Solr and populate our local myfaves database with it. We'll then fire up the version of Solr shipped with acts_as_solr, and see how acts_as_solr manages the lifecycle of ActiveRecord objects to keep Solr's indexed content in sync with the content stored in the relational database. Don't worry, we'll take it step by step! The completed application is in /examples/8/myfaves for you to refer to.

Setting up MyFaves project

We'll start with the standard plumbing to get a Rails application set up with our basic data model:

>>rails myfaves
>>cd myfaves
>>./script/generate scaffold artist name:string group_type:string release_date:datetime image_url:string
>>rake db:migrate

This generates a basic application backed by an SQLite database. Now we need to install the acts_as_solr plugin.

acts_as_solr has gone through a number of revisions, from the original code base done by Erik Hatcher and posted to the solr-user mailing list in August of 2006, which was then extended by Thiago Jackiw and hosted on RubyForge. Today the best version of acts_as_solr is hosted on GitHub by Mathias Meyer at http://github.com/mattmatt/acts_as_solr/tree/master. The constant migration from one site to another, leading to multiple possible 'best' versions of a plugin, is unfortunately a very common problem with Rails plugins and projects, though most are settling on either RubyForge.org or GitHub.com.

In order to install the plugin, run:

>>script/plugin install git://github.com/mattmatt/acts_as_solr.git

We'll also be working with roughly 399,000 artists, so we'll need pagination to manage that list; otherwise, pulling up the artists index listing page will time out:

>>script/plugin install git://github.com/mislav/will_paginate.git

Edit the ./app/controllers/artists_controller.rb file, and replace in the index method the call to @artists = Artist.find(:all) with:

@artists = Artist.paginate :page => params[:page],
  :order => 'created_at DESC'

Also add to ./app/views/artists/index.html.erb a call to the view helper to generate the page links:

<%= will_paginate @artists %>

Start the application using ./script/server, and visit the page http://localhost:3000/artists/. You should see an empty listing page for all of the artists. Now that we know the basics are working, let's go ahead and actually leverage Solr.

Populating MyFaves relational database from Solr

Step one will be to import data into our relational database from the mbartists Solr index. Add the following code to ./app/models/artist.rb:

class Artist < ActiveRecord::Base
acts_as_solr :fields => [:name, :group_type, :release_date]
end

The :fields array maps the attributes of the Artist ActiveRecord object to the artist fields in Solr's schema.xml. Because acts_as_solr is designed to store data in Solr that is mastered in your data model, it needs a way of distinguishing among various types of data model objects. For example, if we wanted to store information about our User model object in Solr in addition to the Artist object, then we would need a type field to separate the Solr documents for the artist with the primary key of 5 from the user with the primary key of 5. Fortunately, the mbartists schema has a field named type that stores the value Artist, which maps directly to our ActiveRecord class name of Artist, so we are able to use that instead of acts_as_solr's default type field in Solr, named type_s.

There is a simple script called populate.rb at the root of /examples/8/myfaves that copies the artist data from the existing Solr mbartists index into the MyFaves database:

>>ruby populate.rb

populate.rb is a great example of the types of scripts you may need to develop to transfer data into and out of Solr. Most scripts work with some sort of batch of records that are pulled from one system and then inserted into Solr. The larger the batch size, the more efficient the pulling and processing of data typically is, at the cost of more memory being consumed and slower commit and optimize operations. When you run the populate.rb script, play with the batch size parameter to get a sense of resource consumption in your environment. Try a batch size of 10 versus 10000 to see the difference. The parameters for populate.rb are available at the top of the script:

MBARTISTS_SOLR_URL = 'http://localhost:8983/solr/mbartists'
BATCH_SIZE = 1500
MAX_RECORDS = 100000 # the maximum number of records to load, or nil for all

There are roughly 399,000 artists in the mbartists index, so if you are impatient, then you can set MAX_RECORDS to a more reasonable number.

The process for connecting to Solr is very simple with a hash of parameters that are passed as part of the GET request. We use the magic query value of *:* to find all of the artists in the index and then iterate through the results using the start parameter:

connection = Solr::Connection.new(MBARTISTS_SOLR_URL)
solr_data = connection.send(Solr::Request::Standard.new({
  :query => '*:*',
  :rows => BATCH_SIZE,
  :start => offset,
  :field_list => ['*', 'score']
}))
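
The request above fetches a single batch; a sketch of the surrounding paging loop (assumed, not verbatim from populate.rb) advances offset by BATCH_SIZE until the results run out or MAX_RECORDS is reached:

offset = 0
loop do
  solr_data = connection.send(Solr::Request::Standard.new({
    :query => '*:*',
    :rows => BATCH_SIZE,
    :start => offset
  }))
  break if solr_data.nil? || solr_data.hits.empty?
  # ... create Artist records from solr_data.hits, as shown next ...
  offset += BATCH_SIZE
  break if MAX_RECORDS && offset >= MAX_RECORDS
end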

In order to create our new Artist model objects, we just iterate through the results of solr_data. If solr_data is nil, then we exit out of the script, knowing that we've run out of results. However, we do have to do some parsing and translation in order to preserve our unique identifiers between Solr and the database. In our MusicBrainz Solr schema, the id field functions as the primary key and looks like Artist:11650 for The Smashing Pumpkins. In the database, in order to sync the two, we need to insert the Artist with the ID of 11650. We wrap the insert statement a.save! in a begin/rescue/end structure so that if we've already inserted an artist with that primary key, the script continues. This allows us to run the populate script multiple times:

solr_data.hits.each do |doc|
  id = doc["id"]
  id = id[7..(id.length)] # strip the "Artist:" prefix
  a = Artist.new(:name => doc["a_name"], :group_type => doc["a_type"],
                 :release_date => doc["a_release_date_latest"])
  a.id = id
  begin
    a.save!
  rescue ActiveRecord::StatementInvalid => ar_si
    raise ar_si unless ar_si.to_s.include?("PRIMARY KEY must be unique") # skip duplicates
  end
end

Now that we've transferred the data out of our mbartists index and used acts_as_solr according to the various conventions that it expects, we'll change from using the mbartists Solr instance to the version of Solr shipped with acts_as_solr.

Solr related configuration information is available in ./myfaves/config/solr.yml. Ensure that the default development URL doesn't conflict with any Solr instances you may already be running:

development:
  url: http://127.0.0.1:8982/solr

Start the included Solr by running rake solr:start. When it starts up, it will report the process ID for Solr running in the background. If you need to stop the process, then run the corresponding rake task: rake solr:stop. The empty new Solr indexes are stored in ./myfaves/solr/development.

Build Solr indexes from relational database

Now we are ready to trigger a full index of the data in the relational database into Solr. acts_as_solr provides a very convenient rake task for this with a variety of parameters that you can learn about by running rake -D solr:reindex. We'll specify to work with a batch size of 1500 artists at a time:

>>rake solr:start
>>rake solr:reindex BATCH=1500
(in /examples/8/myfaves)
Clearing index for Artist...
Rebuilding index for Artist...
Optimizing...

This drastic simplification of configuration in the Artist model object is because we are using a Solr schema that is designed to leverage the Convention over Configuration ideas of Rails. Some of the conventions that are established by acts_as_solr and met by Solr are:

  • The primary key field for a model object in Solr is always called pk_i.
  • The type field that stores the disambiguating class name of the model object is called type_s.
  • Heavy use of the dynamic field support in Solr. The data type of ActiveRecord model objects is based on the database column type. Therefore, when acts_as_solr indexes a model object, it sends a document to Solr with the various suffixes to leverage the dynamic column creation. In /examples/8/myfaves/vendor/plugins/acts_as_solr/solr/solr/conf/schema.xml, the only fields defined outside of the management fields are dynamic fields:

    <dynamicField name="*_t" type="text" indexed="true" stored="false"/>

  • The default search field is called text, and all of the fields ending in _t are copied into the text search field.
  • Fields to facet on end in _facet and are copied into the text search field as well.

The document that gets sent to Solr for our Artist records creates the dynamic fields name_t, group_type_s and release_date_d, for a text, string, and date field respectively. You can see the list of dynamic fields generated through the schema browser at http://localhost:8982/solr/admin/schema.jsp.
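
As an illustration only (not captured from a live session), the update message acts_as_solr sends for one of our Artist records would look something like the following, with the field values hypothetical:

<add>
  <doc>
    <field name="id">Artist:364</field>
    <field name="pk_i">364</field>
    <field name="type_s">Artist</field>
    <field name="name_t">Smash Mouth</field>
    <field name="group_type_s">1</field>
    <field name="release_date_d">2006-09-19T04:00:00Z</field>
  </doc>
</add>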

Now we are ready to perform some searches. acts_as_solr adds some new methods, such as find_by_solr(), that let us find ActiveRecord model objects by sending a query to Solr. Here we find the group Smash Mouth by searching for matches to the word smashing:

% ./script/console
Loading development environment (Rails 2.3.2)
>> artists = Artist.find_by_solr("smashing")
=> #<ActsAsSolr::SearchResults:0x224889c @solr_data={:total=>9,
:docs=>[#<Artist id: 364, name: "Smash Mouth"...
>> artists.docs.first
=> #<Artist id: 364, name: "Smash Mouth", group_type: 1,
release_date: "2006-09-19 04:00:00", created_at: "2009-04-17
18:02:37", updated_at: "2009-04-17 18:02:37">

Let's also verify that acts_as_solr is managing the full lifecycle of our objects. Assuming Susan Boyle isn't yet entered as an artist, let's go ahead and create her:

>> Artist.find_by_solr("Susan Boyle")
=> #<ActsAsSolr::SearchResults:0x26ee298 @solr_data={:total=>0,
:docs=>[]}>
>> susan = Artist.create(:name => "Susan Boyle", :group_type => 1,
:release_date => Date.new)
=> #<Artist id: 548200, name: "Susan Boyle", group_type: 1,
release_date: "-4712-01-01 05:00:00", created_at: "2009-04-21
13:11:09", updated_at: "2009-04-21 13:11:09">

Check the log output from your Solr running on port 8982, and you should also have seen an update query triggered by the insert of the new Susan Boyle record:

INFO: [] webapp=/solr path=/update params={} status=0 QTime=24

Now, if we delete Susan's record from our database:

>> susan.destroy
=> #<Artist id: 548200, name: "Susan Boyle", group_type: 1,
release_date: "-4712-01-01 05:00:00", created_at: "2009-04-21
13:11:09", updated_at: "2009-04-21 13:11:09">

Then there should be another corresponding update issued to Solr to remove the document:

INFO: [] webapp=/solr path=/update params={} status=0 QTime=57

You can verify this by doing a search for Susan Boyle directly, which should return no rows at http://localhost:8982/solr/select/?q=Susan+Boyle.


Complete MyFaves web site

Now, let's go ahead and put in the rest of the logic for using our Solr-ized model objects to simplify finding our favorite artists. We'll store the list of favorite artists in the browser's session space for convenience. If you are following along with your own generated version of MyFaves application, then the remaining files you'll want to copy over from /examples/8/myfaves are as follows:

  • ./app/controller/myfaves_controller.rb contains the controller logic for picking your favorite artists.
  • ./app/views/myfaves/ contains the display files for picking and showing the artists.
  • ./app/views/layouts/myfaves.html.erb is the layout for the MyFaves views. We use the Autocomplete widget again, so this layout embeds the appropriate JavaScript and CSS files.
  • ./public/javascripts/blackbirdjs/ contains everything required to use the Blackbird logging library.
  • ./public/stylesheets/jquery.autocomplete.css and ./public/stylesheets/indicator.gif are stored locally in order to fix pathing issues with the indicator.gif showing up when the autocompletion search is running.

The only other edits you should need to make are:

  • Edit ./config/routes.rb by adding map.resources :myfaves and map.root :controller => "myfaves".
  • Delete ./public/index.html to use the new root route.
  • Copy the index method out of ./app/controllers/artists_controller.rb, because we want the index method to respond with both HTML and JSON response types.
  • Run rake db:sessions:create to generate a sessions table, then rake db:migrate to update the database with the new sessions table. Edit ./config/environment.rb and add config.action_controller.session_store = :active_record_store. As we are storing Artist model objects in our session, we need to store them in the database rather than in a cookie, for space reasons.

You should now be able to run ./script/server and browse to http://localhost:3000/myfaves. You will be prompted to enter an artist's name to search for. If you don't receive any results, then make sure you have started Solr using rake solr:start. Also, if you have only loaded a subset of the full 399,000 artists, then your choices may be limited. You can load all of the artists through the populate.rb script and then run rake solr:reindex, but it will take a long time. It's something good to kick off just before you head out for lunch or home for the evening!

If you look at ./app/views/myfaves/index.rhtml, then you can see the jQuery autocomplete call is a bit different:

$("#artist_name").autocomplete( '/artists.json?callback=?', {

The URL we are hitting is /artists.json, with the .json suffix telling Rails that we want JSON data back instead of normal HTML. If we ended the URL with .xml, then we would have received XML formatted data about the artists. We provide a slightly different parameter to Rails to specify the JSONP callback to use. Unlike the previous example, where we used json.wrf, which is Solr's parameter name for the callback method to call, we use the more standard parameter name callback. We changed the ArtistController index method to handle the autocomplete widget's data needs through JSONP. If there is a q parameter, then we know the request was from the autocomplete widget, and we ask Solr for the @artists to respond with. Later on, we render @artists into JSON objects, returning only the name and id attributes to keep the payload small. We also specify that the JSONP callback method is what was passed using the callback parameter:

def index
  if params[:q]
    @artists = Artist.find_by_solr(params[:q],
      :limit => params[:limit]).docs
  else
    @artists = Artist.paginate :page => params[:page],
      :order => 'created_at DESC'
  end
  respond_to do |format|
    format.html # index.html.erb
    format.xml { render :xml => @artists }
    format.json { render :json => @artists.to_json(:only => [:name, :id]),
      :callback => params[:callback] }
  end
end
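
To make the JSONP round trip concrete, a request such as /artists.json?q=smash&callback=handleResults would return a response body along these lines (the callback name and data here are illustrative, not captured output):

handleResults([{"name": "Smash Mouth", "id": 364}])

jQuery evaluates this response, invoking the generated callback with the artist data.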

At the end of all of this, you should have a nice interface for quickly picking artists.

When you are selecting acts_as_solr as your integration method, you are implicitly agreeing to the various conventions established for indexing data into Solr. acts_as_solr is a wonderful solution if you are indexing just a few unrelated models and don't have multiple data sources feeding your Solr indexes. While acts_as_solr has evolved to support more complex solutions (for example, by adding faceting support or the ability to perform more complex mappings with custom logic), it has its limits.

If you have a very complex data model with lots of inter-relationships that do not more or less map one-to-one with what you'd expect from search results, then you may find yourself running into edge cases that acts_as_solr doesn't support cleanly—especially if you are doing searches against specific fields in Solr versus the default text field. However, if your requirement is to quickly get your ActiveRecord model objects searchable, then acts_as_solr can't be beat!

Blacklight OPAC

Blacklight is an open source Online Public Access Catalog (OPAC) that demonstrates the power of a highly configurable Ruby on Rails frontend paired with Solr. OPACs are the modern, web-enabled version of the classic card catalog and allow libraries to easily put their collections online. Blacklight supports parsing of various standard library catalog storage formats, including MARC records and the TEI XML format. Blacklight 2.0 was released in March of 2009 as a Rails Engine plugin. Rails Engine plugins allow users to integrate the rich functionality of a plugin while keeping the plugin's code and assets, such as JavaScript, CSS, and images, separate from the hosting application, thus facilitating upgrades to the Blacklight Engine. You may find that Blacklight provides an excellent starting point for your own Solr/Ruby on Rails development.

Let's go ahead and index information from MusicBrainz.org into Blacklight, just to see how easy it is. Please refer to the sample application in /examples/8/blacklightopac/blacklight/. The Blacklight project releases frequent updates, so you should refer to the main web site at http://www.blacklightopac.org/.

Almost all of the dependencies are included in the blacklight sample application. You will need to install a couple of gems:

>>sudo gem install curb
>>sudo gem install bcrypt-ruby

Indexing MusicBrainz data

Blacklight builds on top of the rsolr library for communicating back and forth with the Solr server and adds some concepts around mapping data into Solr. Unlike acts_as_solr, Blacklight doesn't require the source data to be in a database. Instead you build a custom Mapper to fetch the data for Blacklight.

Blacklight requires some synchronization between the Solr and Ruby on Rails sides to make things work. Blacklight expects a search handler called search to be configured, while specifying which schema fields are facets and which are just straight fields of data to be returned. We are going to index various artists and their music releases from the MusicBrainz.org site, while creating facets for the languages, scripts, and types of releases. For example, The Dave Matthews Band's album Under the Table and Dreaming is in English, using the standard Latin script for the album notes, and is an Album. We are going to be indexing many artists from non-Western countries. Fortunately, Solr and Blacklight support alternative character sets such as Cyrillic, Kanji, and Chinese characters. You can see that we are still using conventions for how we name the schema fields, with _t signifying text, and _facet signifying a field for faceting on:

<requestHandler name="search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="fl">id, format_code_t, language_facet, script_facet,
      type_facet, releases_t, title_t, score</str>
    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">10</str>
    <str name="facet.field">language_facet</str>
    <str name="facet.field">script_facet</str>
    <str name="facet.field">type_facet</str>
  </lst>
</requestHandler>

We also need to tell Blacklight, through ./config/solr.yml, which facets and fields to display in the UI. We are using the field title_t to store the artist's name:

facet_fields:
  - type_facet
  - language_facet
  - script_facet
index_view_fields:
  - title_t
  - language_facet
  - script_facet
  - type_facet
  - releases_t

One of the nice features about Blacklight is that it provides an architectural pattern for mapping information from any data source into Solr that you can mimic for your own use. We added ./lib/tasks/brainz.rake to give us the ability to load the information from MusicBrainz by running a simple Rake task: rake app:index:brainz. The core of the task instantiates a BrainzMapper class (that we developed) that provides a collection of documents related to Artists and their music Releases for Solr to index. In order to reduce memory usage, we index artists alphabetically, committing the results to Solr periodically:

solr = Blacklight.solr
mapper = BrainzMapper.new
('A'..'Z').each do |char|
  mapper.from_brainz("#{char}*") do |doc, index|
    puts "#{index} -- adding doc w/id : #{doc[:id]} to Solr"
    solr.add(doc)
  end
  puts "Sending commit to Solr..."
  solr.commit
end
puts "Complete."

The real magic of the Blacklight mapper pattern is in the BrainzMapper class in ./lib/brainz_mapper.rb. While the class may look a little hairy, it is actually quite simple. The pattern is defined by the base class BlockMapper. BlockMapper expects us to define a series of map methods for each field that we want to store in Solr. For example, to store the artist's name in the previously mentioned title_t field, we define it this way:

map :title_t do |rec, index|
  rec[:artist].name
end

This says that to map the :title_t field, we are handed our record object and the index of that record in our overall collection of records to be stored in Solr. In our case, we have populated the record object as a hash with two keys, :artist and :releases, whose values are an artist and their releases. In the :title_t mapping case, we ask the record hash for the artist object and call its .name method.

How about a slightly more complex example, mapping all of the releases for an artist:

map :releases_t do |rec, index|
  rec[:releases].collect { |release| release.entity.title }.compact.uniq
end

In this case, when we map the releases_t field, we obtain the releases object, which is an array of MusicBrainz::Model::Release objects. From each one we get the title of the release. The resulting array is compacted to remove any nil objects, and then only unique release titles are returned, as sometimes we have multiple releases listed with the same name. Blacklight properly handles storing a single value or an array of values in the releases_t field, as any field ending in _t is specified as multiValued="true" in schema.xml.
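
That multivalued behavior comes from the schema; a sketch of the kind of dynamic field declaration involved (the exact attributes in Blacklight's shipped schema.xml may differ) is:

<dynamicField name="*_t" type="text" indexed="true" stored="true"
    multiValued="true"/>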

Very similar logic is used for mapping our facets as well. In this case, we are using the MusicBrainz::Utils.get_language_name method to translate from three letter language codes like "ENG" to "English" in order to have a prettier display in our facets:

map :language_facet do |rec, index|
  rec[:releases].collect { |release|
    MusicBrainz::Utils.get_language_name(release.entity.text_language)
  }.compact.uniq
end

Okay, we've seen the mapping logic, but where does the data come from? How are we populating the individual record hash object with :artist and :releases values? Web services to the rescue! MusicBrainz has an XML based web service that follows the REST design pattern that you can learn more about at http://musicbrainz.org/doc/XMLWebService. Even by using the web service directly, you still need to parse and manipulate XML documents. Fortunately, there is the very nice rbrainz Ruby gem available from http://rbrainz.rubyforge.org/ that abstracts away all of the plumbing for communicating with MusicBrainz through XML. Instead, we work with higher level abstractions like Query and Artist objects. In the query below, we are asking for all of the artists similar to Dave Matthews Band, returning records 50 through 100.

require 'rbrainz'
query = MusicBrainz::Webservice::Query.new
results = query.get_artists({:name => 'Dave Matthews Band',
                             :limit => 50, :offset => 50})

MusicBrainz uses Lucene for its search engine, and it permits you to use Lucene's syntax in your queries. So, to find every band except the Dave Matthews Band we would execute:

results = query.get_artists({:name => 'Dave Matthews NOT Band'})

The method create_records_from_music_brainz(query_string) in ./lib/brainz_mapper.rb returns a collection of record hashes containing artist and release data downloaded from MusicBrainz through rbrainz.

In order to run Blacklight, first start the included Solr in ./examples/8/blacklightopac/blacklight/jetty:

>>java -jar start.jar

Then, run the indexing process in ./examples/8/blacklightopac/blacklight/rails, which downloads artists alphabetically from A to Z:

>>rake app:index:brainz

Indexing is very slow due to all of the HTTP requests being made to the MusicBrainz web site. Artists are downloaded in batches of 100, with up to 1000 artists per letter, and then each artist requires a separate HTTP request to find their music releases. So indexing a thousand artists for the letter P requires roughly 1010 HTTP queries ((1000 / 100) + 1000). Additionally, you'll notice that a query parameter using just a single alphabetical character, such as D*, leads to somewhat odd matches. Records are only indexed into Solr once all of the artist/release data for a letter is downloaded, so you need to wait for a complete letter to finish. However, soon you will have thousands of artists and their releases in Solr to browse through.

Customizing display

The user interface for Blacklight is fairly clean but pretty bland and displays every type of information the same way. However, based on the format_code_t field, you can easily customize the display. If you are indexing records with different types, such as Artists, Record Labels, and so on, then you can have a different display by populating format_code_t differently. We've chosen to just index Artists in this example, and defined :format_code_t to be brainz. As every record indexed uses the same value, we populate the shared_field_data parameter when calling the from_brainz method of the mapper:

mapper.from_brainz("#{char}*",
    {:format_code_t => 'brainz'}) do |doc, index|

Inside the mapper, from_brainz maps each entry of that hash to a Solr field:

def from_brainz(query_string, shared_field_data={}, &blk)
  shared_field_data.each_pair do |k, v|
    # map each item in the hash to a solr field
    map k.to_sym, v
  end

Any values put into the shared_field_data hash will be set on every document indexed. A common use case for the shared_field_data hash is to set an :indexed_by_s property that specifies the name of the user who invoked the indexing process.
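
For instance, a hypothetical indexing run stamping both of those fields on every document might look like this (the :indexed_by_s value is an assumption for illustration):

mapper.from_brainz("#{char}*",
    {:format_code_t => 'brainz',
     :indexed_by_s => ENV['USER']}) do |doc, index|
  solr.add(doc)
end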

There are two ways of customizing the display of fields. One of them is the previously mentioned ./config/solr.yml, which allows us to filter the list of fields to display on the index page and the details page. However, that is a one-size-fits-all solution and still doesn't let you tweak the actual user interface depending on the data to display. There is another option that leverages the dynamic pathing of Rails to specify that view files should first be loaded from ./app/views, and if not found there, loaded from the Blacklight plugin. For example, we created a custom partial, to be rendered for the detailed view of an artist, that incorporates the MusicBrainz logo and some photos of the artist. By placing the partial in ./app/views/catalog/_show_partials/_brainz.html.erb, the name of the partial is mapped directly to the format_code_t value of brainz. So, if you indexed multiple entities, then ./app/views/catalog/_show_partials/_artists.html.erb and ./app/views/catalog/_show_partials/_releases.html.erb would map onto format_code_t values of artists and releases respectively.

Sometimes, you don't want to override Blacklight's UI. For example, we don't have a custom display partial when displaying listings for a search. Blacklight checks for the existence of ./app/views/catalog/_index_partials/_brainz.html.erb. If it doesn't find that file, then it defaults to the _default.html.erb partial stored in ./vendor/plugins/blacklight/app/views/catalog/_index_partials/_default.html.erb. This makes it very easy to override the default behaviors of Blacklight without requiring changes to the underlying plugin, which facilitates upgrading the plugin as new Blacklight versions are released.
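
As a sketch only (the instance variable and field access shown are assumptions about Blacklight's view layer, and the markup is hypothetical), a minimal ./app/views/catalog/_show_partials/_brainz.html.erb might render the stored fields for an artist like this:

<!-- _brainz.html.erb: detail partial for format_code_t == 'brainz' -->
<h1><%= @document[:title_t] %></h1>
<h2>Releases</h2>
<ul>
  <% Array(@document[:releases_t]).each do |release| %>
    <li><%= release %></li>
  <% end %>
</ul>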

solr-ruby versus rsolr

For a lower-level client interface to Solr from Ruby environments, there are two libraries duking it out to be the client of choice. In one corner you have solr-ruby, the client library officially supported by the Apache Solr project. solr-ruby is fairly widely used, and it provides the API to Solr used by the acts_as_solr Rails plugin we looked at previously. The new kid on the block is rsolr, which is a re-imagining of what a proper DSL (Domain Specific Language) would look like for interacting with Solr. rsolr is used by Blacklight OPAC as its interface to Solr. Both of these solutions are solid. However, rsolr is currently gaining more attention, has better documentation, and offers nice features such as a direct Embedded Solr connection through JRuby. rsolr also has support for using either curb (Ruby bindings to curl, a very fast HTTP library) or the standard Net::HTTP library for the HTTP transport layer.

In order to perform a select using solr-ruby, you would issue:

response = solr.query('washington', {
  :start => 0,
  :rows => 10
})

In order to perform a select using rsolr, you would issue:

response = solr.select({
  :q => 'washington',
  :start => 0,
  :rows => 10
})

So you can see that doing a basic search is pretty much the same in either library. Differences crop up more as you dig into the details of parsing and indexing records. Both libraries are evolving, with neither having a dominant position at this point. You can learn more about solr-ruby on the Solr Wiki at http://wiki.apache.org/solr/solr-ruby and learn more about rsolr at http://github.com/mwmitchell/rsolr/tree.
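
As a hedged illustration of those differences, here is how you might read the matching document IDs back from each response above (access patterns may vary by library version):

# solr-ruby wraps the results in a response object with a hits accessor
response.hits.each { |hit| puts hit['id'] }

# rsolr returns a raw hash mirroring Solr's response structure
response['response']['docs'].each { |doc| puts doc['id'] }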

Summary

In this article we looked at the options for integrating Solr with Ruby on Rails, from the acts_as_solr plugin and the Blacklight OPAC to the lower-level solr-ruby and rsolr client libraries.



