Apache Solr Integration with Drupal

Posted by Godfrey Wilson on Tuesday, June 2nd, 2009

In the past, search was not a high priority on sites developed with Drupal. The slowness and lack of smartness of the search feature made users lose their trust in search. The integration of Drupal with Apache Solr is now changing this. In this article, I am going to give you a snapshot of this revolution.

What is Solr?

Lucene, as we know, is a text search engine library written in Java. Solr is a search server built on Lucene. It is easy to install and configure, and it comes with an HTTP-based administration interface. Documents are indexed by sending them to Solr as XML over HTTP. Queries are sent with the HTTP GET method, and search results are returned as XML.
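To make the XML-over-HTTP flow concrete, here is a minimal Python sketch that only builds and parses the messages, without contacting a server. The base URL, the field names and the helper function names are assumptions made for illustration; the `<add><doc><field>` message shape and the `/select` handler follow Solr's standard XML conventions, but check them against your own installation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; adjust for your own setup.
SOLR_URL = "http://localhost:8983/solr"


def build_add_xml(doc):
    """Build the <add><doc>...</doc></add> message that is POSTed to index a document."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_query_url(query, rows=10):
    """Build the URL for an HTTP GET search request."""
    return SOLR_URL + "/select?" + urlencode({"q": query, "rows": rows})


def parse_response(xml_text):
    """Return each <doc> of an XML response as a dict (single-valued fields only)."""
    root = ET.fromstring(xml_text)
    return [{f.get("name"): f.text for f in doc} for doc in root.iter("doc")]


if __name__ == "__main__":
    # Hypothetical document; "id" and "title" are illustrative field names.
    print(build_add_xml({"id": "node-1", "title": "Hello Solr"}))
    print(build_query_url("title:solr"))
```

In a real setup the XML string would be POSTed to the update handler and followed by a commit, and the query URL would be fetched with any HTTP client.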

What makes Solr stand out?

  • Faceting
  • Spell checking
  • Highlighting
  • Caching
  • Replication
  • Open Source

There are two types of search mechanisms used by the dominant search engines. Navigational search uses a hierarchical structure (a taxonomy); this mechanism is used by the Yahoo directory, DMOZ and similar sites. Google, Yahoo search and other popular search engines use direct search. Each has its own benefits and drawbacks. Recently the direct method has been gaining more recognition, as is evident from the growing dominance of the Google and Yahoo search engines on the Internet.

Faceted search is a newer mechanism that combines both of the above techniques. It allows users to start from a keyword search and then navigate along multiple dimensions (facets) of the results. Here is an illustration that contrasts faceted searching with taxonomical searching.
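Separately from the illustration, the idea can be sketched in a few lines of Python: run a direct keyword search, count the values of each facet field in the results, then narrow the results by picking one value. This is an in-memory toy, not how Solr computes facets, and the sample documents and field names are hypothetical. In Solr itself the same steps are driven by request parameters such as `facet=true`, `facet.field` and `fq`.

```python
from collections import Counter

# Hypothetical sample documents; the field names are illustrative only.
DOCS = [
    {"title": "Solr basics", "type": "article", "author": "ann"},
    {"title": "Drupal search tips", "type": "blog", "author": "bob"},
    {"title": "Solr and Drupal", "type": "blog", "author": "ann"},
    {"title": "Lucene internals", "type": "article", "author": "bob"},
]


def search(docs, keyword, filters=None):
    """Direct keyword search on the title, optionally narrowed by facet values."""
    filters = filters or {}
    keyword = keyword.lower()
    return [
        d for d in docs
        if keyword in d["title"].lower()
        and all(d.get(field) == value for field, value in filters.items())
    ]


def facet_counts(results, fields):
    """Count how many results fall under each value of each facet field."""
    return {field: Counter(d[field] for d in results) for field in fields}


if __name__ == "__main__":
    results = search(DOCS, "solr")
    # The counts tell the user which refinements are available.
    print(facet_counts(results, ["type", "author"]))
    # Choosing the "blog" facet value narrows the same keyword search.
    print([d["title"] for d in search(DOCS, "solr", {"type": "blog"})])
```

The facet counts play the role of the taxonomy, but they are computed from the current result set rather than fixed in advance.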

Let's move on to the other features. Spell checking: With this feature, the user gets search results for a given query and spelling suggestions at the same time. This is similar to the ‘Did you mean’ feature in Google. The SpellCheckComponent that forms a part of Solr is designed to provide this inline spell checking of queries.
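As a rough idea of what a ‘Did you mean’ suggestion involves, the sketch below compares each query word against a small vocabulary and proposes the closest match. It uses Python's standard `difflib` module and is not the algorithm used by the SpellCheckComponent; the vocabulary, the `suggest` function and the cutoff value are assumptions chosen for illustration.

```python
import difflib

# Hypothetical vocabulary; a real system would derive it from the indexed terms.
VOCABULARY = ["drupal", "solr", "lucene", "search", "facet", "module"]


def suggest(query, vocabulary=VOCABULARY, cutoff=0.7):
    """Map each unknown query word to its closest vocabulary word, if one is close enough."""
    suggestions = {}
    for word in query.lower().split():
        if word in vocabulary:
            continue  # correctly spelled, nothing to suggest
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


if __name__ == "__main__":
    print(suggest("drupl serch"))
```

The search itself still runs on the original query; the suggestions are returned alongside the results.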

Highlighting: Solr provides a set of highlighting utilities that mark the location of the query terms in the text of the search results.

Caching: Solr caches are associated with an Index Searcher. Cached objects do not expire after a fixed period of time; any item in the cache remains valid and available for reuse for as long as that Index Searcher is in use.
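The caching behaviour can be pictured with a toy model in which the cache is simply an attribute of the searcher object, so it lives and dies with it. The classes below are illustrative stand-ins and not Solr or Lucene classes; the assumption that a commit opens a fresh searcher with an empty cache is a simplification made for this sketch.

```python
class IndexSearcher:
    """Toy searcher: a snapshot of the documents plus a cache that belongs to it."""

    def __init__(self, documents):
        self.documents = list(documents)  # snapshot taken when the searcher opens
        self.cache = {}                   # never expires by time
        self.misses = 0

    def search(self, keyword):
        if keyword not in self.cache:
            self.misses += 1
            self.cache[keyword] = [d for d in self.documents if keyword in d.lower()]
        return self.cache[keyword]


class SearchServer:
    """Toy server: a commit replaces the searcher, which discards the old cache."""

    def __init__(self):
        self.documents = []
        self.searcher = IndexSearcher(self.documents)

    def add(self, doc):
        self.documents.append(doc)

    def commit(self):
        # The new searcher sees the new documents and starts with an empty cache.
        self.searcher = IndexSearcher(self.documents)

    def search(self, keyword):
        return self.searcher.search(keyword)


if __name__ == "__main__":
    server = SearchServer()
    server.add("Solr rocks")
    server.commit()
    server.search("solr")
    server.search("solr")          # served from the searcher's cache
    print(server.searcher.misses)  # one miss for two searches
```

Because cached entries are only ever read through the searcher that built them, no time-based expiry is needed: entries stay consistent with that searcher's snapshot until it is replaced.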

Apache Solr Project

Apache Solr Search Integration is a module that integrates Drupal with a Solr server for searching. Solr can be used as a replacement for the core content search that already comes with Drupal. The module ships with schema.xml and solrconfig.xml files, which have to be installed in the Solr server's configuration. This module makes the features of Solr available in Drupal for the development of a new site. A few websites that currently run Solr-powered search are AOL, Drupal.org, Netflix, CNET, CitySearch and GameSpot.

Links for further study

http://lucene.apache.org/solr/

http://drupal.org/project/apachesolr

http://www.ibm.com/developerworks/java/library/j-solr1/#ibm-pcon

http://www.ibm.com/developerworks/java/library/j-solr2/#resources

8 Responses to “Apache Solr Integration with Drupal”

  1. Jacob Singh says:

    Thanks for this write-up, it’s really nice to see the integration taking off. One other thing not mentioned here is that Solr provides really good content recommendation. So when viewing a node, it will search solr for other nodes related by common facets / text and show them. We (at Acquia) have found this to be one of the most popular features in our hosted Apache Solr / Drupal service.

  2. Jacob Singh says:

    Sorry, wrong link earlier:
    http://acquia.com/products-services/acquia-search

    is our search service. We also maintain the Apache Solr module for the general public who want to set up the Java stuff themselves.

  3. Goddy Wils says:

    Jacob, Thanks for adding to the content. Your presentation at Drupalcon was awesome.

  4. We’ve got Solr up and running well in the cloud and have several of our Drupal sites sharing this instance using this module: http://drupal.org/project/apachesolr. It was a little tricky to set up given compatibility differences between the environments, but well worth it. As Jacob noted above, the content recommendation from Solr is proving to be an excellent feature. Thanks for all the hard work.

  5. LnddMiles says:

    The best information I have found is exactly here. Keep going. Thank you.

  6. Extenze says:

    Thank you for your help!

  7. Vinay Tadav says:

    We have been running thebigjobs.com for some time using the default Drupal search. Recently we shifted to Apache Solr – and performance and results are cool. The ApacheSolr module helped a lot. The documentation at Drupal.org was great for the multiple instance setup.

  8. Sushil says:

    I am using Apache Solr search for one of my Drupal-based websites. I want to filter the search results so that the nodes appearing in the results have unique values for a particular CCK field.
    E.g. if 4 nodes have the same value for a CCK field and their content matches a search, then only one node should be shown in the search results, similar to a distinct selection of rows on a certain field in a database. Can you please help me with how I should alter the Solr query to do this?
