Apache Solr Integration with Drupal
Tuesday, June 2nd, 2009
Search has traditionally not been a high priority on sites developed with Drupal. Analysis reveals that because the search feature was slow and not very smart, users have lost their trust in it. The integration of Drupal with Apache Solr is now changing that. In this article, I give you a snapshot of this revolution.
What is Solr?
Lucene, as we know, is a search engine library written in Java that enables text-based search. Solr is a search server built on Lucene. It is easy to install and configure, and it comes with an HTTP-based administration interface. Documents are indexed by sending them as XML over HTTP; queries are sent with the HTTP GET method, and search results are returned as XML.
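To make this HTTP/XML round trip concrete, here is a minimal Python sketch. It only builds the messages and parses a canned response, so it runs without a Solr server. It assumes Solr's example server at `http://localhost:8983/solr` with the standard `/update` and `/select` handlers; the field names `id` and `title` and the sample response are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: Solr's example server running locally on its default port.
SOLR_BASE = "http://localhost:8983/solr"

# A trimmed-down, hypothetical sample of Solr's XML query response format.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="id">node-1</str><str name="title">Hello Solr</str></doc>
  </result>
</response>"""


def build_add_message(docs):
    """Build an XML update message (<add><doc><field .../></doc></add>) to POST to /update."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """Build the URL for an HTTP GET query against the /select handler."""
    params = {"q": query, "rows": rows, "wt": "xml"}
    return SOLR_BASE + "/select?" + urllib.parse.urlencode(params)


def parse_results(xml_text):
    """Return (numFound, list of stored-field dicts) from an XML response."""
    result = ET.fromstring(xml_text).find("result")
    # Each stored field is a typed child element (str, int, ...) with a name attribute.
    docs = [{field.get("name"): field.text for field in doc}
            for doc in result.findall("doc")]
    return int(result.get("numFound")), docs
```

Posting the `<add>` message to `/update` and then sending a `<commit/>` message makes the documents searchable; the select URL can then be fetched with any HTTP client and its body passed to `parse_results`.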
What makes Solr stand out?
- Faceting
- Spell checking
- Highlighting
- Caching
- Replication
- Open Source
Dominant search engines use two kinds of search mechanism. Navigational search uses a hierarchical structure (a taxonomy); the Yahoo Directory, DMOZ and similar sites work this way. Google, Yahoo Search and other popular search engines use direct search. Both approaches have their own benefits and drawbacks. Recently, direct search has been gaining more recognition, as the growing dominance of the Google and Yahoo search engines shows.
Faceted search is a newer mechanism that combines both of the above techniques. Users start with a direct search and then narrow the results along several independent dimensions (facets), such as content type or author, instead of following a single hierarchy. Here is an illustration that contrasts faceted searching with taxonomical searching.

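In Solr, faceting is requested per query. The Python sketch below, again assuming a local Solr at `http://localhost:8983/solr`, builds a query that asks for facet counts on two fields and drills down with a filter query, then reads the counts out of a canned response. `facet`, `facet.field` and `fq` are standard Solr parameters; the fields `type` and `author`, the value `story` and the sample counts are hypothetical.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumption: local Solr example server.
SOLR_SELECT = "http://localhost:8983/solr/select"

# Hypothetical response fragment showing Solr's facet_counts layout.
SAMPLE_FACET_RESPONSE = """<response>
  <result name="response" numFound="3" start="0"/>
  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="type"><int name="story">2</int><int name="page">1</int></lst>
    </lst>
  </lst>
</response>"""


def build_facet_url(query, facet_fields, filters=()):
    """Direct search (q) plus facet counts, optionally narrowed by filter queries."""
    params = [("q", query), ("facet", "true")]
    # facet.field is repeated once per field whose values should be counted.
    params += [("facet.field", f) for f in facet_fields]
    # Each fq restricts results to one chosen facet value (a drill-down step).
    params += [("fq", fq) for fq in filters]
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def parse_facet_counts(xml_text):
    """Return {field: {value: count}} from the facet_fields section."""
    root = ET.fromstring(xml_text)
    fields = root.find("lst[@name='facet_counts']/lst[@name='facet_fields']")
    if fields is None:
        return {}
    return {f.get("name"): {v.get("name"): int(v.text) for v in f} for f in fields}
```

Each facet value a user clicks can become one more `fq`, which is how results get narrowed along several dimensions at once rather than down a single taxonomy branch.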
Let's move on to the other features. Spell checking: with this feature, the user gets search results for a query and spelling suggestions for it at the same time, similar to the ‘Did you mean’ feature in Google. Solr's SpellCheckComponent is designed to provide this inline spell checking of queries.
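A single request can carry both the search and the spell check. This Python sketch only builds such a request URL; it assumes the handler behind `http://localhost:8983/solr/select` has SpellCheckComponent enabled in solrconfig.xml and that its spelling index has already been built. The `spellcheck.*` names are standard component parameters; the query text is made up.

```python
import urllib.parse

# Assumption: this handler includes SpellCheckComponent and its
# spelling index has already been built.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_spellcheck_url(query, count=5):
    """One GET request that returns search results and spelling suggestions together."""
    params = {
        "q": query,                    # the ordinary search
        "spellcheck": "true",          # also run the SpellCheckComponent
        "spellcheck.count": count,     # max suggestions per misspelled term
        "spellcheck.collate": "true",  # suggest a corrected version of the whole query
    }
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)
```

The suggestions come back in a separate `spellcheck` section of the same response, next to the normal results, which is what a ‘Did you mean’ link would be built from.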
Highlighting: Solr provides a set of highlighting utilities that mark where the query terms occur in the text of the search results.
Caching: Solr caches are associated with an index searcher. Cached objects do not expire after a fixed period of time; any item in the cache remains valid and available for reuse as long as that index searcher is in use. When the index changes and a new searcher is opened, the new searcher gets its own caches.
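The cache lifetime rule above can be modelled in a few lines. The Python class below is a hypothetical toy, not Solr's implementation: cached results never expire by time, and the whole cache is replaced when a new searcher is opened.

```python
class SearcherCache:
    """Hypothetical toy: a result cache whose lifetime is tied to one searcher."""

    def __init__(self):
        self.searcher_version = 0
        self._entries = {}

    def open_new_searcher(self):
        # A new searcher sees the updated index, so results cached for the
        # old searcher may be stale: start again with an empty cache.
        self.searcher_version += 1
        self._entries = {}

    def get(self, query, compute):
        # No time-based expiry: an entry is reused for as long as the
        # current searcher is in use.
        if query not in self._entries:
            self._entries[query] = compute(query)
        return self._entries[query]
```

Real Solr can also autowarm a new searcher's caches from the old ones (the `autowarmCount` setting in solrconfig.xml); the toy simply starts empty.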
Apache Solr Project
Apache Solr Search Integration is a module that connects Drupal to a Solr server for searching. Solr can be used as a replacement for the core content search that comes with Drupal. The module ships with its own schema.xml and solrconfig.xml, which must be installed in the Solr server's configuration. With the module in place, Solr features such as faceting, spell checking and highlighting become available when building a Drupal site. Well-known sites that use Solr include AOL, Netflix, CNET, CitySearch and GameSpot, and Drupal.org uses it through this project.
Links for further study
http://lucene.apache.org/solr/
http://drupal.org/project/apachesolr
http://www.ibm.com/developerworks/java/library/j-solr1/#ibm-pcon
http://www.ibm.com/developerworks/java/library/j-solr2/#resources