Posts Tagged ‘Search’
The Art of Googling!
Friday, July 17th, 2009
For most of us, finding information on the Internet is synonymous with going to Google.com, typing in a word or phrase and clicking Search. In fact, Google accounts for a major share of the search engine market, and with good reason. Check this out if you are still in doubt.
Despite new players entering the market, Google remains the leader in web search. That is exactly why it makes sense to understand and develop efficient googling techniques. Below are a few tips which, when practiced while searching the web with Google, will save time and improve your search results.
SEO Checklist
Wednesday, June 24th, 2009
While good SEO consultants can certainly help drive more traffic to your site, many small businesses cannot afford one. But you don’t need to despair if you can’t afford an expensive SEO consultant. If you are a do-it-yourself business owner, this article will help you create a reasonably well search-engine-optimized site. Even if you plan to have a web design/development agency build the site instead of doing it yourself, you can demand that they make it comply with basic SEO tenets.
Here is how to go about placing yourself on the right side of search engines:
- Keyword analysis – Do this before you start building your site. If you already have a site, you may have to tweak your content based on the results of this analysis. Find out which keywords your customers actually use to find you; these may not be your industry’s jargon. A good tool to start with is Google’s Keyword Tool. Identify the keywords or phrases that have high search volume but low competition. Once you have chosen the keywords for a page, mention them a few times on that page, but do not over-stuff the page: search engines penalize keyword stuffing. Write naturally, but don’t forget to repeat your keywords a few times.
- Make sure every page on your site has a proper title tag, meta keywords and a meta description. Again, there is no need to repeat your keywords too many times, but your main keyword should appear in the title tag, as it is the most important tag from a search-engine perspective.
- Use search-engine-friendly (SEF) URLs. Give your pages meaningful URLs that accurately describe their content. For example, www.example.com/camera/dslr/nikon/D5000 is better than www.example.com?product_id=123. Choose your URL names and structure carefully. How do you create SEF URLs? If you are using Apache, you can write URL rewriting rules in .htaccess, but they are cumbersome to manage. Many content-management systems, such as Drupal and WordPress, support SEF URLs, so if you use one of them you are covered. Most web application frameworks, such as Symfony, also support SEF URLs.
- Provide textual descriptions for all non-text elements such as images, audio and video. For example, use the alt attribute on images. This helps search engines understand your multimedia content and has the added benefit of making your site more accessible.
- Search engine bots should be able to spider all your content, even if it resides in a database and is displayed dynamically. For example, your products may sit in a product catalog table in a database, but you should still give each product its own static-looking page.
- Make effective use of heading tags like h1 and h2 to show the relative importance of text. Keep important text as actual text, not as images.
- Use ordered lists for creating menus rather than using tables.
- The anchor text (the clickable text of a hyperlink to another page) should contain keywords that describe the target page. Instead of writing “Click here for D5000 details”, write “check out the D5000 digital SLR camera”.
- Avoid duplicate content issues. If example.com, www.example.com and www.example.com/index.php all point to the same page, choose one of them as the primary URL. If you designate www.example.com as your primary, or canonical, URL, the other URLs should be permanently redirected to it with an HTTP 301 (Moved Permanently) response. Likewise, if your URLs carry a session ID or an affiliate parameter, consider storing that value in a cookie and then redirecting the parameterized URL to its canonical version.
- Never copy and paste content from other sites. You may be violating copyright law and incurring a duplicate-content penalty. Likewise, if you get your content from a syndication service, check that the same content is not syndicated to other sites. Do a Google search on your content, and if you find that someone else has copied it, file a DMCA request with Google.
- What if you have multiple top-level domains, such as example.com and example.net? If you plan to serve identical content on all of them, permanently redirect them to your primary domain.
- Multiple language versions of your site – use a separate subdomain for each language, for example fr.example.com for French and de.example.com for German. Serving different language versions from the same URL is not a good idea.
- Use the robots exclusion protocol (robots.txt) to keep search engines away from admin panels, HTTPS content, etc. Password-protect any pages you don’t want the outside world to see.
- How do you know whether Google has indexed all your pages? Search Google for site:example.com; the results report roughly how many of your pages are indexed.
- Put CSS and JavaScript in external files.
- Follow the XHTML 1.0 Strict standard.
- Reduce the amount of code in your pages, and maintain a good content-to-code ratio.
- Speed is important. Your pages should load fast and should not time out.
- Use microformats to describe your data.
- Last but not least, build quality inbound links.
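The duplicate-content and multiple-domain items above both come down to one decision per request: is this the canonical URL, and if not, where should a 301 send the visitor? Below is a minimal Python sketch of that decision, assuming www.example.com is the chosen canonical host. The alias hosts, index-page names and tracking parameter names (sessionid, affiliate) are hypothetical placeholders for your own site’s values, and the function only computes the target URL; sending the 301 itself is left to your web server or framework.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

CANONICAL_HOST = "www.example.com"                               # assumption: your primary host
ALIAS_HOSTS = {"example.com", "example.net", "www.example.net"}  # hypothetical duplicate hosts
INDEX_PAGES = {"/index.php", "/index.html"}                      # hypothetical aliases of "/"
TRACKING_PARAMS = {"sessionid", "affiliate"}                     # hypothetical parameter names

def canonical_redirect(url):
    """Return the URL a 301 redirect should point to, or None if `url` is already canonical."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ALIAS_HOSTS:
        host = CANONICAL_HOST
    # Collapse index-page aliases such as /index.php onto the site root.
    path = parts.path or "/"
    if path in INDEX_PAGES:
        path = "/"
    # Drop session/affiliate parameters; saving them in a cookie is left to the caller.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in TRACKING_PARAMS]
    canonical = urlunsplit((parts.scheme, host, path, urlencode(kept), ""))
    return None if canonical == url else canonical

for requested in ("http://example.com/index.php",
                  "http://www.example.com/camera?sessionid=abc&page=2",
                  "http://www.example.com/camera/dslr"):
    target = canonical_redirect(requested)
    print(requested, "->", target or "serve as-is")
```

When the function returns a URL, the server would answer with status 301 and a Location header set to that URL; when it returns None, the page is served normally.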
Some of the above items need further explanation. However, there is a wealth of information available in blogs and online articles. So start digging and learn more on this interesting topic.
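For the robots exclusion item in the checklist, Python’s standard urllib.robotparser module offers a quick way to check what a robots.txt file actually blocks. The rules below are a hypothetical robots.txt for www.example.com that asks every crawler to skip an /admin/ area. Keep in mind that robots.txt is only a request that well-behaved crawlers honour, which is why pages that must stay private also need password protection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for www.example.com: every crawler is asked to skip /admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from a string instead of fetching them

def is_crawlable(path, agent="Googlebot"):
    """Return True if the robots.txt rules allow `agent` to fetch `path`."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

for path in ("/camera/dslr/nikon/D5000", "/admin/login"):
    print(path, "allowed" if is_crawlable(path) else "blocked")
```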
Apache Solr Integration with Drupal
Tuesday, June 2nd, 2009
Search used to be a low priority on sites developed with Drupal. Analysis shows that the slowness and lack of intelligence of the core search feature made users lose trust in it. The integration of Drupal with Apache Solr is now changing the scenario entirely. In this article, I give you a snapshot of this revolution.
What is Solr?
Lucene, as we know, is a search engine library written in Java that enables text-based search. Solr is a search server built on Lucene. It is easy to install and configure, and it comes with an HTTP-based administration interface. Documents are indexed by sending them to Solr as XML over HTTP; queries are sent with the HTTP GET method, and search results come back as XML.
What makes Solr stand out?
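The sketch below shows what those two exchanges look like, without contacting a server. It builds an XML add message for indexing and a select URL for querying, assuming a local Solr instance at http://localhost:8983/solr (the port used by Solr’s bundled example; newer Solr versions also put a core name in the path). The field names id and title are assumptions that must match the fields defined in your schema.xml.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumption: local Solr on the example port

def build_add_xml(docs):
    """Build a Solr XML <add> message; each dict maps field names to values."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            ET.SubElement(doc, "field", name=name).text = str(value)
    return ET.tostring(add, encoding="unicode")

def build_select_url(q, rows=10):
    """Build the HTTP GET URL that asks Solr's select handler for the top `rows` hits."""
    return SOLR_URL + "/select?" + urlencode({"q": q, "rows": rows})

print(build_add_xml([{"id": "D5000", "title": "Nikon D5000 digital SLR camera"}]))
print(build_select_url("nikon"))
```

To index the documents you would POST the XML to the /update handler with a text/xml content type and then send a <commit/> message; the select URL can be fetched with any HTTP client, and in the Solr releases this post describes the response is XML by default.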
- Faceting
- Spell checking
- Highlighting
- Caching
- Replication
- Open Source
Dominant search engines use two types of search mechanism. Navigational search uses a hierarchical structure (a taxonomy); the Yahoo directory and DMOZ work this way. Google, Yahoo search and other popular search engines use direct search. Each has its own benefits and drawbacks. Recently, direct search has been gaining recognition, as is evident from the growing Internet dominance of the Google and Yahoo search engines.
Faceted search is a newer mechanism that combines both of these techniques. It lets users search with keywords while navigating along multiple dimensions (facets) of the content. Here is an illustration that contrasts faceted searching with taxonomical searching.

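To make the idea concrete, here is a toy Python sketch of what a faceted search engine does, run over a small hypothetical camera catalogue rather than Solr. A keyword query narrows the results (direct search), selected facet values narrow them further (navigation), and the engine reports how many remaining results fall under each facet value so the user knows where to click next. Solr computes these counts on its index; this in-memory version only illustrates the concept.

```python
from collections import Counter

# Hypothetical catalogue; "brand" and "type" are the facet fields.
PRODUCTS = [
    {"name": "Nikon D5000", "brand": "Nikon", "type": "DSLR"},
    {"name": "Nikon Coolpix", "brand": "Nikon", "type": "Compact"},
    {"name": "Canon EOS 500D", "brand": "Canon", "type": "DSLR"},
    {"name": "Canon PowerShot", "brand": "Canon", "type": "Compact"},
    {"name": "Sony Alpha", "brand": "Sony", "type": "DSLR"},
]

def facet_search(docs, query, filters, facet_fields):
    """Keyword search plus facet filtering, returning matches and per-facet value counts."""
    words = query.lower().split()
    matches = [
        d for d in docs
        if all(w in d["name"].lower() for w in words)         # direct (keyword) search
        and all(d.get(f) == v for f, v in filters.items())  # navigation by chosen facets
    ]
    # Count how many matches carry each value of each facet field.
    counts = {f: Counter(d[f] for d in matches) for f in facet_fields}
    return matches, counts

matches, counts = facet_search(PRODUCTS, "nikon", {}, ["type"])
print([d["name"] for d in matches], dict(counts["type"]))
matches, counts = facet_search(PRODUCTS, "", {"type": "DSLR"}, ["brand"])
print([d["name"] for d in matches], dict(counts["brand"]))
```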
Let’s move on to the other features. Spell checking: with this feature, the user gets search results for a query and spelling suggestions at the same time, similar to Google’s “Did you mean” feature. Solr’s SpellCheckComponent is designed to provide this inline spell checking of queries.
Highlighting: Solr provides a set of highlighting utilities that mark where the query terms occur in the text of the search results.
Caching: Solr caches are associated with an Index Searcher. Cached objects do not expire after a fixed period of time; they remain valid and available for reuse for as long as that Index Searcher is in use.
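Spell checking, highlighting and faceting are all switched on per request through query parameters. The Python sketch below builds such a request URL using the standard Solr parameters spellcheck, spellcheck.collate, hl, hl.fl, facet and facet.field. It assumes a local Solr at http://localhost:8983/solr and that the request handler you query has the SpellCheckComponent enabled in solrconfig.xml; the field names body, type and brand are hypothetical. Caching needs no parameters, since it happens automatically inside the Index Searcher.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumption: local Solr on the example port

def feature_query_url(q, highlight_field, facet_fields):
    """Build a select URL that turns on spell checking, highlighting and faceting."""
    params = [
        ("q", q),
        ("spellcheck", "true"),          # ask the SpellCheckComponent for suggestions
        ("spellcheck.collate", "true"),  # also suggest a corrected version of the whole query
        ("hl", "true"),                  # highlight query terms in the results
        ("hl.fl", highlight_field),      # field(s) to highlight
        ("facet", "true"),               # return facet counts
    ]
    params += [("facet.field", f) for f in facet_fields]  # facet.field may repeat
    return SOLR_SELECT + "?" + urlencode(params)

print(feature_query_url("nikkon d5000", "body", ["type", "brand"]))
```

A list of pairs is used instead of a dict because facet.field must be repeated once per facet.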
Apache Solr Project
Apache Solr Search Integration is a module that connects Drupal to a Solr server for searching. Solr can replace the core content search that comes with Drupal. The module ships with schema.xml and solrconfig.xml files, which require configuration. It makes Solr’s features available in Drupal when developing a new site. Websites currently using Solr include AOL, Drupal.org, Netflix, CNET, CitySearch and GameSpot.
Links for further study
http://lucene.apache.org/solr/
http://drupal.org/project/apachesolr
http://www.ibm.com/developerworks/java/library/j-solr1/#ibm-pcon
http://www.ibm.com/developerworks/java/library/j-solr2/#resources