Archive for the ‘Open Source’ Category

Pitfalls in WordPress Version 2.6.1

Friday, August 21st, 2009

Almost a year ago (on Aug 15th, ‘08, to be precise), Automattic released WordPress 2.6.1, fixing over 60 bugs. The version also introduced right-to-left typing for administrators working in Hebrew and Farsi. Within a short time (about a month), the company alerted users of version 2.6.1 to security holes in it. In this short article, we analyze the vulnerabilities that made Automattic release an upgrade for WordPress 2.6.1 so soon.

Ok, let’s be clear and to the point. The problem is created by the nature of:

1.    the mt_rand() function of PHP and

2.    the truncation method that MySQL adopts

mt_rand():

PHP has two random number generating functions: rand() and mt_rand(). The former uses the random number generator of the underlying C library, and the latter uses the Mersenne Twister algorithm, created by Makoto Matsumoto and Takuji Nishimura of Japan. mt_rand() is used in most PHP applications and, most importantly, WordPress 2.6.1 uses it.

Normally a seed is used to initialize the generation of random numbers. If the seed can be determined, the same sequence can be regenerated any number of times. In other words, the output of the random number generator becomes predictable. The seed can be determined using a lookup. Once the seed is found, anyone can generate the sequence that the application generates. To understand why this is possible, you need to study random number generation in PHP; the alternative is to accept that this is the nature of the mt_rand() function.
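The effect of a known seed can be sketched in a few lines. This is written in Python rather than PHP; Python’s random module also uses the Mersenne Twister, but the seed value, the make_key function and the key format below are illustrative assumptions and not WordPress code.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits

def make_key(rng, length=20):
    """Build a reset-style key from a pseudo-random generator (hypothetical format)."""
    return "".join(ALPHABET[rng.randrange(len(ALPHABET))] for _ in range(length))

# The application seeds its generator once; the seed value is an assumption.
server_rng = random.Random(123456)
server_key = make_key(server_rng)

# Anyone who recovers the same seed can replay the sequence exactly.
attacker_rng = random.Random(123456)
attacker_key = make_key(attacker_rng)

print(server_key == attacker_key)  # True: same seed, same sequence
```

The point of the sketch is only that the key is a pure function of the seed; it says nothing about how the seed is recovered in practice.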

Now, request a password reset for the admin account, which sends an activation link to the actual admin. Since we have the seed, we can calculate the same activation link ourselves by using Keep-Alive HTTP requests. Activating this link and using a different email ID in the form allows the creation of a new WordPress admin password, and thereby complete control.

MySQL Truncation:

Let’s see the next one. When a string given in a query is longer than the defined maximum length of the column, MySQL, by default, truncates the string to that maximum length. For example, if the maximum length of a string column is defined as 8, the input value “qburst_expressions” will be truncated to “qburst_e”. MySQL raises a warning, but applications are normally not written to handle such warnings. Importantly, WordPress version 2.6.1 was not.

Suppose I know the WordPress admin username (let’s say “godfrey”) and the maximum length of the username column in MySQL is 32. When I register as a new user with the same name “godfrey”, MySQL obviously returns an error, as the username godfrey already exists. If I try “godfrey  ” (with 2 spaces after the name), MySQL ignores the trailing spaces when comparing, treats the name as “godfrey” and returns the same error. Suppose I then try “godfrey                         g” (with 25 spaces between godfrey and g). This name is 33 characters long, so the check for an existing username finds no match. On insert, however, MySQL truncates it to 32 characters, which drops the final g and leaves “godfrey” followed only by spaces, effectively “godfrey”. Now there are 2 admin usernames in the table. This is sufficient to pass the validation and gain access to the password of the original admin, and thereby complete control.

Username | Length | Max Length | After Truncation | Database Change
“godfrey” | 7 | 32 | “godfrey” | No change
“godfrey  ” | 9 | 32 | “godfrey” | No change
“godfrey                         g” | 33 | 32 | “godfrey” | Truncated string (godfrey) inserted as new username into DB
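The three cases above can be simulated with a small sketch. It is a Python model of the behaviour described in this article, not MySQL or WordPress code: the register and pad_space_equal functions are hypothetical, and the model assumes a 32-character column, silent truncation on insert, and comparisons that ignore trailing spaces.

```python
MAX_LEN = 32  # column width assumed in the article

def pad_space_equal(a, b):
    """Compare two strings while ignoring trailing spaces (assumed comparison rule)."""
    return a.rstrip(" ") == b.rstrip(" ")

def register(table, username):
    """Hypothetical signup: uniqueness check on the full input, then a truncating insert."""
    if any(pad_space_equal(username, existing) for existing in table):
        return False  # username already taken
    table.append(username[:MAX_LEN])  # silent truncation, as described above
    return True

users = ["godfrey"]
r1 = register(users, "godfrey")                   # rejected: exact match
r2 = register(users, "godfrey  ")                 # rejected: trailing spaces ignored
r3 = register(users, "godfrey" + " " * 25 + "g")  # accepted: 33 characters, no match

matches = [u for u in users if pad_space_equal(u, "godfrey")]
print(r1, r2, r3, len(matches))
```

After the third call, two stored rows compare equal to “godfrey”, which is the duplicate admin username the table describes.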

These security holes made Automattic work on an upgrade at the earliest, and the next release fixed these errors. So if you are planning to use WordPress, make sure you use the latest version and remain safe. WordPress 2.8.4 is available for download now; it is the latest stable version of WordPress as of Automattic’s last release.

Google Wave 2 – The Platform

Tuesday, July 21st, 2009

Articles on Google Wave are flooding the web, trying to bring out a deeper understanding of this wave renaissance. Expectations are high, and people are eager to get their hands on it. With the little information revealed by Google, let us try to figure out more about how this is going to work. In Google Wave 1 we discussed Google Wave as a product. This time let us view Google Wave from the perspective of a developer, that is, Google Wave as a platform.

What is a platform?

In software, a platform can be understood as an entity on which software can be made to run. A platform provider offers APIs (Application Programming Interfaces) so that software can be developed for the platform. Let’s take a few examples. Java, the product of Sun Microsystems, serves as a platform and comes with APIs like AWT, JDBC, JMF and so on, which are also provided by Sun Microsystems. Apple Inc., owner of the iPhone, kept its APIs confidential until October 2008, when the company relaxed the restrictions on developing software applications for the iPhone. Lately, there is the Facebook API, which is both powerful and popular.

What about Google API?

Google has promised to come up with a public API that any developer can use to create applications that run on the Wave platform. There are 2 ways by which a developer can make his presence felt in Google Wave. The first is building robots or creating gadgets. The other is embedding waves in third-party websites. Let’s try to get some insight into these new terms.

Robots, Gadgets and Embed API

Robots are automated participants in a wave. Remember the robot in ‘Lost in Space’? It is a similar idea, except that these robots function inside the computer. A robot added to a wave is able to read, modify and delete blips and wavelets. A wavelet is a smaller wave that resides inside a wave, and a blip resides inside a wavelet. The diagram below will give you a better picture.
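The containment described above (a wave holds wavelets, a wavelet holds blips) can also be sketched as a data structure. This is a hypothetical Python model for illustration only; the class names and the shout_robot function are assumptions and are not part of the Google Wave API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Blip:
    author: str
    text: str

@dataclass
class Wavelet:
    blips: List[Blip] = field(default_factory=list)

@dataclass
class Wave:
    wavelets: List[Wavelet] = field(default_factory=list)

def shout_robot(wave: Wave) -> int:
    """Hypothetical robot: upper-case every blip and return how many it touched."""
    touched = 0
    for wavelet in wave.wavelets:
        for blip in wavelet.blips:
            blip.text = blip.text.upper()
            touched += 1
    return touched

wave = Wave([Wavelet([Blip("fay", "lunch?"), Blip("jerry", "pizza")])])
count = shout_robot(wave)
texts = [b.text for b in wave.wavelets[0].blips]
print(count, texts)
```

A real robot would receive events from the wave server instead of walking the structure directly; the sketch only shows the read-and-modify access a robot is described as having.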

The developer can create robots that perform interactive operations within a wave. What are the interactive operations? Well, that is left to the creativity of the developer. Learn more about robots here. Wave gadgets are similar to ordinary gadgets in the way they are embedded as third-party applications. But there is more on offer: a wave gadget can function within a live wave. An example Google gives to explain this is a gadget that lets the participants of a wave vote on where to go for lunch. Learn more about gadgets here.

The second method, using the Embed API, brings waves into third-party websites. The website is updated as and when an update is made inside the wave. Google has already come up with a few embeds. ‘YouTube playlist discuss’ is one among them and is sure to gain popularity.  Learn more about embed APIs here.

Facebook is dominating now with so much integration, and we can expect even more from Google Wave. So if you are a developer, stay informed about what is going on in Google Wave and get ready to play with the tools as soon as you get them.

Links for further study:

http://code.google.com/apis/wave/

http://googlewavedev.blogspot.com/

Microsoft Ready for Google’s Challenge, Forays into ‘Online Office’

Tuesday, July 14th, 2009

In response to Google Chrome OS, Microsoft has announced that the new version of MS Office, which is expected to hit the market in 2010, will feature online collaboration. This dramatic announcement was made at the partner conference in New Orleans.

The new generation office suite will enable users to access their documents online, with co-authoring capabilities. PowerPoint will gain streamlined video and picture capabilities, which are expected to revolutionize presentations.

Though Microsoft is adding online capabilities to Office, it does not intend to provide comprehensive online access, which it thinks could scale down its business. This won’t be a great concern for Google Docs, which provides comprehensive access to users. Google considers it a weak reply to Google Chrome OS, which targets the operating system market at the core of Microsoft’s business.

Watch out for Google Chrome OS

Friday, July 10th, 2009

In its endeavor to be the leader in the software space, Google Inc has announced its foray into operating systems with its maiden project, named ‘Google Chrome OS‘. Google has already locked horns with Microsoft on numerous projects, and the present one will intensify the competition. As Microsoft holds a 90% share of the OS market, it will be interesting to see how it reacts to this challenge. Since Google believes in the Open Source concept, if the Chrome OS project is rolled out successfully, it could revolutionize the entire PC, laptop and OS markets.

In its official blog, Google explains more about Chrome OS, which initially targets the netbook market. Google Chrome OS is expected to hit the market in the second half of 2010.

HTML 5.0 – A glance at new elements

Thursday, June 11th, 2009

WhatWG (Web Hypertext Application Technology Working Group) was formed in 2004 with a focus on HTML and APIs for web applications.  The specification document for HTML 5.0 is in progress and is updated on a regular basis.  Check out the document at http://dev.w3.org/html5/spec/Overview.html.  Working through the whole document is tedious, and it cannot be made to fit into one page.  So here we will glance over a few new elements to get a picture of what HTML 5.0 is going to be.

Div Element

Header, footer, nav, aside, article and section are new elements that will replace many uses of div.  The complexity of nested divs has paved the way for these elements.  Instead of having so many div tags inside the code, HTML 5 makes it possible to use a separate element for each purpose.  Identifying a particular portion of the page during modifications thus becomes easy. These two snapshots will give an idea of how the simplification is going to work.

Audio Video Elements

Recently, audio and video have migrated to the Internet en masse.  HTML 5 provides the ability to treat audio and video as part of web pages, without the need for plug-ins to play them. That is, audio and video will be natively supported by HTML 5 compliant browsers.  The debate on whether to use a standard format or to support all formats is still on.  These elements are expected to contain textual content for every video or audio clip brought into the web page.  Such a provision enables the information to be conveyed in browsers that do not support the elements.  Internet users with disabilities will also have access to the content.  Here is an example.

<audio src="Martinluther.mp3">

<p>I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.</p>

…</audio>

Few More Elements

The time element will help browsers, search engines and web crawlers identify times in web pages.  Images are brought in through the figure element, so that the caption of an image is always associated with the image.  This allows user agents to understand more about the image.  Dialog is another new element, and it comes with 2 sub tags: dt and dd.  dt indicates the speaker and dd indicates what is spoken.  Here is an example:

<dialog>

<dt>Fay</dt>

<dd>Jerry, could you show me how to hold the racket?</dd>

<dt>Jerry</dt>

<dd>Sure Fay, it’s just like shaking hands. Hold your hand out as though you were going to shake my hand… </dd>

<dt>Fay</dt>

<dd>Do you mean like this?</dd>

<dt>Jerry</dt>

<dd>Right, like that. Then put the racket in your hand, like this. </dd>

</dialog>

There is more in HTML 5.  Judging by the way developers are contributing to its specification, we can surely expect fascinating behavior in web pages soon. Most importantly, you can contribute too. Here’s how:

Subscribe to the WhatWG mailing list: http://www.whatwg.org/mailing-list

Participate in discussions: http://forums.whatwg.org/

Comment and post blogs: http://blog.whatwg.org/

Links to articles on HTML 5:

http://radar.oreilly.com/2009/05/google-bets-big-on-html-5.html

http://www.webmonkey.com/blog/How_HTML_5_Is_Already_Changing_the_Web

Google Wave – 1

Tuesday, June 9th, 2009

Why do we have to live with divides between different types of communication – email versus chat, or conversations versus documents?

Could a single communications model span all or most of the systems in use on the web today,  in one smooth continuum? How simple could we make it?

What if we tried designing a communications system that took advantage of computers’ current abilities, rather than imitating non-electronic forms?

Tough questions! These questions paved the way for Google Wave. Jens Rasmussen and Lars Rasmussen have wrestled with these questions since 2004. These geeks were the inventors of Google Maps, and now they are ready to unleash Google Wave onto the Internet. Google Wave comes in 3 layers: the product, the platform and the protocol. Here, we will discuss Google Wave as a product.

Google describes Wave as “Equal parts conversation and document”. It is the next generation of e-mail. A wave contains a complete thread of messages saved on a common server. When this wave is shared with other users, they can also get into edit mode. The interesting feature is that when a person is editing the wave, the others can see the edit almost letter by letter. This means that all of them can collaborate in a wave almost instantly. Waves come with a rich text editor and several other functions that enable users to work on text, images, videos, maps and more. Whenever a change is made to a wave, all the collaborators are notified. The complete history is stored within the wave.

Here is a screenshot provided by Google that gives us a first look.


Waves can therefore serve as both e-mail and chat, and they work in a way similar to wikis. The next layer, the ‘platform’, provides various APIs that turn waves into a place where a group of people can work together to prepare documents, plan events, hold discussions, play games, etc. We will discuss them in the next part.

Drupal – An Overview

Monday, June 8th, 2009

Drupal is one of the most popular content management systems (CMS) used in web development. It is also called a content management framework, because it enables developers to extend it and implement custom content management solutions. Drupal is written in PHP, with MySQL as the backend.

With Drupal, it is possible to develop and manage blogs, websites, portals, forums, e-commerce sites, social networking sites and many more. A few examples of popular websites developed using Drupal are www.labs.sonyericsson.com, www.jacksonville.com, www.nysenate.gov.

CMSs like Joomla, Plone and WordPress also exist in the market, but the features available in core Drupal and its extensibility make it stand out from its competitors. SEO is better achieved through Drupal. It also provides a number of themes and modules to choose from. Integration of various technologies with Drupal extends its capability further. Apache Solr integration is a recent accomplishment; it is done through the Apache Solr Integration module.

Drupal administration has four main components. Content management lets you manage the website content. Site building controls the look and feel of the site. Custom modules and themes extend Drupal beyond the options available in the core modules. Roles and permissions are created in the user management section to manage access rights for different users.

The Drupal presentation is available on Slideshare.

Apache Solr Integration with Drupal

Tuesday, June 2nd, 2009

Earlier, search did not have a high priority in sites developed using Drupal. Analysis reveals that the slowness and lack of smartness of the search feature made users lose their trust in search. The integration of Drupal with Apache Solr is now changing the entire scenario. In this article, I am going to give you a snapshot of this revolution.

What is Solr?

Lucene, as we know, is a search engine library written in Java that enables text-based search. Solr is a search server built on Lucene. It is easy to install and configure, and it comes with an HTTP-based administration interface. Documents are first indexed by sending XML over HTTP. Queries are sent using the HTTP GET method, and search results are returned as XML.
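The query side of this exchange can be sketched with Python’s standard library. The host, port and path in SOLR_BASE are assumptions about a local installation, and SAMPLE_RESPONSE is a hand-written response in the general shape Solr returns, not output captured from a real server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SOLR_BASE = "http://localhost:8983/solr/select"  # assumed host and path

def build_query_url(terms, rows=10):
    """Build the HTTP GET URL for a keyword query."""
    return SOLR_BASE + "?" + urlencode({"q": terms, "rows": rows})

# Hand-written sample of an XML result list (illustrative data only).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">node-1</str><str name="title">Drupal search</str></doc>
    <doc><str name="id">node-2</str><str name="title">Solr basics</str></doc>
  </result>
</response>"""

def parse_ids(xml_text):
    """Extract the id field of every document in an XML response."""
    root = ET.fromstring(xml_text)
    return [doc.find("str[@name='id']").text for doc in root.iter("doc")]

url = build_query_url("drupal solr")
ids = parse_ids(SAMPLE_RESPONSE)
print(url)
print(ids)
```

Sending the request itself (for example with urllib.request.urlopen) is left out so that the sketch runs without a Solr server.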

What makes Solr stand in front?

  • Faceting
  • Spell checking
  • Highlighting
  • Caching
  • Replication
  • Open Source

There are two types of search mechanisms used by the dominant search engines. Navigational search uses a hierarchical structure (taxonomy). This mechanism is used by the Yahoo directory, DMOZ, etc. Google, Yahoo search and other popular search engines use direct search. Both have their own benefits and drawbacks. Recently the direct method has been gaining more recognition, as is evident from the growing dominance of the Google and Yahoo search engines.

Faceted search is a newer mechanism that combines both of the above techniques. It allows users to navigate along multiple dimensions, starting from a pool of words. Here is an illustration that contrasts faceted searching with taxonomical searching.
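The idea of combining a keyword query with navigation along several dimensions can be sketched in plain Python. This is an in-memory illustration, not Solr code; the DOCS data and the search and facet_counts functions are hypothetical.

```python
from collections import Counter

# Illustrative documents; each one carries two facet fields, "type" and "tag".
DOCS = [
    {"title": "Intro to Drupal", "type": "article", "tag": "drupal"},
    {"title": "Solr faceting", "type": "article", "tag": "solr"},
    {"title": "Solr install notes", "type": "page", "tag": "solr"},
]

def search(docs, keyword, **filters):
    """Direct search by keyword, optionally narrowed by facet values."""
    hits = [d for d in docs if keyword.lower() in d["title"].lower()]
    return [d for d in hits if all(d[k] == v for k, v in filters.items())]

def facet_counts(hits, field):
    """Count how many hits fall under each value of one facet field."""
    return Counter(d[field] for d in hits)

hits = search(DOCS, "solr")
type_facets = facet_counts(hits, "type")      # counts offered to the user as choices
narrowed = search(DOCS, "solr", type="page")  # the user picks one facet value
print(dict(type_facets), [d["title"] for d in narrowed])
```

The keyword step is the direct search, and picking a facet value plays the role of navigating a taxonomy, which is how the two techniques are combined.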

Let’s move on to the other features. Spell checking: with this feature, the user gets search results for a given query and spelling suggestions at the same time. This is similar to ‘Did you mean’ in Google. The SpellCheckComponent that forms a part of Solr is designed to provide this inline spell checking of queries.

Solr provides a set of highlighting utilities with which it highlights the location of the query terms in the text of the search results. Solr caches are associated with an Index Searcher. Cached objects do not expire after a fixed period of time; any item in the cache remains valid and available for reuse as long as its Index Searcher is in use.

Apache Solr Project

Apache Solr Search Integration is a module that integrates Drupal with a Solr server for searching. Solr can be used as a replacement for the core content search that comes with Drupal. The module comes with schema.xml and solrconfig.xml, which require configuration. This module makes the features of Solr available in Drupal when developing a new site. A few websites that currently use Solr are AOL, Drupal.org, Netflix, CNET, CitySearch and GameSpot.

Links for further study

http://lucene.apache.org/solr/

http://drupal.org/project/apachesolr

http://www.ibm.com/developerworks/java/library/j-solr1/#ibm-pcon

http://www.ibm.com/developerworks/java/library/j-solr2/#resources

MySQL Replication

Thursday, May 7th, 2009

Replication has now become an essential feature for most MySQL users. The good news is that the concept is not complicated to understand or implement. It involves a minimum of 2 servers: a master and a slave (in most cases). The slave uses the binary logs created by the master to update its own database, thereby keeping the two in synchronization.

Issues Leading to the Need of Replication

Heavy Load:

Let’s consider a website whose number of users keeps growing exponentially. A point will arrive at which a single database server can no longer handle the load. If the server receives more read queries than write queries (which is the normal case for most websites), the best choice is to adopt replication into the current architecture. Here, read queries refer to SELECT statements and write queries refer to INSERT, UPDATE and DELETE statements.

Now I am going to explain how replication solves the issue of heavy load. When replication is implemented, we have more than one server. Among these servers, the one named the master receives the write queries and makes changes to its database immediately. When the binary log is updated, the slaves update their databases by reading from the log files. The slaves, on the other hand, receive all the read queries. The number of slaves can be increased or decreased depending on the number of queries received. By using a scheduling algorithm (Round Robin is an example), we can balance the incoming read queries across the slave servers so that all of them get equal workloads.
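The routing described above can be sketched as follows. This is a minimal Python illustration with hypothetical server names; it only decides where a query would go and does not talk to MySQL.

```python
from itertools import cycle

class QueryRouter:
    """Send writes to the master and spread reads over the slaves in Round Robin order."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = cycle(slaves)  # endless Round Robin iterator over the slaves

    def route(self, sql):
        # The first word of the statement decides the target (non-empty SQL assumed).
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb == "SELECT":
            return next(self._slaves)
        return self.master  # INSERT, UPDATE, DELETE and anything else

router = QueryRouter("master-db", ["slave-1", "slave-2"])
queries = ("SELECT * FROM posts", "INSERT INTO posts VALUES (1)", "select 1", "SELECT 2")
targets = [router.route(q) for q in queries]
print(targets)  # ['slave-1', 'master-db', 'slave-2', 'slave-1']
```

Treating every non-SELECT statement as a write is a deliberately cautious choice for the sketch; a real router would also have to account for transactions and for reads that must see the latest write.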

Backup: Anytime and Without Client Disturbance

To take a backup, we normally stop MySQL or lock the tables to get a consistent copy. This may annoy clients who access the website during the process. Although there are a few clever techniques for doing this without the clients noticing, things become very simple with replication.

The slaves remain in synchronization with the master. In other words, each slave has another copy of the entire repository that the master processes. Hence backing up a slave is as good as backing up the master. The presence of slaves as replicas will also, in most cases, help avoid the need to back up the master, because the slaves are always available as a spare in case of any misfortune to the master.

Distribution of Data Without Respect to Distance:

The next issue deals with distributing copies of data to locations that are geographically far apart, which is not a trivial task. Replication gives the flexibility required to make it trivial.

The master reports no errors even if a slave remains disconnected for some time. So in spite of a poor connection and other factors that may affect the link between the different locations, a synchronized copy of the master can be maintained in a geographically distant region.

Architectures of Replication

There are a few rules worth keeping in mind to better understand the different architectures.

  • Every server, the master and each slave, needs a unique server ID
  • There can be many slaves for a master
  • There can be only one master for a slave
  • Slaves can also function as masters

Master: Slave

This architecture best suits an environment that has a low number of write queries and a high number of read queries. Effective load balancing can be achieved by spreading the workload among the different servers. Here is an illustration.

Dual Master:

This kind of architecture is particularly useful when servers are geographically far apart. During an interruption neither server has access to the other’s data, but both catch up from each other when the connection is reestablished. An extension of this architecture is to have a slave on either side, which is also shown diagrammatically below.

Pyramid:

In a large organization where data is distributed in a hierarchical manner, an architecture like this will be the best fit. There is no need to configure every slave against the top master, as the slave above it in the hierarchy can act as its master.

Although replication solves problems, it demands a great deal of precision; if that is neglected, it can even result in a crash of the master database. Slaves are not always in a synchronized state with their master, but with proper monitoring systems this can be detected. Replication is provided by MySQL and can definitely improve overall performance if handled with proper caution.
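The pyramid can be modelled as a simple mapping from each server to its master, which also respects the rule that a slave has only one master. The server names and both helper functions in this Python sketch are hypothetical; it describes the topology only and does not configure MySQL.

```python
# Each server maps to its single master; the top of the pyramid has none.
MASTER_OF = {
    "root": None,
    "region-a": "root",
    "region-b": "root",
    "branch-a1": "region-a",
    "branch-a2": "region-a",
}

def replication_path(server):
    """Return the chain of servers a change travels through, from this slave up to the top."""
    path = [server]
    while MASTER_OF[path[-1]] is not None:
        path.append(MASTER_OF[path[-1]])
    return path

def direct_slaves(server):
    """List the slaves configured directly against the given server."""
    return sorted(s for s, m in MASTER_OF.items() if m == server)

print(replication_path("branch-a1"))  # ['branch-a1', 'region-a', 'root']
print(direct_slaves("root"))          # ['region-a', 'region-b']
```

The top master only has its two regional slaves to serve, while each regional server acts as master for the branches below it.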

Google Launches API for Google Analytics

Wednesday, April 22nd, 2009

When it comes to the web, it is Google all the way. With a search engine, a browser and email already in place, they are now providing an API for their Analytics feature. Google Analytics is their free service for tracking and analyzing website traffic and usage.

It was a much-awaited release from Google. “Large organizations and agencies now have a standardized platform for integrating Analytics data with their own business data,” says Google’s Nick Mihailovski.

Google has been criticized for storing the browsing history of users. Now they are willing to share this data, which makes this an interesting move from Google.

One can access Google Analytics from a phone using an Android application and from the desktop using Desktop-Reporting. For more details about how the API works, you can read this blog post from Google about it.