Project Overview
Our client conducts market analysis for products by pulling posts and comments from social media networks such as Facebook, Twitter, YouTube, and Instagram. These analyses help companies evaluate how well their products are received on social media. Leveraging technologies such as ElasticSearch (for superior text search indexing) and the Sencha Ext JS library (for agile development), we delivered a custom solution to the client. The resulting application was visually appealing, interactive, fast, and able to scale easily to accommodate the ever-growing dataset. QBurst was granted full product ownership for this solution.
Business Requirement
Prior to engaging QBurst, the client used an application that retrieved information from social networks into their local PostgreSQL database. Their analysts worked on this data and generated reports to forecast market trends for various products. The client wanted QBurst to improve this application by:
- Incorporating visualization capabilities
- Displaying tweets live on a map as they appear
- Adding an interactive graph that highlights retweets and shares as they appear on social media networks
- Improving application speed
- Incorporating scalability to accommodate the increasing dataset
The main challenge for QBurst was to make the application faster while accommodating the ever-growing dataset. We also had to evaluate various JavaScript frameworks to choose the one that would let us incorporate changes quickly and easily.
Our Solution
A preliminary study of the client's system showed that they were using a relational database and performing heavy joins to fetch data. As the database grew, response times would eventually increase. We recommended moving to NoSQL for the following reasons:
- Joins can be avoided.
- Large amounts of data can be stored.
- Results can be queried with faster response times.
- Strict transactional guarantees are not critical: because the application performs statistical analysis, missing a few tweets during insertion would not materially affect the final report.
- Fewer insertions and updates are made to the database, while a greater range of selections can be made on the dataset.
- As data increases, nodes can be added to distribute the data across machines.
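To illustrate the first point, here is a minimal sketch of how rows that would otherwise be joined at query time can be folded into one self-contained document at insertion time. The table shapes and field names (`tweet`, `author`, `hashtags`, `followers`, and so on) are hypothetical and are not taken from the client's schema.

```python
def build_tweet_document(tweet, author, hashtags):
    """Fold three hypothetical relational rows into one denormalized document.

    tweet:    dict with "id", "author_id", and "text"
    author:   dict with "id", "name", and "followers"
    hashtags: list of dicts with "tweet_id" and "tag"
    """
    return {
        "tweet_id": tweet["id"],
        "text": tweet["text"],
        # Author details are embedded, so no join is needed when reading.
        "author": {"name": author["name"], "followers": author["followers"]},
        # Only the tags that belong to this tweet are kept.
        "hashtags": [h["tag"] for h in hashtags if h["tweet_id"] == tweet["id"]],
    }


if __name__ == "__main__":
    doc = build_tweet_document(
        {"id": 1, "author_id": 7, "text": "Loving the new phone"},
        {"id": 7, "name": "alice", "followers": 120},
        [{"tweet_id": 1, "tag": "phone"}, {"tweet_id": 2, "tag": "other"}],
    )
    print(doc)
```

The trade-off is that author details are duplicated across documents, which is usually acceptable for a read-heavy analysis workload of this kind.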
MongoDB was initially preferred as the NoSQL provider because it had a good API and strong community support on the web. During the second iteration, however, ElasticSearch was chosen over MongoDB for the following reasons:
- Superior text search indexing compared to MongoDB
- Greater query optimization, allowing selection and grouping in the same query
- More extensive language-specific indexing than MongoDB (for example, a tweet in Arabic can be indexed in Arabic and will be matched accordingly on search)
- Better geolocation queries (for example, searching for all tweets within a 200 km radius of a given city)
- Faster than MongoDB
- Easily distributed over nodes (across machine instances)
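As a rough illustration of the language, geolocation, and grouping points above, the sketch below builds an index mapping and a search body as plain Python dictionaries. It uses the current Elasticsearch query DSL (the `geo_distance` query, a `terms` aggregation, and the built-in `arabic` analyzer), which may differ from the syntax of the version used in the project. The field names `text`, `location`, and `product` and the aggregation name are assumptions made for this example.

```python
import json


def tweet_index_mapping():
    """Mapping for a hypothetical tweet index."""
    return {
        "mappings": {
            "properties": {
                # Analyze the tweet text with the built-in Arabic analyzer.
                "text": {"type": "text", "analyzer": "arabic"},
                # geo_point is required for distance-based filtering.
                "location": {"type": "geo_point"},
                # keyword fields can be grouped on with a terms aggregation.
                "product": {"type": "keyword"},
            }
        }
    }


def tweets_near_city(lat, lon, radius_km=200):
    """Select tweets within radius_km of a point and group them by product."""
    return {
        "size": 0,  # return only the aggregation buckets, not the hits
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        # Selection (the filter above) and grouping happen in one request.
        "aggs": {"tweets_per_product": {"terms": {"field": "product"}}},
    }


if __name__ == "__main__":
    # Example coordinates only; any city centre could be used.
    print(json.dumps(tweets_near_city(25.2048, 55.2708), indent=2))
```

These dictionaries would be sent as the request bodies when creating the index and when searching it; the client library and connection details are left out of the sketch.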
The client wanted graphs, such as force-directed graphs and tree circulant graphs, which are not available in a typical charting library. We chose D3 because of its superior API and cost effectiveness. To display tweets on a world map, we chose Leaflet for its rich user interface (UI) and plug-in support. Leaflet also supports a large number of points on the map without noticeable lag.
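A force-directed layout in D3 is typically fed a list of nodes and a list of links whose `source` and `target` refer to positions in the node list. The sketch below shows one way the server side could prepare that structure from retweet pairs. The function name, the `name` and `value` fields, and the input shape are assumptions for illustration, not the project's actual data format.

```python
from collections import Counter


def build_force_graph(retweets):
    """Turn (retweeter, original_author) pairs into a nodes/links structure.

    Links reference nodes by index; "value" counts repeated retweets
    between the same pair of users.
    """
    retweets = list(retweets)  # allow any iterable, since it is read twice
    users = sorted({user for pair in retweets for user in pair})
    index = {user: i for i, user in enumerate(users)}
    nodes = [{"name": user} for user in users]
    counts = Counter(retweets)
    links = [
        {"source": index[src], "target": index[dst], "value": n}
        for (src, dst), n in sorted(counts.items())
    ]
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    sample = [("bob", "alice"), ("carol", "alice"), ("bob", "alice")]
    print(build_force_graph(sample))
```

Serialized as JSON, the result can be handed to the browser, where the layout code positions the nodes and draws the links.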
Sencha Ext JS was chosen as the JavaScript framework as it offered a host of advantages such as:
- Cross browser compatibility
- Rich UI library
- MVC (Model-View-Controller) framework for JavaScript
- Dynamic JavaScript loading
- Good forum support
- Well-written API
- Easy integration with the other JS libraries such as D3, Leaflet, and jQuery