Introduction to Pebal: Persistent BigData Algorithm Library


Discovering trends and patterns in seemingly uncorrelated data can equip businesses with the competitive edge needed to add value and differentiate an enterprise. As the amount of data increases, there is a growing need to quickly analyze varied and vast data for meaningful insights.

There is a plethora of information in various forms, such as web logs, email, and more. In addition to its size, the unstructured nature of this data creates problems. With Big Data technologies, unstructured data generated from various sources can now be analyzed. Big Data technologies tackle issues related to the variety, volume, and velocity of data. The challenge lies in rapidly developing tools capable of analyzing data as per business requirements. In terms of analytics, business requirements vary by industry, but there is a specific set of functions and algorithms that are commonly required to build any Big Data solution.

Having worked on several Big Data projects across diverse application areas such as search engines, recommendation engines, and email analytics, we at Persistent Systems found that these varied applications share a common underlying structure. Social networks, transport networks, and the web are essentially graphs, and they call for functions on those graphs, such as finding the shortest distance between two cities or finding how people are linked in a social network. Similarly, email analytics, sentiment analysis, and search engines all depend on text extraction. Instead of reinventing the wheel every time, we thought it would be more useful to have a ready-made library of functions that can be utilized for Big Data application development across diverse domains. Hence, we built the Persistent Big Data Analytics Library (Pebal).
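To make the graph example concrete, here is a minimal single-machine sketch of finding the shortest distance between two cities with Dijkstra's algorithm. It is not Pebal's API and does not show how Pebal distributes work over Hadoop; the function name `shortest_distance`, the adjacency-dict input format, and the city distances are all hypothetical, chosen only for illustration.

```python
import heapq


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm over a weighted graph (hypothetical helper).

    graph: {node: [(neighbor, weight), ...]} with non-negative weights.
    Returns the shortest distance from source to target, or None if
    target is unreachable.
    """
    dist = {source: 0}          # best known distance to each node
    heap = [(0, source)]        # min-heap of (distance, node)
    visited = set()             # nodes whose distance is final
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue            # stale heap entry; a shorter path was found
        if node == target:
            return d
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None


# Illustrative one-way road network; distances are made up, not real data.
roads = {
    "Pune": [("Mumbai", 150), ("Nashik", 210)],
    "Mumbai": [("Nashik", 170), ("Surat", 280)],
    "Nashik": [("Surat", 250)],
    "Surat": [],
}
print(shortest_distance(roads, "Pune", "Surat"))  # → 430 (via Mumbai)
```

Dijkstra's algorithm assumes non-negative edge weights, which holds for road distances. On a Hadoop cluster, one common approach is to run such a search as a series of iterative jobs rather than a single in-memory loop.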

The Persistent Big Data Analytics Library is a library of commonly required functions that can be used to build Hadoop-based Big Data solutions. We have identified several functions across five major algorithm areas: Graphs, Sets, Indexing, Text Analytics, and Web Analytics.

For instance, typical text analytics requirements for email data include masking personal information and extracting entities from documents. Pebal provides high-performance, easy-to-use algorithms for several such commonly required functions. Much as the STL does for C++ development, Pebal significantly reduces development and deployment time for solutions in the Big Data world. Pebal functions are generic, easy to learn and use, and have been tested on large data sets. They follow the schema-on-read paradigm used by Hadoop, are schema agnostic, and use JSON formats for data and schema.
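The masking requirement can be sketched in a similar spirit. This minimal example assumes email records stored as JSON and masks anything resembling an email address or phone number in every string value, without knowing the record's fields in advance, which loosely mirrors the schema-agnostic idea above. The names `mask_text` and `mask_record` and both regular expressions are hypothetical; they are not Pebal's implementation, and the patterns are deliberately simple, so they will miss some personal data and over-match some digit runs such as dates.

```python
import json
import re

# Hypothetical, deliberately simple patterns for illustration only.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def mask_text(text, mask="***"):
    """Replace email addresses and phone-like digit runs with a mask token."""
    text = EMAIL_RE.sub(mask, text)
    return PHONE_RE.sub(mask, text)


def mask_record(json_line, mask="***"):
    """Mask every string value in a JSON record, whatever its schema.

    Keys are left untouched; nested lists and objects are walked recursively.
    """
    def walk(value):
        if isinstance(value, str):
            return mask_text(value, mask)
        if isinstance(value, list):
            return [walk(v) for v in value]
        if isinstance(value, dict):
            return {k: walk(v) for k, v in value.items()}
        return value  # numbers, booleans, null pass through unchanged

    return json.dumps(walk(json.loads(json_line)))


print(mask_text("Contact john.doe@example.com or +91 20 1234 5678."))
# → Contact *** or ***.
```

Because the record is parsed at read time and only value types are inspected, the same function works on records with different or evolving fields.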

For more information, please visit Pebal on our website, and keep watching the Big Data category on the Persistent Systems blog for more insights on Hadoop and Big Data.
