Our daily lives are becoming increasingly digitized as technology advances rapidly. From Twitter feeds to sensor data, companies are drowning in big data and yet still striving for actionable information.
The point is that, for many companies, the ability to collect data has far surpassed the ability to organize it for analysis and action. Almost everybody is frustrated with the traditional process, which requires a series of preparatory steps before the data can be used for analysis. Relational databases served many businesses well as long as the structure of the data was known in advance, but they are unable to keep up with the rapidly evolving variety and formats of data.
All in all, traditional and legacy databases are not agile enough to meet the needs of most organizations today.
Hadoop is the mainstream technology for storing and processing huge amounts of data at low cost. Today, however, how much data can be processed is no longer the main concern. The focus has shifted to data agility, which means how fast value can be extracted from the data and converted into action.
Executives want their teams to focus on business impact rather than on how to process or analyze the data. This concept is not limited to big data; it also applies to risk management, marketing campaigns, supply chains, and so on.
Traditional databases require a schema to be defined beforehand, and that schema is tied to the moment the data is entered into the database. This process cannot be considered agile. In more extreme cases, a DBA must perform complicated operations such as dropping foreign keys, exploring the data, altering table designs, and sometimes even reloading the data. In short, a defined schema is a must before a user can ask their very first question.
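The schema-first constraint can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for a traditional relational database; the table name (events), the column names, and the insert_event helper are all hypothetical and chosen only for illustration.

```python
import sqlite3

def insert_event(conn, event):
    """Insert a dict into the hypothetical `events` table, one column per key."""
    columns = ", ".join(event)
    placeholders = ", ".join("?" for _ in event)
    conn.execute(
        f"INSERT INTO events ({columns}) VALUES ({placeholders})",
        tuple(event.values()),
    )

conn = sqlite3.connect(":memory:")
# The schema has to be declared before any data can be loaded.
conn.execute("CREATE TABLE events (user_name TEXT, event_type TEXT)")

insert_event(conn, {"user_name": "alice", "event_type": "login"})

# A record carrying a new field is rejected until the schema is changed.
new_event = {"user_name": "bob", "event_type": "login", "device": "mobile"}
try:
    insert_event(conn, new_event)
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Someone has to alter the table design before the new record fits.
conn.execute("ALTER TABLE events ADD COLUMN device TEXT")
insert_event(conn, new_event)

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

In this sketch the second record is rejected on the first attempt and only loads after the ALTER TABLE step, which is the kind of manual intervention described above.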
New data exploration technologies are being developed, and Apache Drill is one of them. It is a low-latency SQL query engine for Hadoop and NoSQL that can query across data sources. It can handle flat, fixed schemas and is built for semi-structured and nested data. Drill matters to businesses because it helps shorten cycle times for data processing.
Above all, it discovers the schema on the fly, which means that when new data arrives, nothing has to be done before Drill can process it. DBAs are not even required to maintain a schema design.
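The idea of applying a schema at read time can be sketched in a few lines of Python. This is not how Drill is implemented; it is only a minimal illustration of the concept, and the flatten and select helpers, the sample records, and the dotted column paths are assumptions made for the example.

```python
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column paths, e.g. 'geo.city'."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat

def select(lines, columns):
    """Project the requested columns from JSON lines; missing fields become None."""
    for line in lines:
        flat = flatten(json.loads(line))
        yield tuple(flat.get(col) for col in columns)

# Hypothetical input: the second record adds a nested field the first one lacks.
raw = [
    '{"user": "alice", "action": "login"}',
    '{"user": "bob", "action": "login", "geo": {"city": "Pune"}}',
]

# No table definition is needed; structure is discovered per record at read time.
rows = list(select(raw, ["user", "geo.city"]))
```

Here rows evaluates to [("alice", None), ("bob", "Pune")]: the new nested field is queryable as soon as it appears, and older records simply yield None for it.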
Data agility should be an important aspect of any future big data initiative. It helps eliminate the dependency on IT, data definitions, and structures. More importantly, it frees IT staff to perform more valuable and higher-leverage activities.