There is a lot of buzz about Hadoop, NoSQL, NewSQL and columnar MPP databases. But where is the actual value for businesses? Businesses need to have actionable information derived from their data that they collect on a regular basis. We know how to collect data and store them in databases of various kinds. We have seen the evolutions of SQL databases over the last five decades and databases have gotten sophisticated in terms of processing structured data. With the recent explosion of social media and with it the proliferation of unstructured data, new technologies have emerged, such as MapReduce. So now we have the data but the real question is where does the business value actually get delivered? The answer is simple. The value does not lie in the way data gets pulled into the database or how the database is optimized to handle new varieties of data. While these steps are important the value continues to be delivered at the Analytics and the end user Reporting layer as illustrated in the Business Intelligence value pyramid.
© image shree dandekar 12/6/2011
Rewind << for a second:
The 1990 era was all about capturing business relevant data, storing it using business constructs into a database. Typical use cases involved performing OLTP (Online Transaction processing) workloads on that data. We saw the evolution of Data Warehouses as enterprises started to seek out more analytical insights from the data stored in the database which gave rise to OLAP (Online analytics processing) workloads. Once in the data warehouse the data was cleansed, filtered and augmented with Business rules using some traditional ETL (Extract, Transform, Load) or Data Integration tools, thus removing any redundancies from the data as well as normalizing it. You would still have to run a Business Intelligence capability against this data to develop dashboards or reports to actually be able to derive some business insight from this data. Enterprises could also decide to further perform detailed trend analysis, forecasting using advanced data mining tools.
Fast fwd >> to today:
As EDW’s started getting bigger in size IT soon realized that managing a monolithic data warehouse was cumbersome. Hence the birth of departmental and function specific data marts. But that was not enough since they did not address the core issues of scalability, performance, agility and the ability to handle large volume transactions. Over the years some viable alternates like Database sharding have been used but even that have limited success in terms of scalability. Also it is noteworthy to mention that some of these core issues spawned from the actual limitations of the underlying DBMS’s like MySQL not being able to scale.
Hence investigating alternate DBMS technologies to address these issues has been a focal point of IT managers. So we continue to see the emergence of new DBMS technologies like NoSQL and NewSQL. Similarly we have seen the emergence of MapReduce (Hadoop) in the area of handling unstructured data. The core use case for MapReduce remains in its ability to store massive amounts of data, pre-process it and perform exploratory analytics.
The reality for enterprises is that there are now multiple types of databases in the form of EDW, data marts, columnar MPP stores as well as MapReduce clusters. This ecosystem is being commonly referred to by some industry analysts as Data Lakes.
So if you step back and look at the broader BI space you will notice that there is a lot of effort being spent on getting the plumbing right so that the data (structured as well as unstructured) is massaged and primed. While businesses continue to figure out the optimal data management solution they should not do it without investing in analytics and reporting capabilities needed to extract actionable insights.
I will expand on the reporting and analytics layer in my next post. Specifically trying to address self-service and new technology disruptions like in-memory.
Also the value pyramid assumes an on-premise business intelligence architecture. I will address the Cloud intercept in subsequent posts.