Data Anarchy – A real threat to Self-Service BI (Part 1)

In this series of posts, I examine the challenges companies face when they try to implement self-service business intelligence (BI) initiatives with bleeding-edge BI tools. Data anarchy is a real threat for the many companies that jump on the bandwagon of self-service BI tools. I will end the series with practical recommendations for avoiding data anarchy.

Part 1: What is Data Anarchy?

Companies today feel an increasing need to gather business insights from their data, and this is transforming the BI landscape. Many are looking for self-service BI tools that are simple to use and easy to deploy so they can get results fast. A common complaint from business users is that traditional tools have a steep learning curve and are not intuitive enough to let them load data and extract insights within minutes.

Also, as the “moneyball” effect sweeps through organizations, business managers are trying to innovate using data analytics. They want to milk the data they have to the utmost to understand their customers' buying behavior at an ever deeper level. Widespread adoption of mobile technology and social computing has driven interest in visualization capabilities and real-time analytics. Companies cannot survive (let alone prosper) without recognizing that social computing can let them redefine their organizations to be faster, more fluid, and more flexible by design. New in-memory technologies like QlikView and Tableau provide some relief, but it often comes at the cost of temporarily suspending data management rules, policies, or procedures, which leads to data anarchy.

Companies are susceptible to data anarchy when a growing number of new BI tools are implemented hastily, without thoughtfully planned data management. The effects are more severe for small and medium-sized businesses (SMBs) than for enterprises, because enterprise-size companies have generally already experienced the data anarchy caused by the proliferation of data marts and departmental data warehouses (DWs), and they are in the process of adopting robust master data management (MDM) strategies to address it. They now need to account for the data anarchy caused by the new BI tools in those MDM strategies as well.

Most of the new self-service BI tools ingest data into memory in a simple tabular format and then compress it. The ingestion process typically uses a proprietary mechanism to load the data quickly, with its own join schema. In effect, each ingestion run creates a unique instance of a data cube. Thus, every time a user needs to bring in new data (including new associations/joins and new data entities), the tool has to be re-run to create a new data cube. This approach leads to data anarchy!
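
To make the point concrete, here is a minimal Python sketch of two ingestion runs over the same source data. It is not how any particular BI tool works internally; the tables, the `Cube` class, the `ingest` function, and the `keep_unmatched` flag are all hypothetical names invented for illustration. The only thing it shows is that two users who pick different join rules end up with two frozen cubes that disagree.

```python
from dataclasses import dataclass

# Hypothetical source tables, invented for illustration only.
ORDERS = [
    {"order_id": 1, "customer_id": "C1", "amount": 100.0},
    {"order_id": 2, "customer_id": "C2", "amount": 250.0},
    {"order_id": 3, "customer_id": "C9", "amount": 75.0},  # customer not in master list
]
CUSTOMERS = {"C1": "East", "C2": "West"}


@dataclass(frozen=True)
class Cube:
    """An in-memory snapshot produced by one ingestion run (assumed model)."""
    owner: str
    rows: tuple  # tuple of (region, amount) pairs

    def revenue(self) -> float:
        return sum(amount for _region, amount in self.rows)


def ingest(owner, orders, customers, keep_unmatched):
    """Join orders to customers and freeze the result into a new cube."""
    rows = []
    for order in orders:
        region = customers.get(order["customer_id"])
        if region is None and not keep_unmatched:
            continue  # inner-join behaviour: drop orders with no known customer
        rows.append((region or "Unknown", order["amount"]))
    return Cube(owner=owner, rows=tuple(rows))


# Same source, two join rules, two cubes.
sales_cube = ingest("sales", ORDERS, CUSTOMERS, keep_unmatched=False)
finance_cube = ingest("finance", ORDERS, CUSTOMERS, keep_unmatched=True)

print(sales_cube.revenue())    # 350.0
print(finance_cube.revenue())  # 425.0
```

Neither number is wrong given its join rule, which is exactly the problem: once the cubes are built, nothing in them records which rule was used.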

The stages of data anarchy include:

  • Stage 1: During this stage, users are typically composing new reports and dashboards out of existing reports. In most cases the original data model is preserved, and there is little chance that new data is created, since all the insights are constructed from existing data sets.
  • Stage 2: In this stage, users extract new data sets from the source to develop new reports and dashboards. This is where IT starts losing control of the master data rules and processes. Depending on the type of data set created, the original model may be partially or totally compromised at this point.
  • Stage 3: This is the stage where users start bringing in totally new data, which they then mash up with existing data sets to create insights.
  • Stage 4: Over time, multiple users start maintaining multiple data cubes created from the master data, and at this stage it is a data management nightmare even for the end user! There is no single version of truth, and reconciling to a single version is a mammoth effort.
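
As a rough illustration of why Stage 4 is so painful, the sketch below compares a few user-maintained extracts against a master copy and reports where they disagree. Everything here is an assumption made for the example: the `reconcile` function, the extract names, and the figures are hypothetical, and a real reconciliation would have to deal with far more than three small dictionaries.

```python
def reconcile(master, extracts):
    """Return, per extract, the keys whose values differ from the master copy."""
    report = {}
    for name, extract in extracts.items():
        diffs = {
            key: (master.get(key), extract.get(key))  # (master value, extract value)
            for key in sorted(set(master) | set(extract))
            if master.get(key) != extract.get(key)
        }
        if diffs:
            report[name] = diffs
    return report


# Hypothetical revenue-by-region figures, invented for illustration.
MASTER = {"East": 100.0, "West": 250.0}
EXTRACTS = {
    # Stage 1: built from existing reports, so it still matches the master.
    "stage1_dashboard": {"East": 100.0, "West": 250.0},
    # Stage 2: re-extracted from the source with a different filter.
    "stage2_extract": {"East": 100.0, "West": 240.0},
    # Stage 3: outside data mashed in alongside the master figures.
    "stage3_mashup": {"East": 100.0, "West": 250.0, "Online": 75.0},
}

report = reconcile(MASTER, EXTRACTS)
for name, diffs in report.items():
    print(name, diffs)
```

With only three extracts the mismatches are easy to spot. With dozens of users each maintaining their own cubes, and no record of how each one was built, this comparison is the mammoth effort described in Stage 4.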

Organizations dealing with data anarchy need to ask themselves the following questions:

1) How do organizations prevent the suspension of rules, policies, and procedures while continuing to meet the demand for time-sensitive business intelligence results?

2) How do organizations manage multiple instances of data? Where is the single version of truth?

3) How do organizations evolve their existing data governance model to address data anarchy?

4) How do SMB organizations create a data governance model out of existing anarchy?

5) How can BI solution providers address data anarchy?

Stay tuned for the next post where we explore how we got into this mess!

(I am going to watch the 2012 Presidential election results now!)