This week, I attended O’Reilly’s Strata 2012 conference on Big Data. The theme was ‘Making Data Work’, and more than 2,000 industry professionals came to the Santa Clara Convention Center to attend sessions and network. The day-one Jump Start sessions, titled ‘The Missing MBA for Big Data’, covered topics ranging from what it means to be a data-driven CEO to workforce instrumentation. These discussions were interesting from a philosophical perspective but lacked the specific examples that would have made the arguments more compelling. One good example of what I would have liked to see more of was the use case presented in the data-driven CEO session: P&G’s investment in predictive analytics. P&G has built a big data analysis command center using Business Sphere, a visually immersive data environment that has transformed how the company makes decisions by letting it take advantage of global data in real time.
The exhibition floor held some unique surprises, both new technologies and next generations of traditional DB and BI technologies. I visited the Amazon Web Services booth and saw a demo of DynamoDB, a NoSQL database service hosted in the cloud, built on a highly available, distributed database cluster architecture. One thing that struck me during the demo was the key difference between traditional database architectures and NoSQL: while NoSQL gives you the ability to scale and store massive amounts of data, it comes at the cost of some traditional database features, such as fixed schemas and joins.
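That trade-off can be made concrete with a small sketch. This is not DynamoDB code; it uses an in-memory SQLite database for the relational side and a plain Python dict standing in for a key-value store, with hypothetical `users`/`orders` data:

```python
import sqlite3

# Relational side: the database enforces a schema and performs the join for us.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users  VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'book');
""")
rows = db.execute(
    "SELECT u.name, o.item FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()
print(rows)  # [('Ada', 'book')]

# Key-value side (a dict standing in for a NoSQL table): no enforced schema,
# no joins. Related data is either denormalized into one item, as here,
# or stitched together in application code.
users = {"1": {"name": "Ada", "orders": [{"id": "10", "item": "book"}]}}
item = users["1"]["orders"][0]["item"]
print(users["1"]["name"], item)  # Ada book
```

The denormalized item is what lets a key-value store scale by simple key lookups, and it is also exactly what you give up: the database no longer knows how users and orders relate.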
As expected, Microsoft had a big presence at the expo, featuring everything from the new SQL Server 2012 to its cloud-ready analytics platform. A couple of interesting announcements from Microsoft included its plans to make Hadoop data analyzable via both JavaScript and Microsoft Excel, as well as its partnership with Datameer supporting the Apache Hadoop service on Windows Azure. The Excel connector is particularly relevant because it bridges the gap between the developer-centric Hadoop environment and end-user-centric Excel use, essentially bringing Hadoop data to the end user!
Then there were newer companies showcasing their technologies. One of them was DataSift, a cloud platform that helps customers perform sentiment analysis on social data (e.g., Twitter). It is a unique way of combining social and business insights. As more organizations embrace social media in their go-to-market (GTM) strategies, technologies like these will help them bridge the gap between traditional data management solutions and modern-day social analytics.
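To give a feel for what sentiment analysis on social data means at its simplest, here is a toy keyword-based scorer. The word lists and sample tweets are hypothetical, and this is nothing like DataSift’s actual platform, just an illustration of the idea:

```python
# Hypothetical sentiment word lists; real systems use far richer models.
POSITIVE = {"love", "great", "awesome", "happy"}
NEGATIVE = {"hate", "awful", "broken", "slow"}

def sentiment(tweet: str) -> str:
    """Classify a tweet by counting positive vs. negative keywords."""
    words = set(tweet.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

tweets = [
    "I love this product, it is awesome",
    "the app is slow and broken",
    "just installed the update",
]
for t in tweets:
    print(sentiment(t), "-", t)
```

A production system layers language detection, negation handling, and trained classifiers on top of this, but the core task, turning raw social text into a business-readable signal, is the same.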
VMware announced Hadoop support for its SpringSource framework. The first version will simply allow developers to create MapReduce jobs, including Hive and Pig connections, as well as scheduling. I am looking forward to seeing VMware’s broader vision for the use of data analytics in its mainstream products.
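For readers new to the model these frameworks wrap, here is a minimal in-process sketch of MapReduce, not Spring or Hadoop code: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The word-count example and input lines are my own:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit (word, 1) for every word in the input line.
    for word in line.split():
        yield word.lower(), 1

def reduce_phase(word, counts):
    # Aggregate all values emitted for one key.
    return word, sum(counts)

lines = ["Big Data big data", "data pipelines"]

# Shuffle: group the mapped pairs by key, as the framework would between phases.
groups = defaultdict(list)
for key, value in chain.from_iterable(map_phase(l) for l in lines):
    groups[key].append(value)

result = dict(reduce_phase(w, c) for w, c in groups.items())
print(result)  # {'big': 2, 'data': 3, 'pipelines': 1}
```

A real Hadoop job distributes the same three phases across a cluster; what Spring Hadoop adds on top is wiring such jobs, plus Hive and Pig steps, into a managed, schedulable application.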
My personal key takeaways from Strata 2012 are:
- Discussions on Big Data are much more valuable if founded on customer needs rather than technology capabilities.
- Let’s not forget about data federation! As the number of data sources keeps multiplying, enterprises should focus their energy on the big picture, i.e. how to use the data to drive business decisions.
- This space is in a critical phase of development. There is huge demand for the right skills, as evidenced by the programmers flocking to these conferences.
- Many vendors at the conference (MarkLogic, MapR, Hadapt, Revolution Analytics, Datameer, Hortonworks, Karmasphere) claim that their technologies complement Hadoop, but there is no indication of when Hadoop will become mainstream in enterprise IT. However, VMware’s and Microsoft’s announcements at the conference indicate an early effort to begin mainstreaming Hadoop.
- NoSQL databases are not getting major traction, yet.