Enterprise Adoption of Big Data Goes Mainstream

As I walked the show floor, socialized and attended sessions at the last Strata Conference and Hadoop World in New York, my mind kept coming back to one key takeaway: Big Data technologies are going mainstream! The feel of the conference had shifted from the grassroots conversations of prior years to an Enterprise focus. Everything was more commercial, and the sessions drew considerably more business attendees who were hungry to learn and invest. I could not help but think that Gartner had it wrong: we aren’t headed into the trough of disillusionment for Big Data. When it comes to the Enterprise, this is just the beginning of the Big Data boom.

Data analytics has always been part of the Enterprise; however, the Enterprise often lacks the agility to adopt the latest and greatest technologies. This can be a byproduct of missing skill sets, or simply of the time it takes to convince the right people that resources and budget should be allocated. And everyone knows it costs more to maintain a new solution than it does to build it. The sheer volume of marketing material extolling the virtues and massive ROIs of Big Data projects has caught the business’s attention: innovate in this arena or become irrelevant. From weekly articles in Forbes and the Wall Street Journal to the flood of vendor marketing, the space is saturated with dialog.

Note how search interest in Big Data spiked within the last year, rising from 36 to 100 on the Google Trends index. Also note that interest in Hadoop is slightly higher, yet correlates almost 1:1 with Big Data. This is no surprise, since Hadoop is often seen as the market-leading technology in Big Data.

Big Data Search Trends

Data Source: Google Trends (www.google.com/trends).

I’ve been talking with many business owners across numerous industry verticals, including telecommunications, finance, utilities and retail. It is clear that they want to learn more about existing Big Data case studies so they can shape their business strategies and fiscal-year budgets for 2014. Both CIOs and CMOs are speaking the language of the CFO, framing the top-line and bottom-line implications to get finance on board. They need fuel to build the business justification for their next-generation projects.

Meanwhile, the tools and technologies are maturing, and more SMEs are emerging in the broader market. The tools now enable more than just data scientists and top-tier developers to explore the space, which makes it more feasible to adopt these technologies with less project risk.

Tinkering to Enterprise Deployment

While Big Data has been around for some time in Silicon Valley, it is only now becoming part of more traditional businesses. In 2013, we saw many Enterprises tinkering with Big Data, looking for ways to innovate and leverage the technology. They are starting to understand their needs and will look to put these technologies into production.

With market maturity comes the requirement to integrate Big Data into the larger IT ecosystem. Security will become paramount, and Enterprises will look for ways to secure data within their Big Data hubs. Rather than specialized, one-off tools, Enterprises will demand centralized governance and control of their data pipelines for regulatory compliance and overall operational health. Most importantly, businesses will look to automate their Big Data implementations and incorporate them into larger business workflows to ensure operational health, auditing and service levels.

Meanwhile, shortcomings in Big Data platforms will be seen as showstoppers. The open source community and enterprise vendors will need to fill in the gaps: security, administrative tools, high availability, multi-tenancy and real-time operations.

From Volume to Velocity

The focus of Big Data has been the three ‘Vs’: Volume, Variety and Velocity. Many saw volume and variety as the driving forces behind Big Data, enabling them to embrace all types of structured and unstructured data through late-bound schemas and to handle volumes on the scale of, say, the Twitter firehose. But the core focus of Big Data has always been speeding up processing on large amounts of data. Hadoop achieves this by sending the processing to the data, rather than moving the data to the processing as most SOA-based applications do. The emphasis of innovation in 2014, in both open source and commercial software, will be the “Need for Speed”.
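To make “sending the processing to the data” concrete, here is a toy Python sketch of locality-aware task placement. It is not Hadoop’s actual scheduler, which also weighs rack locality, queues and other factors; the block-to-node layout and the function name `assign_tasks` are hypothetical. Each task goes to the least-loaded node that already holds a replica of its block, and falls back to a remote node only when no replica is available.

```python
def assign_tasks(block_locations, nodes):
    """Place one task per data block, preferring nodes that already store the block.

    block_locations: hypothetical map of block id -> nodes holding a replica.
    nodes: every node available to run tasks.
    Returns block id -> (chosen node, whether the read is node-local).
    """
    load = {node: 0 for node in nodes}  # tasks assigned to each node so far
    plan = {}
    for block, replicas in sorted(block_locations.items()):
        # Nodes in this cluster that can read the block from local disk.
        local = [n for n in replicas if n in load]
        # Fall back to any node (a remote read) only when no replica is local.
        candidates = local if local else nodes
        # Pick the least-loaded candidate; break ties by node name.
        node = min(candidates, key=lambda n: (load[n], n))
        load[node] += 1
        plan[block] = (node, bool(local))
    return plan


if __name__ == "__main__":
    blocks = {
        "b1": ["node1", "node2"],
        "b2": ["node2", "node3"],
        "b3": ["node1", "node3"],
        "b4": ["node4"],  # replica lives on a node outside this cluster
    }
    plan = assign_tasks(blocks, ["node1", "node2", "node3"])
    for block, (node, is_local) in plan.items():
        print(block, node, "local" if is_local else "remote")
```

With this layout, blocks b1 to b3 each run on a node that already holds their data, and only b4 requires a remote read. The point is the preference order: moving a small task to a large block is cheaper than moving the block across the network.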

Hadoop, for example, is at its core about velocity. While it may not have been real-time out of the gate, it is about reducing time to value. The Hadoop v1 architecture was built around a batch processing framework called MapReduce. Sears Holdings talks about reducing its existing ETL processing from 30 days to 1 hour. Many will argue that fast, or even faster, is not real-time. The recent general availability (GA) of Hadoop v2 will change the game: its new architecture, built on YARN, moves Hadoop from batch processing to a general-purpose processing platform. And we are seeing a pace of innovation that is unprecedented within the Apache community and beyond.
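For readers new to the model, here is a minimal in-memory sketch of MapReduce’s three stages (map, shuffle, reduce) using the classic word-count example. It is plain Python written for illustration on a single machine; it does not use Hadoop’s Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    # Map: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Big Data goes mainstream", "big data is big"]))
    # {'big': 3, 'data': 2, 'goes': 1, 'is': 1, 'mainstream': 1}
```

In a real cluster the map and reduce calls run in parallel across many nodes, and each job runs to completion over its whole input before results are available, which is why MapReduce is described as batch processing.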

All of the vendors in the Big Data ecosystem are focused on leveraging the new general processing platform or alternative real-time solutions. Cloudera provides the market-leading “real-time” SQL platform on Hadoop. Pivotal HAWQ is now competing with solutions from Hortonworks that are being incorporated into the Apache Hadoop open source ecosystem. With its recent contributions, Hortonworks has already made Apache Hive 40x faster than it was previously and is pushing toward a 100x performance improvement. Presto, just open-sourced by Facebook, also focuses on more real-time processing. And then there are the adjacent technologies. Spark is an interesting technology that banks on memory becoming more plentiful and cheaper to achieve 100x velocity gains over Hadoop MapReduce; however, unlike Hadoop, it does not yet enjoy widespread adoption.

It is hard to ignore NoSQL and columnar storage platforms when talking about Big Data. While Hadoop will continue to dominate analytics and data storage, columnar storage solutions will dominate real-time applications, which are generally web and mobile applications. Cassandra is the solution of choice at Netflix, which can’t be ignored: Netflix’s technologies are driving thought leadership across the industry and setting the precedent for many emerging internet startups. Vendors like DataStax and Acunu are focused on Enterprise offerings of Cassandra, and DataStax has some unique IP for using Cassandra and Hadoop together. DynamoDB gained ground compared to MongoDB last year, but it will be interesting to see how the $140 million that VCs recently invested in MongoDB will change the game.

In 2014, new solutions will emerge that capitalize on data sampling and incremental data processing as enabling technologies for real-time processing. Rather than analyzing all of your data, consider analyzing the right sample of data to reach a sufficiently accurate conclusion. For incremental data processing, LinkedIn just released a great technology that will help with the need for speed: DataFu lets you build dashboards quickly by incorporating data differentials rather than reprocessing all the historical data you already processed last week or last month.
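Both ideas can be sketched in a few lines of Python, under stated assumptions. Reservoir sampling (Algorithm R) draws a uniform, fixed-size sample from a stream without knowing its length in advance, and an incremental aggregation folds only new events into totals saved from an earlier run. This is not DataFu’s API (DataFu itself is a library for Pig and Hadoop); the function names and the `page` field of the event records are hypothetical.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir with the first k items
        else:
            # Keep item i with probability k / (i + 1) by replacing a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


def update_page_counts(previous_totals, new_events):
    """Fold only the new events into totals saved from an earlier run.

    previous_totals: page -> count from the last run (e.g. last week's dashboard).
    new_events: hypothetical event records, each a dict with a "page" key.
    """
    totals = Counter(previous_totals)  # copy, so the saved totals are untouched
    totals.update(event["page"] for event in new_events)
    return totals


if __name__ == "__main__":
    print(len(reservoir_sample(range(100_000), k=5, seed=42)))  # 5
    last_week = {"home": 10, "pricing": 4}
    this_week = [{"page": "home"}, {"page": "signup"}, {"page": "home"}]
    print(dict(update_page_counts(last_week, this_week)))
    # {'home': 12, 'pricing': 4, 'signup': 1}
```

The incremental approach only works for aggregates that can be merged, such as counts and sums; a median, for example, cannot be updated from the previous result alone. Likewise, conclusions drawn from a sample are estimates whose accuracy depends on the sample size.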

Market and Technology Consolidation Begins

The speed of innovation around Big Data has been unprecedented over the past five years, and there are no signs of it slowing down. As highlighted in the discussion of Velocity, there is plenty of room for new areas of innovation. As larger vendors recognize Enterprise interest and demand in the space, the need to establish leadership will drive acquisitions of smaller companies.

Clear leaders will also begin to emerge among the distribution vendors. The Hadoop ecosystem alone has numerous Enterprise vendors, and someone will need to become the “Red Hat” of Hadoop. Will it be Cloudera, Hortonworks, MapR or Intel? Cloudera today enjoys leadership through Impala, its ‘real-time’ SQL engine that is compatible with Hive. MapR is less well known; however, it has a strong relationship with Amazon and unique IP for geo-redundancy with Hadoop. Hortonworks has the best tutorials on Hadoop and often wins customer love through system usability and management. And then there are other vendors, like Intel, packaging and offering Hadoop.

The Hadoop ecosystem provides a columnar storage solution called HBase. Will it gain popularity, or will users continue with the likes of MongoDB and DynamoDB? Meanwhile, numerous storage platforms are showing up, each trying to be the jack of all trades: NuoDB, Basho, Calpont InfiniDB, Couchbase. Some will emerge and become household names, and some will fade away and be forgotten.

All of the vendors will shift their marketing focus to owning the data platform or, as Cloudera calls it, the Data Hub: one place for your single source of truth and storage, with flexibility in processing, e.g. for analytics or real-time communication.

Meanwhile, numerous visualization and analytics vendors, such as Tableau, Datameer and Platfora, have emerged and gained considerable success on the coattails of Big Data. More traditional BI solutions like Cognos, Business Objects, SAS and QlikView will begin to fight it out with these new entrants.