It was refreshing to read Leena Rao’s article in TechCrunch yesterday suggesting a New Year’s resolution to drop buzzwords like big data, cloud and pivot (at least one of these terms is already on the banned words list here at realestate.com.au!). Of the three buzzwords she mentioned, big data is probably the most overhyped right now. When it comes to data, the focus should not be on volume alone but on the insights that can be extracted.
Approaching the data space with this thought in mind, I’m seeing three categories of vendor offerings:
1. Data warehouse in the cloud (AWS Redshift, Treasure Data)
2. Enterprise-friendly Hadoop (Cloudera, Platfora)
3. Single pane of glass across multiple data sources (Datameer)
(1) is about cost reduction and cost avoidance. Many organizations spend significant capex dollars to maintain and grow their on-premise data warehouse, on top of the teams of ETL and data warehouse experts needed to run it. In the cloud, opex costs are likely to be consistent, but capex will be significantly reduced when you take away the need to buy physical hardware. AWS Redshift costs appear to be around US$1k per terabyte per annum. Compare this to US$5k+ per terabyte just for enterprise-class storage (not including software, rack space, power etc.) and you can see there is going to be huge demand for products in this space.
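To make the comparison in (1) concrete, here is a rough back-of-the-envelope sketch in Python. The per-terabyte rates are the approximate figures quoted above; the 50 TB warehouse size and the function names are hypothetical, chosen only for illustration. It also assumes the US$5k+ storage figure can be set like for like against the annual Redshift figure, which the numbers above do not confirm. Because US$5k+ covers storage only, the on-premise side should be read as a lower bound.

```python
# Approximate rates quoted in the post, in US$ per terabyte.
REDSHIFT_PER_TB = 1_000          # per annum, as quoted
ON_PREM_STORAGE_PER_TB = 5_000   # storage only ("5k+"), so a lower bound;
                                 # ASSUMPTION: treated as comparable to the annual rate


def cost(terabytes: float, rate_per_tb: float) -> float:
    """Cost of holding `terabytes` of data at a flat per-terabyte rate."""
    return terabytes * rate_per_tb


def compare(terabytes: float) -> dict:
    """Compare cloud and on-premise storage cost for a warehouse of a given size."""
    cloud = cost(terabytes, REDSHIFT_PER_TB)
    on_prem = cost(terabytes, ON_PREM_STORAGE_PER_TB)
    return {
        "cloud": cloud,
        "on_prem_storage": on_prem,
        "saving": on_prem - cloud,
        "ratio": on_prem / cloud,
    }


if __name__ == "__main__":
    # Hypothetical 50 TB warehouse, not a figure from the post.
    for name, value in compare(50).items():
        print(f"{name}: {value:,.1f}")
```

At a flat rate the ratio is the same at any size, so the interesting number is the absolute saving, which grows with every terabyte you add.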
(2) is about taking Hadoop beyond the realm of your engineering teams and putting it in the hands of the masses (think analysts, managers etc.). It’s highly likely that the analyst community in your organization does not have the skills to use Hadoop. Vendors are looking to address this with ‘business friendly’ applications that enable non-technical people to interact with Hadoop and uncover the insights hidden in the data.
(3) is about providing a unified integration and analytics layer that can query all sources of data within your organization, from structured data like MySQL and Oracle to unstructured data like Twitter and email. Many organizations suffer from data silos, and the nightmare scenario is an unpolished data diamond locked away somewhere inaccessible to the people who could exploit its true value. Hence the single pane of glass view into all sources of data. Datameer is built on Hadoop, so I could have included it in (2), but it feels different to the other players in this space and is the most intriguing.
It’s clear that we are going to see many more products hit the market from both startups and enterprise software players. The more time I spend in this space, the more I feel it’s less about technology and more about understanding the problem you are trying to solve. All the tools exist today to find any data needle in any haystack, but without the right thinking up front to define how a piece of data can be converted into business value, the technology investment will ultimately fail.