There's an interesting thread on LinkedIn's TDWI Business Intelligence and Data Warehousing Group I participate in. The point of departure for the discussion is a venting blog by venerable Bill Inmon, often acknowledged as the father of data warehousing, over a marketing ad by Hadoop distribution vendor Cloudera that claims: “CLOUDERA-BIG DATA Turbocharge your data warehouse”.
Inmon's diatribe takes Cloudera to task for even mentioning big data and the data warehouse in the same breath, the ad's imputed implication being there's a strong affinity between the two, that in fact DW and BD are companion endeavors. Inmon, though, scoffs that BD's simply a technology while the DW's an architecture -- honed over 25 years. “Big Data is good at gobbling up large amounts of data. But analyzing the data, using the data for integrated reporting, and trusting the data as a basis for compliance is simply not in the cards. There simply is not the carefully constructed and carefully maintained infrastructure surrounding Big Data that there is for the data warehouse.”
Some in the group agree with Inmon, piling on to Cloudera's seemingly brazen claims. My take, though, is much more in line with Barry Devlin, who first chides Inmon that Hadoop, not big data, is rightfully the technology, and then more critically opines: “What is important architecturally is NOT to attack one or other set of vendors. What is important is NOT to mindlessly defend the data warehouse approach. What we need to do is to look how the data warehouse must evolve and how big data technologies and BI technologies interact to create a new biz-tech ecosystem where all information is maintained according to its value and its meaning.”
I've been a Bill Inmon fan for years, my BI/analytics consultancies always adopting a hybrid Inmon-Kimball approach to implementing DW/BI. He was even on the board of directors of a company I worked for in the late 90's.
Inmon, though, appears to have “dug in” on the DW-BD controversy. From my vantage point, his essay would have been much more effective had “not” been softened to “not yet”. To suggest that big data cannot encroach on traditional data warehousing is quite naïve, just as naïve as the claim by network and hierarchical database vendors 30 years ago that new relational technologies could never meet business processing needs the way the entrenched products did. What do you hear about 1985 database stalwarts IMS and IDMS today?
Indeed, if you think big data tech's role is consigned to be outside the data warehouse, just ask Facebook, Yahoo and Netflix, each of whom presented compelling illustrations of replacing traditional “intelligence databases” with low-latency HDFS SQL query platforms for the masses at Strata 2014. While one could certainly make the case that these data stores are different from the pristine 2014 DW, it'd be foolish to deny any similarity and ignore that change is on the way.
True, big data technologies are only now advancing to a second generation and cannot yet offer the trust, metadata management, security, etc. that are part and parcel of the DW. So rather than calling these new structures data warehouses, how about, as Devlin suggests, anointing a new term that acknowledges their hybrid nature?
Also true that bellwether technology/data companies from Silicon Valley are generally early adopters. Mainstream companies won't be nearly as eager to change their intelligence platforms at the first blush of technology success. But I bet that'll come in relatively short order, especially if the trajectory of progress continues as it is today. The lower cost and better performance of big data technologies will indeed present formidable business cases.
Big data and the data warehouse serve different masters. DW has historically revolved around performance management, while BD obsesses over analytical products for data-driven business. On the tech side, DW/BI has generally been served by dominant vendors like IBM, Oracle, Microsoft, SAP, et al., whereas big data has gravitated to open source.
Perhaps as much as anything, though, there's a generational divide between the big data and data warehouse camps. If TDWI is very much the incumbent data warehouse organization, then Strata represents the big data upstarts. And if my participation in TDWI and Strata conferences is any guide, the average “Stratan” is half a generation younger than her kin “TDWIn”. With age comes wisdom, along with perhaps a tad of complacency.
BD types often deride data warehousing as the ponderous intelligence incumbent, its ranks bloated, its timelines padded, its delivery often underwhelming. DW'rs, on the other hand, paint big data as the cowboy cotillion, much more aligned with the excitement of discovery than the tedium of production.
Ever the conciliator, I see important contributions from both sides. I also see the data warehousing landscape changing for the better over the next five years as big data matures and comes of age. Both DW and BD are poised to be beneficiaries.