The data warehousing appliance has been compared to a black swan.1 For those who may be unaware of it, there really are black swans, including some in Australia discovered by Captain Cook (1728-1779). This creates a powerful metaphor about technology innovation (as well as about uncovering one's blind spots). Unfortunately, some have jumped to unwarranted conclusions about the technology innovations, indulged in speculative hyperbole about a particular vendor where there is really less than meets the eye, and misunderstood the market dynamics around the data warehousing appliance. Having said that, the metaphor is on target and offers a bold statement of the obvious, namely, that the data warehousing appliance and appliance-like systems are disruptive innovations about which we will be hearing more, much more.
A black swan event is something highly improbable that therefore disrupts our existing view of the world. The black swan stands as a stern warning to all those who imagined that "All swans are white." Those who took this for a firm rule of science were proven wrong by the experience of Captain Cook. Granted, he had to go to the South Pacific to find a counterexample, but it exists. Beware of hasty generalization, as with the comparison of the black swan with the data warehousing appliance.
First, the ongoing drumbeat from appliance vendors promising order-of-magnitude performance improvements must be examined carefully. In some cases, it should be challenged. Field programmable gate array (FPGA) manufacturers have never been able to keep up with the commodity CPU companies. While performance on FPGAs is supposed to improve by a factor of five, this is a tad underwhelming when one realizes that the roadmap for commodity chips calls for a factor of 10 performance improvement within the same time frame.
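The arithmetic behind that comparison can be made explicit. The following is a minimal sketch, not a forecast: the factors of five and 10 come from the paragraph above, while the starting advantage of 10x and the function name are purely hypothetical, chosen only to show how the gap would narrow.

```python
def relative_advantage(initial_advantage: float,
                       fpga_gain: float = 5.0,
                       cpu_gain: float = 10.0) -> float:
    """Speed of an FPGA-based system relative to a commodity-CPU system
    after each has improved by its own roadmap factor."""
    return initial_advantage * fpga_gain / cpu_gain


# Hypothetical starting point: the FPGA-based system is 10x faster today.
print(relative_advantage(10.0))  # 5.0
```

Under these assumed figures, whatever lead the FPGA-based system starts with is cut in half over the period, because the commodity chips improve twice as much.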
Meanwhile, appliances have made a career out of replacing underpowered systems on legacy hardware. A common tactic is to request the prospect's "query from hell" and run it on a new, standalone appliance processor that is otherwise unburdened with work. Naturally, performance is much improved over the legacy production environment. However, is it the real world? When customers conduct a benchmark with complex, mixed workloads and multiple concurrent queries, including updates, the results are much closer, with the appliance clocking in at a virtual statistical tie with the standard relational database on comparably powerful hardware.
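To make the contrast with the single-query demonstration concrete, here is a minimal sketch of a mixed-workload harness. It is an illustration under stated assumptions, not any vendor's or customer's benchmark: the function name run_mixed_workload is hypothetical, and the sleep calls are placeholders standing in for real scans, lookups and updates that would be issued against the systems under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def run_mixed_workload(workload, concurrency=8, rounds=5):
    """Run each (name, callable) pair `rounds` times across `concurrency`
    worker threads and return the median latency in seconds per name."""
    def timed(job):
        name, fn = job
        start = time.perf_counter()
        fn()
        return name, time.perf_counter() - start

    # Interleave the task types so they compete for workers at the same time.
    jobs = [job for _ in range(rounds) for job in workload]
    latencies = {name: [] for name, _ in workload}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for name, elapsed in pool.map(timed, jobs):
            latencies[name].append(elapsed)
    return {name: median(values) for name, values in latencies.items()}


# Placeholder tasks (assumptions): replace with real queries and updates.
workload = [
    ("scan_query", lambda: time.sleep(0.02)),
    ("lookup_query", lambda: time.sleep(0.005)),
    ("update", lambda: time.sleep(0.01)),
]

results = run_mixed_workload(workload, concurrency=4, rounds=3)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.1f} ms median")
```

Running the same workload definition against both the appliance and a comparably configured relational database, at the same concurrency, is what turns a demonstration into a comparison.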
In short, the appliance's claim of an order-of-magnitude performance improvement simply does not hold up. Advances in hardware will not benefit the data warehousing appliance more than its competitors. Acquiring indexing, aggregates and user-defined functions (UDFs) will not be an advantage for the appliance architecture any more than for one based on a shared-nothing architecture such as the standard relational databases. As Ralph Kimball observed years ago, one way to get an order-of-magnitude (x10) performance improvement is a well-designed aggregate. However, that is not an innovation owed to the appliance trend. If anyone is playing catch-up with respect to indexes, aggregates and tuning optimizations, it is the startup data warehousing appliance vendors, who, in turn, can be expected to struggle to integrate features such as indexes, UDFs and aggregates that have been common for years in the standard relational databases.
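For readers who have not built one, the aggregate mentioned above is simply a pre-computed summary table that answers a query from far fewer rows than the detail table. The sketch below uses Python's built-in sqlite3 module; the sales and sales_by_day tables and their synthetic contents are hypothetical, invented only for illustration, and the row counts show the reduction in data scanned rather than a measured speedup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detail table: 30 days x 10 stores x 50 sales each.
conn.execute("CREATE TABLE sales (day INTEGER, store INTEGER, amount INTEGER)")
rows = [(day, store, (day * 7 + store * 3 + i) % 100)
        for day in range(30) for store in range(10) for i in range(50)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# The aggregate: one pre-computed row per day instead of 500 detail rows.
conn.execute("""
    CREATE TABLE sales_by_day AS
    SELECT day, SUM(amount) AS total_amount, COUNT(*) AS sale_count
    FROM sales
    GROUP BY day
""")

# The same business question answered from the detail and from the aggregate.
detail = conn.execute(
    "SELECT day, SUM(amount) FROM sales GROUP BY day ORDER BY day").fetchall()
summary = conn.execute(
    "SELECT day, total_amount FROM sales_by_day ORDER BY day").fetchall()

detail_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
summary_rows = conn.execute("SELECT COUNT(*) FROM sales_by_day").fetchone()[0]

print(detail == summary)          # True
print(detail_rows, summary_rows)  # 15000 30
```

Nothing in this technique depends on appliance hardware; it works the same way on any relational database, which is the point of the paragraph above.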
Prospects should be encouraged to request full disclosure of the "before" and "after" system configurations in the data warehousing appliance pitch. Some appliance vendors have acknowledged that the "before" system in the before/after comparison is a collection (pastiche) of the best of several different systems, no one of which ever displayed all the results together. In short, there was no "before" system. This can be surfaced by asking a few direct, professional questions, such as: How many users were on the system? Current users? Hardware? Databases? How was the system instrumented? That is, how would you know the difference? What was the exact "before" system configuration over which there was a 10 times improvement? Caution - some of the statements go beyond hasty generalization and stretch the truth into the "pants on fire" category. Often the data warehousing appliance vendor will "have to get back to you on that," in which case one may well ask, "What are the facts?" The facts are that the data warehousing appliance offers less than meets the eye.
The Data Warehousing Appliance is a Pioneer
The black swan should not be confused with a first mover advantage, though some commentators have implied that it offers such an advantage. According to one account, by the time the standard database vendors catch up with where the data warehousing appliance is today, the latter will have moved on. As with Achilles and the tortoise, the pursuer can never catch up, since the one with the head start, no matter how big or small, will inevitably have advanced even further by the time the follower covers half the distance separating them. And since space and time are infinitely divisible, the pursuer can never pass the leader! While this is a nice use of Zeno's paradox, leaders get passed all the time, nor is the implication accurate that the data warehousing appliance vendor is the leader (except perhaps in its own PowerPoint presentations).
At this point, the metaphors come swiftly - we know from the race between the tortoise and the hare that a quick start is good, but it is no guarantee of ultimate success. Granted that it is hard to hit a moving target, the data warehousing appliance providers are not running a linear race, but leap-frogging one another, as is frequently the case in technology innovation and competition. Yes, there is a first mover advantage in many markets, such as consumer goods and business-to-consumer (B2C) markets; but in IT, we often find that the pioneers are the ones with the arrows in their backs. The follow-on innovators often benefit from the lessons (and missteps) of the startups, especially if the followers are reasonably prompt in responding.
Certainly the data warehousing appliance offers credible claims to deliver improved performance through massively parallel processing and hardware assists such as field programmable gate arrays, along with ease of installation and operation. The issue arises when some appliance vendors claim that the performance improvement can be traded for, and used to make, general product improvements. The suggestion is that all that horsepower can be put to other uses in addition to executing scanning queries in parallel, admittedly a strong suit of the appliance. Many of the claims made are simply non sequiturs - the advantages proposed simply don't logically follow from what has gone before. Possibly an infatuation with the technologically new has created a blind spot or two. Even if we grant, for the sake of discussion, the data warehousing appliance claims about orders-of-magnitude improvement exceeding those due to Moore's law and about bringing indexing and aggregations on stream, the claims do not hold up under closer examination.
Findings and Recommendations
Avoid data warehousing religious wars. Whether one favors a centralized or federated architecture, the recommendation remains - avoid data warehousing religious wars. Once again, innovation happens.
Enterprises have seen the future of data management. It requires simplification, high performance and business value-add. In spite of the alignment of the data warehousing appliance with these trends, the metaphor of the black swan contains a serious warning to all providers of appliances and appliance-like systems. Those who believed "All swans are white" ended up like those who believed "one size fits all." It simply isn't so. After preparing the market, the early appliance providers will be marginalized as the bigger players, with their sophisticated laboratory systems, support networks and proven ability to bring innovations to market as shipping products, enter the competitive arena. While the future is uncertain by definition, most enterprises will reduce their business risk by going with the second wave of appliances from the larger, established providers.
The beginning of the end and the end of the beginning. For data warehousing appliances, this is the beginning of the end in one way and, in another way, the end of the beginning. First, it is the end of the beginning: The emergence of the data warehousing appliance has been validated by the market, even with the public filing by the original data warehousing appliance vendor indicating it has never had a profitable quarter.2 Nevertheless, the first mover startups have primed the pump, gotten traction, turned some prospects into customers and demonstrated that the idea of an open, commodity-based appliance is capable of changing the economics of data warehousing in favor of cost-sensitive buyers. And it has done so in favor of buyers across all platforms, even those that are proprietary. The end of the beginning means the start of the mainstream, middle market, where the technology breaks out into the general purpose data management market. At the same time, it is the beginning of the end for special purpose, proprietary data warehousing systems, which, henceforth, are being renamed "legacy appliances." Thanks to database innovations in standard relational technology, going forward, enterprises will need only one kind of database to perform both transactional and business intelligence processes, though it will be common to continue to implement separate instances for reasons of operational efficiency.
Black swan or duckbilled platypus? In conclusion, the comparison of the data warehousing appliance with a black swan is thought-provoking. Not all swans are white, nor does one size fit all. Let's agree to disagree on how far to extend the validity of the metaphor and leave it an open question - which data warehouse appliance is the swan, regardless of color, which the peacock, which the elephant and which the duckbilled platypus? Each is perfectly adapted to its environment. Which one will best survive when that environment inevitably changes?
1. Howard, Philip. "Netezza: a Black Swan," Bloor Research. October 7, 2007. http://www.it-analysis.com/technology/applications/content.php?cid=9874.
2. See http://www.secinfo.com/dsvRx.u1xp.htm: S-1 Filing, March 22, 2007, page 7.