The changing economics of data warehousing hardware mean more options for end-user enterprises as the Linux and AMD64 revolution comes to data warehousing. Improving TPC-H benchmarks and IBM's Integrated Cluster Environment (ICE) exemplify the trend. Beyond the marketing pun of IBM putting the data warehouse on ICE, the metrics make an engaging case study. A careful inspection of the numbers indicates that the open source revolution is the occasion for the price reductions, not the cause. More than 90 percent of the savings is due to hardware improvements and to lower database costs that follow directly from those hardware improvements. The savings attributable to open source amount to about one percent of the total system cost. The IBM result is the first audited benchmark submitted using the SUSE Linux operating system with any standard relational database (DB2 UDB 8.1 in this case). The outcome is lower cost and higher performance, as is typical when a successful technology is commoditized. Data warehousing clients should look to open source for savings in acquisition and lifetime support costs, but they should not neglect improved hardware performance, which is currently an even greater source of savings.

Let's compare the IBM TPC-H from July 29, 2003, with that from April 9, 2002, at the 300GB volume point. Both run DB2 UDB, versions 8.1 and 7.2, respectively. The overall price of the configuration fell dramatically in the intervening 15 months, from $2,636,750 to $851,953 (by $1,784,797 or nearly 68 percent). Meanwhile, the composite power and throughput metric (QphH@300GB) remained about the same, increasing slightly from 12,995.4 to 13,194.9 (about 1.5 percent). The hardware price improvement outpaces Moore's law (which is still in force and is commonly stated as processor performance doubling every 18 months): the IBM eServer with 2GHz AMD chips from the July 2003 submission cost $112,935, compared with $777,812 for the ProLiant with 900MHz chips from the April 2002 report. That is an 85 percent reduction in the price of the hardware in 15 months (see Figure 1).
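The percentages in this comparison can be rechecked with a few lines of arithmetic. The sketch below uses only the figures quoted above; the helper name pct_change and the dollars-per-QphH ratio are illustrative choices made here, and the ratio is computed from the quoted numbers rather than taken from the audited reports, so it may differ from the officially published price/performance values.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a decrease)."""
    return (new - old) / old * 100.0

# Figures quoted in the text, ordered (April 2002, July 2003).
total_price = (2_636_750, 851_953)   # total configuration price, USD
qphh = (12_995.4, 13_194.9)          # QphH@300GB composite metric
server_price = (777_812, 112_935)    # server hardware price, USD

price_change = pct_change(*total_price)   # overall configuration price
perf_change = pct_change(*qphh)           # composite performance metric
server_change = pct_change(*server_price) # server hardware price

# Dollars per unit of QphH, derived here from the quoted numbers only.
usd_per_qphh_2002 = total_price[0] / qphh[0]
usd_per_qphh_2003 = total_price[1] / qphh[1]

print(f"Total price change:  {price_change:.1f}%")
print(f"QphH change:         {perf_change:.1f}%")
print(f"Server price change: {server_change:.1f}%")
print(f"USD per QphH: {usd_per_qphh_2002:.2f} -> {usd_per_qphh_2003:.2f}")
```

Because performance is nearly flat while the price falls by about two-thirds, the cost per unit of throughput falls by roughly the same proportion as the total price.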

Figure 1: Year-to-Year IBM TPC-H Improvement

The increased hardware power results in a reduction of the number of processors from 64 in the April 2002 benchmark to 16 processors in the July 2003 benchmark. Given a clustering configuration where the database is priced by processor, this results in a reduction of the cost of the database software from $1,417,792 for the April 2002 benchmark to $425,504 (including the DPF license) for the July 2003 benchmark, a savings of $992,288 (with DPF).

Taking the quoted figures together, the server hardware and database account for $1,657,165 of the $1,784,797 total savings ($664,877 and $992,288, respectively), or roughly 93 percent; the database is approximately 56 percent of the total savings and the hardware approximately 37 percent. However, note that the database savings is directly determined by the hardware savings. In addition, the price of the database per instance decreased approximately 10 percent year to year. While every penny is significant, the latter is not a major factor here and, along with a modest improvement in the cost of disk, accounts for the remaining few percent not included in the bottom row of the table in Figure 1.

In the 300GB benchmarks, the operating system cost $38,384 for Microsoft Windows 2000 Advanced Server versus $2,588 for SUSE Linux. That is a dramatic savings of $35,796, or approximately 93 percent, for Linux over Microsoft. However, this savings comes off a very modest base: it is only approximately 4 percent of the total system cost of the July 29, 2003, benchmark and only about one percent of the April 2002 submission. Therefore, the open source operating system is only a small part of the overall dynamic here. While every dollar counts, there are simply many more of them in the hardware and database.
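As a cross-check, the contribution of each component to the overall price reduction can be recomputed from the prices quoted in this article. This is a minimal sketch, not a reproduction of the table in Figure 1: the component labels and the "unattributed remainder" row are assumptions introduced here, and the quoted server prices may not cover every hardware item (disk, for example) in the audited configurations.

```python
# USD prices quoted in the article, ordered (April 2002, July 2003).
total = (2_636_750, 851_953)
components = {
    "server hardware": (777_812, 112_935),
    "database software": (1_417_792, 425_504),
    "operating system": (38_384, 2_588),
}

total_savings = total[0] - total[1]

def savings_breakdown(parts, overall):
    """Map each component to (dollars saved, percent of overall savings)."""
    rows = {}
    for name, (old, new) in parts.items():
        saved = old - new
        rows[name] = (saved, saved / overall * 100.0)
    return rows

breakdown = savings_breakdown(components, total_savings)

# Whatever the three quoted components do not explain (assumed label).
accounted = sum(saved for saved, _ in breakdown.values())
remainder = total_savings - accounted
breakdown["unattributed remainder"] = (remainder, remainder / total_savings * 100.0)

# Operating system savings relative to each total system price.
os_saved = breakdown["operating system"][0]
os_share_of_2003_system = os_saved / total[1] * 100.0
os_share_of_2002_system = os_saved / total[0] * 100.0

for name, (saved, share) in breakdown.items():
    print(f"{name:24s} ${saved:>10,} {share:5.1f}% of total savings")
```

Laying the components side by side makes the scale difference visible: the operating system line is dwarfed by the hardware and database lines, whichever total is used as the base.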

As indicated, the cost of the processor hardware and the implied savings from the reduced number of database instances are responsible for the lion's share of the savings. It is quite likely that dramatic savings would still be available even if open source were not part of the equation. Open source has numerous benefits, including breaking the hold of technology lock-in and reducing acquisition and lifetime support costs. However, clients should also attend to the natural trajectory of technology innovation: improved hardware performance is a substantial source of savings in its own right.
