The following column is excerpted from the white paper, "Introducing the Data Warehouse Appliance," by William McKnight.

data warehouse appliance n., 1: a hardware/software/OS/DBMS/storage bundle designed to perform traditional and complex analysis functions using commodity components at a price/performance advantage over traditional approaches.

Datallegro and Netezza are examples of data warehouse appliance vendors. Each offers a pre-integrated bundle of platform, storage, relational database management system (RDBMS) and its own software to make the pieces work together to its specifications. That does not mean their configurations are identical.

Their software stacks differ. Datallegro runs Novell's SUSE Linux as its open source OS and Ingres as its open source RDBMS. Netezza also uses open source Linux, but the Red Hat distribution, with Postgres as its open source RDBMS. The interconnects differ as well: Netezza uses Gigabit Ethernet, and Datallegro uses InfiniBand.

The more fundamental difference is in their architectural approaches. Datallegro configures off-the-shelf components into dual-CPU, multi-disk "bricks" as its unit of parallelism. Datallegro says this architecture delivers balanced performance for general-purpose data warehousing (i.e., a mixed query workload) by pairing the power of dual CPUs with very high direct-attached I/O capacity. The company further claims that its data distribution significantly reduces network traffic on joins.
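The paper does not say how Datallegro's data distribution cuts join traffic. A common technique in shared-nothing systems, assumed here purely for illustration, is to hash-distribute both tables on the join key so that matching rows land on the same node and each node joins locally. The sketch below is not Datallegro's implementation; `NUM_BRICKS`, `brick_for`, `colocated_join` and the sample tables are all hypothetical.

```python
from collections import defaultdict
from zlib import crc32

NUM_BRICKS = 4  # hypothetical number of nodes ("bricks")


def brick_for(key):
    """Pick a brick by hashing the distribution key (crc32 is stable across runs)."""
    return crc32(str(key).encode()) % NUM_BRICKS


def distribute(rows, key):
    """Partition rows across bricks on the given column."""
    bricks = defaultdict(list)
    for row in rows:
        bricks[brick_for(row[key])].append(row)
    return bricks


def local_join(left, right, key):
    """Hash join performed entirely within one brick."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def colocated_join(left_rows, right_rows, key):
    """Join two tables distributed on the same key: matching rows share a brick,
    so no rows need to move between bricks during the join."""
    left_parts = distribute(left_rows, key)
    right_parts = distribute(right_rows, key)
    result = []
    for b in range(NUM_BRICKS):
        result.extend(local_join(left_parts.get(b, []), right_parts.get(b, []), key))
    return result


# Usage with hypothetical data: every call row matches exactly one customer.
customers = [{"cust_id": i, "name": f"c{i}"} for i in range(10)]
calls = [{"cust_id": i % 10, "minutes": i} for i in range(30)]
joined = colocated_join(calls, customers, "cust_id")
print(len(joined))  # 30
```

The design choice is that the distribution key is also the join key; if the tables were distributed on different columns, one side would have to be redistributed or broadcast across the interconnect before joining.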

Netezza's unit of parallelism is its Snippet Processing Unit (SPU). The SPU consists of a disk drive and a special-purpose computer with hard-wired logic for accelerating record management and analysis. According to a recent Forbes article, "The chip queries the data right at the drive, passing back only the correct answers to the main computer, which runs Netezza's own database software program. The machine runs faster because fewer files are flying back and forth." (Forbes, December 13, 2004).
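The idea in the Forbes quote, filtering next to the disk so only qualifying rows travel to the host, can be sketched in software. Netezza does this in hard-wired logic; the `SnippetUnit` class and query functions below are hypothetical stand-ins that only compare how many rows cross the interconnect with and without pushing the filter down to each unit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class SnippetUnit:
    """Hypothetical stand-in for one disk plus its local filtering logic."""
    rows: List[Row] = field(default_factory=list)

    def scan(self, predicate: Callable[[Row], bool], columns: List[str]) -> List[Row]:
        # Filter and project next to the data; only matching, trimmed rows leave the unit.
        return [{c: r[c] for c in columns} for r in self.rows if predicate(r)]


def query_with_pushdown(units: List[SnippetUnit], predicate, columns) -> List[Row]:
    """Host gathers only the qualifying rows from every unit."""
    results: List[Row] = []
    for u in units:
        results.extend(u.scan(predicate, columns))
    return results


def query_without_pushdown(units: List[SnippetUnit], predicate, columns) -> Tuple[List[Row], int]:
    """Host pulls every full row across the interconnect, then filters.
    Returns the result and the number of rows shipped."""
    shipped = [r for u in units for r in u.rows]
    result = [{c: r[c] for c in columns} for r in shipped if predicate(r)]
    return result, len(shipped)


def is_east(row: Row) -> bool:
    return row["region"] == "east"


# Usage with hypothetical data: 8 units of 1,000 rows, one row in four is "east".
units = []
for u in range(8):
    rows = [{"id": i, "region": "east" if i % 4 == 0 else "west", "amount": i}
            for i in range(u * 1000, (u + 1) * 1000)]
    units.append(SnippetUnit(rows))

fast = query_with_pushdown(units, is_east, ["id", "amount"])
slow, shipped = query_without_pushdown(units, is_east, ["id", "amount"])
print(len(fast), shipped)  # 2000 8000
```

Both paths return the same answer; the difference is that the pushdown path moves 2,000 trimmed rows to the host instead of 8,000 full ones.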

The vendors also differ in their product positioning. Datallegro positions itself as a general-purpose bolt-on to terabyte-and-beyond Oracle data warehouse environments, whereas Netezza targets high-end enterprise data warehouse environments.

An important characteristic of the data warehouse appliance market is that it takes a fresh look at an old problem. By challenging conventional price points for storing complete corporate data, and the development cycles needed to make that data accessible and managed, appliance vendors hope to overturn entrenched views. This is one of many new approaches and mind-set changes that the appliance model brings to a company deploying it.

The data warehouse appliance industry has already cleared some hurdles. Data load rates are quite impressive. Performance of selective queries, especially against large volumes of data, is particularly strong thanks to automatic parallelism. It is difficult to validate low TCO for a mixed-workload data warehouse environment at this time, but low TCO appears to follow naturally from the appliance model.

Unproven areas include highly concurrent environments, management tools (for those times when you do need to tune the system), vendor support (although the products do support the SQL, ODBC and JDBC standards) and named reference accounts. However, most of these are issues of maturity, not inherent flaws in the architecture.

Appliances are already solving real-world problems. One wireless carrier can now run revenue assurance analysis across 120 days of data in less than 30 minutes (versus 6 hours previously for a single day of data), and it completes in 30 minutes a traffic pattern analysis that previously took 23 hours.
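To put the carrier's figures in proportion, here is simple arithmetic on the numbers quoted, expressed as time per day of data for revenue assurance (a rough comparison that assumes the old per-day figure is representative):

```latex
\begin{aligned}
\text{Revenue assurance (per day of data):}\quad
  &\frac{6\,\text{h}}{30\,\text{min}/120}
   = \frac{360\,\text{min}}{0.25\,\text{min}} = 1440\times\ \text{(at least)}\\
\text{Traffic pattern analysis:}\quad
  &\frac{23\,\text{h}}{30\,\text{min}}
   = \frac{1380\,\text{min}}{30\,\text{min}} = 46\times
\end{aligned}
```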

If you're committed to physical data warehousing and have a terabyte-plus warehouse, or designs for one, stay aware of data warehouse appliances. Will the market recognize them in time, or are they ahead of their time? Will traditional vendors such as Oracle, HP, Teradata and IBM close the gap? These questions remain to be answered.
