APR 1, 2005 1:00am ET

Introducing the Data Warehouse Appliance, Part 2

The following column is excerpted from the white paper, "Introducing the Data Warehouse Appliance," by William McKnight.

data warehouse appliance n., 1: a hardware/software/OS/DBMS/storage bundle designed to perform traditional and complex analysis functions using commodity components at a price/performance advantage over traditional approaches.

Datallegro and Netezza are examples of data warehouse appliance vendors. As such, they offer pre-integrated platforms, storage, relational database management systems (RDBMSs) and their own software to make it all work together according to their specifications, but that doesn't mean their configurations are identical.

Both vendors build on open source software, but they make different choices. Datallegro runs Novell's SUSE Linux and uses Ingres as its RDBMS; Netezza runs the Linux distribution provided by Red Hat and uses Postgres as its RDBMS. For the interconnect, Netezza uses Gigabit Ethernet and Datallegro uses InfiniBand.

They differ more fundamentally in their architectural approaches. Datallegro configures off-the-shelf components into dual-CPU, multi-disk "bricks" as its unit of parallelism. Datallegro says this architecture delivers balanced performance for general-purpose data warehousing (i.e., a mixed query workload) by marrying the power of dual CPUs with very high direct-attach I/O capacity. The company further claims that its data distribution significantly reduces network traffic on joins.
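The column does not say how Datallegro distributes its data. Purely as an illustration of how a distribution scheme can reduce network traffic on joins, the sketch below assumes hash distribution on the join key, a common technique in parallel databases. The node count, the table contents and the function names are all hypothetical.

```python
from collections import defaultdict
from zlib import crc32

NUM_NODES = 4  # hypothetical number of processing units


def node_for(key: str) -> int:
    """Map a join key to a node with a deterministic hash."""
    return crc32(key.encode("utf-8")) % NUM_NODES


def distribute(rows, key_index):
    """Place each row on the node chosen by hashing its join key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_index])].append(row)
    return nodes


def local_join(customers, orders):
    """Join the rows held by one node; no other node is consulted."""
    by_id = {c[0]: c for c in customers}
    return [(o[0], by_id[o[1]][1], o[2]) for o in orders if o[1] in by_id]


# Made-up sample data: (customer_id, name) and (order_id, customer_id, amount).
customers = [("c1", "Acme"), ("c2", "Globex"), ("c3", "Initech")]
orders = [("o1", "c1", 250), ("o2", "c3", 75), ("o3", "c1", 40), ("o4", "c2", 310)]

# Both tables are distributed on customer_id, so matching rows share a node.
customer_nodes = distribute(customers, key_index=0)
order_nodes = distribute(orders, key_index=1)

joined = []
for node in range(NUM_NODES):
    joined.extend(local_join(customer_nodes[node], order_nodes[node]))
```

Because both tables are hashed on the same key, every order lands on the node that already holds its customer, so each node can complete its share of the join without shipping rows across the interconnect.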

Netezza's unit of parallelism is its Snippet Processing Unit (SPU). The SPU consists of a disk drive and a special-purpose computer with hard-wired logic for accelerating record management and analysis. According to a recent Forbes article, "The chip queries the data right at the drive, passing back only the correct answers to the main computer, which runs Netezza's own database software program. The machine runs faster because fewer files are flying back and forth." (Forbes, December 13, 2004).
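The idea in the Forbes quote, filtering at the drive and returning only matching records, can be sketched in software. This is only an analogy under stated assumptions: the real SPU uses hard-wired logic, whereas here each "drive" is a Python list scanned by a worker thread, and the names `scan_slice` and `parallel_query` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_slice(rows, predicate):
    """Stand-in for one unit: filter its own rows and return only the matches."""
    return [row for row in rows if predicate(row)]


def parallel_query(slices, predicate):
    """Stand-in for the host: scan every slice in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = pool.map(lambda s: scan_slice(s, predicate), slices)
        return [row for part in partials for row in part]


# Made-up sample data: (date, region, amount), one inner list per drive.
slices = [
    [("2004-12-01", "east", 120), ("2004-12-01", "west", 900)],
    [("2004-12-02", "east", 1500), ("2004-12-02", "west", 30)],
    [("2004-12-03", "east", 2000), ("2004-12-03", "west", 45)],
]

total_rows = sum(len(s) for s in slices)
matches = parallel_query(slices, lambda row: row[2] > 1000)
```

In this toy run only two of the six stored rows travel back to the host, which is the effect the quote attributes to the architecture: less data moving between the drives and the main computer.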

The vendors also differ in their product positioning. Datallegro positions itself as a general-purpose bolt-on to terabyte-and-beyond Oracle data warehouse environments, whereas Netezza is targeting high-end enterprise data warehouse environments.

One important characteristic the data warehouse appliance vendors share is that they are taking a fresh look at an old problem. By challenging conventional price points for storing complete corporate data, and the development cycles needed to make that data accessible and bring it under management, they hope to overturn entrenched views. This is one example of the many new approaches and mind-set changes that the appliance model brings to a company deploying it.

The data warehouse appliance industry has already cleared some hurdles. Data load rates are quite impressive. Performance of selective queries, especially against large volumes of data, is distinctively impressive due to the automatic parallelism. It is difficult to validate low TCO for a mixed-workload data warehouse environment at this time, but low TCO appears to follow naturally from the appliance model.

Unproven areas include highly concurrent environments, management tools (for those times when you do need to tune the system), vendor support (although the appliances do comply with SQL, ODBC and JDBC) and named reference accounts. However, most of these are issues of maturity, not inherent flaws in the architecture.

Appliances are already solving real-world problems. One wireless carrier can now run revenue assurance analysis against 120 days of data in less than 30 minutes (versus 6 hours for a single day of data), and completes in 30 minutes a traffic pattern analysis that previously took 23 hours.
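To put the carrier's figures in perspective, the speedups can be worked out directly. This is a rough sketch: it reads "6 hours for a single day" as the previous run time, assumes that run time would have scaled linearly to 120 days, and treats "less than 30 minutes" as exactly 30 minutes, so the first result is a lower bound.

```python
def speedup(before_hours: float, after_hours: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return before_hours / after_hours


# Revenue assurance: 6 hours per day of data, scaled linearly to 120 days.
old_revenue_hours = 6 * 120
revenue_speedup = speedup(old_revenue_hours, 0.5)

# Traffic pattern analysis: 23 hours before, 30 minutes after.
traffic_speedup = speedup(23, 0.5)

print(f"Revenue assurance: at least {revenue_speedup:.0f}x faster")
print(f"Traffic pattern analysis: {traffic_speedup:.0f}x faster")
```

Under those assumptions the revenue assurance analysis is at least 1,440 times faster and the traffic pattern analysis is 46 times faster.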

If you're committed to physical data warehousing and have a terabyte-plus warehouse or plans for one, stay aware of data warehouse appliances. Will the market recognize them in time, or are they ahead of their time? Will traditional vendors such as Oracle, HP, Teradata and IBM close the gap? These questions remain to be answered.

William McKnight brings process, organizational and architectural focus to building strategies and implementing master data management and data warehousing programs that have consistently, for many years, improved the productivity and performance for his clients including several global corporate giants. McKnight, award-winning consultant and author, can be reached at McKnight Consulting Group.
