Data warehousing clients with multiterabyte, high-end decision support requirements and a need for superior price/performance across large numbers of users will continue to find massively parallel processing (MPP) an operational necessity. Although symmetric multiprocessing (SMP) is raising the price/performance bar and pushing the crossover point upward, SMP and MPP will coexist in a gray transition area at the high end, and clients will benefit from the choices and competition. Many more data warehousing clients will be able to use standard SMP approaches than was previously the case, but those with multiterabyte volumes combined with high-performance requirements (active data warehousing) will still require a special-purpose data warehouse server.
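The idea of a crossover point can be made concrete with a toy model. The sketch below is illustrative only and rests on assumptions that are not in the article: SMP throughput is modeled as losing a fixed fraction of work to coordination (for example, cache synchronization) for every additional processor, while MPP throughput is modeled as linear in the number of nodes but discounted by a constant efficiency factor standing in for data movement through the interconnect. The function names and the parameter values (`coordination_cost`, `node_efficiency`) are hypothetical, not measurements of any real system.

```python
def smp_throughput(n, coordination_cost):
    """Relative throughput of n SMP processors (1.0 = one ideal processor).

    Assumed model: every additional processor adds a fixed coordination
    overhead, so each new processor contributes less than a full unit of work.
    """
    return n / (1 + coordination_cost * (n - 1))


def mpp_throughput(n, node_efficiency):
    """Relative throughput of n MPP nodes.

    Assumed model: throughput grows linearly with node count, but each node
    delivers only `node_efficiency` of its capacity because of data movement.
    """
    return node_efficiency * n


def crossover_point(coordination_cost, node_efficiency, max_n=1024):
    """Smallest processor count at which the MPP model outperforms the SMP model."""
    for n in range(2, max_n + 1):
        if mpp_throughput(n, node_efficiency) > smp_throughput(n, coordination_cost):
            return n
    return None  # no crossover within max_n processors


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the shape of the trade-off.
    print(crossover_point(coordination_cost=0.02, node_efficiency=0.75))  # 18
    # Halving the SMP coordination cost pushes the crossover point upward.
    print(crossover_point(coordination_cost=0.01, node_efficiency=0.75))  # 35
```

In this model, lowering the SMP coordination cost moves the crossover to a higher processor count, which is one way to read the claim that SMP improvements push the crossover point upward without eliminating it.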

Innovations in memory-to-CPU interconnection, such as crossbar switching, have reduced memory contention in SMP designs and have reduced the coordination costs of the hybrid clustered-SMP approach. However, slope-of-one (linear) scalability across hundreds of processors still requires an MPP database. The central trade-off between single-image (SMP) and parallel processing (MPP) data warehousing servers is between ease of administration and scalability. Eventually, the coordination costs of refreshing and synchronizing cache render the scalability slope of SMP less than one: adding another processor does not produce a full processor's quota of work and throughput. A similar dynamic can affect shared-disk clusters where, absent an abstraction layer to map disks to nodes, a global lock or database synchronization mechanism is needed to preserve data integrity.

In contrast, while MPP scales linearly to hundreds of nodes, troubleshooting so many processors can be an issue for administrators trained in the SMP world, and more moving parts reduce the mean time between failures (MTBF). Advocates of SMP versus MPP (e.g., HP, Sun, IBM versus Teradata, IBM) argue that the performance cost of data movement through the high-speed switch (a defining characteristic of clustered hardware and MPP databases) can be significant, and that data placement remains a critical success factor. This is true, but it is the required trade-off for high-performance results given complex queries against large data volumes. Further trade-offs include:
