The end of the one-size-fits-all database is nigh. Today's 30-year-old online transaction processing (OLTP) databases weren't designed to handle ad hoc queries against terabytes of data. As a result, companies face conflicting demands: their databases are growing rapidly, and so is their need to run complex ad hoc queries on that data. Companies that spend millions of dollars on stopgap measures typically see limited benefits and ultimately have to rethink their strategies if they want to compete effectively.
This article presents an overview of the evolution of relational database management systems (RDBMSs) and explains how the gap between data warehouse requirements and data warehouse database capabilities came about. I will also explain seven database innovations that will revolutionize data warehousing. New developments such as a column-oriented architecture, aggressive data compression and the ability for RDBMSs to run natively on large grids/clusters of industry-standard servers make today's business queries, regardless of complexity or level of detail, fast, efficient and affordable. With a rearchitected database management system (DBMS) featuring these innovations, business data analysis becomes a competitive advantage rather than an obstacle. CIOs will be able to analyze more data faster so their organizations can succeed.
RDBMS Comes of Age
Relational databases emerged in the 1970s as a way to improve business data processing. The idea was that storing data in row-oriented tables that group related information was better than using hierarchical data models, which use parent/child relationships to store data in tree-like structures. Relational databases such as Ingres were much better at letting users store and access information involving one-to-many relationships than hierarchies were, because in a hierarchy every child can have only one parent.
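The single-parent limitation described above can be sketched in a few lines of Python. This is only an illustration: the supplier and part names, and the helper functions `suppliers_of` and `parts_of`, are hypothetical and are not taken from Ingres or any other product.

```python
# Hierarchical model: each child is nested under exactly one parent.
# Recording that "bolt" also comes from Supplier B would mean duplicating it.
hierarchy = {
    "Supplier A": ["bolt", "nut"],
    "Supplier B": ["washer"],
}

# Relational model: the relationship is a flat table of (supplier, part) rows,
# so the same part can be related to any number of suppliers.
supplies = [
    ("Supplier A", "bolt"),
    ("Supplier A", "nut"),
    ("Supplier B", "washer"),
    ("Supplier B", "bolt"),  # second "parent" for bolt, stored as one more row
]


def suppliers_of(part):
    """Return every supplier related to a part, in sorted order."""
    return sorted(s for s, p in supplies if p == part)


def parts_of(supplier):
    """Return every part related to a supplier, in sorted order."""
    return sorted(p for s, p in supplies if s == supplier)
```

Because the relationship lives in its own table, the same rows answer questions in either direction (parts per supplier, or suppliers per part) without restructuring the data.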
Ingres, Postgres and other early-stage RDBMS technologies shared a common architecture that became the blueprint for most RDBMS products still in use today. These row-oriented architectures are optimized for write-intensive OLTP, where the typical operation stores or retrieves a single record. Because a row-oriented layout keeps each complete record in one place, records are easy to find and data updates are streamlined.
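A minimal Python sketch can illustrate why a row-oriented layout suits this kind of workload. The table contents, the `pk_index` structure and the function names are assumptions made up for the example, not the internals of any particular RDBMS.

```python
# Hypothetical order table: every field of a record is stored together.
rows = [
    {"order_id": 1, "customer": "Acme", "region": "East", "amount": 120.0},
    {"order_id": 2, "customer": "Birch", "region": "West", "amount": 75.5},
    {"order_id": 3, "customer": "Acme", "region": "East", "amount": 42.0},
]

# Primary-key index: order_id -> position of the row in storage.
pk_index = {row["order_id"]: pos for pos, row in enumerate(rows)}


def fetch_order(order_id):
    """OLTP-style read: one index probe returns the complete record."""
    return rows[pk_index[order_id]]


def update_amount(order_id, new_amount):
    """OLTP-style write: the record lives in one place, so one row changes."""
    rows[pk_index[order_id]]["amount"] = new_amount


def total_amount():
    """Analytical read: only one field is needed, but every full row is visited."""
    fields_touched = 0
    total = 0.0
    for row in rows:
        fields_touched += len(row)  # a row store hands back all fields of the row
        total += row["amount"]
    return total, fields_touched
```

In this sketch, `fetch_order` and `update_amount` each touch a single row, while `total_amount` reads every field of every row to sum one column. That extra work is the cost a row layout imposes on read-intensive analytical queries.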
While these RDBMS platforms made it easier and more efficient to store and retrieve data at the time, they were not designed to handle read-intensive analytical workloads. As demand increased for more complex business data analytics, organizations had to work harder to get the performance they needed from their queries, leading to rising hardware and operating costs.
Today, this means that many organizations are not achieving all of their business goals because their databases are too big to support their increasing need for complex, real-time data analysis. This problem will only worsen with time, as half of all data warehouses will exceed 10 terabytes of data by 2010, according to The Data Warehousing Institute [1]. Moreover, query complexity often increases as the volume of data rises.
So many organizations are struggling with this problem because, until recently, there had been little innovation in the 30-year-old DBMSs these companies run to match the rapid growth in database size. Granted, large RDBMS vendors developed some enhancements to improve overall data warehouse query performance by adding features such as bitmap indexes, online analytical processing (OLAP) cubes, materialized views, index-only tables and join indexes. They've also extended their RDBMSs to support shared disk systems (disk clusters), shared-nothing systems (blades) or both, as well as SQL or XQuery on either tables or data represented in XML schema.
These improvements were made in large part to ensure that a single code line would continue to meet all DBMS needs. Vendors created a template for the one-size-fits-all database that they try to force-fit into a broad range of markets. They do this because adapting databases to different business markets would require writing other code lines, which is astronomically expensive, could take years to develop and would take more time and money to train sales staff to sell.
Organizations in data-heavy industries using existing DBMS technologies have faced some unpleasant choices. Typical solutions involve hiring additional database administrators (DBAs), creating and maintaining OLAP cubes (which is time-consuming and costly, and slows down load performance) or replacing legacy DBMSs with expensive, proprietary data warehouse appliance hardware. Companies spend millions of dollars annually on these fixes, which ultimately prove to be stopgap measures that don't really solve the problem.









