High-performance data warehousing applications begin with a consistent and unified data model a design for performance. Design is on the critical path of high performance data warehousing. Different approaches to designing a data model for a data warehouse have frequently translated into religious wars between the followers of the star schema school and those of the operational data store (ODS) approach. However, in fact, both approaches are valid within context and represent different perspectives on the same domain or problem space. Here is how to reconcile the two different approaches.
One reason the data warehouse was invented was to avoid performance problems when decision-support queries were executed against operational systems. The long-running summaries characteristic of decision support presented logjams to the quick updates needed by the transactional systems. The situation also applies in reverse a highly normalized data model design is not optimal for business intelligence that aggregates data across multiple transactional systems, the latter used to run the business on a day-to-day basis. Data warehousing application performance requirements are different in interesting ways from those of classical online transaction processing (OLTP) databases. OLTP applications such as SAP R/3, PeopleSoft Human Resources, Oracle Financial or Siebel operational customer relationship management (CRM) require frequent updates to what are usually highly normalized data structures. In contrast, the requirements for decision support include aggregating and summarizing of large amounts of data in order to get a business intelligence perspective on trends for customer segments and product brands, and are performed across large time horizons. For example, knowing the lifetime value of a customer requires aggregating a lifetime of detailed transactional data. Call it what you will, that mass of detail often collected from multiple operational systems is a data warehousing function. These differences translate into different design approaches.
The star schema works best in the micro process designing a particular data model for summarizing a join of customer, product and geographic data. The ODS works best in the macro process integrating, managing and processing large volumes of data that represent the lifeblood of the enterprise the information factory. From the ODS perspective, an individual star schema is a functionally dependent data mart a local solution (albeit a highly useful one). In contrast, from the perspective of the star schema, the ODS is a staging area in which operational data from inconsistent transactional systems can be reconciled and consolidated. The star schema is a true third-normal form (3NF) data structure, though the points of the star are sometimes denormalized if the customer or product dimensions extend to millions of rows.
Performance is a function of the entire technology stack hardware, software, network, database, data model, business design and application, and the architecture that ties them all together. High-performance data warehousing applications begin with a consistent and unified data model a design for performance. Developers who think data warehousing performance issues are best left to post-implementation monitoring and tuning are in for a shocking disappointment. High-performance data warehousing begins with a design for performance, including the design of the data model supporting the application. Of course, one can and must apply performance and tuning principles after any significant implementation or upgrade, but that misses the bigger picture. Regardless of whether an enterprise has a centralized or distributed architecture and regardless of the many details of the technology stack, the single most important thing an enterprise can do to promote long-term scalability is design consistent, unified representations of product, customer, geography, calendar and other data dimensions intrinsic to the firm in question.
Avoid religious wars. Apply and use the star schema and the ODS as the micro and the macro approaches, respectively, to data warehousing. Use the star schema when designing a particular subject-area domain, individual aggregate or related data model. Performance in building summaries and aggregates and performing joins is often optimized in this way. Use the ODS when designing a workflow back and forth between operational and decision support systems that must match-merge, consolidate and process data from heterogeneous transactional systems. As data volumes (and related performance challenges) ramp upward, it is often useful to introduce a concept such as the ODS to make manageable the tidal wave of information that is being pumped through the systems. As particular business questions arise about subject-area domains, it is likewise useful to deploy a star schema to represent the same data in different ways to accommodate the different perspectives of marketing, finance, inventory control and your business process of choice.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access