It's been 20 years since the term "data warehousing" was first coined. Since then, it has evolved into a mainstream activity for Fortune 1000 companies, with many undertaking one or more data warehousing projects over the last decade. However, even after 20 years in the marketplace, it is evident that data warehousing is still misunderstood.
The goal of data warehousing is to create "one version of the truth." In pursuing that goal, however, many companies create a proliferation of data silos instead. Here are some classic examples:
- Companies implement a plethora of databases under the data warehousing umbrella - many of them inconsistent or not integrated with each other. According to a 2004 survey by The Data Warehousing Institute, "On average, organizations have two data warehouses, six independent data marts, 4.5 operational data stores and 28.5 spreadmarts."
- Many enterprise resource planning (ERP) reporting and business intelligence (BI) efforts within companies continue to be siloed - organizationally and technically. This is true despite the fact that ERP vendors have embraced data warehousing and have built data warehousing modules as part of their product offerings. Unfortunately, these modules are often implemented separately from the corporate data warehouses companies have already built.
- Corporate performance management (CPM) presents a perfect business justification to implement a comprehensive, consistent and integrated data warehousing effort across an enterprise. However, a CPM project often implements a prebuilt solution from a BI vendor that is separate from the company's corporate data warehouse, just like the ERP modules previously mentioned. This approach produces quick business wins, which is acceptable in the short term, but in the long term it just expands the silos and data inconsistency across a company.
Just because a company's data warehousing efforts have resulted in more data silos doesn't mean that data warehousing is a poor choice. However, it might mean that people simply do not understand how to do data warehousing right, or cannot get company-wide buy-in for doing so.
Data Warehouse vs. Data Warehousing
Why is there such a large gap between the goal of data warehousing and the reality of what a company has actually implemented? Quite often, it is confusion.
People often confuse a "data warehouse" with "data warehousing." Data warehousing encompasses a complete architecture and process; it is not just having a single data warehouse. Data warehousing is the transformation of data to information, thereby enabling the business to examine its operations and performance. This task is accomplished by staging and transforming data from data sources so that the business can access and analyze the resulting information. The data stores may be persistent (stored on disk) or transient (using disk or memory). In addition, the workflow usually involves multiple data stores to support the staging and transformation of data into information. These include operational data stores, data warehouses, data marts, online analytical processing cubes, flat files (a comma-separated values extract, for example), XML data and even spreadsheets.
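To make the idea of multiple data stores concrete, here is a minimal sketch of that workflow in Python. It is an illustration only, not a recommended design: the table names (staging_orders, dw_orders, mart_sales_by_region), the sample extract and the run_pipeline function are all hypothetical, and an in-memory SQLite database stands in for what would normally be separate persistent stores.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file (CSV) extract from an operational system.
SOURCE_EXTRACT = """order_id,region,amount
1,east,100.50
2,west,200.00
3,east,49.50
"""


def run_pipeline(extract_text):
    """Move data through staging, warehouse and data mart stores."""
    # A transient, in-memory store; a real system would use persistent ones.
    conn = sqlite3.connect(":memory:")

    # 1. Staging: land the raw extract as-is, with every column as text.
    conn.execute(
        "CREATE TABLE staging_orders (order_id TEXT, region TEXT, amount TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(extract_text)))
    conn.executemany(
        "INSERT INTO staging_orders VALUES (:order_id, :region, :amount)", rows
    )

    # 2. Warehouse: apply types and a simple standardization rule.
    conn.execute(
        "CREATE TABLE dw_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.execute(
        "INSERT INTO dw_orders "
        "SELECT CAST(order_id AS INTEGER), UPPER(TRIM(region)), "
        "CAST(amount AS REAL) FROM staging_orders"
    )

    # 3. Data mart: a summarized view shaped for reporting.
    conn.execute(
        "CREATE TABLE mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total_amount "
        "FROM dw_orders GROUP BY region"
    )
    return conn


conn = run_pipeline(SOURCE_EXTRACT)
mart = dict(conn.execute("SELECT region, total_amount FROM mart_sales_by_region"))
print(mart)
```

Even in this toy version, the report at the end depends on three stores and two transformation steps, not on a single database. That chain is what the term data warehousing covers.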
Any time you get data from the operational systems to perform reporting and analysis, you are performing a data warehousing process. In the old days, it was called decision support; now the term is business intelligence. Data warehousing is what lies beneath the graphs and pivots presented by BI. In fact, BI is merely the presentation layer of the data warehousing architecture.
Data Warehousing is an Architecture and a Process
Too often data warehousing is associated merely with a data warehouse rather than the entire architecture and process. The problem is, when you narrow your focus to a single database, you lose the entire context of the staging of data. Data quality, consistency and integrity - not to mention the ability to audit the data trail - are only achieved when the entire data staging (or data warehousing) architecture is considered.
A data warehousing program is much more than a data warehouse. With a single, narrow data warehouse focus, separate efforts for ERP, CPM and BI recreate the data warehouse architecture and create their own data silos. Ideally, an overall architectural view would let all these efforts leverage each other's work and reuse tools, code, processes, data and standards. A company would then be able to implement these systems more economically, with a higher ROI and lower overall operating and maintenance costs, while also striving toward the "single version of the truth."