Continue in 2 seconds

We are building an enterprise data warehouse.

By
  • Sid Adelman, Michael Jennings, Joyce Bischoff
Published
  • February 07 2002, 1:00am EST

Q:

We are building an enterprise data warehouse. We are replacing some operational systems feeding data marts and management information systems, and the question arises whether all this reporting should be rebuilt out of the enterprise data warehouse (putting extreme burden on the team consolidating its data model and infrastructure) or whether some of the reporting systems ought to be kept separate from the enterprise data warehouse. Are there best practice guidelines on which (if not all) categories of data needed for reporting ought to be included in one's enterprise data warehouse?

A:

Sid Adelman’s Answer: Your question exposes some entrenched positions that are opposed. There are some who will say that nothing related to operational systems should compromise the data warehouse. If operational reporting is allowed out of the data warehouse, its development must be extended to provide for capabilities not planned for conventional data warehouse activities. The design of the database might also be a compromise, probably satisfying the performance requirements of neither constituency. On the other hand, the enterprise data warehouse should have cleaner data than that found in the operational systems. The meta data should be more complete. The access and analysis tools may satisfy some of the traditional reporting requirements and their use should enhance the productivity of those producing the reports. And finally, by reporting out of the data warehouse, there should be less discrepancy between the operational reports and those coming from the data warehouse – they both would be reporting off a single version of the truth. And now for the cultural issues (remember these are the ones that can kill you) – the application development folks responsible for the operational systems definitely don't want to lose any influence and power and reporting off the enterprise data warehouse will result in a loss to them. The people who have an intellectual investment in the tools they currently use for reporting may find the new access and analysis tools beneath them (it takes a real man to code in Assembler) and reject the new tools. Problems with the data warehouse, both real and those that have been conjured up, will serve the folks who wish to discredit all or parts of the new system.

Joyce Bischoff’s Answer: There is no reason not to produce certain reports from operational systems. Are the reports used for operational purposes or for decision support? Who uses the reports? If the users are the same people who use the data warehouse, you might decide to produce the reports from the warehouse. You might also consider developing an enterprise portal that will give the users a consolidated view of data and reports from both the operational and informational sources.

Michael Jennings’s Answer: End-user reporting should be driven be from data mart(s) and/or cubes that have been spawned from the enterprise warehouse. This allows back-end ETL processing issues such as granularity of the data extracted from the operational systems, coordination of data extract frequency and integration, capture of data beyond current iteration and load windows to remain isolated from affecting the end- user analysis. Information from the enterprise warehouse can be categorized, filtered and aggregated into either data marts and/or cubes to meet the business analysis needs of the end user. Use of the enterprise data warehouse as a back-end data store allow data marts or cubes to be redesigned or replaced to meet changing business requirements or focus. Data marts and/or cubes can be completely regenerated from the detailed level information contained in the enterprise warehouse. The data marts or cubes can focus on optimizing performance (limiting data content, aggregation, deriving data, categorizing data, database fragmentation) for end-user analysis without having to be coupled to back-end ETL processing requirements.

Chuck Kelley’s Answer: If you have all the data in the data warehouse and it is timely enough, then I would do my reporting off the DW. But, this may not be the case (or you have built a reporting database instead of a data warehouse!). The practice guidelines depends on the needs of your reports and your strategic analysis. I would keep the reporting separate from the data warehouse. I would let the data warehouse be fed from the reporting database (which could be your operational data store (ODS)), which was fed from the operational system.

Larissa Moss’s Answer: I strongly urge AGAINST making the enterprise data warehouse the answer to any and all reporting, especially if that reporting is operational! Data warehouses as well as data marts were invented to address STRATEGIC – mostly trend analysis – decision support needs. Since the multidimensional aspect of data marts, in particular, is a highly desirable design for storing data to generate any type of reporting, the temptation is high to solve all reporting needs from such databases. On the other hand, an enterprise data warehouse by definition is one database designed to solve all STRATEGIC – mostly ad hoc – decision support needs. In contrast to having many data marts, there is only one enterprise data warehouse, as its name implies. If there is only one database for all users and their strategic analysis needs, this database will most likely not be multidimensional but relational (two dimensional). Otherwise it would not be able to satisfy all users and all queries, especially those ad hoc detailed queries, which have no predefined patterns for aggregation and derivation by dimensions, which is a prerequisite for multidimensional database design. Since the enterprise data warehouse is one [logical] database for all users, all the data used by all users must be integrated. As you point out in your question, the data models and infrastructures have to be consolidated. That means that data that has been used inconsistently by many users is often renamed, regrouped; its contents is often standardized and changed; and all of these changes affect all of the users and their reports. Again, the impact of the slightest change would affect all users and all reports. If these reports are operational in nature and a request for a change is submitted, such as adding a new field, it would take much longer to implement it on the enterprise data warehouse than on a data mart or on an operational system because the enterprise data warehouse is a much larger "system" in terms of data scope. For example: a large federal savings and loan association received a federal requirement to include a new field on the HMDA report. Domain analysis of the field revealed that the data values (content) of the new field described different objects. For example A, B, C described a loan type, L, K, M, N, O described a borrower, and X, Y, Z described a loan application form. That meant that the requested field had to be mapped into three different tables, namely Loan, Customer and Application. Populating the three columns on the three tables touched many ETL program modules early in the extraction process, which meant that all subsequent modules had to be retested. The entire process of including one extra field on the enterprise data warehouse for HMDA reporting took three months from analysis to testing. Had that report been run from the operational Loan Application system (feeder system) it would have taken only three days to add a field to the flat file, change the few affected programs and test/regression test the programs. Data warehouses were never meant to be a "catch-all reporting system." They were meant to provide business intelligence, which was lost or is difficult/impossible to extract from operational systems, such as data relationships across data from operational systems, cleansed data values, "lost" business rules, etc.

Clay Rehm’s Answer: Are there best practices for data warehousing? Absolutely! Unfortunately they are not in one place and, depending on whom you talk to, you may find many conflicting best practices. Additionally, I am sure you will get many different points of view and reasons why you should or should not include reporting in the enterprise data warehouse. Before you can make the correct decision, ask yourself why are you building the enterprise data warehouse? No really, why? Is this management’s flavor of the month or will it solve a specific business problem? If this EDW is going to solve real problems, does anyone really care if the reporting is coming from the EDW or a data mart or some custom built DSS or reporting database? The point is – can you build an easy-to-use database that is integrated with other data in the company so it is useful to the people who need it, and when they need it? Most users will not care where the data is coming from as long as it is accurate, easy to understand and easy to access. To summarize – minimize the number of places the users need to go to find the data they need. If this makes more work for the IT staff, than so be it!

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access