What is the relationship between the data warehouse and data marts?


Les Barbusinski’s Answer: The basic difference between a data warehouse and a data mart is that the former is usually designed with an enterprise perspective, while the latter is usually created with a built-in organizational or functional bias (i.e., it is designed to generate a particular set of metrics from a specific business perspective). Data marts frequently obtain all, or most, of their data from a data warehouse.

Another difference is that a data warehouse consists of many different types of data structures (staging, ODS, extracts, etc.), while a data mart typically consists of a single data structure (i.e., a star schema, snowflake schema or hypercube).

Doug Hackney’s Answer: Check out the resource library at www.egltd.com, look in the PowerPoint presentations for DW 101 and Federated BI Architectures. The basic concepts are explained there.

Mike Jennings’ Answer: The data warehouse is used as a back-end data store that allow data marts or cubes to be redesigned or replaced to meet changing business requirements or focus. Data marts and/or cubes can be completely regenerated from the detailed level information contained in the data warehouse. The data marts or cubes can focus on optimizing performance (limiting data content, aggregation, deriving data, categorizing data, database fragmentation) for end-user analysis without having to be coupled to back-end ETL processing requirements

Chuck Kelley’s Answer: My view is that the data warehouse is a logical concept that houses the atomic data (and some aggregated/summarized data) for strategic analysis. Data marts are fed from the data warehouse with a subset (and aggregated/summarized) of the data warehouse data for performance and getting the data closer to the user.

Joe Oates’ Answer: A data warehouse is generally taken to mean an analytic repository fr an entire enterprise or major portion of an enterprise that contains transaction level data. A data mart is usually for a single department or even a group within a department.

There are two major architectural approaches being used today. The first is to have a data warehouse built on highly normalized (usually third normal form) data structures. Due to the number of joins required to produce analytic reports, this type of data warehouse is generally not used to produce analytical reports. Instead, physically separate data marts are built, usually designed around star schemas, to facilitate both batch and ad hoc queries.

A physically separate data mart runs in a different computer than the data warehouse. In this case the data mart contains summarized information based on the transaction-level data in the data warehouse. It may also contain some transaction-level data. An ETL process is required to build the physically separate data warehouse. Two of the most common complaints that I have heard from organizations that take this approach are: 1) In order to accommodate all of the various departments and groups, there must be many physically separate data marts, each running on a separate computer or partition of a computer along with the staffing and cost to keep them running; 2) Overlapping data between the data marts and differences in loading schedules for the data marts, different departments/groups get different answers to the same queries.

The second architectural approach is to have transaction-level data as well as summarized data in a central data warehouse built on a multidimensional design. Instead of physically separate data marts, virtual data marts are implemented by the type and number of presentation tool screens that a particular user is authorized to see or which tables power users are allowed to access. Aggregations can be built nightly by stored procedures. Of course when a full enterprise or significant portion of enterprise data is loaded in the warehouse, a robust computer is a requirement.

Clay Rehm’s Answer: In its simplest form, a data mart is a subset of data extracted from the data warehouse. However, there are many cases where organizations build only data marts without the data warehouse. Just because someone has built a data mart does not mean it truly is a subset of an existing data warehouse, or that it is really a data mart!

Scott Howard’s Answer: Data marts are two-tier warehouses, consisting of operational source systems and the data mart that can quickly be tailored to individual user needs. Vendors often offer data mart suites as integrated, packaged offerings that include data access middleware for extracting data from heterogeneous data sources, a warehouse RDBMS for the consolidation of the extracted volumes and a query tool to turn that data into information. Most also provide a GUI control point from which an administrator or technical business analyst can define and manage data source extractions, transformations and enrichments.

Care must be taken with data mart solutions because, if improperly controlled, they can create additional islands of information. Once established, these independent data marts can contradict the data warehouse as a consolidated single truth. Many independent data marts, lacking a central consolidation and reconciliation point, can result in numerous inconsistent,, and unreliable data ghettos.

Don’t misinterpret my opinion of data marts. They are the best choice for initial warehouse projects and also very appropriate for prototyping enterprise level solutions. They are also ideal departmental solutions when sourced from the multi-tier enterprise data warehouse that follows.

The enterprise data warehouse model is a three-tier model that includes your data sources, a single central data warehouse, which you refereed to as a data warehouse, and one or more departmental data marts. The central data warehouse is the heart of this model. It is the point at which the data model and process models merge, transforming raw transaction data into useful information. If all data marts are sourced from this consolidation point, you insure that they all receive the same integrated, time consistent, cleansed and reconciled data. This central data warehouse usually consists of normalized data structures or tables and is the point at which the temporal element is first introduced to transactional data. Data here is still generic and must be of global interest as all of the very specific user departmental data marts are sourced from it.

In the enterprise data warehouse model, data will move from the central data warehouse to the individual data marts where it is optimized for the needs of specific users and the tools that will be used for analysis.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access