How many of you have a clear understanding of the difference between a data warehouse and a data mart? Come on, raise your hand. Hmm, just as I thought. It seems that few of us do. Without this understanding, we're less likely to use the right architecture for the right job. Hopefully, this article will clear things up a bit. Data warehouses have been around longer, and I think most folks understand them better. They're big buckets of generic, detailed, enterprise-wide data, and they're not set up for a particular purpose or department. In addition, they are viewed as the source of data for data marts.
My analogy of a data warehouse is that it's like a big box of Legos those small plastic building blocks for children. You can build just about anything from their assortment of generic, standardized shapes, such as a robot, a house or an airplane. You are only limited by your imagination and the number of blocks at your disposal. Data warehouses work the same way. You can build just about any data mart you want from a data warehouse's generic pieces of data, and you're only limited by your technology and the data that you can acquire from your operational systems.
But what constitutes a mart? A data mart is based on a set of user requirements. Using my Lego analogy, I could build things based on known requirements. The house, robot and airplane all had architectural stipulations that I followed to construct them. The same can be said for the data mart. Each mart is meant to solve a particular business problem. For example, if the corporation cannot determine its products' profitability, then a data mart may be built to analyze product costs and revenues (product profitability mart). Perhaps we build a mart to analyze sales channels, or one to determine the effectiveness of our sales campaigns or another to study the demographics of our customers. Each reflects a distinct business problem requiring distinct sets of data and having distinct queries run against it. "Focus" is perhaps the most important characteristic of a data mart.
Size does not determine a data mart. Data marts come in all sizes from a few megabytes of data to some real whoppers. For example, your first mart may contain limited history and answer only a few simple questions. Therefore, it may be a relatively small database. On the other hand, I know of some customer demographic data marts that are several hundred gigabytes in size. In all cases though, the data marts are still a subset of the data contained in the data warehouse and so are generally smaller in size than the data warehouse.
A data mart can be collocated with the data warehouse. In order to classify as a data mart, many people think that it must be physically separated from the warehouse not so! You can build a set of star schemas within the data warehouse environment for a particular business purpose. The functionality you just implemented in these schemas is the key determination of whether it is a mart or not. You may want to prepare for the eventual migration of these schemas into their own world, but there is nothing to stop you from testing out your mart wings within the warehouse world before shipping it out to another site.
No single technology or design technique is indicative of a data mart. Data marts come in all sorts of technological flavors. Five years ago, we had limited technology available with to build a data mart mostly relational. Today, we have a plethora of technologies supporting different types of data marts relational (still the most popular), multidimensional databases, token-based databases, proprietary, specialized databases, etc. Each of these technologies is perfectly suitable for particular types of data marts.
We also have many different design techniques at our disposal. Again the most popular for data marts has been the star or snowflake schema. But there are other possibilities for design techniques. For example, you may choose a statistical subset of data with a more normalized design (very useful for data mining), flat file extracts of the warehouse or perhaps your mart is a MOLAP technology and you simply have hierarchies in your design.
Each of these technologies and design techniques has its place in the data mart world and supports different types of analyses slice and dice, exploration, simple queries, data mining, etc. It is important to remember that the user requirements should dictate what type of mart you are building. As always, the technology and design techniques should be based on these requirements not the other way around.
I hope this has helped you to understand some of the characteristics and options available for constructing your data marts. To recap:
- A data mart is based on a set of user requirements to solve a business problem.
- Data marts come is all sorts of sizes.
- Data marts may or may not be physically separated from the warehouse.
- Data marts may be built with a number of technologies and design techniques.
The data marts are the applications that our end users access to solve their business problems. The data warehouse serves as the source of data for these applications, ensuring the consistency, integration and accuracy of the data. One structure cannot act to replace the other; therefore, both a warehouse and data marts are needed in a robust, sustainable decision support environment.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access