Before planning a data warehouse or data mart, here are five things you should check off:
*A business sponsor--an IT customer who wants better information in a better form and who will provide experts to help you build a DW/DM.
*If your sponsor wants a DW/DM because of production problems, solve those problems where they exist. Warehouses can fail when they are conceived as workarounds for deficiencies in the production system.
*A business case for the DW/DM--it shouldn't be funded by IT. Enthusiastic use by business people for good business reasons is the only justifiable metric.
*A good sense of the types of data your sponsor needs: type of data is what's important, not appearance. With good source data, you can format and deliver it anywhere. But you can't mine for gold in a sandbox and find anything but sand, no matter how good your mining tools are.
*An IS/IT commitment to permanently operate a different animal--a warehouse. It's not a production application. It has different users, operations and help requirements. Most importantly, it has different growth characteristics--double or triple the number of users and usage within the first year is common.
Okay, you've checked all the "Before I Plan" boxes. Now what? Here are some planning tips from hard-earned experience:
1. Concentrate on locating sound sources of "rich business data." Sixty percent of the work to build a DW/DM (and 90 percent of the success) comes from acquiring quantities of "rich business data." It holds important facts and measures about things business people understand (sales, orders, contracts, dates, diagnoses, territories, ZIP codes, prices, temperatures, outcomes and hours billed). It (usually) should be data that's more interesting when seen in the context of other similar data (not as individual records) unless you're building an ODS. "Sound" means the source is reliable and won't disappear in a year.
2. The "cleanest data you've ever seen"--as promised by the user--isn't. Plan on it. It isn't. But that's okay, because you'll make it useful because it's rich business data. Plan to work hard at understanding source data--it's the most important part of the project. Loading a database, applying OLAP tools and plugging into a network isn't the hard part or even the interesting part. Understanding source data means modeling it, analyzing it, defining every attribute and dealing with its quality and integrity.
3. Plan to take lots of time to make the source data "accurate." Accurate, not necessarily clean or scrubbed. For example, accuracy means there is a correct customer number for every sales record in the DW/DM and that it's found in the customer number table when a new sales record is added. Inaccurate data in a warehouse is like a virus, but worse. The business value of a warehouse is as a valid reflection of business history. Inaccurate or anomalous data starts the seeds of doubt in the user's mind, and down the slippery slope you go. Be a fanatic about accurate data, accurately recorded.
4. You can't "correct" data that's "dirty" by changing it, but you can't reject it either. Both these actions change the integrity of the body of data you started with, and that's the first, second and last rule of warehousing: the builder can't mess with the data integrity--not any single record and not the collection of records as a whole. For example, if you always drop records that have incorrect spellings of states, you'll distort the proportions of Mississippis and Ohios.
5. Only business experts can determine what business data means, especially data that's several years old. They'll have to work with data analysts to carefully work out accurate history. They'll say things like "Until January, 1993, the NE sales territory included New York, New Jersey and Maryland; after that it included only New York and New Jersey, but we continued to use the same NE name. On January 1, 1993, Maryland joined the SE territory." These anomalies make data analysis interesting, but you must be a data archeologist to get it right. Don't be in a hurry where data quality is concerned. Also, plan to have several business experts work on the accuracy of past data, because they might not always agree. Always verify anyone's opinion with several sources.
Next month--Learn the business rules of data warehousing.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access