I am a DBA, new to data warehouse concepts and would appreciate if you could reply to my questions concerning issues of architecture as I prepare to implement a one terabyte warehouse.
Sid Adelman's Answer: For someone new to the data warehouse there is a wealth of information on past questions and answers in the Question Archive section of Ask the Experts on DMReview.com. There are 23 on OLAP, 12 on project management, 24 on data quality, and 122 on DW design and methodology.
Clay Rehm's Answer:
Les Barbusinski's Answer: You are doomed. Entire volumes have been written on the subjects that you mention, and you probably won't have time enough to prepare yourself. DBAs usually cut their teeth on small departmental data marts before graduating to terabyte data warehouse implementations.
Nevertheless, here's a partial list of essential reading that will give you an understanding of the critical issues in data warehousing with regards to data architecture, data quality, ETL processes, OLAP concepts, meta data concepts and project methodology:
The OLTP model is one that is intended to capture and efficiently manage the current state of your business. Short term transactions, current inventory, monitoring current manufacturing processes and the like are the focus of most OLTP applications and systems. OLAP systems represent history and need to function in a way contrary to OLTP systems. That is they need to capture everything that goes on within our business including the net business result of an OLTP transactional update or delete and represent and preserve that net meaning in a historical model. They also need to combine that with external events like promotions, special external events such as holidays, weather, manufacturing floor conditions, economic conditions, so business analysts can make sense of the changes in our business captured from the OLTP models. These external events are also not generally represented in the OLTP models. Now we don't have room or time today to expand on how to do just that, but that's what OLAP or data warehouse modeling is all about and why it differs so from OLTP models and architecture.
Let's now consider the operational differences between OLTP and OLAP systems. OLTP systems usually have a high constant transaction rate usually consisting of very simple read/write transactions. Systems administrators thus tune these systems to take most advantage of the resources available at constant rates, thus drive CPU and I/O usage as close to 100 percent as possible. This is in contrast to OLAP systems use which is inconsistent, primarily long running and complex read-only transactions. This pattern leads to peaks and valleys in resource usage that when combined with OLTP usage can cause usage spikes well over resource capacity. These spikes can result in service level violations for your OLTP system. Again this illustrates the need for very different system and data models.
Now let's consider data quality. Data quality relates to the reliability of source data regarding both content accuracy and business integrity. Addressing the issues is a simple task, recognize and fix as many anomalies as your can. Well the charge is simple; accomplishing it is a challenge that needs to be addressed individually model by model. I have a feeling that my Ask the Experts associates will expand on this.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access