Free Site RegistrationFree Site Registration

Sign up today and access Information Management on the web!
Your FREE registration entitles you to:

FREE email newsletters

FREE access to all Information Management content

FREE access to web seminars, resource portals, our white paper library and more!

I am a DBA, new to data warehouse concepts and would appreciate if you could reply to my questions concerning issues of architecture as I prepare to implement a one terabyte warehouse.

Information Management Online, February 4, 2004

Sid Adelman, Clay Rehm, Les Barbusinski, Scott Howard

Q:

I am a DBA, new to data warehouse concepts and would appreciate if you could reply to my questions concerning issues of architecture as I prepare to implement a one terabyte warehouse.

  1. What are some fundamental OLAP architecture design considerations?
  2. What is the difference between OLTP (operational) and DSS (decision support)?
  3. What are the best processes to use in a DW project?
  4. What are the architecture changes needed for DW compared to those in an OLTP environment?
  5. What is data quality and how do I address the issues associated with data quality (or lack thereof)?

A:

Advertisement

Sid Adelman's Answer: For someone new to the data warehouse there is a wealth of information on past questions and answers in the Question Archive section of Ask the Experts on DMReview.com. There are 23 on OLAP, 12 on project management, 24 on data quality, and 122 on DW design and methodology.

Clay Rehm's Answer:

  1. Fundamental OLAP architecture design focuses on used, duplicated and denormalized data. Let me explain - used data is stored in the data warehouse. This means that there is data in the data warehouse that is already somewhere else in some operational system. The data warehouse captures, stores and reformats data that has come from somewhere else. If you have data that is in the data warehouse that is not in an operational system (other than derived values), then there is a problem with your processes. Data is duplicated not only from the operational systems but also with the warehouse - the same data can be duplicated in many different types of detail or summary tables depending on how your audience needs to see the data. Data warehouses are meant for non-technical users and as such, the data needs to be in formats and structures that are easy for them to use.
  2. The difference between OLTP and DSS is the same difference between OLTP and DW. Data warehousing is a fairly new term for a concept that started many years ago. In the early days of data management, terms such as reporting databases and DSS surfaced to provide a repository of data and processes that focused on user's query and reporting needs. Added to that is the term Business Intelligence (BI) which in my opinion provides a bigger picture - data and processes.
  3. The best processes to use in a data warehouse? There are far too many to list here and I could recommend some books and methodologies. I always recommend a project management methodology (PMM) and the Project Management Institute (www.pmi.org) can help you out there. In addition, you will need a systems development methodology (SDM) that is tailored to the specifics of data warehousing. The books published by Ralph Kimball are good, as well as the book Data Warehouse Project Management by Adelman and Moss.
  4. If by architecture you mean hardware/network requirements, then your best bet is to have your own dedicated servers and high-speed networks. If your question pertains to how the data is structured, then you will have a mixture of fact, dimension and summary tables. Fact tables are normalized like in the OLTP world, but dimensions and summary tables are highly denormalized. In addition, your star schemas (facts and dimensions joined together) will usually be presented to the user as one table or view.
  5. Data quality deals with processes to keep your data of high quality. In addition there are levels of quality, and tolerance levels of what your users will accept. For example, say you are capturing customer addresses. On a very simple level, users should not be allowed to key in their own codes for States. Having a decode table (and drop down list) should provide a standard list of States the user would choose from. However this does not prevent the user from choosing a city that does not belong in that state. Your operational applications can be built to include many edits to prevent common sense errors, but there also needs to be business processes in place to prevent the entry of non-quality (bad) data. Whatever is keyed into the operational systems will be carried over to your data warehouse, and in many cases this bad data really becomes apparent. You then need to fix the source systems and/or fix the extract processes that load the data warehouse.

Les Barbusinski's Answer: You are doomed. Entire volumes have been written on the subjects that you mention, and you probably won't have time enough to prepare yourself. DBAs usually cut their teeth on small departmental data marts before graduating to terabyte data warehouse implementations.

Nevertheless, here's a partial list of essential reading that will give you an understanding of the critical issues in data warehousing with regards to data architecture, data quality, ETL processes, OLAP concepts, meta data concepts and project methodology:

  • Building the Data Warehouse by Bill Inmon
  • Corporate Information Factory by Bill Inmon, Claudia Imhoff, and Ryan Sousa
  • The Data Warehouse Toolkit by Ralph Kimball
  • The Data Warehouse Lifecycle Toolkit by Ralph Kimball
  • Data Warehouse: From Architecture to Implementation by Barry Devlin
  • Improving Data Warehouse and Business Information Quality by Larry English
  • Data Quality for the Information Age by Thomas Redman
  • Building and Managing the Metadata Repository by David Marco
  • Data Warehouse Project Management by Sid Adelman and Larissa Moss

Page 1 of 2.

Advertisement

Advertisement