FEB 4, 2004 1:00am ET

Related Links

Which CDC method is the best to achieve staging database with changed data?
March 7, 2008
Apart from bloated dimension, what are the negatives of using all known attributes in your SCD?
March 7, 2008
When is it better to have normalized data to create data marts and when is it better to have dimensional data?
March 7, 2008

Web Seminars

Data Replication for Real-time (Big) Data Warehousing
Available On Demand
Improving your Overall Analytical Environment by Migrating to a New Data Warehouse Platform
Available On Demand
The Dynamic Duo of Data Warehousing and Real-Time Streams
Available On Demand

I am a DBA, new to data warehouse concepts and would appreciate if you could reply to my questions concerning issues of architecture as I prepare to implement a one terabyte warehouse.

Print
Reprints
Email

Q:

I am a DBA, new to data warehouse concepts and would appreciate if you could reply to my questions concerning issues of architecture as I prepare to implement a one terabyte warehouse.

  1. What are some fundamental OLAP architecture design considerations?
  2. What is the difference between OLTP (operational) and DSS (decision support)?
  3. What are the best processes to use in a DW project?
  4. What are the architecture changes needed for DW compared to those in an OLTP environment?
  5. What is data quality and how do I address the issues associated with data quality (or lack thereof)?

A:

Sid Adelman's Answer: For someone new to the data warehouse there is a wealth of information on past questions and answers in the Question Archive section of Ask the Experts on DMReview.com. There are 23 on OLAP, 12 on project management, 24 on data quality, and 122 on DW design and methodology.

Clay Rehm's Answer:

  1. Fundamental OLAP architecture design focuses on used, duplicated and denormalized data. Let me explain - used data is stored in the data warehouse. This means that there is data in the data warehouse that is already somewhere else in some operational system. The data warehouse captures, stores and reformats data that has come from somewhere else. If you have data that is in the data warehouse that is not in an operational system (other than derived values), then there is a problem with your processes. Data is duplicated not only from the operational systems but also with the warehouse - the same data can be duplicated in many different types of detail or summary tables depending on how your audience needs to see the data. Data warehouses are meant for non-technical users and as such, the data needs to be in formats and structures that are easy for them to use.
  2. The difference between OLTP and DSS is the same difference between OLTP and DW. Data warehousing is a fairly new term for a concept that started many years ago. In the early days of data management, terms such as reporting databases and DSS surfaced to provide a repository of data and processes that focused on user's query and reporting needs. Added to that is the term Business Intelligence (BI) which in my opinion provides a bigger picture - data and processes.
  3. The best processes to use in a data warehouse? There are far too many to list here and I could recommend some books and methodologies. I always recommend a project management methodology (PMM) and the Project Management Institute (www.pmi.org) can help you out there. In addition, you will need a systems development methodology (SDM) that is tailored to the specifics of data warehousing. The books published by Ralph Kimball are good, as well as the book Data Warehouse Project Management by Adelman and Moss.
  4. If by architecture you mean hardware/network requirements, then your best bet is to have your own dedicated servers and high-speed networks. If your question pertains to how the data is structured, then you will have a mixture of fact, dimension and summary tables. Fact tables are normalized like in the OLTP world, but dimensions and summary tables are highly denormalized. In addition, your star schemas (facts and dimensions joined together) will usually be presented to the user as one table or view.
  5. Data quality deals with processes to keep your data of high quality. In addition there are levels of quality, and tolerance levels of what your users will accept. For example, say you are capturing customer addresses. On a very simple level, users should not be allowed to key in their own codes for States. Having a decode table (and drop down list) should provide a standard list of States the user would choose from. However this does not prevent the user from choosing a city that does not belong in that state. Your operational applications can be built to include many edits to prevent common sense errors, but there also needs to be business processes in place to prevent the entry of non-quality (bad) data. Whatever is keyed into the operational systems will be carried over to your data warehouse, and in many cases this bad data really becomes apparent. You then need to fix the source systems and/or fix the extract processes that load the data warehouse.

Les Barbusinski's Answer: You are doomed. Entire volumes have been written on the subjects that you mention, and you probably won't have time enough to prepare yourself. DBAs usually cut their teeth on small departmental data marts before graduating to terabyte data warehouse implementations.

Nevertheless, here's a partial list of essential reading that will give you an understanding of the critical issues in data warehousing with regards to data architecture, data quality, ETL processes, OLAP concepts, meta data concepts and project methodology:

  • Building the Data Warehouse by Bill Inmon
  • Corporate Information Factory by Bill Inmon, Claudia Imhoff, and Ryan Sousa
  • The Data Warehouse Toolkit by Ralph Kimball
  • The Data Warehouse Lifecycle Toolkit by Ralph Kimball
  • Data Warehouse: From Architecture to Implementation by Barry Devlin
  • Improving Data Warehouse and Business Information Quality by Larry English
  • Data Quality for the Information Age by Thomas Redman
  • Building and Managing the Metadata Repository by David Marco
  • Data Warehouse Project Management by Sid Adelman and Larissa Moss
Filed under:

Advertisement

Comments (0)

Be the first to comment on this post using the section below.

Add Your Comments:
You must be registered to post a comment.
Not Registered?
You must be registered to post a comment. Click here to register.
Already registered? Log in here
Please note you must now log in with your email address and password.
Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.