Please explain the architecture of a data warehouse or direct me to books with figures so that I can visualize it.
Doug Hackney’s Answer: Check out the resource library at www.egltd.com, look in the PowerPoint presentations for DW 101 and Federated BI Architectures. There is a complete visualization of the basic concepts there.
Mike Jennings’ Answer: The Data Warehouse Lifecycle Toolkit by Ralph Kimball, Laura Reeves, Margy Ross and Warren Thornthwaite; Building the Data Warehouse by Bill Inmon.
Chuck Kelley’s Answer: Good books that you should read are 1. The Corporate Information Factory (Bill Inmon, Claudia Imhoff, Ryan Sousa); 2. The Data Warehouse Lifecycle Toolkit (Kimball, Reeves, Ross, Thornthwaite); 3. Building the Data Warehouse (Bill Inmon).
Joe Oates’ Answer: You can consult Ralph Kimball’s Data Warehouse Lifecycle Toolkit or one of the several books that Bill Inmon has written.
Les Barbusinski’s Answer: Whoa! That’s a lot to ask for in a small column like this one, but here goes:
A typical data warehouse consists of the following components: a data acquisition subsystem, a storage management subsystem, an operational reporting subsystem, an analytics engine, an information delivery/distribution subsystem, meta data and infrastructure (see Figure 1).
The data acquisition subsystem is responsible for bringing new data into the data warehouse. It uses extract, transformation, load (ETL) tools such as Informatica, Ascential DataStage, Sagent, Data Junction and others to:
The resulting data is usually stored in two different data stores: a historical data store (which holds master, historical and transactional data in a normalized format) and an operational data store (which stores a selected subset of data in a denormalized format). The latter is used by the operational reporting subsystem to generate lists, transactional and exception reports using such tools as Crystal Reports and Actuate.
The analytics engine then puts this derived data through a variety of tools to generate analytical reports, alerts, statistical analyses, forecasts, classifications and segmentations, business models, rule and simulations. These tools include:
The storage management subsystem is responsible for controlling the storage and propagation of data within the data warehouse. This includes such functions as security administration, backup and recovery, archiving, replication, gateway administration, checking data integrity, etc.
This can involve a variety of "channels" including dedicated Web sites, Web portals (via a Web services API), e-mail distribution, message brokers (e.g., utilizing MOM or EAI technologies), wireless transmissions (e.g., alert messages and/or e-mails sent to PDAs, cell phones, pagers, etc.), embedded analytics in BI or neural net applications (e.g., fraud detection), etc.
Infrastructure refers to all those internal functions that make the data warehouse run smoothly, but are not directly involved with maintaining the data. This includes such things as job scheduling, workflow management, directory management, file versioning, event tracking, etc.
Scott Howard’s Answer: Two of my favorites for the DW beginner: Building the Data Warehouse for Decision Support by Vidette Poe which introduces how you may benefit from a DW and Barry Devlin's Data Warehouse, from Architecture to Implementation which is currently used in many universities' introduction to DW syllabi. Both contain many figures and a detailed justification for their recommended approaches.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access