Q: How is an information warehouse project different than a typical project for a transaction system. Additionally, how would the required skill set for an information warehouse architect differ from an enterprise data architect?
Larissa Moss' Answer:
A typical project for a transaction system has well defined requirements, a specific set of users, requires traditional skills and is usually delivered on known and proven technology platforms. Therefore, it can be organized and managed with a traditional waterfall methodology (although in my opinion all projects should convert to agile methodologies). On data warehouse projects, the requirements are usually a "moving target," the scope is usually too large for the deadline, data integration and data quality prove more difficult than expected, the staff often lacks business intelligence (BI) skills, communication between staff members takes too long, the tools do not live up to their expectations, and the roles and responsibilities assigned in a traditional way seem to result in too much rework. Therefore, a traditional methodology and a traditional project management approach does not work for controlling data warehouse (DW) project activities. An agile and spiral methodology is needed that is based on "extreme scoping," i.e., delivering an application in a series of short releases while "refactoring" the application deliverables (similar to the XP approach). This type of methodology requires a different project organization, such as a small self-organizing core team, shifted and shared roles and responsibilities, direct end-user involvement, etc. (Reference: Business Intelligence Roadmap by Larissa Moss and Shaku Atre, Addison Wesley, 2003).
Now to the second question. The term "architect" means different things to different people. Since the late 70s/early 80s, when data modelers started to use that term, data architects, especially enterprise data architects, were data administrators (DA, also known as IRM, DRM, EIM) who were formally trained in entity-relationship modeling, normalization, and data administration (DA) disciplines (taxonomy, generalization and specialization, abstraction, semantics, data lineage, naming standards, standards for business rules and data definitions, etc.). In companies that still adhere to the DA principles and DA disciplines (Reference: DAMA International, www.dama.org and books by Michael Brackett) or enterprise architecture (EA) or the Zachman Framework (Reference: John Zachman, www.zifa.com), enterprise data architects model the data from a business perspective (not a database design perspective). The term data warehouse architect (and sometimes the term "data architect," without the "warehouse" to make it really confusing) is used by database administrators or database designers, who create the schemas for DW databases. These schemas can be more or less denormalized "relational" structures (based on the original business view of an entity-relationship model) or, in most cases, are multidimensional structures, such as star schemas or snowflakes. In either case, these physical data models (aka logical database design schemas) are representations of database structures and not highly normalized business models.
Clay Rehm's Answer:
The end product of a typical transaction system is either a system of computer screens or batch reports (or both). The end product of a typical data warehouse is data! This means the users of a data warehouse must be much more involved with the end product including how they will write queries now and in the future. In a way, the end product of a data warehouse is much harder to visualize for most business people, and extra care is needed to educate people on data and concepts instead of how to navigate through a series of screens.
This means a DW project relies on people skills much more so. This includes most of the development team including the DBA. Where a DBA on a typical transaction project may never see a business user, on a DW project they will. For example, the data design team (data modelers and DBAs) have to be flexible to allow designs that may violate typical normalization forms. They need to understand from a business perspective how to load, store and retrieve the data in the easiest way possible.
The development team has to understand the business and business rules much more since the end users will need help in writing queries against the data warehouse. Not only will they need help, most likely the development team will be writing the initial queries and reports.
Anne Marie Smith's Answer:
Transaction processing systems are designed to capture business transactions, and the data relevant to them. The TP system is concerned with adding, updating, and deleting data, the same data, millions of times. Their primary function can be thought of as getting transaction data into the information system quickly and accurately for clerical usage. The purpose of the data warehouse is to provide information for managerial decision making that is distinct from, and does not interfere with the performance requirements of the OLTP systems. This is usually done by re-engineering transaction system data around business subjects (for example Inventory), and making it easily available to the managerial community in a separate database. The role of an architect in a data warehouse project requires that they understand the architectures of the source systems, the needs for extraction, transformation and loading of relevant data from the sources to the target, the architectures of a data warehouse that would support the business' needs for analytical data, and whether further architectures are needed for this particular data warehouse (operational data stores, data marts, etc.). Therefore, the skill sets for an information warehouse (or data warehouse) architect would include the skills for a traditional data architect with an understanding of the enterprise's data delivery needs. Many organizations use an enterprise data architect as their data warehouse architect, since an EDA has a higher level of skills and knowledge than a traditional data architect.
Adrienne Tannenbaum's Answer:
Transaction systems are more concerned with the support of a process or processes, and the data is an "also ran." Information warehouses on the other hand are more concerned with the data that results from these processes, with their creation processes being "also rans" to some degree.
When information is being organized for a transaction system, its format and database design are typically set up to efficiently result from or go into the transaction system, the transactions determine what is required. When information is organized in a warehouse, it is organized to be a fuel for reporting and analysis. Some of the warehouse data is already summarized, some represents a different perspective than that of its original source. Any warehouse-related "processing" of information data happens during the data's travel into the warehouse and is targeted toward the information's usage in combination with another set or sets of related information.
There are some similarities in the skills required to design and develop both types of information stores (requirements gathering, systems analysis, logical and physical data modeling, systems development). What varies between the two worlds is "perspective." Requirements gathering for a warehouse focuses almost entirely on the reporting needs of the target user community. Database design for a warehouse is dimensional. Data warehouse architectures include reporting and BI tools ... hope this helps.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access