Editor's note: We are please to welcome David Ray Fuller as an online columnist for dataWarehouse.com. His column entitled "Fundamentals of Data Warehousing" will appear monthly at www.dataWarehouse.com/iKnowledge .
You've just been assigned to build a data warehouse. The term is not totally unfamiliar, as you've heard it at user group meetings and seminars, in idle conversations around the shop, to say nothing of the articles on the topic in the trade press or on various Internet sites in recent years. Just the same, you're not sure what one is, and how to go about actually building such a thing is as foreign as the Antarctic ice fields.
You are an experienced project manager, so you know you need a team to complete your assignment, and you have a general idea of what kind of folks belong on the team. The questions you need answered, then, relate to the details. How many people will you need? What skill sets are required? What role or roles will each person play? This article will attempt to provide some information to assist in the definition of roles and responsibilities for a data warehousing/ business intelligence project team.
Evalute Scope of Project
The first step is to evaluate the size and scope of the project, the goals to be achieved, level of management support and commitment, and other related factors. If the project involves creation of an enterprise data warehouse (EDW) with analytical and business intelligence tools, possibly data mining capabilities, and so on, you are looking at a multiyear, multimillion dollar project with potentially several teams and a large staff. (See Figure 1.) Development of a departmental data mart is likely to be a much smaller effort and will usually be completed in a much shorter time frame than an EDW. Also, smaller organizations with fewer transaction processing applications and less need for historical information are typically going to have smaller data warehouse projects.
While the number of people involved in the development of a data warehouse varies rather significantly depending on the size of the firm and the size and complexity of the project, many or all of the roles discussed in this article will need to be performed by someone during the course of the project. The number of people staffing the project will be dependent on the resources and budget, of course. What is the minimum required? Bill Inmon has said four. I'm not sure the answer is that cut and dried. I can see the potential for a one- or two-person project in a smaller organization. Using some of the newer toolsets, especially where the entire toolset comes from a single vendor, it is possible to build a complete data warehouse application with a very small team. (See Figure 2.) This is probably more likely to work on a data mart project than an EDW.
Such a small team is not likely to be the norm, or anything approaching the norm, in the near future. For the most part, data warehouses require a lot of planning, consume significant resources and are not going to be built by anything less than a fairly good-sized team that includes a broad set of skills. In many cases, a team of four members is probably going to produce an overworked and overstressed group, one prone to mistakes and errors in the development stages.
The roles and responsibilities identified in Parts 1 and 2 relate to development of a new data warehouse or data mart, not the support of an existing warehouse or mart which is the subject of Part 3. Also, this is not intended to be a definitive list, but rather a guide to the basic roles you might expect to see. Some roles might be combined or other roles might be needed which aren't defined here based on the specifics of your organization and project.
Data Warehouse (DW) Manager: (The actual titles can range from vice president of data warehousing in very large firms with large DW organizations to project manager in smaller companies and/or where the level of commitment to data warehousing is very limited.) This team member is responsible for working with and informing top management of the usage and impact of data warehousing within the organization, identifying and proposing projects which utilize data warehousing and business intelligence to provide support for corporate goals, managing and staffing projects, defining and gaining approval for budgets and schedules, and educating users on DW/BI applications and capabilities.
In many cases the DW manager will coordinate the activities of multiple team leads and analysts as they perform the development for different areas and/or iterations of the EDW and/or dependent data marts. In cases where the company is beginning its initial forays into data warehousing and business intelligence activities, the DW manager will be responsible for the initial requirements definition and selection of the first DW team members.
The DW manager may also work with end users and business applications analysts to develop high-level business process documentation which can be used to define DW/BI project priorities and iterations. As the data warehousing team grows and development continues, the DW manager becomes the liaison with senior management, the coordinator of the efforts of different project teams and primary decision- maker for design issues. In large projects, the DW manager will do little or no actual development work. Rather, this person provides the broad vision and direction, and works with senior management and end users to ensure that the team's efforts will provide the desired results. In smaller teams, this person may also function in development roles.
The DW manager should be familiar with iterative forms of project management that are used in most modern data warehousing projects. This individual should also have some familiarity with the types of tools used in data warehouse development and deployment such as ETL tools, data quality and cleansing tools, query/reporting tools, OLAP, data mining, etc.
The Makeup of the Data Acquisition Team
DW Data Acquisition Manager: The person in this role is involved with obtaining data from the company's transaction processing systems, external systems, etc. These are usually referred to as "sources" or "source systems." The responsibilities include defining data load architecture and infrastructure, systems management standards and requirements, and related activities. This role is usually found in large data warehouse development projects where coordination and leadership of the data acquisition function is needed in addition to overall data warehouse team leadership. This individual may supervise the work of the business analyst(s), data analyst(s), data modeler (s), ETL developers and others involved in the processes required to obtain, cleanse and load data into the warehouse. This person will also often lead the preparation and execution of system, integration and acceptance testing associated with the data acquisition from source systems and loading into the data warehouse.
Business Analyst: The role of the business analyst is to perform research and possess knowledge of existing business applications and processes to assist in identification of potential data sources, business rules being applied to data as it is captured by and moved through the transaction processing applications, etc. Whenever possible, this role should be filled by someone who has extensive prior experience with a broad range of the organization's business applications. Another alternative is using business analysts from the various application areas as sources when those areas are added to the data warehouse. These analysts may only be associated with the DW team for relatively short periods when the applications they are responsible for are addressed in the data warehouse project. In many cases, the business analyst will work with end users and management to develop business process models which may be used in designing the data warehouse.
Data Analyst: This person focuses on evaluation and documentation of the data. This includes gathering and documenting the technical and business meta data about the data as well as providing insight into the data itself. Often this role will be merged with that of the business analyst, the data modeler or ETL analyst in small or mid-sized project teams.
Data Modeler: The person(s) in this role prepares data models for the source systems based on information provided by the business and/or data analysts. Additionally, the data modeler may assist with the development of the data model (s) for the EDW or a data mart guided by the data warehouse architect. This individual may also assist in the development of business process models, etc.
Data Warehouse Architect: These job responsibilities encompass definition of overall data warehouse architectures and standards, definition of data models for the data warehouse and all data marts, evaluation and selection of infrastructure components including hardware, DBMS, networking facilities, ETL (extract, transform and load) software, performing applications design and related tasks.
Data Warehouse Developer: This person in this role develops various functional components of the data warehouse applications including extract programs on source systems, ETL applications, data cleansing functions, system management functions including load automation, data acquisition functions and others. Often the extract functions will be developed by the applications support teams assigned to the various transaction processing applications. The transformation, data cleansing and load functions will usually be developed by other team members who are trained in the use of the ETL and data cleansing tools.
The Makeup of the Infrastructure Team
The roles just discussed are for those team members who are directly involved in the management of the data warehouse development project, the identification and acquisition of data from the firm's transaction processing applications and the loading that data into the data warehouse. In addition to roles played by the data acquisition team, another group of functionaries is needed. The roles of these team members are associated with the implementation and management of the infrastructure which forms the foundation of the data warehouse. These include, but are not limited to:
Data Warehouse Infrastructure Manager: If your project is large enough to require dedicated resources for system administration and database administrators (DBAs), it is possible you will want a person who will provide leadership and direction for these efforts. This would be someone who is familiar with the hardware and software likely to be used, experienced in administration of these areas and who can direct tuning and optimization efforts as warehouse development and use moves forward in the organization. Including the infrastructure team within the large data warehousing group helps ensure that the needed resources are available as needed to ensure that the project stays on track and within budget.
System Administrator: This position is responsible for maintaining hardware reliability, system level security, system level performance monitoring and tuning, and automation of production activities including extract and load functions, repetitively produced queries/reports, etc. The duties include the setup of user IDs and system access roles for each person or group which is given access to the data warehouse or data mart and monitoring the file system for space availability. In many cases, the system administrator is responsible for ensuring that appropriate disaster recovery functions such as system level backups are performed correctly and on an accepted schedule.
Database Administrator: The person in this role is involved in database design, especially the physical design used to implement the data warehouse or data marts. The DBA monitors and tunes DBMS performance, administers end-user access and security at the DBMS level, ensures that DW data is appropriately protected via database backup and related strategies, assists with load process automation, and evaluates and selects infrastructure components. The DBA should be proficient in the technology of both the DBMS and the operating system chosen for the data warehouse or data mart.
Data Administrator: This role is responsible for identifying, documenting and administering the corporate data asset. This includes working with end users and IT staff to gain consensus on standard definition of common data attributes, developing and maintaining the corporate data model, identifying and documenting data sources, maintaining the meta data repository, monitoring and reporting on data quality, and searching out causes of incompatibility between the various applications which collect and utilize the company's data. The data administrator may not be a reporting member of the data warehousing organization but is an integral part of the data identification and analysis needed to build a data warehouse or mart. In many organizations, the recognition of the need for data administration at this level accompanies the recognition of the need or value of a data warehouse. Sometimes the data administrator will start out reporting to the DW manager and later become an independent function within the IT organization.
The second part of the DW project team is a group whose purpose is to get data, or rather, information out of the data warehouse or a data mart. Depending on the design, the access functions may well be applied to a dependent data mart, which was loaded with data, obtained from the enterprise data warehouse and possibly summarized and/or aggregated as it was loaded into the data mart.
The Makeup of the Data Access Team
The data access team may actually be a separate group within a data warehouse development effort or may consist of members of the DW team who function in one or more of these roles while also performing other roles related to data warehouse development. In many environments some of these roles may be assigned to departmental power users.
The data access team will use any of a broad range of query and reporting tools, OLAP tools and other products to provide end users with access to information. This approach may include creating production reports which will be run on a regular basis from the data mart; providing production queries which are parameter-driven to return information from the data mart based on input of selection criteria from the end user; training end users in the use of the OLAP tools; and assisting the end users in development of their own queries.
The data access team will often include the following roles. As with the data acquisition team, this is not intended to be a definitive list, but rather to provide a guide to the basic roles you might expect to see. Some roles might be combined or other roles might be needed which aren't defined here based on the specifics of your organization and project.
Data Access Analyst: This team member performs a variety of tasks which help provide end users with the ability to access and utilize the data stored in the data warehouse or its dependent data marts. This includes evaluating and selecting business intelligence, OLAP and other query/reporting tools, identifying and prioritizing data access projects, analyzing data access requests to determine if and how the request can be met, educating end users on data warehouse capabilities and data availability, establishing data access standards, etc. This role can sometimes be filled by a power user, but most often will be filled by an IT business analyst familiar with the application data who has been involved with the data warehouse design and definition process.
Data Access Developer: The person in this position develops data access queries/programs using BI tools such as Cognos Impromptu, PowerPlay, Query to run in production on a regular basis and assists end users with development of complex ad hoc queries and analyses. This role may be a power user trained in the use of the query/reporting/analysis tool.
Data Access Tool Trainer: The responsibilities of this role include training end users to use data query/analysis/ reporting tools, assisting end users in development of queries and reports, assisting in development of production queries/reports and assisting with evaluation and selection of end-user data access tools. This role is more likely to be placed within the IT data warehouse team. The trainer(s) can then work with end users in many departments which might be utilizing the query/reporting/ analysis tools selected for use by the company's personnel.
Once the data warehouse has been built, a process that can easily take two or three years, and has become relatively stable, it enters a full production state. As with any other application, indeed even more than with most transaction processing applications, there is going to be a need for ongoing support of the warehouse. Changes to the transaction processing applications may have an impact on the data being collected by the data warehouse. New data may be requested which no one ever dreamed would be needed when the initial design work was performed. Infrastructure elements will need periodic upgrades applied. Etc. Etc. Etc.
A support team will need to be assigned. Depending on the organization, size and complexity of the warehouse, the infrastructure elements in place and other factors, this group can range in size from two or three to as many as six or more. A point to remember: in most cases, these folks are not involved in additional new development; they are going to have their hands full keeping up with the care and feeding tasks for the "completed" data warehouse.
The support team will probably include the following roles. As with the development teams, one person may fill more than one role or a role may be filled by more than one person.
DW Support Team Lead: The person in this role provides the coordination of requests from all interested end-user departments, evaluates the nature and cost of a request, assists in setting priorities, gathers groups of requests for development in an iterative manner similar to that used in the original development work and assists team members with problem solving, enhancement and maintenance efforts, etc.
Data Warehouse Analyst: This role involves a broader scope of analysis functions than may have been the case during development. This role will require some level of familiarity with the source systems, the extract processes placed within the transaction processing systems, the content and design of the data warehouse and the load processes and business rules used for data integration and summarization, and the use of the data warehouse by end users or data mart build processes which may have been developed in conjunction with the data warehouse. This person or persons may also be required to have a fundamental knowledge of the data access tools and work with end users to design and develop data access functions for use within the departments or by the company as a whole.
Data Warehouse Developer: The person or persons functioning within this role will need a substantial understanding of the data warehouse design, load function, etc. Potentially the DW developer may also be required to have some knowledge of the tools and programs used to extract data from the source systems and perform maintenance on those applications. Additionally the DE developer may be required to be knowledgeable in the data access tools and perform some data access function development.
The support team may or may not directly include a system administrator or DBA. However, these roles will be called on periodically to perform their own tasks on the data warehouse infrastructure components which are their responsibility. Likewise, the data administrator will probably exist independently of the support team, but will periodically be called on to assist with data analysis tasks, perform an impact analysis related to a proposed change, update the meta data repository with new or updated information about data and business rules, etc.
As previously mentioned, all of the different roles described may not correspond to specific positions within your organization as it is finally defined. One person may fill multiple roles or many people may play one role. For instance, in my current position, I am the project manager, data warehouse architect, infrastructure manager and occasionally an ETL designer and developer. I also have several other team members who are ETL developers, and a couple of these sometimes function as data access developers as well. However, most of the roles I have discussed are present to some extent on our team and probably will have a place on most DW/BI teams. It is possible that other roles will be found which are not included here, but it is more likely that you will find that two or more roles described were combined in some way to create what appears to be a different role.
Inmon, W.H. Building the Data Warehouse. John Wiley & Sons, Inc. 1996.
Kimball, Ralph. Reeves, Laura. Ross, Margy. Thornwaite, Warren. The Data Warehouse Lifecycle Toolkit: Tools and Techniques for Designing, Developing and Deploying Data Warehouses.John Wiley & Sons, Inc. 1998.
Humphries, Mark. Hawkins, Michael W. and Dy, Michelle C. Data Warehousing: Archtiecture and Implementation. Prentice Hall. 1999.
The Data Warehousing Institute. "2001 Data Warehousing Salary, Roles and Responsibilities Report." Spring 2001.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access