Continue in 2 seconds

Can you point me to some resources that would help me understand what needs to be considered when setting up a development factory for BI applications?

Published
  • August 26 2002, 1:00am EDT

Q:

Can you point me to some resources that would help me understand what needs to be considered when setting up a development factory for BI applications? I work at a very large corporation that wants to setup a centralized group to operate centralized hardware and also offer development.

A:

Sid Adelman’s Answer: Your biggest challenge will be to staff your organization with the right people who have the skills in their respective areas of expertise. The following is a set of roles and responsibilities taken from "Data Warehouse Project Management" by Sid Adelman and Larissa Moss.

User Support

User support is the first line of defense when the user has problems. These people must be customer/client oriented and know that the calls from the users are not just annoyances but their reason for being. They must be knowledgeable about the tools, must understand the database structure, must know the data and must be familiar with frequently executed queries and reports. They must know the users’ concerns and problems and know the frequently asked questions and the answers to them. They must have a profile of the users, know the power users and the more casual users. They must be patient and responsive. If time elapses between hearing about and resolving the problem, the users must be informed on how the resolution is progressing. Just as in operational systems, an incident tracking system should be in place.

In some organizations, user support creates some of the canned queries and reports and administers the libraries that store these queries and reports. Administration includes developing and implementing the standards for query admission to the library and for maintaining the query. Expect periodic changes. Queries and reports submitted to the library must have use beyond that of the submitter, must be well documented and must be thoroughly tested – this includes testing on any entries that have been subsequently modified. Information on the submission must be communicated to interested parties with an explanation of the purpose of the query, how it can be invoked, expectations of resources used, response time and what to expect in the report or result set.

Some of the problems that are seen by user support relate to the update/load process, such as whether it completed on time and whether it ran successfully. These processes must be monitored and any variance in availability or timeliness of the data should be communicated to the users. Other problems relate to performance. User support must have a feel for performance, monitor performance, report performance degradation to database administration, spot poor query techniques that could cause bad performance and help the users write more efficient queries and reports.

Some organizations assign training to user support, sometimes to organize training classes and bring in professional trainers and other times to teach the classes themselves.

User support will create and maintain frequently asked questions (FAQs) that could be accessible through the internet. User support should make extensive use of meta data to train, inform and answer user questions. A large percentage of questions that come to help desks involve problems of the users not understanding the meaning of the data, the timeliness of the data or the source of the data. Most of these questions can be answered with meta data.

Data Administrator

Data administration got a second wind from the data warehouse. The data warehouse did for Data administration what Lady Chatterley’s Lover did for gamekeeping. The roles and benefits of data administration were not well understood by most organizations and the role of data administration was relegated to writing documentation that was rarely used. With the advent of the data warehouse, it became apparent that the primary roles of data administration are key to a successful data warehouse.

Data administrators model the business data according to the business rules and policies governing the data relationships. They also document and maintain the mapping between the logical data model and the physical database design and between the source data and the data warehouse target data. Data administrators understand the source files and know which are the appropriate source files for extraction. Data administration establishes and maintains the naming standards. They communicate the availability of data and, when appropriate, suggest that departments share their data. Data administration is responsible for administering the tool used in data modeling and administering the tool libraries although in some organizations, this is the responsibility of the application development team.

Data administration is responsible for administering the meta data repository. Meta data will come from a number of sources: the modeling tools, the extract/transform/load tool, the query tools, from reengineering the source file data definitions and direct entry into a meta data repository tool. The meta data repository could potentially be updated as well as accessed from the each of these source systems. Tight controls are necessary to minimize corruption of the meta data. Data administration is responsible for controlling entry, updates and deletes.

Application Developer

Application development is split into two groups with different skill sets:

  1. Back-end extract/transform/load (ETL) application developers who need to be hard-core programmers with years of experience in serious programming languages (COBOL, C++, etc.), and
  2. Front-end delivery application developers who need to know client/server and modern languages (Visual Basic, Power Play, Java, etc.), the query, report writer and OLAP tools.

The back-end ETL application developer is responsible for the acquisition process that constitutes the major effort for any data warehouse – the extract/transform/load process. It has been estimated that these tasks are 60 to 80 percent of the total time and effort to implement a data warehouse.
Extract/transform/load will be performed either with an ETL tool, with conventional programming or with a combination of both. The back-end ETL application developers will be responsible for the process regardless of which approach is used. Do not assume a tool will eliminate the work or effort involved with this process. The required analysis is still time-consuming and describing the complicated transformation logic to the ETL tool can be difficult.

The back-end ETL application developers are responsible for tie-outs, which are the controls of numbers of records extracted matched to the number of records loaded, controls on dollar totals, on errors and on the types of errors.

Data cleansing, whether it is done with a cleansing tool, with an ETL product or with conventional coding, is also the responsibility of the back-end ETL application developers. Neither the data warehouse nor the back-end ETL application developers are responsible for fixing the operational systems that feed the data warehouse.

The ETL process is always time- and resource-consuming, even with a data warehouse of moderate size. Application development must be aware of the performance implications of their architecture, design and coding.

The front-end application developers play a critical role in delivering the data from the data warehouse. The front-end application developers are usually heavily involved with the users.

One of the selling points of the data warehouse is to empower users to write their own queries. Power users are capable of writing incredibly complex queries – often to the detriment of performance. However, there are many users who are not capable of writing all the queries they need. In these cases, the front-end application developers will be responsible to write the queries for them.

Security Officer and Auditor

Security is becoming more and more important as people outside the organization are being given access to reports, results sets and even the ability to query the organization’s data warehouse. Access is being given to suppliers in an effort to improve their ability to supply timely quality parts. Access is being given to commercial customers to tie them as closely as possible to the organization’s products and services. With access allowed outside the organization, security must be exact and uncompromising.

Many department heads are concerned about anyone looking at their data. If they have the authority to restrict access, the security officer must set up the procedures to implement the restriction, and to make the department heads feel comfortable that they have retained control on the access. This type of control is especially important when data has been classified as private even within the company. Examples are personnel records, sensitive client data and patient data in the healthcare industry.

Data can be secured at different levels of granularity. Data may be available at the summary level but not at the atomic or detailed level.

The security officer should work closely with those administering security (Web administrator, DBAs, query tool administrators, repository administrator, CASE tool administrator and ETL tool administrator) to understand the capabilities of the products and determine the optimal approach to establishing security. This would include exposures, difficulties in administration – we don’t want to make this a bureaucratic nightmare – and the productivity of administration.

The primary role of the security officer is the identification of exposures with recommendations and actions to plug security holes. Another important responsibility of the security officer is to understand the interaction of security features between the tools and the RDBMS. There have been some cases reported where the security feature of one tool negated that of another.

Database Administrator

The database administrator (DBA) is responsible for the physical aspects of the data warehouse. This includes physical design, performance, backup and recovery. Before starting, the DBAs must understand the users’ basic requirements and how the data warehouse databases will be accessed. Some changes to the design and configuration of the system are more easily made than others. The more that is known initially, the less disruptive and costly the changes will be.

Data warehouse DBAs are usually more closely involved with the users than are the DBAs responsible for operational (OLTP) systems. Data warehouse DBAs are constantly monitoring and tuning the data warehouse databases, adding indexes and summary tables. They monitor the SQL generated from the queries to help improve individual query performance.

The database administrator will typically use a modeling CASE tool to create the physical database design. The DBA will work closely with the data administrator in designing the database. The same CASE tool should be used for the physical database design that was used for the logical data model. The advantage is that the mapping between the logical and physical data models can be done on the CASE tool and exported to the central repository without having to write additional programs.

Database administrators will then create the data definition language or use the CASE tool to create the data definition language to build the database. Database administration will always have an eye to performance. Performance will dictate designs and the creation of keys and indexes.

If a distributed database environment is being considered, a good rule of thumb is the three (or four) to one complexity of a distributed database over one that is centralized. This does not mean data marts should not be distributed. There are some excellent reasons for distribution, but their cost, complexity, increased administration and availability risk must be factored into any distribution decision.

Database administration is sometimes responsible for capacity planning, always responsible for physical design for good performance, for monitoring response time and resources used (CPU, disk I/O), for evaluating performance problems, for tuning the databases and for reviewing the complicated SQL statements written by both application developers and power users.

Technical Services

Technical services – sometimes called system administration – is responsible for establishing the data warehouse technical architecture. This includes decisions about the hardware, the network, the operating system and the RDBMS although some of these decisions may fall to the data warehouse architect. Technical services should develop capacity plans and make plans that would allow the data warehouse to scale to a size that is much bigger than that which was originally planned – Data Warehouse Rule # 3, "Successful data warehouses always grow much bigger than expected." Technical services should monitor ETL and query performance at a high level, keeping track of increased resource usage. This will give them time to plan for an upgrade long before performance becomes unacceptable. Technical services will develop disaster contingency plans in line with the criticality of the data warehouse.

Data Warehouse Architect

The data warehouse architect will develop the architecture for the data warehouse that would include the data warehouse tools and how they work together, their interfaces and how they feed each other.

The architect will determine whether or not an enterprise data warehouse is to be used, how and by whom it would be accessed and how it would feed the data marts. If an operational data store (ODS) is to be used, he will specify how and by whom the ODS would be accessed and how it would feed the enterprise data warehouse or the data marts. The architect is heavily involved in understanding and determining data sources. The architecture will painstakingly detail how the data marts would be integrated and reconciled. The architecture will specify if the data warehouse would be two or three tiers.

The data warehouse architect will establish the standards and procedures for the data warehouse (not for each project). A major part of those standards would address the ETL process, how it was to flow, what tools would be used, responsibilities for the process, where tie-outs have to occur, walkthroughs and inspections, testing procedures and user sign-off. The standards and procedures would incorporate the use of external data and how it would be incorporated into the ETL process.

Data Warehouse Project Manager

The data warehouse project manager has overall responsibility for a project’s successful implementation. The project manager defines, plans, schedules, and controls the project. The project plan must include tasks, deliverables and resources – the people who will perform the tasks. The manager will monitor and coordinate the activities of the team, and will review their deliverables. If contractors and consultants are used, the project manager assigns the tasks, monitors activities and deliverables and assures that knowledge transfer is indeed taking place.

The project manager will estimate the cost of the project and monitor conformance to the cost estimates. The manager will also project the benefits, measure the effectiveness of the data warehouse and report on the benefits and costs.

The most important task for the manager is recruiting the right people; people with the requisite skills and people who work well with the users and each other. Getting the right people will require an understanding from management on the importance of the project and their cooperation in the recruiting process. The problem is that the good people are already taken and a smart manager outside of the data warehouse area is loath to give up an outstanding performer. Hiring people from the outside has its own risks and time delays. An outside hire will need some time to become familiar with the organization, to learn how things are done, to understand the minefields, and to learn about the source data.

Data Quality Analyst

Don’t assume that just because there is a data quality analyst, no one else has responsibility for quality in the data warehouse. Data quality is everyone’s job but it is the only focus of the data quality analyst. Although information about data quality problems often originate in IT, it most often comes from the users. The data quality analyst is responsible for finding and reporting problems relating to data quality, for tracking these problems and assuring that the resolution is assigned to a responsible party. Some of the discovered problems must be reported to data administration where the data exceptions can be properly incorporated into the logical data model. The data quality analyst can be involved in writing the programming specifications for the transformation logic that needs to occur during the ETL process.

Not all problems can be resolved nor should they be. It might cost more to fix the problem than it’s worth. The data quality analyst along with other interested parties, including the business unit, should determine the criticality of the problem. Analytical requirements could determine the quality requirements. The cost to fix the problem should be estimated and the priority would then be assigned.

The data quality analyst should be proactive in trying to find problems before they surface. Data quality analysis tools or simple queries could identify many of the problems.

The most important role for the data quality analyst will be evaluating and improving the processes in the data warehouse, specifically the ETL process. By improving the processes, the quality of data is likely to improve. The data quality analyst might also seek out cleaner sources of data to replace those that have proven troublesome.

Query Tool Administrator

The query tool administrator has responsibility for dealing with the tool vendor, assuring excellent support and getting answers to questions. The query tool administrator deals with problems associated with the tool including queries that produce incorrect results and queries that perform badly. The query tool administrator has responsibility for assigning passwords and, along with the security office, making sure the query tool environment is properly protected. The query tool administrator has responsibility for new releases.

Web Administrator

The Web administrator has the responsibility to establish an environment where result sets can be distributed on the Web, where reports can be made available on the Web and where queries can be launched from the Web. The Web administrator will work on the Web implementation of the query tool and will be responsible for securing the Web server from unauthorized access.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access