The National Institutes of Health (NIH), a part of the U.S. Department of Health and Human Services, is comprised of 26 separate organizations that conduct and support medical and scientific research with the goal of improving the nation's health. As a public agency, the NIH is often called upon to answer questions from NIH management, the public and Congress--and until recently the answers frequently required data that was not readily available. Each legacy source system had a different type of database--central accounting and payroll data, for example, were stored as flat files, while human resources employed DB2, information on research grants was in an Oracle database, and the administrative database employed IMS. Each organization had its own desktop standards as well, resulting in a heterogeneous mix of computing environments. A user might have had to access as many as five or even seven sources to answer a seemingly simple question such as, "How much money was spent for senior-level management to travel outside of the U.S. to attend conferences related to research?" And since it was difficult to find all the right information, the same question might receive several different answers at different times.

Finding a Better Way

Clearly, the NIH needed to find a better way for users to access and analyze enterprise information. Thus in 1992, upper management formed the NIH Information Committee and gave the NIH's Division of Computer Research and Technology (DCRT) the task of creating an enterprise-wide system that would bring all the NIH's administrative data together in one place. The system also had to provide easy desktop access so that users could provide accurate answers, to the same questions, every time.

In Phase 1 of the project, the DCRT determined that a data repository would meet the NIH's needs. The committee organized administrative information into subject areas such as budget and finance, personnel, procurements, property, travel, inventory, contracts and research grants. The NIH Information Committee then identified which areas were most critical for inclusion in the repository. Financial data was input first, since many future subject areas depended upon this data. The initial data repository employed DB2 on an IBM mainframe, while desktops ran a 3270-type application using ISPF to build screens and QMF to run queries. By 1995, we implemented a Windows-based graphical user interface employing Andyne Computing, Ltd.'s (now known as Hummingbird) Graphical Query Language (GQL).

Phasing up to a Data Warehouse

Phase 2 started in 1996, when the DCRT needed to start inputting data from the human resources' DB2 database, as well as data from three payroll systems representing NIH staff, researchers and public health officers. At this point, we realized the need for a data warehouse and a scalable data transformation solution to support the NIH's heterogeneous legacy systems.

The DCRT did a study of data warehouse software, paying particular attention to such important issues as MVS compatibility, extraction, transformation and loading, code generation, change data capture and portability. Stability and maturity of the vendor were also important. Software from Prism Solutions, Inc., best met the NIH's requirements.

Prism provides a common toolset for creating data warehouses or marts that deliver consistent information across departmental organizations. Prism Warehouse Executive is a core software component, allowing users to graphically model the flow of data from multiple, heterogeneous source databases into a target warehouse, including all required data mappings and transformations. A common layer of meta data lets users move from planning to actual warehouse implementation. Prism generates programs to extract and integrate data from the sources to the target warehouse database or data mart.

Today the NIH data warehouse resides on IBM MVS 9672 processors based on CMOS technology, each with one gigabyte of memory, running DB2 V4.1. The software architecture is comprised of an operational data store, multiple data marts and a GQL front end. Prism extracts data from the source into the atomic layer of the data warehouse, then uses atomic data to populate data marts for the different NIH subject areas. Marts are the areas accessed by users. Some data marts store information in a primary table, while a secondary table contains auxiliary information. Other data marts employ a star schema that enables users to view a subject's core data across a number of dimensions, such as time, work force and financial information. The NIH data warehouse team provides a GQL graphical user interface, but the NIH community also accesses the warehouse using other tools.

Growing Popularity Among Users

The NIH data warehouse has proven to be very popular with its more than 1,100 registered users, 800 of whom have attended the formal training sessions. Users represent all levels of the organization, from clerks who use it to track day-to-day figures, to top managers who use it for projections and highly summarized reports. Warehouse use is growing rapidly as staffers look for quicker, more efficient ways of accessing administrative business information.

At the click of a button, users can access standard reports on topics such as "all outstanding expenses," "current balance of funds" and "open procurements," or they can do ad hoc queries. Users can also extract information from the data warehouse and export it to another application, such as Excel. Access to information is quick and convenient; if someone questions how money was spent, for example, the user can get an answer immediately. Detailed information is up to date and reliable. In addition to data that is updated nightly, the warehouse contains historical data to facilitate comparisons and projections. Ease of use and intuitiveness are also big benefits.

The NIH data warehouse has proven its usefulness even in unexpected situations; recently the property data mart was used to assist with completing a police report for stolen property. Police needed the serial numbers of the stolen property--a user was able to obtain them quickly by searching the data warehouse using the item description, accountable user and building number.

Evolution Continues

The NIH's data warehouse has the capacity and flexibility to expand and to add new applications and will continue to evolve to meet the NIH's needs. An upgrade to the latest release of Prism Warehouse Executive is planned, as is migration from the client platform to the Web for warehouse access. Several data marts will be added during 1998 for such functions as staffing statistics, contracts, research grants and more. The DCRT is also investigating OLAP tools to enable users to analyze data on the fly and do cross-cut analyses, drilling up, down and across data to see different attributes. For example, OLAP would enable the user to easily see spending for the financial year, for the quarter and for the month, as well as drill down to the account level to see where money was spent.

The Basis for Success

Looking back on its experience in building a popular data warehouse that will stand in good stead for years to come, the NIH has been able to identify three important factors that contribute to a successful data warehouse experience. First, develop a strong working relationship with the business community. Second, build a strong development team with expertise in database architecture, data transformation and user interface technology. And finally, implement a warehouse architecture that is flexible, so decision support tools can change as technology changes.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access