Last month we introduced the concept of DRIP: the common business problem of being "Data Rich and Information Poor." Unless a company was formed only recently and is building its support systems to accepted best practices for both operational IT development and business intelligence implementation, it has most likely grown its data environment the same way as everyone else. Historically, business solutions evolved at different times in a business's IT development cycle. Each business need was analyzed, designed and modeled with its own operational system. As new needs were identified, there may have been a cursory review of the current systems, but in most cases IT developed a new standalone operational system. Couple this with the recent rash of acquisitions and mergers, and IT organizations are left with silos of data that do not communicate with, or relate to, each other. Oftentimes merging or acquired companies have incompatible hardware and software infrastructures. This predominance of data silos leads directly to DRIP and to the difficulties companies currently experience in using their abundant data sources to make solid, informed decisions. Beginning with this month's column, we will focus on tactics to alleviate this issue.

Many of the techniques we will present are recommended for any IT initiative, regardless of its business requirements. One of the worst, and most common, mistakes made when undertaking a technology initiative is to jump straight to the tactical aspects of the project without taking business processes and requirements into account. Often, IT organizations receive a set of business requirements from their business constituency and rapidly move into "I know how to fix this" mode, rather than taking a step back to review the request in light of the total corporate IT infrastructure and stated direction. This lack of strategic planning and validation is one of the leading causes of DRIP.

Understanding the application you are trying to create and its implications is essential to overall project success. Strategic planning for a BI initiative, for example, is a bit more complex than for standard operational-type IT projects. The topic of planning could no doubt fill volumes, so this series will focus on the more tactical DRIP-mitigation techniques. Specifically, we are going to focus on:

  • Gap Analysis - The practice of reviewing business requirements, translating those into data requirements and examining source systems to determine what exists vs. what is missing and how to make up the difference.
  • Systems Audit - Reviewing and "re-analyzing" existing systems to ensure they are still functioning as they were originally deployed, and assessing the underlying data to ensure that IT and business users actually understand the meaning of the data maintained by the system.
  • Master Organizational Plan - The tactical outcome of a strategic plan that reworks the current systems to ensure they fit into the long-term vision for the organization. This plan also identifies the personnel and/or process changes required to allow the current applications to work within the planned environment, and it dictates time intervals for reassessing the systems, because a company's IT environment evolves as the business changes.

This month we will cover the first item, the gap analysis, in detail and talk about the steps that go into ensuring a successful outcome for any enterprise undertaking one.

The Gap Analysis

The gap analysis involves several steps that occur throughout the implementation of any application. It begins with the all-important requirements meetings. This series of meetings requires an individual or group of individuals to gather the business information requirements from all of the users who will have a stake in the application. During the process, the business users' requests for information are evaluated and refined, and possible candidate source systems that may contain the data required to satisfy the requirements are identified. Finally, the business requirements are documented and translated into data requirements that are mapped to the candidate source data with the target data elements modeled for the new application.

Business Data Requirements - A key function of the implementation team is to work with the business users to ensure that all the data requirements are met and that any missing elements are identified and planned for. During these meetings, the implementation team should quantify the information necessary to satisfy each of the requirements brought up in the discussions. During the requirements gathering, the following questions should be considered:

  • What are the common ad hoc requests? How long does it take to produce them?
  • Which source systems are used for frequently requested information?
  • Is there data used in the decision-making process that resides in personal repositories, such as Excel or Access? Are there any personal business rules applied to the data when it is entered or manipulated on the desktops?

As the final outcome of this step, the implementation team will have identified the data elements needed to support the users' requirements based on available or obtainable data from the current source systems. This will form the basis for the implementation's data model, whether that is a new database or a set of modifications to an existing repository. In parallel with the data modeling effort, the data requirements will be the basis for completing the source data identification process.

Source Data Identification - Once the required data has been identified and validated, it is time to identify the candidate sources for the data. It is likely that the data will reside in several source systems within the organization ranging from enterprise-wide relational databases to personal data contained in end-user Excel spreadsheets. Regardless of the source, the data must be identified and analyzed for accuracy, completeness and viability. The following questions should be kept in mind when reviewing the sources:

  • How do the source systems relate to each other? Do any feed each other?
  • Are there data elements in the source systems left unpopulated or unvalidated?
  • Do any data elements contain business rules other than those identified in the data dictionary or application documentation, and is there any information implied by the element name or content (e.g., the first character of a free-form field indicates whether the remaining characters are a business name or a company name)?
  • Is there a current published data dictionary and are lookup tables available?
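
The question about unpopulated elements can be answered with a quick profiling pass over an extract of each candidate source. What follows is only a minimal sketch: the function names, the sample rows and the 50 percent threshold are all assumptions made for illustration, and a real audit would read from the actual source systems.

```python
def is_blank(value):
    """Treat None and whitespace-only strings as unpopulated."""
    return value is None or str(value).strip() == ""


def fill_rates(rows):
    """Return, per column, the fraction of rows holding a non-blank value."""
    rows = list(rows)
    columns = []
    for row in rows:
        for column in row:
            if column not in columns:
                columns.append(column)
    rates = {}
    for column in columns:
        filled = sum(1 for row in rows if not is_blank(row.get(column)))
        rates[column] = filled / len(rows)
    return rates


def sparsely_populated(rates, threshold=0.5):
    """List the columns whose fill rate falls below the (assumed) threshold."""
    return sorted(column for column, rate in rates.items() if rate < threshold)


# Hypothetical extract from an order entry system; the values are made up.
SAMPLE_ROWS = [
    {"customer_id": "1001", "name": "Acme Corp", "ssn": ""},
    {"customer_id": "1002", "name": "B. Jones", "ssn": "   "},
    {"customer_id": "1003", "name": "", "ssn": None},
    {"customer_id": "1004", "name": "Initech", "ssn": "000-00-0000"},
]

if __name__ == "__main__":
    rates = fill_rates(SAMPLE_ROWS)
    for column, rate in rates.items():
        print(f"{column}: {rate:.0%} populated")
    print("Review before mapping:", sparsely_populated(rates))
```

A column that is present but rarely populated is a weak candidate source, which is worth knowing before the mapping exercise begins.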

The end result of this lengthy but necessary process is a solid understanding of the possible data sources, and how they relate to one another and to the data requirements. Concurrently, at the end of this process, the data model should be complete, and it will be safe to proceed to the final stage of the gap analysis: source-to-target data mapping.

Source-to-Target Data Mapping - Once the data requirements have been gathered and the data model is complete, the data mapping exercise starts. This process has two major functions. The first is the actual mapping exercise; the second, and more important, is the resolution of any issues identified during the mapping. To facilitate the mapping, create a spreadsheet with the following columns:

  • Target Table
  • Target Column
  • Data Type/Length
  • Target Data Element Description
  • Source System
  • Source Table
  • Source Column
  • Data Transformations (if required)

When mapping each of the target data elements, be sure to include all required data in the table, even if no source has been identified. Also, it is quite likely that many of the target elements will have multiple candidate sources. Be sure to record all possible sources.
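
For teams that keep the mapping in code rather than in a spreadsheet, the same columns can be held as simple records, which makes it easy to list the two kinds of issues that need resolution: targets with no source and targets with several candidates. This is a sketch under stated assumptions; the class, the function and every table and column name in the sample are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingRow:
    """One row of the mapping sheet; source fields stay None when no source is known."""
    target_table: str
    target_column: str
    data_type: str
    description: str
    source_system: Optional[str] = None
    source_table: Optional[str] = None
    source_column: Optional[str] = None
    transformation: Optional[str] = None


def summarize_gaps(rows):
    """Return (unmapped targets, targets with more than one candidate source)."""
    sources = defaultdict(list)
    for row in rows:
        key = (row.target_table, row.target_column)
        candidates = sources[key]  # registers the target even when it has no source
        if row.source_system is not None:
            candidates.append((row.source_system, row.source_table, row.source_column))
    unmapped = sorted(key for key, found in sources.items() if not found)
    multiple = sorted(key for key, found in sources.items() if len(found) > 1)
    return unmapped, multiple


# Hypothetical mapping rows; none of these systems or columns come from the column itself.
MAPPING = [
    MappingRow("customer", "customer_name", "VARCHAR(100)", "Legal name of the customer",
               "CRM", "accounts", "acct_name"),
    MappingRow("customer", "customer_name", "VARCHAR(100)", "Legal name of the customer",
               "Billing", "cust_master", "name", "Trim and upper-case"),
    MappingRow("customer", "sic_code", "CHAR(4)", "Industry classification code"),
    MappingRow("sales", "sale_amount", "DECIMAL(12,2)", "Invoiced amount",
               "Billing", "invoices", "amt"),
]

if __name__ == "__main__":
    unmapped, multiple = summarize_gaps(MAPPING)
    print("No source identified:", unmapped)
    print("Multiple candidate sources:", multiple)
```

The two lists make a convenient agenda for the follow-up meeting with the business community.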

Once all of the target elements have been mapped, it will be necessary to meet with the business community to present the results and to resolve any issues identified. When the issues revolve around multiple sources, the results of the data source analysis should drive the resolution, since this information can help determine which source is the most valid (e.g., validated, commonly populated, a source for other systems). When the conversation turns to required data elements that cannot be mapped to source systems, resolution becomes much more of a business decision. It is always possible that new personal data repositories will be identified during this conversation, since business users don't always think about all of the various ways they use company data in the course of their jobs. More commonly, target elements are identified that simply have no source within the organization today. In these cases it comes down to four questions that need to be answered by the business users:

  • Can the required information be derived from current data by applying transformations, calculations or concatenations of existing elements?
  • Can the information be built from existing data systems or data entry processes that can be modified to secure this information? In many cases, a source system, or the process used to gather the input information, can be modified fairly easily and provide the required data. For example, the order entry process could be modified to require the entry of a Social Security Number, rather than allowing it to be optional and, therefore, rarely populated.
  • Can the information be secured from an outside source? This is the hardest decision because it is also the most costly in terms of actual money expended by the company. An example of this is SIC code information. Many companies have been doing business for years and are only now getting into detailed sales analysis. Previously, they were concerned only with operational information about inventory and sales dollars. Now, they need to delve deeper into their client base to understand how the sales data breaks down by industry (SIC, D&B (Dun & Bradstreet), etc.) or other demographic (income, education, etc.) and firmographic (revenue, population, locations, etc.) indicators. Many types of information like this are available for purchase from third-party data providers such as credit bureaus, D&B and the SEC.
  • Can we live without the information? The final question is whether we really need this information to satisfy the business questions. If the answer is no, and the users are comfortable making decisions without the requested information, then the data element is removed from the list of required data. More often, however, the data is required, and the company needs to determine the most cost-effective way to secure it.
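
To make the first of these questions concrete, here is a small sketch of two derived elements, one built by concatenation and one by calculation. The element names (a full name and an order line total) and the rounding to two decimal places are assumptions chosen for illustration, not requirements taken from any particular system.

```python
from decimal import Decimal


def derive_full_name(first_name, last_name):
    """Concatenate two existing elements, skipping any part that is empty."""
    parts = (first_name, last_name)
    return " ".join(part.strip() for part in parts if part and part.strip())


def derive_line_total(quantity, unit_price):
    """Calculate a target element from two existing ones, rounded to cents."""
    total = Decimal(quantity) * Decimal(unit_price)
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    print(derive_full_name(" Pat ", "Lee"))
    print(derive_line_total(3, "19.99"))
```

Rules like these belong in the Data Transformations column of the mapping sheet, so that the derivation is documented alongside the source elements it depends on.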

As you can see from the labor involved, a gap analysis is an intricate and detailed step of any implementation. That does not change the fact that it must be done if you want your application's implementation to succeed. The gap analysis is a technique applicable to any IT implementation. It is an especially valuable tool when designing and implementing data-intensive applications such as business intelligence/business performance management and data warehouses. The project manager should plan for this exercise as part of any project implementation. While it may seem to add time and overhead to the project, the effort and expense help ensure that the resulting implementation will meet the end users' expectations.
