Getting Data In versus Getting Information Out

Author's note: This article is part of an upcoming book on implementing the corporate information factory by Claudia Imhoff and Jeff Gentry, to be published by John Wiley & Sons.

Most data warehouses start out small. Then, depending on the complexity and size of the business problems you tackle, you add more data, and your data warehouse grows and matures. Obviously, bigger data warehouses require larger, dedicated support teams. As your support team grows, though, you may notice that it begins to show signs of a split personality based on specialization.

The Problem

For many large support teams, the specialization is bipolar. One part of the team becomes very focused on the operational systems, the integration and transformation layer, and the data warehouse: getting the data in. Another part of the team tends to gravitate toward building cubes, marts, reports, etc., for the business community: getting the information out.

Even though both of these functions are necessary for a successful corporate information factory (CIF), it can be difficult to manage the situation and to get the two groups working together. Without an understanding of this polarizing tendency, you may be asking yourself why once-happy team members are now feuding with each other over issues such as project priorities. And without a plan for maximizing their cohesion, you may be stuck.

The Solution

So, what's the plan? How do you set up a team structure that accommodates the widening gap between those who see the world of decision support as getting data in (GDI) and those who see it as getting information out (GIO)? First, you need to understand the goals of each group. Figure 1 shows these broad goals.

Figure 1: The Goals of GDI and GIO Groups

Getting Data In

The GDI group focuses almost exclusively on the operational systems, the integration and transformation layer, the data warehouse and operational data store databases and, finally, the data management process. Although the focus is more technological at its center, the business community still plays a large role. Their involvement is critical to the success of this part of the architecture, because they verify sources of data, help develop algorithms, analyze quality and currency issues, and validate data models and the business rules encapsulated therein.

The goals of the people responsible for this activity are quite different from those of the GIO group. The GDI goals are summarized in three broad categories:

Integration - First and foremost in the minds of the GDI group is the ability to capture, integrate, cleanse and transform the fractured data from the myriad operational systems into integrated, enterprise-oriented standard formats. They need to do this as efficiently and quickly as possible.
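
As a rough illustration of the integration goal, the Python sketch below maps records from two operational systems into one standard format. Everything in it is an assumption made for the example: the two source systems ("billing" and "orders"), their field names and the Customer format are hypothetical and are not part of the CIF architecture itself.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Customer:
    """Hypothetical enterprise-standard customer format."""
    customer_id: str
    name: str
    signup_date: date


def from_billing(row: dict) -> Customer:
    # Assumed billing system: zero-padded IDs, upper-case names, US-style dates.
    return Customer(
        customer_id=row["CUST_NO"].strip().lstrip("0"),
        name=row["CUST_NAME"].strip().title(),
        signup_date=datetime.strptime(row["OPEN_DT"], "%m/%d/%Y").date(),
    )


def from_orders(row: dict) -> Customer:
    # Assumed order system: integer IDs, irregular spacing in names, ISO dates.
    return Customer(
        customer_id=str(row["id"]),
        name=" ".join(row["full_name"].split()).title(),
        signup_date=date.fromisoformat(row["created"]),
    )


def integrate(billing_rows, order_rows):
    """Transform both sources and keep one record per customer_id."""
    merged = {}
    transformed = [from_billing(r) for r in billing_rows]
    transformed += [from_orders(r) for r in order_rows]
    for customer in transformed:
        # The first source to supply a customer wins; a real project
        # would need explicit survivorship rules agreed with the business.
        merged.setdefault(customer.customer_id, customer)
    return list(merged.values())
```

The point of the sketch is only that each source gets its own small transformation into the shared format, so the cleansing rules for one system stay separate from those for another.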

Data Accessibility - To be used efficiently and effectively, the data must reside in convenient, available formats that allow easy extraction from the data warehouse into data marts and easy access through the operational data store. The second goal of the GDI group is to extract the data from the older, more difficult and less understood technologies and get it into new, easily accessible technologies and documented formats. For many organizations, this may mean a change in the technological platform, usually to a relational database on a highly scalable platform.

Quality - The last, but not least, concern of the GDI group is the quality of the data they are loading into the corporate information factory. The first question from most users is, "How good is this data?" Until the quality is known and measured, this question cannot be answered. Unfortunately, the quality of the data in most operational systems is relatively unknown. The first indication of quality problems comes during the integration and extraction process. The goal for the GDI group is to correct the problems. It takes a great deal of effort from both the business community and IT personnel working together to achieve major improvements in quality. It may even require an overhaul of existing business processes and/or incentive plans to show improvements.
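
To make "known and measured" concrete, here is a minimal Python sketch of one way quality might be scored during loading. The two measures used (completeness and validity), the function name quality_report and the sample field names are assumptions chosen for illustration; the right measures for any given warehouse have to come from the business community.

```python
def quality_report(rows, required_fields, valid_values):
    """Return completeness and validity rates (0.0 to 1.0) per field.

    rows            -- list of dicts, one per loaded record
    required_fields -- fields that should never be missing or empty
    valid_values    -- dict mapping a field to its set of allowed values
    """
    total = len(rows)
    report = {}
    for field in required_fields:
        # Completeness: share of records where the field is populated.
        present = sum(1 for r in rows if r.get(field) not in (None, ""))
        report[field] = {"completeness": present / total if total else 0.0}
    for field, allowed in valid_values.items():
        # Validity: share of records whose value is in the allowed set.
        valid = sum(1 for r in rows if r.get(field) in allowed)
        report.setdefault(field, {})["validity"] = valid / total if total else 0.0
    return report


sample = [
    {"id": 1, "state": "CO"},
    {"id": 2, "state": ""},
    {"id": 3, "state": "XX"},
    {"id": 4, "state": "NY"},
]
print(quality_report(sample, ["id", "state"], {"state": {"CO", "NY"}}))
```

Numbers like these give the team something specific to show when users ask how good the data is, and a baseline against which later improvements can be judged.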

If the GDI group has done its job successfully, the result is a normalized, nonredundant data structure. Unfortunately, this is not a structure easily used for GIO queries.

Getting Information Out

The people responsible for getting information out and into the hands of the business community have their own architectural concerns. They are focused on the data delivery process, data marts, meta data, and the interfaces into both the data marts and into the operational data store. Generally this group is made up of both line-of-business technology (LOB-IT) people and personnel from the business community.

The GIO group certainly has its own set of goals, very different from those of the GDI group. These consist of the following:

Capability - The GIO group must understand the business problems that the corporate information factory is solving. Each iteration of the GIO process should solve a specific business problem. For example, if you create a data mart to analyze product profitability, it is unreasonable to assume that the same data mart can support other analyses, such as customer purchasing patterns. That type of analysis requires yet another capability and set of specifically formatted data: another data mart. All of these are technological capabilities, constructed using the CIF architecture, for use by the business community.

Business Community Understanding - The business community cannot use these capabilities without first understanding their purpose and how and when to use them. The architecture is only as good as the meta data (the data about the data, activities, reports, usage, etc.) that is accessible to the business community. Meta data is the glue holding the architecture together. It is the information you turn to when you don't know what a calculation is, where a piece of data came from, who created what report, and so on. Without a solid understanding of the CIF, the users of this environment could use information inappropriately or, worse, not use the environment at all because they don't know how or don't know it is there! The goal here, then, is to make the environment not only easy to use but easy to understand.

Security - The final goal of the GIO group is to ensure that this most valuable corporate asset (clean, accessible information) is not abused, stolen, inappropriately accessed, etc. Security quickly becomes a highly political issue, especially if you make your CIF accessible through your intranet to the business community and external entities. A well-thought-out, well-implemented security plan will be invaluable to the well-being of your organization. This last goal takes significant effort and input from the business community to specify security requirements and from the IT staff to implement them.

Just as the GDI group cannot forget about the business community in their quest to get data in, the GIO team cannot forget about the architecture in their zeal to get information into the hands of the users. They must ensure that the data they require exists in either the data warehouse or operational data store. They must ensure that the GDI team is informed of new requirements, new calculations, new data marts, etc., to keep the architecture in sync.

Ultimately you should consider splitting these two groups into formal teams: a centralized "getting data in" team and one or more "getting information out" teams serving the business community. And, once split into these two camps, you must ensure that they maintain cohesiveness. This can be accomplished by:

  • Educating the teams on the nature and importance of each other's goals.
  • Ensuring that each team keeps the interests of both teams in mind (for example, requiring that the GDI team supplies the GIO team(s) with clear, well-documented technical meta data explaining the source of data, its quality, format, content, etc.).
  • Requiring that the GIO team(s) constantly report user feedback on data quality and supply new data requirements as needed to the GDI team.
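
One possible shape for that two-way exchange is sketched below in Python: a single record per data element that the GDI team fills in with technical meta data and the GIO team(s) annotate with user feedback. The class name TechnicalMetaData, its fields and the sample values are hypothetical, chosen only to mirror the items listed above (source, quality, format, content).

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetaData:
    """Hypothetical meta data record shared between the GDI and GIO teams."""
    element: str        # name of the data element in the warehouse
    source_system: str  # where the GDI team extracted it from
    data_format: str    # documented format of the element
    description: str    # business-readable content description
    quality_note: str   # what is currently known about its quality
    user_feedback: list = field(default_factory=list)

    def report_feedback(self, comment: str) -> None:
        # Called by a GIO team to pass user feedback back to the GDI team.
        self.user_feedback.append(comment)

    def describe(self) -> str:
        # One-line summary a business user could read in a report footer.
        return (
            f"{self.element}: {self.description} "
            f"(source: {self.source_system}; format: {self.data_format}; "
            f"quality: {self.quality_note})"
        )


net_revenue = TechnicalMetaData(
    element="net_revenue",
    source_system="billing",
    data_format="decimal(12,2), US dollars",
    description="Invoiced revenue less returns",
    quality_note="returns missing before 1997",
)
net_revenue.report_feedback("Users need revenue split by sales region.")
print(net_revenue.describe())
```

Keeping the feedback on the same record as the source and quality notes is a design choice for the sketch: it puts both teams' contributions in one place, which is the kind of shared view the steps above are meant to encourage.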

A natural evolution of most data warehouse teams is from a single team sharing all roles and responsibilities to a central "getting data in" team and one or more "getting information out" teams. The GDI team is generally part of central IT, and the GIO teams are a combination of line-of-business IT and business community members. Each team or set of teams has very distinct roles and responsibilities as well as skill sets. Using the goals described in this column, you can determine who should be assigned to the "getting data in" side of the house and who to the "getting information out" side. Each team focuses on different aspects of the corporate information factory architecture, but the teams must rely on one another for optimal effectiveness and efficiency.
