Most of us have junk drawers at home – places where we throw things that don't seem to belong anywhere else. It probably isn't a good sign that junk drawers make me think of some enterprise data warehouses.

Not everyone is a fan of enterprise data warehouses. Many think they are not feasible – that it is not reasonable to place all data from an enterprise into a single, monolithic data warehouse. I think those people are right.

Placing all data from an enterprise into a monolithic data warehouse leads to junk drawer mentality: the resulting data warehouse looks like an overstuffed drawer. The excessive amounts of data make it impossible to provide adequate attention to the details of data quality and the requisite transformations. Additionally, users feel overwhelmed and find the whole warehouse unusable.

The lesson to be learned is a version of the old "quality, not quantity" adage. If you want information for innovation, you can't try to capture all the data from an enterprise. Instead, you seek to capture only data that is designated for innovative uses that lead to bottom-line value to the enterprise. This leads to the obvious question: What kind of data is that?

The beginning of the answer is that if you want data that leads to bottom-line value, you need to target it for that purpose. Unfortunately, not every business user has enough insight into data to target it selectively. I have been designing query systems since the late seventies and can remember users telling me, "I need all the data, and I need it kept forever." You can still see that mind-set in some companies today.

The ability to target the right data begins with data warehouse designers who seek out data-savvy business users with analytic skills and experience. They can identify the information that will truly lead to bottom-line improvements. They are the ones who can discuss the characteristics of success within various parts of the enterprise and identify the information that helps to drive that success.

Here's another way to consider what enterprise data should be: Think about a business function or use, and then work back to the data needed to perform that function optimally. For example, data warehouses are commonly used to deliver performance scorecards – high-level metrics that highlight enterprise performance. In order to drive bottom-line results, there needs to be enough supporting information to enable an analytic assessment of underlying causes and behaviors that might lead to innovations that will change future performance. Thus, the enterprise data for performance scorecards includes the underlying detail that supports analysis to drive future improvements in performance.
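
To make the scorecard point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the order records, the field names (region, product, on_time) and the on-time delivery metric are hypothetical and do not come from any particular warehouse. It simply shows a high-level metric and the same metric broken out by the detail that sits underneath it.

```python
from collections import defaultdict

# Hypothetical detail rows: one record per delivered order (illustrative data only).
ORDERS = [
    {"region": "East", "product": "A", "on_time": True},
    {"region": "East", "product": "B", "on_time": False},
    {"region": "West", "product": "A", "on_time": True},
    {"region": "West", "product": "A", "on_time": True},
    {"region": "East", "product": "B", "on_time": False},
]


def on_time_rate(rows):
    """Scorecard metric: share of rows delivered on time (None if no rows)."""
    if not rows:
        return None
    return sum(r["on_time"] for r in rows) / len(rows)


def drill_down(rows, dimension):
    """Break the same metric out by one dimension of the underlying detail."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[dimension]].append(r)
    return {key: on_time_rate(group) for key, group in groups.items()}


scorecard = on_time_rate(ORDERS)             # the high-level number
by_region = drill_down(ORDERS, "region")     # same metric, per region
by_product = drill_down(ORDERS, "product")   # same metric, per product
```

The top-line number alone says only that deliveries are slipping. The breakout by product is what points to a likely cause, and it is only possible because the detail rows were kept alongside the summary.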

Consider data warehouses that support marketing analytics, such as the planning of marketing campaigns and the analysis of completed campaigns. As with scorecards, it is not enough to deliver high-level summarized results. The detailed information needed to change future performance of campaigns is the enterprise data: it includes demographic, behavioral, relationship and contact information. If such data has the potential to change future performance, then it is enterprise data.

Finally, data warehouses are commonly used to support supply chain information. This may include information about purchase orders and receiving that allows enhanced vendor performance or information on materials and supplies that allows improved product costing. Other relevant information may include market conditions to allow improved risk management or information so suppliers can manage on-hand quantities independently. To the extent that such information can support innovative analyses to improve bottom-line performance, it is enterprise data.

Thinking about the useful data in this way allows one to make some choices. A significant amount of data in enterprise databases cannot be used for the kinds of innovative analyses we're talking about here. Certainly, that data may be valuable within a certain domain, but it cannot contribute to the analyses that drive enterprise performance. Such data properly belongs in the operational databases that support its domain, and there it should stay.

One might argue that such data is merely waiting for the right clever analyst to figure out how to convert the data to innovative information. That's a valid argument. Good analysts will recognize the potential of data and will seek to store it in the warehouse, with the expectation of using the data soon. We need to tolerate and encourage such thinking within limits.

The challenge for us as information professionals is that we know that the drawer is large, but not infinitely large. Judgment and discernment are necessary so that we use enterprise resources to store data that can be put to fruitful use, but don't waste those resources by storing too much "junk." Just having these kinds of discussions early with analytical business users can help everyone find and maintain the appropriate balance.
