Connecting the dots between the business and the underlying technology in the case of extract, transform and load (ETL) technology can be a tricky undertaking, and it ends up working much like our example of passing that postcard from Chicago to Mongolia through six degrees of separation. This will require a bit of setup and some work, but it is well worth considering given the need to build the business case for technologies that are not always intuitively or directly relevant to business operations.
At first glance, ETL looks like a technology that is more relevant to infrastructure than to a conversation about business results. The ETL platform is furnished with a design workstation at which a developer uses a high-level, drag-and-drop, point-and-click interface to generate an application. Often the ETL platform occupies a key point in the system architecture - a data hub - through which heterogeneous data must pass as upstream data elements, typically from transactional systems, are mapped to downstream data elements in a data warehouse, data mart or other target data store of choice. Along the way, a wide variety of operations are applied to transform the data - including changes in format, lookups to adjust content and actions that affect data or information quality. Predefined connectors or adaptors enable access to an extensive list of data sources and targets, extending from relational databases to XML to enterprise resource planning (ERP) or other proprietary systems and sources. The rules of interoperation between different systems are captured in a centralized metadata repository for subsequent impact analysis, easing system maintenance. The metaphor of an information supply chain that unfolds in data stages is powerful and relevant. Yet, as described so far, ETL is not a technology that addresses what keeps most businesspeople up at night. How can we make the connection in order to build the business case?
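To make the mechanics described above a little more concrete, here is a minimal sketch of the extract, transform and load stages in Python. It is not how any particular ETL product works. The source rows, the `REGION_LOOKUP` table, the field names and the in-memory `warehouse` list are all hypothetical stand-ins for a transactional system, a reference table and a target data store.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical upstream rows, standing in for a transactional system.
SOURCE_ROWS = [
    {"cust_id": "C1", "sale_date": "03/15/2006", "amount": "19.99", "region_code": "MW"},
    {"cust_id": "C2", "sale_date": "03/16/2006", "amount": "5.00", "region_code": "??"},
]

# Hypothetical lookup table used to adjust content during the transform.
REGION_LOOKUP = {"MW": "Midwest", "NE": "Northeast"}


def extract():
    """Extract: read the upstream rows (here, simply copy the list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: change formats, apply a lookup, and set aside bad rows."""
    clean, rejected = [], []
    for row in rows:
        region = REGION_LOOKUP.get(row["region_code"])
        if region is None:
            # Data quality action: an unknown region code is rejected.
            rejected.append(row)
            continue
        clean.append({
            # Map the upstream element name to the downstream one.
            "customer_key": row["cust_id"],
            # Format change: MM/DD/YYYY becomes ISO YYYY-MM-DD.
            "sale_date": datetime.strptime(row["sale_date"], "%m/%d/%Y").date().isoformat(),
            # Format change: a decimal string becomes integer cents.
            "amount_cents": int(Decimal(row["amount"]) * 100),
            # Lookup: a region code becomes a region name.
            "region": region,
        })
    return clean, rejected


def load(rows, target):
    """Load: append the transformed rows to the target store."""
    target.extend(rows)
    return len(rows)


warehouse = []  # hypothetical target data store
clean_rows, rejected_rows = transform(extract())
loaded_count = load(clean_rows, warehouse)
```

In this sketch the mapping rules live in the code itself. In the platforms described above, those rules would instead be recorded in the metadata repository, which is what makes impact analysis possible.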
What does keep businesspeople up at night? At the risk of oversimplifying, the CFO is worried about the bottom-line number and the integrity of the preceding lines from which it is derived; the marketing manager, about the coherence of the messaging going forth in the firm's communications; the sales staff, about meeting their quotas; the product manager, about inventory levels (low but not too low) and the quality of the production process; the HR manager, about inspiring teamwork and collaboration; the executive function, about market trends, competitors, substitute products, legal issues, regulatory pitfalls and compensation plans. Of course, each of these role descriptions is an oversimplification, and each role is concerned with its own relevant metrics, messages, inventories, quality and teamwork. If the enterprise is to succeed as a whole, each of them must have a concept of serving the customer (or, in the public sector, the constituent) and a perspective on the overall enterprise. If they do not get their heads above the day-to-day struggle for survival, a whole set of dysfunctional behaviors can result, with damaging consequences.
Now let us (finally) return to our hypothesis about six degrees of separation. Any valid data point is separated from a business problem by a number of degrees of separation - sometimes six, sometimes fewer, sometimes more. The first level is the transactional one - a customer buys a product or service. This generates an atomic data point, a sale. Of course, the transaction itself may entail degrees of separation, such as credit checking if the purchase is made with a payment card or validation if an insurance claim is being processed. In turn, this data is aggregated at a second level for purposes of statutory accounting and regulatory reporting, categorized according to distinctions that mean something to government auditors. At a third level, the transactions are aggregated according to other relevant master data dimensions - which customer bought which product, and when and where this occurred. A fourth level consists of adjustments to aggregates. In retail, products are returned and must be added into inventory and subtracted from revenue. In insurance, losses must be accrued, deducted from reserves and added to payouts. If the organization has a flat structure and consistent systems, we are done sooner rather than later. We have traced a path from an interaction in the market to a fundamental business entity (such as customer or product) that can solve a business problem or answer a customer question. We are now able to answer basic business questions about trends in the market and related issues. In this example, we have four levels, or three degrees of separation, unless we count the one hidden in the transactional layer, in which case we have four degrees of separation. This is relatively simple, but it is rarely the case.
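The four levels traced above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a model of any real accounting system. The sample sales and returns, the field names and the helper functions are all hypothetical, and the "statutory" level is reduced to a single revenue total.

```python
from collections import defaultdict

# Level 1: atomic transactions, one row per sale (hypothetical sample data).
sales = [
    {"customer": "Ann", "product": "lamp", "amount": 40},
    {"customer": "Ann", "product": "desk", "amount": 200},
    {"customer": "Bob", "product": "lamp", "amount": 40},
]

# Returned goods, which drive the level 4 adjustments.
returns = [{"customer": "Bob", "product": "lamp", "amount": 40}]


def total_revenue(rows):
    """Level 2: one aggregate figure, as a reporting total might require."""
    return sum(row["amount"] for row in rows)


def revenue_by(rows, dimension):
    """Level 3: aggregate by a master data dimension such as product."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)


def apply_returns(totals, returned_rows, dimension):
    """Level 4: adjust the aggregates by subtracting returned amounts."""
    adjusted = dict(totals)
    for row in returned_rows:
        adjusted[row[dimension]] -= row["amount"]
    return adjusted


gross = total_revenue(sales)
by_product = revenue_by(sales, "product")
net_by_product = apply_returns(by_product, returns, "product")
```

Each function consumes the output of the level before it, so every call is one hand-off. The short path holds only because all three functions agree on what "customer" and "product" mean.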
There is a "gotcha." In this discussion, "degrees of separation" is a proxy for "steps in a process" or "hand-offs between system interfaces." A quick and easy result such as the one we just obtained does not really correspond to our intuitions about the complexity of business operations. This result presumes we have intermediate access to something like a consistent representation of customers and products or other essential reference data in the form required. In general, if each system interface (a hand-off between different representations of the same data) represents a degree of separation, then the task of merging a half dozen customer or product files across each of these interfaces represents a potentially astronomical number of degrees of separation. The exponential fan-out goes in the wrong direction - instead of converging on your next-door neighbor who hands off the postcard to his second cousin in Bangalore, we get proliferating system interfaces. It is as though you need to get there by way of the planet Mars. This is the state of the spaghetti-like diagrams of system interrelations that represent the "before" state in data mart consolidation initiatives.
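A rough way to see the proliferation is to count interfaces. The sketch below assumes, purely for illustration, that every pair of systems holding customer or product data needs its own hand-off, and compares that with routing everything through a single hub. The system names and the `hub` label are hypothetical. Pairwise counting gives only a lower bound on the tangle, since it ignores multi-step paths that chain several interfaces together.

```python
from itertools import combinations


def point_to_point_interfaces(systems):
    """One interface for every pair of systems (the 'spaghetti' case)."""
    return list(combinations(systems, 2))


def hub_interfaces(systems, hub="hub"):
    """One interface per system, each connecting to a shared hub."""
    return [(system, hub) for system in systems]


# A half dozen hypothetical systems, each with its own customer file.
systems = ["billing", "crm", "orders", "support", "web", "warehouse"]

spaghetti = point_to_point_interfaces(systems)
hubbed = hub_interfaces(systems)

print(len(spaghetti), "point-to-point interfaces")
print(len(hubbed), "hub interfaces")
```

For n systems the pairwise count is n(n-1)/2, while the hub count is n. With six systems that is 15 interfaces against 6, and the gap widens as systems are added.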