"Six degrees of separation" refers to the assertion that any person on the planet is connected to every other through a chain of no more than five intermediaries. For instance, if I needed to get a message to central Mongolia but did not have the phone number or address, I would give it to my neighbor, who knows someone in Bangalore, India, who in turn knows someone in Singapore, who knows someone in Beijing, who knows someone who is a neighbor of the recipient in Ulaanbaatar, Mongolia, and who hands it off. Five intermediaries, six hand-offs: six degrees of separation.1

Connecting the dots between the business and the underlying technology can be a tricky undertaking in the case of extract, transform and load (ETL) technology, and it ends up working much like our example of passing a postcard from Chicago to Mongolia through six degrees of separation. This will require a bit of setup and some work, but it is well worth the effort given the need to build the business case for technologies that are not always intuitively or directly relevant to business operations.

At first glance, ETL looks like a technology that is more relevant to infrastructure than to a conversation about business results. The ETL platform is furnished with a design workstation at which a developer uses a high-level, drag-and-drop, point-and-click interface to generate an application. Often the ETL platform occupies a key point in the system architecture - a data hub - through which heterogeneous data must pass so that upstream data elements, typically from transactional systems, can be mapped to downstream data elements in a data warehouse, data mart or other target data store of choice. Along the way, a wide variety of operations are applied to transform the data, including changes in format, lookups to adjust content and actions to improve data or information quality. Predefined connectors or adaptors provide access to an extensive list of data sources and targets, extending from relational databases to XML to enterprise resource planning (ERP) or other proprietary systems and sources. The rules of interoperation between different systems are captured in a centralized metadata repository for subsequent impact analysis, easing system maintenance. The metaphor of an information supply chain that unfolds in data stages is powerful and relevant. Yet, as described so far, ETL is not a technology that addresses what keeps most businesspeople up at night. How can we make the connection in order to build the business case?
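To make the information supply chain described above a little more concrete, here is a minimal sketch in Python of the three stages. It is an illustration only, not how any particular ETL product works: the field names, the sample rows and the `REGION_LOOKUP` table are all hypothetical, and a plain list stands in for the target data store.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical rows as they might arrive from an upstream transactional system.
SOURCE_ROWS = [
    {"cust_id": "C001", "sale_date": "03/15/2006", "amount": "19.99", "region_code": "IL"},
    {"cust_id": "C002", "sale_date": "03/16/2006", "amount": "5.00", "region_code": "??"},
]

# Hypothetical reference table used for the lookup step.
REGION_LOOKUP = {"IL": "Illinois", "NY": "New York"}


def extract(rows):
    """Extract: read rows from the source (here, an in-memory list)."""
    yield from rows


def transform(row):
    """Transform: rename fields, change formats, apply a lookup, flag bad codes."""
    return {
        "customer_id": row["cust_id"],
        # Format change: US-style date string to ISO 8601.
        "sale_date": datetime.strptime(row["sale_date"], "%m/%d/%Y").date().isoformat(),
        # Format change: decimal string to integer cents.
        "amount_cents": int(Decimal(row["amount"]) * 100),
        # Lookup plus a simple data quality rule: unknown codes become "UNKNOWN".
        "region": REGION_LOOKUP.get(row["region_code"], "UNKNOWN"),
    }


def load(rows, target):
    """Load: append transformed rows to the target store (here, a list)."""
    target.extend(rows)
    return target


warehouse = load((transform(r) for r in extract(SOURCE_ROWS)), [])
for record in warehouse:
    print(record)
```

The point of the sketch is only that each upstream element is mapped, reformatted and checked on its way to a downstream element; a real platform would generate the equivalent logic from the graphical design and record the mappings as metadata.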

What does keep businesspeople up at night? At the risk of oversimplifying, the CFO is worried about the bottom-line number and the integrity of the preceding lines from which it is derived; the marketing manager, about the coherence of the messaging going forth in the firm's communications; the sales staff, about meeting their quotas; the product manager, about inventory levels (low but not too low) and the quality of the production process; the HR manager, about inspiring teamwork and collaboration; the executive function, about market trends, competitors, substitute products, legal issues, regulatory pitfalls and compensation plans. Of course, each of these sketches is an oversimplification, and each role is concerned with its own relevant metrics, messages, inventories, quality and teamwork. If the enterprise is to succeed as a whole, each of them must have a concept of serving the customer (or, in the public sector, the constituent) and a perspective on the overall enterprise. If they do not get their heads above the day-to-day struggle for survival, a whole set of dysfunctional behaviors can result, with damaging consequences.

Now let us (finally) return to our hypothesis about six degrees of separation. Any valid data point is separated from a business problem by a number of degrees of separation - sometimes six, sometimes fewer, sometimes more. The first level is the transactional one: a customer buys a product or service. This generates an atomic data point, a sale. Of course, the transaction itself may entail degrees of separation, such as credit checking if the purchase is made with a payment card or validation if an insurance claim is being processed. In turn, this data is aggregated at a second level for purposes of statutory accounting and regulatory reporting, categorized according to distinctions that mean something to government auditors. At a third level, the transactions are aggregated according to other relevant master data dimensions - which customer bought which product, and when and where this occurred. A fourth level consists of adjustments to aggregates. In retail, products are returned and must be added back into inventory and subtracted from revenue. In insurance, losses must be accrued, deducted from reserves and added to payouts. If the organization has a flat structure and consistent systems, we are done sooner rather than later. We have traced a path from an interaction in the market to a fundamental business entity (such as customer or product) that can solve a business problem or answer a customer question. We are now able to answer basic business questions about trends in the market and related issues. In this example, we have four levels, or three degrees of separation, unless we count the one hidden in the transactional layer, in which case we have four degrees of separation. This is relatively simple, but it is rarely the case.
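The levels just described can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the customers, products and amounts are invented, the second (statutory reporting) level is skipped, and the fourth-level adjustment is reduced to subtracting returns from revenue.

```python
from collections import defaultdict

# Level 1: hypothetical atomic transactions (customer, product, amount).
sales = [
    ("alice", "widget", 30),
    ("alice", "gadget", 20),
    ("bob", "widget", 30),
]

# Hypothetical returns that will adjust the aggregates at level 4.
returns = [("alice", "gadget", 20)]


def aggregate(rows):
    """Level 3: total the amounts by the (customer, product) dimensions."""
    totals = defaultdict(int)
    for customer, product, amount in rows:
        totals[(customer, product)] += amount
    return totals


def net_revenue(sale_rows, return_rows):
    """Level 4: adjust the aggregated sales by subtracting aggregated returns."""
    net = aggregate(sale_rows)
    for key, amount in aggregate(return_rows).items():
        net[key] -= amount
    return dict(net)


revenue = net_revenue(sales, returns)
for (customer, product), amount in sorted(revenue.items()):
    print(customer, product, amount)
```

Each function call corresponds to one hand-off between levels, which is the sense in which a business answer such as net revenue per customer sits a few degrees away from the original sale.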

There is a "gotcha." In this discussion, "degrees of separation" is a proxy for "steps in a process" or "hand-offs between system interfaces." A quick and easy result such as the one we just obtained does not really correspond to our intuitions about the complexity of business operations. The result presumes we have immediate access to something like a consistent representation of customers, products and other essential reference data in the form required. In general, if each system interface (a hand-off between different representations of the same data) represents a degree of separation, then the task of merging a half dozen customer or product files across each of these interfaces represents a potentially astronomical number of degrees of separation. The exponential fan-out goes in the wrong direction: instead of converging on your next-door neighbor, who hands off the postcard to his second cousin in Bangalore, we get proliferating system interfaces. It is as though you need to get there by way of the planet Mars. This is the state of the spaghetti-like diagrams of system interrelations that represent the "before" state in data mart consolidation initiatives.

At this point, an ETL platform comes in handy. Going forward we might call ETL a "degree separator undo-er." Businesspeople may not know what metadata, schema integration, data normalization or even a star schema are - nor should they have to know - but they do know what a consistent, unified view of customers or products is. They do know what aggregate customer revenue, lifetime value and reduced inventory through superior demand planning and supply chain forecasting are. In order to obtain such breakthrough business benefits, which can grow top-line revenue, it is useful - indeed essential - to perform data integration. Every step in data integration is a stage in traversing and reducing the degrees of separation between transactional data and business information. When made a part of a data hub, ETL technology can minimize the number of system interfaces. Instead of proliferating point-to-point connections, the logically minimal number of connections between data stores is implemented through a central hub. The end result is that ETL technology has finally caught up with the much-hyped 360-degree view of the customer and other master data components. This in turn enables advanced applications and comprehensive business intelligence, connecting the dots between the business and the technology.
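The arithmetic behind the hub argument can be sketched as follows. The sketch assumes the worst case for point-to-point integration, in which every pair of data stores exchanges data directly, and assumes that a hub needs exactly one connection per data store; real landscapes usually fall somewhere in between. The function names are illustrative only.

```python
def point_to_point_interfaces(n_stores):
    """Interfaces needed if every pair of data stores is connected directly."""
    return n_stores * (n_stores - 1) // 2


def hub_interfaces(n_stores):
    """Interfaces needed if every data store connects only to a central hub."""
    return n_stores


for n in (6, 12, 24):
    print(n, point_to_point_interfaces(n), hub_interfaces(n))
```

Under these assumptions, the half dozen customer or product files mentioned earlier need 15 pairwise interfaces but only 6 connections through a hub, and the gap widens quickly as more data stores are added.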


  1. There is some controversy as to whether this is an urban myth. According to the Wikipedia entry for Six Degrees of Separation: "The hypothesis was first proposed in 1929 by the Hungarian writer Karinthy Frigyes in a short story called Chains. The concept is based on the idea that the number of acquaintances grows exponentially with the number of links in the chain, and so only a small number of links is required for the set of acquaintances to become the whole human population."