Six degrees of separation is the theory that any of us can be connected to any other person on the planet through a chain of acquaintances with no more than five intermediaries. The theory was first proposed in 1929 by the Hungarian writer Frigyes Karinthy in a short story called "Chains." In 1967, American sociologist Stanley Milgram devised a new way to test the theory, which he called "the small-world problem." He randomly selected people in the Midwest to send packages to a stranger in Massachusetts. The senders knew the recipient's name, occupation and general location. They were instructed to send the package to someone they knew on a first-name basis whom they judged, out of all their friends, most likely to know the target personally. That person would do the same, and so on, until the package was personally delivered to its target. Although the participants expected the chain to include at least a hundred intermediaries, on average it took only between five and seven intermediaries to deliver each package. Milgram's findings were published in Psychology Today and inspired the phrase "six degrees of separation."

In 2001, Duncan Watts, a professor at Columbia University, continued his own earlier research into the phenomenon and recreated Milgram's experiment on the Internet. Watts used an e-mail message as the "package" to be delivered and, surprisingly, after reviewing data from 48,000 senders and 19 targets (in 157 countries), found that the average number of intermediaries was indeed six. Watts' research, together with the advent of the computer age, has opened up new areas of inquiry related to six degrees of separation across network theory, such as power grid analysis, disease transmission, graph theory, corporate communication and computer circuitry. (Special thank you to WhatIs.com for filling in the gaps.)

Let's bring it home to the corporation. We claim that there are millions of technical assets within the corporation. Obviously, the way we define "asset" can increase or decrease that number. Suppose we have 2,000 systems or applications within the corporation, each with an average of 15 tables, each table with an average of 10 fields, and each field with an average of 10 elements of meta data. That alone would generate 3 million data assets (2,000 × 15 × 10 × 10), and that is before counting the relationships among assets, schemas, components, programs, interfaces, Web pages, metrics, business rules and so on. Is it any wonder we have so much trouble determining what we have, where it is, who uses it, when it is accessed and how to access it? Do you see where we are going with this logic? Yes, I would claim that any asset, yes, any asset you select, is only six degrees from any other asset.
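
The 3 million figure is simply the product of the per-level averages. Here is a minimal Python sketch of that arithmetic using the paragraph's hypothetical averages. The second function is an added illustration, not from the text: it computes the n(n-1)/2 upper bound on direct pairwise links, which suggests why the relationships dwarf the assets themselves. Both function names are illustrative.

```python
def count_data_assets(systems: int, tables_per_system: int,
                      fields_per_table: int, meta_elements_per_field: int) -> int:
    """Estimate data assets by multiplying the average count at each level."""
    return systems * tables_per_system * fields_per_table * meta_elements_per_field


def max_pairwise_links(asset_count: int) -> int:
    """Upper bound on direct links if every asset could relate to every other: n(n-1)/2."""
    return asset_count * (asset_count - 1) // 2


# Averages assumed in the text: 2,000 systems, 15 tables, 10 fields, 10 meta data elements.
total = count_data_assets(2_000, 15, 10, 10)
print(f"Data assets: {total:,}")  # prints "Data assets: 3,000,000"
print(f"Possible pairwise links: {max_pairwise_links(total):,}")
# prints "Possible pairwise links: 4,499,998,500,000"
```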

Let's test the theory on the movie industry. Can we find a relationship between Vivien Leigh of Gone with the Wind and Tobey Maguire of Spider-Man fame?

Vivien Leigh was in The Deep Blue Sea (1955) with Arthur Hill.
Arthur Hill was in The Amateur (1981) with Ed Lauter.
Ed Lauter was in Seabiscuit (2003) with Tobey Maguire.
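
Finding a chain like this amounts to a shortest-path search in an unweighted graph where actors are linked if they appeared in the same film. Breadth-first search is the standard technique for that. The sketch below assumes that technique and runs on a tiny hand-built cast list containing only the three films above. It is not the Oracle of Bacon's actual code or data, and `CASTS`, `build_costar_graph` and `shortest_chain` are hypothetical names.

```python
from collections import deque
from itertools import combinations

# Hypothetical, hand-built cast lists: only the films from the chain above.
CASTS = {
    "The Deep Blue Sea (1955)": ["Vivien Leigh", "Arthur Hill"],
    "The Amateur (1981)": ["Arthur Hill", "Ed Lauter"],
    "Seabiscuit (2003)": ["Ed Lauter", "Tobey Maguire"],
}


def build_costar_graph(casts):
    """Map each actor to {co-star: shared film} for every pair that shared a film."""
    graph = {}
    for film, actors in casts.items():
        for a, b in combinations(actors, 2):
            graph.setdefault(a, {})[b] = film
            graph.setdefault(b, {})[a] = film
    return graph


def shortest_chain(graph, start, goal):
    """Breadth-first search; returns a list of (actor, film, co-star) links, or None."""
    previous = {start: None}  # actor -> (predecessor actor, shared film)
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == goal:
            links = []
            while previous[actor] is not None:
                prior, film = previous[actor]
                links.append((prior, film, actor))
                actor = prior
            return list(reversed(links))
        for costar, film in graph.get(actor, {}).items():
            if costar not in previous:
                previous[costar] = (actor, film)
                queue.append(costar)
    return None  # the two actors are not connected


chain = shortest_chain(build_costar_graph(CASTS), "Vivien Leigh", "Tobey Maguire")
for prior, film, actor in chain:
    print(f"{prior} was in {film} with {actor}")
print(f"Degrees of separation: {len(chain)}")  # prints "Degrees of separation: 3"
```

Because breadth-first search explores every actor one link away before any actor two links away, the first time it reaches the target it has found a shortest chain.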

Here is the link; try it yourself: http://oracleofbacon.org/oracle/star_links.html. The longest chain I could find was four links, but I am sure there are longer ones. The average distance from Kevin Bacon to the 645,957 actors in the database is 2.946. Thirteen of them actually require eight jumps, but I challenge you to find one. Other than having fun, what does this say about our organization and the web of assets we have created?
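
An average like 2.946 can be obtained by running a breadth-first search outward from one actor, recording each reachable actor's distance and averaging those distances. The longest chains are the actors at the maximum distance. The sketch below assumes that method, which is not taken from the Oracle of Bacon's documentation. It also assumes the center actor is excluded from the average and runs on a tiny hypothetical co-star graph with made-up names, so its numbers have nothing to do with the real database.

```python
from collections import deque

# Hypothetical co-star adjacency list (actor -> actors they shared a film with).
COSTARS = {
    "Center": ["A", "B"],
    "A": ["Center", "C"],
    "B": ["Center", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
    "Loner": [],  # never shares a film, so unreachable from Center
}


def distances_from(graph, center):
    """Breadth-first search: number of links from center to every reachable actor."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        actor = queue.popleft()
        for costar in graph.get(actor, []):
            if costar not in dist:
                dist[costar] = dist[actor] + 1
                queue.append(costar)
    return dist


def summarize(graph, center):
    """Average and maximum distance over reachable actors, excluding the center."""
    others = [d for actor, d in distances_from(graph, center).items() if actor != center]
    if not others:
        return {"reachable": 0, "average": 0.0, "longest": 0, "at_longest": 0}
    longest = max(others)
    return {
        "reachable": len(others),
        "average": sum(others) / len(others),
        "longest": longest,
        "at_longest": others.count(longest),
    }


print(summarize(COSTARS, "Center"))
# prints {'reachable': 4, 'average': 1.75, 'longest': 3, 'at_longest': 1}
```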

The data warehouse is an excellent candidate for impact analysis and for our six-degree test. Suppose we have a data warehouse that collects information from three or four sources and feeds a couple of data marts. See Figure 1.


Figure 1: Simple Example of Degree of Separation

In this example we can relate Customer_Name to CustName by a series of relationships (transformations).


Customer_Name from the CRM application is transformed (Transformation A) into CustomerName in the data warehouse.
CustomerName in the data warehouse is transformed (Transformation B) into CustName in the data mart.
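
Field-level lineage like Figure 1 can be recorded as a list of source-to-target mappings and traced backward from any target field. Here is a minimal sketch. It assumes each target field has exactly one source, as in the figure, and it uses hypothetical system-qualified names (`CRM.`, `DW.`, `Mart.`) and a hypothetical `Mapping` record, not any ETL tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One field-to-field ETL mapping (a transformation relationship)."""
    source: str
    target: str
    transformation: str


# Hypothetical qualified field names for the Figure 1 example.
MAPPINGS = [
    Mapping("CRM.Customer_Name", "DW.CustomerName", "Transformation A"),
    Mapping("DW.CustomerName", "Mart.CustName", "Transformation B"),
]


def trace_upstream(field, mappings):
    """Follow mappings backward from a target field to its original source.

    Assumes each target field has at most one source, as in Figure 1.
    """
    by_target = {m.target: m for m in mappings}
    lineage = []
    seen = {field}
    while field in by_target:
        mapping = by_target[field]
        lineage.append(mapping)
        field = mapping.source
        if field in seen:  # guard against cyclic mappings
            break
        seen.add(field)
    return list(reversed(lineage))  # source-first order


for m in trace_upstream("Mart.CustName", MAPPINGS):
    print(f"{m.source} -> {m.target} ({m.transformation})")
```

The data mart field ends up two degrees from its CRM source, one degree per transformation.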

In fact, ETL or field-to-field mappings are at the heart of impact analysis in a data warehouse. The problem, as with the movie database, is that they capture only a single type of relationship (in the movie database, appearing in the same movie). What about actors who are related, such as Kirk and Michael Douglas? What about married couples, such as Michael Douglas and Catherine Zeta-Jones? What about people who live on the same street in Hollywood or attend the same church? The magic of the package experiment described at the beginning of this article was that all types of relationships were taken into account, not just family members or neighbors. The power wasn't in detailed meta data but in the diversity of relationships.
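
The point about relationship diversity can be made concrete. The same breadth-first search that finds co-star chains finds shorter chains, or chains that otherwise would not exist, once edges of other types are allowed. Here is a minimal sketch with hypothetical people and three assumed relationship types mirroring the examples above (co-starring, family, neighbors). `RELATIONSHIPS` and `degrees` are illustrative names.

```python
from collections import deque

# Hypothetical typed relationships: (person, person, relationship type).
RELATIONSHIPS = [
    ("Ann", "Bob", "costar"),
    ("Bob", "Cal", "costar"),
    ("Cal", "Dee", "costar"),
    ("Dee", "Eve", "costar"),
    ("Ann", "Eve", "family"),
    ("Eve", "Fay", "neighbor"),
]


def degrees(start, goal, allowed_types):
    """Breadth-first search counting links, using only the allowed relationship types."""
    graph = {}
    for a, b, kind in RELATIONSHIPS:
        if kind in allowed_types:
            graph.setdefault(a, []).append(b)
            graph.setdefault(b, []).append(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if person == goal:
            return dist[person]
        for other in graph.get(person, []):
            if other not in dist:
                dist[other] = dist[person] + 1
                queue.append(other)
    return None  # no chain exists with these relationship types


print(degrees("Ann", "Eve", {"costar"}))                        # prints 4
print(degrees("Ann", "Fay", {"costar"}))                        # prints None
print(degrees("Ann", "Fay", {"costar", "family", "neighbor"}))  # prints 2
```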

Data management involves not just one type of relationship but many, including domain, transformation, taxonomy, function and location. The real question isn't whether we could hire a consultant or assign an employee to identify these relationships, but how they would be used. Does your meta data solution provide the functionality required to document these relationships? What value would come from having a system that can relate and document them? The reality is that we haven't been very good at collecting and using relationships. I have enjoyed watching the growth of the Internet over the past few years. Its growth in usage and content has not surprised me; the ease with which organizations have jumped onto the Web shows that anyone who understands HTML can publish a site. What has surprised me is what a crappy job we have done of defining relationships between these artifacts of information. The number of Web pages on the Internet may only slightly outnumber the assets in a major corporation. Thus, we have a similar problem.
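
Here is a sketch of what documenting these relationships could look like in a repository. Each relationship is stored as a typed link between two assets, so a single query can answer "what is this asset related to, and how?". The five types come from the paragraph above. The asset names, the `Relationship` record and the `related_to` function are hypothetical and do not represent any particular meta data product's model.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class RelType(Enum):
    """The relationship types named in the text."""
    DOMAIN = "domain"
    TRANSFORMATION = "transformation"
    TAXONOMY = "taxonomy"
    FUNCTION = "function"
    LOCATION = "location"


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    kind: RelType


# Hypothetical documented relationships for a single warehouse field.
REPOSITORY = [
    Relationship("CRM.Customer_Name", "DW.CustomerName", RelType.TRANSFORMATION),
    Relationship("DW.CustomerName", "Domain: Person Name", RelType.DOMAIN),
    Relationship("DW.CustomerName", "Taxonomy: Customer", RelType.TAXONOMY),
    Relationship("DW.CustomerName", "Function: Billing", RelType.FUNCTION),
    Relationship("DW.CustomerName", "Server: dw-prod-01", RelType.LOCATION),
]


def related_to(asset, repository):
    """Group every asset linked to `asset` (in either direction) by relationship type."""
    grouped = defaultdict(list)
    for rel in repository:
        if rel.source == asset:
            grouped[rel.kind].append(rel.target)
        elif rel.target == asset:
            grouped[rel.kind].append(rel.source)
    return dict(grouped)


for kind, assets in related_to("DW.CustomerName", REPOSITORY).items():
    print(f"{kind.value}: {', '.join(assets)}")
```

Storing the type on each link, rather than keeping a separate list per type, lets traversals choose which relationship types to follow at query time.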

How much longer are we going to determine who is using a technology by simply turning it off and seeing who screams? Don't laugh; we all know that is exactly how it is done when the effort to understand the meta data environment is half-hearted. What happens when a production application goes down and the CEO asks what the impact of the outage is? I hope your answer won't be "Well, only three people have called to complain." The repository isn't just about capturing information and loading it into a meta-model. The relationships between assets are as important as the core descriptive information. When you consider the number of assets and the different types of relationships, you can see how complex this job can be. If I could solve the relationship problem, I would be going for an IPO with the world's best relationship engine. Sorry, Google: that mathematical, keyword and linkage-based relationship business model will be destroyed by someone in the next five years.
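
Answering the CEO's question from documented relationships, rather than from complaint calls, comes down to walking the dependency graph downstream from the failed application. Here is a minimal sketch assuming a hypothetical `DEPENDENTS` map (asset to the assets that depend on it). It uses breadth-first search, so each affected asset is reported with its distance in degrees from the outage.

```python
from collections import deque

# Hypothetical dependency edges: upstream asset -> assets that depend on it.
DEPENDENTS = {
    "CRM application": ["Data warehouse", "Customer portal"],
    "Data warehouse": ["Sales data mart", "Finance data mart"],
    "Sales data mart": ["Weekly sales report"],
    "Finance data mart": ["Quarterly close report"],
    "Customer portal": [],
}


def impact_of_outage(asset, dependents):
    """Return every asset downstream of `asset`, with its distance (degrees) from the outage."""
    impact = {}
    queue = deque([(asset, 0)])
    seen = {asset}
    while queue:
        current, depth = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                impact[child] = depth + 1
                queue.append((child, depth + 1))
    return impact


for name, degree in impact_of_outage("CRM application", DEPENDENTS).items():
    print(f"{degree} degree(s) away: {name}")
```

The answer is only as complete as the relationships recorded: an undocumented feed simply never appears in the result.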

All Information Management content is archived after seven days.