In 2004, Chris Anderson wrote an interesting article on a concept referred to as the "Long Tail." The long tail consists of the products and services that are no longer salable within a given geographic area. One of my favorite books on personal marketing, The Persona Principle: How to Succeed in Business with Image Marketing by Derek Armstrong and Kam wai Yu, was published in 1997. You would be hard-pressed to locate this book in any bookstore, and even Amazon ranks it number 1,140,052. Online retailers can carry a much larger inventory than physical stores, which allows them to generate more sales along the long tail of popularity.

For example, Barnes and Noble carries 120,000 titles while Amazon.com boasts 2.3 million titles. The obvious reason is that shelf space inside Wal-Mart, Blockbuster and Books-A-Million is limited, and these organizations have only so much real estate in which to display their products to the customer. This physical constraint forces organizations to focus their marketing and promotion on the top-selling items. Online organizations do not carry shelf costs; for them, an additional item is simply an update to the online catalog. Advances in technologies such as search, social software and product comparison sites allow for more fragmented channels and niche products. Interestingly, 57 percent of the sales at Amazon.com come from titles not available at your local bookstore. This phenomenon seems to hold true for everything from books to application software. Figure 1 presents a view of how this framework can be applied to the world of metadata. Instead of popularity and sales, we can use reusability and quantity as the two dimensions of our model.

Figure 1: The Long Tail of Metadata

Enterprise Metadata

We define enterprise metadata as the core metadata that describes any asset within the technology portfolio that can be reused across various business units. These assets should be functionally unique and governed by a domain of subject matter experts. As always, enterprise assets should also be scalable, robust, supported 24x7 and have various metrics in place for continuous evaluation. In the data space, this definition would indicate data elements that are consolidated and cleansed, as in a data warehouse or operational data store. Web services, XML schemas and open source-type objects would also be considered enterprise. Of course, only a small percentage of assets would fall into the category of enterprise. Based on the title counts presented earlier, one can argue that only about five percent of the assets within the corporation will be classified as enterprise (120,000 / 2,300,000, or roughly 5.2 percent). Hence, Figure 1 indicates that the enterprise asset classification is actually the head of the curve and not the long tail. In our analogy, enterprise metadata is the retail store where only the highest-value, most reusable and domain-centric assets are stored.
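The five percent figure above can be reproduced with a few lines of arithmetic, and the same ratio can be used to split a ranked list of assets into a head and a tail. The following is only a minimal sketch: the function `split_head_and_tail`, the asset names and the reuse counts are all hypothetical and invented for illustration, and using reuse count as the ranking measure is an assumption, not a method prescribed by the article.

```python
# Title counts quoted earlier in the article.
STORE_TITLES = 120_000       # Barnes and Noble
ONLINE_TITLES = 2_300_000    # Amazon.com

# Share of the full catalog that fits on the physical "shelf".
head_share = STORE_TITLES / ONLINE_TITLES


def split_head_and_tail(reuse_counts, share):
    """Rank assets by reuse count and return (head, tail) lists of names.

    The head holds the top `share` fraction of assets, with a minimum of
    one asset so that a small portfolio still has a head.
    """
    ranked = sorted(reuse_counts, key=reuse_counts.get, reverse=True)
    head_size = max(1, round(len(ranked) * share))
    return ranked[:head_size], ranked[head_size:]


# Hypothetical assets and reuse counts, invented for illustration only.
example_assets = {
    "customer_master_schema": 48,
    "address_validation_service": 17,
    "regional_sales_extract": 3,
    "legacy_pricing_spreadsheet": 1,
    "one_off_audit_script": 0,
}

head, tail = split_head_and_tail(example_assets, head_share)
print(f"Head share: {head_share:.1%}")
print("Enterprise (head):", head)
print("Long tail:", tail)
```

In a real portfolio the cutoff would more likely come from governance criteria (stewardship, support level, metrics) than from a fixed percentage; the ratio here simply mirrors the bookstore analogy.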

Traditional Metadata

Where does traditional metadata fit into this model? Traditional metadata focused on data and transformation-type assets with a few business rules tossed in on the side. Applying the "long tail" framework, we would include many of the enterprise metadata components as well as other data elements that might not be labeled as enterprise. The data transformations and business rules would also fall under the various classifications of enterprise and core. The one category of traditionally captured metadata that would always be enterprise is the set of system and interface definitions. In theory, all system definitions are considered enterprise due to the needs of disaster recovery and security. Basically, the data warehouse doesn't really concern itself with reuse; instead, it focuses on the specific business requirements of the day. The data warehouse operates from a different paradigm, and as a result it spans both the head and the tail of the asset classification.

The Long Tail

Back to the topic at hand: what about the zillions of technology assets that fall into the long tail and are not considered "enterprise"? Shouldn't those assets be cataloged and documented in a centralized repository or registry? The answer is yes, but as with the shelf space constraint, we also have limitations that make it hard to justify the long tail as a value-add. First, we don't have a catalog standard (metamodel) that would allow all types of technology assets to be documented. Second, our discovery tools are limited in their ability to extract the metadata and understand the context of the asset itself. This points to a universal problem: deciding when something is an asset in its own right and when it is merely a component of an asset. For example, we discover a logical model through an asset scan; the logical model is an asset that needs to be loaded into the store, but what about the entities, attributes, relationships, tables and fields described within? In the online environment, the book is the asset and the table of contents, book cover and index are just components of the asset - not new assets themselves. Finally, assets must have stewards who can help us understand the context and role of the asset; blind discovery cannot accomplish this function alone.
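One way to picture the asset-versus-component distinction is as a small data model in which components are nested inside the asset that owns them and never receive catalog entries of their own. This is a hedged sketch, not a proposed metamodel: the classes `Asset` and `Component`, the helper functions and all names in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Component:
    """A part of an asset (entity, attribute, table, ...), not an asset itself."""
    name: str
    kind: str


@dataclass
class Asset:
    """A catalogable item; its steward supplies context that discovery cannot."""
    name: str
    kind: str
    steward: Optional[str] = None
    components: List[Component] = field(default_factory=list)


def catalog_entries(assets):
    """Build one catalog entry per asset; components are only counted."""
    return [
        {
            "asset": asset.name,
            "kind": asset.kind,
            "steward": asset.steward,
            "component_count": len(asset.components),
        }
        for asset in assets
    ]


def assets_needing_steward(assets):
    """Names of assets found by blind discovery that still lack a steward."""
    return [asset.name for asset in assets if asset.steward is None]


# Hypothetical result of an asset scan that found a single logical model.
scanned = [
    Asset(
        name="order_logical_model",
        kind="logical model",
        components=[
            Component("Customer", "entity"),
            Component("customer_id", "attribute"),
            Component("CUSTOMER", "table"),
        ],
    )
]

entries = catalog_entries(scanned)
unowned = assets_needing_steward(scanned)
print(entries)
print("Needs a steward:", unowned)
```

The scan yields a single catalog entry with three nested components, much as the book is the asset while its table of contents and index are not. The empty steward field marks the gap that a person, rather than a tool, has to fill.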

Looking at the value equation of the long tail, we would need to be able to collect, catalog and utilize these assets for a much smaller user base. The basic value of an enterprise asset is that it offers large amounts of reuse and technology-based utility, while a long tail asset is valued by a relative few. Currently, we are ignoring the long tail user base in favor of the higher ROI at the top end. In the retail space, technology overcame the limitations of space; perhaps technology will eliminate the limitations within the long tail of metadata in the future.

The conclusion to draw from this concept is that you can create an ROI model for collecting metadata at the top end of the value/reuse curve (enterprise metadata) as well as for a focused collection of assets where value is obtained from the collection itself (data warehouse). However, defining value for the long tail remains elusive. Looking ahead to a future in which standard metamodels exist, extreme integration is possible and services move outside the company, the long tail will begin to emerge as a value-generating option and perhaps a competitive advantage for those willing to forge ahead.
