When Data Becomes Metadata
Not long ago, a friend of mine who is a partner at a large accounting firm shared this story with me. He had asked someone to prepare an archive of all of the important documents produced this past year. After three weeks of work, this person proudly showed his boss (my friend) a shelf containing 11 binders with all of the documents neatly printed. My friend opened one of the binders and quickly realized there was no order to these documents, not even by date or client.
"But how can I find all of the documents for Customer XYZ that I wrote in June?" my friend asked. Complete, painful silence was the only response.
Although this story was a source of frustration for my friend, it was fascinating to me. This was a perfect example of what is now happening all around us: data is becoming metadata. Data values like Customer XYZ are expected to become metadata tags to help with search and retrieval of other data such as documents, Web pages and images.
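To make the idea concrete, here is a minimal Python sketch of what my friend expected the binders to offer. Data values such as the author, the customer and the month a document was written are turned into tags, so a question like "Customer XYZ, written by me, in June" becomes an intersection of tag lookups. The `Document` fields, the helper names `build_tag_index` and `find`, and the sample documents (including the year) are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical document record; real archives carry far more attributes
    doc_id: str
    author: str
    customer: str
    created: date


def build_tag_index(docs):
    """Turn data values (author, customer, month) into metadata tags."""
    index = defaultdict(set)
    for doc in docs:
        index[("author", doc.author)].add(doc.doc_id)
        index[("customer", doc.customer)].add(doc.doc_id)
        index[("month", doc.created.strftime("%Y-%m"))].add(doc.doc_id)
    return index


def find(index, **tags):
    """Return the ids of documents that carry every requested tag."""
    matches = [index.get((name, value), set()) for name, value in tags.items()]
    return set.intersection(*matches) if matches else set()


# Hypothetical sample documents
docs = [
    Document("d1", "partner", "Customer XYZ", date(2023, 6, 12)),
    Document("d2", "partner", "Customer XYZ", date(2023, 7, 3)),
    Document("d3", "partner", "Customer ABC", date(2023, 6, 20)),
    Document("d4", "associate", "Customer XYZ", date(2023, 6, 28)),
]
index = build_tag_index(docs)

# "All of the documents for Customer XYZ that I wrote in June"
print(find(index, author="partner", customer="Customer XYZ", month="2023-06"))  # {'d1'}
```

Eleven binders sorted by nothing can only be searched page by page; the same documents indexed this way answer the question with a few set lookups.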
Search engines spoil us, and we expect everything around us to be neatly organized and at our fingertips after typing in a keyword or two. We type in data that we expect to behave as metadata by describing some other data. (Remember that "data about data" definition we like to give for the term metadata?) Data becoming metadata leads to more sophisticated requirements we need to address. What challenges does data becoming metadata raise for us data folks?
Searching for Answers
Several challenges with data becoming metadata directly influence our data management world.
More emphasis on usage. We will need not only to understand how the business works but also to focus keenly on how its data is used. Our users will give us data requirements with search engine-like characteristics, and requests such as "I need Googling within the Invoices subject area" will become commonplace. Bob Mosscrop III, project manager of enterprise metadata management, describes this as "document management meets dimensional model." Barry Williams, principal consultant, and Norman Daoust, business analysis consultant and trainer, both emphasize this increased responsibility for understanding how the data will be used. Norman says, "While this has always been a part of our role, it's become even more so as the expectations of our users have been raised by the ubiquitousness of Google searches."
More tech savvy. Data managers will be relied upon as experts in search engine technology. We will be asked how search engines work and will be held accountable for analyzing and modeling Web 2.0 components such as tags and ontologies. Users will expect all of their reports and queries to deliver results and response times like those of their search engines. We will therefore need to pay more attention to the physical data model so that query response times keep pace with search engine response times.
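As one small illustration of the physical-model work involved, the following sketch assumes tags are stored in a relational table in SQLite (the `document_tag` table and the `idx_document_tag_lookup` index are hypothetical names) and compares the query plan for a tag lookup before and after adding a composite index. An index is only one of many physical design levers, and this is not a claim that it alone delivers search engine response times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tag table: one row per (document, tag name, tag value)
conn.execute("CREATE TABLE document_tag (document_id INTEGER, tag_name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO document_tag VALUES (?, ?, ?)",
    [(i, "customer", f"Customer {i % 50}") for i in range(1000)],
)

QUERY = "SELECT document_id FROM document_tag WHERE tag_name = ? AND value = ?"


def query_plan(conn):
    """Return SQLite's plan description for the tag lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("customer", "Customer 7"))
    # The human-readable detail is the last column of each plan row
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet, so SQLite scans the table

# Composite index matching the search predicate; including document_id
# lets SQLite answer the query from the index alone
conn.execute(
    "CREATE INDEX idx_document_tag_lookup ON document_tag (tag_name, value, document_id)"
)
plan_after = query_plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies by SQLite version, but the first plan reports a scan of the table while the second names the index.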
More comfortable with classifying. Metadata Specialist Bob Schork says classifying data into organized piles retrievable with metadata tags is going to play a larger role in our analysis and modeling. "As systems grow and integration challenges persist and get more complicated, classification of data will play a more prominent role. Those analysts that know how to classify data will be the successful ones." Data Modeler Georgia Prothero adds, "As data becomes metadata, I find I am increasingly implementing models that contain generic 'tag value pair' structures. I do this so that classification is handled consistently by applications and can be changed without impacting application code."
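A generic tag-value pair structure of the kind Georgia describes might look something like the following minimal sketch. It assumes SQLite, uses hypothetical table, column and function names, and is not her actual model. Adding a new kind of classification, such as a region, means inserting rows rather than altering tables or application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    title       TEXT NOT NULL
);
-- One row per kind of classification (tag name)
CREATE TABLE tag (
    tag_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
-- Generic tag-value pairs attached to documents
CREATE TABLE document_tag (
    document_id INTEGER NOT NULL REFERENCES document(document_id),
    tag_id      INTEGER NOT NULL REFERENCES tag(tag_id),
    value       TEXT NOT NULL,
    PRIMARY KEY (document_id, tag_id, value)
);
""")


def tag_document(conn, document_id, tag_name, value):
    """Attach a tag-value pair, creating the tag name if it is new."""
    conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (tag_name,))
    (tag_id,) = conn.execute("SELECT tag_id FROM tag WHERE name = ?", (tag_name,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO document_tag VALUES (?, ?, ?)", (document_id, tag_id, value))


def documents_with(conn, tag_name, value):
    """Return titles of documents carrying the given tag-value pair."""
    rows = conn.execute("""
        SELECT d.title FROM document d
        JOIN document_tag dt ON dt.document_id = d.document_id
        JOIN tag t ON t.tag_id = dt.tag_id
        WHERE t.name = ? AND dt.value = ?
        ORDER BY d.title
    """, (tag_name, value)).fetchall()
    return [title for (title,) in rows]


conn.executemany("INSERT INTO document VALUES (?, ?)", [(1, "Audit memo"), (2, "Engagement letter")])
tag_document(conn, 1, "customer", "Customer XYZ")
tag_document(conn, 2, "customer", "Customer XYZ")
# A new classification needs a new row, not a new column or code change
tag_document(conn, 2, "region", "EMEA")

print(documents_with(conn, "customer", "Customer XYZ"))  # ['Audit memo', 'Engagement letter']
print(documents_with(conn, "region", "EMEA"))  # ['Engagement letter']
```

The trade-off is familiar to modelers: the structure is flexible and consistent across applications, but the database no longer enforces which tags and values are valid, which pushes that responsibility toward stewardship and quality checks.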
More data quality checks. There will be an increased emphasis on data quality, especially for the subset of data that plays the role of metadata in the form of search tags. Data Architect John Nixon and BI Architect Michael Smith both highlight the integration of structured and unstructured data as a huge challenge. Nixon says, "One challenge might be how to unite our metadata and data into a single, managed analysis environment." Michael says, "The biggest challenge data folks have is influencing the workflow such that data is enriched with metadata tags as part of the workflow itself. Otherwise, it never gets tagged. Yes, Google-like technology can solve some of this after the fact, but where most of the detailed engineering occurs at a detail level." Data Management Lead Kevin Heinsey suggests that a large challenge will be assigning stewardship for many of these new metadata structures, such as tags and ontologies, with the goal of improving metadata quality.
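A steward's quality check on tags could start as simply as the following sketch, which assumes each tag name has a controlled vocabulary owned by a named steward. The `Vocabulary` class, the `check_tags` function, and the steward and customer names are hypothetical. The check flags unknown tag names, near-miss spellings of approved values, and values missing from the vocabulary, and routes each problem to the responsible steward.

```python
from dataclasses import dataclass, field


@dataclass
class Vocabulary:
    """A controlled list of allowed values for one tag, owned by a steward."""
    tag_name: str
    steward: str
    allowed: set[str] = field(default_factory=set)


def normalize(value: str) -> str:
    # Collapse whitespace and ignore case to catch near-miss spellings
    return " ".join(value.split()).casefold()


def check_tags(tags, vocabularies):
    """Return (problem, tag_name, value, steward) tuples for tags that fail."""
    problems = []
    for tag_name, value in tags:
        vocab = vocabularies.get(tag_name)
        if vocab is None:
            problems.append(("unknown tag", tag_name, value, None))
            continue
        if value in vocab.allowed:
            continue
        canonical = {normalize(v) for v in vocab.allowed}
        if normalize(value) in canonical:
            problems.append(("non-canonical spelling", tag_name, value, vocab.steward))
        else:
            problems.append(("value not in vocabulary", tag_name, value, vocab.steward))
    return problems


# Hypothetical vocabulary and incoming tags
vocabularies = {
    "customer": Vocabulary("customer", "Client data steward", {"Customer XYZ", "Customer ABC"}),
}
tags = [
    ("customer", "Customer XYZ"),
    ("customer", "customer  xyz"),
    ("customer", "Customer QRS"),
    ("projct", "Audit"),
]
for problem in check_tags(tags, vocabularies):
    print(problem)
```

Running such a check inside the workflow, as tags are assigned, fits Michael's point that tagging has to happen as part of the work itself rather than after the fact.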