Metadata Runs Amok

The volume of metadata is beginning to exceed the volume of primary data in some areas, according to a resident expert; time to get ready for serious metadata governance.

We had yet another great DM Radio episode last week talking about metadata.

As a keyword and knowledge focus, metadata might easily be as relevant as our preoccupation with analytics, because it says so much about how we understand, or plan to organize, all the information we want to put to use. 

Metadata management is in a sense the modern take on our history of corporate libraries and librarians, and it has more recently led some corporations to task people as metadata editors. When you want the official word on something, governed metadata ought to be (or soon will be) the true working knowledge hub of any organization.

I co-host DM Radio with Eric Kavanagh, who called in Malcolm Chisholm as the guest analyst for last week's show. Malcolm is a longtime data expert who's also a highly valued contributor to our magazine.

What bubbled up quickly was how pervasive metadata is becoming, to the point, Malcolm says, of being as prolific as the primary data it refers to. There are also many jurisdictions, Chisholm says, with rules and laws about what we can and cannot do with that data.

"We need to flag records to say you can do [one thing] but you can't segment or disclose to certain people. And privacy issues mean that the quantity of metadata associated with a customer is now approaching or exceeding the actual data that describes [the customer]. So metadata is not only growing in complexity, it's growing in the volume. For a while we figured you'd have one byte of metadata for every 10 bytes of data but that's not true anymore. Metadata seems to be exceeding real data in certain areas."

His point was that as metadata volumes increase, not only in special repositories and data dictionaries but also in the data models and physical tables of the databases and operational resources we use every day, we're creating a huge new layer of governance challenges.
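To make the ratio shift concrete, here is a minimal, hypothetical sketch in Python. The record fields and flag names are my own illustration, not anything described on the show, but they show how quickly the governance metadata attached to a customer record can outgrow the record itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical illustration only: a small customer record and the
# governance metadata that increasingly has to travel with it.

@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    email: str

@dataclass
class RecordMetadata:
    # Descriptive metadata: where each value came from and when.
    source_system: Dict[str, str] = field(default_factory=dict)
    last_updated: Dict[str, str] = field(default_factory=dict)
    # Governance metadata: what we may and may not do with each field.
    allowed_uses: Dict[str, List[str]] = field(default_factory=dict)
    disclosure_restrictions: Dict[str, List[str]] = field(default_factory=dict)
    applicable_jurisdictions: List[str] = field(default_factory=list)

record = CustomerRecord("C-1001", "Jane Doe", "jane@example.com")
meta = RecordMetadata(
    source_system={"name": "CRM", "email": "CRM"},
    last_updated={"name": "2010-11-17", "email": "2011-05-02"},
    allowed_uses={"name": ["billing", "support"], "email": ["billing"]},
    disclosure_restrictions={"email": ["no-third-party", "no-segmentation"]},
    applicable_jurisdictions=["EU", "US-CA"],
)

# Three data attributes, five metadata attributes, and each metadata
# attribute fans out per data field, so the imbalance only grows from here.
print(len(vars(record)), "data attributes,", len(vars(meta)), "metadata attributes")
```

Even in this toy form, the metadata side already outweighs the data it describes, which is exactly the imbalance Malcolm described.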

Malcolm is smarter about this stuff than I will ever be, but it reminded me of something I'd looked into nearly eight years ago, when (then imagined) growth in Web services threatened to create a monstrous glut of XML traffic on the Web. Back then, an expert reinforced my own belief by noting that Web services were being taken to market to such a degree that users were turning off the security protocols bolted onto their XML traffic just to keep it flowing.

That can't have been a good idea, and that was then, before falling costs and exponential scale made it possible to create virtually limitless data inexpensively.

We can pine for the days before lawyers and regulators demanded adherence to things like privacy, but the volume of traffic is no longer our bottleneck. As noted a couple of weeks ago, IDC projects that total data volume will reach 35,000 exabytes in 2020, compared to 1,200 exabytes in 2010. Something tells me we'll meet that goal easily.
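For what it's worth, those two numbers alone imply roughly 40 percent compound annual growth; here is a quick back-of-the-envelope check using only the figures cited above.

```python
# Back-of-the-envelope check on the cited IDC figures.
volume_2010 = 1_200    # exabytes
volume_2020 = 35_000   # exabytes
years = 10

cagr = (volume_2020 / volume_2010) ** (1 / years) - 1
print(f"Implied compound annual growth: {cagr:.0%}")  # prints roughly 40%
```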

There is something neurotic or even pathological about this endless combining and recombining of data and the goal of (re)rationalizing it in ever more abstractions and layers of metadata. In the current model, as often as we try to gain order, we create more confusion.

We can't stop that, but we need to be prepared to govern it. Another guest on the broadcast, David Dichmann from Sybase, said we might have a gap between our desire for 'a single version of the truth' and a different desire for traceability across multiple versions of the truth.

"It's not just about integrating data systems so this stuff and that stuff look the same, but why we are wanting the stuff in the first place?"

In terms of current demand, Dichmann figures, it's basic, fundamental decision support: not strategic planning or quarterly reports, but seeing what is happening within a few seconds in order to decide whether to execute a trade or to optimize a business process for better customer self-service.

"It's that kind of real-time experience that leads us to accelerate our understanding of the data, we can no longer take these gaps into account and metadata is the language that is going to get us that."

We all agreed it's time to stand by for serious metadata governance, and there will be plenty to talk about when we're serious about it.
