How data catalogs support offensive and defensive intelligence
Organizations are increasingly likely to include data as part of their day-to-day strategy in order to derive business intelligence (BI), but the value of that data is typically limited. Most organizations simply don’t have a good understanding of the underlying data they collect and store, and more time is spent searching for data than actually analyzing it.
While most organizations have huge expectations of how BI will deliver exciting and potentially groundbreaking insights, they are often disappointed in the results unless they already know where to find the data they need for analysis in the first place.
Managing Data as an Asset
Data is an asset and should be treated as such, which means that managing your data intelligence (DI) is just as important as managing your BI. If BI is understanding the key metrics to manage your business, then DI is understanding the key metrics to manage your data: Where is it located? What is its quality? Who is using it? What is its provenance?
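As a rough illustration, the four questions above can be written down as the fields of a catalog record. This is a minimal sketch, not the schema of any particular catalog product: the `CatalogEntry` class, its field names, the 0.0 to 1.0 quality scale and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One cataloged data set, described by the four DI questions (hypothetical schema)."""
    name: str
    location: str                                  # Where is it located?
    quality_score: float                           # What is its quality? (assumed 0.0 to 1.0 scale)
    users: set[str] = field(default_factory=set)   # Who is using it?
    provenance: str = "unknown"                    # What is its provenance?

    def record_use(self, user: str) -> None:
        """Remember that a user has accessed this data set."""
        self.users.add(user)


# Made-up sample entry for illustration only.
entry = CatalogEntry(
    name="store_inventory",
    location="warehouse.sales.store_inventory",
    quality_score=0.92,
    provenance="nightly export from point-of-sale system",
)
entry.record_use("supply_chain_analyst")
entry.record_use("supply_chain_analyst")  # repeat use is recorded only once
print(entry.location, len(entry.users))
```

Keeping users in a set means the record answers "who is using it?" rather than "how often?"; a real catalog would likely track both.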
Organizations can increase their data intelligence quotient (D-IQ) by developing an understanding of how to connect the right data to the people who need to use it. The implication is that if you’re going to treat data as an asset, you must treat it like you would physical inventory: that is, you must make the entire lifecycle measurable.
For example, if you needed a “1-4-20x1 Black Carriage Bolt” when building a structure, you wouldn’t buy spare parts, dump them into a big pile on the floor and then sort through that pile just for that bolt. Instead, you would organize the parts in your spare parts storage, so you can find that bolt quickly when you need to use it.
The same is true for data. If you want to properly manage it as an asset, you need to organize your data so it is easy to find when people need it.
Strategically managing your data in order to improve your organization’s D-IQ requires both an offensive and a defensive approach.
Offensive DI is concerned with how companies use their data to positively affect new business outcomes, such as increased revenues or improved critical business processes.
Meanwhile, defensive DI is what helps companies “stay out of jail” when it comes to collecting, managing and using their data sets. With new compliance mandates popping up frequently, data governance and defensive DI are key components of a company’s overall data management strategy.
But being smart about data entails identifying investments that can be leveraged for both offense and defense.
Data Catalogs for Offensive Data Intelligence
The organizations today that are successfully raising their offensive DI quotient are moving from doing BI within a business unit to figuring out how to perform cross-business unit and cross-organization data analysis more effectively.
For example, if you are a large fast food chain trying to reduce the shelf life of your in-store inventory from a week to only a few days, you need to track your beef patty supply all the way back to the cows it was sourced from. This means analyzing data from myriad sources, including your restaurants, the trucks that move the food, the warehouses that store the food and the farm data about the cows themselves.
The challenge in analysis that spans multiple data sources is that the data is not labeled consistently between those sources. The inconsistency grows when you cross business units, and grows further still when you cross companies.
As it turns out, this is why most big data projects get stalled or fail. When data is dumped into a data warehouse or a data lake, the lack of consistency in the data prevents this kind of analysis from being done.
To ensure the success of big data projects, you must get the process of cataloging the data right: that is, connecting each business term to the technical term used to describe a column of data.
For example, what a trucking company might refer to as “freight item,” a restaurant might refer to as “food item,” but they may in fact be referring to the same thing. This might sound simple, but when you multiply this by thousands or tens of thousands of terms that are all called something different, this quickly becomes a problem of “death by a thousand paper cuts.”
Fortunately, data cataloging tools automate the process of figuring out that two attributes with different labels are in fact the same.
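The article does not say how cataloging tools make this match, so the following is only a sketch of one plausible signal: two columns whose values overlap heavily are probably describing the same thing, whatever their labels say. The Jaccard similarity measure, the 0.5 threshold, the function names and the sample data are all assumptions chosen for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Share of distinct values that two columns have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_columns(source_a: dict, source_b: dict, threshold: float = 0.5) -> list:
    """Pair up columns from two sources whose values overlap strongly."""
    matches = []
    for name_a, values_a in source_a.items():
        for name_b, values_b in source_b.items():
            score = jaccard(set(values_a), set(values_b))
            if score >= threshold:
                matches.append((name_a, name_b, round(score, 2)))
    return matches


# Made-up sample columns: the labels differ, the contents largely agree.
trucking = {
    "freight_item": ["beef patty", "bun", "lettuce", "cheese"],
    "truck_id": ["T-17", "T-22", "T-31"],
}
restaurant = {
    "food_item": ["beef patty", "bun", "lettuce", "pickles"],
    "store_id": ["S-001", "S-002"],
}

matches = match_columns(trucking, restaurant)
print(matches)  # [('freight_item', 'food_item', 0.6)]
```

Comparing every column against every other column is fine for a toy example but grows quadratically, which is one reason this work is left to tooling once thousands of terms are involved.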
Once this problem of identifying the same data item with different labels is solved, all kinds of revenue-generating and cost-savings analytics can be done. But once again, it all presumes that you are using a common, consistent language across different data sets, business units and companies.
Defensive Data Intelligence
The functions of a data catalog can also be used for defensive DI, which is focused on protecting companies from falling out of compliance with governmental regulations. Most, if not all, of these regulations have to do with the protection of sensitive data like personally identifiable information (PII) or protected health information (PHI).
This is true for the General Data Protection Regulation (GDPR) out of the EU, which is the most recent compliance mandate affecting companies’ data assets. But to protect this information and properly control access, you first have to know where it is.
Once again, a data catalog can help by automatically identifying which items are the same, even though they have different labels. So even if a column is named “C01,” which is not particularly helpful, the data catalog can automatically determine, by using machine learning techniques, that “C01” is a Social Security number, “C02” is a last name, and so on.
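To make the idea concrete, here is a deliberately simplified sketch that labels columns by looking at their contents rather than their names. It uses regular-expression patterns as a stand-in for the machine learning techniques mentioned above, not as a description of them. The pattern set, the 80 percent threshold, the function name and the sample rows are all assumptions.

```python
import re

# Hypothetical patterns; a real catalog would use far richer detectors.
PATTERNS = {
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def classify_column(values: list, min_share: float = 0.8) -> str:
    """Return the first label whose pattern fits at least min_share of the non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "unknown"
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= min_share:
            return label
    return "unknown"


# Made-up table with unhelpful column names.
table = {
    "C01": ["123-45-6789", "987-65-4321", "555-12-3456"],
    "C02": ["Smith", "Jones", "Garcia"],
    "C03": ["a.smith@example.com", "jones@example.org", ""],
}

labels = {column: classify_column(values) for column, values in table.items()}
print(labels)  # {'C01': 'us_ssn', 'C02': 'unknown', 'C03': 'email'}
```

Note that “C02” comes back as unknown: a last name has no fixed shape for a pattern to match, which is exactly the kind of case where a learned model or a reference list of names is needed.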
Leveraging Defense and Offense
The nice thing about data catalogs is that the effort that goes into a defensive project can also be leveraged for an offensive project, and vice versa. If you are going through the exercise of labeling data for the supply chain example above, at the same time you can also flag sensitive data that is sitting right next to it for compliance mandates.
Another example of an organization using both offensive and defensive data intelligence strategies comes from the financial services industry, in which a business used a data catalog to identify sensitive data as part of their GDPR process. As part of that exercise, they used the automatic discovery capabilities of their data catalog to tag all of their data, not just their sensitive data, and discovered that they were buying the same raw data from the same supplier in multiple countries.
Essentially, they were paying for the exact same data two or three separate times because different business units in different countries weren’t aware that another part of the company had already purchased that data. In the end, eliminating redundant purchasing of this data resulted in millions of dollars in savings.
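Once every data set is tagged with its supplier, spotting this kind of overlap is a simple grouping exercise. The sketch below assumes a hypothetical list of purchase records; the supplier, business units and costs are invented for illustration and do not come from the case described above. It also assumes a single license could serve every business unit, which would need to be checked against the actual contracts.

```python
from collections import defaultdict

# Invented purchase records, as they might look after catalog tagging.
purchases = [
    {"business_unit": "UK retail banking", "supplier": "Acme Market Data",
     "dataset": "daily equity prices", "annual_cost": 400_000},
    {"business_unit": "Germany wealth management", "supplier": "acme market data",
     "dataset": "Daily Equity Prices", "annual_cost": 350_000},
    {"business_unit": "France lending", "supplier": "Acme Market Data",
     "dataset": "credit ratings", "annual_cost": 120_000},
]


def find_duplicate_purchases(records: list) -> dict:
    """Group purchases by normalized (supplier, dataset) and keep groups bought more than once."""
    grouped = defaultdict(list)
    for record in records:
        key = (record["supplier"].strip().lower(), record["dataset"].strip().lower())
        grouped[key].append(record)
    return {key: rows for key, rows in grouped.items() if len(rows) > 1}


def redundant_cost(rows: list) -> int:
    """Cost of every purchase except the most expensive one (assumes one license covers all units)."""
    costs = sorted(row["annual_cost"] for row in rows)
    return sum(costs[:-1])


duplicates = find_duplicate_purchases(purchases)
for (supplier, dataset), rows in duplicates.items():
    units = [row["business_unit"] for row in rows]
    print(supplier, dataset, units, redundant_cost(rows))
```

The lowercasing step matters: without consistent labels, the two purchases of the same feed would look like different data sets, which is the same problem the catalog solves for column names.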
Increase Your Data IQ to Ace Big Data
Data value acceleration means decreasing the amount of time it takes for companies and organizations to realize the value of their data assets. Simply stated, cataloging your data gives you a better understanding of what data you have, where it is located, and who should and shouldn’t have access to it.
If you plan carefully, the seemingly simple exercise of populating a data catalog can be leveraged to increase both your offensive and your defensive D-IQ, thereby providing value across multiple use cases for your organization.