Many organizations find that they cannot rely on the information that serves as the foundation of their business. Unreliable data – whether about your customers, your products or your suppliers – hinders understanding and affects your bottom line. It seems simple: better data leads to better decisions, which ultimately leads to better business. Yet many executives still do not take data quality and data governance seriously enough.
We must manage our data in the same fashion that we manage any process – with a defined, predictable methodology. This requires a data management lifecycle methodology that helps us understand how to manage, monitor and maintain our data to benefit our business.
The data management lifecycle has five steps: discover, design, enable, maintain and archive. Each step is integral to creating a strong business foundation.
One of the biggest mistakes that most organizations make is that they do not properly understand their data. A quick inspection of your data would probably find that it resides in many different databases, managed by many different systems, with many different formats and representations of the same data. Data can be useful only if you understand where it is, what it means to your organization and how it relates to other data in your organization. Discovery allows you to understand just that.
Data discovery is about beginnings. It is a fundamental but often overlooked step that should open every initiative that involves data. Every enterprise resource planning implementation, every customer relationship management deployment, every data warehouse development, every data migration or consolidation effort, and every application rewrite should start with data discovery.
Estimates for ERP and data warehouse implementation failures or severe cost overruns often run as high as 65 to 75 percent. In almost every instance, the failures stem from a fundamental misunderstanding of the quality, meaning or completeness of the data that is essential to the initiative. These are problems that should be identified and corrected before the project begins. Data discovery is the set of activities that identify data, determine its role in the organization, and uncover problems, inconsistencies, inaccuracies and generally unreliable data.
Data discovery has several components to it, and each prepares you for your data initiatives: data exploration, data profiling and auditing, and data cataloging.
Data exploration involves finding and compiling information about the data in your IT infrastructure. It brings metadata from various data sources into a single environment that provides a unified view of all available data. This can be done in a variety of ways – from documenting on a sheet of paper, to compiling information in a spreadsheet, to using a data exploration tool to capture and manipulate the metadata. No matter how you do it, just be sure you do it.
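As a minimal sketch of that unified view, the snippet below flattens column metadata from two hypothetical source systems into one list; in practice the metadata would come from system catalogs or an exploration tool:

```python
# Data exploration sketch: compile column metadata from several sources
# into one unified view. Source, table and column names are hypothetical.

sources = {
    "crm_db": {"customer": ["cust_id", "name", "state"]},
    "erp_db": {"client": ["client_no", "client_name", "region"]},
}

# One row per column, tagged with its source system and table.
unified = [
    {"source": src, "table": tbl, "column": col}
    for src, tables in sources.items()
    for tbl, cols in tables.items()
    for col in cols
]

print(len(unified), "columns compiled across", len(sources), "sources")
```

Even this simple structure makes cross-system questions answerable: the same real-world entity ("customer" vs. "client") now sits side by side in one place.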
Data profiling and auditing takes you to the next level – looking at the actual data within the source systems to understand the data elements and anomalies. A thorough data profiling exercise will alert you to data that does not match the characteristics defined in the metadata compiled during data exploration. But, more importantly, data profiling can also tell you if the data meets your business needs. (For example, does it match the business rules that the business users define for the data?)
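A basic profiling pass can be sketched in a few lines. The column names, the valid-state domain and the credit-limit business rule below are hypothetical examples, not a real schema:

```python
# Data profiling sketch: compare actual rows against characteristics
# captured during exploration (expected domains) and a business rule.
# All field names and limits here are illustrative assumptions.

rows = [
    {"customer_id": "C001", "state": "NY", "credit_limit": 5000},
    {"customer_id": "C002", "state": "??", "credit_limit": -100},
    {"customer_id": "",     "state": "CA", "credit_limit": 7500},
]

VALID_STATES = {"NY", "CA", "TX"}  # domain recorded in the metadata

def profile(rows):
    """Return (row_index, description) pairs for every anomaly found."""
    anomalies = []
    for i, row in enumerate(rows):
        if not row["customer_id"]:
            anomalies.append((i, "customer_id is empty"))
        if row["state"] not in VALID_STATES:
            anomalies.append((i, f"state '{row['state']}' outside expected domain"))
        if row["credit_limit"] < 0:  # business rule: limits are non-negative
            anomalies.append((i, "credit_limit violates business rule (< 0)"))
    return anomalies

for idx, problem in profile(rows):
    print(f"row {idx}: {problem}")
```

Note that the first two checks test conformance to the metadata, while the third tests a rule supplied by business users – the distinction the paragraph above draws.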
The goal of data cataloging is to create a single source of information that has both technical and business information about data across your entire organization. You need a development environment where data sources can be combined and rationalized: a place where you can group data sources into projects to allow you to work across your data sources and develop a consistent environment for manipulating your data and executing your data-intensive applications. The data cataloging phase is also an opportunity to add metadata – especially business metadata – about your data sources.
The ultimate goal of data cataloging is a data dictionary – a comprehensive environment that documents data sources, the technical characteristics of the data, who is responsible for it, its business definition and any special information about how it was derived or calculated. Cataloging the data and creating the data dictionary gives you a single place to see all the data in your organization and its value to the business. The data dictionary is the complete documentation for one of your most important corporate assets – your data.
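One data dictionary entry might look like the sketch below; the field names and content are illustrative assumptions, not a standard schema:

```python
# Data dictionary sketch: one entry combining technical metadata
# (source, type) with business metadata (steward, definition, derivation).
# All names and values are hypothetical examples.

data_dictionary = {
    "customer.credit_limit": {
        "source_system": "CRM",
        "technical_type": "DECIMAL(10,2)",
        "steward": "Finance data steward",  # who is responsible for the data
        "business_definition": "Maximum outstanding balance a customer may carry",
        "derivation": "Set at onboarding; reviewed quarterly",
    },
}

def lookup(field):
    """Return the business definition for a field, if it is documented."""
    entry = data_dictionary.get(field)
    return entry["business_definition"] if entry else "undocumented"

print(lookup("customer.credit_limit"))
```

The key design point is that every entry carries both halves: a technical reader learns the type and source, and a business reader learns what the field means and who owns it.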
After completing step one of the data management lifecycle, you will be able to identify sources, understand the underlying formats and structures, and assess the relationships and uses of data across the organization. Now you face another challenge – taking all of these different structures and formats, data sources and data feeds, and using them to create an environment that accommodates the needs of your business. This accommodation requires consolidation and coordination, all the while concentrating on three major areas: consistency of rules, consistency of the data model that describes your organization and consistency of business processes.
During the design phase, organizations tend to make two major mistakes. First, they try to accomplish too much at once. Remember, data governance projects are based on improving a business initiative. The scope of the initial design should be just enough to satisfy the needs of that project. After a single phase is completed, the scope of the design is broadened to accommodate the needs of the next business initiative. Keep in mind, you do not design in phases – you implement in phases. Think globally, act locally.
The second mistake companies make is to leave the design phase up to the IT professionals. Just because a project is titled “data model consistency” or “business rule creation” doesn’t mean that IT makes all the decisions. In reality, IT becomes more involved in step three: enablement.
As you move from the discovery/design phases to the enablement phase, the primary responsibility shifts from the business users to the IT staff. Now that the business users have established how the data and rules should be defined, it is up to the IT staff to ensure that databases and applications adhere to the definition.
There are many enabling architectures. How data is enabled and managed within any of them is a decision IT must make to ensure data integrity and smooth integration across the various systems.
One of the biggest mistakes companies make in the enablement phase is to duplicate the rules and standards that came out of step two for each application or data source. This would be analogous to having the telephone but not having the phone network. Although the telephone is a great invention, it is actually the telephone network that makes it practical. Without the network, each telephone would have to be connected directly to another telephone – ultimately making the telephone a cumbersome and impractical device.
For each data source, business process and application that is modified to conform to the new data definitions, you will need to understand the requirements, validate that the new integration meets them, and deploy the interface into production.
Applications and systems have been built over the years to meet the requirements of users. When we move to modify the landscape to provide consistent data across our organizations, we can jeopardize these requirements. Before you begin any integration project, know the requirements, such as service-level requirements imposed by both IT and the business user community. Any modifications to the systems must be done without a negative impact on the service level.
Two types of validation must be done before deploying. First, you must ensure that all service-level requirements are met. Second, you must validate that the new rules and definitions you designed are being accurately reflected. One important aspect of this is to devote adequate time to testing. It is not glamorous, but it can be the difference between success and failure. The National Institute of Standards and Technology estimates that 80 percent of development time will be spent on finding and fixing errors. Do not underestimate this testing step.
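Both kinds of validation can be automated as simple pre-deployment checks. In the sketch below, the 200-millisecond service-level budget and the state-normalization rule are hypothetical stand-ins for whatever your business users and IT actually agreed on:

```python
import time

# Pre-deployment validation sketch covering both checks named above:
# (1) a service-level requirement (response-time budget), and
# (2) that the newly designed rule is accurately reflected in the output.
# The rule and the 200 ms budget are illustrative assumptions.

def normalize_state(value):
    """The 'new rule' under test: trim and upper-case state codes."""
    return value.strip().upper()

def validate():
    start = time.perf_counter()
    result = normalize_state("  ny ")
    elapsed = time.perf_counter() - start
    return {
        "rule reflected": result == "NY",      # design validation
        "service level met": elapsed < 0.2,    # SLA validation (200 ms)
    }

print(validate())
```

Checks like these belong in an automated test suite so they run on every change, which is exactly where the testing time NIST warns about gets spent productively.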
After completing these tasks, you will be ready to deploy into the new environment, knowing that one more part of your organization is now operating with consistent, accurate, and reliable data and processes.
Your data systems are like anything else that you want to continue to run effectively. You must constantly monitor the environment, checking and validating that things are working correctly. When I take my car into the shop for the standard service, more than 50 checks are performed each time. But the checking does not stop there. Computers throughout the car are constantly monitoring and recording activity.
All too often, organizations take the attitude that all the work for a system or application is in the initial development. This attitude of “do it once and you’re done” is a major obstacle to an efficiently running data management environment. A successful data management lifecycle requires vigilance and continuous care.
The moment you deploy your coordinated and consistent data across your organization, your mantra for success needs to be: monitor, report, optimize. And like everything else discussed here, this needs to be automated: data should be validated as it enters your organization to ensure it meets your rules, and those rules need to be constantly monitored to ensure they still meet the needs of your business.
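The monitor-and-report half of that loop can be sketched as a small validation gate on incoming records. The rules and records below are hypothetical examples:

```python
from collections import Counter

# Monitoring sketch for the "monitor, report, optimize" loop: each incoming
# record is validated against the governance rules, and violations are
# tallied for reporting. Rules and field names are illustrative assumptions.

RULES = {
    "email": lambda v: "@" in v,
    "age": lambda v: 0 <= v <= 130,
}

report = Counter()  # violation counts per field, for the "report" step

def monitor(record):
    """Return True if the record passes; tally and reject on first violation."""
    for field, rule in RULES.items():
        if field in record and not rule(record[field]):
            report[field] += 1
            return False
    return True

monitor({"email": "a@example.com", "age": 34})  # passes
monitor({"email": "not-an-email", "age": 34})   # fails the email rule
print(dict(report))
```

The tallies feed the "optimize" step: a rule that fires constantly is a signal either that a source system needs fixing or that the rule itself no longer matches the business.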
Sounds like a lot of work, but luckily, by adopting a methodical approach to data management, you are in the best possible position to meet the needs of change.
One thing is certain in today's information age: data will continue to pour into your organization. However, it is equally important to recognize when data no longer serves your organization – and to methodically retire it when appropriate.
Archiving may be a bit of a misnomer for the final step in the data management lifecycle methodology. In reality, the process is closer to reassessing data. You may recognize that the requirements on your data have changed, but the data is still useful. These observations will come directly from the monitoring activities. In the event the data is still useful, but no longer meets your immediate needs, you need to go back to step one and start the process over again. In the event the data is no longer useful to your organization, you must be able to retire the data appropriately so that you can free up resources that are being expended on maintaining the data environment.
A methodical, step-by-step approach to managing your data is the best approach. As your business needs dictate, systems must be incorporated into the quality culture that you’ve created. Any new data, system or application must be incorporated into this data management lifecycle methodology. Only then will your data truly become a corporate asset.
Your data lifecycle process may be a bit different from the one outlined here – it may have fewer steps, or it may have more. The important thing, though, is that you have a data management lifecycle process in place and that your organization executes according to that process. Only through repetition of the lifecycle will you gain full control over your data. And with that control, you will be able to use your data to improve your organization.