We must manage our data in the same fashion that we manage any process – with a defined, predictable methodology. This requires a data management lifecycle methodology that helps us understand how to manage, monitor and maintain our data to benefit our business.
The data management lifecycle has five steps: discover, design, enable, maintain and archive. Each step is integral to creating a strong business foundation.
Discover
One of the biggest mistakes that most organizations make is that they do not properly understand their data. A quick inspection of your data would probably find that it resides in many different databases, managed by many different systems, with many different formats and representations of the same data. Data can be useful only if you understand where it is, what it means to your organization and how it relates to other data in your organization. Discovery allows you to understand just that.
Data discovery is about starting in the right place. It is a fundamental but often overlooked step that should begin every initiative that involves data. Every enterprise resource planning (ERP) implementation, every customer relationship management (CRM) deployment, every data warehouse development, every data migration or consolidation effort, and every application rewrite should start with data discovery.
Estimates for ERP and data warehouse implementation failures or severe cost overruns often run as high as 65 to 75 percent. In almost every instance, the failures are due to a fundamental misunderstanding about the quality, meaning or completeness of the data that is essential to the initiative. These problems should be identified and corrected before the project begins. Data discovery is the set of activities that identify data, determine its role in the organization, and uncover problems, inconsistencies, inaccuracies and generally unreliable data.
Data discovery has three components, each of which prepares you for your data initiatives: data exploration, data profiling and auditing, and data cataloging.
Data exploration involves finding and compiling information from your IT infrastructure. The result is a compilation of metadata from various data sources in a single environment that provides a unified view of all available data. This can be done in a variety of ways – from documenting on a sheet of paper, to compiling information in a spreadsheet, to using a data exploration tool to capture and manipulate the metadata. No matter how you do it, just be sure you do it.
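As a rough illustration of compiling metadata into one view, the sketch below reads table and column definitions from two in-memory SQLite databases and merges them into a single inventory. The source names (crm, billing), the table layouts and the collect_metadata function are all hypothetical, chosen only for the example; a real exploration effort would cover whatever systems your organization actually runs.

```python
import sqlite3


def collect_metadata(connections):
    """Gather column-level metadata from several SQLite databases.

    `connections` maps a source name to an open sqlite3 connection.
    Returns a list of dicts, one per column, forming a unified inventory.
    """
    inventory = []
    for source_name, conn in connections.items():
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        for (table_name,) in tables:
            # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
            columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
            for _cid, column, declared_type, notnull, _default, pk in columns:
                inventory.append(
                    {
                        "source": source_name,
                        "table": table_name,
                        "column": column,
                        "type": declared_type,
                        "required": bool(notnull),
                        "primary_key": bool(pk),
                    }
                )
    return inventory


# Two hypothetical systems that store the same customer data differently.
crm = sqlite3.connect(":memory:")
crm.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL, phone TEXT)"
)
billing = sqlite3.connect(":memory:")
billing.execute(
    "CREATE TABLE client (client_no INTEGER PRIMARY KEY, name TEXT, telephone VARCHAR(20))"
)

inventory = collect_metadata({"crm": crm, "billing": billing})
for row in inventory:
    print(row)
```

Even this small inventory shows the kind of finding exploration is meant to surface: the same customer attributes appear under different names and declared types in each system.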
Data profiling and auditing takes you to the next level – looking at the actual data within the source systems to understand the data elements and anomalies. A thorough data profiling exercise will alert you to data that does not match the characteristics defined in the metadata compiled during data exploration. But, more importantly, data profiling can also tell you if the data meets your business needs. (For example, does it match the business rules that the business users define for the data?)
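To make the idea of checking data against business rules concrete, here is a minimal profiling sketch. The field names, the sample records and the two rules (a phone number format and a credit limit range) are invented for illustration; in practice the rules would come from your business users, as described above.

```python
import re


def profile(records, rules):
    """Count, per field, how many records are missing or break a rule.

    `rules` maps a field name to a function that returns True for valid values.
    """
    report = {field: {"missing": 0, "invalid": 0} for field in rules}
    for record in records:
        for field, is_valid in rules.items():
            value = record.get(field)
            if value is None or value == "":
                report[field]["missing"] += 1
            elif not is_valid(value):
                report[field]["invalid"] += 1
    return report


# Hypothetical business rules supplied by business users.
rules = {
    "phone": lambda v: re.fullmatch(r"\d{3}-\d{3}-\d{4}", v) is not None,
    "credit_limit": lambda v: 0 <= v <= 50000,
}

# Hypothetical rows pulled from a source system.
records = [
    {"phone": "919-555-0100", "credit_limit": 1000},
    {"phone": "5550100", "credit_limit": 75000},
    {"phone": "", "credit_limit": 2500},
    {"phone": "212-555-0199", "credit_limit": None},
]

report = profile(records, rules)
for field, counts in report.items():
    print(field, counts)
```

Separating missing values from rule violations is a design choice in this sketch: a missing value usually points to a completeness problem, while a value that breaks a rule points to an accuracy or consistency problem.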
The goal of data cataloging is to create a single source of information that has both technical and business information about data across your entire organization. You need a development environment where data sources can be combined and rationalized: a place where you can group data sources into projects to allow you to work across your data sources and develop a consistent environment for manipulating your data and executing your data-intensive applications. The data cataloging phase is also an opportunity to add metadata – especially business metadata – about your data sources.
The ultimate goal of data cataloging is to create a data dictionary – a comprehensive environment that documents data sources, technical characteristics of data, an understanding of who is responsible for the data, the business definition of data and any special information about how the data was derived or calculated. Cataloging the data and building the dictionary gives you one place to see all the data in your organization and its value to the organization. The data dictionary is a complete document for one of your most important corporate assets – your data.
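One way to picture a data dictionary entry is as a record that carries both the technical and the business metadata listed above. The sketch below is only an assumption about how such a structure might look; the class names, field names and sample entries are hypothetical and not taken from any particular cataloging product.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    """One documented data element, combining technical and business metadata."""

    name: str                  # business name of the element
    source: str                # system or database where the data lives
    data_type: str             # technical characteristic
    owner: str                 # who is responsible for the data
    business_definition: str   # what the data means to the business
    derivation: str = ""       # how the value was derived or calculated, if at all


class DataDictionary:
    """A single place to look up entries across all sources."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        # The same element may exist in several sources, so key on both.
        self._entries[(entry.source, entry.name)] = entry

    def lookup(self, name):
        return [e for e in self._entries.values() if e.name == name]

    def undocumented(self):
        # Entries still waiting for business metadata.
        return [e for e in self._entries.values() if not e.business_definition.strip()]


dictionary = DataDictionary()
dictionary.add(
    DictionaryEntry(
        name="customer_phone",
        source="crm",
        data_type="TEXT",
        owner="Sales Operations",
        business_definition="Primary contact number for the customer.",
    )
)
dictionary.add(
    DictionaryEntry(
        name="customer_phone",
        source="billing",
        data_type="VARCHAR(20)",
        owner="Finance",
        business_definition="",
    )
)

print([e.source for e in dictionary.lookup("customer_phone")])
print([e.source for e in dictionary.undocumented()])
```

The undocumented check reflects the point made above: the cataloging phase is the opportunity to add business metadata, so it helps to know which entries still lack it.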
Design
After completing step one of the data management lifecycle, you will be able to identify sources, understand the underlying formats and structures, and assess the relationships and uses of data across the organization. Now you face another challenge – taking all of these different structures and formats, data sources and data feeds, and using them to create an environment that accommodates the needs of your business. This accommodation requires consolidation and coordination, all the while concentrating on three major areas: consistency of rules, consistency of the data model that describes your organization and consistency of business processes.
During the design phase, organizations tend to make two major mistakes. First, they try to accomplish too much at once. Remember, data governance projects are based on improving a business initiative. The scope of the initial design should be just enough to satisfy the needs of that project. After a single phase is completed, the scope of the design is broadened to accommodate the needs of the next business initiative. Keep in mind, you do not design in phases – you implement in phases. Think globally, act locally.
The second mistake companies make is to leave the design phase up to the IT professionals. Just because a project is titled “data model consistency” or “business rule creation” doesn’t mean that IT makes all the decisions. In reality, IT becomes more involved in step three: enablement.









