Meta data plays a vital role in the data quality process. A simple example will prove my point. Given the following list of values, which rows are valid?
The answer, of course, is that you don't know. You can't know: data without context has no meaning. The more information (or meta data1) you are given about Field-1, the more value you can derive from the list of values. However, when you are provided with little or no meta data, not only do you not know if the data is valid, you don't know enough about the data to reliably use it. Let me ask a more difficult question. In your organization, what or who is a customer? Most organizations I have seen struggle with this seemingly simple question. For example, a pharmaceutical firm deals with several parties that influence the sale of a drug product. Is their customer the person who consumes the drug product? Is it the doctor (or other qualified health professional) who writes the prescription? Is it the insurance company that manages the formulary indicating which drugs it will pay for? In those cases where a hospital supplies a patient with a drug product, is the hospital considered the customer? You begin to see my point. If I gave you a list of names and addresses, you couldn't begin to determine which ones are valid customers without a clear definition of who a customer is.
When confronted with the question, "How good is your data?" one can see that the evaluation of data quality is a function of both the data and its corresponding meta data. There is also a corollary hidden in this statement: when attempting to determine the level of data integrity in a set of data, meta data will be an indispensable resource. Going through the process will yield characteristics about the data that, once validated, can either be discarded or saved as meta data and subsequently used to add value in other business processes.
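To make the corollary concrete, here is a minimal sketch in Python. The field name, type and domain are hypothetical assumptions, not taken from the article; the point is only that the same values become checkable once meta data supplies a domain.

```python
# Hypothetical values for "Field-1" -- without meta data, no judgment is possible.
values = ["M", "F", "U", "3", ""]

# A fragment of meta data (illustrative assumption): name, format and legal domain.
field1_metadata = {"name": "Field-1", "type": "char(1)", "domain": {"M", "F", "U"}}

# With the domain in hand, validity falls out directly.
valid = [v for v in values if v in field1_metadata["domain"]]
invalid = [v for v in values if v not in field1_metadata["domain"]]

print(valid)    # ['M', 'F', 'U']
print(invalid)  # ['3', '']
```

The invalid values are exactly the characteristics worth saving back into the meta data store rather than discarding.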
Tasks or Functions of Data Analysis During Data Migration
Business intelligence (BI), customer relationship management (CRM) and enterprise application integration (EAI) are all project initiatives that require data analysis (also known as data archeology or data survey) to be an activity scheduled in the early stages of the project. It is at this critical juncture that an opportunity exists to reduce the overall project risk associated with not understanding the source data's definitions and quality levels. Indeed, surveys conducted by research firms such as Gartner and Meta Group indicate that misunderstood source data is a primary reason for project failure.
What activities should you conduct during data analysis to increase the chances for project success? At a minimum, there are four:
- Data requirements The first activity of any data migration project is to gather data requirements. Specifically, the team must determine what data is needed for the migration effort. Databases, tables and files of interest must be identified, as well as individual fields and columns.
- Data analysis It is during this stage that the migration team must gather any and all meta data that exists2. The data analysis process allows the team to review the meta data and to refine the list of needed data with the SMEs.
- Data quality assessment Most data analysts I have met insist on working with the data at this point in their analysis. Most like to browse through the data to get a sense of each field. Helpful as this task may be, it is often skipped because getting to the data may be no easy task: a place must be found to store it, security and privileges must be established, data must be converted, etc. I prefer to schedule a separate project activity called a data quality assessment (or data survey) that will yield a statistical analysis of the data's quality.
- Subject matter expert validation If a data quality assessment is performed, it will inevitably lead to inconsistencies between the data values and the meta data. The SME must be scheduled to review the results of the assessment to determine if the anomalies are the result of poor data or erroneous meta data.
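The statistical side of a data quality assessment can be sketched in a few lines. The field names and sample rows below are hypothetical; the technique shown is simply tallying each field's observed domain and value frequencies, which is the raw material the SME later reconciles against the meta data.

```python
from collections import Counter

# Hypothetical extract from a source system.
rows = [
    {"gender": "M", "state": "NY"},
    {"gender": "F", "state": "NY"},
    {"gender": "X", "state": "ZZ"},   # candidates for SME review
    {"gender": "M", "state": "CA"},
]

# For each field, tally the distinct values (the observed domain)
# and how often each occurs.
profile = {}
for field in rows[0]:
    profile[field] = Counter(row[field] for row in rows)

print(profile["gender"])  # Counter({'M': 2, 'F': 1, 'X': 1})
```

A real assessment tool adds null counts, format checks and pattern analysis, but the output is the same in spirit: a scorecard per field that the SME can judge.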
The Meta Data Supply Chain
Each step of the data analysis yields deliverables that are used as input to the next sequential step. Each deliverable is, in fact, a piece of meta data that should neither be discarded nor filed away where others cannot benefit from it. As figure 2 indicates, the meta data package includes the following:
- The field name. Collectively, this consists of an inventory of all of the fields that are candidates for inclusion into the data migration.
- Meta data collected during the requirements and data analysis activities. This is typically high-level legacy meta data from relational DBMS DDL, COBOL copybooks or system specifications. Its purpose is to provide an understanding of how the data store is structured (e.g., if it is a VSAM file, are there "redefines" or "occurs" clauses in the source?).
- Definitions, format, size, domain values, key indicator and other business rules. I believe that even if the quality or timeliness of the meta data is in question, it should be used, validated and updated. There is useful information here, and it provides the analyst and SME with a starting point. Occasionally, I hear data analysts complain that meta data gathered in this way is bound to be inaccurate or obsolete, so why bother? Like other enterprise architecture planners3, I believe a corporation's data architecture is its most stable architectural component (i.e., its entities, attributes, etc. and their relationships change infrequently). If the meta data is current, it will provide relevant information about the current data. If the meta data is aged, then it becomes a valuable piece of intelligence as the analyst performs data archeology (to ascertain the characteristics of the data as they used to be while trying to understand existing data anomalies).
- Statistical reports and scorecards from the data quality assessment. These reports provide details about the domain sets and frequencies found in each field (except for free-form text fields).
- Recommendations for transforming and editing the data during the transformation process. During the process of reviewing the data and meta data, rules for transforming data begin to emerge (e.g., if two sets of values exist for a field, then the team can facilitate an agreement on one or the other).
- Additions, corrections and comments from the SME. The data quality assessment is a process of reconciling the data to its meta data. During that process, the SME will make additions, corrections and amendments to the meta data.
- Comments or recommendations about data cleansing. A good SME can make or break the overall initiative. In my most successful engagement, the SME had the data cleansed while going through the validation process. In situations where the data could not be cleansed, the source system owners were given a heads up about the problem.
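One lightweight way to carry this meta data package forward is a record per field, with a reconciliation pass that flags values outside the documented domain for SME review. The field name, definition and domain below are illustrative assumptions, not drawn from any real system.

```python
# A hypothetical entry in the meta data package for one field.
field_package = {
    "name": "marital_status",
    "definition": "Customer's marital status at account opening",
    "format": "char(1)",
    "domain": {"S", "M", "D", "W"},
    "sme_comments": [],
}

# Values actually observed in the source data (sample).
observed = ["S", "M", "X", "D", "X"]

# Reconcile the data to its meta data: anything outside the documented
# domain is an anomaly for the SME to rule on.
anomalies = sorted({v for v in observed if v not in field_package["domain"]})
if anomalies:
    # The SME decides: is this bad data, or is the meta data
    # missing a legitimate value?
    field_package["sme_comments"].append(
        f"Values outside documented domain: {anomalies}"
    )

print(anomalies)  # ['X']
```

Whether "X" is cleansed out of the data or added to the domain, the decision itself becomes new meta data worth keeping.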
The Role of Knowledge Management
Once all of this meta data is collected and validated, it can be used to benefit the entire firm. You should target two user groups for the meta data supply chain: the project team and the business community. If your organization has a knowledge management organization, then you will find a willing partner to store and provide a front end to the new or updated meta data.
Good meta data is a vital asset to a company trying to compete in a space where a well-conceived BI, CRM or EAI initiative can provide a knockout blow to you or your competition. If you are so unfortunate as to be facing one of these initiatives with no meta data to start with, it is inevitable that you will need to discover the field-level definitions needed to understand your data and assess its quality and subsequent value. A byproduct of these types of migration efforts is accurate meta data. Don't make the same mistake your predecessors did. Budget the time and resources to perform all aspects of the data analysis. With a little more time and effort, you could create a resource perhaps more valuable to the organization than the migration project itself.
1 For our purposes, meta data is defined as a collection of descriptions or definitions about the structure, format, size, contents, keys, indexes, etc. of its corresponding data.
2 This precedes the use of a data profiling tool such as Evoke's Axio or Metagenix's Metarecon.
3 As per frameworks like Information Engineering (IE) and Enterprise Architecture Planning