Data validation and governance are of primary importance in any data project. They can make or break a project’s usability, acceptance and value derivation. I have seen many data projects fall short of their full usability or value because validation and governance were given inadequate or misplaced importance in the design, build and sustain processes.
However, validation and governance become even more critical in reporting and analytics projects, for two reasons: a) their proximity to value derivation, in terms of both type of usage and content criticality, and b) their distance from the data entry points, given the data modifications, rules and algorithms applied to the raw data along the way.

Reporting and Analytics Projects

All reporting and analytics projects have varied levels of identification and acquisition as well as integration and normalization components in them. They seldom rely on raw data that has not been integrated and normalized. I am not discussing vanilla reporting projects, where online transactional processing (OLTP) code or a query is written against a database or application. Those are plain reporting projects in which code is written and tested in the normal software development cycle. My focus is on reporting and analytics projects that are high in value derivation and benefits, where substantial effort goes into acquiring data from raw formats and normalizing it before it is analyzed and decisions are made. Figure 2 illustrates a categorization of analytics projects.

  • Low-hanging: These are typical vanilla reporting projects addressing one question or a group of questions. They rely on data in raw formats or on data already formatted in the database. They deliver low value to the customer and require the least effort in development and sustainment.
  • Tech savvy: These are complex algorithms written on data sets. Uniform or raw data is used, with substantial effort going into developing and testing a proper algorithm or rule for the report. Their value relies on the accuracy of the rule developed.
  • Stragglers: These are hit-and-run analytics projects where huge benefits are derived with the least effort. We hit these jackpots either through the brilliance of the algorithm developed or through better use of existing data reports with minor modifications. A join condition added between existing analytics and other tables to handle different business conditions falls into this category.
  • Strategic: These are high-value analytics projects that require concentrated effort in design, development and sustainment, with a huge component of data acquisition and data normalization activities.

Different data analytics projects have varied success and failure points, depending on the importance given to data validation and governance at each stage of the project, from design to development to support and sustain. Reviewing examples of such projects helps us understand the various pitfalls and turnarounds.
1. Spend analytics for a Fortune 100 company (strategic): The organization charted an ambitious, high-value procurement spend warehouse project that brought transactions from all 40 transaction systems, with varied consistency and commonality, into an analytical warehouse for slicing and dicing the spend to derive future savings. Clearly the project was strategic in nature and had a huge component of data acquisition and rationalization. It took approximately eight months post go-live to validate the spend and get concurrence from the stewards of the project that the analysis was useful. Had the validation and governance steps been incorporated earlier in the project life cycle, the project could have derived the benefits much earlier.
2. Compliance reporting of regulatory information for a Fortune 100 company (strategic): The project was strategic in nature due to regulatory compliance. The emphasis of the organization was on developing the application with the right data acquisition programs from the transaction systems. Testing was restricted to the correctness of those data acquisition programs. Everything tested well in the technical sense, but no effort had been earmarked for data validation. After go-live, substantial data validation efforts were necessary, resulting in rework of already-tested technical components of the project. Again, had the validation procedures been built into the project development lifecycle, better returns on the project's capital spend could have been a reality.
3. Reports on e-commerce transactions of a major CPG manufacturer (low-hanging): The organization did not have a uniform view of its e-procurement and e-payment gateway transactions, resulting in maverick spend and transactions on the Web. The project was to consolidate the transactions and spend onto a common e-procurement and e-payment gateway and to develop standard reports and analysis on the chosen application. The project efforts were focused on data acquisition and adoption into one common system and on leveraging existing reports. Data validation was not a stumbling block in the success of the project because, once the data was standardized, the existing reports delivered value.

4. Developing consumer behavior analytics for a midsized credit card company (tech savvy): The company needed to analyze a huge volume of credit card transactions with complex algorithms to understand consumer behavior. The ultimate use of the reports, and the value delivered, lay in analyzing different variables to develop algorithms and statistical models describing different facets of consumer behavior. Success of the project depended on regression testing the models against different cross sections of data and modifying the algorithms to ensure confidence levels, which was in line with normal technical object testing. However, the organization failed to do data validation in sustain mode to keep the algorithms current with new data patterns. This resulted in a major rework effort, which could have been avoided if proper data validation and stewardship had been established at the outset for the sustain environment post go-live.

The importance of data validation, and of having a proper process in place to champion it, varies across different types of reporting and analytics projects. As we move toward the strategic quadrant, organizations should adopt rigorous data validation techniques to ensure longevity and acceptance of the project.

Data Validation and Governance

Although the process looks very easy, I have seen the term misunderstood, in both content and process, in many organizations. The probable reason is that the steps and process for validation vary with the project type. The attempt here is to arrive at guiding principles for designing and running the data validation process.
The amount of data validation needed for a successful analytics project depends on the following factors. If the intensity of these factors is high, innovative and special data validation steps must be included before the project can be deemed successful. If the intensity of these factors is low, traditional testing of the technical objects and signoff are sufficient to make the project usable.

  • Intensity of data rationalization and normalization.
  • Intensity of data acquisition and diversity in data entry points.
  • Intensity and importance of derived decisions.
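
The decision rule above can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 scale, the threshold of 4, the rule that any single high factor triggers dedicated validation, and all names (`ProjectProfile`, `validation_approach`) are assumptions made for the example, not part of any established method.

```python
from dataclasses import dataclass


@dataclass
class ProjectProfile:
    # Hypothetical 1-5 ratings for the three factors listed above.
    normalization_intensity: int
    acquisition_diversity: int
    decision_importance: int


def validation_approach(profile: ProjectProfile, threshold: int = 4) -> str:
    """Suggest a validation approach from the three intensity ratings.

    Assumption: one factor at or above the threshold is enough to call
    for dedicated data validation steps.
    """
    scores = (
        profile.normalization_intensity,
        profile.acquisition_diversity,
        profile.decision_importance,
    )
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each rating must be between 1 and 5")
    if max(scores) >= threshold:
        return "dedicated data validation"
    return "standard technical testing and signoff"


# A spend warehouse fed by many source systems rates high on every factor.
print(validation_approach(ProjectProfile(5, 5, 4)))
# A single-source vanilla report rates low on every factor.
print(validation_approach(ProjectProfile(1, 2, 2)))
```

In practice the ratings would come from the project team's own assessment; the value of writing the rule down is that the decision to fund validation work becomes explicit early.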

Integrating Data Validation and Governance to the Normal Project Cycle

Having understood the categorization of analytics projects and the importance of validation and governance for high-value analytics projects, let us see how best to do this within the normal project execution cycle. Although there might be instances where a focused standalone data validation effort brings benefits, it is better to integrate these activities into the normal project management cycle. This is not to undermine the effort and focus the process deserves, but to let the project team manage these activities effectively.
Business and IT teams in most companies are likely to have mature project management and control practices. My effort is to align and integrate with those practices to maximize the benefits.

Design/prototype/blueprint phase. During this phase, focus will be on the problem statement, solution discovery, team discovery and delivery guidelines, including planning. However, the following factors regarding validation and governance have to be finalized during this phase:

  1. Validation guidelines (goal statement) from the sponsor community: This includes statements from the sponsor community such as "80 percent of the spend must be captured," "forecast volumes for the 50 biggest customers must be 90 percent accurate," "top regulatory items must be 100 percent compliant," etc.
  2. Identify the members of the team, including a validation champion and validators from all interest groups.
  3. Include and capture the validation governance time lines in the high-level project plan.
  4. High-level goal statement of the governance process post go-live.
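
One way to keep the sponsor goal statement usable later in the project is to record it as data that test scripts can check against. The sketch below uses the three example goals from the list above; the class and function names, the goal identifiers and the measured values are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationGoal:
    name: str
    target: float  # required ratio between 0.0 and 1.0


def evaluate_goals(goals, measured):
    """Return {goal name: True/False}.

    A goal with no measurement is treated as not met (assumption).
    """
    return {g.name: measured.get(g.name, 0.0) >= g.target for g in goals}


# The example goals from the sponsor community, expressed as thresholds.
goals = [
    ValidationGoal("spend_captured", 0.80),
    ValidationGoal("top50_forecast_accuracy", 0.90),
    ValidationGoal("top_regulatory_compliance", 1.00),
]

# Illustrative measurements, not real project results.
measured = {
    "spend_captured": 0.84,
    "top50_forecast_accuracy": 0.87,
    "top_regulatory_compliance": 1.00,
}

results = evaluate_goals(goals, measured)
for name, met in results.items():
    print(f"{name}: {'met' if met else 'NOT met'}")
```

Keeping the goals in one place means the same thresholds can be reused in unit tests, integration tests and the post go-live governance checks.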

The realization phase. This includes development, unit testing and integration testing. During this phase of the project, the validation guidelines have to be used constantly to fine-tune design and development. It might also be worthwhile to develop separate analytical reports to aid data validation. Successful organizations have integrated this into the normal project cycle in the following ways.

  1. The upstream design and development of technical objects related to extraction and normalization have to be fine-tuned to take the validation inputs into account.
  2. Unit testing of the technical objects should include scripts that check each object's contribution to the data validation goals. We might not be able to validate the goals themselves in unit testing. However, the test scripts should align with the validation goal. If the goal is 90 percent accuracy in the top 20 customers’ sales, the unit test of the technical objects should have a way to validate the sales figures against the transaction tables. An end-to-end accuracy of 90 percent can be achieved only if accuracy with respect to the transaction tables is 95 to 100 percent.
  3. During integration testing, data validators have to be heavily involved in validating the data (key fields and attributes), and project sponsors in confirming the validation goals covered by the test scripts.
  4. During error resolution, the test script results from data validation have to be taken into account for redesign or redevelopment efforts. Retesting of technical objects has to be followed with the data validation retest.
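
The reconciliation described in item 2 can be sketched as a small check that compares per-customer totals in the reporting layer with totals computed from the transaction tables. The 5 percent tolerance mirrors the "95 to 100 percent" figure above; the function name, the dictionary inputs and the sample figures are assumptions for illustration, and a real test would read both sides from the actual systems.

```python
def reconcile(source_totals, warehouse_totals, tolerance=0.05):
    """Return customers whose warehouse total is missing or deviates
    from the transaction-table total by more than `tolerance`.

    The result maps customer -> (expected, actual); actual is None
    when the customer is absent from the warehouse.
    """
    mismatches = {}
    for customer, expected in source_totals.items():
        actual = warehouse_totals.get(customer)
        if actual is None:
            mismatches[customer] = (expected, None)
            continue
        if expected == 0:
            # Avoid division by zero: any non-zero actual is a mismatch.
            deviation = 0.0 if actual == 0 else float("inf")
        else:
            deviation = abs(actual - expected) / abs(expected)
        if deviation > tolerance:
            mismatches[customer] = (expected, actual)
    return mismatches


# Illustrative figures only.
source = {"A": 1000.0, "B": 500.0, "C": 250.0}
warehouse = {"A": 990.0, "B": 450.0}

mismatches = reconcile(source, warehouse)
print(mismatches)  # B is off by 10 percent, C is missing
```

A check like this can run as part of unit testing and again after every fix, which supports the retest requirement in item 4.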

Go-live phase. This includes user acceptance testing and actual cut-over/go-live preparation activities.

  1. User acceptance testing needs complete signoff on the data validation test results.
  2. All open points need to be captured in the post go-live data governance process and procedures.
  3. All sustain-phase data governance and periodic validation guidelines have to be finalized with clear ownership and accountability.
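
Ownership and periodic validation become easier to enforce when each check is recorded with an owner and a frequency. The following is a minimal sketch under that assumption; the check names, owners, frequencies and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class GovernanceCheck:
    name: str
    owner: str           # person or team accountable for the check
    frequency_days: int  # how often the check must run
    last_run: date


def overdue_checks(checks, today):
    """Return the checks whose last run is older than their frequency."""
    return [
        c for c in checks
        if today - c.last_run > timedelta(days=c.frequency_days)
    ]


# Hypothetical registry agreed at go-live.
checks = [
    GovernanceCheck("spend_reconciliation", "procurement steward", 30, date(2024, 1, 15)),
    GovernanceCheck("model_input_drift", "analytics lead", 30, date(2024, 2, 20)),
]

for check in overdue_checks(checks, today=date(2024, 3, 1)):
    print(f"{check.name} is overdue; owner: {check.owner}")
```

A registry of this kind also gives the open points from item 2 a home, since each one can be added as a check with a named owner.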

High-value analytics projects can use these guidelines to integrate validation steps into the project development cycle. Data governance procedures must also be in place for ongoing maintenance of the projects. I have seen instances where the entire capital cost of a project was recovered in three months because of the value delivered by carefully following this approach. On the other hand, some organizations grappled for two years after go-live, reworking and fine-tuning the design to close the validation and governance gaps in their projects. A well-conceived and planned validation and governance process can deliver the intended results faster, bigger and better, in a less painful and more streamlined way. This will give the business the confidence to take on more ambitious analytics projects.
