Conventional wisdom among information architects holds that solving the data quality problem is really a matter of solving the information architecture problem.  It is generally agreed that unifying an organization's business and technology architectures provides the contextual environment needed to solve data issues and ensure the integrity of information.  In A Practical Guide to Enterprise Architecture, James McGovern et al. write that data integrity suffers from compromises in the enterprise architecture: people are pressured to produce more in less time, and the proper cross-checks on the data are not performed.  Information architects agree that addressing faulty data only at the physical data level cannot work, because the data layer does not capture the semantics required to accurately understand data spanning business processes.  
More than 25 years ago, Steven Spewak made the case for architecting a stable business model before designing the information systems that support it.  His enterprise architecture planning (EAP) methodology provided guidance for implementing the top two rows of the Zachman Framework:  Scope (Planner) and Business Model (Owner).  EAP was innovative for its data-driven approach: data dependencies are defined before systems are implemented, and the order of implementation activities is derived from those dependencies.  Layer 3 of the EAP framework plans the future architecture, defining the data dependencies by identifying the major kinds of data the business needs.  Application and technology architectures are then designed specifically to manage an appropriate environment for the data supporting business processes.  This planned approach defines a consistent method for centrally collecting, migrating and storing data, improving data quality by making enterprise information accessible and timely for any business need, with commensurate cost reductions and efficiencies.
One of the modern implementations of the EAP framework is the Federal Enterprise Architecture (FEA), an initiative that aims to provide a common methodology for IT acquisition in the federal government.  The primary purpose of the FEA is to identify opportunities to simplify processes and unify work within similar lines of business by developing a common taxonomy for describing data and IT resources.  Federal agencies have adopted the guidance laid out in the five FEA reference models to build out their corresponding architecture layers (performance, business, service component, data, and technical). This guidance is intended to help establish a well-architected enterprise business model and the data dependencies needed to guide business and IT modernization efforts.  
Yet even after adopting these and similar EAP-like approaches, federal agencies and other large organizations continue to experience data quality problems:  overlapping data within functional areas, costly interfaces between incongruent systems, reengineered modes of information sharing that do not work as planned and new system development that contributes more of the same.  It has now become clear that, despite a quarter century of advice on better business architecture and data-driven approaches, "information politics" and other factors inhibit the coordinated work needed to build out the enterprise frameworks that will transition organizations to the Information Age. 
Given this reality, it may be time to rethink the notion that effective information architecture development will solve the data quality problem.  In recent years, a handful of large organizations and federal agencies have established mature data quality practices, and it is now possible to see that the impact of these initiatives goes far beyond data management and information exchange improvements.  The principal goal of a data quality improvement program (DQIP) is to identify and standardize the quality of performance-related data, reassuring data consumers of the credibility of the information upon which they base their decisions.  A byproduct of the practice, however, is the establishment of new, enterprise-wide practices that can be leveraged for other organizational initiatives, most notably enterprise architecture.  This is, then, a somewhat radical bottom-up approach: viewing data quality improvement as a key enabler of effective information architecture development rather than the other way around.
Data quality principles and initiatives can enable better delivery of integrated services, a cornerstone of the FEA's goals.  In the sections that follow, the layers of the FEA and the value proposition of the DQIP are briefly explained in terms of how data quality provides a foundation for building an organization's enterprise architecture. 

DQIP and the Performance Reference Model

The FEA performance reference model (PRM) is a standardized framework to measure the performance of major IT investments and their contribution to program performance.  By drawing on a number of existing approaches to performance measurement, the PRM identifies performance improvement opportunities that span traditional organizational structures and boundaries.  Data quality initiatives can lay the groundwork for PRM development by discovering the systems and data most responsible for high-priority business performance reporting.  The DQIP certifies performance-related data against rigorous data quality standards, giving the organization confidence that the data is fit for use and will contribute to program performance across the organization.  
Data quality’s information value cost chain supports the difficult task of estimating the value of IT investments.  This process maps performance data’s complete lifecycle, including the logistics of its creation and the steps of its transformation into “finished” information products.  The process also includes detailed descriptions of how the data is serviced (its maintenance as well as support for the customers who use it).  Costs are attached to the data at each stage of its lifecycle.  These costs can then be compared against the real and intrinsic value of the data in supporting the organization’s adherence to a five-year strategic plan or other key objective.  Information products that do not yield a profit (i.e., their costs of production and maintenance over their lifecycle exceed their value to the organization’s bottom line) become prime targets for reprocessing.  
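The cost-chain bookkeeping described above can be sketched in a few lines of code. The following is purely illustrative: the stage names, dollar figures and profitability rule are assumptions for demonstration, not part of any standard DQIP toolkit.

```python
from dataclasses import dataclass, field

@dataclass
class InformationProduct:
    """An information product with costs attached at each lifecycle stage."""
    name: str
    value: float                     # estimated value to the organization
    stage_costs: dict = field(default_factory=dict)  # stage name -> cost

    def total_cost(self) -> float:
        return sum(self.stage_costs.values())

    def is_profitable(self) -> bool:
        # Unprofitable products are prime targets for reprocessing.
        return self.value > self.total_cost()

# Illustrative figures only
report = InformationProduct(
    name="quarterly_performance_report",
    value=50_000.0,
    stage_costs={"creation": 12_000.0, "transformation": 8_000.0,
                 "maintenance": 15_000.0, "customer_support": 20_000.0},
)

if not report.is_profitable():
    print(f"{report.name}: lifecycle cost {report.total_cost():,.0f} "
          f"exceeds value {report.value:,.0f} -- candidate for reprocessing")
```

In this sketch the report's lifecycle costs (55,000) exceed its estimated value (50,000), so it surfaces as a reprocessing candidate.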
Effectively reporting an organization’s performance goals and objectives may also require developing new data systems and retiring old ones (legacy systems).  This is not easy to accomplish, due to the entrenched interests stakeholders have in holding onto their legacy data and not sharing it with the enterprise.  The DQIP can at least help organizations make more informed choices between the old and the new by identifying where definitive performance information on the accomplishment of enterprise-wide program goals exists, and by documenting the costs of data input, data handling and data transfer for all the alternative performance-reporting “pathways” in which the organization might have a stake.  

DQIP and the Business Reference Model

The business reference model (BRM) provides a framework that facilitates a functional (rather than organizational) view of the organization’s lines of business, including its internal operations and its services for citizens, independent of the program offices that perform them.  Data quality initiatives support the BRM framework by encouraging the data originators and data consumers integral to the smooth functioning of each line of business to become more involved in its business context and conditions.  Data archeology (discovering data through forensics), data cleansing (correcting bad data), data quality enforcement (preventing data defects at the source) and knowledge of authoritative data sources are, after all, business objectives requiring the integration of technical people with businesspeople.  Business management should be in alignment with proven quality management practices, and the DQIP can support these practices.
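Data quality enforcement, preventing defects at the source, typically means validating records against business rules before they enter a system. A minimal sketch follows; the rule names, field names and code lists are hypothetical examples, not drawn from any specific agency system.

```python
import re

# Hypothetical business rules for an inbound record; each rule returns
# True when the field value is acceptable at the point of entry.
RULES = {
    "zip_code": lambda v: bool(re.fullmatch(r"\d{5}(-\d{4})?", v or "")),
    "agency_code": lambda v: v in {"DOT", "HHS", "DOE"},       # illustrative list
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "")),
}

def validate(record: dict) -> list[str]:
    """Return the names of rules the record violates (enforced at the source)."""
    return [name for name, rule in RULES.items() if not rule(record.get(name))]

errors = validate({"zip_code": "2021", "agency_code": "DOT", "email": "a@b.gov"})
# The four-digit ZIP fails its rule, so the record is rejected before capture.
```

Because the rules are expressed as data rather than buried in application code, business stewards and technical staff can review and maintain them together, which is exactly the business-technical integration the BRM encourages.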
The DQIP establishes data governance groups (in many cases for the first time in the organization’s history) staffed with internal data administrators, metadata administrators and data stewards.  These stewards are then in place to act as the subject matter experts for developing the organization’s enterprise architecture; they will be called upon for their knowledge of enterprise data and information sharing, and for their ability to bridge the technology and business levels.  It is important to recognize that existing organizational business processes may already have some form of quality checks and balances in place, whether or not these are termed data quality.  Formal data quality initiatives can therefore learn about and tap into existing quality processes, ultimately strengthening their reach and effect throughout the enterprise.

DQIP and the Service Component Reference Model

The service component reference model (SRM) supports the discovery of government-wide business and application service components in IT investments and assets.  The SRM is structured across horizontal and vertical service domains that, independent of business functions, can leverage the reuse of applications, application capabilities, components and business services.
Data quality initiatives support an agency’s efforts to automate customer service and business management services by focusing on the creation of clean, high-quality data at the source.  Data quality deployments in large organizations are increasingly addressing the data in systems that drive day-to-day operations at the agency.  The shift to “operational” data quality also increases the need for the data quality environment to interoperate with the overall enterprise IT environment in a seamless, service-oriented manner.  As organizations increasingly turn to service-oriented architecture, they are seeking data quality solutions that can deliver data services adaptively and on the fly: via multiple protocols and platforms that are easily consumable by a wide variety of downstream needs.
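One way to make cleansing logic consumable across protocols is to keep the entry point protocol-neutral: structured data in, structured data out, with the transport (REST, messaging, batch) layered on separately. The sketch below assumes an invented `cleanse_service` function and a tiny, illustrative state-name lookup; it is a design illustration, not a reference implementation.

```python
import json

def standardize_state(value: str) -> str:
    """Normalize a U.S. state field (illustrative lookup, not a full list)."""
    lookup = {"california": "CA", "calif.": "CA", "texas": "TX", "tex.": "TX"}
    v = value.strip().lower()
    return lookup.get(v, value.strip().upper()[:2])

def cleanse_service(payload: str) -> str:
    """A protocol-neutral entry point: JSON in, JSON out, so the same
    cleansing logic can sit behind REST, messaging or batch callers."""
    record = json.loads(payload)
    record["state"] = standardize_state(record.get("state", ""))
    return json.dumps(record)

cleanse_service('{"name": "Acme", "state": "Calif."}')
# -> '{"name": "Acme", "state": "CA"}'
```

Because the cleansing function knows nothing about its transport, the same service component can be registered once in the SRM and reused by any downstream consumer.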

DQIP and the Data Reference Model 

The data reference model (DRM) categorizes enterprise information into greater levels of detail.  It also 1) establishes a classification for enterprise data, 2) streamlines information exchange processes, 3) identifies duplicative data resources and 4) describes artifacts generated from segment data architectures.  The DQIP’s key contribution to DRM development is the promotion of enterprise data standards by publishing the artifacts and general cultural effects of data quality improvement organization-wide.  Data quality initiatives ferret out inconsistencies in data naming standards across the enterprise and begin the journey toward the application of consistent standards.  Once established across the enterprise, data standards can better support data integrity, accuracy and objectivity.
Outputs of data quality initiatives often become the basis for building an organization’s first enterprise metadata repository (EMDR), wherein the DQIP’s inputs and outputs are inventoried and stored.  The types of metadata captured through the DQIP include business metadata (business functions, processes, entities, attributes and rules), technical metadata (physical architectural components, such as programs, models, schemas, scripts, databases, tables, columns, keys and indices), process metadata (program logic that manipulates data during data capture, data movement or data retrieval) and usage metadata (statistical information about how systems are used by businesspeople). The EMDR is an essential tool for standardizing data, for managing and enforcing data standards, and for reducing the rework performed by developers who, unaware of what already exists, lack the knowledge to reuse architectural components.
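The four metadata types can be pictured as entries in a single searchable catalog. The schema and sample entries below are illustrative assumptions (the source does not prescribe an EMDR data model); the point is that a developer can search before building and reuse what already exists.

```python
from dataclasses import dataclass
from typing import Literal

# The four metadata types captured through the DQIP.
MetadataKind = Literal["business", "technical", "process", "usage"]

@dataclass
class MetadataEntry:
    """One entry in an enterprise metadata repository (illustrative schema)."""
    kind: MetadataKind
    name: str
    description: str
    source_system: str

# Hypothetical sample entries, one per metadata type.
repository: list[MetadataEntry] = [
    MetadataEntry("business", "customer", "Party that receives a service", "CRM"),
    MetadataEntry("technical", "CUST_TBL.CUST_ID", "Primary key, NUMBER(10)", "CRM"),
    MetadataEntry("process", "load_customers.sql", "Nightly ETL into the warehouse", "ETL"),
    MetadataEntry("usage", "customer_report_hits", "Monthly query count per report", "BI"),
]

def find(term: str) -> list[MetadataEntry]:
    """Search before building: reuse existing components instead of reworking."""
    return [e for e in repository if term.lower() in e.name.lower()]
```

A lookup such as `find("cust")` surfaces every customer-related artifact across the business, technical, process and usage layers, which is precisely the rework-avoidance the EMDR exists to provide.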

DQIP and the Technical Reference Model 

The technical reference model (TRM) is a component-driven, technical framework used to categorize the standards, specifications and technologies that support the delivery of service components.  It provides a foundation to categorize these technologies to support the exchange of business and application components (service components) that may be used and leveraged in a component-based or SOA environment. 
Data quality initiatives evaluate production databases against quantitative, information-preserving measures such as data integrity, normalization and performance.  In large distributed systems, many decisions are made at design time based on the need for improved performance, but the tools for capturing a system’s performance measurements at run time (performance metadata) and for using this information to adaptively configure the system are often missing.  Database performance assessments, which are part of the DQIP, can provide those systems with the performance metadata they will need to become optimized after they have gone into production.  These assessments support the TRM’s goal of optimizing the service platform and infrastructure, component framework, and service interface and integration. 
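Capturing performance metadata at run time can be as simple as timing each operation and recording the results for later tuning. The decorator below is a minimal sketch under assumed names (`timed`, `perf_metadata`); the summed-range function stands in for a real database query.

```python
import time
from collections import defaultdict

# Run-time performance metadata: operation name -> list of elapsed seconds.
perf_metadata: dict[str, list[float]] = defaultdict(list)

def timed(op_name: str):
    """Decorator that records elapsed wall-clock time as performance metadata."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                perf_metadata[op_name].append(time.perf_counter() - start)
        return inner
    return wrap

@timed("monthly_rollup")
def monthly_rollup(rows):
    return sum(rows)  # stand-in for a real database query

monthly_rollup(range(1000))
timings = perf_metadata["monthly_rollup"]
avg = sum(timings) / len(timings)
# 'avg' can feed an adaptive configuration step (e.g., index or cache tuning)
```

Accumulated timings like these are the raw material a post-production assessment uses to decide where the system actually needs optimization, rather than relying on design-time guesses.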
Enterprise architecture puts a face on an organization’s strategic plan, facilitates change, sets interoperability standards, coordinates technical investments and encourages the selection of proven technologies.  Yet achieving a successful EA implementation that untangles and makes sense of myriad data is a daunting task for any organization.  Organizations can get the most out of their EA initiatives by leveraging the best practices and processes arising from a well-structured DQIP.  The fruits of a successful DQIP should not gather dust on a shelf in a corner of the organization; they can be used to jumpstart an organization’s EA reference models, aligned with the organization’s business and information segments. Data quality outcomes provide the foundation for creating an architecture from all perspectives and successfully transitioning to the Information Age. 
